The Role
The Data Engineer will be responsible for designing, building, and maintaining scalable data solutions that support analytics, business intelligence, and operational processes. This role involves developing ETL/ELT pipelines, managing cloud-based data infrastructure, and ensuring data security and governance. The ideal candidate will optimize data performance and support machine learning applications while mentoring junior engineers.
Key Responsibilities
Data Architecture & Engineering
- Design and implement scalable data architectures for analytics and machine learning applications.
- Develop and maintain ETL/ELT pipelines to ingest and process data from various sources.
- Build and optimize data warehousing solutions for efficient data storage and retrieval.
Cloud Infrastructure & Integration
- Implement cloud-based data solutions using AWS (Redshift, S3, Glue, Lambda).
- Optimize cloud solutions for cost-efficiency and performance across distributed systems.
- Lead the integration of APIs, data lakes, and batch/stream processing pipelines.
Data Governance & Security
- Ensure compliance with data privacy and security regulations.
- Monitor and enforce data quality standards for consistency and accuracy.
- Implement best practices in data governance and secure data handling.
Collaboration & Stakeholder Engagement
- Work closely with data scientists, analysts, and product teams to define data requirements.
- Engage with business stakeholders to translate requirements into actionable data solutions.
- Provide mentorship to junior engineers, fostering a culture of continuous learning.
Optimization & Continuous Improvement
- Monitor and optimize data architectures for performance and scalability.
- Stay current with emerging technologies and best practices in data engineering and cloud computing.
- Drive process improvements to enhance efficiency and automation.
Skills and Experience
Essential:
- Proficiency in Python, SQL, and Apache Spark.
- Experience with AWS data services (Redshift, S3, Glue, Lambda).
- Strong understanding of ETL/ELT processes, data lakes, and API integration.
- Expertise in data governance, privacy, and security regulations.
- Proven ability to optimize data pipelines for performance and cost-effectiveness.
- Excellent problem-solving skills and the ability to work both independently and as part of a team.
- Experience mentoring and leading junior engineers.
Desirable:
- Experience working with real-time data processing and machine learning pipelines.
- Knowledge of DevOps practices and CI/CD for data pipelines.
- Familiarity with data visualization tools and business intelligence platforms.