About Us
We are a cutting-edge biotech company, pioneering advancements in healthcare and life sciences through the integration of AI and machine learning. Our focus is on harnessing large-scale data to drive innovative research and product development, ultimately improving patient outcomes. We are seeking a Cloud Data Engineer with a strong ML Ops focus to build and maintain the infrastructure that powers our data-driven initiatives.
Role Overview
As a Cloud Data Engineer specializing in ML Ops, you will play a critical role in the end-to-end machine learning lifecycle. Your focus will be on developing, automating, and scaling our cloud infrastructure to support the deployment, monitoring, and optimization of machine learning models across production environments. You will work closely with data scientists and engineers to streamline workflows, ensuring that models can be deployed efficiently, retrained frequently, and monitored effectively.
Key Responsibilities
- Design, build, and manage scalable data and ML infrastructure on AWS, leveraging services like S3, Lambda, EC2, and EKS.
- Develop and manage infrastructure as code using Terraform for automated provisioning and management of cloud resources.
- Implement robust ML Ops pipelines using AWS SageMaker, Docker, and Kubernetes to automate model training, testing, deployment, and monitoring.
- Collaborate with data scientists to optimize model serving, including A/B testing, continuous retraining, and performance tuning.
- Build CI/CD pipelines to ensure fast and reliable deployment of ML models and data processing workflows.
- Monitor, troubleshoot, and optimize machine learning pipelines and cloud resources to ensure high availability and performance.
- Ensure security, compliance, and best practices in cloud architecture and data handling.
Qualifications
- Proven experience with AWS cloud services, especially for machine learning and data engineering workloads.
- Strong expertise in Terraform for infrastructure as code and automation.
- Proficiency with AWS SageMaker and related ML Ops tools and technologies.
- Experience with containerization and orchestration (Docker, Kubernetes) for machine learning workflows.
- Familiarity with CI/CD processes and tools such as Jenkins, GitLab CI/CD, or CircleCI.
- Strong programming skills in Python, with experience in building data pipelines.
- Solid understanding of DevOps and ML Ops principles, including monitoring, version control, and automation.
- Knowledge of big data frameworks like Spark and experience working with large-scale datasets are a plus.