- Role – Bioinformatics Data Engineer
- Scope – Outside IR35
- Start – ASAP
- Notice – 1 week to both sides
- Location – Remote (UK)
- Duration – initially for 6 months
- Rate – circa £600pd
We are seeking a highly skilled and experienced Bioinformatics Data Engineer to join our dynamic team. In this role, you will significantly impact the delivery of bioinformatics data engineering and visualizations to the Oncology R&D organisation. Your work will be central to advancing our data stack, as well as our automation and observability capabilities.
Main Duties and Responsibilities
- Develop, execute, and maintain ETL pipelines for extracting, transforming, and loading data for use in cBio and other bioinformatics analysis and visualization tools
- Ensure the reliability, scalability, and performance of ETL pipelines and data systems
- Troubleshoot and resolve issues related to data loading and integration into downstream systems
- Collaborate with bioinformaticians, data scientists, and other stakeholders to understand and meet the data needs and requirements of the organisation
- Stay up-to-date with new technologies and best practices in bioinformatics data engineering
Essential Requirements
- A background in Computer Science, Engineering, or Bioinformatics (Master's level) with at least 5 years of relevant experience
- Familiarity with bioinformatics visualizations across omics domains, including genomics, transcriptomics, proteomics, and DNA methylation
- Extensive experience with Python and its data/scientific libraries, such as pandas, NumPy/SciPy, and Polars
- Proven experience with bioinformatics visualization systems like cBioPortal, including data loading and troubleshooting
- Strong understanding of ETL processes and data pipeline development
- Ability to interact with various data sources, both structured and unstructured (e.g. HDFS, SQL, NoSQL)
- Experience working across multiple scientific compute environments to create data workflows and pipelines (e.g. HPC, cloud, Unix/Linux systems)
Desirable Requirements
- Experience deploying data pipelines using orchestration services such as Airflow, Prefect, AWS Glue, or Dagster
- Experience using AWS services such as S3/EBS, EC2, CloudWatch, SNS, and Lambda
- Understanding of software development, testing, and quality processes, with hands-on experience in testing frameworks and documentation
- Expertise with biological/health data, especially genomics and other *omics technologies
- Ability to understand, map, integrate, and document complex data relationships and business rules