Machine Learning Engineer

Blank Bio

📍San Francisco, CA

Posted May 19, 2026

Job Overview

Position: Machine Learning Engineer
Company: Blank Bio
Location: San Francisco, CA
Work Type: On-site
Job ID: li-4415913253

Job Description

About Blank Bio
Blank Bio is an applied AI research lab focused on increasing the success rates of clinical trials. We do this by training RNA foundation models that learn the patterns that shape disease progression and patient response to treatment. We aim to help pharma make more informed decisions in clinical trials by capturing the biology that makes each patient’s tumour unique.

We’re a technical team of AI scientists and engineers from companies including Recursion, Deep Genomics, DeepMind, and Amazon, and institutions including Memorial Sloan Kettering Cancer Centre, Stanford, and the Vector Institute.

The Role
As a machine learning engineer, you will be responsible for scaling our models, building the training infrastructure, and ensuring reproducibility across large-scale biological datasets. You’ll work closely with research scientists and biologists to turn cutting-edge machine learning into practical, high-impact tools for RNA biology. As an early-stage startup, we move fast, work across disciplines, and embrace ambiguity. We’re looking for people who thrive in dynamic environments, are eager to take ownership, and want to help define both the science and the culture of the company.

Responsibilities

Develop and optimize large-scale ML training pipelines for RNA foundation models.

Implement distributed training systems (multi-GPU/TPU) and optimize performance at scale.

Build infrastructure for dataset management, preprocessing, and benchmarking.

Collaborate with scientists to translate biological questions into ML tasks.

Contribute to the design and evaluation of new architectures, embeddings, and fine-tuning strategies.

Maintain high-quality engineering standards, including reproducibility, testing, and deployment readiness.

Qualifications
Must-haves

3+ years of work experience.

Proficiency in Python and modern deep learning frameworks (PyTorch, JAX, or TensorFlow).

Hands-on experience training large-scale models (transformers, diffusion, or sequence models).

Strong background in distributed training, optimization, and performance profiling.

Track record of building ML systems that scale and ship.

Nice-to-haves

Experience with biological or messy, real-world scientific data.

Background in computational biology, bioinformatics, or adjacent fields.

Experience in early-stage startups or interdisciplinary ML-for-science projects.

Compensation & Benefits

Competitive salary and meaningful early-stage equity.

Comprehensive health, dental, and vision coverage.

Generous vacation and parental leave policies.

Interview Prep

AI-powered insights to help you prepare

Practice Questions

💡Technical Questions (3)

1. How would you approach optimizing a distributed training pipeline for a large RNA foundation model that is experiencing severe GPU underutilization and communication bottlenecks across multiple nodes?
2. Biological datasets are notoriously messy, often containing batch effects, missing values, and varying sequence lengths. How do you design data preprocessing and infrastructure pipelines to ensure model reproducibility and robustness when dealing with this type of data?
3. Walk me through how you would translate a biological question from a research scientist (such as predicting a patient's tumor response to a specific RNA therapeutic) into a machine learning task and corresponding model architecture.

🎯Behavioral Questions (3)

1. Tell me about a time you had to build an ML system that scaled and shipped. What was your specific role, and how did you ensure it met production standards?
2. Describe a situation where you worked closely with domain experts (like biologists or scientists) who did not speak the same technical language as you. How did you bridge that gap?
3. Give an example of a time you embraced ambiguity in a dynamic work environment. How did you navigate the lack of clear direction to deliver results?

🧩Situational Questions (2)

1. You are training a new RNA foundation model, and the loss suddenly spikes and diverges after several days of stable training. How do you handle this situation?
2. A research scientist is eager to test a novel neural network architecture for an RNA sequence task, but you know it will be extremely difficult to scale across our multi-GPU setup within the current sprint. How do you handle this?

Resume Keywords

Make sure these keywords appear on your resume

RNA foundation models, distributed training, multi-GPU, PyTorch, JAX, ML infrastructure, computational biology, reproducibility, performance profiling, transformer architectures, data preprocessing, fine-tuning

Interested in this position? Apply directly on LinkedIn.