Data Scientist, AI Infrastructure and MCP (onsite)

Thermo Fisher Scientific

📍San Marcos, CA

Posted May 22, 2026

Job Overview

Position

Data Scientist, AI Infrastructure and MCP (onsite)

Company

Thermo Fisher Scientific

Location

San Marcos, CA

Work Type

On-site

Job ID

li-4414099379

Job Description

Work Schedule
Standard (Mon-Fri)

Environmental Conditions
Office

Company Overview
Thermo Fisher Scientific Inc. is the world leader in serving science, with annual revenue exceeding $40 billion. Our mission is to enable our customers to make the world healthier, cleaner, and safer. Whether our customers are accelerating life sciences research, solving complex analytical challenges, improving patient diagnostics and therapies, or increasing productivity in their laboratories, we are here to support them.

Location / Division
This is an onsite position based in Grand Island, NY; Carlsbad, CA; or Hunt Valley, MD. Candidates must be located near one of these three sites. Relocation assistance is not provided.

How will you make an impact?
We are seeking a highly motivated Data Scientist with strong hands-on expertise in AI/ML & LLM technologies. In this role, you will be part of the Data Science team within the Biologicals and Chemicals Division, one of the fastest-growing businesses in the company supporting developers and manufacturers of biological-based therapeutics and vaccines. With our portfolio of best-in-class digital and bioprocessing products, we empower innovation from early discovery through large-scale commercial manufacturing.

Core Responsibilities

Design, develop, and maintain backend services and AI-enabled applications that support scientific and bioprocessing workflows.

Develop and optimize retrieval components (e.g., embeddings, document indexing, vector search) used in RAG-based applications.

Support integration and access to scientific and operational databases through well-defined APIs and backend services.

Package internal tools, data pipelines, and workflows as reusable AI-enabled components or agents that can be executed within orchestration or workflow frameworks.

Collaborate closely with scientists, data engineers, and platform teams to gather requirements, define technical specifications, and deliver reliable, production-ready solutions.

Create clear technical documentation and usage guides to support internal adoption and long-term maintainability.

Extended Responsibilities (Optional / Career Growth)

Build and evolve scalable LLM inference, agent execution, or orchestration services (e.g., MCP-style patterns) in production environments.

Participate in experimentation, evaluation, and optimization of AI/ML or LLM-based approaches.

Implement tooling and automation for monitoring, testing, and operational support of backend services.

Stay current with emerging AI/ML and LLM technologies and contribute ideas for future platform or capability enhancements.

Education

Bachelor’s Degree in Computer Science or a related field

Master’s or higher degree is a plus but not required

Experience & Qualifications
Must Have

2+ years of professional software engineering experience building backend or data-driven applications

Proficiency in Python and experience developing APIs, services, or microservices

Solid understanding of data fundamentals, including data preprocessing, structured data handling, and data quality considerations

Strong analytical, problem-solving, and debugging skills

Ability to work independently and collaboratively in cross-functional engineering and scientific teams

Nice to Have

Exposure to AI/ML, Generative AI, LLMs, or RAG-based systems

Familiarity with common ML or AI libraries and frameworks (e.g., PyTorch, TensorFlow, Hugging Face, LangChain)

Experience with data pipelines or distributed systems

Preferred Experience / Good Learning Opportunities

Experience with vector databases or embedding-based retrieval systems

Exposure to cloud-based development or deployment (preferably AWS)

Familiarity with agent-based systems, orchestration frameworks, or model deployment and monitoring concepts

Curiosity and enthusiasm for learning and applying emerging AI/ML and LLM technologies

Join Us
At Thermo Fisher Scientific, each one of our 70,000 extraordinary minds has a unique story to tell. Join us and contribute to our singular mission: enabling our customers to make the world healthier, cleaner, and safer.
Apply today!

Compensation And Benefits
The salary range estimated for this position based in New York is $88,000.00–$116,000.00.

This position may also be eligible to receive a variable annual bonus based on company, team, and/or individual performance results in accordance with company policy. We offer a comprehensive Total Rewards package that our U.S. colleagues and their families can count on, which includes:

A choice of national medical and dental plans, and a national vision plan, including health incentive programs

Employee assistance and family support programs, including commuter benefits and tuition reimbursement

At least 120 hours paid time off (PTO), 10 paid holidays annually, paid parental leave (3 weeks for bonding and 8 weeks for caregiver leave), accident and life insurance, and short- and long-term disability in accordance with company policy

Retirement and savings programs, such as our competitive 401(k) U.S. retirement savings plan

Employees’ Stock Purchase Plan (ESPP) offers eligible colleagues the opportunity to purchase company stock at a discount

For more information on our benefits, please visit: https://jobs.thermofisher.com/global/en/total-rewards

Interview Prep

AI-powered insights to help you prepare

Practice Questions

💡Technical Questions (3)

1. How would you approach designing and optimizing a retrieval component for a RAG-based application tailored to scientific and bioprocessing workflows?
2. Can you describe your experience building backend APIs and microservices in Python to support data-driven applications?
3. What strategies would you use to package internal data pipelines and AI tools into reusable components or agents within an orchestration framework?

🎯Behavioral Questions (3)

1. Tell me about a time you had to gather requirements from scientists or non-technical stakeholders to define technical specifications for a data product.
2. Describe a situation where you had to debug a complex data pipeline or backend service issue under time pressure.
3. Give an example of a time you had to learn a new AI/ML technology or framework quickly to deliver a project.

🧩Situational Questions (2)

1. You are tasked with integrating a new LLM-based agent into an existing bioprocessing workflow, but the scientists complain that the agent is hallucinating scientific facts in its responses. How do you handle this?
2. You need to expose a critical operational database to a new AI service via an API, but the database queries are slow and cause timeouts during peak usage. What steps do you take?

Resume Keywords

Make sure these keywords appear on your resume

Python, RAG, Backend APIs, Vector Search, LLM, Data Pipelines, LangChain, AWS, Microservices, Scientific Workflows, MCP, Embeddings

Interested in this position? Apply directly on LinkedIn.
