Sr Data Scientist - GenAI

The mission of The University of Texas MD Anderson Cancer Center is to eliminate cancer in Texas, the nation, and the world through outstanding programs that integrate patient care, research, prevention, and education. Core to the success of our mission is the ability to orchestrate multidimensional data, data analytics, and machine learning to create sustainable impact within a framework of responsible AI. We are building a dynamic team to drive machine learning operations in order to accelerate the impact of AI across the enterprise, driving long-lasting improvements in cancer care.
We are seeking a Senior Data Scientist to join our team in support of the research and development of generative AI models for healthcare applications.

The chosen candidate will be instrumental in developing and validating state-of-the-art machine learning technologies, including generative AI, tailored for enterprise-level use cases. In this role, the candidate will apply deep expertise in algorithm architectures, strong machine learning approaches, and rigorous scientific methodology to develop impactful foundation and generative AI models. This effort is supported by an extensive repository of contextually relevant data, including medical imaging, electronic health records (EHR), pathology, operational data, and other pertinent healthcare data.

The role requires actively collaborating with clinical and business operations experts to gather requirements and ensure AI solutions are effectively validated. The Senior Data Scientist will work closely with a team of data scientists and machine learning engineers to ensure seamless deployment, real-world validation, and overall maintenance of ML models in a production environment. Furthermore, the Senior Data Scientist will work closely with academic data scientists to research novel machine learning approaches and facilitate the translation of research algorithms into practical applications. Beyond technical acumen, the position also requires nurturing team dynamics, fostering a culture of innovation, developing thought leadership through presentations and publications, and bolstering the technological infrastructure to advance the safe and impactful integration of AI throughout the enterprise.

Key responsibilities include:

Generative AI Development: Innovate and develop state-of-the-art machine learning technologies, focusing on generative AI and multimodal models suitable for complex healthcare applications.
Foundation Model Development: Research and develop foundational AI models, concentrating on the domains of imaging, text, structural data, time-series, and various healthcare-related data types.
Collaborative Integration & Validation: Work closely with clinical experts, business stakeholders, data scientists, and machine learning engineers to gather requirements, deploy, and maintain foundation and generative AI models in production environments, ensuring they are effectively validated and integrated into enterprise use.
Academic Collaboration & Translation: Engage with academic data scientists and clinical researchers to explore novel AI approaches and use cases, facilitating the transition from research algorithms to practical healthcare solutions.
Operational Excellence & Compliance: Document and manage detailed records of model development, maintain rigorous testing and validation protocols, and ensure AI solutions are aligned with regulatory standards and ethical guidelines.
Culture Development: Help foster a culture of innovation, continuous learning, and responsible AI development. Develop thought leadership through presentations, publications, patents, and participation in the tech community.

Technical Expertise
Hands-on experience with and in-depth understanding of machine learning algorithms and modeling (e.g., supervised, unsupervised, semi-supervised or weakly supervised learning, generative models, transfer learning, optimization).
Experience developing foundational and/or generative AI models.
Experience working with open-source and closed-source generative AI models.
Proficient in developing, evaluating, and deploying AI/ML algorithms.
Skilled in constructing scalable data pipelines, model artifact management, and model performance analytics.
Experienced with MLOps tools and processes for data, features, code, and model management.
Strong proficiency in Python and either C++ or C#, with practical knowledge of TensorFlow, PyTorch, and Scikit-learn.
Knowledgeable about AI/ML platform infrastructure, including cloud and on-premises architectures.
Familiar with cloud-native tools, services, and computing environments (e.g., Azure, AWS, GCP).

Analytical Expertise
Demonstrated ability to handle challenges with vague or abstract problem definitions.
In-depth knowledge of AI/ML Model Lifecycle Management.
Proficient in decision-making, problem-solving, and executing AI/ML healthcare solutions.
Skilled at quantitatively assessing machine learning models for performance, workflow impact, and potential risks.
Competent in identifying risks and formulating mitigation plans to prevent project delays.

Oral and Written Communication
Collaborate with research data scientists, ML engineers, and software engineers to integrate machine learning models into existing systems.
Document processes, pipelines, workflows, and machine learning experiments.
Report project metrics, including progress, impact, and risks, to leadership, offering strategic recommendations for AI/ML use-case prioritization.
Manage stakeholder relations to facilitate solution adoption and address issues.
Share knowledge and offer technical assistance to researchers and colleagues.
Deliver both technical and non-technical updates in meetings and at professional gatherings.

Other duties as assigned

Education Required: Bachelor's degree in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.

Preferred Education: Doctorate (Academic)

Experience Required: Five years of experience in scientific software or industry programming with a concentration in scientific computing. With a Master's degree, three years of experience required. With a PhD, one year of experience required.

Preferred Experience: Two years of academic or industry experience developing foundation and/or generative AI models.

First-author publications on GenAI models, including foundation models and/or LLMs.

It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law. http://www.mdanderson.org/about-us/legal-and-policy/legal-statements/eeo-affirmative-action.html

Additional Information
  • Requisition ID: 168988
  • Employment Status: Full-Time
  • Employee Status: Regular
  • Work Week: Days
  • Minimum Salary: US Dollar (USD) 119,500
  • Midpoint Salary: US Dollar (USD) 149,500
  • Maximum Salary: US Dollar (USD) 179,500
  • FLSA: exempt and not eligible for overtime pay
  • Fund Type: Hard
  • Work Location: Remote (within Texas only)
  • Pivotal Position: Yes
  • Referral Bonus Available?: Yes
  • Relocation Assistance Available?: Yes
  • Science Jobs: No
