Data Analyst - Surgical Oncology Research
- Requisition #: 180605
- Department: Surgical Oncology - Research
- Location: Houston, TX
- Posted Date: 5/21/2026
The Data Scientist will join The University of Texas MD Anderson Cancer Center within the Clinical Research Informatics Program (CRIP), a specialized informatics group embedded in a surgical oncology division. The Data Scientist will support active, funded cancer research by developing analytic datasets, machine learning workflows, and large language model-based pipelines that enable high-quality, reproducible oncology research. The Clinical Research Informatics Data Scientist will work closely with clinicians, informaticians, and cancer researchers to translate complex clinical questions into robust data and AI solutions.
The University of Texas MD Anderson Cancer Center is a leading institution focused on cancer care, research, education, and prevention. As part of UT MD Anderson, the Clinical Research Informatics Data Scientist contributes to a mission-driven environment where advanced analytics and clinical insight come together to improve outcomes for patients with cancer. The Clinical Research Informatics Data Scientist role is designed for individuals motivated by translational impact, interdisciplinary collaboration, and methodological growth.
The ideal candidate has a strong educational foundation or equivalent experience in data science, biomedical informatics, computer science, or a related quantitative field, combined with hands-on experience using R and/or Python for machine learning and deep learning workflows. They bring experience working with clinical or biomedical data, including unstructured clinical text, and are comfortable handling protected health information under IRB and data governance requirements. Experience with LLM-based methods, clinical research environments, or oncology-related data is preferred, along with strong collaboration skills for working effectively with clinicians and research teams.
Hourly pay range: Minimum $27.64 - Midpoint $34.62 - Maximum $41.59. The typical work schedule is Monday-Friday (minimum 40 hours/8-hour days). This is a hybrid role with a minimum of one day on-site in Houston, TX, and additional on-site presence as required for business or departmental needs.
Why Us?
This role offers the opportunity to work within a first-of-its-kind clinical research informatics program embedded directly in surgical oncology at UT MD Anderson, supporting active, high-impact cancer research using advanced analytics, machine learning, and LLM-based abstraction pipelines. The team is lean and highly collaborative, providing visible individual contributions, mentorship-oriented management, and genuine opportunities to co-author research while maintaining a sustainable work-life balance in a hybrid environment.
• Employer-paid medical coverage starting day one for employees working 30+ hours/week, plus optional group dental, vision, life, AD&D, and disability insurance.
• Accruals for PTO and Extended Illness Bank, plus paid holidays, wellness, childcare, and other leave options.
• Tuition Assistance Program after six months of service and access to extensive wellness, fitness, and employee resource groups.
• Defined-benefit pension through the Teachers Retirement System, voluntary retirement plans, and employer-paid life and reduced salary protection programs.
KEY RESPONSIBILITIES
Data Standardization, Harmonization, and Infrastructure
• Maintain standardized analytic datasets for cancer research across multiple data sources
• Develop and apply common data models, variable definitions, and ontologies
• Build data transformation pipelines using Python, R, and SQL
• Maintain metadata, data dictionaries, and analytic documentation
• Ensure data quality, completeness, and internal consistency across studies
• Provide ongoing support for database-related queries
Data Extraction and Integration
• Extract and compile data from multiple clinical and research systems
• Merge datasets from disparate sources
• Format and standardize data for analysis and reporting
Machine Learning and LLM Pipeline Development for Research
• Prepare AI-ready feature sets and longitudinal datasets for predictive modeling in oncology
• Implement data preprocessing, feature engineering, and validation workflows for ML models
• Design and implement LLM pipelines to extract cancer-specific variables from unstructured clinical text
• Develop, test, and maintain multi-prompt or multi-stage LLM workflows
• Evaluate LLM outputs using gold-standard annotations and quantitative metrics
• Support model validation, error analysis, and generalizability testing across cohorts
• Contribute to reusable analytic and modeling frameworks across disease sites
Research Collaboration and Translational Support
• Collaborate with clinicians, informaticians, programmers, and investigators on analytic workflows
• Participate in interdisciplinary research teams across oncology and informatics
• Support manuscript- and grant-related analyses with reproducible pipelines
Learning, Innovation, and Methodological Growth
• Stay current with emerging methods in machine learning, LLMs, NLP, and clinical research informatics
• Participate in training, workshops, and internal AI initiatives
• Contribute to the evolution of group AI and data analytics infrastructure
Other duties as assigned - 5%
EDUCATION
- Required: Bachelor's degree in Biology, Nursing, or a related field.
- Preferred: Master's degree in Biology, Nursing, or a related field.
WORK EXPERIENCE
- Required: 3 years of related experience. Additional years of education may be substituted for experience on a one-to-one basis.
- Preferred: Proficiency in R and/or Python for data science and ML workflows; hands-on experience with predictive modeling, deep learning frameworks, or LLM APIs; and familiarity with clinical or biomedical data, including unstructured clinical text.
The University of Texas MD Anderson Cancer Center offers excellent benefits, including medical, dental, paid time off, retirement, tuition benefits, educational opportunities, and individual and team recognition.
This position may be responsible for maintaining the security and integrity of critical infrastructure, as defined in Section 113.001(2) of the Texas Business and Commerce Code and therefore may require routine reviews and screening. The ability to satisfy and maintain all requirements necessary to ensure the continued security and integrity of such infrastructure is a condition of hire and continued employment.
It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state, or local laws unless such distinction is required by law. http://www.mdanderson.org/about-us/legal-and-policy/legal-statements/eeo-affirmative-action.html
Additional Information
- Requisition ID: 180605
- Employment Status: Full-Time
- Employee Status: Regular
- Work Week: Days
- Minimum Salary: US Dollar (USD) 57,500
- Midpoint Salary: US Dollar (USD) 72,000
- Maximum Salary: US Dollar (USD) 86,500
- FLSA: non-exempt and eligible for overtime pay
- Fund Type: Soft
- Work Location: Hybrid Onsite/Remote
- Pivotal Position: Yes
- Referral Bonus Available?: No
- Relocation Assistance Available?: No