Sr Data Scientist, Spatial Omics
- Requisition #: 170806
- Department: Genomic Med Rsch Department
- Location: Houston, TX
- Posted Date: 10/15/2024
The University of Texas MD Anderson Cancer Center has the potential to unlock the power of data by further developing and investing in talent, team science and infrastructure to optimize multidimensional data integration, analysis and application for the benefit of patients with cancer. The Institute for Data Science in Oncology (IDSO) is a signature priority program aimed at transforming the patient experience, enhancing quality of life and accelerating scientific breakthroughs via advanced, data-driven approaches to cancer care. IDSO will enable teams to search for, learn from and apply as much information as possible gathered from every patient MD Anderson has seen or will see. By growing a data-centric culture and advancing data management and analytics maturity, we provide better, state-of-science and state-of-data-science care for patients while exploring areas of cancer research and treatment currently unknown to clinical communities.
The IDSO recruits, positions and enables best-in-class data scientists to unravel seemingly insoluble problems in cancer and work toward meaningful solutions for patients. Our aims include reducing the time between diagnostic procedures and treatment decisions, advancing drug discovery efforts and bringing new, precise medicines to bedsides sooner. The IDSO centralizes our focused institutional investment in data science and enables partnerships with other world-leading organizations to operate and enhance an unprecedented oncological "data supply chain" designed to accelerate research and treatment innovation. Composed of the best minds in a myriad of scientific and data-driven fields, the IDSO fosters a culture grounded in our innovative "team-data science" principles, such as shared motivation, shared learning, provenance linking insights to observations and integrated data governance.
One of the five focus areas IDSO has prioritized is Focus Area 2: Single Cell Spatial Omics to Unveil Cancer Complexity, which is co-led by Dr. Linghua Wang, an Associate Professor in the Department of Genomic Medicine at MD Anderson Cancer Center and a recognized computational biologist and leader in single-cell and spatial cancer research. Our goal is to build a world-class data science hub for single-cell and spatial omics studies to accelerate the discovery and translation of cancer research at MD Anderson and beyond.
We are seeking multiple highly motivated and enthusiastic Senior Data Scientists with a strong computational background and expertise in programming and bioinformatics pipeline development. Successful candidates will lead the development and optimization of spatial multi-omics pipelines for commonly used platforms, including spatial transcriptomics, proteomics, metabolomics, genomics, microbiome, antigen receptor profiling, tissue and molecular imaging, and other related omics platforms. They will also be responsible for providing regular progress updates and organizing efforts to advance pipeline development.
Dr. Linghua Wang and her group will provide guidance and support, helping candidates brainstorm workflow decisions and optimize the pipeline development process. Drawing on our significant expertise in the field, we are committed to supporting each candidate in achieving excellence in their role.
JOB SPECIFIC COMPETENCIES
Technical-specific:
* Participate in the design, development, and optimization of spatial multi-omics pipelines.
* Benchmark computational methods and toolkits to ensure best practices and excellence in data processing and analysis.
* Develop computational tools as needed for new technologies and integrate them with existing platforms.
* Ensure scalability and efficiency of the pipelines to handle large and complex datasets.
* Conduct rigorous testing and validation of computational tools to ensure accuracy and reproducibility.
* Develop interactive web portals for effective data exploration and visualization.
Collaboration and Communication:
* Meet with teams and collaborators regularly to discuss projects, communicate results, and provide computational expertise. Effective collaboration is central to this role.
* Work with teams to integrate data and metadata, and assist postdocs and students in troubleshooting data analysis issues as needed.
* Lead hands-on training workshops, courses, and seminars.
Research and Publication:
* Assist with code submission and distribute analytical methods and code to the scientific community.
* Assist with manuscript and grant preparation and presentations, and present at lab meetings, departmental and institutional seminars, and appropriate scientific conferences.
Other Duties:
* Attend collaborator meetings and team working group meetings. Prioritize and manage multiple projects in a timely and resource-effective manner.
* Stay up to date with relevant literature, gather information systematically, and confer with the supervisor regarding new procedures.
The ideal candidate will have:
Background:
* Ph.D. in computer science, quantitative science, data science, bioinformatics, biostatistics, or a related field.
* Experience with best practices in spatial omics data analysis, including multimodal data integration.
* Demonstrated ability to develop, optimize, and maintain bioinformatics pipelines.
Technical Proficiency:
* Proficiency in at least two programming languages, such as R and Python.
* Proficiency in modern workflow languages, such as Nextflow, WDL, or CWL, for building scalable, reproducible bioinformatics pipelines.
* Proficiency in containerization and orchestration technologies, including Docker, Docker Swarm, and Kubernetes, for creating reproducible and scalable environments.
* Proficiency in version control tools (e.g., Git and GitHub) and Linux system administration.
* Familiarity with FAIR principles (Findable, Accessible, Interoperable, and Reusable) for data and tool development to ensure reproducibility and accessibility of scientific workflows.
Soft Skills:
* Must be honest, highly motivated, reliable, responsible, and well-organized.
* Strong attention to detail, with the ability to multitask and a willingness to learn new methods and technologies.
* Excellent written and verbal communication skills.
* Ability to work effectively in a collaborative, interdisciplinary team environment (a strong plus).
Additional Qualifications:
* Familiarity with cloud computing platforms (e.g., AWS, Google Cloud, Azure) is a plus.
* Knowledge of high-performance computing (HPC) environments is a plus.
Onsite Presence: Is Not Required
WORKING CONDITIONS
| Condition | Frequency |
|---|---|
| Deadlines | -- |
PHYSICAL DEMANDS
| Demand | Frequency | Weight |
|---|---|---|
| Sitting | Constant (67-100%) | -- |
| Keyboarding | Constant (67-100%) | -- |
| Carrying | Occasionally (11-33%) | 10-20 lbs. |
| Lifting | Seldom (3-10%) | 10-20 lbs. |
| Standing | Occasionally (11-33%) | -- |
| Pushing/Pulling | Occasionally (11-33%) | 10-20 lbs. |
| Reaching | Seldom (3-10%) | -- |
COGNITIVE DEMANDS
* Analytical Ability
* Attention to Detail
* Interpersonal Skills
* Oral Communication
* Working Alone
* Written Communication
EDUCATION:
Required: Bachelor's degree in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.
Preferred: Master's or PhD in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.
EXPERIENCE:
Required: Five years of experience in scientific software or industry programming with a concentration in scientific computing. With a Master's degree, three years of experience required. With a PhD, one year of experience required.
Preferred: Experience with best practices in spatial omics data analysis, including multimodal data integration.
It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law. http://www.mdanderson.org/about-us/legal-and-policy/legal-statements/eeo-affirmative-action.html
Additional Information
- Requisition ID: 170806
- Employment Status: Full-Time
- Employee Status: Regular
- Work Week: Days
- Minimum Salary: US Dollar (USD) 119,500
- Midpoint Salary: US Dollar (USD) 149,500
- Maximum Salary: US Dollar (USD) 179,500
- FLSA: exempt and not eligible for overtime pay
- Fund Type: Soft
- Work Location: Hybrid Onsite/Remote
- Pivotal Position: Yes
- Referral Bonus Available?: Yes
- Relocation Assistance Available?: Yes
- Science Jobs: Yes
#LI-Hybrid