Company
Portfolio Data
Genomenon, Inc.
Address
3135 S STATE ST # 350
Ann Arbor, MI, 48108-1653
USA
UEI: Z1NLALZ88SE1
Number of Employees: 24
HUBZone Owned: No
Woman Owned: No
Socially and Economically Disadvantaged: No
SBIR/STTR Involvement
Year of first award: 2017
Phase I Awards: 2
Phase II Awards: 2
Conversion Rate: 100%
Phase I Dollars: $377,363
Phase II Dollars: $3,194,746
Total Awarded: $3,572,109
Awards

Micropublications for Automating Genome Sequence Variant Interpretation from Medical Literature
Amount: $1,710,048 Topic: 172
PROJECT SUMMARY Accurate and efficient interpretation of genomic variants for clinical decision making is predicated on ready access to and extraction of information from the medical literature. The sheer number of potentially relevant articles that must be examined during this process poses a significant challenge in ensuring the accuracy and reproducibility of clinical interpretation as it is time-consuming, error-prone, and highly user-dependent. To this end, we have developed the Mastermind Genomic Search Engine - a commercial database that automatically organizes disease, gene and variant information from the medical literature by systematically indexing millions of scientific articles. Mastermind is used by over 9,100 variant scientists in more than 100 different countries to more quickly interpret genetic variants in clinical settings. In Phase I of this project, we developed and tested a micropublication platform within Mastermind that assembles literature curation along with population frequency data, computational predictions of pathogenicity, and automated ACMG/AMP classifications that improves the speed of variant interpretation by more than 70% and increases the sensitivity of these results by 2-20x. The present proposal seeks to build on the success of Phase I by 1) integrating the micropublication platform into Mastermind with migration of collaborative features for community-based evaluation of variant interpretations; 2) optimizing and improving automated variant interpretation/prioritization of articles and implementing a rigorous quality assurance process; and 3) using these improvements to curate all evidence in all variants in all genes comprising the entire human genome, beginning with the clinical exome. 
Integration of the pre-curated genome data in the micropublication platform will result in Mastermind Enterprise, allowing for immediate and accurate genome-wide variant interpretations with collaborative curation in real time at the point of interaction with source material (i.e., individual references). This work will mitigate reproducibility challenges plaguing other large-scale crowd-sourced projects, including those undertaken by groups like NIH’s ClinVar and QIAGEN’s HGMD. In addition, our novel approach will not suffer from poor sensitivity as it relies on a comprehensive source of medical literature pre-annotated based on genetic content. This work will permit dramatic scaling of variant interpretation activities and allow for complete and accurate curation of the entire human genome within 2 years – a feat that could not be completed utilizing current manual methods for variant interpretation. Mastermind Enterprise will be revolutionary in the genomics industry and will represent a natural next step to build on the achievements provided by the Human Genome Project and the reduced cost of next-generation sequencing. It will substantially improve diagnostic rates and accuracy in the clinic, especially in rare disease, where a lack of genetic evidence often results in severely delayed and inaccurate diagnoses. Additionally, it will allow the pharmaceutical industry to develop more successful targeted therapies and to design more inclusive clinical trials as well as to more reliably identify patients who would benefit from therapeutic intervention.
Tagged as:
SBIR
Phase II
2021
HHS
NIH

Micropublications for Automating Genome Sequence Variant Interpretation from Medical Literature
Amount: $152,946 Topic: 172
PROJECT SUMMARY Accurate and efficient interpretation of genomic variants for clinical decision making is predicated on ready access to useful information in the medical literature. The sheer number of potentially relevant articles that must be examined during this curation process poses a major challenge in ensuring the accuracy and reproducibility of clinical variant interpretation, as it is time-consuming and the results highly user-dependent. To this end, we have developed the Mastermind Genomic Search Engine, a commercial database that automatically organizes disease, gene and variant information from the medical literature by systematically indexing millions of scientific articles. In direct comparison to manually developed databases of genetic variants, we have achieved greater than [...] concordance and accurately identified >[...] more variants with an average of [...]-fold more references, demonstrating the effectiveness of our automated approach. Currently, Mastermind is used by over [...] variant scientists in [...] different countries to more quickly curate literature for genetic variants in clinical settings. In response to feedback from ClinGen curators and others, the present proposal seeks to create a framework to facilitate literature curation and clinical variant interpretation activities within Mastermind by 1) prioritizing relevant references and external database entries containing content meaningful to variant classification guidelines; 2) assembling this information into a "micropublication" text format with codified data fields including population frequencies, computational predictions, reference citations and relevant sentence fragments with conclusive content; 3) allowing users to manually review, alter and augment pre-populated entries; and 4) providing a platform to share and continuously update this information with other variant scientists in the Mastermind community and elsewhere. Developing tools that allow for collaborative curation in real time at the point of interaction with source material (i.e., individual articles) will mitigate reproducibility challenges plaguing other large-scale crowd-sourced projects. In contrast to genetic variant databases of user-submitted classification information and associated data, the present proposal seeks to create enhanced curation tools to fill such variant databases with more accurate and reproducible data and in a way that would promote dramatic scaling of variant curation activities, including those undertaken by groups like ClinVar. To test this approach, we will work with industry partners engaged in variant curation activities to 1) determine the requisite data fields; 2) integrate external database information; 3) test the accuracy and relevance of results and the overall efficacy of the approach using hundreds of manually curated genetic variants; and 4) solicit and incorporate feedback from our development partners to iterate and refine the software features for the greatest effect. Within the $[...]B genome sequencing software market, there is significant commercial potential and scientific merit in bringing more automated techniques of data analysis to large-scale genome sequencing variant interpretation as described in this proposal.
PROJECT NARRATIVE Successful completion of the present project will contribute to the public health mission of the NIH by promoting more widespread adoption of genome sequencing by making the interpretation of genome variant data more accurate, reproducible and cost-effective in clinical and research laboratories. The community of users that can benefit from this work includes geneticists, oncologists, pathologists, researchers and patients.
Tagged as:
SBIR
Phase I
2019
HHS
NIH

Commercial Software Using High-throughput Computational Techniques to Improve Genome Analysis
Amount: $1,484,698 Topic: 172
Recent advances in DNA sequencing technology have not been matched by improved analytic techniques to quickly and accurately interpret patient genome data to inform diagnosis, prognosis and therapy-making decisions in the clinic and to identify candidate biomarkers of disease in research laboratories. Development of automated techniques to facilitate interpretation of this data will benefit patient care and improve public health by promoting widespread use of cost-efficient sequencing clinically and by making it feasible to sequence a broader range of patients, including those with complex disease, or to identify patients who have an elevated risk of developing future disease. Our long-term goal is to commoditize sequence interpretation using high-throughput computational techniques in the same way that next-generation DNA sequencing technology has commoditized genome data production. The present project will result in commercial software that automates genome sequence interpretation. Specifically, we will develop 1) software that automatically collects and organizes a comprehensive set of genetic information by systematically reading millions of scientific articles and scanning dozens of genetic variant databases; 2) software that uses this information to prioritize patient data into clinical categories based on the likelihood of disease; and 3) software that automatically identifies candidate biomarkers of disease from multi-sample cohort data. To do this, we will use a variety of innovative data processing techniques. First, we will systematically mutate the reference genome in silico to produce a comprehensive database of every possible mutation at every position of every gene and use this data to query every word of every article ever published or any publicly available database to identify disease-gene-variant associations. We will compare the results from this automated process to results obtained using more expensive and time-consuming manual methods and hypothesize that we can achieve [...] concordance and identify [...] more variants and [...]-fold more references for each. These results will be organized into clinically meaningful categories and presented in an interactive graphical interface that displays the evidence for each of these associations. We will then use this information to drive prioritization of patient data based on similarities to known disease-causing variants and the strength of evidence for their pathogenicity in order to increase analytic sensitivity and specificity, thereby improving speed and reliability of sequencing in the clinic. Our automated results will then be compared to conventional methods of data annotation and filtration for >[...] patient samples from [...] diseases. Finally, we will use the same prioritization strategy to comprehensively compare variant data between all patients within a disease cohort to automatically identify the variants most likely to lead to disease and compare our automated results to conventional methods for >[...] samples from [...] diseases. The growth in the $[...]B genome sequencing market is driven by improvements in informatics techniques, and automated solutions such as proposed here have significant commercial potential. The successful completion of the proposed project will contribute to the public health mission of the NIH by promoting more widespread adoption of genome sequencing by making the interpretation of this data more accurate and cost-effective in clinical and research laboratories. The community of users that can benefit from this research includes geneticists, oncologists, pathologists, researchers and patients.
Tagged as:
SBIR
Phase II
2018
HHS
NIH

Commercial Software Using High-throughput Computational Techniques to Improve Genome Analysis
Amount: $224,417 Topic: 172
Recent advances in DNA sequencing technology have not been matched by improved analytic techniques to quickly and accurately interpret patient genome data to inform diagnosis, prognosis and therapy-making decisions in the clinic and to identify candidate biomarkers of disease in research laboratories. Development of automated techniques to facilitate interpretation of this data will benefit patient care and improve public health by promoting widespread use of cost-efficient sequencing clinically and by making it feasible to sequence a broader range of patients, including those with complex disease, or to identify patients who have an elevated risk of developing future disease. Our long-term goal is to commoditize sequence interpretation using high-throughput computational techniques in the same way that next-generation DNA sequencing technology has commoditized genome data production. The present project will result in commercial software that automates genome sequence interpretation. Specifically, we will develop 1) software that automatically collects and organizes a comprehensive set of genetic information by systematically reading millions of scientific articles and scanning dozens of genetic variant databases; 2) software that uses this information to prioritize patient data into clinical categories based on the likelihood of disease; and 3) software that automatically identifies candidate biomarkers of disease from multi-sample cohort data. To do this, we will use a variety of innovative data processing techniques. First, we will systematically mutate the reference genome in silico to produce a comprehensive database of every possible mutation at every position of every gene and use this data to query every word of every article ever published or any publicly available database to identify disease-gene-variant associations. We will compare the results from this automated process to results obtained using more expensive and time-consuming manual methods and hypothesize that we can achieve [...] concordance and identify [...] more variants and [...]-fold more references for each. These results will be organized into clinically meaningful categories and presented in an interactive graphical interface that displays the evidence for each of these associations. We will then use this information to drive prioritization of patient data based on similarities to known disease-causing variants and the strength of evidence for their pathogenicity in order to increase analytic sensitivity and specificity, thereby improving speed and reliability of sequencing in the clinic. Our automated results will then be compared to conventional methods of data annotation and filtration for >[...] patient samples from [...] diseases. Finally, we will use the same prioritization strategy to comprehensively compare variant data between all patients within a disease cohort to automatically identify the variants most likely to lead to disease and compare our automated results to conventional methods for >[...] samples from [...] diseases. The growth in the $[...]B genome sequencing market is driven by improvements in informatics techniques, and automated solutions such as proposed here have significant commercial potential. The successful completion of the proposed project will contribute to the public health mission of the NIH by promoting more widespread adoption of genome sequencing by making the interpretation of this data more accurate and cost-effective in clinical and research laboratories. The community of users that can benefit from this research includes geneticists, oncologists, pathologists, researchers and patients.
Tagged as:
SBIR
Phase I
2017
HHS
NIH