  • Annotating genes and genomes with DNA sequences extracted from biomedical articles.

    Bioinformatics. 27(7):980-6. doi: 10.1093/bioinformatics/btr043. April 1, 2011.
  • Authors

    Casey Bergman, Haeussler M, and Gerner M
  • Abstract

    MOTIVATION: Increasing rates of publication and DNA sequencing make the problem of finding relevant articles for a particular gene or genomic region more challenging than ever. Existing text-mining approaches focus on finding gene names or identifiers in English text. These are often not unique and do not identify the exact genomic location of a study.

    RESULTS: Here, we report the results of a novel text-mining approach that extracts DNA sequences from biomedical articles and automatically maps them to genomic databases. We find that ∼20% of open-access articles in PubMed Central (PMC) have extractable DNA sequences that can be accurately mapped to the correct gene (91%) and genome (96%). We illustrate the utility of data extracted by text2genome from more than 150 000 PMC articles for the interpretation of ChIP-seq data and the design of quantitative reverse transcriptase (RT)-PCR experiments.

    CONCLUSION: Our approach links articles to genes and organisms without relying on gene names or identifiers. It also produces genome annotation tracks of the biomedical literature, thereby allowing researchers to use the power of modern genome browsers to access and analyze publications in the context of genomic data.

    AVAILABILITY AND IMPLEMENTATION: Source code is available under a BSD license from http://sourceforge.net/projects/text2genome/ and results can be browsed and downloaded at http://text2genome.org.
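The idea in the abstract, pulling DNA sequences out of article text and then locating them in reference data, can be illustrated with a minimal Python sketch. This is not the text2genome implementation. The regular expression, the 20-letter threshold, the function names, and the exact-substring lookup (a toy stand-in for a real alignment tool) are all assumptions made for illustration.

```python
import re

# Hypothetical threshold: shortest run of nucleotide letters kept as a sequence.
MIN_LENGTH = 20

# One or more blocks of A/C/G/T/N letters, where blocks may be separated by
# whitespace or digits (printed sequences are often grouped or numbered).
DNA_RUN = re.compile(r"\b[ACGTN]+(?:[\s\d]+[ACGTN]+)*\b", re.IGNORECASE)


def extract_dna_sequences(text, min_length=MIN_LENGTH):
    """Return upper-cased nucleotide runs of at least min_length letters."""
    sequences = []
    for match in DNA_RUN.finditer(text):
        # Drop the spaces and digits that separated the blocks.
        cleaned = re.sub(r"[^ACGTN]", "", match.group().upper())
        if len(cleaned) >= min_length:
            sequences.append(cleaned)
    return sequences


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA string."""
    return sequence.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def map_to_references(sequence, references):
    """Toy lookup: names of references containing the sequence on either strand.

    A real pipeline would use an alignment tool against genome databases;
    exact substring search is only a stand-in here.
    """
    rc = reverse_complement(sequence)
    return [name for name, ref in references.items()
            if sequence in ref or rc in ref]


if __name__ == "__main__":
    article = ("The forward primer 5'-ACGTGACCTGATCGATCGGATC-3' was used "
               "at a final concentration of 10 nM.")
    # Hypothetical reference sequences, invented for this example.
    references = {
        "geneA": "TTTT" + "ACGTGACCTGATCGATCGGATC" + "AAAA",
        "geneB": "GGGGCCCC" * 5,
    }
    for seq in extract_dna_sequences(article):
        print(seq, map_to_references(seq, references))
```

The length threshold is what keeps short English words made only of the letters a, c, g, t and n (such as "at" or "a") from being reported as sequences. A lower threshold would find more primers but also admit more false matches.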
