Literature mining and large-scale data integration
Lars Juhl Jensen, EMBL Heidelberg
literature mining
why?
 
too much to read
information retrieval
finding the papers
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
stemming
yeast / yeasts
dynamic query expansion
yeast / S. cerevisiae
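A minimal sketch of how stemming and query expansion could be applied to the example query. The synonym table, the crude plural-stripping rule, and the function names are illustrative assumptions, not the method of any particular retrieval system.

```python
# Hypothetical synonym table; a real system would draw on a curated resource.
SYNONYMS = {
    "yeast": ["yeast", "s. cerevisiae", "saccharomyces cerevisiae"],
}


def stem(term: str) -> str:
    """Crude plural stripping, e.g. 'yeasts' -> 'yeast' (assumption, not a real stemmer)."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def expand_query(terms):
    """Turn an AND query into a list of OR-groups, one group per term."""
    expanded = []
    for term in terms:
        base = stem(term.lower())
        expanded.append(SYNONYMS.get(base, [base]))
    return expanded


def matches(text: str, expanded) -> bool:
    """True if every OR-group has at least one alternative occurring in the text."""
    lowered = text.lower()
    return all(any(alt in lowered for alt in alts) for alts in expanded)


query = expand_query(["yeasts", "cell cycle"])
print(matches("The S. cerevisiae cell cycle is tightly regulated.", query))
print(matches("Clb2-bound Cdc28 directly phosphorylated Swe1.", query))
```

The second sentence is about the yeast cell cycle but mentions neither query term nor a listed synonym, so plain keyword matching misses it even after expansion.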
ranking
 
 
 
 
 
 
 
 
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
no tool will find it
entity recognition
identifying the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
Cdc28 → yeast
Cdc28 → cell cycle
good synonyms list
manual curation
orthographic variation
CDC28
Cdc28p
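One way to tolerate orthographic variation is to derive a flexible pattern from each dictionary name. The sketch below assumes names of the form letters followed by digits. The optional "p" suffix and case-insensitivity cover the two variants listed above; the optional hyphen or space is an added assumption. The function name is hypothetical.

```python
import re


def variant_pattern(name: str) -> re.Pattern:
    """Build a regex matching simple spelling variants of a name like 'Cdc28'."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if m is None:
        raise ValueError(f"unsupported name format: {name!r}")
    letters, digits = m.groups()
    # Optional hyphen/space before the number, optional protein suffix 'p',
    # word boundaries so that 'Cdc2' does not match inside 'Cdc28'.
    return re.compile(rf"\b{re.escape(letters)}[- ]?{digits}p?\b", re.IGNORECASE)


pattern = variant_pattern("Cdc28")
print(pattern.findall("CDC28, Cdc28p and cdc-28 but not Cdc280 or Cdc2"))
```

The word boundaries matter for disambiguation: without them, a short name such as Cdc2 would also fire on every mention of Cdc28.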
disambiguation
hairy
SDS
APC
Cdc2
 
 
 
 
still too much to read
information extraction
formalizing the facts
 
co-mentioning
statistical methods
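Co-mentioning can be sketched as counting how often two recognized names occur in the same text unit. The example below uses naive case-insensitive substring matching and treats each string as one document; both are simplifying assumptions, and the function name is hypothetical. A statistical method would then compare these counts with what the individual name frequencies predict by chance.

```python
from collections import Counter
from itertools import combinations


def comention_counts(documents, names):
    """Count, for each pair of names, the number of documents mentioning both."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Naive substring lookup; a real tagger would handle synonyms and boundaries.
        present = sorted(n for n in names if n.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


docs = [
    "Clb2-bound Cdc28 directly phosphorylated Swe1.",
    "Cdc5-dependent Swe1 hyperphosphorylation.",
    "Cdc28 phosphorylates Swe1.",
]
counts = comention_counts(docs, ["Clb2", "Cdc28", "Swe1", "Cdc5"])
for pair, n in counts.most_common():
    print(pair, n)
```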
NLP (Natural Language Processing)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation
 
no new discoveries
text mining
undiscovered links
 
Raynaud’s syndrome
fish oil
 
temporal trends
 
buzzwords
 
data integration
association networks
 
information extraction
 
curated knowledge
 
protein interaction data
 
genetic interaction data
 
gene expression data
 
computational predictions
conserved neighborhood
 
gene fusion
 
phylogenetic profiles
 
variable reliability
raw quality scores
 
 
 
not comparable
benchmarking
calibrate vs. gold standard
 
probabilistic scores
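Calibration against a gold standard can be illustrated by binning raw scores and measuring, per bin, the fraction of predictions that the gold standard confirms. This is only a binning sketch under assumed inputs (pairs with raw scores, a set of known true pairs, hand-picked bin edges); it is not necessarily the procedure used for the resources in this talk.

```python
def calibrate(scored_pairs, gold_positive, bin_edges):
    """Return, for each half-open bin [lo, hi), the fraction of pairs in the gold standard.

    Bins with no predictions yield None. Scores outside all bins are ignored.
    """
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for pair, raw in scored_pairs:
        for i in range(n_bins):
            if bin_edges[i] <= raw < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += pair in gold_positive  # bool counts as 0 or 1
                break
    return [h / t if t else None for h, t in zip(hits, totals)]


# Toy data: hypothetical raw scores and a hypothetical gold standard.
scored = [(("A", "B"), 0.1), (("A", "C"), 0.2), (("B", "C"), 0.8), (("B", "D"), 0.9)]
gold = {("A", "B"), ("B", "C"), ("B", "D")}
print(calibrate(scored, gold, [0.0, 0.5, 1.0]))
```

Once every evidence type is mapped onto the same probability scale in this way, scores from different sources become comparable.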
spread over many species
373 genomes
 
transfer by orthology
 
combine all evidence
$P = 1 - (1 - P_1)(1 - P_2)(1 - P_3) \cdots$
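The combination formula above treats the evidence types as independent: the combined probability is one minus the probability that every individual source is wrong. A short sketch with a hypothetical function name:

```python
import math


def combine_evidence(probabilities):
    """Combine independent probabilities as P = 1 - (1 - P1)(1 - P2)(1 - P3)..."""
    probs = list(probabilities)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - math.prod(1.0 - p for p in probs)


# Two sources at 0.5 each give 1 - 0.5 * 0.5 = 0.75.
print(combine_evidence([0.5, 0.5]))
```

Under this scheme the combined score is never lower than the strongest single source, so several weak lines of evidence can add up to a confident association.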
web resources
 
 
signaling networks
phosphoproteomics
 
in vivo phosphosites
kinases are unknown
computational methods
 
overprediction
context
scaffolders
association networks
 
NetworKIN
 
benchmarking
 
2.5-fold better accuracy
web resources
 
 
summary
literature mining is good
data integration is better
Acknowledgments
Reflect & NLP: Evangelos Pafilis, Jasmin Saric, Rossitza Ouzounova, Sean O’Donoghue, Isabel Rojas
STRING & STITCH: Christian von Mering, Michael Kuhn, Manuel Stark, Samuel Chaffron, Philippe Julien, Tobias Doerks, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
NetworKIN & NetPhorest: Rune Linding, Martin Lee Miller, Gerard Ostheimer, Francesca Diella, Karen Colwill, Jing Jin, Pavel Metalnikov, Vivian Nguyen, Adrian Pasculescu, Jin Gyoon Park, Leona D. Samson, Nikolaj Blom, Rob Russell, Peer Bork, Søren Brunak, Michael Yaffe, Tony Pawson
http://larsjuhljensen.wordpress.com
