Lars Juhl Jensen
@larsjuhljensen
One tagger, many uses
Simple text-mining strategies for biomedicine
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary
genes / proteins
chemical compounds
diseases
organisms
environments
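Dictionary-based NER comes down to looking up spans of text in a large table of names. Below is a minimal sketch that assumes a toy, hypothetical dictionary and a simple greedy longest-match over word tokens. It is not the C++ tagger's implementation.

```python
import re

# Hypothetical toy dictionary: lower-cased name -> entity identifiers.
# A real dictionary would be generated per entity type (genes, chemicals, ...).
DICTIONARY = {
    "tp53": ["gene:TP53"],
    "p53": ["gene:TP53"],
    "mdm2": ["gene:MDM2"],
    "breast cancer": ["disease:breast cancer"],
    "aspirin": ["chemical:aspirin"],
}
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)

def tag(text):
    """Return (start, end, matched text, entity IDs) for longest dictionary matches."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest token window first, so "breast cancer" wins over shorter names.
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            # Tokens are joined by single spaces, normalizing whitespace/punctuation.
            key = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if key in DICTIONARY:
                matches.append((start, end, text[start:end], DICTIONARY[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return matches

matches = tag("MDM2 binds p53 in breast cancer cells.")
```

Each match keeps its character offsets, so results can be mapped back onto the original text.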
not comprehensive
expansion rules
prefixes and suffixes
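Expansion rules can be pictured as generating spelling variants of each dictionary name, with every variant mapped to the same identifiers. This is a sketch only. The prefixes, suffixes, and hyphen/space rule below are hypothetical examples, not the tagger's actual rule set.

```python
# Hypothetical prefixes and suffixes, chosen only for illustration.
PREFIXES = ["human ", "h"]
SUFFIXES = [" protein", " gene"]

def expand(name):
    """Return a set of spelling variants of a dictionary name."""
    variants = {name}
    # Orthographic variation: hyphens vs. spaces.
    variants |= {v.replace("-", " ") for v in list(variants)}
    variants |= {v.replace(" ", "-") for v in list(variants)}
    # Add each prefix and each suffix to every orthographic variant.
    for base in list(variants):
        variants |= {p + base for p in PREFIXES}
        variants |= {base + s for s in SUFFIXES}
    return variants

variants = expand("IL-6")
```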
curated blacklist
e.g. SDS: usually the detergent sodium dodecyl sulfate, not the gene
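A curated blacklist can be applied as a final filter on the matches. The sketch below assumes matches are (start, end, text, IDs) tuples and uses a hypothetical one-entry blacklist. A real blacklist may be more fine-grained than a plain set of names.

```python
# Hypothetical blacklist of names that match the dictionary but usually mean
# something else in text (e.g. SDS, usually the detergent, not the gene).
BLACKLIST = {"sds"}

def filter_blacklisted(matches, blacklist=BLACKLIST):
    """Drop matches whose text is on the blacklist (case-insensitive)."""
    return [m for m in matches if m[2].lower() not in blacklist]

kept = filter_blacklisted([(0, 3, "SDS", ["gene:SDS"]), (10, 14, "MDM2", ["gene:MDM2"])])
```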
software
C++ tagger
>1000 abstracts / second
inherently thread-safe
70–80% recall
80–90% precision
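Recall and precision are computed by comparing tagger output against manually annotated text. The counts below are hypothetical and chosen only to fall inside the ranges quoted above.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts from a benchmark against an annotated corpus.
p, r = precision_recall(true_positives=80, false_positives=12, false_negatives=25)
```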
open source
bitbucket.org/larsjuhljensen/tagger/
Python module
Docker
hub.docker.com/r/larsjuhljensen/tagger/
web service
tagger.jensenlab.org
Extract
extract.jensenlab.org
community resources
STRING
string-db.org
functional associations
DISEASES
disease–gene associations
Cytoscape
curated knowledge
experimental data
computational predictions
co-occurrence text mining
Medline abstracts
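Co-occurrence text mining scores entity pairs by how often they are tagged together, compared with how often each occurs on its own. The scoring actually used by STRING and DISEASES differs in its details. The sketch below is a simplified, document-level version with an assumed mixing parameter `alpha = 0.6` and toy data.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_scores(documents, alpha=0.6):
    """Score entity pairs tagged in the same document (simplified sketch).

    documents: iterable of sets of entity IDs, one set per abstract.
    Score = C_ab**alpha * (C_ab * N / (C_a * C_b))**(1 - alpha), where C_ab counts
    documents mentioning both entities, C_a and C_b count documents mentioning
    each entity, and N is the number of documents.
    """
    documents = [set(d) for d in documents]
    n_docs = len(documents)
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in documents:
        entity_counts.update(entities)
        # Sorting makes each pair key order-independent.
        pair_counts.update(combinations(sorted(entities), 2))
    scores = {}
    for (a, b), c_ab in pair_counts.items():
        enrichment = c_ab * n_docs / (entity_counts[a] * entity_counts[b])
        scores[(a, b)] = c_ab ** alpha * enrichment ** (1 - alpha)
    return scores

docs = [{"TP53", "MDM2"}, {"TP53", "MDM2"}, {"TP53", "BRCA1"}, {"BRCA1"}]
scores = cooccurrence_scores(docs)
```

Mixing the raw count with the enrichment term rewards pairs that are both frequent and more co-mentioned than expected by chance.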
only abstracts
<1 km
access restrictions
are abstracts sufficient?
15 million full-text articles
Westergaard et al., bioRxiv, 2017
~50% more associations
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
in Danish
dictionary
drugs
adverse events
in Danish
named entity recognition
temporal correlations
[Figure: timeline from drug introduction to drug discontinuation; adverse event mentions classified as adverse drug reaction, possible adverse drug reaction, ADR of additional drug, indication, pre-existing condition, or adverse event with a negative modifier; identification start marked]
Eriksson et al., Drug Safety, 2014
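The timeline can be illustrated with a few temporal rules. This is a minimal sketch assuming each adverse event mention already has a date and a negation flag. The rules and labels are hypothetical simplifications, not the method of Eriksson et al.

```python
from datetime import date

def classify_event(event_date, drug_start, drug_stop, negated=False):
    """Classify an adverse event mention by timing relative to a drug course.

    Hypothetical, simplified rules for illustration only.
    """
    if negated:
        return "excluded (negative modifier)"
    if event_date < drug_start:
        return "pre-existing condition"
    # drug_stop is None for a drug that has not been discontinued.
    if drug_stop is None or event_date <= drug_stop:
        return "possible adverse drug reaction"
    return "after discontinuation"

start, stop = date(2010, 2, 1), date(2010, 6, 1)
label = classify_event(date(2010, 3, 15), start, stop)  # "possible adverse drug reaction"
```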
find novel associations
summary
broadly applicable
keep it simple
free tools
Acknowledgments
Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
Peter Bjødstrup Jensen
John “Scooter” Morris
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak