Uploaded by Lars Juhl Jensen

One tagger, many uses: Simple text-mining strategies for biomedicine

This talk presents the tagger, a dictionary-based named entity recognition tool for biomedical text. It recognizes genes and proteins, chemical compounds, diseases, organisms, and environments. The tagger is written in C++, is open source, processes more than 1000 abstracts per second, and reaches 70–80% recall at 80–90% precision. It is available as a Python module, a Docker image, and a web service. Applications include extracting functional and disease–gene associations from the literature and mining Danish electronic health records for adverse drug events.

Lars Juhl Jensen
@larsjuhljensen
One tagger, many uses
Simple text-mining strategies for biomedicine

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary

genes / proteins

chemical compounds

diseases

organisms

environments

not comprehensive

expansion rules

prefixes and suffixes

curated blacklist

SDS
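
The dictionary, expansion-rule, and blacklist ideas above can be sketched in a few lines of Python. This is only a toy illustration, not the C++ tagger's actual algorithm or API: the names `DICTIONARY`, `BLACKLIST`, `PREFIXES`, `SUFFIXES`, `candidates`, and `tag`, the `EXAMPLE:n` identifiers, and the specific affixes are all hypothetical. Reading "SDS" as an example of a blacklisted ambiguous name is also an assumption. Multi-word names are not handled here.

```python
import re

# Toy dictionary: lowercase name -> (entity type, identifier). Entries are illustrative only.
DICTIONARY = {
    "cdc2": ("gene/protein", "EXAMPLE:1"),
    "sds": ("gene/protein", "EXAMPLE:2"),
    "aspirin": ("chemical", "EXAMPLE:3"),
}

# Names present in the dictionary but assumed too ambiguous to tag.
BLACKLIST = {"sds"}

# Hypothetical expansion rule: these affixes are stripped before dictionary lookup.
PREFIXES = ("anti-",)
SUFFIXES = ("-dependent", "-like")

# A token is a run of letters, digits, and hyphens that starts with a letter or digit.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-]*")


def candidates(token):
    """Yield (lookup_key, start_offset, end_offset) variants of one token."""
    low = token.lower()
    yield low, 0, len(low)
    for prefix in PREFIXES:
        if low.startswith(prefix):
            yield low[len(prefix):], len(prefix), len(low)
    for suffix in SUFFIXES:
        if low.endswith(suffix):
            yield low[:-len(suffix)], 0, len(low) - len(suffix)


def tag(text):
    """Return (start, end, surface form, entity type, identifier) for each match."""
    matches = []
    for m in TOKEN.finditer(text):
        for key, a, b in candidates(m.group()):
            if key in BLACKLIST:
                continue  # skip names the curated blacklist rules out
            if key in DICTIONARY:
                etype, ident = DICTIONARY[key]
                start, end = m.start() + a, m.start() + b
                matches.append((start, end, text[start:end], etype, ident))
                break  # keep the first matching variant of this token
    return matches


hits = tag("Cdc2-dependent phosphorylation was analyzed by SDS gel after aspirin treatment.")
print([h[2] for h in hits])  # ['Cdc2', 'aspirin']
```

The blacklist check runs before the dictionary lookup, so a name like "SDS" is dropped even though it is a valid dictionary entry.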

software

C++ tagger

>1000 abstracts / second

inherently thread-safe

70–80% recall

80–90% precision
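
For readers unfamiliar with the two figures above: recall is the fraction of true mentions that were found, and precision is the fraction of tagged mentions that are correct. A minimal sketch with made-up numbers follows. The function name `precision_recall` and the example sets are hypothetical and are not taken from any benchmark of the tagger.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall for two collections of mention identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up example: 10 true mentions, 9 tagged, 8 of the tagged ones correct.
gold = set(range(10))
predicted = set(range(8)) | {100}
p, r = precision_recall(predicted, gold)
print(round(p, 2), round(r, 2))  # 0.89 0.8
```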

open source
bitbucket.org/larsjuhljensen/tagger/

Python module

Docker
hub.docker.com/r/larsjuhljensen/tagger/

web service
tagger.jensenlab.org

Extract
extract.jensenlab.org

community resources

STRING

string-db.org

functional associations

DISEASES

disease–gene associations

Cytoscape

curated knowledge

experimental data

computational predictions

co-occurrence text mining
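
Co-occurrence text mining can be illustrated by counting how often two tagged entities appear in the same document. The sketch below is an assumption-laden toy: the function names, the entity labels, and the simple observed-over-expected ratio are illustrative and are not the scoring scheme used by STRING or DISEASES.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tagged_docs):
    """Count, per entity and per unordered entity pair, the number of documents mentioning them."""
    pair_counts = Counter()
    entity_counts = Counter()
    for entities in tagged_docs:
        unique = sorted(set(entities))  # count each entity once per document
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, entity_counts


def observed_over_expected(pair, pair_counts, entity_counts, n_docs):
    """Ratio of observed co-occurrences to the number expected if mentions were independent."""
    a, b = pair
    expected = entity_counts[a] * entity_counts[b] / n_docs
    return pair_counts[pair] / expected


# Hypothetical tagger output: one list of entity names per abstract.
docs = [["geneA", "diseaseX"], ["geneA", "diseaseX", "geneB"], ["geneB"], ["geneA"]]
pairs, singles = cooccurrence_counts(docs)
print(pairs[("diseaseX", "geneA")])  # 2
print(round(observed_over_expected(("diseaseX", "geneA"), pairs, singles, len(docs)), 2))  # 1.33
```

Sorting the entities within each document gives every pair a single canonical order, so `("diseaseX", "geneA")` and `("geneA", "diseaseX")` are never counted separately.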

Medline abstracts

only abstracts

<1 km

access restrictions

are abstracts sufficient?

15 million full-text articles

Westergaard et al., BioRxiv, 2017

~50% more associations

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

in Danish

dictionary

drugs

adverse events

in Danish

named entity recognition

temporal correlations

[Figure. Labels: Drug introduction; Drug discontinuation; Adverse event; Negative modifier; Indication; Pre-existing condition; Adverse drug reaction; Possible adverse drug reaction; ADR of additional drug; Identification start]
Eriksson et al., Drug Safety, 2014
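
One way to read the temporal-correlation idea is as a comparison between the date of an adverse-event mention and the dates on which a drug was introduced and discontinued. The sketch below is a deliberately simplified interpretation of the figure labels, not the method published by Eriksson et al. The function name `classify_event`, its rules, and the label "after discontinuation" are assumptions made for illustration.

```python
from datetime import date


def classify_event(event_date, drug_start, drug_stop=None):
    """Classify an adverse-event mention by its timing relative to one drug course."""
    if event_date < drug_start:
        # Mentioned before the drug was introduced, so the drug cannot be the cause.
        return "pre-existing condition"
    if drug_stop is None or event_date <= drug_stop:
        # Mentioned while the patient was on the drug.
        return "possible adverse drug reaction"
    # Hypothetical catch-all for mentions after the drug was stopped.
    return "after discontinuation"


print(classify_event(date(2010, 1, 15), date(2010, 2, 1)))
# pre-existing condition
print(classify_event(date(2010, 3, 1), date(2010, 2, 1), date(2010, 6, 1)))
# possible adverse drug reaction
```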

find novel associations

summary

broadly applicable

keep it simple

free tools

Acknowledgments
Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
Peter Bjødstrup Jensen
John “Scooter” Morris
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak
