MACHINES ARE PEOPLE TOO
Dr. Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
Theory and Practice of Digital Libraries 2017
THANKS FOR CONVERSATION & SLIDES!
Riffing off of Brad’s Dublin Core 2016 keynote:
https://www.slideshare.net/bpa777/dc2016-keynote-20161013-67164305
THE SUCCESS OF DIGITAL LIBRARIES
“Live every day like it's NBER day”
THE NEXT MEDIA: DATA
FAIR EVERYWHERE
RESEARCH DATA MANAGEMENT
DATA SEARCH
Antony Scerri, John Kuriakose, Amit Ajit Deshmane, Mark Stanger, Peter Cotroneo, Rebekah Moore, Raj Naik, Anita de Waard;
Elsevier’s approach to the bioCADDIE 2016 Dataset Retrieval Challenge, Database, Volume 2017, 1 January 2017,
bax056, https://doi.org/10.1093/database/bax056
THE CENTRALITY OF THE USER
HOW DO RESEARCHERS SEARCH FOR DATA?
Gregory, K., Groth, P., Cousijn, H., Scharnhorst, A.,
& Wyatt, S. (2017). Searching Data: A Review of
Observational Data Retrieval Practices. arXiv
preprint arXiv:1707.06937.
Some observations from @gregory_km
survey:
1. The needs and behaviours of specific user groups
(e.g. early career researchers, policy makers,
students) are not well documented.
2. Background uses of observational data are better
documented than foreground uses.
3. Reconstructing data tables from journal articles,
using general search engines, and making direct data
requests are common.
BUT ARE WE MISSING A USER?
WHY MACHINES?
ELSEVIER’S BUSINESS: PROVIDING ANSWERS FOR
RESEARCHERS, DOCTORS AND NURSES
My work is moving towards a new field; what should I know?
• Journal articles, reference works, profiles of researchers, funders &
institutions
• Recommendations of people to connect with, reading lists, topic pages
How should I treat my patient given her condition & history?
• Journal articles, reference works, medical guidelines, electronic health
records
• Treatment plan with alternatives personalized for the patient
How can I master the subject matter of the course I am taking?
• Course syllabus, reference works, course objectives, student history
• Quiz plan based on the student’s history and course objectives
INFORMATION OVERLOAD
WHAT CAN MACHINE INTELLIGENCE DO TODAY?
If there’s a task that a normal person can do with
less than one second of thinking, there’s a very
good chance we can automate it with deep
learning.
Andrew Ng, Chief Scientist, Baidu (lecture at Bay Area Deep Learning
School, Stanford, CA, September 24, 2016)
HUMAN SPEECH RECOGNITION
Word error rate was 23% in 2013, and over 35% in 2012 (4.9% in 2017, per the article linked below).
https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/
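Word error rate (WER), the metric named in the linked article, is commonly computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and equal weights for substitutions, insertions and deletions; the function name and the sample sentences are illustrative, not taken from the talk:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] = edits needed to turn the reference prefix seen so far
    # into the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                      # reference word dropped
                current_row[j - 1] + 1,                   # extra word inserted
                previous_row[j - 1] + substitution_cost,  # match or substitution
            ))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# Illustrative sentences (assumed): one of six reference words is missing.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Read this way, a 23% WER means roughly one edit for every four to five reference words.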
IMAGE RECOGNITION
https://devblogs.nvidia.com/parallelforall/author/czhang/
THESE RESULTS ARE DRIVEN BY DATA
“The paradigm shift of the ImageNet
thinking is that while a lot of people
are paying attention to models, let’s
pay attention to data, …”
– Prof. Fei-Fei Li [1]
[1] The data that transformed AI research—and possibly the world
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
THE GROWTH IN DATA ENGINEERS
https://www.stitchdata.com/resources/reports/the-state-of-data-engineering
BUT DO DIGITAL LIBRARIES HELP MACHINES?
• Machines’ proficiency in learning to answer questions from text, audio,
images and video will depend on our ability to train them effectively to read
information from the Web
• How machines read the Web today
• Crawling and indexing Web resources, possibly semantically tagged
(e.g. using schema.org)
• Find-and-follow crawling of open linked data resources for ontology and
data sharing and reuse
• Programmatic access to APIs mediated through HTTP/S and other
Internet protocols
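The first reading mode listed above, crawling pages that carry schema.org tagging, can be sketched as follows. This is an illustration under stated assumptions: the page is an invented example embedded as a string (no network access), the markup is JSON-LD in a script element, and the class and function names are hypothetical. Only the Python standard library is used.

```python
import json
from html.parser import HTMLParser

# Assumed sample page; the dataset description is invented for illustration.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@type": "Dataset",
 "name": "Example ocean temperature observations",
 "license": "https://creativecommons.org/licenses/by/4.0/"}
</script>
</head><body><h1>Example dataset landing page</h1></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._inside_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside_json_ld:
            self.items.append(json.loads("".join(self._buffer)))
            self._inside_json_ld = False


def extract_json_ld(html_text):
    """Return the list of JSON-LD objects found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    return parser.items


items = extract_json_ld(SAMPLE_PAGE)
```

A crawler built along these lines sees only what the publisher chose to declare, which is why the quality of the embedded metadata matters to the machine reader.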
DIGITAL LIBRARIES & LINKED DATA STANDARDS
THE SEMANTIC WEB WAS INTENDED FOR MACHINE READING
… that’s the real idea behind the Semantic Web:
letting software use the vast collective genius
embedded in its published pages.
Swartz, A. (2013). Aaron Swartz's A programmable Web: An unfinished
work. San Rafael, Calif.: Morgan & Claypool Publishers.
BUT THE SEMANTIC WEB IS BUILT FOR PEOPLE, NOT MACHINES
• The Semantic Web is largely a logicist take on the way knowledge is to be
represented
• The latest advances in machine intelligence are based on a connectionist
approach to knowledge representation
• There is a gap between how knowledge is represented in the Semantic Web
and what deep learning is exploiting to such good effect
• The Semantic Web is silent about how machines can become better
readers, and hence better partners in the second machine age
• How will we evolve metadata standards to better accommodate machines?
MACHINE READING IS ENABLED BY MACHINE LEARNING
[Diagram: programming (input, algorithm, output) contrasted with machine learning (data, learning architecture, model; then input, model, output), with stages labelled CPU and GPU.]
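The contrast between programming and machine learning can be made concrete with a toy example. In the first function a person writes the rule down; in the second, the same rule is recovered from example input/output pairs. This is a minimal sketch, assuming a one-variable linear relationship fitted by ordinary least squares rather than a deep network; the function names and training pairs are invented for illustration.

```python
def programmed_c_to_f(celsius: float) -> float:
    """Programming: the algorithm is written by hand."""
    return celsius * 9 / 5 + 32


def learn_line(examples):
    """Machine learning in miniature: fit slope and intercept from data
    by ordinary least squares and return the resulting model."""
    count = len(examples)
    mean_x = sum(x for x, _ in examples) / count
    mean_y = sum(y for _, y in examples) / count
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Assumed training data: (Celsius, Fahrenheit) pairs.
training_data = [(0.0, 32.0), (10.0, 50.0), (20.0, 68.0), (30.0, 86.0)]
learned_c_to_f = learn_line(training_data)
```

The learned model is only as good as the examples it was given, which is the point the following slides develop: the data becomes the thing to curate.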
MACHINES SEE THINGS DIFFERENTLY THAN PEOPLE
From: Alain, G. and Bengio, Y. (2016). Understanding intermediate layers using linear classifier probes. arXiv:1610.01644v1.
MACHINES LEARN THINGS DIFFERENTLY THAN PEOPLE
VOCABULARIES ARE SETS OF VECTOR EMBEDDINGS
From: Eisner, B., Rocktäschel, T., Augenstein, I., Bošnjak, M. and Riedel, S. (2016). Emoji2vec: learning emoji representations from their description. arXiv:1609.08359v1.
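To make "vocabularies as vector embeddings" concrete: each term maps to a vector, and relatedness between terms becomes a geometric comparison, typically cosine similarity. The sketch below uses three-dimensional vectors invented for illustration (real embeddings are learned and have hundreds of dimensions); the terms and function names are assumptions, not drawn from Emoji2vec.

```python
import math

# Hypothetical toy embeddings; the values are made up for illustration.
EMBEDDINGS = {
    "dataset": [0.9, 0.1, 0.0],
    "corpus": [0.8, 0.2, 0.1],
    "librarian": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_term(term, embeddings):
    """Return the other vocabulary term whose vector is closest to `term`."""
    candidates = [other for other in embeddings if other != term]
    return max(
        candidates,
        key=lambda other: cosine_similarity(embeddings[term], embeddings[other]),
    )


closest_to_dataset = nearest_term("dataset", EMBEDDINGS)
```

Unlike a controlled vocabulary, nothing here states why two terms are related; the relation is implicit in the geometry.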
TRAINING DATASETS ARE GROWING IN VOLUME AND COVERAGE
From: Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B. and Vijayanarasimhan, S. YouTube-8M: a large-scale video classification benchmark. arXiv:1609.08675.
MODELS ARE BECOMING REUSABLE DATA RESOURCES
Check out: sujitpal.blogspot.com for more
MACHINE LEARNING DATASETS AND MODELS ARE BECOMING
PART OF THE WEB
• Machines need lots and lots of data to learn how to read
• Datasets with ad-hoc formats are being made openly available
• Open Images: “~9 million URLs to images that have been annotated with labels spanning over 6000 categories” (The Open Images Dataset.
(n.d.). Retrieved September 29, 2016, from https://github.com/openimages/dataset.)
• YouTube-8M: “8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of
4800 Knowledge Graph entities” (Vijayanarasimhan S. and Natsev, P. (2016). Announcing YouTube-8M: A Large and Diverse Labeled Video
Dataset for Video Understanding Research. Retrieved September 29, 2016, https://research.googleblog.com/2016/09/announcing-youtube-8m-large-and-diverse.html.)
• Stanford Natural Language Inference: “570k human-written English sentence pairs manually labeled for balanced classification with the
labels entailment, contradiction, and neutral, supporting the task of natural language inference” (The Stanford Natural Language Inference
(SNLI) Corpus. (n.d.). Retrieved September 29, 2016, from http://nlp.stanford.edu/projects/snli/.)
• Standard architectures for machine (deep) learning are being released as open source
• Dense neural networks for classification
• Convolutional neural networks for image, audio and video recognition
• Recurrent neural networks for sequence processing and generation
• Advances in the field are being published quickly and transferred to industrial application just as
quickly
THE OPPORTUNITY FOR LIBRARIANS AND PUBLISHERS
As machines become increasingly capable of general-purpose
language understanding, the burden of effort in
building machine intelligences will shift from software
engineering to the acquisition, organization and curation
of training content and data.
THE ROLE OF METADATA IN THE SECOND MACHINE AGE – DC-2016 / KØBENHAVN / 13 OCTOBER
SAVE THE TIME OF THE MACHINE READER
Perhaps this law is not so self-evident as the others.
None the less, it has been responsible for many
reforms in library administration and has a great
potentiality for effecting many more reforms in the
future.
Ranganathan, S.R. (1931). The five laws of library science. Madras: The
Madras Library Association.
IMAGE SOURCE: HTTP://WESTPORTLIBRARY.ORG/ABOUT/NEWS/ROBOTS-ARRIVE-WESTPORT-LIBRARY
WHAT DOES IT LOOK LIKE TO HAVE MACHINES AS
LIBRARY PATRONS?
Tasks
1. Dataset / Model / Vocabulary Curation
2. Combating Bias
3. Explanation
4. Interoperability
5. Data Narratives
DATASET CURATION
MODEL CURATION
VOCABULARY CURATION
BATTLING BIAS
BATTLING BIAS: ALGORITHMIC LITERACY
“Algorithms all have their own ideologies. As computational
methods and data science become more and more a part of
every aspect of our lives, it is essential that work begin to ensure
there is a broader literacy about these techniques and that
there is an expansive and deep engagement in the ethical
issues surrounding them.”
– Trevor Owens (Library of Congress / Former IMLS)
http://www.pewinternet.org/2017/02/08/theme-7-the-need-grows-for-algorithmic-literacy-transparency-and-oversight/
THE RIGHT TO AN EXPLANATION
“The data subject shall have the right to obtain … the
existence of automated decision-making, including profiling
… meaningful information about the logic involved, as
well as the significance and the envisaged consequences
of such processing for the data subject.”
EU General Data Protection Regulation (GDPR), Chapter 3, Article 15
PROVENANCE FOR EXPLANATION
Credits: Curt Tilmes, Peter Fox
Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G.,
"Provenance Representation for the National Climate Assessment in the Global Change Information System,"
IEEE Transactions on Geoscience and Remote Sensing, vol. 51, no. 11, pp. 5160-5168, Nov. 2013
NATIONAL CLIMATE CHANGE ASSESSMENT
PROVENANCE
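Provenance supports explanation because each result can be traced back through the things it was derived from. The sketch below borrows two relation names from the W3C PROV vocabulary (wasDerivedFrom, wasGeneratedBy) but is a deliberately simplified, assumed data model, not the Global Change Information System's schema or a PROV library; every identifier in it is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A thing with provenance: a figure, a dataset, a report section."""
    identifier: str
    was_derived_from: list = field(default_factory=list)  # upstream entity ids
    was_generated_by: str = ""                            # producing activity


def lineage(identifier, catalogue):
    """Return every upstream entity id, nearest sources first."""
    visited = []
    queue = list(catalogue[identifier].was_derived_from)
    while queue:
        current = queue.pop(0)
        if current in visited:
            continue
        visited.append(current)
        queue.extend(catalogue[current].was_derived_from)
    return visited


# Hypothetical catalogue for illustration only.
catalogue = {
    entity.identifier: entity
    for entity in [
        Entity("raw-observations"),
        Entity("processed-dataset", ["raw-observations"], "quality-control-run"),
        Entity("report-figure", ["processed-dataset"], "plotting-script"),
    ]
}

figure_lineage = lineage("report-figure", catalogue)
```

An explanation for the figure is then a walk over this chain: which dataset it came from, which activity produced it, and what that dataset was derived from in turn.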
INTEROPERABILITY
DATA NARRATIVE GENERATION
Towards Automating Data Narratives.
Gil, Y.; and Garijo, D. In Proceedings of the
Twenty-Second ACM International Conference
on Intelligent User Interfaces (IUI-17),
Limassol, Cyprus, 2017.
THE CHALLENGE: DIGITAL LIBRARIES FOR MACHINES
• Digital Libraries have made tremendous strides in making media available
• The investment in Linked Data and APIs has made integration and building
applications easier and can help machine reader use cases
• But a new user needs new support:
• new forms of media (models, data)
• new vocabulary representations
• new forms of transparency
• new ways to interoperate
• new mechanisms to communicate
• ….
THANK YOU
Dr. Paul Groth | @pgroth | pgroth.com
labs.elsevier.com


Editor's Notes

  • #4: 8800 facebook group print
  • #5: Media
  • #9: 115 organizations
  • #16: Work with DANS. Reviewed 400 papers; deep dive on 114.
  • #22: Sundar Pichai
  • #38: These laws are: Books are for use. Every reader his / her book. Every book its reader. Save the time of the reader. The library is a growing organism.
  • #39: Obviously, this is facetious. The “patron” is the machine learning faculty, not the machine itself.
  • #43: Identifying and document