Big Metadata
Mining Special Collections Catalogs for New Knowledge
@AllisonJaiODell
#rbms15
2015 RBMS Conference, 25 June, Oakland & Berkeley, CA
#gillsans #sorrynotsorry
Metadata
Data about data
“Metadata was traditionally in the card
catalogs of libraries”
-- Wikipedia
“We kill people based on metadata”
Big Data
“Big data is an evolving term that describes
any voluminous amount of structured,
semi-structured, and unstructured data that has
the potential to be mined for information.”
-- Margaret Rouse
IT Acronyms: A Quick Reference Guide
Volume
Velocity
Variety
Veracity
Big Metadata
A voluminous amount of semi-structured
data that has the potential to be mined for
information
Data Mining
“Data mining is the analysis of (often large)
observational data sets to find unsuspected
relationships and to summarize the data in
novel ways that are both understandable and
useful to the data owner.”
-- David Hand, Heikki Mannila, Padhraic Smyth
Principles of Data Mining
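Here is what that could look like for a catalog. This is a minimal Python sketch that assumes the records have already been exported and reduced to lists of subject headings. The sample records and function names are hypothetical. The sketch summarizes how often each heading appears and counts which headings show up together in the same record, which is one simple way to surface relationships nobody went looking for.

```python
from collections import Counter
from itertools import combinations

# Hypothetical catalog records, each reduced to its list of subject headings.
records = [
    ["Botany", "Florida", "Travel"],
    ["Botany", "Florida"],
    ["Cookery", "Florida"],
    ["Botany", "Travel"],
]

def subject_counts(records):
    """Count how many records carry each subject heading."""
    return Counter(s for rec in records for s in set(rec))

def subject_pairs(records):
    """Count how many records carry each unordered pair of headings."""
    pairs = Counter()
    for rec in records:
        # sorted() makes ("A", "B") and ("B", "A") the same key
        for pair in combinations(sorted(set(rec)), 2):
            pairs[pair] += 1
    return pairs

print(subject_counts(records).most_common(3))
print(subject_pairs(records).most_common(3))
```

Pairs that appear together more often than you would expect can point to collecting patterns worth a closer look.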
Digital Humanities
“By digital humanities, we mean research
that uses information technology as a central
part of its methodology, for creating and/or
processing data.
“The digital humanities used to be known as
Humanities Computing, or ICT (Information
and Communications Technology) for
humanities research.The use of the term
reflects a growing sense of the importance
that digital tools and resources now have for
humanities subjects.”
-- University of Oxford
What are the Digital Humanities?
Visualization
“Data visualization is the presentation of data
in a pictorial or graphical format. For
centuries, people have depended on visual
representations such as charts and maps to
understand information more easily and
quickly.”
-- SAS
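Below is a minimal plain-Python sketch of the idea. It assumes publication years have already been extracted from the records, and the years and function names are hypothetical. The sketch groups years by decade and draws a text bar chart. Tools such as D3 or Tableau would render the same counts graphically.

```python
from collections import Counter

def decade_counts(years):
    """Group publication years into decades, e.g. 1823 -> 1820."""
    return Counter((y // 10) * 10 for y in years)

def text_bar_chart(counts, symbol="#"):
    """Render one bar per decade, in chronological order."""
    lines = []
    for decade in sorted(counts):
        lines.append(f"{decade}s {symbol * counts[decade]} ({counts[decade]})")
    return "\n".join(lines)

# Hypothetical publication years pulled from catalog records
years = [1823, 1827, 1851, 1855, 1858, 1902]
print(text_bar_chart(decade_counts(years)))
# prints:
# 1820s ## (2)
# 1850s ### (3)
# 1900s # (1)
```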
Topic Modeling
“Topic models provide a simple way to
analyze large volumes of unlabeled text. A
‘topic’ consists of a cluster of words that
frequently occur together.”
-- MAchine Learning for LanguagE Toolkit (MALLET)
Pattern Matching
“Pattern matching is the act of checking a
given sequence of tokens for the presence of
the constituents of some pattern.”
-- Wikipedia
“A regular expression (regex or regexp for
short) is a special text string for describing a
search pattern. You can think of regular
expressions as wildcards on steroids.”
-- regular-expressions.info
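Here is an example for catalog data. This minimal Python sketch uses the standard `re` module to pull a four-digit year out of free-text date statements. The sample strings and the 1400 to 2099 range are assumptions. Uncertain dates such as "18--?" deliberately return nothing rather than a guess.

```python
import re

# A four-digit year from 1400 to 2099 that is not part of a longer number
# (assumed range; adjust for your collections).
YEAR = re.compile(r"(?<!\d)(1[4-9]\d\d|20\d\d)(?!\d)")

def extract_year(date_statement):
    """Return the first plausible year in a date statement, or None."""
    match = YEAR.search(date_statement)
    return int(match.group(1)) if match else None

# Hypothetical date statements as they might appear in catalog records
samples = ["1823.", "[ca. 1850]", "c1902", "18--?", "[between 1860 and 1869]"]
for s in samples:
    print(s, "->", extract_year(s))
```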
Tools
R
D3
Gephi
MIT Exhibit
Tableau
FusionCharts
PALLADIO
MALLET
Topic-Modeling-Tool
ArchExtract
Stanford Named Entity Recognizer
Jigsaw
More in the DH Toychest
Shop Your Closet
“You really can repurpose what you have.
Look in the back of the closet at the
garments and whole outfits you forgot you
have. Mix it all up in new combinations.”
-- Deborah L. Jacobs
10 Ways to ‘Shop Your Closet’
Share Everything Plan
Data dumps
Export options
Harvesting enabled
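This sketch assumes "harvesting" means OAI-PMH, a common harvesting protocol for library metadata. Under that assumption, a harvester's first request is a `ListRecords` call. The Python sketch below only builds the request URLs. The base URL and function name are hypothetical.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: a follow-up
    request sends only the verb and the token.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint; substitute your own repository's base URL.
print(list_records_url("https://example.org/oai"))
# prints: https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```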
Provenance Metadata
“Assertions about description statements or
description sets”
-- DCMI Metadata Provenance Task Group
Creation & revision history
Policy documentation
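MARC records already store part of their revision history in field 005, the date and time of the latest transaction, recorded as yyyymmddhhmmss.f. The minimal Python sketch below assumes the 005 values have already been pulled out of the records. The record IDs, values, and function names are hypothetical. The sketch flags records that have not been revised since a cutoff date.

```python
from datetime import datetime

def parse_marc_005(value):
    """Parse a MARC 21 field 005 value (yyyymmddhhmmss.f) into a datetime.

    The trailing tenth-of-a-second digit is ignored.
    """
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")

def stale_records(records, cutoff):
    """Return IDs of records whose latest transaction predates cutoff."""
    return sorted(rid for rid, f005 in records.items() if parse_marc_005(f005) < cutoff)

# Hypothetical record IDs mapped to their 005 values
records = {
    "rec1": "19940223151047.0",
    "rec2": "20140611093000.0",
    "rec3": "20010105120000.0",
}
print(stale_records(records, datetime(2010, 1, 1)))
# prints: ['rec1', 'rec3']
```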
Summary
Metadata is data
Your catalog is full of data
Do some data mining
Make some cool discovery experiences
Make your researchers happy
Questions?
Allison Jai O’Dell
Metadata Librarian
University of Florida
AJODELL@ufl.edu
@AllisonJaiODell
#rbms15