Big Metadata: Mining Special Collections Catalogs for New Knowledge

@AllisonJaiODell
#rbms15
2015 RBMS Conference, 25 June, Oakland & Berkeley, CA
#gillsans #sorrynotsorry

Metadata
Data about data
“Metadata was traditionally in the card
catalogs of libraries”
-- Wikipedia
“We kill people based on metadata”

Big Data
“Big data is an evolving term that describes
any voluminous amount of structured, semi-
structured, and unstructured data that has
the potential to be mined for information.”
-- Margaret Rouse
IT Acronyms: A Quick Reference Guide
Volume
Velocity
Variety
Veracity

Big Metadata
A voluminous amount of semi-structured
data that has the potential to be mined for
information

Data Mining
“Data mining is the analysis of (often large)
observational data sets to find unsuspected
relationships and to summarize the data in
novel ways that are both understandable and
useful to the data owner.”
-- David Hand, Heikki Mannila, Padhraic Smyth
Principles of Data Mining

Digital Humanities
“By digital humanities, we mean research
that uses information technology as a central
part of its methodology, for creating and/or
processing data.
“The digital humanities used to be known as
Humanities Computing, or ICT (Information
and Communications Technology) for
humanities research. The use of the term
reflects a growing sense of the importance
that digital tools and resources now have for
humanities subjects.”
-- University of Oxford
What are the Digital Humanities?

Visualization
“Data visualization is the presentation of data
in a pictorial or graphical format. For
centuries, people have depended on visual
representations such as charts and maps to
understand information more easily and
quickly.”
-- SAS

Topic Modeling
“Topic models provide a simple way to
analyze large volumes of unlabeled text. A
‘topic’ consists of a cluster of words that
frequently occur together.”
-- MAchine Learning for LanguagE Toolkit (MALLET)
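The phrase "words that frequently occur together" can be illustrated with a much simpler stand-in than a real topic model. The sketch below does not use MALLET and is not a topic model; it only counts how often pairs of words share a record, which is the raw signal such models build on. The sample subject strings and the function name `cooccurrence_counts` are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical subject strings standing in for catalog records (not real data).
records = [
    "whaling voyages logbooks pacific",
    "whaling logbooks nantucket",
    "botany illustrations pacific voyages",
    "botany illustrations herbals",
]

def cooccurrence_counts(docs):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for doc in docs:
        # De-duplicate and sort so each pair has one canonical ordering.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts

pair_counts = cooccurrence_counts(records)

# Show the pairs that appear together most often.
for pair, n in pair_counts.most_common(3):
    print(n, pair)
```

Pairs that recur across many records hint at clusters worth examining with a full topic-modeling tool.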

Pattern Matching
“Pattern matching is the act of checking a
given sequence of tokens for the presence of
the constituents of some pattern.”
-- Wikipedia
“A regular expression (regex or regexp for
short) is a special text string for describing a
search pattern. You can think of regular
expressions as wildcards on steroids.”
-- regular-expressions.info
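As a minimal sketch of pattern matching against catalog text, the example below pulls year-like numbers out of free-text notes with Python's standard `re` module. The sample notes, the 1400 to 2099 range, and the helper name `extract_years` are assumptions made for illustration, not taken from any real catalog.

```python
import re

# Hypothetical free-text notes of the kind found in catalog records.
notes = [
    "Bookplate of John Smith; purchased 1923.",
    "Bound in contemporary calf, rebacked in 1887.",
    "Inscribed by the author.",
]

# Four-digit years from 1400 through 2099. The word boundaries (\b)
# keep the pattern from matching inside longer runs of digits.
YEAR = re.compile(r"\b(1[4-9]\d{2}|20\d{2})\b")

def extract_years(text):
    """Return every year-like number in the text as an int, in order."""
    return [int(match) for match in YEAR.findall(text)]

years_by_note = [extract_years(note) for note in notes]

for note, years in zip(notes, years_by_note):
    print(years, note)
```

A pattern this loose will also catch four-digit numbers that are not dates, so results from real records would need review.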

Tools
R
D3
Gephi
MIT Exhibit
Tableau
FusionCharts
PALLADIO
MALLET
Topic-Modeling-Tool
ArchExtract
Stanford Named Entity Recognizer
Jigsaw
More in the DH Toychest

Shop Your Closet
“You really can repurpose what you have.
Look in the back of the closet at the
garments and whole outfits you forgot you
have. Mix it all up in new combinations.”
-- Deborah L. Jacobs
10 Ways to ‘Shop Your Closet’

Share Everything Plan
Data dumps
Export options
Harvesting enabled

Provenance Metadata
“Assertions about description statements or
description sets”
-- DCMI Metadata Provenance Task Group
Creation & revision history
Policy documentation

Summary
Metadata is data
Your catalog is full of data
Do some data mining
Make some cool discovery experiences
Make your researchers happy

Questions?
Allison Jai O’Dell
Metadata Librarian
University of Florida
AJODELL@ufl.edu
@AllisonJaiODell
#rbms15
