Rik Hoekstra
Marijn Koolen
Marijke van Faassen
DH 2018, Mexico City, Mexico, 28/06/2018
Slides: http://bit.ly/dh2018-data-scopes
Data Scopes
Towards transparent data research in digital humanities
Overview
Background: Data Scopes
Example and Reflection
Research Process and Activities
Data Scopes for Research
Conclusions
Motivation
● Making data work for research requires:
○ Technical know-how of how digital tools handle data
○ Intimate knowledge of the domain and subject of source materials
● But also:
○ Reflection on how choices are informed by prior knowledge and experience
○ Reflection on how choices put emphasis on some aspects, while pushing back others
○ Reflection on the transformation of data in the research process
● Often requires collaboration…
○ How to organise that?
● … and lots of discussion
○ Choices that one collaborator makes should be visible to the rest
Data Scopes
● Coherent methods for using digital data in humanities research
● Data scope: you want to analyse a certain aspect of your materials,
○ but the “raw” data is not suitable for direct analysis.
● You have to do something with the data. Questions:
○ What do I have to do to make data suitable?
○ How do I do that?
○ What should I document of this process so I can share it with others?
○ Which parts of this process are specific to my analysis and which are generically applicable?
Example: discourse coalition migration
● Research question: what determined the discourse about the management of migrants?
● Context: research project about Dutch emigration 1945-1992
● Sub questions:
○ Who was involved in the international discourse about the management of migrants?
○ How did the discourse change over time?
○ How can we relate these changes?
○ Scientification of politics and the politicisation of science
● Different datasets:
○ About people and their relations
○ About the discourse
Progressive steps of data transformation
Data processing: steps (incomplete)
● Datasets: composition of the discourse coalition members (committee, national, international); changes over time
● Titles selection; institutional background
● Identify and disambiguate persons
● Process titles (not full text; stopword removal, frequency measures)
● Tool selection: network, periodisation, themes, feature comparison
● Visualisation: network, network dynamics, word clouds
● Selection of key person
● Link persons and titles
● Select important features from texts
● Modelling: discourse coalition
Focus on Data-Related Activities
Data Interactions
● It’s not about specific tools
○ It’s about the steps researchers take, why they take them
● Translate research questions, assumptions and interpretations to data interactions
● Discuss the consequences of interactions for questions, assumptions and interpretations
Frameworks focusing on process
● Scholarly Primitives (Unsworth 2000): discover, annotate, compare, refer, sample, illustrate, represent
● Stages of Data Visualization (Fry 2007): acquire, parse, filter, mine, represent, refine, interact
● Data Scope: select, model, normalise, link, classify
Creating a Data Scope
● Research plan:
○ Analyse network of experts involved in discourse on migration
● Research process:
○ Translate plan into sequence of data selections and transformations
○ Cycle of interpretations, decisions and actions
● Research description (van Faassen & Hoekstra 2017):
○ “To find out exactly how these experts were connected to key actors from the political sphere, [...], we went through the prefaces of the publications. We modelled the different roles of the key actors based on issues such as: who were writing forewords, prefaces or introductions to each other’s work; Who ordered the research? Who financed it? Etc.”
Selecting
● Which materials do I include? Which do I exclude and why?
○ How important are completeness, representativeness?
○ Potentially huge impact on network analysis
● Algorithmic selection:
○ Everything matching a (set of) keyword(s)
○ Documents by type, creator, title, size, …
○ How does technology allow and limit selection?
● What are consequences of these selections?
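The algorithmic selection described on this slide can be sketched as follows. This is a minimal illustration and not the authors' implementation: the sample records, the field names `title` and `type`, and the `select` helper are all hypothetical. The sketch stores the selection criteria and the excluded items next to the result, so that the choice stays visible to collaborators.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """Selected records together with the criteria that produced them."""
    keywords: tuple
    doc_types: tuple
    included: list
    excluded: list


def select(records, keywords, doc_types):
    """Keep records of an allowed type whose title contains any keyword."""
    included, excluded = [], []
    for rec in records:
        title = rec["title"].lower()
        matches = any(kw.lower() in title for kw in keywords)
        if matches and rec["type"] in doc_types:
            included.append(rec)
        else:
            excluded.append(rec)
    return Selection(tuple(keywords), tuple(doc_types), included, excluded)


# Hypothetical sample records, invented for illustration only.
records = [
    {"title": "Emigration policy after 1945", "type": "report"},
    {"title": "Migration and labour markets", "type": "article"},
    {"title": "Annual budget overview", "type": "report"},
]

selection = select(records, keywords=["migration"], doc_types=["report", "article"])
print(len(selection.included), "included;", len(selection.excluded), "excluded")
```

Note that plain substring matching on `migration` also matches `Emigration`. Whether that is wanted is one example of how the technology both allows and limits a selection.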
Modelling
● Computational approach requires modelling data (McCarty 2004)
● Determine what aspects/elements of data to focus on and what to leave out (why?)
○ People and organizations involved in discourse coalition
i. Authors, editors, commissioners, sponsors
○ Change in coalitions from the 1950s in 10-year periods
● Structures data in sources around research focus
○ Transforms data, affects interpretation!
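A model of this kind could be written down explicitly, for instance as in the sketch below. It assumes one record per actor, role and publication, and 10-year periods counted from 1950. The class name `Involvement`, its fields and the `period_of` helper are hypothetical and only illustrate how modelling choices become visible once they are stated in code.

```python
from dataclasses import dataclass

# Roles named on the slide; everything outside this list is left out of the model.
ROLES = ("author", "editor", "commissioner", "sponsor")


@dataclass(frozen=True)
class Involvement:
    """One person or organisation acting in one role for one publication."""
    actor: str
    role: str
    publication: str
    year: int


def period_of(year, start=1950, length=10):
    """Return the (first_year, last_year) period that contains `year`."""
    if year < start:
        raise ValueError(f"{year} lies before the first period ({start})")
    first = start + ((year - start) // length) * length
    return (first, first + length - 1)


# Hypothetical record, invented for illustration only.
inv = Involvement("Person A", "editor", "Publication X", 1963)
assert inv.role in ROLES
print(period_of(inv.year))  # (1960, 1969)
```

Changing `start` or `length` regroups every record, which is one concrete way in which the model transforms the data and affects interpretation.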
Data axes
● Modelling data creates data axes:
○ Persons, organisations, locations, dates,
○ Themes, topics, events, actions, decisions, life courses, ...
● Defining categories or classes along those axes:
○ Roles of people and organisations, memberships
○ Periods, regions
● Research stages:
○ Model is updated as research progresses
○ This updating reflects growing insights
○ Choice points reflect shifts in interpretation
Normalizing
● Bring surface forms expressed in data to underlying standard form
● Map variation onto a single representation:
○ Linguistic, geographical, spatial, temporal, structural
○ E.g. entrepreneur, entrepreneurs, entrepreneurship
○ Important consequences whenever you count frequencies or analyse networks
● What is irrelevant variation?
○ Is the distinction between entrepreneur and entrepreneurship important for the research focus?
○ Uncertainty: are mentions of New York and NYC variants that refer to the same thing?
● Essential for next step: linking
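The effect of normalisation on frequency counts can be shown with a small sketch. The mapping below is hypothetical: it merges `entrepreneurs` into `entrepreneur`, keeps `entrepreneurship` separate, and assumes that `NYC` and `New York` refer to the same place. Each of these is a research decision, not a technical given.

```python
from collections import Counter

# Hypothetical mapping from surface forms to a standard form.
# Treating "nyc" and "new york" as the same thing is an assumption.
NORMAL_FORMS = {
    "entrepreneurs": "entrepreneur",
    "nyc": "new york",
}


def normalise(term):
    """Lower-case a term and map known variants onto one representation."""
    key = term.strip().lower()
    return NORMAL_FORMS.get(key, key)


# Invented mentions, for illustration only.
mentions = ["Entrepreneur", "entrepreneurs", "entrepreneurship", "NYC", "New York"]

raw_counts = Counter(mentions)
normalised_counts = Counter(normalise(m) for m in mentions)

print(raw_counts)
print(normalised_counts)
```

Before normalisation every surface form occurs once. Afterwards two forms occur twice, so any frequency ranking or network built on these counts changes with the mapping.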
Linking
● Establishing explicit connections between objects in data sources
○ Within a dataset: relations between people, organizations
○ Across datasets: e.g. mentions of same person, location, date, …
i. Can bring together disparate data about single entity from different sources
● What counts as a link?
○ Editor - Main author
○ Preface author - Main author
○ Commissioner - Main author
○ Commissioner - Sponsor
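The question of what counts as a link can be made explicit as a list of role pairs, as in this sketch. The role pairs come from the Linking slide; the record layout (a dictionary from role name to a list of people) and the `links_for` helper are hypothetical.

```python
from itertools import product

# Role pairs that count as a link, taken from the Linking slide.
LINKING_ROLE_PAIRS = [
    ("editor", "main author"),
    ("preface author", "main author"),
    ("commissioner", "main author"),
    ("commissioner", "sponsor"),
]


def links_for(publication):
    """Derive (person, person, relation) links from one publication record.

    `publication` maps a role name to the list of people in that role.
    """
    links = []
    for role_a, role_b in LINKING_ROLE_PAIRS:
        people_a = publication.get(role_a, [])
        people_b = publication.get(role_b, [])
        for a, b in product(people_a, people_b):
            if a != b:  # a person is not linked to themselves
                links.append((a, b, f"{role_a} - {role_b}"))
    return links


# Hypothetical publication record, invented for illustration only.
publication = {
    "main author": ["Person A"],
    "editor": ["Person B"],
    "commissioner": ["Person C"],
    "sponsor": ["Person D"],
}

links = links_for(publication)
for link in links:
    print(link)
```

Adding or removing a pair in `LINKING_ROLE_PAIRS` changes the resulting network, so the list itself is part of what should be documented.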
Classifying
● Reduction of complexity by grouping (data) objects into predefined categories, or classes
○ Bringing together objects with similar properties
○ Separating objects with dissimilar properties
● Adds new layers of structure and interpretation to data
○ Especially useful for low-frequency items
○ Many data dimensions have “long tails” which are hard to structure
● Deciding on classification dimensions and classes is part of modelling
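Grouping low-frequency items into predefined classes can be sketched as below. The organisation names, the two classes and the `classify` helper are hypothetical. Items without a class get a visible default so that gaps in the classification remain apparent.

```python
from collections import Counter

# Hypothetical, predefined classes for organisation names.
CLASSES = {
    "Ministry A": "government",
    "Board B": "government",
    "University C": "academia",
    "Institute D": "academia",
}


def classify(name, default="unclassified"):
    """Return the predefined class for a name, or a visible default."""
    return CLASSES.get(name, default)


# Invented observations: most names occur only once (a "long tail").
observations = [
    "Ministry A", "Ministry A",
    "Board B", "University C", "Institute D", "Foundation E",
]

by_name = Counter(observations)
by_class = Counter(classify(name) for name in observations)

print(by_name.most_common())
print(by_class.most_common())
```

Counted by name, four of the five items occur once. Counted by class, the same observations fall into a few groups that are easier to compare.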
Understanding data scope affects interpretation of network visualization!
Conclusions (1/2)
● Too often, scholars consider this process as “mere preparation”
○ “... not part of the real research”,
○ Leave it out of scholarly communication as it “gets in the way of the narrative”
● Process of selecting, modelling and transforming is intellectual effort
○ Requires both technical and domain knowledge and interpretation
○ Different choices can lead to very different analyses and interpretations
● Even if you didn’t consider a certain transformation you still made a choice!
Conclusions (2/2)
● Need to increase shared understanding
○ Both in terminology and methodology
○ Data scope provides set of concepts to address this
● Open questions
○ How do we communicate data scope process to collaborators and peers?
○ Need for alternative forms of publication?
■ E.g. layered publication: narrative < process < data
References
Boonstra, O., L. Breure, and P. Doorn. 2006. Past, present and future of historical information science. Amsterdam.
Brenninkmeijer, C., et al. 2012. Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision.
van Faassen, M., and R. Hoekstra. 2017. Modelling Society through Migration Management. Exploring the role of (Dutch) experts in 20th century international migration policy. Conference paper. Government by Expertise: Technocrats and Technocracy in Western Europe, 1914-1973. Panel 3. Global Expertise.
Graham, S., I. Milligan, and S. Weingart. 2016. The Historian’s Macroscope: Big Digital History. http://www.themacroscope.org/2.0/
Groth, P., Y. Gil, J. Cheney, and S. Miles. 2012. “Requirements for provenance on the web.” International Journal of Digital Curation 7(1).
Hoekstra, R., and M. Koolen. 2018. Data Scopes for Digital History Research. Historical Methods: A Journal of Quantitative and Interdisciplinary History, Volume 51.
Ockeloen, N., A. Fokkens, S. ter Braake, P. Vossen, V. de Boer, G. Schreiber, and S. Legêne. 2013. BiographyNet: Managing Provenance at multiple levels and from different perspectives. In: Proceedings of the Workshop on Linked Science (LiSC) at ISWC 2013, Sydney, Australia, October 2013.
Thank You! Gracias!
Questions?
Preguntas?
Slides: http://bit.ly/dh2018-data-scopes
Q&A
● Q: What is the difference between this process in a digital context and in an analogue
context? In analogue research, we have always obfuscated certain parts of
the process. Why do we need more transparency now?
○ A: There is no difference. Transparency of process was as important then as it is now.
○ The issue is that there currently is a lack of shared understanding of and terminology for
talking about this process and how it fits in research methodology and practice.
○ The reason why we can leave out details of the analogue process is that practitioners have
been trained in these analogue methods, with a shared understanding of the steps involved
and pitfalls to avoid. In digital research, this is not yet the case.
○ We need to discuss how to communicate about this process to collaborators and peers, at
what level of detail of technical steps and choices and consequences. The most detailed level
is probably too detailed, obfuscating the more relevant aspects in a flood of trivial details.
○ Moreover, humanities researchers tend to use digital tools in their data research that transform
their data, but that are viewed as black boxes that ‘just do a job’. This makes it even more
urgent to document research in the digital era.
Q&A
● Q: Tracing this data transformation process is basically the issue of
provenance. To what extent can existing provenance tools tackle this?
○ A: Documenting steps that lead to research output captures only a part of the process, but
misses important parts.
○ First, existing tools keep track of steps but not of alternative choices, considerations and
reasoning for steps.
○ Second, such tools tend to suggest a linear flow from raw input to final output, which misses
the point that the research process is non-linear and leaves out the dead ends that can lead to
new insights and judgements.
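A record that captures more than a linear sequence of steps might look like the sketch below. It does not represent any existing provenance tool or the authors' own practice. The `Decision` class, its fields and the two entries are hypothetical, and only show that alternatives, reasoning and dead ends can be stored alongside the chosen step.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One choice point in the research process."""
    step: str
    chosen: str
    alternatives: list
    reasoning: str
    dead_end: bool = False
    follows: list = field(default_factory=list)  # indices of earlier decisions


log = []


def record(decision):
    """Append a decision and return its index so later steps can refer to it."""
    log.append(decision)
    return len(log) - 1


# Hypothetical entries, invented for illustration only.
first = record(Decision(
    step="periodisation",
    chosen="non-overlapping 10-year periods",
    alternatives=["sliding window", "periods based on policy changes"],
    reasoning="simple to explain and compare",
))
record(Decision(
    step="selection",
    chosen="full-text search",
    alternatives=["title search"],
    reasoning="abandoned: full text not available for most items",
    dead_end=True,
    follows=[first],
))

for i, d in enumerate(log):
    status = "dead end" if d.dead_end else "kept"
    print(i, d.step, "->", d.chosen, f"({status})")
```

Because each entry can point back to earlier entries through `follows`, the log forms a graph and not a single line from raw input to final output.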
Q&A
● Q: Are there existing solutions for dealing with the dynamics of the coalition
network for different periodizations? E.g. instead of having non-overlapping
periods, visualize the networks for 10-year periods with 1-year or 5-year
jumps? Would that solve the problem of interpreting these networks?
○ A: Communicating about the research process is important, regardless of the approach taken.
A sliding window of 10-year periods shifting 1 year each step introduces new questions of
interpretation and potential consequences.
○ Note that using multiple, complementary analyses can provide complementary perspectives
and ways to reciprocally and critically assess the individual analyses.
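The two periodisations discussed in this answer can be compared with a short sketch. The year range 1950-1989 and the helper names `periods` and `in_period` are assumptions made for illustration. The sketch only generates the periods; building the networks per period is left out.

```python
def periods(first_year, last_year, length=10, step=10):
    """Return (start, end) periods of `length` years, moving `step` years each time.

    step == length gives non-overlapping periods; step < length gives a
    sliding window with overlapping periods.
    """
    result = []
    start = first_year
    while start + length - 1 <= last_year:
        result.append((start, start + length - 1))
        start += step
    return result


def in_period(year, period):
    """Check whether a year falls inside an inclusive (start, end) period."""
    start, end = period
    return start <= year <= end


non_overlapping = periods(1950, 1989, length=10, step=10)
sliding = periods(1950, 1989, length=10, step=5)

print(non_overlapping)
print(sliding)

# With a sliding window, one event year falls into several periods.
print([p for p in sliding if in_period(1963, p)])
```

With 5-year jumps, a publication from 1963 contributes to two networks instead of one. That is one of the new questions of interpretation the answer refers to.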
