Storytelling for Summarizing
Collections in Web Archives
Yasmin AlNoamany
Michele C. Weigle
Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
@WebSciDL
This work is supported in part by IMLS LG-71-15-0077
CNI Spring 2016
2016-04-05
1
IMLS-Funded Research
1. Use small “stories” to summarize much larger
collections of archived web pages
– big → small
2. Generate web archive collections by mining
user-generated stories for seed URIs
– small → big
http://ws-dl.blogspot.com/2015/10/2015-10-07-imls-and-nsf-fund-web.html
2
Archive-It, a subscription-based
service, hosts curated web collections
3
> 3,000 collections
> 400 partners
> 10B archived pages
4
• Collection title
• Collection categorization according to the curator
• Seed URI
• Metadata about the collection
• Text search box
• The group that the resource belongs to
• List of the seed URIs
• Timespan of the resource and the number of times it has been captured
Problem:
Collection understanding and
collection summarization are
not currently supported
Not easy to answer “what’s in that collection?”
5
There is more than one collection
about the Egyptian Revolution
6
• “2010-2011 Arab Spring” https://archive-it.org/collections/3101
• “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349
• “Egypt Revolution and Politics” https://archive-it.org/collections/2358
(1000s of Seeds X 1000s of Mementos)
+ Dimension of Time ==
Conventional Vis Methods
Not Applicable
7
Using Timelines, Treemaps, etc.:
http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
Idea:
Storytelling
8
Stories in Literature
Story elements: setting, characters, sequence, exposition,
conflict, climax, resolution
9
Once upon a time…
http://www.learner.org/interactives/story/
Stories in social media
10
“It's hard to define a story, but I know it when I see it” (Alexander, 2008)
A sampling and arrangement of web resources for summarization.
Collection == thematic sample from the Web
Story == arranged sample from the collection
[Figure: the Web, with Collections X, Y, and Z as subsets and stories (S1, S2, S3, ...) sampled from each collection]
11
We sample k mementos from N pages of the collection to create a summary story
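The sampling step can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how the k mementos are chosen, so the random choice below is a placeholder, and the function name `sample_story`, the "at most one memento per page" rule, and the (URI, 14-digit timestamp) input shape are all hypothetical.

```python
import random
from collections import defaultdict


def sample_story(mementos, k, seed=0):
    """Pick up to k mementos, at most one per page, ordered by capture time.

    `mementos` is a list of (uri, timestamp) pairs. Timestamps are assumed to
    be 14-digit strings (YYYYMMDDhhmmss), so string order equals time order.
    Random selection is a stand-in for a real selection strategy.
    """
    rng = random.Random(seed)

    # Group the captures (mementos) by the page (URI) they belong to.
    by_page = defaultdict(list)
    for uri, timestamp in mementos:
        by_page[uri].append(timestamp)

    # Choose k of the N pages, then one capture from each chosen page.
    pages = sorted(by_page)
    chosen_pages = rng.sample(pages, min(k, len(pages)))
    story = [(uri, rng.choice(by_page[uri])) for uri in chosen_pages]

    # A story is an arranged sample: order it chronologically.
    return sorted(story, key=lambda memento: memento[1])
```

Calling `sample_story(collection, k=28)` would return a time-ordered list of at most 28 mementos, each from a different page.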
Collections have two dimensions
12
Time
URI
Fixed Page, Fixed Time
[Diagram: R1 plotted four times against a time axis t1 to t6]
13
Fixed Page, Fixed Time
14
A desktop Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
Android Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2
First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites, JCDL 2013: https://www.harding.edu/fmccown/pubs/jcdlsp182-schneider.pdf
A Method for Identifying Personalized Representations in Web Archives, D-Lib Magazine, 2013: http://www.dlib.org/dlib/november13/kelly/11kelly.html
Fixed Page, Sliding Time
[Diagram: six captures of page R along a time axis t1 to t6]
15
[Screenshots: captures dated Feb 1, Feb 1, Feb 2, Feb 4, Feb 5, Feb 7, Feb 9, Feb 11, Feb 11]
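The capture URIs listed in the speaker notes appear to follow the pattern wayback.archive-it.org/<collection id>/<14-digit timestamp>/<original URI>. Assuming that pattern holds (it is inferred from those examples, not from Archive-It documentation), a small helper can split a capture URI into its parts so captures can be ordered by date. The name `parse_memento_uri` is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example capture URIs in the speaker notes.
URI_M = re.compile(r"^https?://wayback\.archive-it\.org/(\d+)/(\d{14})/(.+)$")


def parse_memento_uri(uri_m):
    """Split a capture URI into (collection id, capture datetime, original URI)."""
    match = URI_M.match(uri_m)
    if match is None:
        raise ValueError(f"not a recognized capture URI: {uri_m}")
    collection_id, stamp, original_uri = match.groups()
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return int(collection_id), captured_at, original_uri
```

For example, a capture of the CNN Egypt news page from collection 2358 with timestamp 20110211191423 parses to a capture time of 19:14:23 on Feb 11, 2011.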
16
Sliding Page, Fixed Time
[Diagram: pages R1, R2, R3, R4 against a time axis t1 to t6]
17
Feb. 11, 2011
Mubarak resigns
18
Sliding Page, Sliding Time
[Diagram: captures R1, R2, R1, R3, R4, R2 against a time axis t1 to t6]
19
[Screenshots: captures dated Jan 25, Jan 27, Jan 31, Feb 2, Feb 4, Feb 7, Feb 10, Feb 11, Feb 11]
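The page/time combinations above can be read as different ways of slicing a list of captures. The sketch below is one possible interpretation, not the authors' implementation: the function names are hypothetical, captures are assumed to be (page, datetime) pairs, and "one capture per day" is an arbitrary choice of time slice.

```python
from datetime import datetime


def fixed_page_sliding_time(mementos, uri):
    """One page followed through time: all captures of `uri`, oldest first."""
    return sorted((m for m in mementos if m[0] == uri), key=lambda m: m[1])


def sliding_page_fixed_time(mementos, day):
    """Many pages at one moment: the first capture of each page on `day`."""
    seen, picked = set(), []
    for uri, when in sorted(mementos, key=lambda m: m[1]):
        if when.date() == day and uri not in seen:
            seen.add(uri)
            picked.append((uri, when))
    return picked


def sliding_page_sliding_time(mementos):
    """Many pages through time: the earliest capture from each day."""
    picked = {}
    for uri, when in sorted(mementos, key=lambda m: m[1]):
        # setdefault keeps the first (earliest) capture seen for that day.
        picked.setdefault(when.date(), (uri, when))
    return [picked[day] for day in sorted(picked)]
```

The fixed page, fixed time case is not covered here, because it varies the request environment (such as the user-agent) rather than the page or the capture time.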
20
21
What do stories in Storify look like?
“Characteristics of Social Media Stories”, TPDL 2015
http://www.cs.odu.edu/~mln/pubs/tpdl-2015/tpdl-2015-stories.pdf
What is the length of a story
(the number of resources per story)?
• This story has 31 resources
22
What are the types of resources
that compose a story?
• This story has
– 19 quotes
– 8 images
– 4 videos
23
Quotes
Video
What are the most frequently
used domains?
• This story uses:
– 90% twitter.com
– 7% instagram.com
– 3% facebook.com
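A domain breakdown like the one above can be computed from the list of resource URLs in a story. The sketch below is an illustration only: `domain_shares` is a hypothetical name, and it treats the hostname (minus a leading "www.") as the domain, which is a simplification that ignores subdomains and public-suffix rules.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_shares(urls):
    """Return {domain: percentage of resources}, most frequent domain first."""
    domains = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        # Simplification: strip only a leading "www." from the hostname.
        domains.append(host[4:] if host.startswith("www.") else host)

    counts = Counter(domains)
    total = len(domains)
    return {domain: 100 * n / total for domain, n in counts.most_common()}
```

The story length asked about earlier is simply `len(urls)` for the same list.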
24
Twitter.com
Twitter.com
Twitter.com
What differentiates a popular story?
25
19,795 views 64 views
(skipping many details,
see TPDL 2015 paper)
26
We should create stories with:
• ~28 pages
• moar images!
• where possible, select pages from social
media, news, blogs
• additional dimensions of quality:
– are well archived (e.g., not missing images,
stylesheets)
– generate nice summaries in the Storify
interface
27
Stories from collections about the Egyptian Revolution
28
https://storify.com/yasmina85/auto-stories-from-archived-collections-56fbc3d1b8d27c6f6571c647
https://storify.com/yasmina85/auto-stories-from-archived-collections-5702ff8f228eede273d49c21
https://storify.com/yasmina85/auto-stories-from-archived-collections-5702c7f1228eede273d48ddf
Evaluation: can humans tell human-generated
stories from machine-generated ones?
29
https://storify.com/yasmina85/this-is-manually-generated-story-from-archive-it-c-56b25ae72c0664474ee34f13
https://storify.com/yasmina85/auto-stories-from-archived-collections-56f1cfd36bc660f47f1b9f5e
Use an interface people already know how
to use to summarize collections
30
Archived collections + Storytelling services → Archived enriched stories
more info:
https://github.com/yasmina85/OffTopic-Detection
http://ws-dl.blogspot.com/2015/09/2015-09-28-tpdl-2015-in-poznan-poland.html
http://ws-dl.blogspot.com/2015/08/2015-08-20-odu-l3s-stanford-and.html

Editor's Notes

  • #2: What we mean by storytelling here is using visualizations to put a set of web pages from web archives into a narrative structure, ordered by time.
  • #4: First deployed in 2006, Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. 
  • #5: Lori created the collections and entered metadata about them (description, title, etc.). This is collection-level metadata, but it doesn’t help a lot. Archive-It provides faceted browsing and search services on the resulting collection.
  • #7: There are about 3 or 4 collections about the Egyptian Revolution in Archive-It. If I want to know about the Egyptian Revolution, which collection should I browse? A collection has two dimensions: URIs and copies of these URIs. A historian faced with more than one collection will not know where to start.
  • #10: Every story is made up of a set of events. We use “story” in its current, loose social media sense, which sometimes lacks elements of the more formal literary tradition: dramatic structure, morality, humor, improvisation, etc.
  • #11: The definition of a story in social media is much looser and more relaxed. Storytelling may be seen as the set of cultural practices for representing events chronologically.
  • #12: So if this is the web, the archived collections are subsets of the web, and we sample from these collections to create a story.
  • #15: http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2 http://america.aljazeera.com/ Personalized web resources offer different representations based on the user-agent string and other values in the HTTP request headers, GeoIP, and other environmental factors. Currently, web archives don’t support browsing different representations. This means web crawlers capturing content for archives may receive representations based on the crawl environment, which will differ from the representations returned to interactive users.
  • #17: http://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ http://wayback.archive-it.org/2358/*/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/
  • #18: Here
  • #19: Here is Feb. 11 from different news sites: https://wayback.archive-it.org/2358/20110211074248/http://www.globalpost.com/dispatch/egypt/110210/mubarak-resign-obama-egypt https://wayback.archive-it.org/2358/20110211191445/http://www.cnn.com/ https://wayback.archive-it.org/2358/20110211192204/http://www.bbc.co.uk/news/world-middle-east-12433045 https://wayback.archive-it.org/2358/20110211192142/http://www.modernegypt.info/ https://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ https://wayback.archive-it.org/2358/20110211191423/http://www.arabist.net/ https://wayback.archive-it.org/2358/20110211194239/http://www.globalpost.com/dispatch/egypt/110211/mubarak-quits-resigns-egypt-cairo
  • #21: And here I want to get the broadest possible coverage of the Egyptian Revolution.
  • #22: Our research question is: what are the structural characteristics of popular (i.e., receiving the most views) human-generated stories? We answer the following questions:
  • #26: Popular stories are those in the top 25% of views, normalized by time available on the web.
  • #31: So what we want to do is create persistent stories, then visualize them using a storytelling tool that users already know, such as Storify. We will integrate the storytelling services and the archived collections to generate archived enriched stories.
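
The popularity cutoff described in note #26 (top 25% of views, normalized by time available on the web) can be sketched as follows. The details are assumptions made for illustration: normalization is taken to mean views divided by days online, the cutoff is taken to be the third quartile of that rate, and `popular_stories` is a hypothetical name. The TPDL 2015 paper is the reference for the actual procedure.

```python
from statistics import quantiles


def popular_stories(stories):
    """Return ids of stories in the top quarter by views per day online.

    `stories` is a list of (story_id, views, days_on_web) tuples and must
    contain at least two stories for the quartiles to be defined.
    """
    # Assumed normalization: views divided by days available on the web.
    rates = {
        story_id: views / max(days_on_web, 1)
        for story_id, views, days_on_web in stories
    }

    # Third quartile of the rates: stories above it are the top 25%.
    cutoff = quantiles(rates.values(), n=4)[2]
    return [story_id for story_id, rate in rates.items() if rate > cutoff]
```

With this normalization, a story with many views that has been online for years can rank below a newer story with fewer total views.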