Posts

Showing posts with the label Archives Unleashed

2020-11-18: Creating Collection Growth Curves With Archives Unleashed Toolkit And Hypercane

Figure 1: Creating collection growth curves with a web page text derivative

Recently, I have been learning about Archives Unleashed Toolkit (AUT), Hypercane, and how these tools can be used together. AUT is one of the tools from the Archives Unleashed Project and can be used to analyze web archive collections. When AUT is given the WARC or ARC files for a web archive collection, it can create network derivatives and text derivatives. In a network derivative, the nodes are the domains in a collection, and a link connects two nodes when one or more web pages in one domain contain a link to a web page in the other domain. The text derivatives include information about the web pages, images, PDFs, or other documents in the collection. Hypercane, a tool developed by WS-DL's Shawn Jones, can be used to create WARC files that are associated with a public Archive-It collection. The WARC files created by Hypercane can ...
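To make the network derivative idea concrete, here is a minimal sketch in plain Python. It does not use AUT's API. The function name `domain_edges`, the sample URLs, and the choice to skip links that stay within a single domain are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_edges(links):
    """Count cross-domain links from (source_url, target_url) pairs.

    Hypothetical helper for illustration; this is not an AUT function.
    """
    edges = Counter()
    for src, dst in links:
        src_domain = urlparse(src).netloc.lower()
        dst_domain = urlparse(dst).netloc.lower()
        # Assumption: links that stay inside one domain are not edges.
        if src_domain and dst_domain and src_domain != dst_domain:
            edges[(src_domain, dst_domain)] += 1
    return edges


# Made-up page-level links standing in for links extracted from a collection.
sample_links = [
    ("https://example.org/a", "https://example.com/x"),
    ("https://example.org/b", "https://example.com/y"),
    ("https://example.org/a", "https://example.org/b"),
    ("https://example.net/", "https://example.org/a"),
]

edges = domain_edges(sample_links)
for (src, dst), weight in sorted(edges.items()):
    print(src, "->", dst, weight)
```

Each key in `edges` is a directed pair of domains, and the count records how many page-level links support that edge, so two pages on one domain linking to another domain collapse into a single weighted edge.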

2020-07-29: Working With Archives Unleashed Cloud

Figure 1: Analyzing an Archive-It collection with Archives Unleashed Cloud to create derivatives

Archives Unleashed Cloud is one of the tools from the Archives Unleashed Project that I have been learning about recently. (The Archives Unleashed Project was recently awarded a $1M grant from the Andrew W. Mellon Foundation to continue their work and further integrate Archives Unleashed and Archive-It.) As Figure 1 illustrates, Archives Unleashed Cloud takes an Archive-It collection, performs domain-level and textual analysis, and produces derivatives that can be directly visualized or imported into other tools. The crawl report for an Archive-It collection is shown at the top of Figure 1. My advisor, Dr. Michele Weigle, created this collection, which is named "South Louisiana Flood - 2016". It was created to archive an unexpected event, a flood that occurred in Au...

2020-06-10: Hypercane Part 2: Synthesizing Output For Other Tools

This image by NOAA is licensed under NOAA's Image Licensing & Usage Info.

In Part 1 of this series of blog posts, I introduced Hypercane, a tool for automatically sampling mementos from web archive collections. Anyone who wants to create a sample of documents from a web archive collection by hand is confronted with thousands of documents from which to choose, and most collections contain insufficient metadata for making those decisions. Hypercane's focus is to supply us with a list of memento URI-Ms derived from the input we provide. One of the uses for this sampling is summarization. The previous blog post in this series focused on Hypercane's high-level sample and report actions and how they can be used for storytelling. This post focuses on how to generate output for other tools via Hypercane's synthesize action. The goal of the DSA project: to summarize a web archive collection by selecting a small number of exemplars and then visualize them with social media ...

2020-04-30: Archives Unleashed: New York Datathon Report (From Home Edition)

The Archives Unleashed Datathon is a two-day event hosted by the Archives Unleashed team where participants from different research backgrounds collaborate to explore web archive collections. The fourth Archives Unleashed datathon, held in partnership with Columbia University Libraries, was supposed to take place in New York City. However, as the number of COVID-19 cases began to increase, the organizers had to make the tough decision to cancel the New York datathon.

"Due to the rapidly-evolving COVID-19 situation, we have canceled the datathon which was to be held at Columbia University, March 26-27, 2020. This decision was not taken lightly and was made with the best interests of our attendees. We have been in touch with all attendees."
The Archives Unleashed Project (@unleasharchives), March 3, 2020

In the same email that brought the news of the event's cancellation, Ian Milligan also mentioned the possibility of organizing the event online through Zoom and ...