Posts

Showing posts with the label Web Site Reconstruction

2012-01-23: Release of Warrick 2.0 Beta

After a long hiatus, the Warrick tool has been resurrected with some modifications. Warrick is a free utility for reconstructing (or recovering) a website. The original version of Warrick discovered archived versions of web resources by searching the Web Infrastructure (which includes search engine caches and the Internet Archive). It would automatically download and organize the best versions of the archived resources and package them into a copy of the deleted site. As discussed by Warrick's creator, Frank McCown, the original version of Warrick was prone to breaking due to frequent changes to search engine APIs and archive URLs. Warrick 2.0, adapted from Dr. McCown's original code by Justin F. Brunelle, interfaces with the Memento framework via the mcurl program (developed by Ahmed AlSum). By incorporating Memento timemaps, Warrick no longer has the responsibility of directly searching and communicating with the caches and archive...
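To make the timemap idea concrete, here is a minimal sketch of reading a Memento TimeMap in link format and picking one archived copy. It is not Warrick's or mcurl's code. The example.org and example.net URIs are made up, the function names are hypothetical, and "closest to a target date" is only one assumed reading of what the "best" version might mean.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical TimeMap body in application/link-format (URIs are invented).
SAMPLE_TIMEMAP = '''<http://a.example.org/>;rel="original",
<http://archive.example.net/timemap/http://a.example.org/>;rel="self";type="application/link-format",
<http://archive.example.net/web/20000620180259/http://a.example.org/>;rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://archive.example.net/web/20050301120000/http://a.example.org/>;rel="memento";datetime="Tue, 01 Mar 2005 12:00:00 GMT",
<http://archive.example.net/web/20101115093000/http://a.example.org/>;rel="last memento";datetime="Mon, 15 Nov 2010 09:30:00 GMT"
'''

# One link entry: <uri> followed by zero or more ;name="value" parameters.
# Matching whole entries avoids splitting on the comma inside the datetime.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return a date-sorted list of (datetime, uri) for entries whose rel includes 'memento'."""
    mementos = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(PARAM_RE.findall(params))
        rels = attrs.get("rel", "").split()
        if "memento" in rels and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return sorted(mementos)


def closest_memento(mementos, target):
    """Pick the memento whose archive datetime is nearest to target (assumed selection rule)."""
    return min(mementos, key=lambda m: abs(m[0] - target))


if __name__ == "__main__":
    found = parse_timemap(SAMPLE_TIMEMAP)
    when, uri = closest_memento(found, datetime(2004, 1, 1, tzinfo=timezone.utc))
    print(when.isoformat(), uri)
```

Because the client only consumes a list of archived copies in one format, a change to an individual archive's URL scheme would, under this sketch's assumptions, not require changing the selection logic.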

2009-10-26: Communications of the ACM Article Published

The article "Why Websites Are Lost (and How They're Sometimes Found)" has finally been published in the November 2009 issue of Communications of the ACM. Co-written with Frank McCown and Cathy Marshall, it was accepted for publication in the fall of 2007. Although we've had a pre-print available since 2008, it just isn't the same until you see it in print. Except we won't be seeing this in print; it is instead published in the "Virtual Extension" part of the CACM. So even though it has page numbers (pp. 141-145), this article won't be among those that arrive in your mailbox in a few weeks. As someone who has spent his entire career trying to transform the scholarly communication process with the web and digital libraries, I completely understand this move by the CACM, but I have to admit I'm disappointed that I won't see a printed, bound copy. Even though in the long term, all discovery will come from the web (e.g., Google Scho...