2014-05-25: IIPC GA 2014

I attended the International Internet Preservation Consortium (IIPC) General Assembly 2014 (#iipcGA14), hosted by the Bibliothèque nationale de France (BnF) in Paris. Although the GA ran the entire week (May 19 -- May 23), I was only able to attend on May 20 and 21. It looks like I missed some good material on the first day, including keynotes from Wendy Hall and Wolfgang Nejdl, and a presentation from Common Crawl. Martin Klein also presented an overview of the Hiberlink project, as well as the "mset attribute" that we are working on with the people from Harvard. I arrived after lunch on May 20, in time for a really strong session on "Harvesting and access: technical updates", featuring talks about Solr indexing (Andy Jackson et al.) (Andy's slides), deduplicating content in WARCs (Kristinn Sigurðsson), Heritrix updates (Kris Carpenter), and Open Wayback (Helen Hockx). Within WS-DL, we haven't really done much with Solr in our p...