Bibliothèque nationale de France
bibnum.bnf.fr

The WARC File Format (ISO 28500) - Information, Maintenance, Drafts

Purpose

The WARC (Web ARChive) file format offers a convention for concatenating multiple resource records (data objects), each consisting of a set of simple text headers and an arbitrary data block, into one long file.
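The record layout described above can be illustrated with a minimal Python sketch. It is not a conforming implementation of ISO 28500: it assumes each record is a version line, a set of "Name: value" text headers, a blank line, the data block, and two closing CRLF pairs, and it relies on the Content-Length header to delimit the block. The helper names write_record and read_records are hypothetical, and details such as header line folding, compression, and validation of mandatory fields are left out.

```python
import io
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def write_record(out, record_type, block, extra_headers=None):
    """Append one record: version line, text headers, blank line, data block."""
    headers = {
        "WARC-Type": record_type,
        "WARC-Record-ID": f"<urn:uuid:{uuid.uuid4()}>",
        "WARC-Date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Content-Length": str(len(block)),
    }
    headers.update(extra_headers or {})
    out.write(b"WARC/1.1" + CRLF)
    for name, value in headers.items():
        out.write(f"{name}: {value}".encode("utf-8") + CRLF)
    out.write(CRLF)         # blank line closes the header section
    out.write(block)        # arbitrary bytes, never interpreted
    out.write(CRLF + CRLF)  # two CRLF pairs close the record


def read_records(stream):
    """Yield (headers, block) for every record in a concatenated stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected record start: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (CRLF, b""):
                break  # blank line: the data block follows
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The block is delimited by its declared length, not by its content.
        block = stream.read(int(headers["Content-Length"]))
        stream.read(len(CRLF) * 2)  # skip the record terminator
        yield headers, block


# Two records of different content types concatenated into one in-memory "file".
buf = io.BytesIO()
write_record(buf, "resource", b"hello",
             {"WARC-Target-URI": "http://example.org/a.txt"})
write_record(buf, "resource", b"\x00\xff\r\n\r\nraw",
             {"WARC-Target-URI": "http://example.org/b.bin"})
buf.seek(0)
records = list(read_records(buf))
```

Because the reader trusts Content-Length rather than scanning for a delimiter, the second block can contain CRLF sequences and non-text bytes without confusing the parser, which is what allows data objects of unrestricted type to share one file.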

Context

For many years, memory storage organizations have tried to find the most appropriate ways to collect and keep track of World Wide Web material using web-scale tools such as web crawlers. At the same time, those same organizations had a growing need to archive large numbers of born-digital and digitized files. What was needed was a container format that lets a single file simply and safely carry a very large number of constituent data objects of unrestricted type for the purposes of storage, management, and exchange.

The WARC file format offers all these capabilities. It is an extension of the ARC file format that was created in 1996 by Brewster Kahle and Mike Burner from the Internet Archive for managing billions of objects. The motivation to extend the ARC format arose from discussions and experiences of the International Internet Preservation Consortium, whose members' mission is to acquire, preserve and make accessible knowledge and information from the Internet for future generations.

Today, the WARC file format is used to build applications for harvesting, managing, accessing, mining and exchanging content. While it is the only standardized format for web archives, it has also been adopted beyond the web archiving community to store born-digital or digitized materials.

History

The WARC file format was first released as the international standard ISO 28500:2009 in May 2009.
It was revised in August 2017 as ISO 28500:2017, Information and documentation -- WARC file format. This second edition cancels and replaces the first edition, ISO 28500:2009.

WARC file format maintenance

Changes to the WARC file format are discussed within the International Internet Preservation Consortium. Requests for modifications or a revision can be addressed to: [email protected]
Any revision of ISO 28500 that proves necessary will follow the normal revision procedure.
The procedure is monitored by Technical Committee ISO/TC 46, Information and documentation, Subcommittee 4, Technical interoperability.

Drafts

WARC 1.1 / draft as of January 2017

PDF file: Information and documentation - WARC file format - ISO 28500 / Draft as of January 2017
Word file: Information and documentation - The WARC File Format - ISO 28500 / Draft as of January 2017

WARC 1.0 / draft as of November 2008

PDF file: Information and documentation - WARC file format - ISO 28500 / Draft as of November 2008
Word file: Information and documentation - The WARC File Format - ISO 28500 / Draft as of November 2008


Last updated: 2017-11