Publications

The following publications are available from netarchive.dk:

"Integration of non-harvested data into an existing web archive", november 2007.
Author: Bjarne Andersen ( ).
This paper describes a software prototype developed for transforming non-harvested web data into ARC files. It analyses problems connected to different kind of delivered web material and tries indexing the transformation result with the open source WayBack for testing the transformation quality


"Interoperability in the Future", september 2007. Author: Grethe Jacobsen ( ).
This article summarizes the results of a questionaire about web archive interoperability between national web archives - results given as a speech on the 73rd World Library and Information Congress


"La Captura de Internet en Dinamarca (in spanish)", may 2007. Author: Grethe Jacobsen ( ).
The article describes experiences with harvesting the danish part of the internet gained within the first two years of the netarchive.dk project


"Collecting the Danish Internet", may 2007. Author: Grethe Jacobsen ( ).
The article describes experiences with harvesting the danish part of the internet gained within the first two years of the netarchive.dk project


"Definition of an event in terms of net archiving", april 2007.
The document describes the guidelines used for determining when an event is relevant for the netarchive.dk project to harvest


"Overview of the Netarkivet web archiving system", october 2006. Author: Lars Clausen ( ).
The paper presents an overview of the entire system build by Netarchive.dk to control preservation of internet material in both large scale (snapshot harvesting) and small scale (selective / thematic harvesting) - from defining harvests to perserving the bits

This paper has been presented at the 6th International Web Archiving Workshop (IWAW'06).


"A formal analysis of recovery in a preservational data grid", may 2006. Author: Niels H. Christensen ( ).

A data grid made for the long-term preservation of digital materials is described. The data grid's ability to recover from data loss is analysed by developing a formal, mathematical model for the relevant, implemented software operations.

This paper has been presented at the Conference on Mass Storage Systems and Technologies (MSST06).


"The DK domain: in words and figures", Febuary 2006. Author: Bjarne Andersen ().

This article summarizes the experiences and statistics from the first snap shot harvest undertaken by netarchive.dk during july to september 2005


"Preserving the bits of the Danish internet", august 2005. Author: Niels H. Christensen ().

This paper describes simulations of bit preservation setup used by netarchive.dk

This paper has been presented at the 5th International Web Archiving Workshop (IWAW05).


"Webarkivering", april 2005. Author: Birgit N. Henriksen (). In Danish.

This paper describes web archiving as a discipline and Netarchive.dk as a project.


"Towards format repositories for web archives", august 2004. Main author: Niels H. Christensen ()

Web archives face a formidable challenge regarding the handling of file formats. It is the thesis of this paper that this challenge could and should be met through the development of format repositories fit for that purpose. The format challenge for web archives - and its relation to software for viewing and converting digital objects - is analyzed in detail using methods from the field of programming language implementation. As a result of the analysis, we are able to list a number of specific requirements to a format repository. A format repository that satisfies these requirements can be integrated with a web archive s software and thereby provide it with automatic support for handling formats.

This paper has been presented at the 4th International Web Archiving Workshop (IWAW04).


"Concerning Etags and Datestamps", August 2004. Main author: Lars Clausen ().

In web archiving, avoiding unnecessary downloads of unchanged pages can significantly reduce the load on both the archiving system and the server being archived. However, the indicators available for determining whether a page is changed are frequently either missing or wrong, causing pages changes to missed. In this paper, we investigate the quality of the two change indicators defined in the HTTP protocol, Last-Modified and Etag. Based on downloads of front pages of Danish web sites, we compare the reliability and usefulness of the two indicators and consider if using a combination of the two can lead to better prediction of page changes. Finally, we present a systematic way to determine the best prediction scheme, and present an unexpected download scheme with better characteristics than the obvious choices.

This paper has been presented at the 4th International Web Archiving Workshop (IWAW04).


"Archive Format and metadata requirements", July 2004. Main author: Steen Sloth Christensen

A discussion of this projects requirements for both archival format and metadata.


"Handling File Formats", May 2004. Main author: Lars Clausen ().

Considerations and plans for handling the problem of evolving file formats in a long-term web archive setting. Problems discussed include: Categorization of formats, preserving limited aspects of files, criteria for evaluation the long-term viability of formats, DRM issues, preservation strategies, and preservation workflow.


"Web Archive Activities in Denmark", juni 2004. Author: Birte Christensen-Dalsgaard ( ). Article in RLG DigiNews.


"Final Report for The Pilot Project netarkivet.dk", February 2003.

The present report by the group behind the "netarkivet.dk" project describes the experience gained from a pilot study, in which existing software was used to harvest and subsequently test out materials relating to the County and District elections of 2001. The pilot study showed that a great deal of material could be harvested in this way, but also that much of the interactive use of the net cannot be caught by ordinary methods.

The pilot project also offers an indication of the financing needed if Denmark is to safeguard an important part of its cultural heritage. Estimates are given both for the archiving of this heritage under present conditions, where the work is carried out on the basis of voluntary agreements, and on the assumption that the law on legal deposit of material may be changed, making it legal for institutions receiving statutory deliveries to acquire online materials.


"Danish Legal Deposit on the Internet: Current Solutions and Approaches for the Future", 5th European Conference, ECDL 2001, Darmstadt, Germany, September 4-9, 2001. Author: Birgit Henriksen ()


Proceedings from the conference 'Preserving the present for the future', 2001
Several of the papers that were presented at the conference about archiving of the internet are available on the website of The Danish Electronic Research Library.