Planning for Long-Term Access to COVID-19 Memory Websites

Tyng-Ruey Chuang wrote a guest post at the Archive-It blog on Planning for Long-Term Access to COVID-19 Memory Websites. The article is re-posted below. We collaborate with the National Museum of Taiwan History and Word Gleaner Ltd. in setting up COVID-19 Images and Stories, a communal memory website. The depositar lab participates in the Community Webs program at the Internet Archive to help build digital archives documenting local histories and underrepresented voices.

This March ushers in the fourth year of the COVID-19 pandemic: the WHO first declared the disease caused by the SARS-CoV-2 virus a pandemic on March 11th, 2020. Since the early days of the pandemic, we have witnessed websites sprouting up to call for and collect people’s experiences of living with the virus. These records, often in the form of images with notes, are uploaded to websites set up by research centers, libraries, archives, and museums. Moments of daily life in the pandemic have been captured and kept at these websites. These are the memory fragments of our times. They can be ephemeral and personal, yet universal at the same time, as the pandemic is experienced by people all over the world.

Some of these COVID-19 memory websites were put up on a shoestring budget. The collaborative arrangement for operating a memory collection website can be ad hoc; it may involve researchers, memory institutions, and the community at large. These websites incur costs over the years too. Continued maintenance of a COVID-19 memory website faces increasingly challenging issues, not the least of which is that the pandemic is now perceived to have subsided. In this post, we give some thought to long-term access to the materials collected by these websites. We speak from our experience in operating a COVID-19 memory website and about our own transition plan. These are initial thoughts and may not apply in other contexts.

The first question to ask is whether to sunset the website, and if so, what preparations are required. After sunsetting, the site is no longer accessible at its original web address. We feel that contributors to the site should be informed before the sunsetting, and be given time to export their own materials from the site. Our COVID-19 memory website is built on Drupal (an open source Content Management System that also manages user registration and login). We plan to send e-mail to users in advance about our decisions. We will put up notices on the site too. It will be useful to keep the site’s domain name, however, so that future visitors can be directed to places where the collections can be found. If persistent identifiers such as Archival Resource Keys (ARKs) have been used for items in the collection, now is a good time to plan for redirections.
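
As one way to plan such redirections, a small lookup table can map each retired path or identifier to its new home, with a fallback notice page for anything unmapped. The sketch below is illustrative only: the paths, the ARK string, and the example.org URLs are hypothetical placeholders, not identifiers from any actual site.

```python
from typing import Dict

# Hypothetical mapping from retired site paths (or ARK strings) to new
# locations. Every key and URL here is a placeholder for illustration.
REDIRECTS: Dict[str, str] = {
    "/node/101": "https://repository.example.org/dataset/memories/item/101",
    "ark:/99999/fk4abc123": "https://museum.example.org/collections/abc123",
}

# Hypothetical notice page explaining where the collections have moved.
FALLBACK_URL = "https://memory-site.example.org/moved.html"


def resolve(old_reference: str, redirects: Dict[str, str] = REDIRECTS) -> str:
    """Return the new URL for a retired path or identifier.

    Surrounding whitespace and trailing slashes are ignored, so
    "/node/101/" and "/node/101" resolve to the same target.
    Unknown references fall back to the general notice page.
    """
    key = old_reference.strip()
    if len(key) > 1:
        key = key.rstrip("/")
    return redirects.get(key, FALLBACK_URL)
```

Here `resolve("/node/101/")` returns the mapped repository URL, while an unmapped path such as `resolve("/unknown")` returns the notice page. Keeping the table as plain data makes it easy to review with collaborators before it is turned into actual web server redirect rules.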

There are many ways to preserve website content, each with its pros and cons. A decision about which approach to take will inevitably be a trade-off between access, integrity, cost, and (the rapid change of) technology. We list several approaches below.

Make a virtual machine image of the website server

With the required technical skills, a virtual machine image of a COVID-19 memory website can be made and later re-launched to bring back the site (usually in an emulated environment). There are some drawbacks, however. The website, and the collection it contains, is hardly accessible while it sits as a virtual machine image. Programs frozen in the image gradually become outdated, and they will need special skills to maintain as technologies move on. If the website depends on web services of today to function (e.g. Google Maps), it may not work years into the future.

Export website contents to collections management systems

Contents from a COVID-19 memory website, along with their metadata, can be exported, transformed, and imported into long-lasting collections management systems for preservation and access. The original website will be gone, but the materials it has collected will be kept elsewhere. Provenance about the collections (e.g. from whom they were collected) and the website itself (why and how the site came into being) should not be lost in the process. After the transfer, access to the collections will be managed by the hosting memory institutions.
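
The transformation step can be sketched as a field crosswalk. The example below assumes a hypothetical export record with made-up field names (`title`, `body`, `author_name`, `created`, `tags`) and maps them onto Dublin Core terms, a vocabulary that many collections management systems accept. Neither the field names nor the mapping reflect an actual site's schema; the `x-unmapped-fields` key is likewise an invention for this sketch. It also shows one possible way to carry a provenance note along with each record.

```python
from typing import Any, Dict, List

# Hypothetical crosswalk: source field name -> Dublin Core term.
CROSSWALK: Dict[str, str] = {
    "title": "dcterms:title",
    "body": "dcterms:description",
    "author_name": "dcterms:creator",
    "created": "dcterms:created",
    "tags": "dcterms:subject",
}


def transform(record: Dict[str, Any], source_site: str) -> Dict[str, List[str]]:
    """Map one exported record onto Dublin Core terms.

    Every value becomes a list of strings so that single-valued and
    multi-valued fields are handled the same way. Fields missing from
    the crosswalk are reported rather than silently dropped.
    """
    out: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        term = CROSSWALK.get(field)
        if term is None:
            unmapped.append(field)
            continue
        values = value if isinstance(value, list) else [value]
        out.setdefault(term, []).extend(str(v) for v in values)
    # Record where the item came from so provenance survives the move.
    out["dcterms:provenance"] = [f"Collected via {source_site}"]
    if unmapped:
        out["x-unmapped-fields"] = sorted(unmapped)
    return out
```

Listing the unmapped fields explicitly gives curators a chance to decide, field by field, whether something should be added to the crosswalk or deliberately left behind.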

Repackage the website collections as research datasets

Instead of importing the materials from a COVID-19 memory website into another collections management system, we can package the materials into stand-alone datasets. The datasets can be copied, distributed, and deposited into research data repositories to be used by the community at large. Metadata about the materials and the collections should be prepared and aligned with the standards and formats used by the repositories. After the deposits, the repositories will regulate access to the datasets (e.g., with the use of open content licenses).
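
One minimal way to make such a package self-describing is to ship a manifest that lists every file with its size and checksum, so that copies can be verified after distribution. The sketch below uses only the Python standard library. The manifest layout and the `manifest.json` file name are illustrative assumptions, not a requirement of any repository; established packaging formats such as BagIt serve a similar purpose.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(dataset_dir: Path, license_name: str) -> Dict[str, object]:
    """Describe every file under dataset_dir except the top-level manifest."""
    manifest_path = dataset_dir / "manifest.json"
    files: List[Dict[str, object]] = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file() and path != manifest_path:
            files.append({
                "path": path.relative_to(dataset_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            })
    return {"license": license_name, "files": files}


def write_manifest(dataset_dir: Path, license_name: str) -> Path:
    """Write manifest.json into the dataset directory and return its path."""
    target = dataset_dir / "manifest.json"
    manifest = build_manifest(dataset_dir, license_name)
    target.write_text(
        json.dumps(manifest, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

Calling `write_manifest(Path("my-dataset"), "CC BY 4.0")` on a hypothetical `my-dataset` directory would place the manifest next to the data files. Anyone who later receives a copy can recompute the checksums and compare them against the manifest.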

Save the website pages as archived snapshots

A COVID-19 memory website can also be saved as a collection of static web pages. When opened in a web browser, the pages are rendered live (to some extent). Tools and services are available for this purpose, of which the most notable are the Internet Archive’s Wayback Machine and the Archive-It web archiving services. If the snapshots are available online, the public will have access. These snapshots, however, capture just the public-facing side of the website. The snapshots may capture only the previews but not the high-resolution originals of an image collection, for example.

These approaches can be applied in combination when conserving COVID-19 memory collections. For instance, the snapshots of a website can be archived at the Wayback Machine while at the same time collected and deposited into a research data repository. In the following we outline a plan to provide long-term access to COVID-19 Images and Stories, a website set up in June 2020 for people to share what they see in the pandemic. Anyone can contribute by registering for an account and uploading their observations. An observation is a set of pictures with a short description. Contributors can tag their observations with keywords and locations. The entire observation collection is public (Creative Commons licensed), searchable, and presented as a map. The website was set up quickly in Drupal, in a style similar to the websites we had previously built for Citizen Science projects (on ecological observations). Two screenshots of the website are shown below, on a slide from our presentation at the Web Archiving Conference (WAC) in May 2022.

The COVID-19 Images and Stories website, as of May 2022

The COVID-19 Images and Stories project is a collaboration between the Institute of Information Science at Academia Sinica (中央研究院資訊科學研究所), the National Museum of Taiwan History (國立臺灣歷史博物館), and Word Gleaner Ltd. (拾穗者文化股份有限公司). Contributors to the website are mostly from Taiwan, and the observations are about their daily lives in the pandemic. As is often the case with emergent projects sourced from a labor of love but not much funding, the continued maintenance of the site can become an issue. Below we outline a plan to provide long-term access to the stories and images collected by the site. We again reuse a slide from our WAC 2022 presentation to illustrate the plan.

Planning for Long-Term Access to the Collection

First of all, all observations submitted to the website will be exported and transferred to the National Museum of Taiwan History for preservation (but with limited public access). The exports are in their original formats and complete with metadata so that the fidelity of the collection is not lost in the transfer. We are also planning to export the entire collection into Omeka S, a well-known collections management system, for easy access and maintenance. For this, we will need to align the data (and metadata) formats we use in Drupal with those expected by Omeka S. Some website functionality, e.g., browsing the collection by a map in Drupal, may be lost after the move to Omeka S. Another plan is to package the exports from the website as research datasets and deposit them into the depositar, a research data repository we operate at Academia Sinica, for both long-term preservation and public access.

Last but not least, we have been using the Archive-It service to archive snapshots of the COVID-19 Images and Stories website and to preserve them at the Internet Archive. We are very fortunate to participate in the Community Webs program at the Internet Archive to explore this opportunity. The snapshots, as WARC collections, can also be packaged as research datasets and again deposited to the depositar. We feel that preserving the original collections and the website snapshots in an open repository will better prepare them to meet the FAIR data principles (Findable, Accessible, Interoperable, and Reusable).

This is the current plan for providing long-term access to the collection at COVID-19 Images and Stories. It is still being implemented. Changes may be required as circumstances evolve, but we hope the plan goes well. You can monitor the COVID-19 Images and Stories website at the domain th.covid19.commons.tw to see if it goes as planned.