Report on the 2025 Web Archiving Conference (WAC)
(Note: This post was updated on 2025-12-24.)
The depositar team participated in and contributed to the 2025 Web Archiving Conference (WAC), held on April 9-10, 2025, in Oslo, Norway. Chia-Hsun Ally Wang wrote a detailed report in Traditional Chinese. Below is a report from Tyng-Ruey Chuang in English.
The Web Archiving Conference 2025 (WAC 2025) was held on April 9-10, 2025, in Oslo, Norway. This short report outlines the main conference programme and the depositar lab’s participation.
WAC 2025 programme and conference materials
WAC 2025 was a two-day conference following the 2025 General Assembly (GA) of the International Internet Preservation Consortium (IIPC). Not being IIPC members, we didn’t take part in the IIPC GA. The WAC 2025 programme and abstracts are available online. The WAC 2025 presentations have also been recorded and released on the IIPC channel on YouTube. The UNT Digital Library keeps a collection of IIPC GA and WAC presentations from over the years, so materials from the WAC 2025 presentations are readily available on the Web.
The conference opened with a keynote by Javier de la Rosa from the National Library of Norway, titled Libraries, Copyright, and Language Models, about the Mímir Project, a collaboration between the National Library of Norway, the University of Oslo, and the Norwegian University of Science and Technology. The project extended the Mistral 7B Large Language Model (LLM) with a corpus drawn from books and newspapers in the library’s collections. The goal is to understand how the new models perform, compared with the base models, in understanding regional language and knowledge.
We took part in the workshop Exploring Dilemmas in the Archiving of Legacy Webportals: An Exercise in Reflective Questioning, organized by the National Library of the Netherlands. The workshop posed ethical and technical questions, and engaged participants in discussions, about the preservation of De Digitale Stad / The Digital City, the remains of an online community from the early days of the World Wide Web, now registered as UNESCO world heritage.
Many presentations left an impression on us. We can list only a few here:
- Web Archives for Music Research from the Royal Danish Library
- Lost, but Preserved - A Web Archiving Perspective on the Ephemeral Web from the Internet Archive
- A Minimal Computing Approach for Web Archive Research from the University of Victoria and Universidad Autónoma del Estado de México
- Using Generative AI to Interrogate the UK Government Web Archive from the National Archives (UK)
We found the panel Past, Present & Future of Cross-Institutional Collaboration in Web Archiving: Insights from the Norwegian and Danish Web Archive, the NetArchiveSuite Community, & Beyond, which gave an overview of the first 20 years of web archiving, informative and inspiring.
The closing keynote Quantifying Complexity: Using Web Data to Decode Online Public Debate by Håvard Lundberg and Ida Haugen-Poljac from Analysis & Numbers, an employee-owned Scandinavian cooperative, was timely and memorable.
The depositar lab’s participation
We gave a presentation, Recently Orphaned Newspapers: From Archived Webpages to Reusable Datasets and Research Outlooks, at the Discovery & Access (News/Newspapers) session on April 10. The presentation slide set is deposited at the depositar (ark:37281/k5p3h9k37) as well as at the UNT Digital Library (ark:67531/metadc2472454). The video recording is on YouTube.
We reported on our progress in converting a recently orphaned newspaper into accessible article collections in the IPTC (International Press Telecommunications Council) standard representation. Specifically, we focus on Taiwan’s Apple Daily and work with the WARC files built by the Archive Team in September 2022, at a time when the future of the newspaper seemed dim. We convert these WARC files into de-duplicated collections of plain text in ninjs (News in JSON) format.
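To give a rough feel for the de-duplication and ninjs step, here is a minimal sketch in Python. It is not our actual pipeline: it assumes the article URL, headline, and body text have already been extracted from the WARC records (that part is omitted), and the function names `normalize` and `to_ninjs_items` are made up for illustration. The property names `uri`, `type`, `headline`, and `body_text` follow our reading of the ninjs 1.x schema and should be checked against the IPTC specification before use.

```python
import hashlib
import json


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different captures compare equal."""
    return " ".join(text.split())


def to_ninjs_items(captures):
    """Turn (url, headline, body) tuples into de-duplicated ninjs-style dicts.

    Two captures count as duplicates when their normalized body text is
    identical; only the first capture is kept. This is an illustrative
    assumption, not necessarily the criterion used in the actual project.
    """
    seen = set()
    items = []
    for url, headline, body in captures:
        body = normalize(body)
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same article text captured again, e.g. under another URL
        seen.add(digest)
        items.append({
            "uri": url,
            "type": "text",
            "headline": normalize(headline),
            "body_text": body,
        })
    return items


# Hypothetical captures; the URLs and texts are placeholders.
captures = [
    ("https://example.org/a/1", "Headline A", "First  article body."),
    ("https://example.org/a/1?utm=x", "Headline A", "First article body."),
    ("https://example.org/a/2", "Headline B", "Second article body."),
]
items = to_ninjs_items(captures)
print(json.dumps(items, ensure_ascii=False, indent=2))
```

Hashing the normalized body text is only one possible de-duplication criterion. Near-duplicate articles, such as updated versions of the same story, would need a fuzzier comparison.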
Some thoughts
The depositar lab seems to be the only team from Taiwan participating in the Web Archiving Conference series. Previously, at WAC 2022, which was held as a virtual event, we gave a presentation on Archiving COVID-19 Memory Websites: “COVID-19 Images and Stories” and Other Sites. We will continue our efforts in web archiving and look forward to contributing further to the WAC series. Memory institutions in Taiwan need to actively work on web archiving in order to preserve Taiwan’s presence on the web. Applying for IIPC membership, and learning from other members, will be an important step.