Tyng-Ruey Chuang recently attended a CODATA Executive Committee meeting in Barcelona and, preceding it, the Computational Social Science Conference hosted by the Barcelona Supercomputing Center.

Organized by the Barcelona Supercomputing Center (BSC), CODATA and Fundació La Caixa, the Computational Social Science Conference: Innovative Methods, Research Workflows and Data Stewardship was held on 28-29 October 2024 in Barcelona, on the campus of Universitat Politècnica de Catalunya (UPC). The conference consisted of five sessions on the intertwined topics of data science and social science, together with a lively lightning talks session, and opened with a keynote by David Lazer on “Building a Trans-Atlantic Coalition for Studying the Giants of the Internet”. Overall, the conference was an intense two-day meeting, with speakers giving expert overviews of the challenging issues faced and addressed by front-line social science researchers. By our estimate, the conference was attended by about 150 researchers and students, with around a quarter of them traveling from abroad for the meeting.

In his keynote, David Lazer (Northeastern University) talks about the problems of relying on social media platforms (e.g. the site formerly known as Twitter) for human behavior data in academic research. He advocates instead a shared research infrastructure with voluntarily contributed user data, as exemplified by the National Internet Observatory project funded by the National Science Foundation in the US. Below, we highlight some of the other talks given at the conference. Note that this report is not meant to be comprehensive and, due to limited space, should be considered selective and subjective. Slides from some of the presentations can be found on the conference website.

In the first session, Chico Camargo (University of Exeter) surveys the areas in which complexity scientists can work in the social sciences (among them, “social data science” and “cultural evolution and cultural analytics”) and emphasizes the interdisciplinary nature of this work. Marga Torre (Universidad Carlos III de Madrid, UC3M) reflects on the successful establishment of a Master's degree program in Computational Social Science at UC3M and its future challenges (among them, “interdisciplinarity for real” and “strengthen collaboration between social scientists and computer scientists”). David Garcia (University of Konstanz) talks about simulating opinion dynamics among LLM agents, with some implications for decision-making.

The second session is about data modeling for social science, with contributions from Hannes Mueller (Barcelona School of Economics), Shaily Gandhi (University of Salzburg), and Sebastian Poledna (International Institute for Applied Systems Analysis). The topics range from ethics in economic forecasting and policy making, through enhancing disaster response with social media analytics, to economic modeling by large-scale simulation. The session is followed by lightning talks, among which we are especially impressed by a presentation on the Living Arrangements Project, a world-scale multilevel analysis of household composition and change. The presentation is given by Paolo Marangio and Nienke Visscher (BSC).

The second day of the conference starts with a session of two presentations on state-of-the-art practices in research data repositories. Stefano Iacus (Harvard University) updates the audience on recent advances in the Dataverse Project and on how to use differential privacy to improve access to sensitive data while maintaining confidentiality. Darren Bell (UK Data Service) details the real-world issues in mediating access to data objects stored in research data repositories (among them, costly human mediation, unstructured access policies, and the lack of upstream access metadata).

The fourth session is on data description for AI and machine analysis. Gyorgy Gyomai (OECD) explores how SDMX (Statistical Data and Metadata eXchange), a standard to improve data connectivity and machine readability, can be an enabler for AI applications. Elena Simperl (King’s College London) gives an engaging presentation on Croissant, a community-driven metadata format for AI-ready datasets. Christine Kirkpatrick (San Diego Supercomputer Center) gives an overview of the work being carried out in research networks such as FARR (FAIR in ML, AI Readiness, and Reproducibility) and FAIR Digital Objects (FDOs).

Data provenance and research reproducibility are the subjects of the last session of the conference. Tony Ross-Hellauer (Know Center Research) presents the TIER2 Project, funded by the European Union’s Horizon research programme, and discusses reproducibility challenges in computational social science. Carole Goble (University of Manchester) introduces the RO-Crate (Research Object Crate) FAIR Digital Object framework for reproducible computational processing. Rosa Badia (BSC) provides an overview of PyCOMPSs, a programming environment for the development of workflow applications, and of how it can be used to support computational reproducibility in high-performance computing. This session is probably the most technical of the entire conference. It is nevertheless very enjoyable for people who, like us, have a background in Computer Science.

The conference organizers also arranged a visit to MareNostrum 5, a EuroHPC Joint Undertaking supercomputer hosted at BSC. The MareNostrum 5 ACC (Accelerated Partition) has 1120 nodes, each equipped with two Intel 40-core CPUs and four Nvidia Hopper GPUs. The cluster has a peak performance of 260 PFlops.

We thank Dr. Mercè Crosas (CODATA President and the head of Computational Social Sciences at BSC) for the invitation to participate in this exciting conference. Our travel to attend the meetings was supported by CODATA and the Institute of Information Science, Academia Sinica, Taiwan.

Universitat Politècnica de Catalunya (UPC). Photo: Tyng-Ruey Chuang.