Archiving Social Media in an Age of APIcalypse

Introductory text (modified version) to the roundtable “Archiving Social Media in an Age of APIcalypse”, IIPC WAC conference, 26.04.2024.

Original abstract.

The Cambridge Analytica scandal, which broke in 2018 and concerned the 2016 Brexit referendum in the UK and the US presidential election of the same year, changed our relationship to social media, as well as our relationship to social media data: it could also be toxic data, whose use could influence and undermine democracies. The scandal made the study of social media even more important but, partly using the scandal as an excuse, major platforms such as Facebook closed down their API access, undermining the possibility of doing research on those platforms. The scandal was indeed an excuse: Facebook had started to change its access policy, for commercial purposes, in 2015, and LinkedIn restricted its API access the same year. As a consequence, many research projects had to stop.

In 2019, Axel Bruns used the term “APIcalypse” to describe this phenomenon of APIs closing down and analysed its consequences. He also clearly argued that those platforms were (and sometimes still are) fighting against critical research about them. The APIcalypse is not a linear phenomenon. Twitter’s policy has been a succession of ups and downs but remained relatively open until last year. In 2020, for instance, researchers, who had previously been able to use the free access to the regular API, were granted a dedicated access that was very valuable for research, though perhaps less so for heritage institutions. But Elon Musk took over Twitter, renamed it “X” and decided to change the prices of its API products. Today, collecting tweets is not sustainable for a heritage or research institution. We have been living through a second wave of APIcalypse since 2023, as Twitter and Reddit closed down free access to their APIs. The reasons lie in their economic model: with the rise of LLMs and chatbots, social media data has become an even more important commercial and financial asset for those firms.

In parallel to those waves of APIcalypse, most archive institutions, including the web archive services of many national libraries, have tried to archive social media. As many of you know better than I do, it is not an easy task. Among other social media archiving projects, the BeSocial project at the Royal Library of Belgium showed that very well. The Library of Congress Twitter archive is, to my knowledge, not accessible to researchers today. Although the INA Twitter archive is impressive, it has had to be constantly adapted to socio-technical changes in the Twitter API, and tweet harvesting has been on hold since mid-2023. Of course, almost all web archive services are archiving some parts of those social media as web pages. But this kind of archiving does not fit all research questions and is, in the end, quite specific in terms of scope.

Access to APIs is strategic for archiving and research. The European Union’s recent Digital Services Act (DSA) might be a decisive step forward. But a recent report showed that the policies platforms have set up to comply with the DSA are very diverse, and the usefulness of those APIs for researchers is yet to be evaluated. In other words, the DSA does not put an end to the APIcalypse: it is geographically limited, and the notion of “systemic risk” in the DSA remains ambiguous, an ambiguity that allows those firms to restrict researchers’ access to their data. Last but not least, the DSA is not about archiving or heritage at all.