Frédéric Clavert

(Digital) Historian

prompting the past

This text hasn’t been copy-edited by a native English speaker – please excuse the many language errors. I also mostly use ‘AI’ here as a convenient shorthand for generative artificial intelligence. As pointed out by many authors (an example here), the use of ‘Artificial Intelligence’ or ‘AI’ is, let’s say, ambiguous and imprecise.

For several years now, images created by generative artificial intelligence systems such as midjourney, DALL·E or StableDiffusion, to give just a few examples, have become commonplace. Like text-generating systems such as GPT, LLAMA or BLOOM, they are based on interaction with users, which notably begins with the writing of prompts. The aim of our current research is to examine the various means at our disposal for transforming not only generated images and texts but also prompts into primary sources for historians.

digital history and digital memory studies literature

In the past few years, the digital history and digital memory studies literature on AI has expanded, including in mainstream historical journals such as the American Historical Review. Some have investigated the possible pedagogical uses of ChatGPT to teach history. Wulf Kansteiner investigated what a generative AI specifically trained for historians could be, but without really looking at the primary sources produced by these systems. The possibilities and risks of AI, particularly generative AI and especially in Holocaust studies, were analysed in an “open forum” of Eastern European Holocaust Studies edited by Mykola Makhortykh. In particular, the (low) adequacy of the responses of some generative AIs to Holocaust research was pointed out. Daniel Hutchinson developed an online piece of software that can help understand what generative AI systems “know” about history. Changes – positive or not – in memorialisation processes have also been analysed, mostly from the infrastructure angle. Most of those articles also deal with ethics, privacy and biases (including biased training datasets). All in all, though research on AI and the past is becoming denser, as far as I know there is little research on prompts as a source for history or for the study of collective memory.

questioning the past…

But what are prompts? Prompts are usually small pieces of text – though, as GenAI systems become multimodal, prompts increasingly also include other sorts of material, but let’s stick to text for now – that a user enters and that serve as the basis for the images or texts generated by a GenAI service. In other words, GenAI systems create numerous incentives to question the world, as prompts are often explicit or implicit questions, sometimes about the past.

And that should spark interest among historians, as questioning the past is the core activity of historians, their basic epistemological operation, in the sense that we ask questions to start the process of elaborating new knowledge about the past. But if GenAI systems are built around incentives to question our world and the world that was, as “stochastic parrots” they are a-epistemological: there is no notion of truth, lie or knowledge in the way those systems work.

Hence the core question of this research: how can those prompts, when they contain some sort of reference to the past, be used as primary sources by historians and memory studies scholars?

If we consider what those systems are based on (algorithms / code, training datasets, reinforcement learning) and what is needed to generate an output – prompts – then those systems generate numerous primary sources, artefacts that may tell us a lot about our societies. In this current research, I focus on prompts relating to the past. Prompts, at first sight, could be considered as open doors to users’ imagination about the past. But, following in a way Mykola Makhortykh, the writing of a prompt is also the result of the intervention of non-human agency, i.e. the result of what I see as a user-machine negotiation. Indeed, when generative systems do not deliver what the user is expecting, the user modifies the prompt in a way that will better fit the “machine”’s logic. Prompts are then to be considered as cyborg primary sources.

To study those prompts, we need to consider a series of methodological barriers and how to lift them.

harvesting data

The first methodological barrier to a research project on prompts related to the past is the making of a corpus. There are several ways we could assemble a database of prompts.

A first path would be to use dedicated prompt search engines, such as Lexica. Those search engines pose several problems: either they are basic keyword-based search engines – and collecting prompts would then imply a database of past-related keywords (which, considered together, would implicitly constitute a sort of definition of the past) – or they are “semantic” (they more or less implicitly associate the keyword you enter in the search interface with other words considered semantically related to it) and hence become black boxes. A further problem with those search engines, or some of them, is that, in our experience, their developers have often not kept metadata that is important to historians: date and time.
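As an illustration, a keyword-based collection from such an engine could be sketched as follows. The endpoint URL and the response fields are hypothetical (Lexica’s public API, which worked roughly this way, no longer seems to be available), and the keyword list is purely illustrative:

```python
import json
import urllib.parse
import urllib.request

# Illustrative list only: taken together, such keywords would implicitly
# constitute a definition of 'the past'.
PAST_KEYWORDS = ["medieval", "world war", "empire", "soviet", "1950s"]

def build_query_url(keyword, endpoint="https://example.org/api/v1/search"):
    """Build a keyword query for a (hypothetical) prompt search engine."""
    return endpoint + "?q=" + urllib.parse.quote(keyword)

def extract_prompts(payload):
    """Keep each prompt together with its date/time metadata -- the
    metadata historians need and that such engines often discard."""
    return [(item.get("prompt"), item.get("created_at"))
            for item in payload.get("images", [])]

def harvest(keyword):
    """Query the (hypothetical) engine and return (prompt, date) pairs."""
    with urllib.request.urlopen(build_query_url(keyword)) as resp:
        return extract_prompts(json.load(resp))
```

The two helper functions separate the query construction from the parsing, so the parsing can be reused on responses saved to disk.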

A second path would be to do directly what those search engines did: collect data on Discord, an instant messaging and VoIP social platform. Discord has indeed become a key place for communities to share their uses of large language models or image-generation systems. Companies such as midjourney have even used Discord as an interface to their system. Data could then be scraped from public Discord servers. Of course, respecting Discord’s terms of use and GDPR (or any other privacy legislation) principles, as well as following ethics guidelines, would be mandatory.
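As a sketch of what the extraction step could look like once messages have been collected (e.g. with the discord.py library and its channel history methods, after the terms-of-use, GDPR and ethics requirements are settled), the snippet below relies on one observable convention – the midjourney bot echoes the user’s prompt between `**` markers in its replies – which should be re-checked against current bot behaviour:

```python
import re

# midjourney's Discord bot replies with the prompt wrapped in ** markers,
# e.g. "**a medieval castle** - @user (fast)". This pattern pulls the
# prompt out of such messages.
MJ_PROMPT = re.compile(r"\*\*(.+?)\*\*")

def extract_midjourney_prompt(message_content):
    """Return the prompt embedded in a bot message, or None if absent."""
    match = MJ_PROMPT.search(message_content)
    return match.group(1) if match else None

def prompts_from_messages(messages):
    """messages: iterable of (timestamp, content) pairs already scraped
    from a public server. Keeps the timestamp -- crucial for historians."""
    out = []
    for timestamp, content in messages:
        prompt = extract_midjourney_prompt(content)
        if prompt:
            out.append((timestamp, prompt))
    return out
```

Keeping timestamp and prompt paired from the start avoids the metadata loss observed in the search engines discussed above.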

A third path would be to use existing datasets, for instance the krea open prompts corpus, which I am investigating in this jupyter notebook. Again, this data has to be used carefully, including by looking at the metadata associated with the prompts, which should answer a historian’s needs.
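A minimal sketch of the filtering step with pandas, assuming the corpus has already been loaded into a dataframe; the column name and the keyword list are assumptions to be checked against the actual files:

```python
import pandas as pd

# Illustrative list, not a definition of 'the past'.
PAST_KEYWORDS = ["medieval", "victorian", "world war", "soviet", "ancient"]

def filter_past_prompts(df, column="prompt"):
    """Keep only rows whose prompt mentions a past-related keyword.
    The 'prompt' column name is an assumption about the dataset schema."""
    pattern = "|".join(PAST_KEYWORDS)
    mask = df[column].str.contains(pattern, case=False, na=False, regex=True)
    return df[mask]
```

Such a coarse filter only bounds the corpus; each retained prompt still needs closer reading.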

The constitution of a corpus of prompts referring to the past will probably depend on mixing all those different ways, and maybe on investigating others: contacting AI firms – but let’s be honest, I’m very skeptical that they would care – or setting up our own chatbot (based on an existing LLM that would be fine-tuned or even aligned through the use of a historical dataset, for instance) that would allow us to collect prompts related to the past directly.

a balanced corpus?

The second and far more complicated methodological barrier to the constitution of a corpus of prompts related to the past is nevertheless to create a balanced corpus. By ‘balanced’, I mean here that we should have an idea of 1) what we mean by ‘related to the past’ and hence what we mean by ‘past’ and 2) who are the users we are studying.

Indeed, databases of prompts do not contain sociodemographic metadata about the authors of the prompts we would study. In other words, we cannot know who we are studying. The lack of sociodemographic metadata could be supplemented by a complementary qualitative approach, based on interviews with users of generative AI systems. Those interviews could also complement the prompt data by enlightening us on how users are “negotiating” with the “machine” to get what they expect from it.

We also need to define what we mean by “past” and to operationalise this definition into a set of keywords and/or queries combining those keywords in one way or another. Defining what we mean by ‘prompts related to the past’ is key, for numerous reasons, including the fact that prompts can be ambiguous: generated images referring to Afrofuturism, for instance, often contain at the same time an implicit vision of the past and, of course, of the future, if not the present.
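One minimal way to operationalise such a definition – an illustrative sketch, not the definition itself, and one that by design misses implicit references of the Afrofuturist kind:

```python
import re

# Illustrative period vocabulary; the real list would embody our working
# definition of 'the past' and has to be argued for explicitly.
PERIOD_WORDS = {"medieval", "antiquity", "victorian", "cold war", "ww2"}

# Years 1000-2019, optionally written as decades ('1950s').
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-1][0-9])s?\b")

def refers_to_past(prompt):
    """True if the prompt explicitly mentions a period word or a year.
    Implicit visions of the past (e.g. Afrofuturism) slip through."""
    lowered = prompt.lower()
    if any(word in lowered for word in PERIOD_WORDS):
        return True
    return bool(YEAR_PATTERN.search(lowered))
```

The last limitation is the important one: a keyword operationalisation captures explicit references only, which is exactly why it must be argued for rather than taken for granted.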

analyzing data

Once those methodological barriers are (I hope) bypassed, we still need to analyse our corpus. We plead for scalable reading, but will focus here on distant reading.

Figure 1

Figure 1 shows an analysis performed with iramuteq of a sample corpus (1908 prompts) I collected from lexica.art with the keyword ‘european union’. This small corpus contains no personal data and was collected through the Lexica API, which does not seem to exist anymore. I chose this keyword to see in which ways users relate to the European Union’s and Europe’s historical past.

The preliminary results show that references to the past can be of different natures: we can find them in the styles users want for an image (‘soviet propaganda’ for instance); references to the past can be mixed with elements of current or recent news (‘marine’ for the far-right politician Marine Le Pen in France, ‘nigel farage’ for the Brexit activist, or references to the Russian aggression against Ukraine, for instance); the notion of ‘Europe’ is sometimes linked to a precise period of time, usually the Middle Ages (‘heraldic’ – which is nonsensical, unless you consider far-right ideologies); some historical concepts linked to European history are also quite visible (‘empire’, ‘war’).

I am currently using the krea corpus (10 million prompts, a small part of which contains historical references), and am experimenting with more traditional LDA-style topic modelling and with word vectors for now. Using an LLM could of course be a supplementary option.

conclusion

(yeah, conclusions are not my thing)

AI-based generative systems are an opportunity for historians to investigate new kinds of primary sources, including prompts. Nevertheless, a set of methodological barriers needs to be overcome, through comprehensive ways to collect data and a scalable reading of prompts as primary sources. Distant reading of prompts, however, does not give any insight into how users negotiate with “the machine” when the generated text or image does not fit their (ideological, cultural, etc.) expectations. We hence suggest that historians should mix quantitative and qualitative methods, including by doing an oral history of the uses of prompts.

ps

At DH2024, where I presented this research, numerous papers investigated AI and most particularly LLMs. While all those papers are worth reading, one panel (that I could not attend: I was speaking in another session at the same time) will be particularly interesting to watch, once the Zoom videos are available: “Reinventions and Responsibilities in the Age of AI”.
