historical representations
generative artificial intelligence
collective memory
So the aim is to build a corpus of prompts containing references to the past and to analyse it. There is a set of methodological issues to take into account:
army of the european union invades budapest 2 0 2 2, highly detailed painting, digital painting, artstation, concept art
army of the european union fighting on the streets of budapest 2 0 2 2, highly detailed illustration for time magazine cover art
army of the european union with tanks fighting on the streets of budapest 2 0 2 2, highly detailed oil painting
Let’s take an example: the three prompts here do not refer explicitly to the past. But just use the prompt for a search on an image search engine: all the results are historical. We have here an implicit reference, probably to the Hungarian revolution of 1956, which was crushed by a Soviet intervention.
So how to build a robust identification strategy?
krea corpus (sample) → claude api → 5000 prompts
keyword (‘european union’) → api lexica.art → 2000 prompts
I tried several strategies to build two corpora.
For Corpus 1, I set up a sample of 50,000 lines (processing the full corpus would have been too costly) and sent them to the Claude API with a prompt explaining to Claude Sonnet how to determine whether a line contained references to the past; Claude sends back a ‘yes’ or a ‘no’.
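This classification loop could be sketched as follows. The instruction text below is a hypothetical stand-in, not the actual prompt co-written with Claude, and in production the `ask` callable would wrap a call to the Anthropic API:

```python
# Sketch of the binary classification step: each corpus line is wrapped in an
# instruction and sent to a model that answers 'yes' or 'no'.

def build_instruction(line: str) -> str:
    # Hypothetical classifier instruction (the real one is more elaborate).
    return (
        "Does the following image-generation prompt contain a reference "
        "to the past, explicit or implicit? Answer only 'yes' or 'no'.\n\n"
        f"Prompt: {line}"
    )

def parse_answer(reply: str) -> bool:
    # The model is asked to reply 'yes' or 'no'; be tolerant of casing/whitespace.
    return reply.strip().lower().startswith("yes")

def classify(lines, ask):
    # `ask` is any callable mapping an instruction to a model reply;
    # in production it would wrap a Claude API call.
    return [line for line in lines if parse_answer(ask(build_instruction(line)))]
```

Decoupling the API call from the filtering logic also makes the pipeline easy to test offline with a stubbed `ask` function.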
For Corpus 2, the code was written by a student assistant two years ago (Yaroslav Zabolotskyi).
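The original code is not shown here, but a keyword harvest of this kind could look like the sketch below. The endpoint and the `images`/`prompt` JSON keys are assumptions based on lexica.art’s public search API; the actual implementation may differ:

```python
import json
import urllib.parse
import urllib.request

# Sketch of a keyword-based prompt harvest from lexica.art.
API = "https://lexica.art/api/v1/search?q="

def search_url(keyword: str) -> str:
    # URL-encode the keyword so multi-word queries are valid.
    return API + urllib.parse.quote(keyword)

def extract_prompts(payload: dict) -> list:
    # Keep one prompt string per returned image record.
    # The "images" and "prompt" keys are assumptions about the response shape.
    return [img["prompt"] for img in payload.get("images", [])]

def fetch_prompts(keyword: str) -> list:
    with urllib.request.urlopen(search_url(keyword)) as resp:
        return extract_prompts(json.load(resp))
```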
Let’s focus a bit on Claude: this prompt was co-written with Claude to help Claude reason about each item of my corpus. It’s a mixed approach, combining reasoning and personae, the prompt being engineered through a negotiation between me and Claude (the chatbot).
The Claude-based approach worked best. It’s far from perfect: the implicit part is not fully taken into account, and some prompts refer to the past, but not to the historical past.
I should point out that for now I have not done any benchmarking, which is an obvious weakness that I plan to address. Furthermore, I used a sample of the krea corpus for cost reasons. The plan is to eventually find prompts with references to the past in the full 10-million-line corpus, which should allow me to get around 900,000 prompts with references to the past.
So I have two corpora.
In the next few slides, I’ll analyse them through scalable reading:
This is a dataviz obtained with Iramuteq. Iramuteq performs something that looks like topic modelling, but it is not bag-of-words: the words that you see are in fact the most representative words of clusters of prompts.
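Iramuteq’s clustering (the Reinert method) is its own technique, but the idea of “most representative words per cluster” can be illustrated with a simple frequency-ratio sketch. This is an analogy under stated assumptions, not what Iramuteq actually computes:

```python
from collections import Counter

def representative_words(clusters, top_n=3):
    # clusters: dict mapping cluster id -> list of prompt strings.
    # Score each word by how much more frequent it is inside a cluster
    # than in the whole corpus, and keep the top-scoring words.
    overall = Counter(w for prompts in clusters.values()
                        for p in prompts for w in p.lower().split())
    total = sum(overall.values())
    out = {}
    for cid, prompts in clusters.items():
        local = Counter(w for p in prompts for w in p.lower().split())
        n = sum(local.values())
        scored = sorted(local,
                        key=lambda w: (local[w] / n) / (overall[w] / total),
                        reverse=True)
        out[cid] = scored[:top_n]
    return out
```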
There’s some past everywhere here. In the style, obviously, but also in the content.
And most references to the past, if they are not arty, are to wars (cluster 17) or to propaganda wars (cluster 8). And most of them are somehow linked to present-day news (trump, propaganda, soviet, macron in the same cluster, for instance).
What you see here is interesting, because it links the European Union and Europe themes to all sorts of empire themes and to some sort of medievalism. We also find the ‘propaganda’ cluster again – probably because those prompts were produced in the wake of the aggression against Ukraine.
Those distant reading analyses are interesting, but they confirm more than they discover: in a way, we expect Europe to be linked to the concept of empire, and we expect many references to the past to be linked to the ‘style’ part of a prompt.
But those distant readings also allow us to go back to specific prompts. By looking at prompts that are very similar, we can trace the evolution of a prompt written and rewritten several times by a user. And it is here that we can see that gen AI platforms, seen as frameworks, encourage users to negotiate with the machine the past they want to see or read.
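Grouping “very similar” prompts into rewrite chains could be approximated with plain string similarity; this is a hypothetical sketch, and the threshold value is an assumption:

```python
from difflib import SequenceMatcher

def trace_rewrites(prompts, threshold=0.6):
    # Greedily chain prompts whose text similarity to the end of an existing
    # chain exceeds the threshold, approximating "the same prompt rewritten
    # several times by a user". prompts is assumed to be in submission order.
    chains = []
    for p in prompts:
        for chain in chains:
            if SequenceMatcher(None, chain[-1], p).ratio() >= threshold:
                chain.append(p)
                break
        else:
            chains.append([p])
    return chains
```

Reading each resulting chain in order is what lets us watch the user’s successive negotiations with the machine.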
This negotiation can be seen as a confrontation between several kinds of collective memory: those embedded in the genAI platform, which come from the way the LLM or diffusion system was trained and from the corpus it was trained on; the collective memory of the group the individual belongs to; and the individual’s own vision of the past.
I’ll give two examples.
Ursula von Der Leyne [sic] and Emmanuel Macron, Peter [sic] Pavel in the image of knights of the round table
I showed earlier some prompts that relate to the Hungarian revolution – unfortunately I do not have the images.
Here is another example. Several prompts of this kind were written, with some differences, but results that are very similar.
As you can see, we can consider there are several references to the past here.
The user never managed to get Petr Pavel into their images.
joe biden doing a nazi salute, in front of brandenburger tor. huge nazi crowd in front of him. face of joe biden is clearly visible. canon eos r 3, f / 1. 4, iso 1 6 0 0, 1 / 8 0 s, 8 k, raw, grainy
This is a striking example of negotiation with a machine to get something set in the present with references to the past that serve as an ideological reading of the present.
The user never got what they wanted. Biden doing a Nazi salute in front of Nazis – that just does not exist. Nevertheless, the prompt activated patterns from the Second World War and maybe from the Cold War.
I may be overinterpreting, but I can see influence, of course, from Nazi footage, but also from De Gaulle on the Champs-Élysées on 26 August 1944, and from Kennedy in front of the Brandenburg Gate.
The political goal of the prompt writer partly fails here: it is confronted with the collective memory of the Second World War, which prevails over their views.
Chatbots function as media of memory that store, circulate, and trigger collective memories. They are media of memory because AI models encode historical perspectives. Metaphorically, we could call this the collective memory latent space, which reflects collective memory patterns from the training corpora. On top of the training process, alignment and fine-tuning embed specific views of the past (e.g., DeepSeek/Tiananmen, minority representations). In this sense, we can also consider chatbots as frameworks.
When we look at users’ prompts that contain references to the past, we see how users write those references, but also how they negotiate with gen AI systems to obtain their desired vision of the past, creating a confrontation between different collective memories, but also, often, with their vision of the present.
This negotiation reveals tensions between user expectations and historical representations (frameworks) embedded in AI.
Beyond this, I think we should recall something quite important about LLMs and diffusion systems: they are products of artefacts from the past (the training dataset), they produce primary sources (prompts, images, texts), and they are triggers of collective memory.
In this sense, genAI systems are fundamentally products of history and memory.