
AI’s short-term memory problems

Over the past couple of months I have done a lot of work on Artificial Intelligence (AI). Not only studying it (I had been doing that for years, but the field keeps on changing extremely fast) but also looking at investments in the space. However, the most important thing has been to actively use it to get new business ideas.

When using the latest Large Language Models (LLMs) you quickly learn their limitations. For example, when using Midjourney, a text-to-image model, to come up with colourful sailing yachts, you realize that the model doesn’t understand the physics of sailing (yet). Many of the pictures of sailboats I try to generate are simply incorrect, for example with the mainsail on the bow and the jib at the stern of the boat, or even with boats sailing backwards. Of course there are ways to fix that with better prompts, but I was surprised that it could not automatically come up with correct pictures, even though it must have been trained on the millions of sailboat pictures that exist online.

Worse than that, however, is the short-term memory of AI. This is something that’s quite scary, because it’s not immediately noticeable when you start using AI. My assumption was that if you give an LLM general instructions, it will always keep those in mind. If you tell it to remember 6 words and ask it for those 6 words again the next day, it should remember them, right? Well, that’s not always the case. And that is a big problem that you don’t read much about yet. (Note: I am aware that you can now give custom instructions that should stay in ChatGPT’s memory ‘forever’. However, these instructions are still very limited.)

This summer I started using ChatGPT and Claude to write novels. I started off with translating novels from English to (mostly) Dutch with ChatGPT-4 and was quite impressed with the results, which were much better than Google Translate, for example. Then I moved on to writing novels from scratch. Both ChatGPT and Claude are perfect for brainstorming ideas, coming up with unique plot twists, or simply helping you write the next paragraph when you get stuck. But don’t try to use them to write a complete novel, because the AI will go completely off the rails.

I am currently writing a psychological thriller with these two AI models. The storyline is about three couples who go on a sailing trip through Indonesia on a large sailing yacht. All the couples have issues that become clearer during the trip. At a certain point a huge storm hits, and when it is over the husband of one of the couples has disappeared. The models helped me come up with a good pitch, they wrote a synopsis for me, and together we came up with a chapter outline and beats for each chapter. So far so good.

But then we started writing the prose together. At first I thought I would let the AI write 90% of it and I would do the remaining 10%. I think that will eventually work (maybe as soon as next year), but it’s too early for LLMs to do that right now. What happened, for example, is that I would have a cliffhanger at the end of a chapter where the couples are having breakfast. Then the next chapter starts and all of a sudden they all go to bed after the meal is finished, meaning the AI seems to have forgotten that the meal was breakfast and not dinner!

Worse, one chapter describes the storm and how it seems to partially destroy the yacht. Then in the next scene everyone is sitting on deck looking at the dark clouds that foreshadow the arrival of the storm. This makes no sense at all, of course.

Smaller things were incorrect as well. For example, Claude described the approaching storm as a hurricane, but there are no hurricanes in Indonesia. It seems to have forgotten where the story takes place, which is quite crucial when you write a novel. Another example: during the storm the water was described as frigid, which is rather unusual in the tropics. This should not happen with a model trained on real-life data, so again it means it forgot the novel takes place in the tropics.

To me it’s strange that AI can make so many big mistakes, and it makes me wonder how good the current LLMs really are. I know many models have a relatively limited memory: the base context window for GPT-4 is 8,192 tokens, with 4 tokens corresponding to roughly 3 words (so about 6,000 words). That means that beyond that limit ChatGPT may start to forget what you told it.
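
If you want a rough sense of how close your manuscript is to that limit, one option is to count tokens locally with the tiktoken library before pasting text into the chat. A minimal sketch, assuming tiktoken is installed and using GPT-4’s 8,192-token base window; the file name is just an illustration:

```python
# Rough sketch: count tokens to see whether a manuscript still fits in the
# model's context window. Assumes tiktoken is installed (pip install tiktoken);
# the 8,192-token limit is GPT-4's base window, and the file name is hypothetical.
import tiktoken

CONTEXT_WINDOW = 8192  # tokens for the base GPT-4 model


def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Return the number of tokens `text` uses for the given model's encoding."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))


manuscript = open("chapters_so_far.txt", encoding="utf-8").read()
used = count_tokens(manuscript)
print(f"{used} tokens used, {CONTEXT_WINDOW - used} left before the model starts forgetting")
```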

My naive assumption was that you should be able to solve this while writing a novel, if you go from your story outline to chapter beats that contain the right information (the beats are used to write the detailed prose). In practice I will likely have to manually change the beats for every chapter to remind the AI that this is a novel that takes place on a sailboat in tropical waters, with 6 passengers and only 2 crew (at one point there were suddenly 3 crew on deck!).
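
One way to automate that reminder is to prepend a fixed “story bible” of facts to every chapter prompt, so the model is re-told the setting each time. This is a minimal sketch, not my actual workflow, assuming the official OpenAI Python client and using facts from the story above as an example:

```python
# Minimal sketch: prepend a fixed "story bible" to every chapter prompt so the
# model is reminded of the setting each time. Assumes the official OpenAI Python
# client (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

STORY_BIBLE = """Facts that must stay consistent in every chapter:
- The story takes place on a large sailing yacht in tropical Indonesian waters.
- There are 6 passengers (3 couples) and exactly 2 crew members on board.
- Tropical storms threaten the boat, not hurricanes; the sea is warm, not frigid.
"""


def write_chapter(beats: str) -> str:
    """Draft a chapter from its beats, with the story bible prepended as context."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": STORY_BIBLE},
            {"role": "user", "content": f"Write the prose for this chapter:\n{beats}"},
        ],
    )
    return response.choices[0].message.content
```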

Generally I am very impressed with LLMs, but I didn’t realize how dangerous it can be to rely on them, even for writing simple fiction. The good thing is that when you write a novel you can easily spot the problems, but if you are using ChatGPT for business or scientific purposes, the content may be much harder to understand and it may be almost impossible to catch the mistakes that the LLM is making.

If you write short stories or just ask simple questions, these models are great, but for anything longer than a couple of pages you may want to either write it yourself or triple-check both your prompts and the output. However, you can always ask the LLM to proofread the novel for you, and then it should (hopefully) find most of the mistakes immediately.
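
Because of the context window limit, that proofreading pass itself has to happen in pieces rather than on the whole novel at once. A rough sketch of chapter-by-chapter consistency checking, again assuming the OpenAI Python client; the checklist and folder layout are illustrative:

```python
# Rough sketch: proofread a long manuscript chapter by chapter, since the whole
# novel will not fit in the context window at once. Assumes the official OpenAI
# Python client; the checklist wording and "chapters" folder are hypothetical.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

CHECKLIST = (
    "Proofread this chapter for consistency errors. The setting is a sailing "
    "yacht in tropical Indonesia with 6 passengers and 2 crew. List anything "
    "that contradicts these facts, or reply 'No issues found'."
)

for chapter_file in sorted(Path("chapters").glob("*.txt")):
    chapter = chapter_file.read_text(encoding="utf-8")
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": CHECKLIST},
            {"role": "user", "content": chapter},
        ],
    )
    print(f"--- {chapter_file.name} ---")
    print(response.choices[0].message.content)
```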


  1. Great article and to the point. True, Marc, it is not perfect for writing longer and more complex text. If we set the assignment correctly it tends to do better, or it tells you it can’t.
    It is still the beginning: in 5 years, AI in design, engineering and medicine (MRI & CBCT scan results) will be a standard tool to support knowledge. Law is holding back several AI integrations.
    And here we need to push, lobby further and outline these integrations for changes in the law.

    Greetings from Strasbourg, France.
    Alexander