Related ToolsClaudeChatgptClaude CodeClaude For Desktop

Your AI Forgets Your Rules in Long Sessions. Here's Why.

AI news: Your AI Forgets Your Rules in Long Sessions. Here's Why.

When your AI assistant has been working with you for hours, reading through hundreds of pages of documents, it might start ignoring rules you set at the very beginning of the session.

This phenomenon, dubbed "the 200k ghost" by a researcher documenting the problem on GitHub, describes what happens to LLM instruction-following as context windows fill up. The 200k refers to 200,000 tokens, roughly 150,000 words, or about twice the length of a standard novel. Most frontier models now offer context windows at or above this size. The problem is that bigger windows do not mean equally reliable attention across all that text.

How Models Lose Track of Early Instructions

Modern LLMs use a mechanism called "attention" to decide which parts of their input to prioritize when generating a response. The math works out so that tokens (the small chunks of text that models process, typically 3-4 characters each) near the current generation point get more weight than tokens far away. Your system prompt, the instructions you set at the start of a session telling the model how to behave, sits at position zero. By the time you are 180,000 tokens into a conversation, those early instructions are as far from the model's focus as they can be.

The documented failure modes are specific: models contradict formatting rules from the system prompt, drop persona instructions set at session start, and ignore constraints like "never suggest competitors" or "always respond in bullet points." The model is not refusing. It simply gives those early tokens so little weight that the instructions effectively vanish from its decision-making.

Who Actually Hits This Problem

Short conversations are unaffected. But several real use cases push into dangerous territory:

  • Customer support bots that preserve full conversation history can accumulate thousands of turns in a long session
  • Document analysis tools loading entire research papers, legal contracts, or code repositories before taking questions
  • Coding assistants working through large codebases, where behavior rules drift as files accumulate in context
  • AI agents running multi-step tasks, where early task instructions compete with tool outputs and intermediate results

For anyone building with AI APIs (the programming interfaces that let developers plug AI models into their own software), this is a practical architecture problem.

Workarounds That Actually Help

There is no clean fix yet. Three approaches reduce, but do not eliminate, the problem:

Periodic re-injection: Repeat key instructions at regular intervals throughout the conversation. This costs extra tokens and money, but it works. Re-stating critical rules every 20,000 tokens keeps them within the model's effective attention range.

End-of-context placement: Add critical instructions just before the user's current message, not only at the start of the system prompt. Models show consistently stronger compliance with instructions placed close to the current generation point.

Chunked processing: Break large documents into smaller sessions instead of loading everything at once. You lose cross-document reference ability, but instruction-following becomes reliable again.

The underlying issue is architectural. Current transformer models, the technology behind GPT-4, Claude, Gemini, and virtually every other major AI, have fundamental limits on how evenly they distribute attention across very long inputs. Until that changes at the research level, long-context sessions carry this reliability risk. If you are building anything serious on top of a large context window, test your instruction-following behavior at high token counts before shipping.