How AI (LLM) Context Window Size Impacts What Data AI 'Remembers' In Software Development Workflows

========================================

This document was created with help of Gemini.

Date created: 25 March 2026, last updated on 26 March 2026.

When using AI coding assistants (like Gemini on web, Gemini Code Assist extension in VS Code, Gemini in Google Colab or even Gemini in AntiGravity (AI agent)) for software development, understanding how the AI remembers the work done in the sessions is important. Further, the massive difference between a free-tier context window and a pro-tier context window fundamentally changes how you can use these tools to manage a codebase.

Recently when I was using Gemini Code Assist in VS Code, I found that it had 'forgotten' the work done in the previous day - I had closed and reopened the project in VS Code between these work sessions. As I am on Google AI Pro plan, I had thought it would have remembered the previous day's work session. When I pointed out to Gemini that we had done some work in the previous day, it said, "You are absolutely right, my apologies for the mix-up! I completely lost track of our progress from the previous session."

Update: After this note was prepared, I came to know that Gemini Code Assist extension in VS Code opens a new chat window by default but retains few past chats which can be made the current chat. If one does that then the context of the earlier sessions in the now-current chat is automatically sent to Gemini backend and so it will 'remember' the earlier sessions. However, a lot of the contents of this note is still relevant and so I am publishing it as a blog post.

I should also mention that Gemini Code Assist VS Code extension chats, as of now, cannot be renamed unlike Gemini web chat. It says that it stores the past 20 chats but I see that it frequently saves copies of the current chat with names like 'Copy of "Copy of "Copy of "Copy of "XYZ...' where XYZ is a name it creates automatically based on the initial prompt(s) of the chat. Each of these copies takes up one of the 20 slots. I have not checked whether these copies truncate the chat session snipping out older content. In conclusion, my view is that Gemini Code Assist VS Code extension chat history is quite a primitive mechanism as compared to the sophisticated chat history that Gemini chat on web has.

That triggered me into a chat with Gemini on this topic and this document has been prepared by Gemini based on that chat, after which I have reviewed and edited it. Based on my understanding after the chat, I see that Gemini's 'apology' was unnecessary - it was just being polite. In the past, especially when I was on free tier and I faced such issues in the current session itself, Gemini would typically apologize when I would point out such things and that led me to think that these are AI tool errors. I now understand that in many such cases, Gemini would simply not have had the old data, especially in free tier case where the 'sliding window' (explained below) would have cut out the old data from the request sent to Gemini AI backend. In other words, what I thought was AI error may have simply been an issue of full data not sent to AI backend due to small context window size.

The Illusion of AI Memory: The "Stateless" Architecture

Fundamentally, Large Language Models (LLMs) are stateless. They do not possess a continuous, long-term memory bank or a hard drive where they store memories of your past interactions.

When you use an AI chat, the backend does not remember your previous messages. Instead, the responsibility of "memory" is shifted to the Client Application (e.g., your browser tab or your VS Code extension). Every time you send a new message, the client automatically packages up your new message along with the recent history of your current conversation and sends it all to the AI at once.

Because the AI can read the history in that package, it appears as though it "remembers" what you just did. This stateless architecture is the industry standard because it allows tech companies to scale horizontally—routing your prompt to any available server without needing to reserve massive, dedicated memory banks for every single active session.

However, when there is a break—such as closing the application window or starting a new chat thread—the default behavior is often to start with a fresh, blank slate because the historical package of the previous session is no longer active. But this doesn't mean the context is lost forever. Modern client applications like the Gemini Code Assist extension in VS Code maintain a chat history. By manually selecting a previous chat session from this history, you are instructing the client to reload that entire conversation. When you send your next message, the client bundles it with the full history of that restored session, effectively allowing the AI to "remember" the context from a previous work session.

What is a Context Window?

The amount of history the client can send to the AI in that package is governed by the Context Window, which is measured in Tokens.

What is a Token? A token is a small chunk of text. In English, a token is roughly 4 characters or about 3/4 of a word. When dealing with code, special characters, indentations, and syntax also consume tokens.
The "Sliding Window" Effect: If a conversation goes on for so long that the transcript exceeds the AI's token limit, the system uses a "sliding window." It silently drops the oldest messages from the very top of the transcript to make room for your newest messages. This is why an AI might suddenly "forget" a rule you established an hour ago.

Free Tier vs. Pro Plan: A Paradigm Shift

The size of the context window dictates whether you are feeding an AI piecemeal, or allowing it to ingest your entire project architecture. (For official context window limits, refer to Google's Gemini limits support page).

1. The Standard/Free Tier (e.g., 32,000 Tokens)

Capacity: Roughly 24,000 words, or about 50 pages of text.
The Workflow: At ~32k tokens, you can only feed the AI one or two moderately sized scripts at a time. If you try to paste an entire folder of code, the system will either reject it or immediately trigger the sliding window, forgetting the beginning of your conversation to make room for the new text. You are forced to work file-by-file.

2. The Pro/Advanced Tier (e.g., 1,000,000 Tokens)

Capacity: Roughly 750,000 words. This is equivalent to about 1,500 pages of text, or 30,000 to 50,000 lines of code.
The Workflow: This is a massive leap. With 1 million tokens, you do not have to pick and choose what to show the AI. You can load dozens of scripts, entire folders, and documentation into the chat at the exact same time.

3. Free Tier Google AI Studio or Gemini API may offer higher Context Window

As per Gemini, Gemini API or Google AI Studio offers the 1-million token context window for free, but it restricts you with very tight rate limits, like only a few requests per minute. However I was not able to verify this from the Google reference pages for such info. like https://ai.google.dev/gemini-api/docs/pricing.

The Practical Impacts of a Massive Context Window

Upgrading to a massive context window transforms a complex software engineering task in three major ways:

1. Defeating "Sliding Window" Amnesia

With a small window, spending an hour discussing one folder means the AI completely forgets the architectural decisions made for the previous folder. With a massive window, the sliding window effect is pushed back exponentially. You can have a deep, hours-long session, and the AI will still flawlessly remember rules established at the very beginning of the chat.

2. Whole-Project Ingestion vs. Piecemeal Feeding

You no longer have to feed the AI code line-by-line or function-by-function. You can theoretically highlight your entire project directory and drop it into the prompt at once. The AI can hold the entire architecture of your project in its "short-term memory" simultaneously.

3. Cross-File Reasoning (Holistic Vision)

This is the biggest advantage for software development. With a small window, if the AI suggests a change to Script-A, it has no idea if that change breaks Script-B because Script-B didn't fit in the context.

With a large window, the AI has holistic vision. It can trace a variable from the moment a user types it into a prompt in a top-level script, all the way down through a batch manager, and into a core recursion engine stored in a completely different folder.

4. AntiGravity AI Agent Injection of Relevant Past Memories

AI "Agents" like Google's AntiGravity may wrap the stateless LLM in complex software connected to a Vector Database, allowing the agent to dynamically search and inject relevant past memories into the current context window when the context window has massive size.

The One Catch: "Needle in a Haystack"

The only downside to having a massive context window is that you are giving the AI a lot of information to read through for every single response. While modern models are getting much better at finding the "needle in the haystack," it becomes even more important to be highly explicit in your prompts. When the AI has 50 files in its context, you must clearly tell it exactly which file or logic block to focus on for your current request.

Technical Notes

The following points are excerpted from my chat with Gemini on this topic.

Pros and Cons of Stateless AI Backend

Massive Backend Scalability (The REST Philosophy): By keeping the LLM model itself completely stateless, tech companies can route your prompt to any available GPU/server in their massive data centers. If the model had to remember your session, your requests would have to be pinned to a specific server, which makes load balancing incredibly difficult. Statelessness means the infrastructure can scale horizontally with ease.
Where the "Memory" Actually Lives: Since the AI backend doesn't remember anything, the responsibility of memory is shifted to the Client Application. When you use Gemini Web, your browser tab is storing the chat history. When you use Gemini Code Assist, the VS Code extension is storing it. When you hit "Send," the client application acts as the manager, bundles up your local history, and fires it off to the AI.
How Agents (like AntiGravity) Differ: You brought up AI Agents, which introduces a fun twist. The core LLM inside an agent is still stateless. However, agents are wrapped in complex software that gives them a "Memory Layer" (often using Vector Databases). So, if an agent takes a break and comes back hours later, its software will first search its database for relevant past memories, bundle those specific memories into a new prompt, and send that to the stateless LLM. It simulates long-term memory by intelligently dynamically assembling the Context Window!
Trade-offs: The trade-off of higher network traffic and token processing costs is heavily outweighed by the sheer scalability and reliability this architecture provides.

Search This Blog

Ravi S. Iyer: Software Development