Last updated on 14 Feb. 2026

Note:

In this exploration, I have only viewed videos, interacted with AI chatbots and read articles to arrive at my conclusions.
I have NOT tried out Ollama, Claude Code and local open source coding models for Next.js and React Native app dev. Surely trying them out myself will be a more reliable assessment but such trials have challenges of investing lot of time for the trials and organizing required hardware resources even if I use online resources like Google Colab. The videos, AI chatbot interactions and articles themselves have convinced me that my conclusions are on the right track.

Summary

My Views (Conclusions)

Ollama + Claude code + qwen3-coder-30b even with local machine with required power like the PC shown in video 1 below (60+ GB System (CPU) RAM, 32 GB VRAM, NVIDIA GeForce RTX 5090):

does not seem to be a reliable agentic coding system for medium size Next.js or React Native apps
is much less in capability as compared to paid cloud Claude agentic coding systems which offer powerful models with strong coding capability like Opus 4.5 (or 4.6 released in Feb. 2026) and Sonnet 4.5

Model quality seems to be very important for medium size Next.js or React Native app development needs. Open source models don't seem to be good enough yet.
Claude code and qwen3 class of open source models may be good for smaller level coding tasks like small plain HTML/JS/CSS app.
Claude code and an open source model are shown to be doing a good job for document query type of task (non coding task) in video 1 below. This may be a good setup for local document query AI app/system.
Claude code and local open source model may be worthwhile to try out for validation and correction of documents. This could include the open source Bhagavad Gita in JSON data mentioned in my previous posts.
Learning agentic coding without paying for paid models, can be done using Claude code + Ollama + open source models.

However, even for this purpose, GPU with significant memory (VRAM) may be preferrable as almost all videos I have seen about Ollama presume GPU availability, including Google Colab online setup.

CPU-only setups may be possible but could be impractically slow for agentic workflows.

==================================================================

Details

The section Impractical for me: Using Claude code for free with Ollama and free model in my recent blog post clearly shows ChatGPT's (CG) view about my PC specs of Intel Core i3-12100 processor, GIGABYTE H610M K motherboard, 16 GB (as 2 x 8 GB) Crucial DDR4 RAM, 240 GB SSD, being too low for local agentic coding LLM needs for Nextjs and React Native app dev.

Some relevant CG summarization type statements from that section:

This combination (my PC specs mentioned above) is perfectly fine for development, but not well-suited for local LLM hosting beyond very small models.
To comfortably run a large coding-focused LLM such as qwen3-coder locally using Ollama, a fairly capable system is required. Practically speaking, this means a modern multi-core CPU (at least mid-range i5 / Ryzen 5 class or better), 32 GB of RAM for smooth operation, and ample fast storage—preferably an SSD with at least 100–150 GB of free space to accommodate models, caches, and updates. While GPU acceleration is not strictly mandatory, a discrete GPU with sufficient VRAM can significantly improve responsiveness. Without such resources, running large local models tends to be slow, memory-constrained, or impractical for sustained development use.

A little later, on same day of 10 Feb 2026 IFIRC, I saw another related video put up just a day earlier: Stop Paying Anthropic $200/month for Claude Code (Do This Instead) ~8 mins.

In the video, the author used gpt-oss:20b model of size 13 GB and got Claude Code to build a Hello world Next.js app. It generated the very simple app using the older pages router instead of app router but she had not specified which router to use. The very simple app ran as expected.

That essentially triggered me to check in more detail what exactly is involved in a local LLM setup for Claude code and a relatively smaller size (13 GB as against 19/20 GB) coding model that could create a Next.js app than simply relying on ChatGPT's responses mentioned earlier.

The comment I put up on the video:

Thank you so much for catering to free-tier users (like me) who are fascinated by media reports about Claude Code but hit the bottleneck of Claude Code using Claude Code models being paid only. My blog post: Quick look at Claude Code-generated Bhagavad Gita React Native and Expo app (no coding by author), https://raviswdev.blogspot.com/2026/02/quick-look-at-claude-code-generated.html covers my recent 'look' at Claude Code (and Cowork) related videos in this context.

I did see one Ollama related video about using Claude code with open source models but was not sure whether given my low PC specs of i3 12th gen and 16 GB RAM with integrated graphics, I should invest time to explore it.

Your video has been much clearer about the issues and has convinced me that if not now then in the near future, I should invest time to explore using Ollama and a well appreciated open source coding model locally on my PC with Claude Code. Even if my PC is too slow for these needs, I have the option of future PC upgrade if the exploration works out and I feel the PC upgrade is justified.

The big advantages are: Free usage, open source model and associated community support, and data being private. Of course, these models will not match paid models but these are typical free-tier limitations which one has to accept.

Thanks again for catering to free-tier users like me.

--- end YT comment ---

I shared the above with a friend and asked him about any Ollama setup inputs he had. He provided a lot of information about what he had tried in the past with Ollama and local LLM though not as a coding agent. He also was interested in possibility of exploring using Ollama, Claude code and a coding LLM for a local LLM agentic coding setup.

I discussed possibility of local LLM for agentic software development assistant given my low power PC and also free tier GitHub runners and Google Colab with ChatGPT (CG).

Key points are that as per CG, GitHub runner and Google Colab may not work out for agentic software development assistant needs.

I am forming the impression that for software development of medium size Next.js web apps or medium size React Native and Expo mobile (and web possibly) apps (which is my main interest currently), paid Claude Code models (or, I guess, paid Codex models or paid Gemini CLI models) are necessary. I don't think that open source coding models may be good enough for these needs.

I was very impressed by a video (shared in an earlier post) on a Math (K-12 school level) React Native and Expo mobile app generated by Claude Code where the author clearly shows the prompts used and demonstrates the functionality of generated code. While I don't know how exactly the tenminute Gita React Native and Expo mobile iOS app (also shared in that post) was created, I will go by the author's claim that Claude Code generated all of it.

In both these cases, they would have been using the Max20x plan costing $200 (plus tax, it seems) per month (Gita app was on $200 per month Max plan). That would have given them *high quality* reasoning capability which resulted in such apps getting done easily. The Math app did not seem to have any bugs at all. Just instruct the functionality clearly, wait for app to be done/updated, and then run the app! This is quite different from the bugs/not-working code I get quite often from free tier ChatGPT and Google Search AI or Gemini AI.

I now think that the video I shared above "Stop Paying Anthropic ..." of Claude Code being run with Ollama and gpt-oss:20b open source model, was too simplistic. She asked for a Hello world Nextjs app to be done which is very, very basic. That got done and worked as expected but if functionality like gita app or Math app would have been asked, perhaps gpt-oss:20b coding model may have struggled to deliver.

------------------

I found two videos forwarded by my friend to be very useful in the context of this post.

1) AI Coding Without Rate Limits Is Finally Here (Local Claude Code) ~11 mins, 30 Oct. 2025

I could not see any text info. about his PC config nor does he refer to it in the video.

So we need to catch it from the task manager display in the video.

https://youtu.be/nYDUdnMVDdU?t=61 shows key info. of:

Memory 61.6 GB (This must be System RAM which seems to be informally referred in some articles/posts as CPU RAM)
GPU 0 AMD Radeon (This may not be playing a key role for activity shown in the video).
GPU 1 NVIDIA GeForce RTX 5090: Task Manager shows ~62.3 GB total GPU memory, comprised of:

~31.5 GB Dedicated GPU Memory (seems to be VRAM). NVIDIA GeForce RTX 5090 has 32 GB of GDDR7 memory

~30.8 GB Shared GPU memory. The shared memory seems to be additional system RAM the OS can allocate, not additional physical VRAM.

My Views

These two videos validate my key point that Ollama + Claude code + qwen3-coder-30b even with local machine with required power like the PC shown in video 1 (60+ GB System (CPU) RAM, 32 GB VRAM, NVIDIA GeForce RTX 5090):

does not seem to be a reliable agentic coding system for medium size Next.js or React Native apps
are much less in capability as compared to paid cloud Claude agentic coding systems which offer powerful models with strong coding capability like Opus 4.5 (or 4.6 released in Feb. 2026) and Sonnet 4.5

Model quality seems to be very important for medium size Next.js or React Native app development needs. Open source models don't seem to be good enough yet.
Claude code and qwen3 class of open source models may be good for smaller level coding tasks like small plain HTML/JS/CSS app.
Claude code and an open source model are shown to be doing a good job for document query type of task (non coding task) in the first video above. This may be a good setup for local document query AI app/system.
Claude code and local open source model may be worthwhile to try out for validation and correction of documents. This could include the open source Bhagavad Gita in JSON data mentioned in my previous posts.
Learning agentic coding without paying for paid models, can be done using Claude code + Ollama + open source models.

However, even for this purpose, GPU with significant memory (VRAM) may be preferrable as almost all videos I have seen about Ollama presume GPU availability, including Google Colab online setup.

CPU-only setups may be possible but could be impractically slow for agentic workflows.

================================================================

AI Chatbot Interactions Details

Given below are extracts from the ChatGPT chat titled, Agentic Mode Explanation.

Me: I have used your wonderful services in chat mode. What is agentic mode?

CG: Short answer: yes — meaningful agentic capabilities are generally tied to paid tiers or API usage.

Let me clarify carefully, because there are layers here.

1️⃣ Chat Interface vs Agentic Capabilities

On a Free plan, you typically get:

Standard chat interaction
Limited tool usage (if enabled)
No persistent autonomous task execution
No long-running background agents

Agentic systems require:

Tool execution rights (file system, code execution, browsing)
Higher compute budgets (multi-step reasoning loops)
Longer context windows
Persistent task state

Those cost infrastructure resources — so they are usually part of:

Paid ChatGPT tiers (Plus / Pro / Team / Enterprise)
API usage (pay-per-token billing)

2️⃣ Two Different “Agentic” Paths

A) Agentic inside ChatGPT app

Higher-tier plans may provide:

Advanced tool usage
File uploads + analysis
Web browsing
Code execution
More autonomy

But still bounded within session limits.

B) True Agentic Systems via API

This is where it becomes more serious:

You build:

A planning loop
Tool registry
Execution controller
Memory persistence
Error recovery

You pay via API usage.
Cost depends on:

Model used
Tokens consumed
Tool invocations
Runtime

This is how people build:

DevOps automation bots
Research agents
AI code maintainers
Business workflow automators

3️⃣ Why Free Plans Rarely Offer Full Agentic Mode

Agentic systems:

Use more tokens (multi-step reasoning)
Run iterative loops
Call tools repeatedly
Maintain state
Consume more compute per task

That directly increases cost per user.

4️⃣ Practical View for You

Given your current technical interests (React, GitHub Pages, Netlify, DNS troubleshooting):

You do not necessarily need full agentic mode.

Your workflow benefits more from:

Deep reasoning
Precise debugging
Controlled step-by-step interaction

In fact, for infrastructure work, excessive autonomy can introduce unwanted changes.

5️⃣ Strategic Advice

If someday you want:

Automated repo maintenance
Continuous refactoring
Deployment monitoring agent
Blog infrastructure manager

Then an API-based agentic setup might justify the cost.

But for disciplined engineering workflows like yours, high-quality chat + manual control is often safer and more predictable.

If you would like, I can outline:

Approximate cost structure of running an API-based agentic developer assistant
Whether it would be economically sensible for an individual developer
Or how to simulate “agentic behavior” manually using structured prompting

Tell me which direction you want to go.

Me: my PC specs (low): i3 12th gen and 16 GB RAM with integrated graphics

...

Further, would GitHub actions with its runners including Mac runners in free tier quota be of some use here?

CG: Good — now we can evaluate this realistically instead of theoretically.

You are asking the right question:
“Given limited hardware, what is actually feasible?”

Let’s analyze each environment separately.

1️⃣ Your PC: i3 12th Gen, 16 GB RAM, Integrated Graphics

What this means technically

No discrete GPU → CPU-only inference
16 GB RAM → practical ceiling ≈ 7B–14B quantized models
i3 → limited parallelism → low tokens/sec

What you can run comfortably

With Ollama:

7B model (Q4 or Q5 quantization) → workable
13B Q4 → borderline but possible
14B Q4 → slow but usable for experimentation

Expect:

3–8 tokens/sec
Planning loops will feel slow
Long reasoning chains may be painful

What you cannot realistically run

32B+
70B
Anything requiring > 20 GB RAM

Practical conclusion

You can build a proof-of-concept local coding agent.

It will:

Modify small files
Add functions
Refactor small components

It will struggle with:

Large React codebases
Multi-file reasoning
Debugging complex build failures

But for learning purposes — absolutely viable.

...

3️⃣ Now the Interesting Question:

Can GitHub Actions Free Runners Help?

Short answer:
Not really for persistent agentic experimentation.

Here’s why.

GitHub Actions Free Tier Constraints

Limited minutes per month
No persistent storage between runs
No GPU
CPU-only runners
Ephemeral environment (destroyed after job ends)

That means:

You cannot:

Run a persistent agent loop
Keep a model server alive
Maintain memory across sessions

You could:

Trigger short inference tasks
Run batch experiments
Benchmark prompts

But not interactive development agents.

4️⃣ ---SNIPPED---

In theory:

Install Ollama inside runner
Download small model
Run inference
Output results

In practice:

Cold start time is large
Model download time eats runtime quota
Runner memory is limited
Not suitable for interactive iteration

It becomes inefficient and clumsy.

5️⃣ Much More Practical Alternatives

If your goal is:

A) Local experimentation (low cost)

Use:

Ollama
7B–14B coder model
Simple Python agent loop

B) Slightly more power without subscription

Consider:

Occasional API usage (controlled cost)
Very low budget (₹ few hundred/month)
Only invoke API for heavy reasoning

Hybrid strategy often wins.

6️⃣ What I Would Recommend For You

Given your engineering mindset:

Build a minimal local agent
Test on small repos
Measure:
- Tokens/sec
- Iteration stability
- Failure patterns
Compare with a few API calls to a strong model
Draw conclusions

You will gain:

Deep intuition about model capability
Understanding of scaling limits
Appreciation of reasoning depth

7️⃣ The Hard Truth

Agentic architecture is cheap.

High-quality reasoning is expensive.

The cost is not in the loop.
It is in the intelligence per token.

8️⃣ Final Assessment For Your Setup

Your PC:
✔ Good for learning
✔ Good for small coding tasks
✘ Not good for serious refactors

...

GitHub Actions:
✘ Not useful for this purpose

...

Me: One suggestion was to use Google Colab (which seems to have free tier). Your view please.

CG: Good question. Google Colab can be useful — but only under certain expectations. Let’s evaluate it properly in the context of running an agentic coding system.

1️⃣ What Colab Free Tier Actually Gives You

Typically (this fluctuates):

Ephemeral VM
~12–15 GB RAM
Occasional access to a modest GPU (T4 / P100 class, not guaranteed)
Session timeout (often 2–12 hours, sometimes shorter)
Idle disconnects
No persistent background processes

Important: Access to GPU in free tier is inconsistent and throttled.

2️⃣ Can You Run Ollama / Local LLMs There?

Yes — but not in the typical “install once and reuse forever” sense.

Each session:

You install dependencies
Download model weights
Start the model server
Run your agent loop
Lose everything when session ends

So it’s suitable for experiments, not long-running agents.

3️⃣ Performance Compared to Your PC

If GPU is available:

A T4 can run 7B–14B much faster than your i3.
You may get 15–40 tokens/sec depending on model.

If no GPU:

CPU performance may be similar or slightly better than your PC.
Still limited.

So Colab can be significantly better — but only when GPU is allocated.

4️⃣ Suitability for Agentic Work

Let’s evaluate agent requirements:

Requirement	Colab Free
Persistent state	❌ No
Long-running loop	❌ Risky
Background daemon	❌ Not stable
GPU sometimes	✔ Yes (unreliable)
Good for batch experiments	✔ Yes
Good for interactive dev agent	⚠ Limited

So:

Colab is good for:

Testing models
Benchmarking coding ability
Running controlled experiments
Evaluating reasoning depth

Colab is poor for:

Daily development assistant
Persistent coding agent
Continuous integration style agent

5️⃣ Realistic Workflow With Colab

You could:

Upload a small repo
Run agent for 20–40 minutes
Observe behavior
Download modified files
Repeat later

This is actually reasonable for research-style experimentation.

6️⃣ Practical Limitation Most People Underestimate

Model download time.

Large models (7B+) are multiple GB.

Every new session:

Download cost in time
Sometimes rate-limited

That eats into runtime quota.

7️⃣ Strategic View For You

Given your goal (learning + experimentation with friend):

Colab Free is:

✔ Better than your i3 for short experiments
✔ Good for comparing models
✔ Useful for structured benchmarking
✘ Not suitable as a daily coding assistant

8️⃣ Best Hybrid Strategy (If You Want Maximum Insight)

Use your PC for:
- Understanding agent architecture
- Writing the agent loop
- Testing with small models
Use Colab occasionally for:
- Running stronger models
- Observing behavior differences
- Comparing reasoning depth
---snipped---

9️⃣ Important Conceptual Takeaway

Agentic systems require:

Stable environment
Persistent state
Fast iteration
Reliable compute

Colab Free provides:

Temporary bursts of compute

That is the difference.

If you want, I can now:

Recommend specific open models that balance performance and reasoning
Outline a minimal coding agent architecture for your hardware
Or compare Colab Free vs Colab Pro from a cost–benefit standpoint

Which direction do you want to explore?

Me: I saw this video: https://www.youtube.com/watch?v=Y4R-DyvAux8

It talks of a feature of saving the 'notebook' to Google Drive. Does that mean I can save the created local LLM on my local drive and pick it up from there in the next Google CoLab session?

CG: Here’s the clear answer regarding saving models and notebooks in Google Colab — and what that Drive-saving feature actually means:

📌 What the “Save to Google Drive” Feature Does

When you save a Colab notebook file (the .ipynb), it automatically gets stored in your Google Drive (usually in the Colab Notebooks folder). You can open it later in another session or share it with others. (dummies.com)
That only saves the notebook file itself — not the running environment or session state. (research.google.com)

Key point:
Saving the notebook means you preserve:

Code cells
Documentation
Notebook outputs (sometimes)
Version history in Drive

It does not preserve:

A running model loaded in memory
The temporary session filesystem
Installed packages
Downloaded weights stored in /content

Those are wiped whenever the Colab VM resets or times out.

🧠 What You Can Save Persistently

You can save large files — including LLM weights/quantized models — by explicitly writing them to Google Drive, if you mount your Drive in the notebook:

from google.colab import drive
drive.mount('/content/drive')

Then copy the model files into Drive:

!cp /content/your_model_folder/* /content/drive/MyDrive/llm_models/

After that, the files will remain in your Drive even after the Colab session ends. (Neurostars)

However, two practical realities:

Uploading/Downloading Gigabytes takes time – syncing a multi-GB model between Colab and Drive can take several minutes each session. (LinkedIn)
Drive mount requires authentication each session – you generally need to re-mount in each new session. (research.google.com)

📌 What You Can’t Do

You cannot simply save the entire Colab session state (e.g., loaded model in memory, environment setup) and resume it the next time by opening the notebook from Drive.

Why?

Colab VMs are ephemeral — each time you connect, it’s a new environment with no memory of the previous one unless you explicitly script saving/loading. (dataprogpy.github.io)

Example of what gets lost when a session ends:

Loaded model weights in RAM
Installed Python packages (unless you re-install them)
Temporary files in /content/

💡 Practical Workflow for LLM Models on Colab

If you want to reuse a downloaded model across sessions:

Store the model files in Google Drive.
- Use the Drive mount code above and copy them there.
In each new Colab session:
- Mount your Drive
- Copy the model files from Drive to the Colab VM
- Start your LLM server locally inside Colab

Example:

!cp -r /content/drive/MyDrive/llm_models /content/models
!ollama serve --model-dir /content/models

This avoids re-downloading the model every time from the internet.

📌 Summary

What you save	Persists across sessions?
📘 Notebook (.ipynb)	✅ Yes (in Google Drive)
💾 Model files saved to Drive	✅ Yes (if you mount & save them)
🧠 Loaded model in memory	❌ No
🛠 Installed libraries	❌ No

So the answer to your question:

Can I save the created local LLM on my local drive and pick it up next session?

Not automatically just by saving the notebook.
You must explicitly save the model files (weights/checkpoints) to Google Drive, and then in the next session mount Drive and load them back into Colab’s environment. (Neurostars)

If you want, I can provide a small ready-to-use Colab notebook template that:

Mounts Google Drive
Downloads or loads your model from Drive
Caches it so you don’t have to redownload every time
Launches an interactive LLM server

Just tell me which open model you want to experiment with (e.g., Code Llama, Qwen coder, etc.).

...

Me: Let us contrast a Google Colab session with a chat I have with you as chatbot. In your chat case, I can shutdown my PC and after a day reopen the chat. You pick up from where we had stopped. But in Google Colab case, after session is closed and then reopened after a day, there will be no equivalent picking up of context from previous session. Did i get that correctly?

CG: Yes — you have understood the distinction correctly. Let’s formalize it carefully.

1️⃣ Chat Conversation (Like This One)

In this interface:

The conversation history is stored server-side.
When you return later:
- The prior messages are still available.
- The model is given that conversation history again.
- Context continuity is preserved (subject to context window limits).

Important nuance:

I am not “remembering” in a human sense.
The system simply reloads the prior messages and feeds them back into the model as context.

So:

✔ You can shut down your PC
✔ Return after a day
✔ Resume the thread
✔ Context persists

2️⃣ Google Colab Session

Colab works very differently.

A Colab session is:

A temporary virtual machine (VM)
With RAM, disk, processes
Destroyed when session ends

When you return after a day:

You get a new VM
RAM is empty
No processes are running
No prior state exists (unless manually saved)

So:

✔ Notebook file persists (in Drive)
✘ In-memory state is gone
✘ Running model is gone
✘ Agent loop state is gone
✘ Conversation context inside the agent is gone

3️⃣ Why the Difference Exists

Chat system:

Stores conversation transcript as structured data.
Reloads it for each new interaction.

Colab:

Is just a compute environment.
Does not automatically store execution state.
Has no built-in conversational persistence.

It is infrastructure, not an application-level memory system.

4️⃣ Can Colab Be Made to “Pick Up”?

Only if you explicitly design it to.

You would need to:

Save conversation logs to Drive
Save agent state to files
Save intermediate plans
Reload them manually next session

That requires engineering effort.

There is no automatic persistence.

5️⃣ Conceptual Difference

Feature	Chat Interface	Google Colab
Conversation stored	✔ Yes	✘ No
VM persists	Not applicable	✘ No
Memory auto-restored	✔ Yes	✘ No
Designed for dialogue	✔ Yes	✘ No
Designed for computation	Limited	✔ Yes

6️⃣ Important Subtlety

Even here in chat:

If conversation exceeds context window, early parts may drop.
But practically, you experience continuity.

In Colab:

There is zero continuity unless you implement it.

7️⃣ Final Validation

Your statement:

In Colab case, after session is closed and reopened after a day, there will be no equivalent picking up of context from previous session.

✔ Correct.

That is an accurate understanding.

If you want, next we can discuss:

How to engineer persistence in a local/Colab agent
Or whether building such persistence is worth the effort
Or how serious agent frameworks solve this problem

Where would you like to go next?

=============== end ChatGPT chat extracts ================

13:23, 13 February 2026

Me: in the context of LLM installation locally, what is VRAM

GSAI: VRAM (Video Random Access Memory) is the dedicated high-speed memory on a GPU used to store LLM parameters, activations, and context data during local inference. Unlike system RAM, VRAM provides the necessary bandwidth for fast token generation (40–80 tokens/second), acting as the primary constraint on model size and context length.

Ref. article provided by GSAI which seems to be very good: Ollama VRAM Requirements: Complete 2026 Guide to GPU Memory for Local LLMs, https://localllm.in/blog/ollama-vram-requirements-for-local-llms, pub Nov 2025, last updated Feb 2026.

VRAM Calculator for Local Open Source LLMs - Accurate Memory Requirements 2025, https://localllm.in/blog/interactive-vram-calculator

Assessing feasibility of agentic coding with (free) Ollama, Claude Code and local open-source models for Next.js and React Native apps (no hands-on trials)