Assessing feasibility of agentic coding with (free) Ollama, Claude Code and local open-source models for Next.js and React Native apps (no hands-on trials)
- In this exploration, I have only viewed videos, interacted with AI chatbots and read articles to arrive at my conclusions.
- I have NOT tried out Ollama, Claude Code and local open-source coding models for Next.js and React Native app dev. Trying them out myself would surely give a more reliable assessment, but such trials have the challenges of investing a lot of time and organizing the required hardware resources, even if I use online resources like Google Colab. The videos, AI chatbot interactions and articles themselves have convinced me that my conclusions are on the right track.
Summary
My Views (Conclusions)
- Ollama + Claude Code + qwen3-coder-30b, even on a local machine with the required power like the PC shown in video 1 below (60+ GB system (CPU) RAM, 32 GB VRAM, NVIDIA GeForce RTX 5090):
- does not seem to be a reliable agentic coding system for medium-size Next.js or React Native apps
- is much less capable than paid cloud Claude agentic coding systems, which offer powerful models with strong coding capability like Opus 4.5 (or 4.6, released in Feb. 2026) and Sonnet 4.5
- Model quality seems to be very important for medium-size Next.js or React Native app development needs. Open-source models don't seem to be good enough yet.
- Claude Code and qwen3-class open-source models may be good for smaller coding tasks like a small plain HTML/JS/CSS app.
- Claude Code and an open-source model are shown to be doing a good job for a document-query type of task (a non-coding task) in video 1 below. This may be a good setup for a local document-query AI app/system.
- Claude Code and a local open-source model may be worthwhile to try out for validation and correction of documents. This could include the open-source Bhagavad Gita JSON data mentioned in my previous posts.
- Learning agentic coding without paying for paid models can be done using Claude Code + Ollama + open-source models.
- However, even for this purpose, a GPU with significant memory (VRAM) may be preferable, as almost all videos I have seen about Ollama presume GPU availability, including the Google Colab online setup.
- CPU-only setups may be possible but could be impractically slow for agentic workflows.
Details
- This combination (my PC specs mentioned above) is perfectly fine for development, but not well-suited for local LLM hosting beyond very small models.
- To comfortably run a large coding-focused LLM such as qwen3-coder locally using Ollama, a fairly capable system is required. Practically speaking, this means a modern multi-core CPU (at least mid-range i5 / Ryzen 5 class or better), 32 GB of RAM for smooth operation, and ample fast storage—preferably an SSD with at least 100–150 GB of free space to accommodate models, caches, and updates. While GPU acceleration is not strictly mandatory, a discrete GPU with sufficient VRAM can significantly improve responsiveness. Without such resources, running large local models tends to be slow, memory-constrained, or impractical for sustained development use.
- Memory 61.6 GB (this must be system RAM, which seems to be informally referred to in some articles/posts as CPU RAM)
- GPU 0 AMD Radeon (This may not be playing a key role for activity shown in the video).
- GPU 1 NVIDIA GeForce RTX 5090: Task Manager shows ~62.3 GB total GPU memory, consisting of:
- ~31.5 GB Dedicated GPU Memory (seems to be VRAM). NVIDIA GeForce RTX 5090 has 32 GB of GDDR7 memory
- https://www.nvidia.com/en-in/geforce/graphics-cards/50-series/rtx-5090/
- https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216
- ~30.8 GB Shared GPU memory. The shared memory seems to be additional system RAM the OS can allocate, not additional physical VRAM.
- Model he uses for the test: qwen3-coder-30b 18.63 GB - https://youtu.be/nYDUdnMVDdU?t=55
- He uses Claude Code Router
- He asks Ollama - Claude Code to create an app that loads a PDF book and allows the user to query the book using an AI model. His app specifications include Next.js with the App Router. - https://youtu.be/nYDUdnMVDdU?t=70
- Claude + qwen3 create the files of the app.
- But it uses outdated package versions! The author says that this has to be fixed later (manually?) by upgrading the packages.
- Generated app is not working - https://youtu.be/nYDUdnMVDdU?t=374
- Local AI fails to figure out what's wrong - https://youtu.be/nYDUdnMVDdU?t=404
- Author then switches to Cloud Claude and seems to use the Sonnet 4.5 model with a Claude Pro plan. Cloud Claude figures out the problems and fixes them. - https://youtu.be/nYDUdnMVDdU?t=429
- The running app hits a token capacity error for a specific query. LM Studio is used to run the LLM and shows details about the error - https://youtu.be/nYDUdnMVDdU?t=553
- He switches to an older qwen model and tweaks some config. After that the app is able to execute that query but takes a very long time.
- Recent video - Just 2 weeks back
- On Windows
- Shows how to download Ollama
- Shows models set up on his PC. - https://youtu.be/3x2q6-5XbQ8?t=98
- qwen3-coder-30b 18 GB
- glm-4.7-flash:latest 18 GB
- gpt-oss:20b 13 GB
- Shows Ollama web page with recommended models for Claude Code. 4 models are listed, of which 3 are the ones set up on his PC - https://youtu.be/3x2q6-5XbQ8?t=122
- gpt-oss:20b is smallest - https://youtu.be/3x2q6-5XbQ8?t=152
- He got better results with qwen3 as compared to glm-4.7-flash - https://youtu.be/3x2q6-5XbQ8?t=192
- Claude Code install for Windows - https://youtu.be/3x2q6-5XbQ8?t=257
- Launching Claude Code from Ollama (which enables a local model to be used; when running Claude Code directly, only paid cloud models can be used) - https://youtu.be/3x2q6-5XbQ8?t=299
- Clearly says that these (local) models will never compete with the likes of Opus and Sonnet 4.5 - https://youtu.be/3x2q6-5XbQ8?t=401
- He then says local models are perfect for getting started with agentic coding / vibe coding when somebody does not have the means to pay for the expensive cloud models.
- The app he uses is a plain HTML/CSS/JS app. gpt-oss is able to describe that codebase well, as per the author.
- Pity he did not show a Next.js app!
- Author talks of the context window (token) size of models and says that coding agents can use a lot of context - https://youtu.be/3x2q6-5XbQ8?t=487
- Claude Code has a 200K token limit. The conversation gets summarized or compacted when getting close to (or crossing?) the limit - https://youtu.be/3x2q6-5XbQ8?t=504
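As a rough illustration of why that 200K-token limit matters for agentic coding, here is a quick way to estimate how many tokens a small codebase would occupy. This is a minimal sketch of my own: the "~4 characters per token" rule of thumb, the file extensions and the ./my-app path are illustrative assumptions, not from the video.

```python
# Very rough token estimate for a codebase, using the common
# "~4 characters per token" rule of thumb (illustrative only;
# real tokenizers differ by model).
import os

def estimate_tokens(root, exts=(".js", ".ts", ".tsx", ".css", ".html")):
    total_chars = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, errors="ignore") as f:
                    total_chars += len(f.read())
    return total_chars // 4  # approximate token count

print(estimate_tokens("./my-app"), "tokens (rough) vs Claude Code's 200K-token context")
```

Even a modest Next.js project can add up to a sizeable fraction of the context window once source files, error logs and tool output are all fed into the agent, which is why compaction kicks in.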
My Views
- These two videos validate my key point that Ollama + Claude Code + qwen3-coder-30b, even on a local machine with the required power like the PC shown in video 1 (60+ GB system (CPU) RAM, 32 GB VRAM, NVIDIA GeForce RTX 5090):
- does not seem to be a reliable agentic coding system for medium-size Next.js or React Native apps
- is much less capable than paid cloud Claude agentic coding systems, which offer powerful models with strong coding capability like Opus 4.5 (or 4.6, released in Feb. 2026) and Sonnet 4.5
- Model quality seems to be very important for medium-size Next.js or React Native app development needs. Open-source models don't seem to be good enough yet.
- Claude Code and qwen3-class open-source models may be good for smaller coding tasks like a small plain HTML/JS/CSS app.
- Claude Code and an open-source model are shown to be doing a good job for a document-query type of task (a non-coding task) in the first video above. This may be a good setup for a local document-query AI app/system.
- Claude Code and a local open-source model may be worthwhile to try out for validation and correction of documents. This could include the open-source Bhagavad Gita JSON data mentioned in my previous posts.
- Learning agentic coding without paying for paid models can be done using Claude Code + Ollama + open-source models.
- However, even for this purpose, a GPU with significant memory (VRAM) may be preferable, as almost all videos I have seen about Ollama presume GPU availability, including the Google Colab online setup.
- CPU-only setups may be possible but could be impractically slow for agentic workflows.
AI Chatbot Interactions Details
Let me clarify carefully, because there are layers here.
1️⃣ Chat Interface vs Agentic Capabilities
On a Free plan, you typically get:
Standard chat interaction
Limited tool usage (if enabled)
No persistent autonomous task execution
No long-running background agents
Agentic systems require:
Tool execution rights (file system, code execution, browsing)
Higher compute budgets (multi-step reasoning loops)
Longer context windows
Persistent task state
Those cost infrastructure resources — so they are usually part of:
Paid ChatGPT tiers (Plus / Pro / Team / Enterprise)
API usage (pay-per-token billing)
2️⃣ Two Different “Agentic” Paths
A) Agentic inside ChatGPT app
Higher-tier plans may provide:
Advanced tool usage
File uploads + analysis
Web browsing
Code execution
More autonomy
But still bounded within session limits.
B) True Agentic Systems via API
This is where it becomes more serious:
You build:
A planning loop
Tool registry
Execution controller
Memory persistence
Error recovery
You pay via API usage.
Cost depends on:
Model used
Tokens consumed
Tool invocations
Runtime
This is how people build:
DevOps automation bots
Research agents
AI code maintainers
Business workflow automators
3️⃣ Why Free Plans Rarely Offer Full Agentic Mode
Agentic systems:
Use more tokens (multi-step reasoning)
Run iterative loops
Call tools repeatedly
Maintain state
Consume more compute per task
That directly increases cost per user.
4️⃣ Practical View for You
Given your current technical interests (React, GitHub Pages, Netlify, DNS troubleshooting):
You do not necessarily need full agentic mode.
Your workflow benefits more from:
Deep reasoning
Precise debugging
Controlled step-by-step interaction
In fact, for infrastructure work, excessive autonomy can introduce unwanted changes.
5️⃣ Strategic Advice
If someday you want:
Automated repo maintenance
Continuous refactoring
Deployment monitoring agent
Blog infrastructure manager
Then an API-based agentic setup might justify the cost.
But for disciplined engineering workflows like yours, high-quality chat + manual control is often safer and more predictable.
If you would like, I can outline:
Approximate cost structure of running an API-based agentic developer assistant
Whether it would be economically sensible for an individual developer
Or how to simulate “agentic behavior” manually using structured prompting
Tell me which direction you want to go.
You are asking the right question:
“Given limited hardware, what is actually feasible?”
Let’s analyze each environment separately.
1️⃣ Your PC: i3 12th Gen, 16 GB RAM, Integrated Graphics
What this means technically
No discrete GPU → CPU-only inference
16 GB RAM → practical ceiling ≈ 7B–14B quantized models (see the rough memory estimate below)
i3 → limited parallelism → low tokens/sec
What you can run comfortably
With Ollama:
7B model (Q4 or Q5 quantization) → workable
13B Q4 → borderline but possible
14B Q4 → slow but usable for experimentation
Expect:
3–8 tokens/sec
Planning loops will feel slow
Long reasoning chains may be painful
What you cannot realistically run
32B+
70B
Anything requiring > 20 GB RAM
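To make these RAM ceilings concrete, here is a back-of-envelope calculation. The numbers are my own rough assumptions (about 0.55 bytes per parameter for Q4-quantized weights plus quantization metadata, and a few GB of overhead for the KV cache, runtime buffers and the OS), not figures from the chatbot.

```python
# Back-of-envelope RAM estimate for Q4-quantized models.
# Assumptions: ~0.55 bytes/parameter (4-bit weights + metadata),
# plus ~3 GB overhead for KV cache, runtime buffers and the OS.
def estimated_ram_gb(params_billion, bytes_per_param=0.55, overhead_gb=3.0):
    return params_billion * bytes_per_param + overhead_gb

for size in (7, 13, 14, 32, 70):
    print(f"{size}B Q4 model: roughly {estimated_ram_gb(size):.0f} GB total")
```

This lines up with the claims above: on a 16 GB machine, 7B–14B quantized models are the practical range, while 32B and larger would need more RAM than is available.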
Practical conclusion
You can build a proof-of-concept local coding agent.
It will:
Modify small files
Add functions
Refactor small components
It will struggle with:
Large React codebases
Multi-file reasoning
Debugging complex build failures
But for learning purposes — absolutely viable.
3️⃣ Now the Interesting Question:
Can GitHub Actions Free Runners Help?
Short answer:
Not really for persistent agentic experimentation.
Here’s why.
GitHub Actions Free Tier Constraints
Limited minutes per month
No persistent storage between runs
No GPU
CPU-only runners
Ephemeral environment (destroyed after job ends)
That means:
You cannot:
Run a persistent agent loop
Keep a model server alive
Maintain memory across sessions
You could:
Trigger short inference tasks
Run batch experiments
Benchmark prompts
But not interactive development agents.
4️⃣ ---SNIPPED---
In theory:
Install Ollama inside runner
Download small model
Run inference
Output results
In practice:
Cold start time is large
Model download time eats runtime quota
Runner memory is limited
Not suitable for interactive iteration
It becomes inefficient and clumsy.
5️⃣ Much More Practical Alternatives
If your goal is:
A) Local experimentation (low cost)
Use:
Ollama
7B–14B coder model
Simple Python agent loop (a minimal sketch is given at the end of this section)
B) Slightly more power without subscription
Consider:
Occasional API usage (controlled cost)
Very low budget (₹ few hundred/month)
Only invoke API for heavy reasoning
Hybrid strategy often wins.
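Below is a minimal sketch of what such a local agent loop could look like against Ollama's HTTP API. The model name, prompts and the three-step bound are illustrative assumptions; the /api/chat endpoint and payload shape follow Ollama's documented API, but treat this as a starting point rather than a finished agent (it does not actually apply edits or run tests).

```python
# Minimal bounded "agent" loop against a local Ollama server.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "qwen2.5-coder:7b"  # assumption: any coder model already pulled via `ollama pull`

def ask(messages):
    """Send the running conversation to the local Ollama server and return the reply text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "messages": messages, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

messages = [
    {"role": "system", "content": "You are a coding assistant. Propose one small edit at a time."},
    {"role": "user", "content": "Add a docstring to the main() function in app.py"},
]

for step in range(3):  # bounded loop instead of open-ended autonomy
    reply = ask(messages)
    print(f"--- step {step} ---\n{reply}\n")
    messages.append({"role": "assistant", "content": reply})
    # A real agent would parse the reply, apply the edit, run tests,
    # and feed the results back; here we simply ask the model to refine its answer.
    messages.append({"role": "user", "content": "Refine or confirm your proposed edit."})
```

The deliberate step bound keeps runaway loops from burning hours on a slow CPU-only machine.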
6️⃣ What I Would Recommend For You
Given your engineering mindset:
Build a minimal local agent
Test on small repos
Measure (a measurement sketch is given at the end of this section):
Tokens/sec
Iteration stability
Failure patterns
Compare with a few API calls to a strong model
Draw conclusions
You will gain:
Deep intuition about model capability
Understanding of scaling limits
Appreciation of reasoning depth
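One easy way to get the tokens/sec number mentioned above is to read the timing fields Ollama returns with each generation. A minimal sketch follows; the model name and prompt are placeholders, while eval_count and eval_duration are fields documented in Ollama's /api/generate response.

```python
# Measure generation speed from Ollama's /api/generate timing fields.
# eval_count = number of tokens generated, eval_duration = time in nanoseconds.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5-coder:7b",  # placeholder: any locally pulled model
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,
    },
    timeout=600,
).json()

tokens = resp["eval_count"]
seconds = resp["eval_duration"] / 1e9
print(f"{tokens} tokens in {seconds:.1f}s -> {tokens / seconds:.1f} tokens/sec")
```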
7️⃣ The Hard Truth
Agentic architecture is cheap.
High-quality reasoning is expensive.
The cost is not in the loop.
It is in the intelligence per token.
8️⃣ Final Assessment For Your Setup
Your PC:
✔ Good for learning
✔ Good for small coding tasks
✘ Not good for serious refactors
...
GitHub Actions:
✘ Not useful for this purpose
...
Me: One suggestion was to use Google Colab (which seems to have a free tier). Your view please.
CG: Good question. Google Colab can be useful — but only under certain expectations. Let’s evaluate it properly in the context of running an agentic coding system.
1️⃣ What Colab Free Tier Actually Gives You
Typically (this fluctuates):
Ephemeral VM
~12–15 GB RAM
Occasional access to a modest GPU (T4 / P100 class, not guaranteed)
Session timeout (often 2–12 hours, sometimes shorter)
Idle disconnects
No persistent background processes
Important: Access to GPU in free tier is inconsistent and throttled.
2️⃣ Can You Run Ollama / Local LLMs There?
Yes — but not in the typical “install once and reuse forever” sense.
Each session:
You install dependencies
Download model weights
Start the model server
Run your agent loop
Lose everything when session ends
So it’s suitable for experiments, not long-running agents.
3️⃣ Performance Compared to Your PC
If GPU is available:
A T4 can run 7B–14B much faster than your i3.
You may get 15–40 tokens/sec depending on model.
If no GPU:
CPU performance may be similar or slightly better than your PC.
Still limited.
So Colab can be significantly better — but only when GPU is allocated.
4️⃣ Suitability for Agentic Work
Let’s evaluate agent requirements:
| Requirement | Colab Free |
|---|---|
| Persistent state | ❌ No |
| Long-running loop | ❌ Risky |
| Background daemon | ❌ Not stable |
| GPU sometimes | ✔ Yes (unreliable) |
| Good for batch experiments | ✔ Yes |
| Good for interactive dev agent | ⚠ Limited |
So:
Colab is good for:
Testing models
Benchmarking coding ability
Running controlled experiments
Evaluating reasoning depth
Colab is poor for:
Daily development assistant
Persistent coding agent
Continuous integration style agent
5️⃣ Realistic Workflow With Colab
You could:
Upload a small repo
Run agent for 20–40 minutes
Observe behavior
Download modified files
Repeat later
This is actually reasonable for research-style experimentation.
6️⃣ Practical Limitation Most People Underestimate
Model download time.
Large models (7B+) are multiple GB.
Every new session:
Download cost in time
Sometimes rate-limited
That eats into runtime quota.
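A quick back-of-envelope calculation shows why. The download speeds here are my own illustrative assumptions, not measured Colab figures.

```python
# Rough time to fetch an ~18 GB model (the qwen3-coder-30b size class)
# at a few illustrative sustained download speeds.
size_gb = 18
for mb_per_s in (25, 50, 100):
    minutes = size_gb * 1024 / mb_per_s / 60
    print(f"{size_gb} GB at {mb_per_s} MB/s: about {minutes:.0f} minutes")
```

Even a few minutes of download (or Drive-to-VM copy) per session adds up when the sessions themselves are short and rate limits apply.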
7️⃣ Strategic View For You
Given your goal (learning + experimentation with friend):
Colab Free is:
✔ Better than your i3 for short experiments
✔ Good for comparing models
✔ Useful for structured benchmarking
✘ Not suitable as a daily coding assistant
8️⃣ Best Hybrid Strategy (If You Want Maximum Insight)
Use your PC for:
Understanding agent architecture
Writing the agent loop
Testing with small models
Use Colab occasionally for:
Running stronger models
Observing behavior differences
Comparing reasoning depth
---snipped---
9️⃣ Important Conceptual Takeaway
Agentic systems require:
Stable environment
Persistent state
Fast iteration
Reliable compute
Colab Free provides:
Temporary bursts of compute
That is the difference.
If you want, I can now:
Recommend specific open models that balance performance and reasoning
Outline a minimal coding agent architecture for your hardware
Or compare Colab Free vs Colab Pro from a cost–benefit standpoint
Which direction do you want to explore?
📌 What the “Save to Google Drive” Feature Does
When you save a Colab notebook file (the .ipynb), it automatically gets stored in your Google Drive (usually in the Colab Notebooks folder). You can open it later in another session or share it with others. (dummies.com) That only saves the notebook file itself, not the running environment or session state. (research.google.com)
Key point:
Saving the notebook means you preserve:
Code cells
Documentation
Notebook outputs (sometimes)
Version history in Drive
It does not preserve:
A running model loaded in memory
The temporary session filesystem
Installed packages
Downloaded weights stored in /content
Those are wiped whenever the Colab VM resets or times out.
🧠 What You Can Save Persistently
You can save large files — including LLM weights/quantized models — by explicitly writing them to Google Drive, if you mount your Drive in the notebook:
from google.colab import drive
drive.mount('/content/drive')
Then copy the model files into Drive:
!cp /content/your_model_folder/* /content/drive/MyDrive/llm_models/
After that, the files will remain in your Drive even after the Colab session ends. (Neurostars)
However, two practical realities:
Uploading/Downloading Gigabytes takes time – syncing a multi-GB model between Colab and Drive can take several minutes each session. (LinkedIn)
Drive mount requires authentication each session – you generally need to re-mount in each new session. (research.google.com)
📌 What You Can’t Do
You cannot simply save the entire Colab session state (e.g., loaded model in memory, environment setup) and resume it the next time by opening the notebook from Drive.
Why?
Colab VMs are ephemeral — each time you connect, it’s a new environment with no memory of the previous one unless you explicitly script saving/loading. (dataprogpy.github.io)
Example of what gets lost when a session ends:
Loaded model weights in RAM
Installed Python packages (unless you re-install them)
Temporary files in /content/
💡 Practical Workflow for LLM Models on Colab
If you want to reuse a downloaded model across sessions:
Store the model files in Google Drive.
Use the Drive mount code above and copy them there.
In each new Colab session:
Mount your Drive
Copy the model files from Drive to the Colab VM
Start your LLM server locally inside Colab
Example:
!cp -r /content/drive/MyDrive/llm_models /content/models
!OLLAMA_MODELS=/content/models nohup ollama serve > ollama.log 2>&1 &  # ollama serve has no --model-dir flag; the model directory is set via the OLLAMA_MODELS environment variable
This avoids re-downloading the model every time from the internet.
📌 Summary
| What you save | Persists across sessions? |
|---|---|
| 📘 Notebook (.ipynb) | ✅ Yes (in Google Drive) |
| 💾 Model files saved to Drive | ✅ Yes (if you mount & save them) |
| 🧠 Loaded model in memory | ❌ No |
| 🛠 Installed libraries | ❌ No |
So the answer to your question:
Can I save the created local LLM on my local drive and pick it up next session?
Not automatically just by saving the notebook.
You must explicitly save the model files (weights/checkpoints) to Google Drive, and then in the next session mount Drive and load them back into Colab’s environment. (Neurostars)
If you want, I can provide a small ready-to-use Colab notebook template that:
Mounts Google Drive
Downloads or loads your model from Drive
Caches it so you don’t have to redownload every time
Launches an interactive LLM server
Just tell me which open model you want to experiment with (e.g., Code Llama, Qwen coder, etc.).
...
Me: Let us contrast a Google Colab session with a chat I have with you as a chatbot. In your chat case, I can shut down my PC and after a day reopen the chat. You pick up from where we had stopped. But in the Google Colab case, after a session is closed and then reopened after a day, there will be no equivalent picking up of context from the previous session. Did I get that correctly?
CG: Yes — you have understood the distinction correctly. Let’s formalize it carefully.
1️⃣ Chat Conversation (Like This One)
In this interface:
The conversation history is stored server-side.
When you return later:
The prior messages are still available.
The model is given that conversation history again.
Context continuity is preserved (subject to context window limits).
Important nuance:
I am not “remembering” in a human sense.
The system simply reloads the prior messages and feeds them back into the model as context.
So:
✔ You can shut down your PC
✔ Return after a day
✔ Resume the thread
✔ Context persists
2️⃣ Google Colab Session
Colab works very differently.
A Colab session is:
A temporary virtual machine (VM)
With RAM, disk, processes
Destroyed when session ends
When you return after a day:
You get a new VM
RAM is empty
No processes are running
No prior state exists (unless manually saved)
So:
✔ Notebook file persists (in Drive)
✘ In-memory state is gone
✘ Running model is gone
✘ Agent loop state is gone
✘ Conversation context inside the agent is gone
3️⃣ Why the Difference Exists
Chat system:
Stores conversation transcript as structured data.
Reloads it for each new interaction.
Colab:
Is just a compute environment.
Does not automatically store execution state.
Has no built-in conversational persistence.
It is infrastructure, not an application-level memory system.
4️⃣ Can Colab Be Made to “Pick Up”?
Only if you explicitly design it to.
You would need to:
Save conversation logs to Drive
Save agent state to files
Save intermediate plans
Reload them manually next session
That requires engineering effort; a minimal sketch of the idea follows below.
There is no automatic persistence.
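Here is a minimal sketch of that manual persistence, assuming Drive is already mounted at /content/drive as in the earlier snippet. The STATE_PATH location and the fields in the state dictionary are illustrative assumptions.

```python
# Minimal manual state persistence for a Colab-based agent.
# Assumes Google Drive is mounted at /content/drive (see the mount snippet earlier).
import json
import os

STATE_PATH = "/content/drive/MyDrive/agent_state/state.json"  # illustrative path

def save_state(state):
    """Write conversation log / plan / progress to Drive so it survives the VM reset."""
    os.makedirs(os.path.dirname(STATE_PATH), exist_ok=True)
    with open(STATE_PATH, "w") as f:
        json.dump(state, f, indent=2)

def load_state():
    """Reload the previous session's state, or start fresh if none exists."""
    if os.path.exists(STATE_PATH):
        with open(STATE_PATH) as f:
            return json.load(f)
    return {"conversation": [], "plan": [], "done_steps": []}

state = load_state()
state["conversation"].append({"role": "user", "content": "Resume from where we stopped."})
save_state(state)
```

This gives continuity of the agent's notes and conversation log, but the model itself and any installed packages still have to be reloaded each session.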
5️⃣ Conceptual Difference
| Feature | Chat Interface | Google Colab |
|---|---|---|
| Conversation stored | ✔ Yes | ✘ No |
| VM persists | Not applicable | ✘ No |
| Memory auto-restored | ✔ Yes | ✘ No |
| Designed for dialogue | ✔ Yes | ✘ No |
| Designed for computation | Limited | ✔ Yes |
6️⃣ Important Subtlety
Even here in chat:
If conversation exceeds context window, early parts may drop.
But practically, you experience continuity.
In Colab:
There is zero continuity unless you implement it.
7️⃣ Final Validation
Your statement:
In Colab case, after session is closed and reopened after a day, there will be no equivalent picking up of context from previous session.
✔ Correct.
That is an accurate understanding.
If you want, next we can discuss:
How to engineer persistence in a local/Colab agent
Or whether building such persistence is worth the effort
Or how serious agent frameworks solve this problem
Where would you like to go next?
=============== end ChatGPT chat extracts ================