Main notes about Google AI Pro plan features for software development work - Part 5

This post is a continuation of Main notes about Google AI Pro plan features for software development work - Part 4.

===================
In Part 4 of this series, I mentioned a (free tier) ChatGPT chat, "NotebookLM vs Google Colab", which I found interesting but later realized that ChatGPT was not updated with the latest features of Google Colab. So I decided not to include those chat excerpts in Part 4.

Today, 27 Feb 2026, I had a separate chat session with Gemini (Gem) titled "NotebookLM vs. Google Colab Use Cases", where I used many prompts similar to those I had used in the above-mentioned ChatGPT chat. Gemini's responses seem to be based on current or close-to-current versions of NotebookLM and Google Colab. Given below is almost all, if not all, of that chat.

Me: When would I use a NotebookLM and when would I use Google Colab Notebook?

Gem: While they both have "Notebook" in the name and live within the Google ecosystem, NotebookLM and Google Colab are designed for completely different tasks.

Think of it this way: NotebookLM is for reading and thinking, while Google Colab is for coding and calculating.


📚 NotebookLM

Your "Personal AI Research Assistant"

NotebookLM is a tool for synthesizing information. You give it documents (PDFs, Google Docs, website links, or YouTube transcripts), and it uses AI to help you understand them. It doesn't require any coding knowledge.

When to use it:

  • Synthesizing Research: You have 20 long PDFs for a project and need to find the common themes across all of them.

  • Studying: You want to upload your lecture notes and have the AI quiz you or create a study guide.

  • Fact-Checking: You need to ask questions about a specific document and want to see citations (exactly where the info came from).

  • Content Creation: You want to turn your notes into a blog post, a script, or even a podcast-style "Audio Overview."


💻 Google Colab

Your "Cloud-Based Coding Lab"

Google Colab is a hosted version of Jupyter Notebooks. It allows you to write and execute Python code in your browser. It’s a powerful environment that gives you free access to high-end hardware like GPUs and TPUs.

When to use it:

  • Data Science & Machine Learning: You need to train a neural network or analyze a massive CSV file using libraries like pandas or scikit-learn.

  • Heavy Computation: Your local computer isn't powerful enough to run a complex script, so you want to use Google’s cloud servers (GPUs).

  • API Development: Since you've worked on the Gita APIs, you might use Colab to test endpoints, prototype data cleaning scripts, or experiment with GraphQL queries.

  • Sharing Executable Code: You want to send a teammate a script that they can run immediately without them having to install Python or libraries on their own machine.


At a Glance: Key Differences

| Feature | NotebookLM | Google Colab |
| --- | --- | --- |
| Primary Goal | Analyzing and summarizing text. | Writing and running code. |
| Input Type | PDFs, Docs, URLs, Text. | Python code, Datasets (.csv, .json). |
| Technical Skill | None (Natural language). | Intermediate (Python programming). |
| Hardware | Cloud-based LLM. | Cloud-based GPUs and TPUs. |
| Best For | Writers, students, researchers. | Developers, data scientists, engineers. |

Note: For your Timestamp Tracker project, you would use Colab if you needed to write a script to migrate your database or analyze user usage patterns. You would use NotebookLM if you wanted to upload all your project documentation and the React Native docs to quickly find how to implement a specific feature.

Would you like me to help you draft a Python script in Colab for one of your Gita API tasks, or perhaps help you organize a research notebook in NotebookLM?

Me: I may have some data-specific tasks. For example, I have a Gita JSON API - https://github.com/ravisiyer/gita-data . The tens of MBs of JSON data, containing English, Sanskrit and Hindi text, have some language errors in them.
At some point, I would like to explore using an AI tool to clean up this data.
Do you think for this task Google Colab may be a good option?

Gem: Google Colab is an excellent choice for this specific task. Since your Gita dataset is in JSON format and contains multi-language text (Sanskrit, Hindi, and English), Colab provides the programmatic control you need to handle those complexities at scale.

While NotebookLM is great for reading the Gita, Google Colab is the tool for fixing it.


Why Colab is ideal for your Gita Data project:

1. Handling Large-Scale JSON

Cleaning "tens of MBs" of text manually is impossible. In Colab, you can use the pandas library to load your JSON files into a dataframe, making it easy to filter for specific errors or patterns across thousands of verses.
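As a minimal sketch of this pandas-based approach (the field names and sample records below are hypothetical; the actual gita-data schema may differ):

```python
import pandas as pd

# Hypothetical verse records; the real data would be loaded from the
# repo's JSON files, e.g. df = pd.read_json("gita.json").
verses = [
    {"chapter": 1, "verse": 1, "english": "Dhritarashtra said ...", "hindi": "धृतराष्ट्र ..."},
    {"chapter": 1, "verse": 2, "english": "", "hindi": "संजय ..."},
]

df = pd.DataFrame(verses)

# Filter for one suspected error pattern: empty English translations.
missing_english = df[df["english"].str.strip() == ""]
print(len(missing_english))  # 1
```

The same one-line filter works unchanged whether the dataframe holds two verses or thousands, which is the point of loading the JSON into pandas first.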

2. Specialized Language Libraries

Since you are dealing with Sanskrit and Hindi, you can install specialized Python libraries in Colab (like indic-nlp-library or polyglot) to handle Devanagari script normalization, transliteration, or spell-checking.

3. Integrated AI (Gemini in Colab)

Colab has built-in AI coding assistance. You can describe the cleaning task in plain English (e.g., "Write a function to find and fix common typos in this Sanskrit JSON field"), and it will generate the Python code for you.

4. Direct GitHub Integration

Since your data is hosted on GitHub, you can:

  • Clone the repo directly into your Colab environment.

  • Run the cleaning scripts on the files.

  • Export the cleaned JSON back to your computer or even push changes back to a branch.


🛠️ A Potential "Cleaning" Workflow in Colab

If you were to start this today, your notebook would likely look like this:

  1. Data Loading: Use import json or pandas to read your files from the GitHub URL.

  2. Pattern Detection: Write "Regex" (Regular Expression) scripts to find common formatting errors (e.g., mismatched quotes or broken line breaks in the porcelain output you've used before).

  3. LLM Cleaning: You can use the Google Gemini API within Colab. You could send a batch of verses to the API with a prompt like: "Correct the spelling of the following Sanskrit verses while maintaining the JSON structure."

  4. Verification: Compare the "before" and "after" to ensure the AI didn't hallucinate or change the meaning of the sacred text.
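The pattern-detection step (step 2) can be sketched in Python. This is a toy illustration only: the sample records and the quote-based checks are assumptions for demonstration, not the actual error patterns in the repo:

```python
import re

# Invented sample records standing in for verses loaded from JSON.
records = [
    {"id": 1, "english": "He said, \u201cfight\u201d"},   # curly quotes
    {"id": 2, "english": 'Unbalanced "quote here'},        # mismatched quote
    {"id": 3, "english": "A clean verse."},
]

# Two toy checks: an odd number of straight double quotes (a likely
# mismatched pair), or "curly" quotes mixed into the text.
def suspicious(text):
    return text.count('"') % 2 == 1 or bool(re.search(r"[\u201c\u201d]", text))

flagged = [r["id"] for r in records if suspicious(r["english"])]
print(flagged)  # [1, 2]
```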


A Quick Comparison for your specific task:

| Task | Use NotebookLM? | Use Google Colab? |
| --- | --- | --- |
| Finding a specific error in one chapter | Yes (Fast & easy) | No (Overkill) |
| Fixing errors across 700 verses | No | Yes (Automated) |
| Validating JSON schema/structure | No | Yes |
| Comparing Hindi vs. English translations | Yes (For analysis) | Yes (For processing) |

Pro-Tip for your PowerShell Workflow:

Since you've previously used git diff --no-index --word-diff=porcelain to track changes, you can actually run those exact same commands inside a Colab cell by prefixing them with an exclamation mark (e.g., !git diff ...). This allows you to verify your data cleaning results using the tools you are already comfortable with.

Would you like me to draft a starter Python script for a Colab notebook that loads one of your Gita JSON files and checks for common formatting errors?

Me: The big thing that Google Colab brings is lot of compute power, as compared to NotebookLM which will have limited compute power. Am I right? Also, where does the LLM run? In a separate setup like when I use it from a chatbot?

Gem: You are absolutely right on the first point: Google Colab provides significantly more "raw" compute power that you can control.

Here is the breakdown of how they handle compute and where the "brain" (the LLM) actually lives in each case.

1. The Compute Difference

  • Google Colab: You are essentially renting a virtual computer. You have access to a specific amount of RAM (usually ~12GB on free tier), CPU, and most importantly, a GPU (like the Nvidia T4). You use this power to run your own code. It’s heavy-duty and designed for processing thousands of files or training models.

  • NotebookLM: The "compute" here is hidden. You don't have access to a processor or RAM; instead, you are using the compute power of Google’s inference servers. It is optimized for "reading" and "retrieving" information rather than executing logic. It can't run a Python script to change 1,000 JSON files at once; it can only "discuss" them.


2. Where does the LLM run?

This is the most important distinction for your Gita data project.

In NotebookLM: The "Closed Box" Model

The LLM (Gemini 1.5 Pro) runs on Google's private servers.

  • How it works: When you ask a question, NotebookLM looks at your "Sources" (your Gita JSON), picks out the relevant chunks, and sends them to the LLM.

  • The Setup: It’s a specialized setup that uses RAG (Retrieval-Augmented Generation). It doesn't "run" inside your notebook; it's a separate service that your notebook talks to. You cannot change the model or see the "backend."
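To make the RAG idea concrete, here is a toy Python sketch. Real RAG systems score chunks with vector embeddings; plain word overlap stands in for that scoring step here, and the sample chunks are invented:

```python
import re

# Toy sources standing in for uploaded documents, split into chunks.
chunks = [
    "Chapter 2 discusses dharma and duty.",
    "Chapter 12 describes devotion (bhakti).",
    "Chapter 1 sets the scene on the battlefield.",
]

def words(text):
    return set(re.findall(r"\w+", text.lower()))

# Retrieval step: rank chunks by word overlap with the question; a real
# system then hands only the top chunks to the LLM, with citations.
def retrieve(question, chunks, k=1):
    q = words(question)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:k]

top = retrieve("Which chapter discusses dharma and duty?", chunks)
print(top[0])  # the dharma-and-duty chunk
```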

In Google Colab: The "Your Choice" Model

In Colab, you have two ways to use an LLM:

Option A: The API Approach (Like a Chatbot)

You write Python code to send your Gita text to the Gemini API.

  • Where it runs: The LLM runs on Google’s API servers (separate from your Colab VM).

  • Best for: Cleaning your data. You can send 50 verses at a time to the API, tell it "Fix the Sanskrit spelling," and save the result back to your JSON.
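The batching logic of this API approach can be sketched as below. The call_llm function is a stub that just echoes the verses back; in Colab you would replace it with a real Gemini API call (which needs an API key), so this illustrates only the looping and batching, not the cleaning itself:

```python
# Stub standing in for a Gemini API call; it echoes the verses back.
def call_llm(prompt):
    return prompt.split("---", 1)[1].strip()

def clean_in_batches(verses, batch_size=50):
    cleaned = []
    for i in range(0, len(verses), batch_size):
        batch = verses[i:i + batch_size]
        prompt = (
            "Correct the spelling of the following verses while "
            "preserving their order and count.\n---\n" + "\n".join(batch)
        )
        cleaned.extend(call_llm(prompt).split("\n"))
    return cleaned

verses = [f"verse {n}" for n in range(1, 121)]  # 120 dummy verses
result = clean_in_batches(verses)
print(len(result))  # 120
```

Keeping batches small (here 50 verses) makes each prompt fit comfortably in the model's context and makes failures easy to retry.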

Option B: The "Local" Approach (Self-Hosted)

Because Colab gives you a GPU, you can actually download and run a smaller LLM (like Llama 3 or Mistral) directly on the Colab virtual machine.

  • Where it runs: It runs directly on the T4 GPU Google assigned to your session.

  • Best for: Privacy or if you want to experiment with open-source models without paying for API tokens.


Which one should you use for the Gita data?

| Feature | NotebookLM | Google Colab (via Gemini API) |
| --- | --- | --- |
| Logic/Cleaning | ❌ Cannot "edit" files. | ✅ Can rewrite your JSON files. |
| Scale | ❌ Limited to small chunks. | ✅ Can process the entire "tens of MBs." |
| Automation | ❌ Manual "chat" only. | ✅ You can script it to run while you sleep. |
| Sanskrit/Hindi | ✅ Very good at understanding. | ✅ Excellent when used via the API. |

My recommendation: For your Gita JSON API, use Google Colab. You can write a Python script that loops through your JSON files, sends the text to the Gemini API for cleaning, and saves the corrected version. It gives you the "automation" you need for a dataset that size.

Would you like me to show you a simple Python snippet that uses the Gemini API to "clean" a piece of text, which you could then paste into a Colab cell?

Me: If I am not using an open source model but am sending it to Google's infra using Gemini API, the GPU/TPU part of Colab may not matter to me. But the CPU power it provides essentially gives a cloud VM to run my task (perhaps similar to GitHub providing Actions (runners) compute facility). Am I on right track?

Gem: You are spot on. Even if you aren't using the GPU/TPU for model training, Google Colab essentially acts as a powerful, ephemeral Cloud VM with a pre-configured Python environment.

While NotebookLM is a "managed service" (you provide documents and it provides answers), Colab is a "programmable environment." Your comparison to GitHub Actions is very accurate—both give you a slice of a server to execute tasks—but they have different strengths.


🛠️ Colab as a Cloud VM (The "Runner" Perspective)

You are correctly identifying that for your Gita JSON cleaning, the "raw" compute and the environment are the real assets.

Why Colab wins for your Gita Data cleaning:

  • Persistent vs. Ephemeral: Like a GitHub Actions runner, a Colab VM is temporary (it wipes after a few hours of inactivity). However, you can Mount Google Drive to it. This turns your temporary VM into one with "persistent storage"—you can save your cleaned JSON files directly to Drive and they won't disappear.

  • Pre-installed "Tooling": You don't have to spend 10 minutes writing a YAML file to install the pandas, json, or requests libraries. They are already there.

  • Interactive Debugging: Unlike GitHub Actions, where you "push and pray" that the script works, Colab lets you run one cell at a time. You can clean 5 verses, inspect them, adjust your logic, and then run the script for the remaining 5,000 verses.

Where the LLM runs in this setup:

If you use the Gemini API within Colab, the "work" is split:

  1. Your Colab VM (CPU): Handles the logic—looping through your files, managing the JSON structure, and making the network requests.

  2. Google's Inference Infra (External): The Gemini API receives your text, processes the Sanskrit/Hindi correction, and sends the string back to your Colab VM.


📊 Comparison: Colab vs. GitHub Actions for Data Tasks

| Feature | Google Colab | GitHub Actions |
| --- | --- | --- |
| Primary Use | Ad-hoc data work & prototyping. | Automated CI/CD & scheduled tasks. |
| Interaction | Real-time. You watch the code run. | Asynchronous. You check logs later. |
| Storage | Easy integration with Google Drive. | Artifacts or Git commits. |
| Ease for "One-off" | High. Just open and run. | Low. Requires workflow configuration. |

💡 Practical Advice for your Gita Dataset

Since your data is in the tens of megabytes, a standard Colab CPU (usually ~12GB RAM) is overkill in a good way. It will handle those JSON files instantly.

Your next move could be:

  1. Open a new Colab notebook.

  2. Clone your repo: !git clone https://github.com/ravisiyer/gita-data

  3. Run a script to count how many verses have specific "language errors" you've identified.

This gives you a "sandbox" to test your cleaning logic before you ever touch your main repository.
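Step 3 might look something like this toy sketch. The file structure and the "Latin letters in a Hindi field" check are assumptions for illustration; in Colab you would json.load the actual cloned files and substitute the error patterns you have identified:

```python
import re

# Hypothetical in-memory sample standing in for the cloned JSON files.
data = {
    "verses": [
        {"verse": 1, "hindi": "धर्मक्षेत्रे"},
        {"verse": 2, "hindi": "dharma kshetre"},  # Latin letters in a Hindi field
        {"verse": 3, "hindi": ""},                # missing translation
    ]
}

# Flag verses whose Hindi field is empty or contains ASCII letters.
latin = re.compile(r"[A-Za-z]")
bad = [v["verse"] for v in data["verses"] if not v["hindi"] or latin.search(v["hindi"])]
print(len(bad), bad)  # 2 [2, 3]
```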

Would you like me to provide a Python script that you can paste into Colab to connect to the Gemini API and start cleaning a sample of your JSON data?


Google Colab for Beginners

This video is a great starting point for understanding how to navigate the Colab interface and run your first snippets of code in the cloud.

Me: Terrific! Now about NotebookLM. It is not a compute workhorse like Google Colab. Instead NotebookLM is like an English language query system/engine for custom datasets loaded into its notebooks. Am I on right track?

Gem: You are exactly on the right track. You've hit on the core "philosophy" behind the two tools.

🧠 NotebookLM: The "Synthesizer"

You are correct that it's essentially a natural language query engine for your personal data.

  • How it works: It uses a technique called RAG (Retrieval-Augmented Generation). When you upload your Gita JSON files, it doesn't "import" them into a database in the traditional sense. Instead, it "reads" them, indexes them, and then uses that index to find the exact parts of the text that answer your questions.

  • The "English" Interface: You don't write code; you talk to it. You could ask, "Find all verses where the English translation seems to contradict the Sanskrit root for 'Dharma'." It will find those needles in the haystack and, most importantly, provide citations (links) back to the exact source.

  • Not a Workhorse: It won't help you with bulk editing, formatting, or script execution. It’s for understanding and analyzing what you already have.


🏗️ Google Colab: The "Architect"

Colab is the "compute workhorse" because it gives you the infrastructure to build and change things.

  • The Workflow Engine: If you realize that 500 JSON entries have a typo in a specific Hindi tag, Colab is where you write the 10 lines of Python code to fix all 500 at once.

  • The Runner: You're right to compare it to GitHub Actions. It’s a dedicated environment where you have full control over the processor. You can install any library you want (like specialized Sanskrit NLP tools) which NotebookLM doesn't allow.


⚔️ Head-to-Head for your Gita Project

| Feature | NotebookLM | Google Colab |
| --- | --- | --- |
| Input | You give it files to read. | You give it files to process. |
| Action | It explains the data to you. | It transforms the data for you. |
| Output | Summaries, Citations, Podcasts. | Cleaned JSON files, Graphs, APIs. |
| Best Used For | Researching the philosophy of the Gita. | Fixing the technical errors in the Gita JSON. |

A Cool Way to use Both Together:

Since you have "tens of MBs" of data, you might actually find a powerful workflow by using them in tandem:

  1. NotebookLM: Upload your data and ask, "Identify common patterns of errors in the Hindi translations."

  2. Google Colab: Once NotebookLM gives you the "pattern" (e.g., "The word 'X' is often misspelled as 'Y'"), you go to Colab and write a script to find-and-replace that pattern across your entire 700-verse dataset.
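The Colab half of that tandem workflow could be as small as this sketch (the misspelling pair and field names are hypothetical):

```python
# Invented sample verses; the real ones would come from the JSON files.
verses = [
    {"verse": 1, "hindi": "shri krshna"},
    {"verse": 2, "hindi": "arjuna"},
]

# Apply a known misspelling fix (wrong -> right) across every verse,
# returning how many verses were changed.
def fix_pattern(verses, wrong, right, field="hindi"):
    fixed = 0
    for v in verses:
        if wrong in v[field]:
            v[field] = v[field].replace(wrong, right)
            fixed += 1
    return fixed

count = fix_pattern(verses, "krshna", "krishna")
print(count, verses[0]["hindi"])  # 1 shri krishna
```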

Would you like me to help you brainstorm a Python "Data Health Check" script for Colab that could scan your JSON for common language or structural errors?

Me: Got it. Back to Colab. The Gemini chatbot is integrated with Colab and so it is context aware. That removes the work of informing the Gemini chatbot separately about the data and of ensuring that such info is in sync between the Gemini chatbot and Colab. No copy-pasting work from the Gemini chatbot to Colab. Also, I may not need to delve into Python code now unless the data transformation work is not done properly OR if I want to know the logic used for the data transformation so that I am more confident about the quality of the data transformation work. Am I on right track?

Gem: You are exactly on the right track, and you’ve identified the "killer feature" that makes Colab so powerful for developers today.

The integration of Gemini directly into Colab transforms it from a simple "coding tool" into an AI-driven data agent.


🚀 The "Context-Aware" Advantage

You are correct: because Gemini is built into the Colab interface, it isn't just a chatbot on the side; it’s a co-pilot that can see your files.

  • Zero Sync Effort: When you upload your gita-data JSON files to the Colab sidebar, Gemini "sees" them. You don't have to copy-paste snippets of the JSON into a separate chat window.

  • "Analyse Data" Button: In the latest versions of Colab, you'll often see a button that says "Analyse data with Gemini" or a spark icon next to your uploaded files. When you click it, Gemini will proactively suggest a plan to clean or visualize that specific file.

  • Direct Modification: Gemini can actually write and insert code cells for you. If you tell it, "Clean the Sanskrit transliteration in all JSON files in the 'translations' folder," it will generate the Python code, create a new cell, and you just have to click "Run."


🛠️ The "No-Code" to "Low-Code" Shift

Your intuition about not needing to delve into Python code is a major shift in how data tasks are handled now.

1. The "Supervisor" Role

You are moving from being a "Coder" (writing every line of logic) to a "Supervisor" (directing the AI). You provide the intent in English, and Gemini handles the syntax.

2. Verification for Quality

You are right to stay cautious about the logic. The best part of Colab is that Gemini shows its work.

  • Confidence through Transparency: If Gemini transforms your data, it does so by running a Python cell. You can look at that cell to see exactly what it did.

  • Example: You can check if it used a simple "find and replace" or if it used a more complex "Regex" pattern. This gives you the confidence that the "sacred text" was handled with the precision it requires.

3. Error Recovery

If the transformation isn't quite right, you don't have to fix the code yourself. You can just tell the chatbot, "That looks good, but it missed the special characters in the Hindi tags. Can you update the script to include those?" It will then refactor its own code.


📊 Summary: Your New Workflow

| Step | Old Way (Manual) | Your New Way (Colab + Gemini) |
| --- | --- | --- |
| Setup | Install Python, Git, and Libraries locally. | Open Colab, Clone GitHub repo (1 line of code). |
| Discovery | Scroll through 50MB of JSON to find typos. | Ask Gemini: "Find all verses where Sanskrit script is broken." |
| Execution | Write a 50-line Python script for hours. | Prompt Gemini: "Fix these errors and save as a new JSON." |
| Verification | Manual spot-checks. | Ask Gemini: "Create a summary table of all changes made." |

Tip for your Gita project: Since your data contains multiple languages, I recommend asking Gemini in Colab to "Create a verification cell that compares the original and cleaned verses side-by-side" after it finishes. This will let you verify the "quality of the transformation" without reading every line of code.
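Such a verification cell, whether you write it or Gemini generates it, boils down to a before/after comparison like this sketch (the sample strings are invented):

```python
# Invented before/after lists standing in for original and cleaned verses.
original = ["verse one txet", "verse two", "verse three"]
cleaned  = ["verse one text", "verse two", "verse three"]

# Collect only the verses that actually changed, with their positions.
changes = [(i, o, c) for i, (o, c) in enumerate(zip(original, cleaned), 1) if o != c]
for idx, before, after in changes[:5]:
    print(f"{idx}: {before!r} -> {after!r}")
```

Reviewing only the changed rows, rather than the whole dataset, is what makes spot-checking a large cleaning run practical.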

Would you like me to show you the one-line command to clone your GitHub repo into Colab so you can see Gemini in action with your actual data?

Me: I don't know Python but I can ask Gemini to explain it to me and so the Python syntax stuff and library calls stuff will become clear and then the code can be understood.

Gem: That is a fantastic observation. You have identified exactly why the "barrier to entry" for data engineering has dropped so significantly.

Because Gemini is integrated directly into Colab, you don't need to be a Python expert to be a successful data architect. You just need to be a good "Reviewer."


👨‍🏫 Gemini as Your Private Tutor

In 2026, Colab's AI features are designed specifically for this "Supervisory" workflow. Here is how you can use it without knowing Python:

1. The "Explain Code" Button

Whenever Gemini generates a block of code to clean your Gita JSON, you will see an "Explain code" button (often a spark icon) right above the cell.

  • What it does: It breaks down the code line-by-line in plain English.

  • The Benefit: Instead of seeing df.apply(lambda x: re.sub(...)), Gemini will tell you: "This line looks at every verse and replaces the incorrect Sanskrit character 'X' with 'Y'."
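For instance, the kind of line mentioned above, df.apply(lambda x: re.sub(...)), does something like the following (the misspelling pair here is hypothetical):

```python
import re

import pandas as pd

# Invented sample column; the real one would come from the Gita JSON.
df = pd.DataFrame({"sanskrit": ["krshna uvaca", "arjuna uvaca"]})

# Replace the (hypothetical) misspelling 'krshna' in every row.
df["sanskrit"] = df["sanskrit"].apply(lambda x: re.sub(r"krshna", "krishna", x))
print(df["sanskrit"].tolist())  # ['krishna uvaca', 'arjuna uvaca']
```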

2. "Why did you do that?" (The Chat Interface)

If you see a library call like import pandas or genai.configure, you can literally ask the side-panel chat: "Why did you use the Pandas library for this instead of just a regular JSON library?"

  • Gemini will explain the logic (e.g., "Pandas is faster for large datasets and lets us filter all Hindi verses at once"). This gives you the "Confidence in Quality" you mentioned.

3. The "Data Science Agent"

Colab now has a Data Science Agent feature. You can upload your file and just type: "Help me understand the structure of this Gita data and check if any English translations are missing."

  • It won't just give you code; it will build a complete plan, explain the steps, and then execute them. You are essentially "Pair Programming" with an expert who does all the typing.


🛠️ Your "No-Code" Cleaning Strategy

For your Gita JSON API, your workflow would look like this:

  1. Direct Instruction: "Gemini, I've uploaded my Gita JSON. Please write a script to find any verses where the 'translation_hindi' field is empty or contains English characters."

  2. Logic Review: Click Explain Code. Read the summary to ensure it's targeting the right fields.

  3. Iteration: If you see a mistake in its logic, don't fix the code. Just say: "Actually, don't delete those verses, just flag them with a 'needs_review' tag."

  4. Verification: Ask: "Can you show me the first 5 changes you made in a table so I can double-check them?"


🚀 One-Line "Magic" to Get Started

When you are ready to try this, the only thing you'll need to "type" is this command in a Colab cell to get your data into the environment:

!git clone https://github.com/ravisiyer/gita-data.git

Once that runs, you can simply click the Gemini icon and say: "I just cloned my Gita data repo. Can you help me find spelling errors in the Sanskrit text?"

Would you like me to generate a "Checklist" of common Gita data errors (like encoding issues or missing tags) that you can use as prompts for Gemini once you open Colab?
