Identifying Blogger blog posts with size bloat and fixing them

Last updated on 28 Apr 2026

Quick Info


Markdown documents in above repo:
  • Readme.md - Main Readme for repo
  • Gemini Rich Text Copy Bloat Experiment - Covers an April 2026 technical investigation into the "Fidelity Tax" of rich-text copying. It reveals that the browser's serialization process can bloat a 10 KB HTML segment to 241 KB by baking computed inline CSS into it, confirming that a Markdown-to-HTML pipeline is essential for maintaining a lean Blogger database.
  • checkPostBloat.md - Detailed log of exchanges with Gemini on analyzing and checking Blogger post bloat caused by unwanted CSS and tag attributes, followed by my work on checking post bloat for this sw dev blog and reducing it once the bloat had crossed a threshold.
  • GColab/prompts.md - Covers the prompts I gave to Google Colab AI for extracting pre elements from the original post HTML (post-orig.html), cleaning them up, and then trying to auto-patch them back into the PrettyHTML bloat-cleanup output file (post-pretty.html). It also has some Gemini exchanges related to the Colab session.
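The extract-clean-patch step described for the Colab session can be illustrated with a minimal Python sketch. It is not the code from that session: the cleanup rule (dropping span wrappers and all attributes on the opening pre tag) is a hypothetical choice, and the patching assumes post-orig.html and post-pretty.html contain the same number of pre elements in the same order. Regular expressions are adequate here only because pre elements are not nested; a full HTML parser would be more robust for arbitrary markup.

```python
import re

# Matches a whole <pre>...</pre> element, non-greedy, across lines.
# Assumes <pre> elements are not nested (typical for Blogger post HTML).
PRE_RE = re.compile(r"<pre\b[^>]*>.*?</pre>", re.IGNORECASE | re.DOTALL)
# Hypothetical cleanup rules: drop <span ...> / </span> wrappers and
# reduce the opening <pre ...> tag to a bare <pre>.
SPAN_TAG_RE = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)
PRE_OPEN_RE = re.compile(r"^<pre\b[^>]*>", re.IGNORECASE)


def extract_pre_blocks(html: str) -> list[str]:
    """Return every <pre>...</pre> element in document order."""
    return PRE_RE.findall(html)


def clean_pre_block(block: str) -> str:
    """Strip span wrappers and the attributes on <pre>, keeping the text."""
    block = SPAN_TAG_RE.sub("", block)
    return PRE_OPEN_RE.sub("<pre>", block, count=1)


def patch_pre_blocks(orig_html: str, pretty_html: str) -> str:
    """Replace the i-th <pre> in pretty_html with the cleaned i-th <pre> of orig_html."""
    cleaned = [clean_pre_block(b) for b in extract_pre_blocks(orig_html)]
    targets = extract_pre_blocks(pretty_html)
    if len(cleaned) != len(targets):
        raise ValueError(
            f"pre count mismatch: orig has {len(cleaned)}, pretty has {len(targets)}"
        )
    replacements = iter(cleaned)
    # re.sub calls the function once per match, left to right, so order is kept.
    # A function replacement also avoids backslash-escape processing in the code text.
    return PRE_RE.sub(lambda _m: next(replacements), pretty_html)
```

Raising on a count mismatch, rather than patching what it can, is a deliberate choice in this sketch: a silent off-by-one would put code listings under the wrong headings.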

Details

28 April 2026 Update: The document Gemini Rich Text Copy Bloat Experiment, in the public shared repo mentioned below, covers an experiment isolating the root cause of the massive file-size increases observed when copy-pasting from Gemini web chat into Blogger's WYSIWYG editor. The summary section at the top of the document details a comparative analysis between a bloated 241 KB clipboard payload and its clean 10 KB raw DOM counterpart. It demonstrates how the browser's rich-text serialization process permanently bakes computed inline CSS and extension artifacts (such as Dark Reader variables) into the HTML. Ultimately, the findings validate the need to bypass the rich-text clipboard entirely in favor of a dedicated Markdown-to-HTML conversion pipeline.
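To get a rough sense of how much of a pasted HTML payload is inline CSS, a minimal sketch using Python's standard html.parser module can total the bytes inside style attributes and compare them with the payload size. The class and function names are hypothetical, and the count is approximate: it measures the decoded attribute values, not the surrounding style="..." syntax or any CSS declared in style elements.

```python
from html.parser import HTMLParser


class StyleBloatCounter(HTMLParser):
    """Tallies inline style="..." text and tag counts in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.style_bytes = 0
        self.styled_tags = 0
        self.total_tags = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also arrive here, via
        # HTMLParser's default handle_startendtag.
        self.total_tags += 1
        for name, value in attrs:
            if name == "style" and value:
                self.styled_tags += 1
                # Values are entity-decoded, so this approximates the raw size.
                self.style_bytes += len(value.encode("utf-8"))


def style_bloat_report(html: str) -> dict:
    """Return total payload bytes, inline-style bytes and their ratio."""
    counter = StyleBloatCounter()
    counter.feed(html)
    counter.close()
    total = len(html.encode("utf-8"))
    return {
        "total_bytes": total,
        "style_bytes": counter.style_bytes,
        "style_share": counter.style_bytes / total if total else 0.0,
        "styled_tags": counter.styled_tags,
        "total_tags": counter.total_tags,
    }
```

Running this on the rich-text clipboard payload and on the raw DOM copy of the same segment would show how much of the difference comes from style attributes, under the caveats above.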

This post follows up on my previous post: Fixing Gemini chat to Blogger Compose post copy-paste causing 1.5 MB post size bloat due to unnecessary CSS.

Gemini chat: Blogger Feed Request Issue

This part of the chat took place over a few days, from the end of February to early March 2026, and was interspersed with work on creating the above-mentioned blog post.

[In response to Gemini question: Are there any other posts you noticed in your feed that might still be carrying that Gemini/Dark Reader bloat?]
Me: I am sure there are other posts too, but the content I copy-pasted from Gemini was limited, so I did not encounter this issue. I did notice that my incremental blog backup size had become much larger than usual (a few MBs instead of less than an MB). I need to first identify which specific posts are big in size. The Blogger dashboard does not show this to me, so I will have to figure it out some other way. Any suggestions?

----- end chat segment -----

The above response led to a series of exchanges that culminated in the creation of two PowerShell scripts: one to measure the payload size of individual Blogger posts, and one to automate scanning multiple posts of a blog for size analysis. These scripts and associated data are shared in a public GitHub repo, which also has a detailed Readme covering how I used these scripts to generate size data for all posts of this blog. The two PowerShell scripts are:
  1. postsize.ps1 which reports post size. See Readme section for details.
  2. postsInListSize.ps1 which parses a local HTML file containing a list of Blogger post URLs (in a particular format) and invokes postsize.ps1 for each URL to generate a comprehensive report of sizes for all posts in the list. See Readme section for details.
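The list-then-measure flow of the two scripts can be sketched in Python as below. This is not a port of postsize.ps1 or postsInListSize.ps1: the list file's format is assumed to be ordinary a href links on a *.blogspot.com host ending in .html, and the hypothetical fetch_size function measures the whole HTTP response, which includes the blog template rather than only the post body. The measuring function is a parameter so that a post-body-only measure can be swapped in.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urlparse
from urllib.request import urlopen


class PostLinkCollector(HTMLParser):
    """Collects unique href values that look like Blogger post URLs."""

    def __init__(self, host_suffix: str = ".blogspot.com") -> None:
        super().__init__()
        self.host_suffix = host_suffix  # assumption: no custom domain
        self.urls: list[str] = []
        self._seen: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Assumption: post URLs are .html paths on the blog's host.
        host_ok = (parsed.hostname or "").endswith(self.host_suffix)
        if host_ok and parsed.path.endswith(".html") and href not in self._seen:
            self._seen.add(href)
            self.urls.append(href)


def collect_post_urls(list_html: str, host_suffix: str = ".blogspot.com") -> list[str]:
    """Return post URLs from the list file in first-seen order, without duplicates."""
    collector = PostLinkCollector(host_suffix)
    collector.feed(list_html)
    collector.close()
    return collector.urls


def fetch_size(url: str) -> int:
    """Byte size of the full HTTP response body (blog template included)."""
    with urlopen(url, timeout=30) as resp:
        return len(resp.read())


def scan_sizes(
    urls: list[str], measure: Callable[[str], int] = fetch_size
) -> list[tuple[str, int]]:
    """Call measure() per URL; record -1 when a fetch fails so the scan continues."""
    report = []
    for url in urls:
        try:
            report.append((url, measure(url)))
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            report.append((url, -1))
    return report
```

Recording -1 for a failed fetch instead of stopping mirrors the practical need to get through a long list of posts in one run; the failed entries can be rescanned afterwards.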
The Readme section Scan Data Processing and Payload Analysis details the steps taken to transform the comprehensive report (raw scan output) of postsInListSize.ps1 for this blog into an Excel workbook, one sheet of which lists the posts sorted in descending order of post size. After a discussion with ChatGPT, I have not included the Excel workbook in the git repo.
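The sort-and-flag part of that transformation can be approximated without Excel by writing a CSV directly. In this minimal sketch the raw scan format (one URL and byte count per line, separated by whitespace) and the 100,000-byte default threshold are both hypothetical; the actual postsInListSize.ps1 output and the threshold used for this blog may differ.

```python
import csv
import io


def parse_scan_lines(lines):
    """Parse hypothetical 'URL <whitespace> BYTES' lines; skip anything else."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            rows.append((parts[0], int(parts[1])))
    return rows


def write_sorted_csv(rows, out, threshold_bytes=100_000):
    """Write rows largest-first, flagging posts at or above threshold_bytes."""
    # threshold_bytes default is a hypothetical value, not the blog's actual threshold.
    writer = csv.writer(out)
    writer.writerow(["url", "bytes", "kb", "over_threshold"])
    for url, size in sorted(rows, key=lambda r: r[1], reverse=True):
        writer.writerow([url, size, round(size / 1024, 1), size >= threshold_bytes])
```

When writing to a file rather than an io.StringIO, open it with newline="" (for example open("post-sizes.csv", "w", newline="", encoding="utf-8")), as the csv module documentation recommends, to avoid blank rows on Windows.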
===========

8 Mar. 2026

Later, as I continued identifying and fixing size bloat, I tried to improve the process. That work is covered in additional Markdown documents in the same repo, which are listed in the Quick Info section above. CSV versions of the main sheets of the above-mentioned Excel workbook were created and are part of the git repo.
