Identifying Blogger blog posts with size bloat and fixing them
Last updated on 8 Mar 2026
Quick Info
Markdown documents in above repo:
- Readme.md - Main Readme for repo
- checkPostBloat.md - Detailed log of exchanges with Gemini on analysis and checking of Blogger post bloat due to unwanted CSS and tag attributes, followed by my work on checking post bloat for this sw dev blog and reducing it when the bloat had crossed a threshold.
- GColab/prompts.md - Covers the prompts I gave to Google Colab AI related to extracting pre elements from original post HTML (post-orig.html), cleaning them up and then trying to auto-patch them back into PrettyHTML bloat cleanup output file (post-pretty.html). It also has some Gemini exchanges related to the Colab session.
Details
This post is a follow-up to my previous post: Fixing Gemini chat to Blogger Compose post copy-paste causing 1.5 MB post size bloat due to unnecessary CSS.
Gemini chat: Blogger Feed Request Issue
This part of the chat happened over a few days from late February to early March 2026, and was interspersed with work on creating the above-mentioned blog post.
[In response to Gemini question: Are there any other posts you noticed in your feed that might still be carrying that Gemini/Dark Reader bloat?]
Me: I am sure there are other posts too, but the content copy-pasted from Gemini into them was limited, so I did not encounter this issue. I did notice that my incremental blog backup size had become much larger than usual (a few MB instead of less than 1 MB). I first need to identify which specific posts are big in size. The Blogger dashboard does not show this to me, so I will have to figure it out some other way. Any suggestions?
----- end chat segment -----
The above response led to a series of exchanges which culminated in the creation of two PowerShell scripts: one to measure the payload size of an individual Blogger post, and one to automate scanning multiple posts of a blog for size analysis. These scripts and associated data are shared in a public GitHub repo, which also has a detailed Readme covering how I used the scripts to generate size data for all posts of this blog. The two PowerShell scripts are:
- postsize.ps1 which reports post size. See Readme section for details.
- postsInListSize.ps1 which parses a local HTML file containing a list of Blogger post URLs (in a particular format) and invokes postsize.ps1 for each URL to generate a comprehensive report of sizes for all posts in the list. See Readme section for details.
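The division of work between the two scripts can be illustrated with a minimal Python sketch. This is not the code of postsize.ps1 or postsInListSize.ps1; it only mirrors the idea. It assumes that the URL list file contains ordinary `<a href="...">` links (the actual "particular format" is described in the Readme) and that "post size" means the byte count of the fetched response body. The names `extract_post_urls`, `payload_size_bytes` and `sizes_for_list` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Assumption: every post URL appears as a plain <a href="..."> link.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def extract_post_urls(html_text):
    """Return all link targets found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.urls


def payload_size_bytes(url, fetch=None):
    """Return the number of bytes in the response body for url.

    fetch is an optional callable (url -> bytes) so the function can be
    tried out without network access; by default the URL is downloaded.
    """
    if fetch is None:
        with urlopen(url) as response:
            return len(response.read())
    return len(fetch(url))


def sizes_for_list(html_text, fetch=None):
    """Measure every post listed in html_text; returns (url, size) pairs."""
    return [(url, payload_size_bytes(url, fetch))
            for url in extract_post_urls(html_text)]
```

Passing a stand-in `fetch` function makes it possible to check the parsing and reporting logic on a small sample list before running a full scan of the blog.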
The Readme section Scan Data Processing and Payload Analysis details the steps taken to transform the comprehensive report (raw scan output) of postsInListSize.ps1 for this blog into an Excel workbook, one sheet of which lists the posts in descending order of post size. After discussion with ChatGPT, I have not included the Excel workbook in the git repo.
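The sorting step can also be sketched in Python as an alternative to a spreadsheet. This is an illustration only, not the procedure from the Readme: it assumes the scan results are already available as (url, size in bytes) pairs, and the function names, the CSV column names and the `limit` parameter are hypothetical.

```python
import csv
import io


def sort_by_size_desc(records):
    """Return (url, size_bytes) pairs ordered from largest to smallest."""
    return sorted(records, key=lambda record: record[1], reverse=True)


def over_threshold(records, limit):
    """Keep only the posts whose size exceeds limit (in bytes)."""
    return [record for record in records if record[1] > limit]


def to_csv_text(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "size_bytes"])
    writer.writerows(records)
    return buffer.getvalue()
```

Sorting first and then applying a threshold gives a short list of the posts most worth cleaning up, with the largest at the top.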
===========
8 Mar. 2026
Later, as I continued identifying and fixing size bloat, I tried to improve the process. That work is covered in additional markdown documents in the same repo, which are listed in the Quick Info section above. CSV versions of the main sheets of the above-mentioned Excel workbook were also created and are part of the git repo.