Notes on bypassing CORS restriction (error) in JavaScript code that fetches blogger blog feed

Last updated on 2 April 2024

I have put up a modified (and simplified) HTML and JavaScript version of the Google Apps Script based bloggerToEbook. The key feature of this barebones HTML & JS program, which turns a Blogger blog feed into a set of posts (book), is that it bypasses the CORS restriction/error by using cors-anywhere.herokuapp.com. For more on the latter, see section '4. Using CORS-anywhere' in https://codedamn.com/news/javascript/javascript-cors-error-handling . You can use the program by simply downloading the index.html and script.js files and then opening index.html in a browser like Chrome. A server like Live Server (VSCode extension) is not needed.
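
A minimal sketch of how a fetch can be routed through cors-anywhere is given below. It is not the actual script.js code. The helper names (buildFeedUrl, viaProxy, fetchFeedViaProxy) are made up for illustration, and the sketch assumes that the proxy is used by prefixing the full target URL with the proxy URL, and that the blog's JSON feed is at /feeds/posts/default?alt=json.

```javascript
// Assumption: the proxy is used by prefixing the full target URL with its base URL.
const CORS_PROXY = "https://cors-anywhere.herokuapp.com/";

// Hypothetical helper: build the JSON feed URL for a Blogger blog.
function buildFeedUrl(blogUrl) {
  return new URL("/feeds/posts/default?alt=json", blogUrl).toString();
}

// Hypothetical helper: route a target URL through the proxy.
function viaProxy(targetUrl) {
  return CORS_PROXY + targetUrl;
}

// Fetch the feed through the proxy and return the parsed JSON.
async function fetchFeedViaProxy(blogUrl) {
  const response = await fetch(viaProxy(buildFeedUrl(blogUrl)));
  if (!response.ok) {
    throw new Error(`Feed request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With this in place, a call like fetchFeedViaProxy("https://raviswdev.blogspot.com/") would request the feed through the proxy instead of going to the blog directly.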

To save the dynamically rendered HTML in the Chrome browser, I am using the Save Rendered HTML Chrome extension, but that seems to need a server like Live Server. Print as PDF can also be used to save the rendered page as a PDF. Chrome's Save As ... command, with 'Save as type:' set to "Webpage, Single File (*.mhtml)", creates a single .mhtml file which has all the text and images of the (dynamically) rendered page.

I also tried out the "2. Using a proxy server" suggestion in the above-mentioned codedamn.com article. It essentially involves creating a simple Node Express proxy server using the http-proxy-middleware node package, and using that proxy server (on localhost, i.e. one's own computer) instead of cors-anywhere.herokuapp.com.

I stumbled into a problem: a fetch from the index.html page and its associated script code, served through Live Server on localhost on one port (4000), was blocked by CORS from accessing the proxy server on the same localhost on another port (3001)! This happens because an origin includes the port, so the page and the proxy server count as different origins. The following article covers this scenario: Why does my JavaScript code receive a "No Access-Control-Allow-Origin header is present on the requested resource" error, while Postman does not?, https://sentry.io/answers/why-does-my-javascript-code-receive-a-no-access-control-allow-origin-header-error-while-postman-does-not/ .

The solution was to use the cors node package in the proxy server. I also figured out how to get some more logging information from the http-proxy-middleware package.
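
To show what the proxy server has to do for the browser to accept its responses, here is a rough sketch that uses only Node's built-in http and https modules. Note that this is a swapped-in technique and not my app.js, which uses Express, cors and http-proxy-middleware. The names corsHeaders and createProxy are made up for illustration. The target blog, page origin and port are taken from the values mentioned in this post. The sketch does not follow redirects from the target.

```javascript
const http = require("http");
const https = require("https");

// Assumptions: values taken from the blog and ports mentioned in this post.
const TARGET = "https://raviswdev.blogspot.com";
const PAGE_ORIGIN = "http://127.0.0.1:4000";

// Headers the browser must see before it lets the page read the response.
function corsHeaders(allowedOrigin) {
  return {
    "Access-Control-Allow-Origin": allowedOrigin,
    "Access-Control-Allow-Methods": "GET, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

// Create (but do not start) a proxy that forwards GET requests to the target.
function createProxy(target, allowedOrigin) {
  return http.createServer((req, res) => {
    const headers = corsHeaders(allowedOrigin);
    if (req.method === "OPTIONS") {
      // Answer a CORS preflight request without contacting the target.
      res.writeHead(204, headers);
      res.end();
      return;
    }
    const upstreamUrl = new URL(req.url, target);
    // Simple request logging.
    console.log(`[proxy] ${req.method} ${req.url} -> ${upstreamUrl}`);
    https
      .get(upstreamUrl, (upstream) => {
        // Pass the target's status and content type through, plus CORS headers.
        res.writeHead(upstream.statusCode, {
          ...headers,
          "Content-Type": upstream.headers["content-type"] || "text/plain",
        });
        upstream.pipe(res);
      })
      .on("error", (err) => {
        res.writeHead(502, headers);
        res.end(`Proxy error: ${err.message}`);
      });
  });
}

// To run it: createProxy(TARGET, PAGE_ORIGIN).listen(3001);
```

The important part is the Access-Control-Allow-Origin header on every response, which is the header the cors package adds in the Express based version.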

The simple app server code hardcodes the target URL (blog) (an improved version could pick up the target URL from an environment variable), and the script.js front end code is slightly different from above mentioned solution. The code files (app.js, index.html and script.js) for this solution 

This solution has the disadvantage of requiring Node.js to be installed on the computer. A node project then has to be created by running npm init, after which the app.js file has to be created (with code copy-pasted from above), and one has to ensure that the 'main' property in package.json is app.js (instead of index.js, which is the npm init default). npm install will also be needed for the packages used by app.js. Once this groundwork is done, the node proxy server has to be run first ('node app.js'), and then the index.html file has to be opened in a browser.

...

I am working on a slightly improved version (using cors-anywhere.herokuapp.com) which takes in some parameters like blog URL via the HTML page, and provides margin and max width for displayed set of posts (book). But I want to spend very limited time on it now and so the improvements will be limited. I will be updating this post after that is ready. [31 Mar. 2024: Just put up an in-progress version on Github.]

31 Mar. 2024: Came across an alt=json-in-script solution, which uses a dynamic script tag and does not run into the CORS issue at all. So there is no need for any proxy server. Just put up an in-progress version of it on Github. It lacks even basic error handling as of now.

...

https://developers.google.com/gdata/docs/json#json-in-script-output explains json-in-script and how it "allows you get around some of the cross-domain security issues you might encounter in typical client side JavaScript" using callback functions. An extract from the sample source code provided there, showing the initial part of the callback function listEvents and then the script tag, is given below:

function listEvents(root) {
    var feed = root.feed;

...

<script src="http://www.google.com/calendar/feeds/developer-calendar@google.com/public/full?alt=json-in-script&callback=listEvents"></script>

...

I find this json-in-script feature of Blogger to be quite fascinating.

provides the following output (most of the argument passed to the generated handleFeed call is collapsed and shown as [--snipped--]):

// API callback
handleFeed({"version":"1.0","encoding":"UTF-8","feed":{"xmlns":"http://www.w3.org/2005/Atom","xmlns
[--snipped--]
nGwVI_BqZnKzMj1f7OdITiLCqikt95sWih38urrss7QHs2txgXEAfEb2_x0FWUP-NbClF1p4qnqkjc59q9uw\/s220\/Profile-Gulmohar-tree.jpg"}}],"thr$total":{"$t":"0"}}]}});
---- end output ----

So what is returned is JavaScript code, which is essentially an invocation of the callback function with the feed passed to it as an object. The browser executes this code as soon as it is loaded, because the HTML & JavaScript code (like the example code shown earlier) puts the GET request in a script tag whose src attribute is set to the GET request URL.
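
A small sketch of the dynamic script tag approach described above is given below. It is not the code I put up on Github. The helper names (buildJsonInScriptUrl, extractTitles, loadFeedWithScriptTag) are made up for illustration, and the sketch assumes the usual Blogger feed path /feeds/posts/default and that each post title is under entry.title.$t in the returned object. loadFeedWithScriptTag works only in a browser as it uses window and document.

```javascript
// Hypothetical helper: build the json-in-script feed URL for a Blogger blog.
function buildJsonInScriptUrl(blogUrl, callbackName) {
  const url = new URL("/feeds/posts/default", blogUrl);
  url.searchParams.set("alt", "json-in-script");
  url.searchParams.set("callback", callbackName);
  return url.toString();
}

// Pull the post titles out of the object passed to the callback.
// In this JSON format, text values sit under a "$t" key.
function extractTitles(root) {
  const entries = (root.feed && root.feed.entry) || [];
  return entries.map((entry) => entry.title.$t);
}

// Browser only: add a script tag whose src is the feed request URL.
function loadFeedWithScriptTag(blogUrl) {
  // The callback has to be a global function, as the returned code
  // invokes it by name.
  window.handleFeed = (root) => {
    console.log(extractTitles(root));
  };
  const script = document.createElement("script");
  script.src = buildJsonInScriptUrl(blogUrl, "handleFeed");
  script.onerror = () => console.error("Could not load the feed script");
  document.body.appendChild(script);
}
```

No fetch call is involved here. The browser loads the feed as a script, just like any other script file from another site, and so the CORS check that blocks fetch does not apply.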

...

Notes about Blogger and WordPress web feed requests and feed formats, https://ravisiyermisc.blogspot.com/2024/03/notes-about-blogger-and-wordpress-web.html is a useful resource. 

...

When a CORS error occurs, the associated fetch statement throws an exception with the error object having message property as (in one case with my code): "Failed to fetch" and stack property as: "TypeError: Failed to fetch  at HTMLFormElement.<anonymous> (http://127.0.0.1:4000/script.js:47:22)". Corresponding line 47 in script.js is, "response = await fetch(feedReqURL, options);" Chrome console shows an additional message, "Access to fetch at 'https://raviswdev.blogspot.com/' from origin 'http://127.0.0.1:4000' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled." 

I wanted to know if in JavaScript code, I can find out whether the error is a CORS error. But it seems that is not possible by design! From https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors#identifying_the_issue : "Note: For security reasons, specifics about what went wrong with a CORS request are not available to JavaScript code. All the code knows is that an error occurred. The only way to determine what specifically went wrong is to look at the browser's console for details."
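
Given this limitation, about the best the code can do is catch the exception and point the user to the browser console. A minimal sketch of that is given below. The names describeFetchError and fetchFeedText are made up for illustration and this is not the error handling in my script.js. The sketch assumes that a fetch failure of this kind shows up as a TypeError, as in the "Failed to fetch" case above.

```javascript
// Hypothetical helper: turn a fetch failure into a message for the user.
// A TypeError here may be a CORS block or a network problem; the code
// cannot tell which one it is.
function describeFetchError(err) {
  if (err instanceof TypeError) {
    return (
      "Request failed before a response could be read " +
      "(network problem or CORS block). " +
      "Check the browser console for details."
    );
  }
  return `Unexpected error: ${err.message}`;
}

// Fetch the feed and report failures as a result object instead of throwing.
async function fetchFeedText(feedReqURL, options) {
  try {
    const response = await fetch(feedReqURL, options);
    if (!response.ok) {
      return { ok: false, message: `Server replied with HTTP ${response.status}` };
    }
    return { ok: true, text: await response.text() };
  } catch (err) {
    return { ok: false, message: describeFetchError(err) };
  }
}
```

An HTTP error status (like 404) is a different case. There the fetch does not throw, and the code can read response.status as shown above.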
