yargs npm package does not work well with bundlers like esbuild and webpack

The context for this post is a small command-line utility, scrape-blogger-post, that I wrote recently, mainly with Gemini's help (GitHub repo). It is a Node.js script that scrapes the HTML content of a Blogger post from its public URL. It can extract just the post's HTML content or create a full, self-contained HTML document for backup purposes. Over the past few days I have found its single-post backup feature quite handy, and that is mainly how I use it.

The utility itself is very small, but I used it to learn how to create a minified bundle as a single .js (JavaScript) file that can be downloaded and run with node on, in theory, any supported OS, including Windows, Linux and macOS. Even on Windows, which is where I use it, I was very interested in having a single .js file that can be run on any Windows PC that has node, without going through npm install and without a node_modules directory.

I succeeded in creating this single minified and bundled scrapeBlogPost.js file (in the dist directory) using the esbuild bundler. To use the utility, one simply needed to have node installed on the computer and then run the script using:
node scrapeBlogPost.js url [optional argument]
To repeat: no node_modules folder and no npm install are needed.
I found this very interesting. With Gemini's help, I also wrote a small PowerShell script that invokes this command; if it is placed in a directory listed in the user's PATH environment variable, the node script can be run from any directory in the file system. IIRC, this is the commit-tree for the last stable version of the above script. Note that the dist folder is not part of the GitHub repo, so the minified and bundled scrapeBlogPost.js has to be generated using 'npm run build' as described in the README for this commit. I could have created a release and uploaded the dist/scrapeBlogPost.js file there, but I felt that would be a little premature and so did not do it.

Later I made some improvements to the script, for which Gemini suggested using the yargs package. I agreed, not knowing that yargs was going to trip up the esbuild bundler I was using. Given below are edited extracts of the chat with Gemini ('Node.js Blogger Scraper Analysis'), and later ChatGPT, on this phase of the work on the utility.

Me: But I think we could have a clearer differentiation between 'pure post HTML' and backup, which is 'post HTML + title as a full HTML document', than the implicit kind of way we have now. I think there should be an argument which only specifies whether the output will be 'pure post HTML' or 'full HTML document having post HTML + title', with the default value being the latter. And then there can be an optional fourth argument which, if present, is the name of the file the output is written to. If the fourth argument is not specified, the output will be written to the console.
What do you think of this design of the parameters of the program?

Gem: That's a much clearer and more flexible design for the program's parameters. Your proposed changes improve usability by separating the concerns of output format and output destination.

This new design is a significant improvement because it:

  • Decouples Concerns: You're no longer mixing the output format (pure HTML vs. full HTML) with the output destination (console vs. file). This makes the program's behavior more predictable and easier to understand.

  • Improves User Experience: It's more intuitive to specify the output format with a dedicated flag and then, optionally, specify where that output should go.

  • Maintains a Clear Default: Setting "full HTML document" as the default output format aligns with your most frequent use case (--backup), but it does so in a more explicit and scalable way.

To implement this, you'll need to modify how the script processes process.argv. Instead of using simple positional checks, it's a good practice to use named flags (e.g., --format or -f and --output or -o). While you can do this manually, for more complex command-line interfaces, it's often better to use a dedicated library like yargs or commander to handle argument parsing. These libraries provide features like default values, help text, and validation, making your CLI more robust and user-friendly.

Here's a breakdown of how the logic would be structured:

  1. The script will still require the URL as the first argument.

  2. The second argument would be a flag to specify the output format. For example, --pure-html or --full-html (the latter being the default).

  3. The third argument, if present, would be the output filename. If it's missing, the script would default to printing to the console.

This approach is a great next step in making the program more maintainable and user-friendly.
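
As an aside, the parameter design discussed above can be sketched without any third-party library. The sketch below is not the code Gemini produced (that used yargs); it uses Node's built-in util.parseArgs, which I believe needs Node 18.3 or later. The flag names (--format / -f, --output / -o) follow Gemini's suggestion, while the 'full' and 'pure' values and the parseCliArgs function name are assumptions made only for illustration.

```javascript
// Sketch of the proposed CLI: <url> [--format full|pure] [--output <file>].
// Assumption: the flag values and function name are illustrative, not the
// utility's actual interface.
const { parseArgs } = require('node:util');

function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      format: { type: 'string', short: 'f' },
      output: { type: 'string', short: 'o' },
    },
    allowPositionals: true, // the URL is passed as a positional argument
  });

  const [url] = positionals;
  if (!url) {
    throw new Error(
      'Usage: node scrapeBlogPost.js <url> [--format full|pure] [--output <file>]'
    );
  }

  // Default output format is the full HTML document (post HTML + title).
  const format = values.format ?? 'full';
  if (format !== 'full' && format !== 'pure') {
    throw new Error(`Unknown format "${format}"; expected "full" or "pure"`);
  }

  // output === null means "write to the console".
  return { url, format, output: values.output ?? null };
}

// Example: the arguments as they would appear after `node scrapeBlogPost.js`.
console.log(
  parseCliArgs(['https://example.blogspot.com/2025/01/post.html', '-f', 'pure'])
);
```

In a real script the function would be called as parseCliArgs(process.argv.slice(2)). Because util.parseArgs is part of Node itself, nothing extra has to be bundled for it, though it offers far less than yargs (no generated help text, for instance).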

Me: Good. Can you make the required changes. You may use named flags and a suitable dedicated library for that.

Gem: That's an excellent design. The use of named flags makes the program's intent and usage much clearer. I'll use the popular yargs library to implement this new command-line interface.

[Gem provided the updated code. After some iterations including me having to refer to yargs documentation, the code was running as expected. With Gem's help, the documentation was updated. Next step was to create a build using esbuild. That's when I started encountering errors.]

Me: npm run build reported success. But running the created dist file is giving an error:
test> node ..\dist\scrapeBlogPost.js
node:internal/modules/cjs/loader:1966
    throw new ERR_INVALID_ARG_VALUE('filename', filename, createRequireError);
    ^
TypeError [ERR_INVALID_ARG_VALUE]: The argument 'filename' must be a file URL object, file URL string, or absolute path string. Received undefined
    at createRequire (node:internal/modules/cjs/loader:1966:11)
...

Gem: ... The error occurs because esbuild sometimes struggles to bundle certain Node.js-specific modules correctly, particularly when they reference internal Node.js APIs in a way that the bundler doesn't expect. A common and simple fix is to tell esbuild to not bundle yargs but to treat it as an external dependency that Node.js will load at runtime.
[Gem gave instructions on how to change the esbuild config for that. I tried that out and it worked. But there was an issue ...]
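
For reference, a build script with yargs marked as external might look roughly like the sketch below. It is not a copy of the repo's build.js: the entry point name and most of the options are assumptions, and only the dist/scrapeBlogPost.js output path and the external yargs setting come from the chat.

```javascript
// build.js: a sketch of an esbuild build script (run through "npm run build").
// Assumption: the entry file name and the option set are illustrative guesses.
const esbuild = require('esbuild');

esbuild
  .build({
    entryPoints: ['scrapeBlogPost.js'], // assumed entry point
    outfile: 'dist/scrapeBlogPost.js',
    bundle: true, // inline dependencies such as axios and cheerio
    minify: true,
    platform: 'node', // keep Node built-ins like fs and path as runtime requires
    external: ['yargs'], // leave require('yargs') to be resolved at runtime
  })
  .then(() => console.log('Build complete: dist/scrapeBlogPost.js'))
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
```

With this setting the bundle still contains a require call for yargs, so yargs has to be findable in a node_modules folder when the bundle runs.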

Me: ... But will this introduce inefficiency as yargs does not seem to be bundled at build time?

Gem: That's a great question. You're correct that the solution of adding yargs to the external array means it's not bundled into the final dist file.

Impact on Efficiency

The script will now rely on Node.js's module resolution system to find and load yargs at runtime. The yargs library and its dependencies will need to be present in your node_modules folder when you run the script, just as they are for your original, unbundled script.

For most scenarios, this will not introduce any noticeable inefficiency. The yargs package is relatively small, and Node.js's module loading is highly optimized. The performance difference will be negligible unless your application has an extremely tight performance budget where every millisecond counts. The main benefit of bundling (a smaller, single-file distribution) is retained for all other dependencies like axios and cheerio, which are much larger.

The primary goal of this particular build step was to create a single executable file that could be run without needing to install dependencies separately. By making yargs external, that goal is partially compromised for yargs itself, but it's a necessary trade-off to resolve the bundling error caused by esbuild's incompatibility with yargs's internal structure.

Me: But will that mean that I need to keep a node_modules directory with yargs and its dependencies in the folder where I will be having the bundled scrapeBlogPost.js?

Gem: Yes, you are correct. Because yargs is no longer bundled into the dist file, you will need to keep the node_modules directory with yargs and its dependencies in the same location as your package.json file.

The dist directory containing scrapeBlogPost.js is a sibling to the node_modules directory. Node.js's module resolution system will look for the required yargs module in the node_modules directory of the current project.

Solution:

The best approach, which you have already documented in your README.md, is to use a simple Powershell or other shell script to run the bundled file from the project's root directory, regardless of where you are on the file system.

The Powershell script you included in your documentation handles this by specifying the full path to the bundled script:

node <full_path_to_project_folder>\dist\scrapeBlogPost.js

By doing this, you ensure that the node command is executed from a directory that has access to the necessary node_modules folder, even if the user is in a different directory when they run the command.

This is a good, standard practice for distributing a Node.js CLI tool that has external dependencies. It keeps the core built file small while still allowing it to be easily used as a command-line tool.
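
A note on why the full-path approach works: Node resolves a bare import such as require('yargs') starting from the directory of the file that makes the call and then walks up through the parent directories. It does not start from the shell's current directory. The sketch below lists the folders that would be tried. It is a simplified imitation of Node's lookup, not Node's own code; the function name nodeModulesLookupPaths and the example path are made up for illustration.

```javascript
const path = require('node:path');

// Lists the node_modules directories that would be tried, nearest first, for a
// bare import made from a file located in `startDir`.
// Simplified: real Node also skips path segments that are themselves named
// node_modules and checks a few global folders afterwards.
function nodeModulesLookupPaths(startDir, pathImpl = path) {
  const paths = [];
  let dir = pathImpl.resolve(startDir);
  while (true) {
    paths.push(pathImpl.join(dir, 'node_modules'));
    const parent = pathImpl.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return paths;
}

// Hypothetical project location; path.posix keeps the output the same on any OS.
// For the real bundle, the starting directory is the folder containing
// dist/scrapeBlogPost.js.
console.log(nodeModulesLookupPaths('/home/me/project/dist', path.posix));
```

For the hypothetical path above, the second entry is /home/me/project/node_modules, which is where npm install would have placed yargs. Copying only the dist file to another machine breaks that chain.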


[I was not satisfied with the above solution, as I wanted the setup we had before adding the yargs package: a single dist/scrapeBlogPost.js that runs without a node_modules folder.]

Gem: ... You are correct that esbuild has a specific challenge with yargs. This is due to how yargs uses dynamic requires and references Node.js-internal modules in a way that static bundlers like esbuild cannot fully analyze and bundle without causing errors.


Recommended Build Tools for yargs

Since yargs is primarily used for Node.js command-line interfaces (CLIs), the most common and compatible build tool is not a bundler at all, but rather the Node.js runtime itself.

The standard way to distribute a Node.js CLI is to:

  1. Do not bundle it. Just ship the source code, package.json, and node_modules folder. The user runs npm install and then executes the script directly with node.

  2. Use a bundler that's more flexible with dynamic dependencies. If you absolutely need a single-file executable, you can use other tools.

The two main alternatives to esbuild for this type of project are:

  • Webpack: Although known for its complexity, Webpack is highly configurable and can handle more complex dependency graphs, including the dynamic require() statements used by yargs. It can be configured to treat specific modules as external or to process them with specialized loaders.

  • Rollup: While primarily used for bundling JavaScript libraries, Rollup is also a great option for Node.js CLI tools. It has strong support for tree-shaking and can handle dynamic imports better than esbuild for certain use cases. Like Webpack, it can be configured to treat modules as external.

For a Node.js CLI application, however, the simplest and most reliable approach is often to skip the bundling process altogether. If the goal is a single-file executable, a better alternative to bundling would be a tool like pkg, which packages a Node.js project into a single executable file, including all dependencies, without the need for a complex bundling configuration.

Me: Disappointing that esbuild cannot handle yargs to create a single .js 'executable-like' bundled file. For my learning, I would like to later on explore using webpack.
...
Me: Sorry if I am being fixated on this. I feel very strongly that I must know how to bundle this simple program, even if it uses yargs, into one single .js file which can be run from anywhere with just the node filename command. ... Now I am mentally tired. ... I will pick this up again later. I am ready to get into webpack stuff if needed. BTW, Google Search AI says the following:

The statement "esbuild cannot handle yargs" is not entirely accurate. While there have been historical compatibility issues, particularly related to how yargs handles CommonJS (CJS) and ECMAScript Modules (ESM), these issues are generally resolvable.

Common Issues and Solutions:

ESM vs. CommonJS:

yargs versions 17.0.0 and above primarily use ESM. If your project or a dependency (like aem-clientlib-generator) is still using require() for yargs in a CJS context, esbuild might encounter errors.

Solution: Update your project to use ESM syntax (e.g., import yargs from 'yargs/yargs') or ensure your build process correctly handles ESM. If upgrading is not feasible, consider downgrading yargs to a version that still supports CJS (e.g., yargs@16.2.0).

Missing File Extensions:

In some cases, esbuild might struggle with yargs's entry points if they lack file extensions, even though Node.js handles them correctly.

Solution: Ensure esbuild's resolution algorithm or configuration is set to correctly identify and process such files.

Dependency Issues:

Sometimes, the problem lies not directly with yargs or esbuild but with how a dependency integrates yargs.

Solution: Update the problematic dependency to its latest version, as it might have addressed compatibility issues with newer yargs versions or ESM.

In summary, while esbuild might require specific configurations or updates to handle yargs correctly, especially with the shift to ESM, it is generally capable of bundling yargs when the project and dependencies are set up appropriately.

....

Gem: ... A user cannot simply download your dist file and run it. The single .js file you've created still relies on the yargs module being present in a node_modules folder at runtime.

This is the necessary trade-off for resolving the esbuild error. The build step was able to bundle all other dependencies like axios and cheerio into the single file, but it had to treat yargs as external to avoid the bundling error.


Solution for a Truly Standalone Executable

To achieve your goal of creating a single, portable file that can be run on any machine with just the file itself, you need to use a tool that packages the Node.js runtime with your script and all its dependencies.

The most popular tool for this is pkg. It's a command-line utility that creates a complete executable file for various operating systems (Windows, macOS, Linux) from your Node.js project.

Here’s how you would use it:

  1. Install pkg globally: npm install -g pkg

  2. Configure package.json: Add a bin property to your package.json to tell pkg which file to use as the entry point. You should point it to your original, unbundled scrapeBlogPost.js file, since pkg will handle all the dependencies itself: "bin": "scrapeBlogPost.js"

  3. Run pkg: pkg .

    • This command will produce a single executable file (e.g., scrapeBlogPost.exe on Windows) in your project's root folder. This file is completely standalone and does not require the user to have Node.js or node_modules installed. It is a true "executable-like" file.

Me: So far in React and Next.js I have been able to deploy without facing this kind of issue. But then they use Webpack or Vite which seem to be more mature for such matters than esbuild.

Gem: That's an excellent observation. You're spot on—React and Next.js projects typically use build tools like Webpack or Vite, which are more mature and configured out of the box to handle complex dependency issues like the one we've encountered.

This is a key difference in the build tool ecosystem. Webpack and Vite have more extensive plugin ecosystems and are specifically designed to address complex bundling scenarios, including dynamic imports and various module formats. esbuild, by design, prioritizes speed above all else. Its simplicity is a feature, but it means it doesn't have the same built-in logic or plugin support to handle every possible edge case, particularly with Node.js-specific modules like yargs.

The issue you faced highlights a key consideration when choosing a build tool:

  • esbuild: It's lightning-fast and great for bundling simple libraries or for development environments where speed is critical.

  • Webpack/Vite: They offer more powerful, configurable solutions for complex applications with a variety of dependencies, which is why they are the standard for frameworks like Next.js.

Your experience with this simple project gives you a valuable insight into the trade-offs that professional developers make when selecting tools for their projects.

============ end of this round of chat with Gemini =============

Given below are edited extracts of chat with ChatGPT (CG), 'Nodejs yargs esbuild issue'.

[ChatGPT suggested config fixes for the esbuild problem, which I tried, but none of them worked. Next, I wanted to give Webpack a quick try.]

Me: I would like to try Webpack, but only for a limited number of tries. If that too does not work, I will drop the dist bundle possibility for this command line utility.

[CG provided the instructions. I tried that and was able to get a successful build but with warnings which CG said I could ignore. However, on running the created dist/scrapeBlogPost.js I got an error.]

Me: Hmm. So even with webpack, yargs has issues. That's really disappointing! Does yargs documentation clearly state that it has these issues? I mean, I got suckered into using yargs by an AI chatbot not realizing that it has these build issues. Now I don't want to invest the time to remove yargs and use something else that is more bundler friendly.

CG: Yeah 😕 I get your frustration — you’ve been super patient, and it’s disappointing when the tooling doesn’t line up with expectations. Let me break it down:


🔍 Why yargs and bundlers don’t mix well

  • yargs was designed first and foremost as a pure Node.js CLI library (no bundler, just require('yargs')).

  • It leans heavily on:

    • Dynamic require() (commandDir(), locale files, etc.)

    • Filesystem access (scans directories for commands)

    • import.meta.url + fileURLToPath() tricks in modern versions

  • Bundlers (esbuild, Webpack, Rollup) are designed to “see” all code at build time. Dynamic requires and runtime path resolution break that model.

So, yargs works beautifully in plain Node, but bundling is fragile.
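
The import.meta.url point may be what lies behind the createRequire error shown earlier, although I have not confirmed that this is what happened in my build. As I understand it, when esbuild emits CommonJS output it replaces import.meta with an empty object, so import.meta.url becomes undefined, and Node's createRequire rejects undefined. The sketch below reproduces only that last step using Node built-ins; the createRequireErrorCode helper and the file name are made up for the illustration.

```javascript
const { createRequire } = require('node:module');
const { pathToFileURL } = require('node:url');
const path = require('node:path');

// Returns the error code thrown by createRequire(filename), or null on success.
function createRequireErrorCode(filename) {
  try {
    createRequire(filename);
    return null;
  } catch (err) {
    return err.code;
  }
}

// In a real ES module, import.meta.url is a file: URL string like this one.
// (Hypothetical file name; the file does not need to exist for createRequire.)
const realModuleUrl = pathToFileURL(path.resolve('scrapeBlogPost.js')).href;
console.log(createRequireErrorCode(realModuleUrl)); // null

// Stand-in for import.meta after a bundler has replaced it with an empty object.
const emptyImportMeta = {};
console.log(createRequireErrorCode(emptyImportMeta.url)); // ERR_INVALID_ARG_VALUE
```

The second call fails with the same error code that appears in the stack trace from the bundled dist file.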


📖 Documentation reality

  • yargs docs don’t really mention bundlers.

  • Their official stance is more: “yargs is a Node.js CLI parser.”

  • There are open issues on GitHub where people ran into bundling problems (with Rollup, esbuild, Webpack).

  • But you’re right: there’s no big red warning in the README that says “⚠️ Don’t use yargs if you plan to bundle.”
    → That’s why you got blindsided.


⚡ Options going forward

  1. Keep yargs, drop bundling

    • Ship your CLI as source (npm package with dependencies).

    • Users run npm install -g <your-repo> and get scrapeBlogPost in PATH.

    • This is the path of least pain.

  2. Switch CLI parser (if you had energy for it)

    • Alternatives like commander or oclif are more bundler-friendly.

    • But that’s a refactor — I hear you that you don’t want to reinvest that time.

  3. Use pkg

    • No bundling, no Node/npm requirement for end users.

    • You can still use yargs without caring about bundler quirks.

    • Produces .exe / Linux / macOS binaries.


✅ Pragmatic conclusion

Since you already built around yargs:

  • Don’t rip it out.

  • Don’t keep fighting esbuild/Webpack.

  • Either:

    • Distribute as plain Node project (npm install once, then node scrapeBlogPost.js), or

    • Use pkg for true binaries.

That way, your time is spent on the actual utility, not fighting bundler incompatibilities.

...

[The trials with ChatGPT had left the build.js config (used for esbuild) broken: even running the minified and bundled dist/scrapeBlogPost.js from the dist directory failed. So I restored build.js to an earlier committed version that worked for that case. The result is still not a single all-in-one scrapeBlogPost.js, as yargs needs to be picked up from the node_modules directory, but it is something I can easily run via my PowerShell script.]

Me: I was able to restore build.js to a version that partially worked (with external yargs). I was able to do an esbuild build and can run that build through my PowerShell script, which runs it from the dist folder. That, I think, results in node looking for a node_modules folder in the parent of dist, finding what it wants there, and so not giving an error.

CG: 👏 Nice work! You figured out exactly what’s happening.

What you restored is the version where esbuild was run with external: ['yargs']. That means:

  • dist/scrapeBlogPost.js contains your code bundled, but not yargs.

  • When you run your PowerShell script from inside dist/, Node looks upward in the folder hierarchy for node_modules/.

  • It finds node_modules/yargs in the project root, so everything works.


🔍 Key takeaway

This is why your current setup “works”:

  • Bundled dist file in dist/

  • But still dependent on node_modules in parent folder for yargs (and maybe axios, if you also excluded it).

So it’s not a truly self-contained .js bundle, but it’s a hybrid: bundled + externalized dependencies. That’s a valid compromise.
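
One rough way to check whether a bundle is really self-contained is to scan it for require calls that name third-party packages. The sketch below is only a heuristic under stated assumptions: it looks for require with a literal string argument, which is how I would expect externalized packages to appear in a CommonJS bundle, and the findExternalRequires function name is made up. It can be fooled by matching text inside string data and by dynamically built module names.

```javascript
const { builtinModules } = require('node:module');

// Heuristic: returns the third-party package names that `source` still loads
// through require("...") or require('...') with a literal string argument.
function findExternalRequires(source) {
  const found = new Set();
  const pattern = /\brequire\(\s*(["'])([^"']+)\1\s*\)/g;
  for (const match of source.matchAll(pattern)) {
    const specifier = match[2];
    const isRelative = specifier.startsWith('.') || specifier.startsWith('/');
    const isBuiltin =
      specifier.startsWith('node:') || builtinModules.includes(specifier);
    if (!isRelative && !isBuiltin) found.add(specifier);
  }
  return [...found].sort();
}

// Made-up snippet standing in for the contents of a minified bundle.
const sample =
  'var a=require("fs"),b=require("node:path"),c=require("yargs"),d=require("./local");';
console.log(findExternalRequires(sample)); // [ 'yargs' ]
```

To check the real file, pass it the text of the bundle, for example findExternalRequires(require('node:fs').readFileSync('dist/scrapeBlogPost.js', 'utf8')). An empty result suggests, but does not prove, that the file can run without a node_modules folder.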


✅ You now have three viable strategies

  1. Current hybrid approach (working now)

    • Use esbuild, externalize tricky packages like yargs.

    • Distribute dist/ + package.json (so users must npm install).

    • 👍 Less breakage risk, but not standalone.

  2. Pure self-contained (ideal but brittle)

    • Bundle everything, fight with esbuild/Webpack quirks.

    • 👍 One file, no install.

    • 👎 You already hit too many edge cases with yargs.

  3. No bundle, just pkg

    • Use pkg to produce .exe (Windows) / binaries (Linux, macOS).

    • 👍 Truly standalone, no Node/npm needed for users.

    • 👎 Bigger artifacts, slower to build.

============ end of related session in CG chat ==========

