yargs npm package does not work well with bundlers like esbuild and webpack
This new design is a significant improvement because it:
- Decouples Concerns: You're no longer mixing the output format (pure HTML vs. full HTML) with the output destination (console vs. file). This makes the program's behavior more predictable and easier to understand.
- Improves User Experience: It's more intuitive to specify the output format with a dedicated flag and then, optionally, specify where that output should go.
- Maintains a Clear Default: Setting "full HTML document" as the default output format aligns with your most frequent use case (--backup), but it does so in a more explicit and scalable way.
To implement this, you'll need to modify how the script processes process.argv. Instead of using simple positional checks, it's a good practice to use named flags (e.g., --format or -f and --output or -o). While you can do this manually, for more complex command-line interfaces, it's often better to use a dedicated library like yargs or commander to handle argument parsing. These libraries provide features like default values, help text, and validation, making your CLI more robust and user-friendly.
Here's a breakdown of how the logic would be structured:
- The script will still require the URL as the first argument.
- The second argument would be a flag to specify the output format. For example, --pure-html or --full-html (the latter being the default).
- The third argument, if present, would be the output filename. If it's missing, the script would default to printing to the console.
This approach is a great next step in making the program more maintainable and user-friendly.
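The argument handling outlined above could be sketched by hand roughly as follows. This is only an illustration under assumptions: parseCliArgs is a hypothetical helper name, the example URL and filename are made up, and the real script went on to use yargs rather than manual parsing.

```javascript
// Sketch: parse "<url> [--pure-html | --full-html] [outputFile]".
// parseCliArgs is a hypothetical name, not taken from the actual script.
function parseCliArgs(args) {
  const [url, ...rest] = args;
  if (!url || url.startsWith('--')) {
    throw new Error('Usage: node scrapeBlogPost.js <url> [--pure-html|--full-html] [outputFile]');
  }

  let format = 'full-html'; // full HTML document is the default format
  let outputFile = null;    // null means "print to the console"

  for (const arg of rest) {
    if (arg === '--pure-html') {
      format = 'pure-html';
    } else if (arg === '--full-html') {
      format = 'full-html';
    } else if (arg.startsWith('--')) {
      throw new Error(`Unknown flag: ${arg}`);
    } else if (outputFile === null) {
      outputFile = arg;
    } else {
      throw new Error(`Unexpected extra argument: ${arg}`);
    }
  }
  return { url, format, outputFile };
}

// In a real script the input would be process.argv.slice(2);
// a literal array is used here so the example runs on its own.
console.log(parseCliArgs(['https://example.com/post', '--pure-html', 'post.html']));
```

Keeping the format flag and the output filename as separate values is what keeps the two concerns decoupled: either one can be omitted without affecting the other.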
... yargs library to implement this new command-line interface.
node:internal/modules/cjs/loader:1966
throw new ERR_INVALID_ARG_VALUE('filename', filename, createRequireError);
^
TypeError [ERR_INVALID_ARG_VALUE]: The argument 'filename' must be a file URL object, file URL string, or absolute path string. Received undefined
at createRequire (node:internal/modules/cjs/loader:1966:11)
esbuild sometimes struggles to bundle certain Node.js-specific modules correctly, particularly when they reference internal Node.js APIs in a way that the bundler doesn't expect. A common and simple fix is to tell esbuild not to bundle yargs but to treat it as an external dependency that Node.js will load at runtime.
Adding yargs to the external array means it's not bundled into the final dist file.
Impact on Efficiency
The script will now rely on Node.js's module resolution system to find and load yargs at runtime. The yargs library and its dependencies will need to be present in your node_modules folder when you run the script, just as they are for your original, unbundled script.
For most scenarios, this will not introduce any noticeable inefficiency. The yargs package is relatively small, and Node.js's module loading is highly optimized. The performance difference will be negligible unless your application has an extremely tight performance budget where every millisecond counts. The main benefit of bundling (a smaller, single-file distribution) is retained for all other dependencies like axios and cheerio, which are much larger.
The primary goal of this particular build step was to create a single executable file that could be run without needing to install dependencies separately. By making yargs external, that goal is partially compromised for yargs itself, but it's a necessary trade-off to resolve the bundling error caused by esbuild's incompatibility with yargs's internal structure.
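For reference, marking yargs as external in an esbuild build script might look something like the options below. This is a sketch, not the actual build.js from this project: the entry point path and the platform setting are assumptions, while the outfile, minification, and external: ['yargs'] follow what is described in this post.

```javascript
// build.js (sketch): esbuild options that bundle everything except yargs.
const buildOptions = {
  entryPoints: ['scrapeBlogPost.js'], // assumed location of the unbundled source
  bundle: true,                       // inline axios, cheerio, and other dependencies
  platform: 'node',                   // assumption: target Node.js, keep built-ins unbundled
  minify: true,
  outfile: 'dist/scrapeBlogPost.js',
  external: ['yargs'],                // leave require('yargs') to be resolved at runtime
};

// With esbuild installed as a dev dependency, the build would be started with:
//   require('esbuild').build(buildOptions).catch(() => process.exit(1));
module.exports = buildOptions;
```

Everything listed in external is left as a plain require/import in the output, which is why node_modules must still be reachable when the bundled file runs.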
Me: But will that mean that I need to keep a node_modules directory with yargs and its dependencies in the folder where I will be having the bundled ScrapeBlogPost.js?
Gem: Yes, you are correct. Because yargs is no longer bundled into the dist file, you will need to keep the node_modules directory with yargs and its dependencies in the same location as your package.json file.
The dist directory containing scrapeBlogPost.js is a sibling to the node_modules directory. Node.js's module resolution system will look for the required yargs module in the node_modules directory of the current project.
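To make that lookup concrete, the search Node.js performs for a bare module name such as yargs can be approximated as below. This is a simplified sketch of the documented node_modules search, not Node's actual implementation; nodeModulesCandidates is a hypothetical helper name and /project/dist is a made-up path. As far as the documented algorithm goes, the search starts from the directory of the file that calls require(), then walks up through each parent directory.

```javascript
const path = require('node:path');

// Approximates the directories Node.js checks, in order, for a bare
// specifier like 'yargs' required from a file located in startDir.
function nodeModulesCandidates(startDir) {
  const candidates = [];
  let dir = path.resolve(startDir);
  while (true) {
    // Node does not look for node_modules/node_modules.
    if (path.basename(dir) !== 'node_modules') {
      candidates.push(path.join(dir, 'node_modules'));
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return candidates;
}

// For a script in <project>/dist, the second candidate is <project>/node_modules.
console.log(nodeModulesCandidates('/project/dist'));
```

Under this model, a bundled file in dist finds yargs in the project root's node_modules because that folder is one level up from the script itself.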
Solution:
The best approach, which you have already documented in your README.md, is to use a simple PowerShell or other shell script to run the bundled file from the project's root directory, regardless of where you are on the file system.
The PowerShell script you included in your documentation handles this by specifying the full path to the bundled script:
node <full_path_to_project_folder>\dist\scrapeBlogPost.js
By doing this, you ensure that the node command is executed from a directory that has access to the necessary node_modules folder, even if the user is in a different directory when they run the command.
This is a good, standard practice for distributing a Node.js CLI tool that has external dependencies. It keeps the core built file small while still allowing it to be easily used as a command-line tool.
[I was not satisfied with the above solution, as I wanted the single-file dist/scrapeBlogPost.js solution, with no node_modules folder, that we had before adding the yargs package.]
Gem: ... You are correct that esbuild has a specific challenge with yargs. This is due to how yargs uses dynamic requires and references Node.js-internal modules in a way that static bundlers like esbuild cannot fully analyze and bundle without causing errors.
Recommended Build Tools for yargs
Since yargs is primarily used for Node.js command-line interfaces (CLIs), the most common and compatible build tool is not a bundler at all, but rather the Node.js runtime itself.
The standard way to distribute a Node.js CLI is to:
- Do not bundle it. Just ship the source code, package.json, and node_modules folder. The user runs npm install and then executes the script directly with node.
- Use a bundler that's more flexible with dynamic dependencies. If you absolutely need a single-file executable, you can use other tools.
The two main alternatives to esbuild for this type of project are:
- Webpack: Although known for its complexity, Webpack is highly configurable and can handle more complex dependency graphs, including the dynamic require() statements used by yargs. It can be configured to treat specific modules as external or to process them with specialized loaders.
- Rollup: While primarily used for bundling JavaScript libraries, Rollup is also a great option for Node.js CLI tools. It has strong support for tree-shaking and can handle dynamic imports better than esbuild for certain use cases. Like Webpack, it can be configured to treat modules as external.
For a Node.js CLI application, however, the simplest and most reliable approach is often to skip the bundling process altogether. If the goal is a single-file executable, a better alternative to bundling would be a tool like pkg, which packages a Node.js project into a single executable file, including all dependencies, without the need for a complex bundling configuration.
The statement "esbuild cannot handle yargs" is not entirely accurate. While there have been historical compatibility issues, particularly related to how yargs handles CommonJS (CJS) and ECMAScript Modules (ESM), these issues are generally resolvable.
Common Issues and Solutions:
ESM vs. CommonJS:
yargs versions 17.0.0 and above primarily use ESM. If your project or a dependency (like aem-clientlib-generator) is still using require() for yargs in a CJS context, esbuild might encounter errors.
Solution: Update your project to use ESM syntax (e.g., import yargs from 'yargs/yargs') or ensure your build process correctly handles ESM. If upgrading is not feasible, consider downgrading yargs to a version that still supports CJS (e.g., yargs@16.2.0).
Missing File Extensions:
In some cases, esbuild might struggle with yargs's entry points if they lack file extensions, even though Node.js handles them correctly.
Solution: Ensure esbuild's resolution algorithm or configuration is set to correctly identify and process such files.
Dependency Issues:
Sometimes, the problem lies not directly with yargs or esbuild but with how a dependency integrates yargs.
Solution: Update the problematic dependency to its latest version, as it might have addressed compatibility issues with newer yargs versions or ESM.
In summary, while esbuild might require specific configurations or updates to handle yargs correctly, especially with the shift to ESM, it is generally capable of bundling yargs when the project and dependencies are set up appropriately.
....
Gem: ... A user cannot simply download your dist file and run it. The single .js file you've created still relies on the yargs module being present in a node_modules folder at runtime.
This is the necessary trade-off for resolving the esbuild error. The build step was able to bundle all other dependencies like axios and cheerio into the single file, but it had to treat yargs as external to avoid the bundling error.
Solution for a Truly Standalone Executable
To achieve your goal of creating a single, portable file that can be run on any machine with just the file itself, you need to use a tool that packages the Node.js runtime with your script and all its dependencies.
The most popular tool for this is pkg. It's a command-line utility that creates a complete executable file for various operating systems (Windows, macOS, Linux) from your Node.js project.
Here’s how you would use it:
- Install pkg globally: npm install -g pkg
- Configure package.json: Add a bin property to your package.json to tell pkg which file to use as the entry point. You should point it to your original, unbundled scrapeBlogPost.js file, since pkg will handle all the dependencies itself: "bin": "scrapeBlogPost.js"
- Run pkg: pkg .
This command will produce a single executable file (e.g., scrapeBlogPost.exe on Windows) in your project's root folder. This file is completely standalone and does not require the user to have Node.js or node_modules installed. It is a true "executable-like" file.
Me: So far in React and Next.js I have been able to deploy without facing this kind of issue. But then they use Webpack or Vite which seem to be more mature for such matters than esbuild.
Gem: That's an excellent observation. You're spot on—React and Next.js projects typically use build tools like Webpack or Vite, which are more mature and configured out of the box to handle complex dependency issues like the one we've encountered.
This is a key difference in the build tool ecosystem. Webpack and Vite have more extensive plugin ecosystems and are specifically designed to address complex bundling scenarios, including dynamic imports and various module formats. ... yargs.
The issue you faced highlights a key consideration when choosing a build tool:
esbuild: It's lightning-fast and great for bundling simple libraries or for development environments where speed is critical.
Webpack/Vite: They offer more powerful, configurable solutions for complex applications with a variety of dependencies, which is why they are the standard for frameworks like Next.js.
Your experience with this simple project gives you a valuable insight into the trade-offs that professional developers make when selecting tools for their projects.
============ end of this round of chat with Gemini =============
Given below are edited extracts of chat with ChatGPT (CG), 'Nodejs yargs esbuild issue'.
[ChatGPT suggested config fixes for esbuild problem which I tried but none of them worked. Next I wanted to have a quick try of Webpack.]
Me: I would like to try Webpack but just limited number of tries. If that too does not work, I will drop the dist bundle possibility for this command line utility.
[CG provided the instructions. I tried that and was able to get a successful build but with warnings which CG said I could ignore. However, on running the created dist/scrapeBlogPost.js I got an error.]
Me: Hmm. So even with webpack, yargs has issues. That's really disappointing! Does yargs documentation clearly state that it has these issues? I mean, I got suckered into using yargs by an AI chatbot not realizing that it has these build issues. Now I don't want to invest the time to remove yargs and use something else that is more bundler friendly.
CG: Yeah 😕 I get your frustration — you’ve been super patient, and it’s disappointing when the tooling doesn’t line up with expectations. Let me break it down:
🔍 Why yargs and bundlers don’t mix well
- yargs was designed first and foremost as a pure Node.js CLI library (no bundler, just require('yargs')).
- It leans heavily on:
  - Dynamic require() (commandDir(), locale files, etc.)
  - Filesystem access (scans directories for commands)
  - import.meta.url + fileURLToPath() tricks in modern versions
- Bundlers (esbuild, Webpack, Rollup) are designed to “see” all code at build time. Dynamic requires and runtime path resolution break that model.
So, yargs works beautifully in plain Node, but bundling is fragile.
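The kind of runtime loading described above can be illustrated with a small stand-alone sketch. This is not yargs's actual code: loadCommands, the temporary commands directory, and the hello.js file are all made up for the example. The point is only that the require() target is computed from a directory listing, so a bundler reading the source has no literal path to follow.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Create a throwaway "commands" directory at runtime, standing in for a
// folder that a CLI library might scan for command modules.
const commandsDir = fs.mkdtempSync(path.join(os.tmpdir(), 'commands-'));
fs.writeFileSync(
  path.join(commandsDir, 'hello.js'),
  "module.exports = { name: 'hello' };\n"
);

// The module paths are only known after reading the directory, so static
// analysis of this file cannot tell which modules will be loaded.
function loadCommands(dir) {
  return fs
    .readdirSync(dir)
    .filter((file) => file.endsWith('.js'))
    .map((file) => require(path.join(dir, file)));
}

const commands = loadCommands(commandsDir);
console.log(commands.map((command) => command.name)); // prints [ 'hello' ]
```

In plain Node this works because the files exist on disk when the script runs; in a single bundled file there is nothing on disk for such a call to find unless the bundler is told about it explicitly.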
📖 Documentation reality
- yargs docs don’t really mention bundlers.
- Their official stance is more: “yargs is a Node.js CLI parser.”
- There are open issues on GitHub where people ran into bundling problems (with Rollup, esbuild, Webpack).
- But you’re right: there’s no big red warning in the README that says “⚠️ Don’t use yargs if you plan to bundle.”
→ That’s why you got blindsided.
⚡ Options going forward
- Keep yargs, drop bundling
  - Ship your CLI as source (npm package with dependencies).
  - Users run npm install -g <your-repo> and get scrapeBlogPost in PATH.
  - This is the path of least pain.
- Switch CLI parser (if you had energy for it)
- Use pkg
  - No bundling, no Node/npm requirement for end users.
  - You can still use yargs without caring about bundler quirks.
  - Produces .exe / Linux / macOS binaries.
✅ Pragmatic conclusion
Since you already built around yargs:
- Don’t rip it out.
- Don’t keep fighting esbuild/Webpack.
- Either:
  - Distribute as plain Node project (npm install once, then node scrapeBlogPost.js), or
  - Use pkg for true binaries.
That way, your time is spent on the actual utility, not fighting bundler incompatibilities.
...
[The trials with ChatGPT had resulted in build.js (used for esbuild) config breaking even when running the minified and bundled dist/scrapeBlogPost.js from that dist directory. So I restored build.js to an earlier commit version which worked for the above case. So still not single all-in-one scrapeBlogPost.js as yargs needed to be picked up from node_modules directory, but something that I could easily run via my Powershell script.]
Me: I was able to restore build.js to a version that partially worked (with external yargs). I was able to do an esbuild and am able to run that build through my powershell script that runs it from the dist folder. That, I think, results in node looking up node_modules folder in parent of dist and finding what it wants there and so not giving an error.
CG: 👏 Nice work! You figured out exactly what’s happening.
What you restored is the version where esbuild was run with external: ['yargs']. That means:
- dist/scrapeBlogPost.js contains your code bundled, but not yargs.
- When you run your PowerShell script from inside dist/, Node looks upward in the folder hierarchy for node_modules/.
- It finds node_modules/yargs in the project root, so everything works.
🔍 Key takeaway
This is why your current setup “works”:
- Bundled dist file in dist/
- But still dependent on node_modules in parent folder for yargs (and maybe axios, if you also excluded it).
So it’s not a truly self-contained .js bundle, but it’s a hybrid: bundled + externalized dependencies. That’s a valid compromise.
✅ You now have three viable strategies
- Current hybrid approach (working now)
  - Use esbuild, externalize tricky packages like yargs.
  - Distribute dist/ + package.json (so users must npm install).
  - 👍 Less breakage risk, but not standalone.
- Pure self-contained (ideal but brittle)
  - Bundle everything, fight with esbuild/Webpack quirks.
  - 👍 One file, no install.
  - 👎 You already hit too many edge cases with yargs.
- No bundle, just pkg
  - Use pkg to produce .exe (Windows) / binaries (Linux, macOS).
  - 👍 Truly standalone, no Node/npm needed for users.
  - 👎 Bigger artifacts, slower to build.
============ end of related session in CG chat ==========