Building a Production CLI Tool in Rust From Scratch
From cargo init to a polished, distributable binary -- every decision, trade-off, and lesson learned along the way.
Introduction
Six months ago, I decided to build a CLI tool in Rust. Not as a learning exercise -- though it certainly became one -- but because I had a genuine, recurring problem that none of the existing tools solved well enough. I needed something that could watch a directory tree for changes, apply a set of transformation rules, and output the results to either stdout or a file, all while being fast enough to run on every file save without interrupting my flow.
I'd been writing Node.js for years, and my first instinct was to reach for commander and chokidar. But the startup time of a Node process -- even a well-optimized one -- was noticeable. On a warm machine, you're looking at 40-80ms before your code even starts executing. That might not sound like much, but when a tool runs on every keystroke, it adds up fast. I wanted something that felt instantaneous.
Why Rust for CLI Tools
The Rust ecosystem has quietly become one of the best environments for building command-line tools. This isn't just about performance -- though sub-millisecond startup times are hard to argue with. It's about the entire developer experience from cargo init to a published binary.
Here's what makes Rust compelling for CLI work specifically:
- Single binary distribution. No runtime, no dependency hell. Copy the binary and it works.
- Cross-compilation. Target Linux, macOS, and Windows from a single machine with `cross`.
- Excellent crate ecosystem. `clap`, `tokio`, `serde`, `indicatif` -- the tools are mature and well-maintained.
- The type system catches your mistakes. In a CLI, this matters more than you'd think. User input is inherently untrustworthy, and Rust's `Result`/`Option` types force you to handle every edge case.
- Memory safety without a garbage collector. Your tool won't randomly pause for GC during a long-running watch operation.
Project Structure
I spent more time on project structure than I'd like to admit. Coming from Node.js, where you throw files in directories and import them, Rust's module system felt restrictive at first. But once I understood it, I realized it was pushing me toward better architecture.
Here's what I settled on:
```
src/
  main.rs           // Entry point, CLI arg parsing
  lib.rs            // Public API (for library usage)
  cli/
    mod.rs          // CLI module root
    args.rs         // Argument definitions (clap)
    commands/
      mod.rs
      watch.rs      // Watch command implementation
      transform.rs  // Transform command implementation
      init.rs       // Init command implementation
  core/
    mod.rs
    watcher.rs      // File system watcher
    transformer.rs  // Transformation engine
    rules.rs        // Rule parsing and validation
  config/
    mod.rs
    loader.rs       // Config file discovery and loading
    schema.rs       // Config schema (serde)
  output/
    mod.rs
    formatter.rs    // Output formatting
    writer.rs       // File/stdout writer
  error.rs          // Custom error types
```
Key Structural Decisions
The separation between cli/ and core/ is deliberate. The core module knows nothing about command-line arguments -- it's a pure library. The CLI module is a thin translation layer that maps user input to core function calls. This means I can also expose the core as a library crate for programmatic use, which has already paid off in testing.
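To make that boundary concrete, here's a minimal, self-contained sketch of the shape. The names (`TransformOptions`, `run_transform`, the uppercase rule) are hypothetical stand-ins, not the tool's real API -- the point is that `core` takes plain data and returns plain data, while `cli` only translates:

```rust
// Hypothetical sketch of the cli/ -> core/ boundary.
// `core` knows nothing about argument parsing; `cli` is a thin translation layer.

mod core {
    pub struct TransformOptions {
        pub uppercase: bool,
    }

    // Pure library function: plain data in, plain data out. No clap in sight.
    pub fn transform(input: &str, opts: &TransformOptions) -> String {
        if opts.uppercase {
            input.to_uppercase()
        } else {
            input.to_string()
        }
    }
}

mod cli {
    use super::core;

    // Stand-in for a parsed clap struct.
    pub struct TransformArgs {
        pub input: String,
        pub uppercase: bool,
    }

    // Map CLI args onto core types and call through -- nothing else.
    pub fn run_transform(args: TransformArgs) -> String {
        let opts = core::TransformOptions {
            uppercase: args.uppercase,
        };
        core::transform(&args.input, &opts)
    }
}

fn main() {
    let out = cli::run_transform(cli::TransformArgs {
        input: "hello".into(),
        uppercase: true,
    });
    println!("{out}"); // prints "HELLO"
}
```

Because `core::transform` never touches argument parsing, it can be called from tests, a library consumer, or a different frontend without change.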
When I started this project, structopt was still a common recommendation. But clap absorbed structopt's derive macro functionality back in v3, making structopt effectively deprecated. The derive API in clap v4 is clean and well-documented.
More importantly, clap gives you a Command builder API for when the derive macros aren't expressive enough. Some of my subcommands have complex validation logic that's easier to express imperatively. Having both options in the same crate means no awkward bridging between libraries.
Having a lib.rs alongside main.rs means your crate is both a binary and a library. This is a pattern I picked up from ripgrep and other well-structured Rust CLIs. The benefit is twofold:
First, your integration tests can import the library and test core logic without spawning a subprocess. This is dramatically faster and gives you much better error messages on failure.
Second, other tools can depend on your crate as a library. I didn't plan for this originally, but when I wanted to write a VS Code extension that used the same transformation engine, having a clean library interface saved weeks of work.
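The testing payoff looks roughly like this. In the real crate the function would live in lib.rs and the test in tests/, importing it with `use fwatch::...`; `apply_rules` here is a hypothetical stand-in, not the tool's actual API:

```rust
// Sketch of the in-process integration test pattern. Because the crate is
// both a library and a binary, tests call the library directly instead of
// spawning the compiled binary as a subprocess.

// Stand-in for a function lib.rs might export:
pub fn apply_rules(input: &str) -> Result<String, String> {
    if input.is_empty() {
        return Err("empty input".to_string());
    }
    Ok(input.trim().to_string())
}

fn main() {
    // In a real crate these assertions would live in tests/transform.rs --
    // no subprocess, fast feedback, and precise failure messages.
    assert_eq!(apply_rules("  hello  ").unwrap(), "hello");
    assert!(apply_rules("").is_err());
    println!("ok");
}
```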
Argument Parsing with Clap
Every CLI tool lives or dies by its argument parsing. Get this wrong and your users will fight the tool instead of using it. Clap v4's derive macros make the common cases trivial, but there are a few patterns that took me a while to figure out.
```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "fwatch")]
#[command(about = "Watch and transform files with surgical precision")]
#[command(version)]
pub struct Cli {
    #[command(subcommand)]
    pub command: Commands,

    /// Enable verbose output
    #[arg(short, long, global = true)]
    pub verbose: bool,

    /// Path to config file (default: .fwatch.toml)
    #[arg(short, long, global = true)]
    pub config: Option<PathBuf>,
}

#[derive(Subcommand)]
pub enum Commands {
    /// Watch a directory for changes and apply transforms
    Watch {
        /// Directory to watch
        #[arg(default_value = ".")]
        path: PathBuf,

        /// Debounce interval in milliseconds
        #[arg(short, long, default_value_t = 100)]
        debounce: u64,
    },

    /// Run transforms on a file or directory once
    Transform {
        /// Input path
        input: PathBuf,

        /// Output path (default: stdout)
        #[arg(short, long)]
        output: Option<PathBuf>,
    },

    /// Initialize a new .fwatch.toml config
    Init,
}
```
The `global = true` flag on `verbose` and `config` means these flags can appear before or after the subcommand. Without it, `fwatch --verbose watch` would work but `fwatch watch --verbose` would fail. Users expect both to work.
Error Handling Strategy
Error handling in CLI tools is different from error handling in libraries. In a library, you want errors to be precise and composable. In a CLI, you want errors to be helpful. The user who sees "No such file or directory" needs to also see which file, what they can do about it, and ideally a suggestion.
I went through three iterations on error handling before landing on a pattern I'm happy with. The key insight was separating "internal errors" (which use thiserror for composition) from "user-facing errors" (which use miette for rich diagnostics).
```rust
use miette::{Diagnostic, NamedSource, SourceSpan};
use thiserror::Error;

#[derive(Error, Diagnostic, Debug)]
pub enum AppError {
    #[error("Config file not found")]
    #[diagnostic(
        code(fwatch::config::not_found),
        help("Run `fwatch init` to create a default config file")
    )]
    ConfigNotFound,

    #[error("Invalid rule in config")]
    #[diagnostic(code(fwatch::config::invalid_rule))]
    InvalidRule {
        #[source_code]
        src: NamedSource<String>,
        #[label("this rule pattern is malformed")]
        span: SourceSpan,
        #[help]
        advice: String,
    },

    #[error("Watch target does not exist: {path}")]
    #[diagnostic(
        code(fwatch::watch::target_missing),
        help("Check the path and ensure the directory exists")
    )]
    TargetMissing { path: String },
}
```
The beauty of miette is that it produces genuinely helpful error output. When a user makes a syntax error in their config file, they see the exact line, a pointer to the problem, and a suggestion for how to fix it. This is the kind of thing that separates tools people tolerate from tools people enjoy.
Avoid `unwrap()` in user-facing code. Every `.unwrap()` is a potential panic, and a panic in a CLI means an ugly, unhelpful stack trace. Use `?` propagation and let your error types produce readable messages. The only acceptable places for `unwrap()` are tests and truly invariant conditions.
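A stdlib-only sketch of that pattern (the error variant and `fwatch init` suggestion mirror the miette example above, but `load_config` and its message text are hypothetical): the error type carries enough context to print which file failed and what to do about it, so a missing file becomes a readable diagnostic rather than a panic.

```rust
use std::fmt;
use std::fs;
use std::path::Path;

// User-facing error: knows which file was involved and suggests a fix.
#[derive(Debug)]
enum AppError {
    ConfigNotFound(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::ConfigNotFound(path) => write!(
                f,
                "config file not found: {path}\nhelp: run `fwatch init` to create one"
            ),
        }
    }
}

// No .unwrap(): the io::Error is mapped into a helpful AppError instead.
fn load_config(path: &Path) -> Result<String, AppError> {
    fs::read_to_string(path)
        .map_err(|_| AppError::ConfigNotFound(path.display().to_string()))
}

fn main() {
    match load_config(Path::new("/no/such/.fwatch.toml")) {
        Ok(cfg) => println!("{cfg}"),
        Err(e) => eprintln!("error: {e}"), // readable message, no stack trace
    }
}
```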
The File Watching Challenge
File watching sounds simple until you actually try to do it reliably. Every operating system handles filesystem events differently. Linux uses inotify, macOS uses FSEvents, Windows uses ReadDirectoryChangesW. Each has quirks: inotify doesn't handle recursive watching natively, FSEvents can batch and delay events, and Windows... well, Windows is Windows.
The notify crate abstracts most of this away, but there are still decisions to make. The biggest one: debouncing.
Debouncing Done Right
When you save a file in most editors, the OS doesn't emit a single event. You might get a Write, then a Rename (because the editor wrote to a temp file and renamed), then another Write for metadata. Without debouncing, your transform runs three times for one save. With naive debouncing (a simple timer reset), you risk missing legitimate rapid changes.
My approach was a sliding window with event deduplication. Events within a configurable window (default: 100ms) are collected, deduplicated by path, and then dispatched as a batch. This handles the "editor save" case without missing genuine rapid-fire changes from tools like code generators.
Rather than using notify's built-in debouncer (which is fine for simple cases), I built a custom debouncing layer using tokio::sync::mpsc channels. The watcher pushes raw events into one end of the channel, and a dedicated task on the other end collects them into batches.
The key insight is using tokio::time::timeout on the receive end. After the first event arrives, we wait for the debounce duration. If more events arrive within that window, we extend the window. Once the window expires with no new events, we flush the batch.
This gives us zero-allocation debouncing (we reuse a Vec buffer) and complete control over the batching semantics. The tradeoff is more code to maintain, but for a core feature of the tool, that's worth it.
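The actual implementation uses tokio channels and `tokio::time::timeout`, but the windowing logic can be sketched with the stdlib alone -- `recv_timeout` plays the role of the timeout, and each arriving event restarts the window (a simplified sketch, not the tool's real debouncer):

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

// Sliding-window debouncer: after an event arrives, keep extending the
// window while more events come in; flush the deduplicated batch once the
// window expires (or the channel closes).
fn debounce(rx: Receiver<PathBuf>, window: Duration) -> Vec<Vec<PathBuf>> {
    let mut batches = Vec::new();
    let mut batch = BTreeSet::new(); // dedupe by path

    loop {
        match rx.recv_timeout(window) {
            Ok(path) => {
                // Event inside the window: record it and wait again --
                // this is the "extend the window" step.
                batch.insert(path);
            }
            Err(RecvTimeoutError::Timeout) if !batch.is_empty() => {
                // Window expired with no new events: flush the batch.
                batches.push(std::mem::take(&mut batch).into_iter().collect());
            }
            Err(RecvTimeoutError::Timeout) => {} // idle, nothing pending
            Err(RecvTimeoutError::Disconnected) => {
                if !batch.is_empty() {
                    batches.push(batch.into_iter().collect());
                }
                return batches;
            }
        }
    }
}

fn main() {
    let (tx, rx) = channel();
    let worker = thread::spawn(move || debounce(rx, Duration::from_millis(100)));

    // Simulate an "editor save": three rapid events for the same file.
    for _ in 0..3 {
        tx.send(PathBuf::from("src/main.rs")).unwrap();
    }
    drop(tx); // close the channel so the worker flushes and returns

    let batches = worker.join().unwrap();
    assert_eq!(batches.len(), 1); // one batch, not three transform runs
    assert_eq!(batches[0].len(), 1); // deduplicated by path
    println!("batches: {batches:?}");
}
```

The `BTreeSet` handles the dedup-by-path step; in the real tool the batch buffer is reused across flushes rather than reallocated.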
Distribution and Packaging
Building the binary is the easy part. Getting it onto users' machines is where things get interesting. I wanted to support four distribution channels from day one:
- Cargo install. The default for Rust users: `cargo install fwatch`.
- Homebrew. For macOS users who don't have Rust installed.
- GitHub Releases. Pre-compiled binaries for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows.
- Docker. A minimal Alpine-based image for CI/CD pipelines.
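For the Docker channel, a minimal multi-stage build along these lines works -- this is a sketch under assumptions, not the project's actual Dockerfile (the `rust:1-alpine` builder targets musl by default, which is what an Alpine runtime image needs):

```dockerfile
# Build stage: compile a musl-linked binary inside the Alpine-based Rust image.
FROM rust:1-alpine AS build
RUN apk add --no-cache musl-dev
WORKDIR /src
COPY . .
RUN cargo build --release

# Runtime stage: copy only the binary into a bare Alpine image.
FROM alpine:3.20
COPY --from=build /src/target/release/fwatch /usr/local/bin/fwatch
ENTRYPOINT ["fwatch"]
```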
Here's the release workflow that produces the GitHub Releases binaries:

```yaml
name: Release

on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        include:
          - target: x86_64-unknown-linux-gnu
            os: ubuntu-latest
          - target: aarch64-unknown-linux-gnu
            os: ubuntu-latest
          - target: x86_64-apple-darwin
            os: macos-latest
          - target: aarch64-apple-darwin
            os: macos-latest
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}
      - run: cargo build --release --target ${{ matrix.target }}
      - uses: softprops/action-gh-release@v2
        with:
          files: target/${{ matrix.target }}/release/fwatch*
```
One lesson I learned the hard way: test your release builds on actual target platforms. Cross-compilation with cross is fantastic, but there are subtle differences between a cross-compiled binary and a natively-compiled one. I had a bug where path handling worked perfectly on macOS but broke on Linux because I was using Path::display() in a context where I needed Path::to_str(). The cross-compiled binary passed all tests. The native Linux build caught it immediately.
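The distinction that bit me, in miniature: `Path::display()` is a lossy, for-humans rendering that always succeeds, while `Path::to_str()` is the checked conversion that returns `None` when the path isn't valid UTF-8 -- which can happen on Linux, where paths are arbitrary bytes.

```rust
use std::path::Path;

fn main() {
    let p = Path::new("/tmp/data.txt");

    // Fine for log messages and error output -- lossy is acceptable here:
    println!("watching {}", p.display());

    // Required when the exact string must round-trip (comparisons, configs):
    match p.to_str() {
        Some(s) => println!("utf-8 path: {s}"),
        None => eprintln!("path is not valid UTF-8; refusing to transform it"),
    }
}
```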
Lessons Learned
"The best CLI tools feel like they were designed by someone who uses the command line eight hours a day. Because they were."
After six months with this project, here's what I'd tell someone starting a Rust CLI today:
- Start with the error messages. Before you write a single line of business logic, design your error output. This forces you to think about what can go wrong and how to communicate it.
- Use `tracing` instead of `log`. The structured logging from `tracing` is invaluable for debugging production issues. The overhead is negligible.
- Invest in integration tests early. Unit tests on individual functions are fine, but your CLI's real contract is "given these arguments and this filesystem state, produce this output." Test that contract.
- Ship early, iterate in public. My first release had one subcommand and no config file support. It was useful from day one. Everything else grew from real user feedback.
- Benchmark your startup time. Use `hyperfine` to measure it. If it's over 10ms, you probably have a lazy initialization opportunity somewhere.
What's Next
The tool is stable and I use it daily, but there's always more to build. On my roadmap for the next quarter:
- A plugin system using WebAssembly (WASM) for user-defined transformers
- Language Server Protocol (LSP) integration for config file editing
- Performance profiling and optimization for very large directory trees (100k+ files)
- A companion TUI dashboard using `ratatui` for monitoring long-running watch sessions
If any of that sounds interesting -- or if you've built your own CLI tools and have patterns to share -- I'd love to hear from you. Drop me a line on the contact page or open an issue on the GitHub repo.