csv-peek — A 700-Line Rust CLI That Pretty-Prints CSV in the Terminal With a Hand-Rolled RFC 4180 Parser, Type Inference, and One Dependency

rust dev.to

cat customers.csv and the columns slide off the right edge of your terminal. You don't really want to open a spreadsheet just to spot-check eight rows. csv-peek customers.csv prints an aligned table in 1 ms — numbers right-aligned in cyan, dates in yellow, booleans in magenta, the embedded comma in "hello, world" not breaking alignment because the parser actually understands quoting. Single dependency (clap), sub-1 MB stripped binary, 51 tests.

📦 GitHub: https://github.com/sen-ltd/csv-peek

$ cat customers.csv
id,name,email,signup_date,active,balance,note
1,Alice Suzuki,alice@example.com,2024-01-15,true,1280.50,"first 10 customers"
2,Bob Tanaka,bob@example.com,2024-02-03,true,42.00,
…

$ csv-peek customers.csv
┌────┬──────────────┬───────────────────┬─────────────┬────────┬─────────┬────────────────────┐
│ id │ name         │ email             │ signup_date │ active │ balance │ note               │
├────┼──────────────┼───────────────────┼─────────────┼────────┼─────────┼────────────────────┤
│  1 │ Alice Suzuki │ alice@example.com │ 2024-01-15  │ true   │ 1280.50 │ first 10 customers │
│  2 │ Bob Tanaka   │ bob@example.com   │ 2024-02-03  │ true   │   42.00 │                    │
…

Why hand-roll the parser

Rust has a mature csv crate. For real production work, that's the answer. Two reasons to hand-roll anyway:

  1. The dependency tree stays at one entry. cargo build --release finishes in 9 seconds, the stripped Alpine binary is ~600 KB, and the CI matrix is one cell wide.
  2. The state machine is the article. RFC 4180 is short (its format definition is seven numbered rules), and walking through what "", \r\n, and "foo\nbar" actually mean in a CSV is more useful than csv::Reader::from_path() is.

The whole parser is 200 lines in src/csv.rs. I'll quote the core machine and the bits that I find most often misimplemented.

The four-state machine

enum State {
    Start,         // between fields
    Unquoted,      // inside an unquoted field
    Quoted,        // inside a "..." field
    QuotedQuote,   // saw a `"` inside a quoted field
}

QuotedQuote is the interesting one. Once you've seen a " inside a quoted field, you can't tell yet whether it was the closing quote or the first half of an escaped pair (""). You need the next byte to decide:

State::QuotedQuote => match b {
    b'"' => {
        // It was `""` — emit a literal `"` and stay in Quoted.
        field.push(b'"');
        state = State::Quoted;
    }
    c if c == self.delim => {
        // Closing quote followed by delimiter — field ends here.
        fields.push(...);
        state = State::Start;
    }
    b'\r' | b'\n' => {
        // Closing quote followed by newline — record ends here.
        return Ok(Some(...));
    }
    other => {
        // Malformed input like `"foo"bar`. Strict parsers Err here;
        // we'd rather render *something*, so we recover by appending
        // the byte and falling back to Unquoted.
        field.push(other);
        state = State::Unquoted;
    }
}

The recovery branch matters because csv-peek's job is to let you see what's there, even when "what's there" is broken. "foo"bar\n parses to foobar. A strict parser would refuse the line and the user would never see what was wrong with it.
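To see the four states working together, here is a minimal sketch that parses a single record from an in-memory byte slice. It is not the code in src/csv.rs: parse_record is a hypothetical helper, it skips the streaming reader entirely, and it simply stops at the first unquoted \r or \n. The transitions, including the lenient recovery branch, follow the fragments quoted above.

```rust
/// States of the field-level machine, mirroring the enum quoted above.
#[derive(Clone, Copy)]
enum State {
    Start,       // between fields
    Unquoted,    // inside an unquoted field
    Quoted,      // inside a "..." field
    QuotedQuote, // saw a `"` inside a quoted field
}

/// Parse one record from `input`, stopping at the first unquoted
/// `\r` or `\n`. Hypothetical helper for illustration only.
fn parse_record(input: &[u8], delim: u8) -> Vec<String> {
    let mut fields = Vec::new();
    let mut field: Vec<u8> = Vec::new();
    let mut state = State::Start;

    for &b in input {
        state = match (state, b) {
            // A quote only opens a quoted field at the very start of a field.
            (State::Start, b'"') => State::Quoted,
            // Delimiter outside quotes: close the current field.
            (State::Start | State::Unquoted | State::QuotedQuote, c) if c == delim => {
                fields.push(String::from_utf8_lossy(&field).into_owned());
                field.clear();
                State::Start
            }
            // Newline outside quotes: the record is finished.
            (State::Start | State::Unquoted | State::QuotedQuote, b'\r' | b'\n') => break,
            // Any other byte outside quotes is field content.
            (State::Start | State::Unquoted, other) => {
                field.push(other);
                State::Unquoted
            }
            // Inside quotes everything is literal except `"`.
            (State::Quoted, b'"') => State::QuotedQuote,
            (State::Quoted, other) => {
                field.push(other);
                State::Quoted
            }
            // `""` is an escaped quote.
            (State::QuotedQuote, b'"') => {
                field.push(b'"');
                State::Quoted
            }
            // Malformed input like `"foo"bar`: recover instead of failing.
            (State::QuotedQuote, other) => {
                field.push(other);
                State::Unquoted
            }
        };
    }
    // Whatever is left in `field` is the last field of the record.
    fields.push(String::from_utf8_lossy(&field).into_owned());
    fields
}
```

Calling parse_record(b"\"foo\"bar\n", b',') yields the single field foobar, which is the recovery behaviour described above, while a quoted field containing the delimiter or a newline comes back intact.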

CRLF / LF / CR — sometimes mixed in the same file

CSV inherits the OS line-ending mess: classic Mac (\r), Unix (\n), and Windows (\r\n). Excel on Mac will sometimes mix them in a single file. To tell a bare \r from the first half of \r\n, the parser needs one byte of lookahead:

b'\r' => {
    fields.push(...);
    if let Some(nb) = self.next_byte()? {
        if nb != b'\n' {
            self.pos -= 1;   // peek failed; push it back
        }
    }
    return Ok(Some(...));
}

Implementing peek over a streaming BufReader was the part of the parser I rewrote three times. The version that shipped is a manual buffer (Vec<u8> + pos) inside Reader<R: Read>, refilled in 64 KB chunks. Pushing a byte back is just pos -= 1, which works as long as a refill only ever happens before a read, so the byte just returned is still in the buffer.
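The buffer-plus-position idea can be sketched in a few dozen lines. This is an illustration under assumptions, not the shipped Reader: the names ByteReader, unread, and read_line are made up here, and read_line splits plain lines rather than CSV records so that the line-ending handling stands on its own.

```rust
use std::io::{self, Read};

/// Minimal byte reader with one-byte push-back (hypothetical names).
struct ByteReader<R: Read> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,
}

impl<R: Read> ByteReader<R> {
    fn new(inner: R) -> Self {
        Self { inner, buf: Vec::new(), pos: 0 }
    }

    /// Return the next byte, refilling the buffer when it is exhausted.
    fn next_byte(&mut self) -> io::Result<Option<u8>> {
        if self.pos == self.buf.len() {
            self.buf.resize(64 * 1024, 0);
            let n = self.inner.read(&mut self.buf)?;
            self.buf.truncate(n);
            self.pos = 0;
            if n == 0 {
                return Ok(None); // end of input
            }
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        Ok(Some(b))
    }

    /// Undo the most recent successful `next_byte`. The refill happens
    /// before the read, so that byte is still in `buf` and `pos >= 1`.
    fn unread(&mut self) {
        self.pos -= 1;
    }

    /// Read one line, treating `\n`, `\r\n`, and a bare `\r` as terminators.
    fn read_line(&mut self) -> io::Result<Option<Vec<u8>>> {
        let mut line = Vec::new();
        let mut saw_any = false;
        while let Some(b) = self.next_byte()? {
            saw_any = true;
            match b {
                b'\n' => return Ok(Some(line)),
                b'\r' => {
                    // Swallow the `\n` of a CRLF pair; push back anything else.
                    if let Some(nb) = self.next_byte()? {
                        if nb != b'\n' {
                            self.unread();
                        }
                    }
                    return Ok(Some(line));
                }
                other => line.push(other),
            }
        }
        // A final line without a terminator still counts.
        Ok(if saw_any { Some(line) } else { None })
    }
}
```

Fed the bytes a\r\nb\rc\nd, this returns the four lines a, b, c, d, regardless of which terminator each one used.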

Per-column type inference

Each column gets one of Empty / Bool / Int / Float / Date / Text by widening across the body sample. It is not a strict ladder: Empty is absorbed by anything, Int and Float merge to Float, and any other mismatch (Bool next to Int, say) collapses to Text. Once any value forces Text, the column is Text for good.

fn widen(a: ColType, b: ColType) -> ColType {
    if matches!(a, Empty) { return b; }
    if matches!(b, Empty) { return a; }
    if (a == Int && b == Float) || (a == Float && b == Int) {
        return Float;
    }
    if a == b { a } else { Text }
}

So:

  • [1, 2, 3] → Int
  • [1, 2.5, 3] → Float (Int promotes to Float)
  • ["", 1, "", 2] → Int (Empty is absorbed)
  • [1, "alice", 3] → Text (mixed → bail)
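The cases above can be checked by folding widen over a column of already-classified cells. This sketch restates widen as a match with the same behaviour; the infer helper and the derive list on ColType are assumptions added for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColType {
    Empty,
    Bool,
    Int,
    Float,
    Date,
    Text,
}

/// Same rules as the `widen` shown earlier, written as a match.
fn widen(a: ColType, b: ColType) -> ColType {
    match (a, b) {
        // Empty never decides anything.
        (ColType::Empty, x) | (x, ColType::Empty) => x,
        // The only promotion: Int and Float merge to Float.
        (ColType::Int, ColType::Float) | (ColType::Float, ColType::Int) => ColType::Float,
        // Identical types stay as they are.
        (x, y) if x == y => x,
        // Every other combination bails out to Text.
        _ => ColType::Text,
    }
}

/// Hypothetical helper: infer a column type from classified cells.
fn infer(cells: &[ColType]) -> ColType {
    cells.iter().copied().fold(ColType::Empty, widen)
}
```

Note that infer(&[ColType::Bool, ColType::Int]) is Text, not Int: apart from Int and Float there is no promotion between distinct types.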

Each value gets classified by classify(v):

fn classify(v: &str) -> ColType {
    let s = v.trim();
    if s.is_empty()                    { return Empty; }
    if matches_bool(s)                 { return Bool; }
    if s.parse::<i64>().is_ok()        { return Int; }
    if s.parse::<f64>().is_ok() && !s.eq_ignore_ascii_case("nan") {
        return Float;
    }
    if matches_date(s)                 { return Date; }
    Text
}

The NaN exclusion is deliberate: "NaN".parse::<f64>() succeeds, but a column containing the literal string NaN mixed with numbers is almost certainly not numeric data, and aligning it like a number makes the table look broken.
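Rust's f64 parser also accepts inf and infinity (case-insensitively), which the nan comparison alone does not filter out. If those should be treated as text too, one option is to accept only finite results. The helper name is_finite_float is hypothetical, and this is a possible variation rather than what csv-peek does.

```rust
/// Hypothetical stricter check: accept only strings that parse to a
/// finite `f64`, which rejects NaN and both infinities in one test.
fn is_finite_float(s: &str) -> bool {
    s.trim().parse::<f64>().map_or(false, |x| x.is_finite())
}
```

A side effect worth knowing about: a literal too large for f64 parses to infinity, so it would be classified as text under this variant.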

Date detection is a cheap shape check, not a calendar:

fn matches_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 10 { return false; }
    let dash_or_slash = |b| b == b'-' || b == b'/';
    bytes[0..4].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[4])
        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[7])
        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
        && (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}

2024-13-99 is "a date" by this function. That's fine — type inference runs in microseconds, doesn't have to be right, and the wrong answer for a date-shaped string is still better than rendering it as text.
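If the shape check ever turned out to be too permissive, a range check on month and day is about the cheapest possible tightening. The sketch below repeats matches_date so that it compiles on its own; plausible_date is a hypothetical addition, and it still does not know how many days each month has.

```rust
/// Shape check from the article: YYYY-MM-DD or YYYY/MM/DD, optionally
/// followed by a space or `T` and anything else.
fn matches_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 10 {
        return false;
    }
    let dash_or_slash = |b: u8| b == b'-' || b == b'/';
    bytes[0..4].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[4])
        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[7])
        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
        && (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}

/// Hypothetical tightening: also require month 1..=12 and day 1..=31.
fn plausible_date(s: &str) -> bool {
    if !matches_date(s) {
        return false;
    }
    // Slicing is safe: the shape check guarantees ASCII in bytes 0..10.
    let month: u32 = s[5..7].parse().unwrap_or(0);
    let day: u32 = s[8..10].parse().unwrap_or(0);
    (1..=12).contains(&month) && (1..=31).contains(&day)
}
```

With this variant 2024-13-99 falls through to Text, while 2024-02-31 still counts as a date.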

ANSI palette without per-cell branches

Palette accessors return &'static str. When color is disabled, every accessor returns "". The render loop never checks a color flag itself: it writes whatever the accessor hands back, and an empty string puts no bytes on the wire:

pub struct Palette { enabled: bool }

impl Palette {
    pub fn cyan(&self) -> &'static str {
        if self.enabled { "\x1b[36m" } else { "" }
    }
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

// Render path
write!(w, " {}{}{} {}", palette.cyan(), value, palette.reset(), border)?;

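The if self.enabled check still runs inside each accessor, since the flag is only known at runtime, but it is one small, predictable branch that inlines easily. What the pattern buys is a render path with no color conditionals to maintain. It came from #137 hexview (the first Rust entry in this portfolio) and has been the house style ever since.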
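One detail the pattern makes easy to get right is padding: if the width is applied to the value rather than to the whole colored string, the escape codes never count toward the column width. A minimal sketch, assuming a hypothetical write_cell helper and a Palette::new constructor that the article does not show:

```rust
use std::io::{self, Write};

pub struct Palette {
    enabled: bool,
}

impl Palette {
    /// Hypothetical constructor; the article only shows the accessors.
    pub fn new(enabled: bool) -> Self {
        Self { enabled }
    }
    pub fn cyan(&self) -> &'static str {
        if self.enabled { "\x1b[36m" } else { "" }
    }
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

/// Write one right-aligned cell. The padding is applied to `value` only,
/// so the color codes sit outside the measured width.
fn write_cell<W: Write>(w: &mut W, p: &Palette, value: &str, width: usize) -> io::Result<()> {
    write!(w, " {}{:>width$}{} │", p.cyan(), value, p.reset(), width = width)
}
```

With color disabled the output contains no escape bytes at all; with color enabled the visible text is identical and only the two escape sequences are added around the padded value.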

TTY detection without the atty crate

The atty crate is no longer maintained, and std::io::IsTerminal (stable since Rust 1.70) covers the same ground:

use std::io::{self, IsTerminal};

let stdout_is_tty = io::stdout().is_terminal();
let no_color = args.no_color
    || std::env::var_os("NO_COLOR").is_some()
    || !stdout_is_tty;

That's the entire color-detection logic. Pipe to less and color disappears automatically; set NO_COLOR=1 and color disappears manually; pass --no-color and color disappears explicitly. All three converge into one boolean and the Palette fans out from there.
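Pulling the three inputs into a pure function makes the decision testable without a real terminal or environment. This is a sketch of that idea, not the project's code: color_enabled and its parameter names are assumptions.

```rust
use std::ffi::OsStr;

/// Decide whether to emit color from the three inputs described above.
/// Mirrors the article's logic: any one of them turns color off.
fn color_enabled(
    no_color_flag: bool,          // --no-color was passed
    no_color_env: Option<&OsStr>, // value of NO_COLOR, if set
    stdout_is_tty: bool,          // result of is_terminal()
) -> bool {
    // As in the article, any value of NO_COLOR counts, even an empty one.
    let env_disables = no_color_env.is_some();
    !(no_color_flag || env_disables || !stdout_is_tty)
}
```

At the call site the arguments would be args.no_color, std::env::var_os("NO_COLOR").as_deref(), and io::stdout().is_terminal(). As far as I know the NO_COLOR convention only counts a non-empty value, so a stricter version might ignore NO_COLOR= with nothing after the equals sign.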

Sniffing the delimiter

When --delim isn't passed, the first line gets scanned for , \t ; |, and whichever appears most often wins:

pub fn guess_delimiter(sample: &str) -> u8 {
    let candidates = [b',', b'\t', b';', b'|'];
    let line = sample.lines().next().unwrap_or("");
    let mut best = (b',', 0usize);
    for &c in &candidates {
        let count = line.bytes().filter(|&b| b == c).count();
        if count > best.1 {
            best = (c, count);
        }
    }
    best.0
}

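Ties go to whichever candidate comes first in the list, because only a strictly higher count replaces the current best; any tie involving , therefore resolves to ,. A single-column file (zero of any of the candidates) also defaults to , and just renders as one column. Tab-separated and semicolon-separated files Just Work.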
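The tie-breaking is easiest to see with a couple of unit tests. The function below is the one from the article, repeated so the block compiles on its own; the test names and inputs are made up for illustration.

```rust
pub fn guess_delimiter(sample: &str) -> u8 {
    let candidates = [b',', b'\t', b';', b'|'];
    let line = sample.lines().next().unwrap_or("");
    let mut best = (b',', 0usize);
    for &c in &candidates {
        let count = line.bytes().filter(|&b| b == c).count();
        // Strictly greater: an equal count never displaces an earlier candidate.
        if count > best.1 {
            best = (c, count);
        }
    }
    best.0
}

#[test]
fn most_frequent_candidate_wins() {
    assert_eq!(guess_delimiter("a;b;c,d"), b';');
    assert_eq!(guess_delimiter("a\tb\tc\n1\t2\t3"), b'\t');
}

#[test]
fn ties_go_to_the_earlier_candidate() {
    // One comma, one semicolon: comma is first in the list.
    assert_eq!(guess_delimiter("a,b;c"), b',');
    // One tab, one semicolon, no comma: tab is earlier than semicolon.
    assert_eq!(guess_delimiter("a;b\tc"), b'\t');
    // No candidate at all: the initial `,` stands.
    assert_eq!(guess_delimiter("name"), b',');
}
```

Only the first line is inspected, so a header that happens to contain more commas than the real delimiter will fool the sniffer; that is what --delim is for.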

51 tests

$ cargo test
running 34 tests
test csv::tests::quoted_field_with_comma ... ok
test csv::tests::escaped_quote_inside_quoted_field ... ok
test csv::tests::quoted_field_with_newline ... ok
…
test result: ok. 34 passed; 0 failed
running 17 tests
test renders_basic_csv ... ok
test no_color_env_disables_escape_sequences ... ok
test handles_unicode_data ... ok
…
test result: ok. 17 passed; 0 failed

The unit half of the test suite covers the parser edge cases (every quoted-field scenario, the recovery path, CRLF/LF/CR mixing), the type-inference widening rules, and the padding / truncate helpers including unicode width.

The integration half uses assert_cmd + predicates to drive the actual binary against fixture CSV files. Every CLI flag has at least one test: NO_COLOR=1, piping detection, ASCII vs Unicode borders, --no-header synthesizing column names, --delim overriding the sniffer, and so on.

#[test]
fn handles_quoted_fields_with_commas() {
    let f = fixture("name,bio\nAlice,\"hello, world\"\n");
    cli().arg(f.path()).arg("--ascii").assert()
        .success()
        .stdout(predicate::str::contains("hello, world"));
}

Try it

git clone https://github.com/sen-ltd/csv-peek
cd csv-peek
cargo build --release
./target/release/csv-peek sample.csv

Or via Docker (no Rust toolchain needed):

docker build -t csv-peek .
docker run --rm -v "$(pwd)":/data -t csv-peek customers.csv

The -t flag is necessary so the container sees a TTY and IsTerminal returns true; without it, csv-peek auto-disables color (the same path that fires when you pipe to less).

Source: https://github.com/sen-ltd/csv-peek — MIT, ~700 lines, single dependency, 51 tests, sub-1 MB stripped binary.


🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.

Source: dev.to
