You run cat customers.csv and the columns slide off the right edge of your terminal. You don't really want to open a spreadsheet just to spot-check eight rows. csv-peek customers.csv prints an aligned table in 1 ms — numbers right-aligned in cyan, dates in yellow, booleans in magenta, the embedded comma in "hello, world" not breaking alignment because the parser actually understands quoting. Single dependency (clap), sub-1 MB stripped binary, 51 tests.
📦 GitHub: https://github.com/sen-ltd/csv-peek
$ cat customers.csv
id,name,email,signup_date,active,balance,note
1,Alice Suzuki,alice@example.com,2024-01-15,true,1280.50,"first 10 customers"
2,Bob Tanaka,bob@example.com,2024-02-03,true,42.00,
…
$ csv-peek customers.csv
┌────┬──────────────┬───────────────────┬─────────────┬────────┬─────────┬────────────────────┐
│ id │ name │ email │ signup_date │ active │ balance │ note │
├────┼──────────────┼───────────────────┼─────────────┼────────┼─────────┼────────────────────┤
│ 1 │ Alice Suzuki │ alice@example.com │ 2024-01-15 │ true │ 1280.50 │ first 10 customers │
│ 2 │ Bob Tanaka │ bob@example.com │ 2024-02-03 │ true │ 42.00 │ │
…
Why hand-roll the parser
Rust has a mature csv crate. For real production work, that's the answer. Two reasons to hand-roll anyway:
- The dependency tree stays at one entry. cargo build --release finishes in 9 seconds, the stripped Alpine binary is ~600 KB, and the CI matrix is one cell wide.
- The state machine is the article. RFC 4180 is short — four rules — and walking through what "", \r\n, and "foo\nbar" actually mean in a CSV is more useful than csv::Reader::from_path() is.
The whole parser is 200 lines in src/csv.rs. I'll quote the core machine and the bits that I find most often misimplemented.
The four-state machine
enum State {
Start, // between fields
Unquoted, // inside an unquoted field
Quoted, // inside a "..." field
QuotedQuote, // saw a `"` inside a quoted field
}
QuotedQuote is the interesting one. Once you've seen a " inside a quoted field, you can't tell yet whether it was the closing quote or the first half of an escaped pair (""). You need the next byte to decide:
State::QuotedQuote => match b {
b'"' => {
// It was `""` — emit a literal `"` and stay in Quoted.
field.push(b'"');
state = State::Quoted;
}
c if c == self.delim => {
// Closing quote followed by delimiter — field ends here.
fields.push(...);
state = State::Start;
}
b'\r' | b'\n' => {
// Closing quote followed by newline — record ends here.
return Ok(Some(...));
}
other => {
// Malformed input like `"foo"bar`. Strict parsers Err here;
// we'd rather render *something*, so we recover by appending
// the byte and falling back to Unquoted.
field.push(other);
state = State::Unquoted;
}
}
The recovery branch matters because csv-peek's job is to let you see what's there, even when "what's there" is broken. "foo"bar\n parses to foobar. A strict parser would refuse the line and the user would never see what was wrong with it.
CRLF / LF / CR — sometimes mixed in the same file
CSV inherits the OS-line-ending mess: classic Mac (\r), Unix (\n), Windows (\r\n), and Excel-on-Mac will sometimes mix them in a single file. The parser needs one-byte lookahead:
b'\r' => {
fields.push(...);
if let Some(nb) = self.next_byte()? {
if nb != b'\n' {
self.pos -= 1; // lookahead wasn't \n — push the byte back
}
}
return Ok(Some(...));
}
Implementing peek over a streaming BufReader was the part of the parser I rewrote three times. The version that shipped is a manual circular-ish buffer (Vec<u8> + pos) inside Reader<R: Read>, refilled in 64 KB chunks. The lookahead is just pos -= 1.
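A minimal sketch of that buffer-plus-pos shape (hypothetical names; the real Reader carries more state than this):

```rust
use std::io::Read;

struct Bytes<R: Read> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,
}

impl<R: Read> Bytes<R> {
    fn new(inner: R) -> Self {
        Self { inner, buf: Vec::new(), pos: 0 }
    }

    /// Next byte, refilling the buffer in 64 KB chunks; None at EOF.
    fn next_byte(&mut self) -> std::io::Result<Option<u8>> {
        if self.pos == self.buf.len() {
            let mut chunk = [0u8; 64 * 1024];
            let n = self.inner.read(&mut chunk)?;
            if n == 0 {
                return Ok(None);
            }
            self.buf.clear();
            self.buf.extend_from_slice(&chunk[..n]);
            self.pos = 0;
        }
        let b = self.buf[self.pos];
        self.pos += 1;
        Ok(Some(b))
    }

    /// One-byte pushback: only valid immediately after a next_byte,
    /// which guarantees pos >= 1 inside the current chunk.
    fn unread(&mut self) {
        self.pos -= 1;
    }
}
```

The invariant that makes pos -= 1 safe is that pushback only ever happens right after a successful read, so it can never cross a chunk boundary.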
Per-column type inference
Each column gets one of Empty / Bool / Int / Float / Date / Text by widening across the body sample. The hierarchy is Empty < Bool < Int < Float < Date < Text — once any value forces Text, the column is Text for good.
fn widen(a: ColType, b: ColType) -> ColType {
if matches!(a, Empty) { return b; }
if matches!(b, Empty) { return a; }
if (a == Int && b == Float) || (a == Float && b == Int) {
return Float;
}
if a == b { a } else { Text }
}
So:
- [1, 2, 3] → Int
- [1, 2.5, 3] → Float (Int promotes to Float)
- ["", 1, "", 2] → Int (Empty is absorbed)
- [1, "alice", 3] → Text (mixed → bail)
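A column's type is just widen folded over its classified values. A self-contained sketch of that fold, with widen as quoted above (column_type is a hypothetical helper name):

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ColType { Empty, Bool, Int, Float, Date, Text }
use ColType::*;

fn widen(a: ColType, b: ColType) -> ColType {
    if matches!(a, Empty) { return b; }
    if matches!(b, Empty) { return a; }
    if (a == Int && b == Float) || (a == Float && b == Int) {
        return Float;
    }
    if a == b { a } else { Text }
}

/// Fold the widening rule across a column sample, starting from Empty.
fn column_type(values: &[ColType]) -> ColType {
    values.iter().copied().fold(Empty, widen)
}
```

Starting the fold at Empty is what makes the all-empty column come out Empty rather than defaulting to Text.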
Each value gets classified by classify(v):
fn classify(v: &str) -> ColType {
let s = v.trim();
if s.is_empty() { return Empty; }
if matches_bool(s) { return Bool; }
if s.parse::<i64>().is_ok() { return Int; }
if s.parse::<f64>().is_ok() && !s.eq_ignore_ascii_case("nan") {
return Float;
}
if matches_date(s) { return Date; }
Text
}
The NaN exclusion is deliberate: f64::parse accepts "NaN", but a column containing the literal string NaN mixed with numbers is almost certainly not numeric data, and aligning it like a number makes the table look broken.
Date detection is a cheap shape check, not a calendar:
fn matches_date(s: &str) -> bool {
let bytes = s.as_bytes();
if bytes.len() < 10 { return false; }
let dash_or_slash = |b| b == b'-' || b == b'/';
bytes[0..4].iter().all(|b| b.is_ascii_digit())
&& dash_or_slash(bytes[4])
&& bytes[5..7].iter().all(|b| b.is_ascii_digit())
&& dash_or_slash(bytes[7])
&& bytes[8..10].iter().all(|b| b.is_ascii_digit())
&& (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}
2024-13-99 is "a date" by this function. That's fine — type inference runs in microseconds, doesn't have to be right, and the wrong answer for a date-shaped string is still better than rendering it as text.
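Putting classify and matches_date together, the edge cases above check out. This sketch is runnable as-is; matches_bool here is a minimal stand-in for the real helper:

```rust
#[derive(PartialEq, Debug)]
enum ColType { Empty, Bool, Int, Float, Date, Text }
use ColType::*;

// Stand-in: the real matches_bool may accept more spellings.
fn matches_bool(s: &str) -> bool {
    s.eq_ignore_ascii_case("true") || s.eq_ignore_ascii_case("false")
}

fn matches_date(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.len() < 10 { return false; }
    let dash_or_slash = |b| b == b'-' || b == b'/';
    bytes[0..4].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[4])
        && bytes[5..7].iter().all(|b| b.is_ascii_digit())
        && dash_or_slash(bytes[7])
        && bytes[8..10].iter().all(|b| b.is_ascii_digit())
        && (bytes.len() == 10 || matches!(bytes[10], b' ' | b'T'))
}

fn classify(v: &str) -> ColType {
    let s = v.trim();
    if s.is_empty() { return Empty; }
    if matches_bool(s) { return Bool; }
    if s.parse::<i64>().is_ok() { return Int; }
    if s.parse::<f64>().is_ok() && !s.eq_ignore_ascii_case("nan") {
        return Float;
    }
    if matches_date(s) { return Date; }
    Text
}
```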
ANSI palette without per-cell branches
Palette accessors return &'static str. When color is disabled, every accessor returns "". The hot loop never branches on a flag — it just write!s zero-length escape codes that the terminal sees as nothing:
pub struct Palette { enabled: bool }
impl Palette {
pub fn cyan(&self) -> &'static str {
if self.enabled { "\x1b[36m" } else { "" }
}
pub fn reset(&self) -> &'static str {
if self.enabled { "\x1b[0m" } else { "" }
}
}
// Render path
write!(w, " {}{}{} {}", palette.cyan(), value, palette.reset(), border)?;
The empty-string writes cost essentially nothing — rustc can typically fold them away — and the render path stays one code path with no per-cell color check. The pattern came from #137 hexview (the first Rust entry in this portfolio) and has been the house style ever since.
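The payoff is that the formatting code is identical whether color is on or off; only the Palette changes. A compressed sketch (cell is a hypothetical helper, not the real render path):

```rust
use std::fmt::Write;

pub struct Palette { enabled: bool }

impl Palette {
    pub fn cyan(&self) -> &'static str {
        if self.enabled { "\x1b[36m" } else { "" }
    }
    pub fn reset(&self) -> &'static str {
        if self.enabled { "\x1b[0m" } else { "" }
    }
}

/// Same write! either way; disabled palettes just emit zero-length strings.
fn cell(p: &Palette, value: &str) -> String {
    let mut s = String::new();
    write!(s, " {}{}{} |", p.cyan(), value, p.reset()).unwrap();
    s
}
```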
TTY detection without the atty crate
atty is deprecated in favour of std::io::IsTerminal:
use std::io::IsTerminal;
let stdout_is_tty = io::stdout().is_terminal();
let no_color = args.no_color
|| std::env::var_os("NO_COLOR").is_some()
|| !stdout_is_tty;
That's the entire color-detection logic. Pipe to less and color disappears automatically; set NO_COLOR=1 and color disappears manually; pass --no-color and color disappears explicitly. All three converge into one boolean and the Palette fans out from there.
Sniffing the delimiter
When --delim isn't passed, the first line gets scanned for , \t ; |, and whichever appears most often wins:
pub fn guess_delimiter(sample: &str) -> u8 {
let candidates = [b',', b'\t', b';', b'|'];
let line = sample.lines().next().unwrap_or("");
let mut best = (b',', 0usize);
for &c in &candidates {
let count = line.bytes().filter(|&b| b == c).count();
if count > best.1 {
best = (c, count);
}
}
best.0
}
Ties go to the earliest candidate in the list (so a comma/tab tie picks comma, a tab/semicolon tie picks tab). A single-column file (zero of any of the candidates) defaults to , and just renders as one column. Tab-separated and semicolon-separated files Just Work.
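The function above is self-contained enough to spot-check directly:

```rust
pub fn guess_delimiter(sample: &str) -> u8 {
    let candidates = [b',', b'\t', b';', b'|'];
    // Only the first line matters; quoted delimiters in later rows
    // can't skew the guess.
    let line = sample.lines().next().unwrap_or("");
    let mut best = (b',', 0usize);
    for &c in &candidates {
        let count = line.bytes().filter(|&b| b == c).count();
        if count > best.1 {
            best = (c, count);
        }
    }
    best.0
}
```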
51 tests
$ cargo test
running 34 tests
test csv::tests::quoted_field_with_comma ... ok
test csv::tests::escaped_quote_inside_quoted_field ... ok
test csv::tests::quoted_field_with_newline ... ok
…
test result: ok. 34 passed; 0 failed
running 17 tests
test renders_basic_csv ... ok
test no_color_env_disables_escape_sequences ... ok
test handles_unicode_data ... ok
…
test result: ok. 17 passed; 0 failed
The unit half of the test suite covers the parser edge cases (every quoted-field scenario, the recovery path, CRLF/LF/CR mixing), the type-inference widening rules, and the padding / truncate helpers including unicode width.
The integration half uses assert_cmd + predicates to drive the actual binary against fixture CSV files. Every CLI flag has at least one test: NO_COLOR=1, piping detection, ASCII vs Unicode borders, --no-header synthesizing column names, --delim overriding the sniffer, and so on.
#[test]
fn handles_quoted_fields_with_commas() {
let f = fixture("name,bio\nAlice,\"hello, world\"\n");
cli().arg(f.path()).arg("--ascii").assert()
.success()
.stdout(predicate::str::contains("hello, world"));
}
Try it
git clone https://github.com/sen-ltd/csv-peek
cd csv-peek
cargo build --release
./target/release/csv-peek sample.csv
Or via Docker (no Rust toolchain needed):
docker build -t csv-peek .
docker run --rm -v "$(pwd)":/data -t csv-peek customers.csv
The -t flag is necessary so the container sees a TTY and IsTerminal returns true; without it, csv-peek auto-disables color (the same path that fires when you pipe to less).
Source: https://github.com/sen-ltd/csv-peek — MIT, ~700 lines, single dependency, 51 tests, sub-1 MB stripped binary.
🛠 Built by SEN LLC as part of an ongoing series of small, focused developer tools. Browse the full portfolio for more.