Building a strict INI parser from scratch in Python — no configparser allowed

python dev.to


ini-parser config.ini — a Python CLI that parses INI files into structured JSON with type inference, duplicate detection, multiline support, and comment preservation. Zero runtime dependencies, 49 tests, built from scratch without configparser.

Python ships with configparser in the standard library. It works. It handles interpolation, default sections, fallback values, and a reasonable subset of the INI format. So why would anyone write an INI parser from scratch?

Because configparser is opinionated in ways that don't always match what you need. It lowercases all keys by default. It only rejects duplicate sections and keys while strict=True (the default); with strict=False it merges them silently. It doesn't preserve comments. It doesn't tell you what line number a key came from. And it doesn't give you structured output: you get a ConfigParser object, not data you can pipe to jq or feed to another tool.
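To make the first two points concrete, here is a quick check against the standard library itself (Python 3.2 and later, where strict defaults to True). The helper name load and the sample input are illustrative only.

```python
import configparser

# Illustrative input: the same section header appears twice.
SAMPLE_INI = """\
[Server]
Port = 8080

[Server]
Host = example.com
"""


def load(strict: bool) -> configparser.ConfigParser:
    """Parse SAMPLE_INI with the given strictness setting."""
    cp = configparser.ConfigParser(strict=strict)
    cp.read_string(SAMPLE_INI)
    return cp


# strict=True (the default) refuses the repeated [Server] header.
try:
    load(strict=True)
except configparser.DuplicateSectionError as exc:
    print("strict=True:", exc)

# strict=False merges both blocks into a single section.
lenient = load(strict=False)
print(lenient.sections())       # ['Server']: section names keep their case
print(list(lenient["Server"]))  # ['port', 'host']: option names are lowercased
```

Note that even in lenient mode nothing tells you two blocks were merged, and the original key spelling (Port, Host) is gone.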

I wanted a parser that treats INI files as data, not configuration objects. Something that parses config.ini into a JSON structure where every section, key, value, comment, and line number is preserved as metadata. Something that rejects duplicate sections and duplicate keys instead of silently merging them. Something that infers types so that port = 8080 comes back as an integer, not a string.

So I built ini-parser: a strict INI file parser and validator with JSON output, zero runtime dependencies, and a clean CLI interface.

What the INI format actually is

There is no formal INI specification. Unlike JSON (RFC 8259), TOML (with its published spec), or YAML (YAML 1.2), INI files are defined by convention and by the behaviour of whatever parser you happen to be using. Windows GetPrivateProfileString, Python's configparser, PHP's parse_ini_file, and Git's config format all disagree on the details.

That said, there's a common core that nearly everyone agrees on:

  • Sections are enclosed in square brackets: [section]
  • Keys are separated from values by = (some parsers also accept :)
  • Comments start with ; or # at the beginning of a line
  • Blank lines are ignored

Beyond that core, parsers diverge on questions like: Can keys exist outside any section? Are keys case-sensitive? Can values span multiple lines? Can you have inline comments? What happens with duplicate keys?

For ini-parser, I made explicit choices on each of these questions, and I made the parser strict: if something is ambiguous, it's an error.

The parser architecture

The parser is split into three modules, each with a single responsibility:

  • types.py — type inference (string → int/float/bool/string)
  • parser.py — the line-by-line parser that builds the data structure
  • serializer.py — converts the data structure back to INI text

Type inference

INI files store everything as strings. But when you parse port = 8080, you probably want the integer 8080, not the string "8080". The type inference module handles this:

import re
from typing import Any

_BOOL_TRUE = frozenset({"true", "yes", "on", "1"})
_BOOL_FALSE = frozenset({"false", "no", "off", "0"})
_INT_RE = re.compile(r"^[+-]?\d+$")
_FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?$")


def infer_type(value: str) -> dict[str, Any]:
    """Infer bool, int, float or string from a raw INI value, keeping the raw text."""
    stripped = value.strip()

    # Booleans first, so "1"/"0" become True/False rather than integers
    if stripped.lower() in _BOOL_TRUE:
        return {"raw": value, "value": True, "type": "bool"}
    if stripped.lower() in _BOOL_FALSE:
        return {"raw": value, "value": False, "type": "bool"}

    if _INT_RE.match(stripped):
        return {"raw": value, "value": int(stripped), "type": "int"}

    if _FLOAT_RE.match(stripped):
        return {"raw": value, "value": float(stripped), "type": "float"}

    # Anything else stays a (stripped) string
    return {"raw": value, "value": stripped, "type": "string"}

The order matters. Booleans are checked first because "1" and "0" are valid both as booleans and integers, and in INI files enabled = 1 almost always means "true," not "the integer one." The trade-off is that a genuine count such as retries = 0 also comes back as false. Every result includes the raw string alongside the inferred value, so you never lose information.
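The two numeric regexes are stricter than Python's own int() and float() constructors, which is worth knowing before relying on the inferred types. The sketch below reuses the patterns above in a hypothetical classify() helper (it deliberately skips the boolean check) to show which spellings fall through to "string".

```python
import re

# Same numeric patterns as in the type-inference module above.
_INT_RE = re.compile(r"^[+-]?\d+$")
_FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?$")


def classify(text: str) -> str:
    """Hypothetical helper: the numeric part of infer_type only (no booleans)."""
    stripped = text.strip()
    if _INT_RE.match(stripped):
        return "int"
    if _FLOAT_RE.match(stripped):
        return "float"
    return "string"


for sample in ["8080", "-3", "3.14", ".5", "5.", "1.5e3", "1e5", "1_000", "0x1F"]:
    print(f"{sample!r:>8} -> {classify(sample)}")
```

In Python, float("1e5"), int("1_000") and int("0x1F", 0) all succeed, but under these patterns those three values stay strings because the float pattern requires a decimal point and the int pattern allows only digits. Whether that is desirable depends on your files.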

The line-by-line parser

The parser reads the input line by line and maintains state: which section we're currently in, what the last key was (for multiline continuation), and which sections and keys we've already seen (for duplicate detection).

for lineno, raw_line in enumerate(lines, start=1):
    line = raw_line.strip()

    # Empty line: ends any multiline continuation
    if not line:
        current_key = None
        continue

    # Full-line comment
    m = _COMMENT_RE.match(line)
    if m:
        result["comments"].append({"text": m.group(1).strip(), "line": lineno})
        current_key = None
        continue

    # Section header
    m = _SECTION_RE.match(line)
    if m:
        section_name = m.group(1).strip()
        if section_name in seen_sections:
            raise INIParseError(
                f"duplicate section [{section_name}]",
                line=lineno,
            )
        # ...
        continue

    # Continuation line (matched against the untrimmed line)
    m_cont = _CONTINUATION_RE.match(raw_line)
    if m_cont and current_key and current_section:
        # Append to previous key's value
        # ...
        continue

    # Key = value
    m = _KV_RE.match(line)
    if m:
        # Extract key, value, inline comment
        # Check for duplicates
        # Run type inference
        # ...
        continue

    # Nothing matched: fail with the line number instead of guessing
    raise INIParseError(f"unrecognized syntax: {line}", line=lineno)

The key design decision is that the parser is a single pass with no backtracking. The patterns are tried in a fixed priority order (empty → comment → section → continuation → key-value → error), and each line is handled by the first pattern it matches. If a line matches none of them, the parser raises INIParseError with the line number. No silent fallback, no guessing.
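For readers who want to run the loop end to end, here is a self-contained sketch of the same dispatch order. The real _COMMENT_RE, _SECTION_RE, _CONTINUATION_RE and _KV_RE are not shown in this post, so the patterns below are assumptions, as is the INIParseError constructor. parse_sketch() is a hypothetical, simplified stand-in: it returns plain strings (no type inference, no inline comments) and rejects keys before the first section instead of collecting them as globals.

```python
import re

# Assumed patterns; the real ones in parser.py are not shown in the article.
_COMMENT_RE = re.compile(r"^[;#](.*)$")
_SECTION_RE = re.compile(r"^\[([^\]]+)\]$")
_CONTINUATION_RE = re.compile(r"^\s+\S")
_KV_RE = re.compile(r"^([^=]+?)\s*=\s*(.*)$")


class INIParseError(ValueError):
    """Parse failure that remembers the offending line number."""

    def __init__(self, message: str, line: int) -> None:
        super().__init__(f"line {line}: {message}")
        self.line = line


def parse_sketch(text: str) -> dict[str, dict[str, str]]:
    text = text.removeprefix("\ufeff")  # drop a leading UTF-8 BOM
    sections: dict[str, dict[str, str]] = {}
    seen_sections: dict[str, int] = {}  # section name -> header line
    current_section: str | None = None
    current_key: str | None = None

    for lineno, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()

        if not line:  # blank line ends any continuation
            current_key = None
            continue

        if _COMMENT_RE.match(line):  # full-line comment (discarded here)
            current_key = None
            continue

        m = _SECTION_RE.match(line)
        if m:  # section header
            name = m.group(1).strip()
            if name in seen_sections:
                raise INIParseError(
                    f"duplicate section [{name}] "
                    f"(first seen at line {seen_sections[name]})",
                    line=lineno,
                )
            seen_sections[name] = lineno
            sections[name] = {}
            current_section, current_key = name, None
            continue

        # Continuation: the untrimmed line starts with whitespace right after a key
        if _CONTINUATION_RE.match(raw_line) and current_key and current_section:
            sections[current_section][current_key] += "\n" + line
            continue

        m = _KV_RE.match(line)
        if m:  # key = value; the lazy key group stops at the first "="
            if current_section is None:
                raise INIParseError("key outside any section", line=lineno)
            key, value = m.group(1).strip(), m.group(2).strip()
            if key in sections[current_section]:
                raise INIParseError(
                    f"duplicate key {key!r} in [{current_section}]", line=lineno
                )
            sections[current_section][key] = value
            current_key = key
            continue

        raise INIParseError(f"unrecognized syntax: {line!r}", line=lineno)

    return sections
```

Because every branch ends in continue, the final raise is reached only by lines that no pattern claimed, which is what makes the parser strict by construction.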

Multiline values

Multiline values are continuation lines — lines that start with whitespace and follow a key-value pair:

[server]
description = This is a long
    description that spans
    multiple lines

The parser detects these by checking if the raw (untrimmed) line starts with whitespace while current_key is set. When it finds a continuation, it appends the trimmed content to the previous value with a newline separator and forces the type to "string" (a multiline value is never an integer or boolean).

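An empty line resets current_key to None, ending any potential continuation (comments and section headers do the same). This is important: without it, any indented line anywhere in the file would be treated as a continuation of the last key. The flip side is that a multiline value cannot contain a blank line.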

Inline comments

Inline comments are trickier than they look. The pattern value ; comment seems simple, but ; and # also turn up inside real values: a naive parser that cuts at the first ; would turn path = C:\Tools;C:\Bin into just C:\Tools.

I chose a pragmatic approach: inline comments require at least one space before the comment character. This matches what most INI parsers do in practice. The regex is:

_INLINE_COMMENT_RE = re.compile(r"^(.*?)\s+[;#]\s*(.*)$")

The .*? is non-greedy, so it captures the shortest possible value: the first ; or # that is preceded by whitespace starts the comment. This isn't perfect (a value that legitimately contains " ;" or " #" will be cut short, and you'll need to restructure it), but it handles the vast majority of real-world INI files correctly.

Duplicate detection

Both duplicate sections and duplicate keys within a section are treated as parse errors:

if section_name in seen_sections:
    raise INIParseError(
        f"duplicate section [{section_name}] "
        f"(first seen at line {seen_sections[section_name]})",
        line=lineno,
    )

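This is a deliberate strictness choice. configparser rejects duplicates within a single source only while strict=True (the default); with strict=False it merges repeated sections silently, so a copy-pasted section header that was never renamed can send keys into the wrong section without any warning. ini-parser always treats a duplicate as an error, so the mistake surfaces immediately.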

The CLI interface

The CLI supports four modes:

# Default: parse to JSON
python main.py config.ini

# Validate (exit 0 = valid, exit 2 = invalid)
python main.py --validate config.ini

# Get a specific value by section.key path
python main.py --get server.port config.ini

# Set a value (modifies the file in place)
python main.py --set server.port=9090 config.ini
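How main.py wires these modes together isn't shown in this post. One plausible arrangement, sketched with the standard library's argparse, makes the three flags mutually exclusive so that, for example, --get and --set cannot be combined. build_arg_parser() and split_set_argument() are hypothetical names. One thing to watch with this approach: argparse itself exits with status 2 on usage errors, the same code --validate uses for an invalid file, so a CI script cannot tell a bad command line from a bad INI file by exit code alone.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout: one optional mode flag plus the INI path."""
    parser = argparse.ArgumentParser(
        prog="ini-parser", description="Parse an INI file into JSON."
    )
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--validate", action="store_true",
                      help="exit 0 if the file is valid, 2 if not")
    mode.add_argument("--get", metavar="SECTION.KEY", help="print one value")
    mode.add_argument("--set", metavar="SECTION.KEY=VALUE",
                      help="rewrite one value in place")
    parser.add_argument("file", help="path to the INI file")
    return parser


def split_set_argument(arg: str) -> tuple[str, str]:
    """Split 'server.port=9090' into ('server.port', '9090') at the first '='."""
    path, sep, value = arg.partition("=")
    if not sep:
        raise ValueError(f"expected SECTION.KEY=VALUE, got {arg!r}")
    return path, value


args = build_arg_parser().parse_args(["--set", "server.port=9090", "config.ini"])
print(args.file, split_set_argument(args.set))
```

Splitting at the first = keeps values such as formula=a=b+c intact, mirroring how the parser treats = inside values.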

The --get command outputs the raw value for strings and JSON-encoded values for other types. This means you can use it in shell scripts:

port=$(python main.py --get server.port config.ini)
if [ "$port" -ge 1024 ]; then
    echo "Using unprivileged port: $port"
fi
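That formatting rule is what lets the shell compare $port as a number. A minimal sketch, assuming --get looks up the key's entry in the JSON structure described in the next section and that strings are printed from its value field; format_get_output() is a hypothetical name.

```python
import json
from typing import Any


def format_get_output(entry: dict[str, Any]) -> str:
    """Hypothetical: strings print bare, every other type as JSON."""
    if entry["type"] == "string":
        return entry["value"]           # no quotes, so $(...) captures plain text
    return json.dumps(entry["value"])  # ints/floats as-is, booleans as true/false


print(format_get_output({"type": "int", "value": 8080}))          # 8080
print(format_get_output({"type": "bool", "value": True}))         # true
print(format_get_output({"type": "string", "value": "0.0.0.0"}))  # 0.0.0.0
```

Because JSON booleans are lowercase, a flag can be tested with a plain string comparison such as [ "$debug" = true ].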

The --set command deserves special attention. It doesn't just parse the file, modify the data structure, and serialize it back — that would lose formatting, blank lines, and the exact positioning of comments. Instead, it works directly on the text:

def set_value(text: str, path: str, new_value: str) -> str:
    # "server.port" -> ("server", "port"); assumes the section name has no dot
    section, _, key = path.partition(".")
    lines = text.splitlines(keepends=True)
    data = parse(text)
    sections = data["sections"]

    if section in sections and key in sections[section]["keys"]:
        # Replace the existing line in place, keeping its inline comment
        entry = sections[section]["keys"][key]
        comment = entry["inline_comment"]
        inline = f" ; {comment}" if comment else ""  # re-emitted with ";"
        lines[entry["line"] - 1] = f"{key} = {new_value}{inline}\n"
    elif section in sections:
        # Append after the last key in the section
        # ...
        ...
    else:
        # Append a new section and key
        # ...
        ...

    return "".join(lines)

By operating on the original lines array, --set preserves everything the user didn't ask to change: blank lines, comments, formatting, ordering. When the key already exists, the only line that changes is the one containing it.
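One detail worth tightening: the replacement line in set_value ends in a hard-coded "\n", which would quietly turn a CRLF line into LF in a Windows-edited file. A small hypothetical helper can reuse whatever terminator the original line had instead.

```python
def replace_line(text: str, lineno: int, new_line: str) -> str:
    """Hypothetical helper: swap one 1-based line, keeping its original ending."""
    lines = text.splitlines(keepends=True)
    old = lines[lineno - 1]
    body = old.splitlines()[0]  # the line without its terminator
    ending = old[len(body):]    # e.g. "\r\n", "\n", or "" for a final unterminated line
    lines[lineno - 1] = new_line + ending
    return "".join(lines)


original = "[server]\r\nport = 8080\r\nhost = 0.0.0.0"
print(repr(replace_line(original, 2, "port = 9090")))
# '[server]\r\nport = 9090\r\nhost = 0.0.0.0'
```

Taking the body from splitlines() rather than rstrip("\r\n") keeps the helper consistent with every line boundary that splitlines() itself recognizes.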

The JSON output structure

The output preserves everything the parser knows about the file:

{
  "sections": {
    "server": {
      "keys": {
        "host": {"raw": "0.0.0.0", "value": "0.0.0.0", "type": "string", "inline_comment": null, "line": 5},
        "port": {"raw": "8080", "value": 8080, "type": "int", "inline_comment": "default HTTP port", "line": 6},
        "debug": {"raw": "true", "value": true, "type": "bool", "inline_comment": null, "line": 7}
      },
      "line": 4
    }
  },
  "comments": [{"text": "Application configuration", "line": 1}],
  "globals": {"keys": {}, "line": 0}
}

Every key carries both raw (the original string from the file) and value (the type-inferred result). This means you can always recover the exact original text, even after type inference has converted "true" to a boolean. Line numbers are included for every section header and key, which makes it trivial to build linting or diff tools on top of the parser output.
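As an illustration of that last point, here is a hypothetical diff helper that works purely on the JSON output shown above. flatten() and diff_configs() are illustrative names, and to stay short the sketch ignores globals and comments.

```python
import json
from typing import Any

# The sample output from above, as a JSON string.
SAMPLE = """{"sections":{"server":{"keys":{"host":{"raw":"0.0.0.0","value":"0.0.0.0","type":"string","inline_comment":null,"line":5},"port":{"raw":"8080","value":8080,"type":"int","inline_comment":"default HTTP port","line":6},"debug":{"raw":"true","value":true,"type":"bool","inline_comment":null,"line":7}},"line":4}},"comments":[{"text":"Application configuration","line":1}],"globals":{"keys":{},"line":0}}"""


def flatten(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map 'section.key' to its entry (globals are skipped in this sketch)."""
    return {
        f"{name}.{key}": entry
        for name, section in data["sections"].items()
        for key, entry in section["keys"].items()
    }


def diff_configs(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """List removed (-), added (+) and changed (~) keys with line numbers."""
    a, b = flatten(old), flatten(new)
    changes = []
    for path in sorted(a.keys() | b.keys()):
        if path not in b:
            changes.append(f"- {path} (was line {a[path]['line']})")
        elif path not in a:
            changes.append(f"+ {path} (line {b[path]['line']})")
        # Compare the type as well, since True == 1 in Python.
        elif (a[path]["type"], a[path]["value"]) != (b[path]["type"], b[path]["value"]):
            changes.append(
                f"~ {path}: {a[path]['raw']} -> {b[path]['raw']} (line {b[path]['line']})"
            )
    return changes


old = json.loads(SAMPLE)
new = json.loads(SAMPLE)
new["sections"]["server"]["keys"]["port"].update(raw="9090", value=9090)
print(diff_configs(old, new))  # ['~ server.port: 8080 -> 9090 (line 6)']
```

Reporting the raw text rather than the inferred value means the diff shows exactly what someone would see in the file.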

Testing strategy

The test suite has 49 tests organized into logical groups:

  • Basic parsing — sections, keys, globals, empty values
  • Comments — semicolons, hashes, inline comments
  • Type inference — integers, floats, booleans (all variants), strings, edge cases
  • Multiline — continuation lines
  • Duplicate detection — duplicate sections and keys
  • Edge cases — BOM handling, Unicode keys, empty input, unrecognized syntax, equals in values, spaces in section names
  • Serializer — round-trip fidelity, comment preservation, set operations
  • CLI — all four modes, error handling, file not found

The edge cases are where the real bugs hide. BOM handling is one: Windows editors sometimes prepend a UTF-8 BOM (\ufeff) to text files, and if you don't strip it, the first section header becomes [\ufeffsection] instead of [section]. Unicode keys test that the parser doesn't assume ASCII. Equals-in-value tests that formula = a=b+c parses as key formula with value a=b+c, not as key formula with value a followed by garbage.

# Excerpt from the test suite; parse() comes from the project's parser module,
# and the class name is illustrative.
class TestEdgeCases:
    def test_bom_handling(self):
        ini = "\ufeff[s]\nk = v"
        data = parse(ini)
        assert "s" in data["sections"]

    def test_unicode_keys(self):
        ini = "[s]\n名前 = テスト"
        data = parse(ini)
        assert data["sections"]["s"]["keys"]["名前"]["value"] == "テスト"

    def test_equals_in_value(self):
        data = parse("[s]\nformula = a=b+c")
        assert data["sections"]["s"]["keys"]["formula"]["value"] == "a=b+c"
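When the file is read from disk, the BOM can also be handled at decode time: Python's utf-8-sig codec drops a leading BOM and otherwise behaves like utf-8. The sketch below contrasts the two codecs on the same bytes. read_ini_text() is a hypothetical helper, not part of ini-parser, and stripping "\ufeff" inside the parser (as the BOM test requires) is still needed for text that arrives already decoded.

```python
import codecs
from pathlib import Path


def read_ini_text(path: Path) -> str:
    """Hypothetical helper: decode UTF-8, discarding a BOM if one is present."""
    return path.read_text(encoding="utf-8-sig")


data = codecs.BOM_UTF8 + "[s]\nk = v\n".encode("utf-8")
print(repr(data.decode("utf-8")))      # '\ufeff[s]\nk = v\n'  (BOM survives)
print(repr(data.decode("utf-8-sig")))  # '[s]\nk = v\n'
```

Files without a BOM decode identically under either codec, so utf-8-sig is a safe default for reading hand-edited config files.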

Docker

The Dockerfile uses a multi-stage build. The builder stage uses python:3.12-alpine to copy the source. The runtime stage uses alpine:3.19 with just python3 installed and runs as a non-root user:

FROM python:3.12-alpine AS builder
WORKDIR /app
COPY src/ src/
COPY main.py .

FROM alpine:3.19
RUN apk add --no-cache python3 && \
    adduser -D -h /app appuser
WORKDIR /app
COPY --from=builder /app/ .
USER appuser
ENTRYPOINT ["python3", "main.py"]

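Since there are no dependencies to install, the builder stage does nothing but copy files; the size saving comes from the runtime base. Plain alpine:3.19 plus the distribution's python3 package means no pip, no build tools, no cache directories. One consequence is that the code actually runs on Alpine 3.19's Python 3.11, not the 3.12 from the builder image, so it must avoid 3.12-only features.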

Why build this?

Three reasons.

First, as a teaching exercise. Writing a parser — even a simple one — forces you to think about edge cases that you'd never consider as a user of an existing parser. What happens with a BOM? With duplicate keys? With continuation lines after an empty line? Each of these is a design decision, and making them explicit teaches you something about the format.

Second, as a Unix tool. The JSON output means ini-parser composes with jq, grep, and every other tool in the pipeline. configparser gives you a Python object; ini-parser gives you data.

Third, as a validator. The --validate flag with exit code 2 for invalid files means you can use it in CI pipelines, pre-commit hooks, or Makefiles to catch INI syntax errors before they cause runtime failures.

The full source is on GitHub. Clone it, run pytest, break the parser with your weirdest INI files, and let me know what I missed.

Source: dev.to
