Postgres Database, Data Types in Postgres, and The Write Penalty (2026)

BACKEND ARCHITECTURE MASTERY

Day 8: Databases Part 1 - Postgres, orjson, and The Physics of Writes

15 min read

Series: Logic & Legacy

Day 8 / 30

Level: Intermediate/Senior

 Early in my career, I built an analytics dashboard that tracked user clicks. I stored everything in a massive, flat JSON file on the server. It worked beautifully in local testing. On launch day, two users tried to click a button at the exact same millisecond. The Python script locked the file for User A, crashed for User B, and corrupted the entire JSON structure. We lost a week of data. That day, I learned that a Database is not just a place to store data—it is an operating system designed entirely to survive concurrency.

1. The Storage Divide: Disk vs. RAM

Before we write SQL, you must understand the hardware. Every database architecture decision you make is a brutal negotiation between speed and survival.

  • RAM-Based (In-Memory): Systems like Redis or Memcached store data in the server's volatile memory. Retrieving data takes nanoseconds. It is violently fast. The catch? If the server loses power, or the process restarts, the data is instantly annihilated. We use RAM for caching, session states, and ephemeral queues.
  • Disk-Based (Persistent): Systems like PostgreSQL or MySQL write data to physical SSDs or NVMe drives. This guarantees survival. If the server catches fire, you can pull the drive out, plug it into a new motherboard, and your users' bank balances are still there. The catch? Disk I/O is orders of magnitude slower (microseconds to milliseconds instead of nanoseconds).
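
In practice, you combine the two with a cache-aside pattern: serve reads from RAM when possible, fall back to disk, and accept that the cache can vanish at any moment. Here is a minimal sketch of the idea, with a plain Python dict standing in for Redis and a stub function standing in for a Postgres query (both are toy stand-ins, not real clients):

```python
# Cache-aside sketch: a dict plays the role of Redis (volatile, fast),
# and fetch_from_db() plays the role of a Postgres query (durable, slow).
cache = {}

def fetch_from_db(user_id):
    # Imagine a real SELECT here; this stub just fabricates a row.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    if user_id in cache:           # RAM hit: nanoseconds
        return cache[user_id]
    row = fetch_from_db(user_id)   # cache miss: pay the disk price once
    cache[user_id] = row           # populate the cache for next time
    return row

get_user(42)   # miss: hits the "database"
get_user(42)   # hit: served from memory
```

If the cache process restarts, nothing is lost except speed: every entry is rebuilt from disk on the next miss. That is exactly why the durable copy always lives in Postgres.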

2. What Actually is a DBMS?

A Database Management System (DBMS) is not a hard drive. It is a highly complex software engine that sits between your application and the hard drive.

When 10,000 users try to buy the last Taylor Swift concert ticket at the exact same millisecond, the DBMS enforces Concurrency Control. It lines those requests up, locks the specific row in memory, processes the transaction, ensures it is mathematically valid, writes it safely to the physical disk (Durability), and unlocks the row. It prevents race conditions that would otherwise sell the same ticket 10,000 times. It is the ultimate referee of state.
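
You can reproduce the failure mode the DBMS prevents with two plain Python threads fighting over one counter. The `threading.Lock` below plays the role of the row-level lock Postgres takes; this is a toy analogy for the concept, not how Postgres is actually implemented:

```python
import threading

tickets = {"remaining": 1}   # the last concert ticket
lock = threading.Lock()
sold = []

def buy(user):
    # The lock stands in for the database's row lock: only one
    # "transaction" may read-modify-write this row at a time.
    with lock:
        if tickets["remaining"] > 0:
            tickets["remaining"] -= 1
            sold.append(user)

threads = [threading.Thread(target=buy, args=(u,)) for u in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sold)  # exactly one of ['A'] or ['B']
```

Remove the `with lock:` and both buyers can read `remaining == 1` before either decrements it, selling the same ticket twice. That read-check-write gap is the race condition the DBMS closes for you.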

3. Why PostgreSQL Won the Internet

In the 2010s, there was a massive hype cycle around NoSQL databases (like MongoDB) because developers were tired of rigid SQL schemas. Then, in 2014, PostgreSQL 9.4 shipped the JSONB data type. It allowed developers to store unstructured JSON documents inside a relational, strictly-typed SQL database, and to index and query inside that JSON efficiently.

Almost overnight, Postgres cannibalized the bulk of NoSQL use cases. In 2026, Postgres is the undisputed king of backend architecture. It is open-source, massively extensible, and robust enough to run the financial systems of entire countries while maintaining the flexibility of a document store.

4. Postgres Data Types (Stop Using VARCHAR)

Junior developers create tables using auto-incrementing integers for IDs and VARCHAR for everything else. Senior architects use specific types to optimize memory and security:

  • UUID vs Auto-Incrementing IDs: If your user profile URL is /user/142, your competitors know you only have 142 users. They can scrape your entire database by just incrementing the number in a loop. Use UUID (Universally Unique Identifier). It looks like 123e4567-e89b-12d3... and leaks zero information about your scale.
  • TIMESTAMPTZ: Never store a naive date. Always store dates with Time Zone awareness. If a user in Tokyo books a flight to New York, and your server is in London, a naive timestamp will destroy your scheduling logic.
  • JSONB: The binary JSON format. Postgres parses and decomposes the JSON into a binary representation at write time, so specific keys can be indexed (for example, finding all logs where metadata->>'latency' > 2000 via a GIN or expression index) without re-parsing the document on every read.
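
On the application side, generating these types is one-liner territory with Python's standard library, and drivers like asyncpg bind `uuid.UUID` and timezone-aware `datetime` objects to the matching Postgres columns. A stdlib-only sketch:

```python
import uuid
from datetime import datetime, timezone

# UUIDv4: random, leaks nothing about your row count (unlike /user/142).
user_id = uuid.uuid4()

# Timezone-aware "now": the only safe value for a TIMESTAMPTZ column.
# A naive datetime (tzinfo=None) is exactly the bug described above.
created_at = datetime.now(timezone.utc)

print(user_id)            # random on every run, e.g. 9f1c...-...
print(created_at.tzinfo)  # UTC
```

The habit to build: reject any `datetime` with `tzinfo is None` at the application boundary, so a naive timestamp can never reach the database in the first place.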

5. The Physics of Writes and The Write Penalty

Here is the brutal truth of Database Optimization: Every time you write data to a disk, you pay a tax. We call this the Write Penalty.

If you don't have indexes on your tables, finding a specific user requires a Table Scan (a Sequential Scan, in Postgres terminology). The database must physically read every single row on the disk from top to bottom until it finds the match. A Table Scan over 10 million rows will destroy your SQL performance and can take your API offline.

To fix Table Scans, we create a Database Index (specifically a B-Tree Index). Think of it like the index at the back of a textbook. It tells the database exactly what page the data lives on. However, indexes are physically duplicated data structures. If a table has 5 indexes, every single INSERT now requires six separate disk writes (one for the row, five to update the B-Trees). This multiplies your Write Penalty.
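
The asymmetry is easy to feel with a toy model: a list plays the table heap (a scan must walk every row) and a dict plays the index (a hash map here, a B-Tree in Postgres, but the trade-off is identical: extra structure that must also be updated on every write):

```python
# Toy model of the Write Penalty: rows on "disk" vs. an index to maintain.
rows = [{"id": i, "email": f"u{i}@example.com"} for i in range(100_000)]

def scan(email):
    # Table scan: inspect every row until the match is found. O(n).
    for row in rows:
        if row["email"] == email:
            return row

# Build an "index" (a dict here; Postgres uses a B-Tree). Lookups get fast...
email_index = {row["email"]: row for row in rows}

def insert(row):
    # ...but every INSERT now pays twice: one heap write plus one
    # index write. Five indexes would mean six writes per INSERT.
    rows.append(row)
    email_index[row["email"]] = row

insert({"id": 100_000, "email": "new@example.com"})
```

Run `scan` versus `email_index[...]` on a large list and the gap is dramatic; that gap is what you are buying every time you accept the Write Penalty of another index.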

"Architecture is the art of trade-offs. 95% of web applications operate on a Read-Heavy Architecture. Users view their feeds, scroll products, and read articles 100 times more often than they post, buy, or write. Therefore, we gladly accept the severe Write Penalty of maintaining indexes, because the alternative—a catastrophic Table Scan during a read operation—is unacceptable."

6. Code: High-Concurrency Relational Logging

To mitigate the Write Penalty at the application layer, we don't open a new database connection for every query. We use Connection Pooling and binary protocols. Python's asyncpg is one of the fastest drivers available because it speaks Postgres's binary wire protocol directly, bypassing text-parsing overhead.

Furthermore, we use orjson (a Rust-backed library) instead of Python's standard json. In high-throughput logging, JSON serialization is often the CPU bottleneck before the database even gets hit. orjson eliminates this.
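
In practice `orjson` is nearly a drop-in replacement, with one gotcha: `orjson.dumps` returns `bytes`, not `str`. A hedged sketch that falls back to the stdlib (compacted to produce identical output) when orjson isn't installed, so the rest of the code doesn't care which one is active:

```python
try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)  # Rust fast path; returns bytes
except ImportError:
    import json

    def dumps(obj) -> bytes:
        # Stdlib fallback, compact separators to match orjson's output.
        return json.dumps(obj, separators=(",", ":")).encode()

log_event = {"service": "auth", "latency_ms": 2150, "ok": False}
payload = dumps(log_event)  # b'{"service":"auth","latency_ms":2150,"ok":false}'
```

Returning `bytes` is a feature here: the payload goes straight into a socket or a JSONB parameter without an extra encode step.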

The Real-World Implementation

Check out the full repository for today's code. We build a high-performance, normalized logging engine mapping services to log_events via Foreign Keys, and execute powerful relational JOINs over JSONB data.

🐙 View the asyncpg Relational Engine on GitHub →

Analytics Snippet (asyncpg + JSONB)

async def run_complex_analytics(pool):
    """Demonstrate SQL JOINs, JSONB key extraction, and type casting via asyncpg."""
    async with pool.acquire() as conn:
        # Query INSIDE the metadata JSONB field, casting the extracted text to int:
        # find requests where latency exceeded 2000ms, joined against services.
        slow_query = """
            SELECT s.service_name, l.created_at,
                   l.metadata->>'latency_ms' AS latency,
                   l.metadata->>'endpoint' AS endpoint
            FROM log_events l
            JOIN services s ON l.service_id = s.service_id
            WHERE (l.metadata->>'latency_ms')::int > 2000
            ORDER BY (l.metadata->>'latency_ms')::int DESC
            LIMIT 5;
        """
        slow_records = await conn.fetch(slow_query)
        for r in slow_records:
            print(f"Service: {r['service_name']} | Latency: {r['latency']}ms")

7. Alternatives: The NoSQL Ecosystem

While Postgres is the default, certain architectural problems require entirely different storage paradigms to escape relational limits.

  • Document Stores (MongoDB): Optimized for rapidly changing schemas and massive horizontal scaling. Instead of tables and rows, you store loose JSON documents.
  • Wide-Column Stores (Cassandra): Built for extreme write-heavy workloads. If you are logging millions of IoT sensor readings per second, Cassandra distributes the Write Penalty across a masterless cluster better than anything else.
  • Graph Databases (Neo4j): Built to query relationships. If you need to find "Friends of Friends who also bought Product X", a relational database will choke on massive JOIN operations. A Graph database traverses those nodes instantly.
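
The "friends of friends who bought Product X" query is a two-hop traversal in graph terms. In Neo4j you would write it in Cypher; the sketch below just shows the traversal idea with an adjacency dict, using invented toy data:

```python
# Invented toy data: who is friends with whom, and who bought product X.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}
bought_x = {"dave", "erin"}

def friends_of_friends_who_bought(user):
    direct = friends[user]
    two_hops = set()
    for f in direct:
        two_hops |= friends[f]     # hop 2: friends of each friend
    two_hops -= direct | {user}    # exclude self and direct friends
    return two_hops & bought_x     # filter by purchase

print(friends_of_friends_who_bought("alice"))  # dave and erin
```

A relational database answers the same question with a self-JOIN on the friendship table plus a JOIN on purchases, which degrades badly as the hop count grows; a graph engine just follows pointers from node to node.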

🔥 DAY 9 TEASER: THE B-TREE ALGORITHM (DATABASES PART 2)

We established that Table Scans are fatal, and Indexes fix them. But what exactly is an index under the hood? Tomorrow, we rip open the database engine to explore Index Cardinality, Clustered vs Composite Indexes, Postgres EXPLAIN ANALYZE, and the legendary B+ Tree data structure that powers the internet's search capabilities.

📚 Deep Diver Resources

If you want to master relational limits and NoSQL architectures, these are mandatory reading:

Frequently Asked Questions

Q: If memory is so fast, why don't we build entire databases in RAM?

A: We do! In-memory databases like Redis and Memcached are exactly this. However, RAM is incredibly expensive compared to SSD storage. Storing a 5TB relational dataset entirely in RAM would bankrupt a startup, and managing the persistence (saving snapshots to disk to prevent data loss on crash) becomes a massive engineering bottleneck.

Q: Why did you use orjson instead of the built-in Python json library?

A: Python's built-in json module is C-accelerated but still comparatively slow. orjson is written in Rust. When you are processing millions of log entries or API payloads, CPU serialization becomes a massive bottleneck. orjson can serialize dictionaries to byte-strings several times faster, freeing up the CPU to handle more concurrent requests.

Q: Should I use MongoDB for a new project because it's schema-less?

A: "Schema-less" is a dangerous myth. If your database doesn't enforce the schema, your application code must enforce it (checking if fields exist before accessing them, causing massive code bloat). For 90% of projects, Postgres with a well-designed relational schema—using JSONB columns for the truly dynamic parts like metadata—is much safer and more performant long-term.

Architectural Consulting

If you are building a data-intensive AI application and require a Senior Engineer to architect your secure, high-concurrency backend, I am available for direct contracting.

Explore Enterprise Engagements →

[← Previous: Day 7: Identity Federation](/day-7-sso-oidc)
[Next →: Day 9: Database Indexing](/day-9-btree-indexes)


Originally published at https://logicandlegacy.blogspot.com

Source: dev.to
