SQL BETWEEN & Range Queries: Numeric, Date, Inclusive vs Exclusive

python dev.to

sql between is the most innocent-looking keyword in the entire SQL surface — three letters, two operands, "inclusive on both ends," done. Then your nightly report misses every record that landed on the last day of the month after midnight, your float column quietly excludes the values you swore were inside the range, and your created_at BETWEEN '2026-01-01' AND '2026-01-31' filter omits 23 hours and 59 minutes of traffic. The keyword isn't broken; the mental model "inclusive both ends" simply doesn't survive contact with timestamps, floats, and back-to-back monthly buckets.

This guide walks through every between sql pattern an interviewer or a production query reviewer will probe — numeric BETWEEN over INT, NUMERIC, and FLOAT (the IEEE-754 trap that swallows 0.2), the date BETWEEN time-truncation pitfall that ranks as the #1 production query bug, half-open intervals [start, end) as the only sql range query notation that composes cleanly, the four-quadrant matrix of closed/open endpoints, ANSI OVERLAPS, Postgres tstzrange, and sql between dates performance under B-tree, BRIN, and partition pruning. Each ### Solution Using … ships the four-part Solution Tail — code, step-by-step trace, output table, and a concept-by-concept why — so you can defend the answer in a sql between numbers interview the same way a senior DE defends a sql date range rewrite in code review.

When you want hands-on reps immediately after reading, drill the SQL practice library →, rehearse on filtering problems (SQL) →, and stress-test edge cases with date functions (SQL) →.


On this page


1. Why BETWEEN is a footgun (and the half-open fix in one paragraph)

sql between is closed-closed by ANSI standard — most production queries want closed-open

The one-sentence definition: x BETWEEN a AND b is exactly equivalent to x >= a AND x <= b — both endpoints are inclusive. That's it. Every SQL dialect (Postgres, MySQL, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, DuckDB) agrees on this; the ANSI/ISO SQL standard pins it down in the same way. The keyword is shorthand, not magic — the optimiser unfolds it into a pair of comparisons before it ever touches the planner.

The closed-closed semantics are not the problem. The problem is that the mental model "inclusive on both ends" maps cleanly to integers and money columns but breaks the moment one of the operands is a timestamp (which has hours, minutes, seconds, microseconds) or a float (which is stored as a binary fraction and rounds in unintuitive ways). Half the production query bugs that ship to incident channels under headlines like "monthly revenue report is off by a day" are date-BETWEEN bugs.

The three flavours of BETWEEN and what they actually do.

  • Numeric BETWEEN. price BETWEEN 100 AND 200price >= 100 AND price <= 200. Predictable for INT and NUMERIC; precision-sensitive for FLOAT.
  • Date / timestamp BETWEEN. created_at BETWEEN '2026-01-01' AND '2026-01-31'created_at >= '2026-01-01 00:00:00' AND created_at <= '2026-01-31 00:00:00'. The right-hand operand is implicitly cast to midnight, so 23h 59m of Jan 31 is excluded. This is the bug.
  • String BETWEEN. name BETWEEN 'A' AND 'M' → lexicographic comparison. Less common in interviews; covered briefly later for completeness.

Why senior DEs prefer col >= start AND col < end (half-open) over BETWEEN.

  • Composability. Half-open intervals [start, end) for back-to-back months — [Jan, Feb), [Feb, Mar) — never double-count and never gap. The end of one interval is exactly the start of the next, and the open end excludes Feb 1 from [Jan, Feb) so Feb 1 belongs to [Feb, Mar) only.
  • Time-grain independence. created_at < '2026-02-01' correctly excludes everything on or after Feb 1 — including 2026-01-31 23:59:59.999999. You don't have to know whether the column is DATE, TIMESTAMP, or TIMESTAMPTZ; you don't have to remember whether your engine stores microseconds or nanoseconds.
  • Index-friendly. Both BETWEEN and half-open compile to the same B-tree range scan, so the rewrite costs zero performance. The choice is purely about correctness.
  • Reads cleanly. created_at >= '2026-01-01' AND created_at < '2026-02-01' is self-documenting — anyone reading the query knows you mean "all of January" without having to know what "BETWEEN" cast the end operand to.

What interviewers listen for when sql between comes up.

  • Do you mention that BETWEEN is closed on both ends? — baseline answer.
  • Do you bring up the date BETWEEN time-truncation pitfall unprompted? — senior signal.
  • Do you reach for half-open intervals as the production default? — senior+ signal.
  • Do you mention BRIN indexes or partition pruning when the question becomes performance-flavoured? — staff-level signal.
  • Do you note OVERLAPS or tstzrange when the problem becomes interval-vs-interval rather than scalar-vs-interval? — Postgres-fluent signal.

Worked example — show that BETWEEN is closed on both ends

Detailed explanation. The simplest demonstration is to populate a tiny table with the boundary values and check which ones pass BETWEEN. It's the kind of one-page demo that ends a "convince me you understand inclusion" interview detour.

Question. Given a table ranges(id INT, value INT) with rows {1, 10}, {2, 50}, {3, 100}, {4, 150}, {5, 200}, {6, 250}, which rows satisfy value BETWEEN 50 AND 200?

Input.

id value
1 10
2 50
3 100
4 150
5 200
6 250

Code.

SELECT id, value
FROM ranges
WHERE value BETWEEN 50 AND 200
ORDER BY id;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The planner rewrites value BETWEEN 50 AND 200 to value >= 50 AND value <= 200. This is the only thing BETWEEN ever does at the engine level.
  2. Row 1 (value=10) fails the value >= 50 test and is excluded.
  3. Row 2 (value=50) satisfies value >= 50 AND value <= 200 — both endpoints are inclusive, so 50 passes.
  4. Rows 3 and 4 fall comfortably inside the interval.
  5. Row 5 (value=200) satisfies value <= 200 — again, inclusive on the upper end.
  6. Row 6 (value=250) fails the value <= 200 test and is excluded.

Output.

id value
2 50
3 100
4 150
5 200

Rule of thumb. When you read x BETWEEN a AND b, mentally rewrite it to x >= a AND x <= b before you reason about edge cases. The shorthand hides nothing — but the closed-closed shape is exactly the part that breaks for timestamps and floats.

Worked example — NOT BETWEEN and the NULL gotcha

Detailed explanation. A common follow-up probe is NOT BETWEEN. The keyword negates the entire inclusion predicate — x NOT BETWEEN a AND b is x < a OR x > b. The interview gotcha is that it shares SQL's three-valued logic with every other comparison: NULL NOT BETWEEN a AND b evaluates to NULL, which fails the WHERE clause. Candidates who write WHERE amount NOT BETWEEN 100 AND 200 to "filter out the middle band" are surprised when NULL-amount rows also vanish.

Question. With the rows {1, 50}, {2, 100}, {3, 150}, {4, 200}, {5, 250}, {6, NULL}, what does WHERE value NOT BETWEEN 100 AND 200 return? Then rewrite it to include the NULL row explicitly.

Input.

id value
1 50
2 100
3 150
4 200
5 250
6 NULL

Code.

-- Naïve NOT BETWEEN — misses NULL rows
SELECT id, value
FROM ranges
WHERE value NOT BETWEEN 100 AND 200
ORDER BY id;

-- The fix when NULL should also be considered "outside the band"
SELECT id, value
FROM ranges
WHERE value NOT BETWEEN 100 AND 200
   OR value IS NULL
ORDER BY id;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Row 1 (value=50). 50 NOT BETWEEN 100 AND 20050 < 100 OR 50 > 200 → true. Passes.
  2. Rows 2, 3, 4 (value=100, 150, 200). All inside the closed interval → NOT BETWEEN is false. Excluded.
  3. Row 5 (value=250). 250 > 200 → true. Passes.
  4. Row 6 (value=NULL). NULL NOT BETWEEN 100 AND 200NULL < 100 OR NULL > 200NULL OR NULLNULL. The WHERE clause requires true; NULL fails it. Excluded — even though it isn't "inside" the band.
  5. The fix. Add OR value IS NULL to include NULL rows explicitly. Three-valued logic doesn't bite if you handle the NULL case yourself.

Output (naïve query).

id value
1 50
5 250

Output (fixed query).

id value
1 50
5 250
6 NULL

Rule of thumb. BETWEEN, NOT BETWEEN, =, <>, <, >, <=, >= — every comparison against NULL is NULL, which fails the WHERE clause. If "NULL means missing data and should be visible," handle it with an explicit IS NULL or COALESCE.

SQL interview question on BETWEEN inclusion

A common opener: "We have a column score and a filter score BETWEEN 60 AND 100. A row with score = 100 showed up in production. Is that expected, and how would you write the filter to exclude exactly the upper boundary if a stakeholder asks?"

Solution Using a half-open rewrite score >= 60 AND score < 100

-- Original (inclusive both ends): includes 100
SELECT id, score
FROM grades
WHERE score BETWEEN 60 AND 100;

-- Rewrite to exclude the upper bound (half-open [60, 100)):
SELECT id, score
FROM grades
WHERE score >= 60 AND score < 100;

-- Rewrite to exclude the lower bound only ((60, 100]):
SELECT id, score
FROM grades
WHERE score > 60 AND score <= 100;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

id score BETWEEN 60 AND 100? >= 60 AND < 100?
1 55 no — fails score >= 60 no — fails score >= 60
2 60 yes — boundary, inclusive yes — boundary, inclusive lower
3 80 yes — inside yes — inside
4 100 yes — boundary, inclusive no — open upper, 100 excluded
5 101 no — fails score <= 100 no — fails score < 100

Output:

Filter id values that pass
BETWEEN 60 AND 100 2, 3, 4
>= 60 AND < 100 2, 3
> 60 AND <= 100 3, 4

Why this works — concept by concept:

  • BETWEEN is shorthand for closed-closedx BETWEEN a AND b is exactly x >= a AND x <= b. No dialect deviates from this.
  • Half-open [a, b)>= a AND < b includes the lower boundary, excludes the upper. The default production interval shape because adjacent intervals don't overlap.
  • Half-open (a, b]> a AND <= b is the mirror image. Rarer in practice; useful when "cumulative up to and including b" is the semantic you want.
  • Open-open (a, b)> a AND < b. Almost never the right choice; both boundaries are excluded.
  • Cost — all four forms compile to the same B-tree range scan; the cost difference is zero. The choice is purely a correctness decision, not a performance one.

SQL
Topic — filtering (SQL)
Filtering & range query problems (SQL)

Practice →


2. Numeric BETWEEN — INT, NUMERIC, FLOAT, and the precision trap

sql between numbers is predictable for INT and NUMERIC — and surprisingly broken for FLOAT

For most numeric types, BETWEEN does what the mental model says — closed-closed inclusion over exact arithmetic. The break is FLOAT (and its siblings DOUBLE PRECISION, REAL, BINARY_FLOAT). Floats are stored as IEEE-754 binary fractions, and most decimal values you type (0.1, 0.2, 0.3) cannot be represented exactly — they're rounded to the nearest binary fraction at insert time and again at compare time. That rounding occasionally pushes a "0.2" outside a "BETWEEN 0.1 AND 0.3" filter you swore would catch it.

The three numeric column families and how BETWEEN behaves on each.

  • Integer types (SMALLINT, INT, BIGINT). Exact storage. BETWEEN 100 AND 200 works the way the textbook says — every value 100..200 inclusive passes.
  • Fixed-point types (NUMERIC(p, s), DECIMAL(p, s)). Exact decimal storage. The right choice for money, percentages, counts that need fractional precision. BETWEEN over NUMERIC is bit-for-bit predictable; no hidden rounding.
  • Floating-point types (FLOAT, DOUBLE PRECISION, REAL). IEEE-754 binary, 32 or 64 bits. Most decimal literals are approximations. BETWEEN over FLOAT can quietly include or exclude values you'd expect to fall the other way.

The money rule. If a column represents currency, store it as NUMERIC(precision, scale) (or scaled integer cents) — never FLOAT. The BETWEEN price * 0.9 AND price * 1.1 calculation you'd write for "within 10% of list price" is safe over NUMERIC and unsafe over FLOAT. Production schemas in Stripe, Square, Plaid, and every accounting system treat this as a non-negotiable.

BETWEEN with calculated expressions.

  • Constant-on-both-sides. WHERE price BETWEEN 100 AND 200 — index-friendly.
  • Per-row expression on the right-hand side. WHERE price BETWEEN cost * 0.9 AND cost * 1.1 — re-evaluated per row, still index-friendly on price if there's a B-tree index.
  • Function on the column. WHERE LOWER(SUBSTRING(...)) BETWEEN 'a' AND 'm' — kills index usage. The optimiser can't see through the function. If you must filter on a transformed value, build a functional index on LOWER(SUBSTRING(...)).

The five most-asked numeric BETWEEN questions.

  • "Is BETWEEN 100 AND 200 inclusive of 100 and 200?" — yes, both endpoints.
  • "Does BETWEEN work with NULL?" — no. NULL BETWEEN a AND b is NULL (unknown), and NULL fails the WHERE clause; this is the standard three-valued logic story.
  • "Is BETWEEN faster than >= AND <=?" — identical; they compile to the same plan.
  • "When is BETWEEN over a FLOAT a bug?" — when a literal endpoint can't be represented exactly in binary, the comparison may misorder by one ULP.
  • "Should I use BETWEEN for percentages?" — only if the column is NUMERIC; if it's FLOAT, prefer NUMERIC casts or an explicit tolerance.

Worked example — BETWEEN over INT vs NUMERIC vs FLOAT

Detailed explanation. A small test table with three columns — one INT, one NUMERIC, one DOUBLE PRECISION — makes the precision difference visible in three rows. The INT and NUMERIC pass the same filter cleanly; the FLOAT row drops or admits values you didn't expect.

Question. A prices table stores the same logical value (e.g. 0.2) in three columns of different types. A query filters with BETWEEN 0.1 AND 0.3. Show which columns return all expected rows and which one silently misses one.

Input.

id i_value (INT) n_value (NUMERIC) f_value (DOUBLE)
1 0 0.10 0.1
2 0 0.20 0.2
3 0 0.30 0.3
4 0 0.40 0.4

(i_value is filled with 0 to keep the row count constant; the focus is on the NUMERIC and FLOAT columns.)

Code.

-- 1. NUMERIC BETWEEN — predictable
SELECT id, n_value
FROM prices
WHERE n_value BETWEEN 0.10 AND 0.30
ORDER BY id;

-- 2. FLOAT / DOUBLE BETWEEN — precision-sensitive
SELECT id, f_value, f_value::text AS exact_repr
FROM prices
WHERE f_value BETWEEN 0.1 AND 0.3
ORDER BY id;

-- 3. The safe rewrite when storage is locked to FLOAT
SELECT id, f_value
FROM prices
WHERE f_value >= 0.1 - 1e-9 AND f_value <= 0.3 + 1e-9
ORDER BY id;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. NUMERIC version. Storage is exact decimal. 0.10 <= 0.10, 0.10 <= 0.20, 0.10 <= 0.30 — all true; 0.30 >= 0.10, 0.30 >= 0.20, 0.30 >= 0.30 — all true; row 4 (0.40) fails. Rows 1, 2, 3 pass.
  2. FLOAT version. 0.1, 0.2, 0.3 cannot all be stored exactly in IEEE-754. The actual binary representations are roughly 0.10000000000000000555…, 0.20000000000000001110…, 0.29999999999999998890…. The endpoint literal 0.3 gets rounded one way; the stored 0.3 may be rounded the other way. In some engines this means f_value=0.3 fails f_value <= 0.3 by one ULP, and the row drops out.
  3. The safe rewrite. Add a tolerance epsilon (1e-9) on both sides. This widens the interval by less than 0.000000001 — wider than any FP rounding error for double precision in the range [0, 1], narrower than any meaningful business distinction.
  4. The real-world fix. Store currency, percentages, and exact decimals as NUMERIC(p, s). FLOAT is for tolerance-aware analytics (sensor readings, ML feature columns) where ULP-level differences don't matter.

Output.

Filter Rows returned
NUMERIC BETWEEN 0.10 AND 0.30 1, 2, 3 (deterministic)
FLOAT BETWEEN 0.1 AND 0.3 usually 1, 2, 3 — but on some engines 3 silently drops
FLOAT BETWEEN 0.1 - 1e-9 AND 0.3 + 1e-9 1, 2, 3 (safe)

Rule of thumb. If you're writing a BETWEEN over a money or percentage column, the column should be NUMERIC. If it's FLOAT and you can't change the schema, widen the interval by a tolerance epsilon — or rewrite to half-open with explicit casts.

Worked example — BETWEEN with a function on the column kills the index

Detailed explanation. The single most common reason a BETWEEN query is slow on a large table is a function or CAST applied to the indexed column. The optimiser can't see through the function, so it falls back to a full table scan. The fix is either to drop the function (rewrite the query) or to materialise the function as a functional index.

Question. A products(id BIGINT, price NUMERIC(10,2)) table has a B-tree index on price. The query WHERE ROUND(price) BETWEEN 100 AND 200 is slow. Explain why and rewrite it.

Input (relevant table stats).

Metric Value
Row count 10,000,000
Index CREATE INDEX prod_price_idx ON products(price)
Selectivity of [100, 200] ~ 2% of rows

Code.

-- ❌ Function on the indexed column — full scan
SELECT id, price
FROM products
WHERE ROUND(price) BETWEEN 100 AND 200;

-- ✅ Fix 1 — rewrite to use raw column
SELECT id, price
FROM products
WHERE price >= 99.5 AND price < 200.5;

-- ✅ Fix 2 — build a functional index on ROUND(price)
CREATE INDEX prod_round_price_idx ON products( (ROUND(price)) );
-- Now the original query can use the functional index
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Why the original is slow. The B-tree on price indexes the raw values 99.50, 99.51, 99.52, …. ROUND(price) is a computed value; the index has no entry for it. The planner sees no usable index and falls back to a sequential scan of all 10M rows.
  2. Fix 1 — rewrite the predicate. ROUND(price) BETWEEN 100 AND 200 is the same as price >= 99.5 AND price < 200.5 (because ROUND rounds half-up to the nearest integer). Now the predicate is on the raw column; the index is usable; the scan touches ~200K rows instead of 10M.
  3. Fix 2 — functional index. When the rewrite isn't possible (e.g. LOWER(name) BETWEEN 'a' AND 'm' can't be expressed without LOWER), build an index on the expression itself: CREATE INDEX ... ON products( (ROUND(price)) ). The optimiser now matches the index to the predicate.
  4. Cost trade-off. Functional indexes carry the same O(log N) insert cost as regular B-trees; they're slightly larger (the computed value is stored). Use them when the rewrite is impossible or the function appears in many queries.

Output (qualitative cost comparison).

Form Plan Rows touched Latency (typical)
ROUND(price) BETWEEN ... (no functional index) Seq Scan 10,000,000 2–5 s
price >= 99.5 AND price < 200.5 Index Range Scan ~200,000 5–20 ms
ROUND(price) BETWEEN ... (with functional index) Index Range Scan ~200,000 5–20 ms

Rule of thumb. Functions and CASTs on the indexed column are the universal SQL performance footgun. Either keep the function off the column or materialise it as a functional index. Both BETWEEN and half-open inherit this rule; neither is a fix on its own.

SQL interview question on numeric range filtering

The probe usually sounds like: "We have a transactions(amount NUMERIC(12,2)) table and we want every transaction within ±10% of a baseline 100.00. Write the query, and explain why this is safe to do without a CAST."

Solution Using BETWEEN baseline * 0.9 AND baseline * 1.1 over NUMERIC

WITH baseline AS (SELECT 100.00::NUMERIC(12,2) AS amt)
SELECT t.id, t.amount
FROM transactions t, baseline b
WHERE t.amount BETWEEN b.amt * 0.9 AND b.amt * 1.1
ORDER BY t.amount;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Step What happens
1 baseline.amt evaluates to 100.00 as NUMERIC(12,2).
2 b.amt * 0.9 is computed in NUMERIC arithmetic → 90.00.
3 b.amt * 1.1 similarly → 110.00.
4 The planner rewrites t.amount BETWEEN 90.00 AND 110.00 to t.amount >= 90.00 AND t.amount <= 110.00.
5 Any B-tree index on t.amount is used as a range scan over [90.00, 110.00].

Output:

id amount
7 90.00
12 92.50
18 100.00
21 105.75
23 110.00

Why this works — concept by concept:

  • NUMERIC arithmeticNUMERIC * NUMERIC is exact; no IEEE-754 rounding. 100.00 * 0.9 is bit-for-bit 90.00, not 89.99999999999999.
  • Constant-folded bounds — the planner sees the bounds as constants for the duration of the query (after the join unfolds), so the B-tree index on t.amount is used as a range scan.
  • BETWEEN as syntactic sugar — the optimiser unfolds it to >= AND <= before planning. There is no separate operator at the engine level.
  • No CAST needed — both sides are already NUMERIC; the engine performs the comparison in fixed-point arithmetic without a single rounding step.
  • CostO(log N + K) where N is the table size and K is the number of matching rows. Identical to writing >= AND <= by hand.

SQL
Topic — aggregation (SQL)
Numeric range and aggregation problems (SQL)

Practice →


3. Date BETWEEN — the time-truncation pitfall

sql between dates is the #1 production query bug in the entire range-query family

If you only remember one thing from this guide, make it this: created_at BETWEEN '2026-01-01' AND '2026-01-31' does not return "all of January". It returns every row from midnight Jan 1 through midnight Jan 31 — that is, January 1 through January 30 plus exactly the first second of January 31. Every row from 00:00:01 Jan 31 through 23:59:59.999999 Jan 31 is silently dropped.

The reason is simple — the right-hand operand '2026-01-31' is a DATE literal, and when it's compared against a TIMESTAMP column the engine casts it to '2026-01-31 00:00:00'. The comparison created_at <= '2026-01-31 00:00:00' is true for the first second of Jan 31 and false for every later moment that day. The bug is the implicit cast.

Where this surfaces in real systems.

  • Daily / monthly revenue reports that miss the last day's revenue. Finance team notices a few percent gap; on-call engineer spends a day finding it.
  • Cohort analyses that drop "people who joined on day 31" because the cohort window was written as joined BETWEEN cohort_start AND cohort_end.
  • Backfill scripts that say WHERE event_date BETWEEN '2026-01-01' AND '2026-01-31' and silently skip 23h 59m of Jan 31's events.
  • Audit queries that confirm "no records between dates X and Y" — and miss the records that landed on Y after midnight.

Three fixes — pick one and stay consistent.

  • Fix 1 — Cast the column to DATE. created_at::date BETWEEN '2026-01-01' AND '2026-01-31'. Conceptually simplest. Performance downside: applies a function to the column on the left side of the comparison, which usually disables B-tree index usage. Acceptable on small tables; a problem on warehouse-scale tables. You can rescue it with a functional index on (created_at::date).
  • Fix 2 — Half-open interval (the production default). created_at >= '2026-01-01' AND created_at < '2026-02-01'. No function on the column; B-tree index on created_at is used as a range scan; correct down to the last microsecond. This is what every senior DE writes.
  • Fix 3 — date_trunc('day', …) semantics. Useful when the half-open boundary is dynamic (e.g. "from N days ago to now"): created_at >= date_trunc('day', NOW() - INTERVAL '30 days') AND created_at < date_trunc('day', NOW() + INTERVAL '1 day'). Same B-tree behaviour, expressive endpoints.

TIMESTAMP vs TIMESTAMPTZ in BETWEEN.

  • TIMESTAMP (without time zone). The comparison happens at face value. If your column is naïve TIMESTAMP and your endpoints are naïve DATE literals, the engine casts the DATE to midnight in the same naïve clock — the pitfall above applies as described.
  • TIMESTAMPTZ (with time zone). The column is stored in UTC; literals are converted to UTC at parse time, based on the session's timezone setting. A '2026-01-31' in a session whose timezone is America/New_York becomes 2026-01-31 05:00:00+00 in UTC — and your BETWEEN now drops the last 24 hours of the New York day. The pitfall is amplified: you need to be both date-aware and timezone-aware.

The safest pattern when timezones are in play: write the boundaries explicitly as timestamps with timezone, e.g. created_at >= '2026-01-01 00:00:00-05:00' AND created_at < '2026-02-01 00:00:00-05:00', or use date_trunc('day', created_at AT TIME ZONE 'America/New_York') — which is verbose but unambiguous.

Worked example — show the date BETWEEN bug and the half-open fix

Detailed explanation. A logins(user_id, login_at TIMESTAMP) table tracks user logins. We want every login in January 2026. The naïve query uses BETWEEN '2026-01-01' AND '2026-01-31'. We then rewrite it to half-open and observe the difference.

Question. With the rows below, return all January-2026 logins, then compare the BETWEEN query's result with the half-open rewrite and explain the gap.

Input.

user_id login_at
1 2026-01-01 00:00:00
2 2026-01-15 12:30:00
3 2026-01-30 23:59:59
4 2026-01-31 00:00:00
5 2026-01-31 09:15:42
6 2026-01-31 23:59:59
7 2026-02-01 00:00:00
8 2026-02-01 09:00:00

Code.

-- Naïve BETWEEN — the bug
SELECT user_id, login_at
FROM logins
WHERE login_at BETWEEN '2026-01-01' AND '2026-01-31'
ORDER BY login_at;

-- Half-open rewrite — the fix
SELECT user_id, login_at
FROM logins
WHERE login_at >= '2026-01-01'
  AND login_at <  '2026-02-01'
ORDER BY login_at;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Naïve BETWEEN execution. The engine unfolds to login_at >= '2026-01-01 00:00:00' AND login_at <= '2026-01-31 00:00:00'. The right-hand bound is exactly midnight on Jan 31.
  2. Boundary at Jan 31 00:00:00. Row 4 (login_at = 2026-01-31 00:00:00) satisfies <= '2026-01-31 00:00:00' — it passes by a single second.
  3. Boundary at Jan 31 09:15:42 and 23:59:59. Rows 5 and 6 are after midnight on Jan 31. '2026-01-31 09:15:42' <= '2026-01-31 00:00:00' is false. They are silently dropped.
  4. Half-open rewrite. The upper bound becomes < '2026-02-01' — equivalent to < '2026-02-01 00:00:00'. Every Jan 31 timestamp now passes (anything strictly before Feb 1 midnight). Rows 4, 5, 6 are all included. Rows 7 and 8 (Feb 1) are correctly excluded.
  5. The 23-hour gap. The BETWEEN query missed two of the three Jan-31 records. In a high-traffic production system this is roughly ~ traffic_per_day of missing rows on every end-of-month report.

Output.

Query Rows returned (user_ids) Missed
BETWEEN '2026-01-01' AND '2026-01-31' 1, 2, 3, 4 5, 6 (silently dropped)
>= '2026-01-01' AND < '2026-02-01' 1, 2, 3, 4, 5, 6 none — fully correct

Rule of thumb. When the column is a TIMESTAMP or TIMESTAMPTZ and you're filtering by calendar dates, never write BETWEEN. Write col >= start AND col < end_plus_one_day. The half-open pattern is the only one that is correct independent of the column's time resolution.

Worked example — TIMESTAMPTZ + session timezone amplifies the pitfall

Detailed explanation. A finance team in New York runs a "January revenue" report. The column is TIMESTAMPTZ; the session timezone is America/New_York. The query uses BETWEEN '2026-01-01' AND '2026-01-31'. The report comes in noticeably under the marketing team's projection — by both the missing-Jan-31 records and a five-hour timezone shift that pushes some early-Jan records into late-Dec UTC.

Question. With orders(placed_at TIMESTAMPTZ) and SET timezone = 'America/New_York', show how BETWEEN '2026-01-01' AND '2026-01-31' actually translates internally, and rewrite for a correct New-York-local "January" window.

Input (relevant rows).

order_id placed_at (in UTC) placed_at (in America/New_York)
A 2026-01-01 02:00:00+00 2025-12-31 21:00:00 (NY) — late December
B 2026-01-01 12:00:00+00 2026-01-01 07:00:00 (NY) — Jan 1 morning
C 2026-01-31 12:00:00+00 2026-01-31 07:00:00 (NY) — Jan 31 morning
D 2026-01-31 23:00:00+00 2026-01-31 18:00:00 (NY) — Jan 31 evening
E 2026-02-01 04:00:00+00 2026-01-31 23:00:00 (NY) — late Jan 31

Code.

SET timezone = 'America/New_York';

-- ❌ Naïve — dates interpreted in session TZ, then converted to UTC for compare
SELECT order_id, placed_at
FROM orders
WHERE placed_at BETWEEN '2026-01-01' AND '2026-01-31';

-- ✅ Fix — fully-qualified timestamps anchor to New-York-local
SELECT order_id, placed_at
FROM orders
WHERE placed_at >= '2026-01-01 00:00:00-05:00'
  AND placed_at <  '2026-02-01 00:00:00-05:00';

-- ✅ Alternative — date_trunc with AT TIME ZONE for self-documenting code
SELECT order_id, placed_at
FROM orders
WHERE (placed_at AT TIME ZONE 'America/New_York') >= '2026-01-01'
  AND (placed_at AT TIME ZONE 'America/New_York') <  '2026-02-01';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. What the naïve query becomes. With session TZ = America/New_York, '2026-01-01' is parsed as 2026-01-01 00:00:00 America/New_York, which is 2026-01-01 05:00:00+00 UTC. Similarly '2026-01-31' becomes 2026-01-31 05:00:00+00. So the filter is placed_at BETWEEN '2026-01-01 05:00:00+00' AND '2026-01-31 05:00:00+00'.
  2. Row A — UTC 02:00 Jan 1. In NY this is Dec 31; the user wants it excluded from "January NY revenue." The filter >= '2026-01-01 05:00:00+00' excludes it (row A's 02:00 < 05:00). Correct.
  3. Row B — UTC 12:00 Jan 1. NY local: Jan 1 morning. Included. Correct.
  4. Row C — UTC 12:00 Jan 31. NY local: Jan 31 morning. The filter <= '2026-01-31 05:00:00+00' excludes it because 12:00 > 05:00. Silently dropped — bug.
  5. Row D — UTC 23:00 Jan 31. NY local: Jan 31 evening. Silently dropped.
  6. Row E — UTC 04:00 Feb 1. NY local: Jan 31 23:00. The "real" January (NY-local) should include this. Silently dropped.
  7. The fix. Fully-qualified '2026-01-01 00:00:00-05:00' and '2026-02-01 00:00:00-05:00' pin the bounds to midnight NY-local. Rows B, C, D, E all pass; row A correctly fails. The half-open form also avoids the closed-closed pitfall.

Output (count comparison).

Form Rows returned (order_ids) Bug?
BETWEEN '2026-01-01' AND '2026-01-31' (session TZ NY) B only yes — C, D, E silently dropped
Fully-qualified >= …-05:00 AND < …-05:00 B, C, D, E none

Rule of thumb. With TIMESTAMPTZ columns, never let date literals be interpreted in the session's timezone implicitly. Either anchor the bounds to fully-qualified TIMESTAMP WITH TIME ZONE literals, or convert the column to a local timestamp via AT TIME ZONE and compare against naïve date literals — both forms are explicit and reproducible across sessions.

SQL interview question on date range filtering

A common probe: "We want all orders placed in January 2026, where orders.placed_at is TIMESTAMP. Write the most correct, most index-friendly query. Then defend why you didn't use BETWEEN."

Solution Using a half-open interval >= '2026-01-01' AND < '2026-02-01'

SELECT order_id, placed_at, customer_id, amount
FROM orders
WHERE placed_at >= '2026-01-01'
  AND placed_at <  '2026-02-01'
ORDER BY placed_at;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Step Action Effect
1 Parse '2026-01-01' and '2026-02-01' as DATE literals both implicit-cast to TIMESTAMP at 00:00:00 on the respective day
2 Plan with B-tree index on placed_at range scan over [2026-01-01 00:00:00, 2026-02-01 00:00:00)
3 Scan leaf pages for placed_at keys in the range every Jan-31 timestamp ≤ 23:59:59.999999 passes
4 Return matching rows in index order results are time-sorted with no extra sort cost
5 Caller can replace the lower bound with date_trunc('month', NOW()) for "this month" reports — same shape, dynamic boundary

Output:

order_id placed_at customer_id amount
8801 2026-01-01 00:00:00 42 120.00
8802 2026-01-15 12:30:00 17 89.50
8803 2026-01-30 23:59:59 42 24.99
8804 2026-01-31 00:00:00 7 41.20
8805 2026-01-31 09:15:42 17 75.00
8806 2026-01-31 23:59:59 42 18.75

Why this works — concept by concept:

  • Half-open semantics[start, end) includes every microsecond from start through end - 1µs. Jan 31 in full, Feb 1 not at all.
  • No function on the column — the comparison placed_at >= '2026-01-01' is a direct B-tree key lookup; the optimiser uses the index as a range scan. Wrapping the column in date_trunc(...) or ::date would have disabled the index unless a functional index existed.
  • Time-grain independence — the same query is correct whether placed_at is TIMESTAMP, TIMESTAMPTZ, or even TIMESTAMP(6) with microsecond precision. The half-open shape doesn't care.
  • Composability for monthly buckets — back-to-back monthly windows [Jan, Feb), [Feb, Mar) partition the year into non-overlapping, non-gapping intervals. Sum-by-month reports don't double-count and don't drop records.
  • CostO(log N + K) B-tree range scan. Identical to the broken BETWEEN form — the correctness difference is everything, the performance difference is zero.

SQL
Topic — date functions (SQL)
Date range and date-function problems (SQL)

Practice →


4. Half-open intervals — the production pattern

[start, end) is the only interval notation that composes cleanly across SQL, Python, and pandas

Half-open intervals — closed on the lower end, open on the upper end — are the production default for one structural reason: adjacent half-open intervals don't overlap and don't gap. [Jan, Feb) ends just before Feb 1 00:00:00; [Feb, Mar) begins exactly at Feb 1 00:00:00. Every moment in time belongs to exactly one interval. This is the property that makes monthly cohort buckets, daily partition windows, and rolling 7-day aggregates summable.

![Visual diagram of half-open intervals — top a number line shows two back-to-back intervals [Jan, Feb) and Feb, Mar) without overlap; bottom a four-pattern matrix shows closed/open combinations [a,b], [a,b), (a,b], (a,b); a small OVERLAPS operator + tstzrange chip; on a light PipeCode card.

The four-quadrant interval notation.

  • [a, b] — closed-closed. What SQL BETWEEN does. Both endpoints inclusive. Adjacent intervals double-count the shared boundary; consecutive [Jan, Feb-1], [Feb, Feb+27] triggers a "where does Feb 1 belong?" headache.
  • [a, b) — closed-open. Lower inclusive, upper exclusive. The production default. a >= x AND x < b. Adjacent intervals compose cleanly.
  • (a, b] — open-closed. Upper inclusive, lower exclusive. The mirror image; less common. Sometimes natural for "cumulative-up-to-and-including" buckets.
  • (a, b) — open-open. Both endpoints excluded. Almost never the right tool; reserved for "strictly inside" mathematical statements.

Why half-open also wins for off-by-one safety.

  • Length is b - a. A half-open interval over integers contains exactly b - a integers; over time, exactly b - a of whatever time unit you're using. The closed-closed form over integers contains b - a + 1 — the + 1 is where every off-by-one bug lives.
  • Empty interval is [a, a). Half-open intervals have a clean empty form; closed-closed has the awkward [a, a] which contains exactly one point.
  • Slicing matches array/Python conventions. Python's range(a, b) and slicing arr[a:b] are half-open. Half-open SQL aligns with the language your downstream code is written in.

ANSI SQL OVERLAPS — interval-vs-interval overlap testing.

  • Postgres and Oracle support the OVERLAPS operator with the signature (start1, end1) OVERLAPS (start2, end2). It returns true iff the two intervals share at least one moment.
  • Equivalent expansion: start1 < end2 AND start2 < end1 — the standard interval overlap check.
  • OVERLAPS is half-open by definition in the SQL standard — back-to-back (Jan, Feb) and (Feb, Mar) do not overlap. (Which is exactly the property you want.)

Postgres tstzrange — first-class interval columns.

  • Range types (int4range, numrange, daterange, tsrange, tstzrange) let you store an interval as a single value. Literal syntax '[a, b)' includes the closed/open markers in the value itself.
  • Operators: @> (contains), && (overlaps), -|- (is adjacent to). Index with GiST (USING gist (range_col)).
  • Useful for booking systems, availability calendars, time-bounded permissions — anywhere a row's natural payload is an interval.

Worked example — back-to-back monthly buckets without double-counting

Detailed explanation. A reporting team wants monthly revenue buckets for Jan, Feb, Mar 2026. Naïve BETWEEN over date ranges double-counts every month-boundary record and misses end-of-month records. Half-open buckets avoid both bugs.

Question. Compute revenue for Jan, Feb, Mar 2026 from orders(placed_at TIMESTAMP, amount NUMERIC) such that every order is counted exactly once and end-of-month orders are not dropped.

Input.

order_id placed_at amount
1 2026-01-15 10:00:00 100.00
2 2026-01-31 23:59:59 50.00
3 2026-02-01 00:00:00 75.00
4 2026-02-15 14:00:00 80.00
5 2026-02-28 23:59:59 30.00
6 2026-03-01 00:00:00 95.00
7 2026-03-15 09:00:00 60.00

Code.

SELECT
  CASE
    WHEN placed_at >= '2026-01-01' AND placed_at < '2026-02-01' THEN 'Jan'
    WHEN placed_at >= '2026-02-01' AND placed_at < '2026-03-01' THEN 'Feb'
    WHEN placed_at >= '2026-03-01' AND placed_at < '2026-04-01' THEN 'Mar'
  END AS month_bucket,
  SUM(amount) AS revenue
FROM orders
WHERE placed_at >= '2026-01-01' AND placed_at < '2026-04-01'
GROUP BY 1
ORDER BY 1;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Order 2 — Jan 31 23:59:59. Falls in [Jan, Feb) = [2026-01-01, 2026-02-01) because '2026-01-31 23:59:59' < '2026-02-01' is true. Counted in Jan.
  2. Order 3 — Feb 1 00:00:00. '2026-02-01 00:00:00' < '2026-02-01' is false, so it fails the Jan bucket. '2026-02-01 00:00:00' >= '2026-02-01' is true, so it falls into the Feb bucket. Counted in Feb, not Jan.
  3. Order 5 — Feb 28 23:59:59. Strictly before Mar 1 00:00:00, so it's in Feb.
  4. Order 6 — Mar 1 00:00:00. Belongs to Mar.
  5. No double-count. Every order is assigned to exactly one bucket because the half-open intervals partition time without overlap. If we had used BETWEEN (closed-closed) on the same boundaries, orders 3 and 6 would each belong to two buckets — and the report would silently inflate revenue.

Output.

month_bucket revenue
Jan 150.00
Feb 185.00
Mar 155.00

Rule of thumb. When you bucket time into N consecutive intervals, always use half-open. Closed-closed buckets are a recipe for silent double-counting at every interval boundary.

Worked example — Postgres tstzrange for first-class interval columns

Detailed explanation. When a row's natural payload is an interval — a booking, a permission grant, a price-validity window — store it as a tstzrange instead of two columns. The closed/open marker lives in the value ('[)' is half-open, '[]' is closed-closed). Range operators (@>, &&, -|-) read more cleanly than the equivalent start1 < end2 AND … arithmetic, and a GiST index supports them all.

Question. A room_bookings(booking_id BIGINT, room_id INT, period tstzrange) table stores each booking as a single tstzrange with '[)' half-open semantics. A new request comes in for room_id=7 over [2026-03-10 14:00, 2026-03-10 16:00). Write the conflict query and explain how the index supports it.

Input.

booking_id room_id period
101 7 [2026-03-10 10:00, 2026-03-10 12:00)
102 7 [2026-03-10 13:00, 2026-03-10 15:00)
103 7 [2026-03-10 15:30, 2026-03-10 17:00)
104 7 [2026-03-10 16:00, 2026-03-10 18:00)
105 7 [2026-03-11 09:00, 2026-03-11 10:00)

Code.

-- Schema
CREATE TABLE room_bookings (
  booking_id BIGSERIAL PRIMARY KEY,
  room_id    INT NOT NULL,
  period     TSTZRANGE NOT NULL,
  EXCLUDE USING gist (room_id WITH =, period WITH &&)
);

-- Conflict query using the && (overlap) operator
SELECT booking_id, period
FROM room_bookings
WHERE room_id = 7
  AND period && tstzrange('2026-03-10 14:00+00',
                          '2026-03-10 16:00+00',
                          '[)');
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Schema. period TSTZRANGE stores both endpoints + the inclusion markers in one column. The EXCLUDE USING gist constraint prevents overlapping bookings for the same room at insert time — the database itself enforces the business rule.
  2. The && operator. a && b is true iff intervals a and b overlap by at least one moment. Half-open semantics are respected — [10:00, 12:00) and [12:00, 14:00) do not overlap.
  3. Booking 101. [10:00, 12:00) vs request [14:00, 16:00). No overlap. Skipped.
  4. Booking 102. [13:00, 15:00) vs request [14:00, 16:00). Overlaps on [14:00, 15:00). Conflict.
  5. Booking 103. [15:30, 17:00) vs request [14:00, 16:00). Overlaps on [15:30, 16:00). Conflict.
  6. Booking 104. [16:00, 18:00) vs request [14:00, 16:00). Back-to-back; shared boundary 16:00 belongs to the later interval. No overlap.
  7. Booking 105. Different day, no overlap.

Output.

booking_id period
102 [2026-03-10 13:00, 2026-03-10 15:00)
103 [2026-03-10 15:30, 2026-03-10 17:00)

Rule of thumb. When the row models an interval (booking, permission, pricing window), use tstzrange over two-columns; pair with EXCLUDE USING gist to let the database refuse to insert conflicting rows. It's the cleanest interval API in any SQL dialect.

SQL interview question on interval overlap testing

A senior probe: "We have a bookings(room_id, starts_at TIMESTAMP, ends_at TIMESTAMP) table. A new request comes in for (room_id, request_start, request_end). Write a query that returns conflicting bookings for that room."

Solution Using ANSI OVERLAPS (or the equivalent start1 < end2 AND start2 < end1)

-- Postgres / Oracle — ANSI OVERLAPS operator
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = :room_id
  AND (starts_at, ends_at) OVERLAPS (:request_start, :request_end);

-- Portable rewrite (works in every dialect)
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = :room_id
  AND starts_at < :request_end
  AND :request_start < ends_at;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Suppose :request_start = '2026-03-10 14:00', :request_end = '2026-03-10 16:00', room_id = 7:

booking_id starts_at ends_at starts_at < request_end? request_start < ends_at? Overlap?
101 2026-03-10 10:00 2026-03-10 12:00 10:00 < 16:00 → true 14:00 < 12:00 → false no
102 2026-03-10 13:00 2026-03-10 15:00 13:00 < 16:00 → true 14:00 < 15:00 → true yes
103 2026-03-10 15:30 2026-03-10 17:00 15:30 < 16:00 → true 14:00 < 17:00 → true yes
104 2026-03-10 16:00 2026-03-10 18:00 16:00 < 16:00 → false 14:00 < 18:00 → true no — back-to-back, half-open
105 2026-03-11 09:00 2026-03-11 10:00 09:00 < 16:00 → false (next day) 14:00 < 10:00 → false no

Output:

booking_id starts_at ends_at
102 2026-03-10 13:00 2026-03-10 15:00
103 2026-03-10 15:30 2026-03-10 17:00

Why this works — concept by concept:

  • Interval overlap identity — two half-open intervals [s1, e1) and [s2, e2) overlap iff s1 < e2 AND s2 < e1. The condition is symmetric, branchless, and handles every case (containment, partial overlap, identical, back-to-back).
  • Back-to-back is not an overlap — booking 104 (16:00 → 18:00) does not overlap with the request ending at 16:00. Half-open intervals treat the shared boundary as belonging to the later interval — exactly what a room scheduler wants.
  • ANSI OVERLAPS — Postgres and Oracle expose this as a built-in operator; SQL Server and MySQL require the manual < rewrite. Both forms compile to the same plan.
  • Indexable — with (room_id, starts_at) as a composite B-tree index, the room_id = ? predicate is an equality seek and the starts_at < ? is a range scan suffix. Add (room_id, ends_at) if ends_at becomes the more selective side.
  • CostO(log N + K) per query; cheap enough to run on every booking-request POST.

SQL
Topic — filtering (SQL)
Interval, overlap, and range-filter problems (SQL)

Practice →


5. BETWEEN performance — indexes, BRIN, partition pruning

sql range query performance is decided by three index strategies — B-tree, BRIN, and partition pruning

A BETWEEN (or its half-open equivalent) is only as fast as the index strategy the optimiser can apply to the column. The three strategies that actually matter for production-scale range queries are B-tree range scans (the default), BRIN block-range indexes (for time-ordered append-only data in Postgres), and partition pruning (for declaratively partitioned tables). Pick by table size and access pattern; combine when the data set crosses TB-scale.

Strategy 1 — B-tree range scan (the default).

  • The planner descends the tree to the first key in the range, then scans contiguous leaf pages until it passes the upper bound.
  • Cost: O(log N + K) — logarithmic descent plus linear scan of matching keys.
  • Works on any column type with a total order — INT, NUMERIC, DATE, TIMESTAMP, VARCHAR, BYTEA.
  • The default index in every dialect; no setup beyond CREATE INDEX ... ON table(col).
  • Caveat: a function applied to the column (WHERE date_trunc('day', created_at) BETWEEN ...) hides the column from the optimiser and forces a full scan. Either drop the function or build a functional index.

Strategy 2 — BRIN (Block Range Index) in Postgres.

  • A BRIN stores one (min, max) summary per block range (default 128 pages, ~1MB on disk). At query time, the planner skips any block range whose summary doesn't intersect the WHERE predicate.
  • Cost: O(blocks_examined). For naturally time-ordered data (every insert is "now"), the index is tiny — often 1000× smaller than the equivalent B-tree.
  • The fit: append-only, time-ordered data, large tables. Classic workload — IoT, log shipping, metrics, event streams persisted in Postgres.
  • The anti-fit: heavily-updated columns, columns not correlated with physical order, OLTP point lookups. BRIN excels at "scan a recent window of a huge table"; it's terrible at "find a specific row by id."

Strategy 3 — Partition pruning.

  • Declaratively partition the table by date (or another low-cardinality range key) — CREATE TABLE events ... PARTITION BY RANGE (event_date) in Postgres; PARTITION BY (date_trunc('month', event_ts)) analogues in BigQuery, Snowflake, Redshift.
  • The planner inspects the WHERE clause and only reads the partitions whose range overlaps the predicate.
  • Cost: O(partitions_read × cost_per_partition). The 11 untouched months never load a page.
  • Works with cloud warehouses out of the box: BigQuery and Snowflake auto-prune by partition column without an explicit index.

When BETWEEN is the wrong choice for performance.

  • Function on the columnWHERE LOWER(name) BETWEEN 'a' AND 'm' disables the B-tree on name. Fix: rewrite to half-open, or build a functional index.
  • CAST on the columnWHERE created_at::date BETWEEN '2026-01-01' AND '2026-01-31' disables the B-tree on created_at. Fix: rewrite to half-open without the CAST.
  • OR across non-contiguous rangesWHERE x BETWEEN 1 AND 10 OR x BETWEEN 100 AND 110. The planner often falls back to a full scan. Fix: WHERE x IN (range1) OR x IN (range2) if discrete, or use a UNION ALL of two range queries.
  • LIKE with leading wildcardWHERE name LIKE '%foo%' is a different beast but lives in the same "no index usable" family. Mentioned here for completeness; covered fully in the dedicated LIKE interview blog.

Combining strategies — the multi-TB pattern.

The pattern senior data engineers actually deploy on giant time-series tables is partition pruning + BRIN on the partition column (or B-tree on a more selective column inside the partition). The partition prune drops 95%+ of the table; the BRIN inside each surviving partition drops most of the remaining blocks; the final B-tree (e.g. on customer_id) finds the specific rows. Three layers of selectivity, each cheap.

Worked example — explain BRIN vs B-tree for a 1-billion-row events table

Detailed explanation. A Postgres table events(event_ts TIMESTAMP, event_type TEXT, user_id BIGINT, payload JSONB) has 1 billion rows; new events are inserted in order of event_ts (append-only). We need WHERE event_ts >= NOW() - INTERVAL '7 days' AND event_ts < NOW() to be fast.

Question. Compare a B-tree on event_ts to a BRIN on event_ts for this workload. Which one would you pick and why?

Input (relevant table stats).

Metric Value
Row count 1,000,000,000
Disk size ~ 600 GB
7-day window row count ~ 25,000,000 (about 2.5% of total)
Physical insert order strictly monotonic by event_ts

Code.

-- Option A: B-tree on event_ts
CREATE INDEX events_event_ts_btree
  ON events (event_ts);

-- Option B: BRIN on event_ts
CREATE INDEX events_event_ts_brin
  ON events USING BRIN (event_ts) WITH (pages_per_range = 128);

-- The query (same for both indexes)
SELECT event_type, COUNT(*)
FROM events
WHERE event_ts >= NOW() - INTERVAL '7 days'
  AND event_ts <  NOW()
GROUP BY event_type
ORDER BY COUNT(*) DESC;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. B-tree size. A B-tree on 1B rows of TIMESTAMP is roughly ~ 1B × 24 bytes (key + pointer overhead) = ~ 25 GB. Big, but acceptable.
  2. BRIN size. With pages_per_range=128 and the table at ~75M pages, the BRIN has 75M / 128 ≈ 590K block ranges; each summary is ~20 bytes. Total BRIN size ≈ 12 MB. Three orders of magnitude smaller than the B-tree.
  3. Query against B-tree. Descend the tree to the first key ≥ NOW() - 7d, then scan leaf pages until past NOW(). Touches ~25M index entries plus the corresponding heap pages. Cost: I/O ≈ 25M random-ish heap fetches.
  4. Query against BRIN. Inspect each of the ~590K block-range summaries, keep the ones whose [min_ts, max_ts] overlap the 7-day window. Because the table is monotonically inserted, only ~ 7 / 365 × 590K ≈ 11K block ranges are kept → ~1.4M pages → ~ 25M rows scanned sequentially. The crucial difference: sequential I/O, not random.
  5. Why BRIN wins here. Sequential I/O is 10–30× faster than random I/O on spinning disks and 3–10× faster on SSDs. For "recent window of a huge, append-only table," BRIN is the right answer. For "find a specific event by id," B-tree on id is.
  6. Caveat — when BRIN regresses. If the physical insert order drifts away from time order (e.g. a backfill prepends old data), BRIN block-range summaries become loose and the index degenerates to a full scan. Use pg_summarize_brin_ranges() and REINDEX to keep it healthy.

Output (qualitative cost comparison).

Aspect B-tree on event_ts BRIN on event_ts
Index size ~ 25 GB ~ 12 MB
Insert overhead per row tree rebalancing append summary block
Query I/O pattern random-ish heap fetches sequential block ranges
7-day query latency (typical) 30–120s 3–8s
Best for point lookups + small windows recent window of large append-only tables

Rule of thumb. For time-ordered append-only Postgres tables larger than ~ 100M rows, always evaluate BRIN before B-tree for the timestamp column. The 1000× size saving and the sequential I/O pattern together turn what used to be a 2-minute query into a 5-second query.

Worked example — composite index for (room_id, period) range queries

Detailed explanation. When a range query also filters by a low-cardinality equality column (room_id, tenant_id, customer_id), the textbook strategy is a composite index with the equality column first and the range column second. The optimiser uses the equality column as an index seek, then the range column as a contiguous suffix scan — cost stays O(log N + K) and only the matching equality bucket is touched.

Question. With bookings(room_id INT, starts_at TIMESTAMP, ends_at TIMESTAMP, …) and the typical query WHERE room_id = ? AND starts_at >= ? AND starts_at < ?, design the optimal composite index and trace the plan.

Input (relevant table stats).

Metric Value
Row count 50,000,000
Distinct room_id 5,000
Average rows per room_id ~ 10,000
Selectivity of a 1-day starts_at window ~ 0.3% within a room

Code.

-- Composite B-tree, equality column FIRST
CREATE INDEX bookings_room_starts_idx
  ON bookings (room_id, starts_at);

-- The query
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = 7
  AND starts_at >= '2026-03-10'
  AND starts_at <  '2026-03-11';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Why room_id first. The optimiser uses each leading column as an equality seek; only the trailing column can be a range. With (room_id, starts_at), an exact match on room_id narrows the index to one contiguous slice; the range on starts_at then scans within that slice.
  2. Why not (starts_at, room_id). Reversing the order makes starts_at the leading column, which is the range. The optimiser still uses the index — but it now has to scan every key in the time window across every room (5,000× more pages), then filter for room_id=7 after the fact.
  3. Plan trace (correct order). Seek to (7, '2026-03-10') → scan contiguous keys until (7, '2026-03-11') → ~30 matching rows → done in ~5–10 µs.
  4. Plan trace (wrong order). Seek to ('2026-03-10', *) → scan ~150K keys (entire day across 5000 rooms) → filter to room_id=7 → ~30 matching rows in ~50–100 ms. 10,000× more I/O for the same answer.
  5. The general rule. Equality columns first, range column last in the composite index. The leftmost-prefix rule means the index also covers queries that filter on room_id alone (e.g. "all bookings for room 7 across all time") — a bonus you don't get from (starts_at, room_id).

Output (qualitative trace).

Index design Plan Rows touched Latency
(room_id, starts_at) Index Seek + Range ~30 ~5 µs
(starts_at, room_id) Index Range + Filter ~150,000 ~50 ms
(room_id) only Index Seek + heap filter ~10,000 ~5 ms
no index Seq Scan ~50,000,000 ~30 s

Rule of thumb. For mixed equality + range predicates, the composite index is (equality_cols…, range_col). The equality columns can be any order among themselves; the range column always last. If you have two range columns, only the first one in the composite is index-friendly — the second has to be applied as a post-filter.

Worked example — read an EXPLAIN ANALYZE for a BETWEEN range query

Detailed explanation. The fastest way to confirm that a range query is doing what you think is to read its plan with EXPLAIN (ANALYZE, BUFFERS). The three signals to look for are the node type (Index Scan good, Seq Scan bad without a reason), the Index Cond (the BETWEEN expression should appear here, not in Filter), and the rows planned vs actual divergence (a >10× miss means the statistics are stale).

Question. Given the plan output below for WHERE created_at BETWEEN '2026-05-01' AND '2026-05-07' on a Postgres events table with index events_created_at_idx, identify whether the query is using the index correctly and what to fix if not.

Input (the plan output).

QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=12340.12..12340.13 rows=1 width=8)
            (actual time=84.121..84.121 rows=1 loops=1)
   Buffers: shared hit=4123 read=58
   ->  Index Scan using events_created_at_idx on events
       (cost=0.43..12134.65 rows=82188 width=4)
       (actual time=0.018..70.301 rows=81342 loops=1)
         Index Cond: ((created_at >= '2026-05-01 00:00:00'::timestamp)
                      AND (created_at <= '2026-05-07 00:00:00'::timestamp))
         Buffers: shared hit=4123 read=58
 Planning Time: 0.121 ms
 Execution Time: 84.158 ms
Enter fullscreen mode Exit fullscreen mode

Code (how to run the analysis yourself).

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM events
WHERE created_at BETWEEN '2026-05-01' AND '2026-05-07';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Node type — Index Scan using events_created_at_idx. Good. The optimiser picked the B-tree on created_at and used it as a range scan. If this said Seq Scan, the column has either no index, a function on the column, or the planner thought a scan was cheaper than the index (statistics issue).
  2. Index Cond vs Filter. The BETWEEN appears under Index Cond — meaning the index narrowed the scan to the matching keys before any rows were fetched. If it appeared under Filter, the index would have been used for some other column and the BETWEEN would be a post-filter (much more I/O).
  3. Planned vs actual rows — 82188 vs 81342. Within 1%; statistics are fresh. A 10×+ miss here would suggest running ANALYZE events to refresh the planner's row estimates.
  4. Buffers — hit=4123 read=58. Most of the index + heap pages were already in the shared buffer cache; 58 pages came from disk. A high read count on repeated runs is a sign the table is too cold for the cache; consider preloading or scaling the cache.
  5. The time-truncation pitfall is still present. The Index Cond shows <= '2026-05-07 00:00:00' — confirming the BETWEEN dropped 23h 59m of May 7. The index is doing its job perfectly; the query is just wrong. Fixing the query to >= '2026-05-01' AND < '2026-05-08' preserves the index plan and corrects the boundary.

Output — the diagnostic table.

Signal Value (this plan) Interpretation
Node type Index Scan Good — index is being used
Index name events_created_at_idx The B-tree on created_at
Index Cond contains BETWEEN? yes Index is narrowing the scan
Rows planned vs actual 82188 vs 81342 Statistics fresh (within 1%)
Buffers cache hit ratio 4123 / 4181 = 98.6% Working set is in cache
Time-truncation pitfall in Index Cond? yes — <= '2026-05-07 00:00:00' Query has the date BETWEEN bug

Rule of thumb. For every range query that matters, run EXPLAIN (ANALYZE, BUFFERS) once. If you see Index Scan with the BETWEEN under Index Cond, the optimiser is doing its part — but verify the boundary is correct independently. The plan won't tell you that the upper bound is silently dropping 23 hours of data.

SQL interview question on partition pruning + range filter

A staff-level probe: "Our events table is 8 TB, partitioned by event_date daily. The analytics team writes WHERE date_trunc('day', event_ts) BETWEEN '2026-05-01' AND '2026-05-07'. Their dashboards are slow. What's wrong, and what would you rewrite to?"

Solution Using a half-open range without date_trunc on the column

-- ❌ The problem — date_trunc on the column disables partition pruning
SELECT event_type, COUNT(*)
FROM events
WHERE date_trunc('day', event_ts) BETWEEN '2026-05-01' AND '2026-05-07'
GROUP BY event_type;

-- ✅ The fix — half-open range on the raw partition key
SELECT event_type, COUNT(*)
FROM events
WHERE event_ts >= '2026-05-01'
  AND event_ts <  '2026-05-08'
GROUP BY event_type;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Step What happens (broken query) What happens (fixed query)
1 Planner sees date_trunc('day', event_ts) Planner sees event_ts >= ? AND event_ts < ?
2 Cannot match expression to partition key Matches partition key (event_ts) directly
3 Falls back to scanning every daily partition (potentially years of data) Prunes to exactly the 7 daily partitions for May 1–7
4 I/O = full table read (~ 8 TB) I/O = 7 daily partitions (~ 7 × 22 GB)
5 Latency = 5–10 minutes Latency = 5–15 seconds
6 The BETWEEN endpoint '2026-05-07' would also drop 23h 59m of May 7 in the broken form even after pruning — the same time-truncation pitfall reappears at the application layer Half-open < '2026-05-08' captures all of May 7

Output:

Query form Partitions scanned I/O p95 latency
date_trunc(event_ts) BETWEEN … all (no pruning) ~ 8 TB 5–10 min
event_ts >= … AND event_ts < … 7 (pruned) ~ 154 GB 5–15 s

Why this works — concept by concept:

  • Partition pruning depends on a literal predicate on the partition key — the planner can match event_ts >= constant to the partition's RANGE (event_ts) spec at plan time. date_trunc('day', event_ts) is opaque — the planner has no rule that says "this expression is bounded by the same range as event_ts."
  • Half-open avoids the time-truncation pitfall a second time — even if partition pruning worked, BETWEEN '2026-05-01' AND '2026-05-07' would still drop 23h 59m of May 7. The half-open < '2026-05-08' is correct on both axes.
  • No function on the column is the universal rule — applies to B-tree (kills index), BRIN (kills the min/max summary match), and partition pruning (kills the partition match). Keep functions off the column.
  • Same lesson, three engines — BigQuery's "partition pruning" and Snowflake's "automatic clustering" follow the identical rule: filter on the raw partition / cluster column, not a transformed expression.
  • Cost — pruning reduces I/O by 1 / num_partitions × scanned_partitions and reduces latency proportionally. On a daily-partitioned table with 365 partitions and a 7-day filter, the speed-up is ~ 50×.

SQL
Topic — date functions (SQL)
Partitioned date-range performance problems (SQL)

Practice →


Cheat sheet — interval recipes for production

Pick the recipe by the intent expressed in the requirement, not by the keyword the requirement happened to use. "All of January" is almost always a half-open monthly window, even if the spec says BETWEEN.

  • Last 7 days (rolling, including now). created_at >= NOW() - INTERVAL '7 days' AND created_at < NOW() — half-open, anchored to wall-clock.
  • This calendar month, dynamic. created_at >= date_trunc('month', NOW()) AND created_at < date_trunc('month', NOW() + INTERVAL '1 month').
  • Specific calendar month (Jan 2026). created_at >= '2026-01-01' AND created_at < '2026-02-01'. Same shape, fixed endpoints.
  • Year-to-date. created_at >= date_trunc('year', NOW()) AND created_at < NOW().
  • Inclusive numeric range (e.g. test scores 60–100). score BETWEEN 60 AND 100 — the closed-closed shape is what the human spec usually wants for grading thresholds.
  • Half-open numeric range (e.g. price tier [100, 200)). price >= 100 AND price < 200 — for back-to-back tiers ([100,200), [200,300), …) without overlap.
  • Tolerance-aware FLOAT compare. value >= target - 1e-9 AND value <= target + 1e-9 — when the column is locked to FLOAT and you need approximate equality.
  • Interval overlap test (booking conflict). (a_start, a_end) OVERLAPS (b_start, b_end) (Postgres / Oracle) or the portable a_start < b_end AND b_start < a_end.
  • First-class interval column (Postgres). tstzrange('2026-01-01', '2026-02-01', '[)') — store the closed-open marker in the value itself; query with @>, &&, -|- operators backed by a GiST index.
  • Last-N-minutes for streaming dashboards. event_ts >= NOW() - INTERVAL '5 minutes' AND event_ts < NOW(). Paired with BRIN on event_ts + daily partitions, sub-second on TB-scale.
  • Numeric BETWEEN on a calculated column. WHERE price BETWEEN cost * 0.9 AND cost * 1.1 — both sides per-row, still index-friendly on price.
  • String BETWEEN. name BETWEEN 'A' AND 'M' — lexicographic, case-sensitive; for case-insensitive use LOWER(name) BETWEEN 'a' AND 'm' with a functional index on LOWER(name).

A small heuristic for the interview-round version of the question: when the interviewer says "between dates X and Y," ask "inclusive of Y, or up-to-but-not-including Y?" In 90% of real production requirements the answer is up-to-but-not-including — and the half-open form is the right one.

Frequently asked questions

Is SQL BETWEEN inclusive or exclusive?

BETWEEN is inclusive on both ends in every SQL dialect — x BETWEEN a AND b is exactly x >= a AND x <= b. This is fixed by the ANSI SQL standard; Postgres, MySQL, SQL Server, Oracle, BigQuery, Snowflake, and SQLite all agree. If you need to exclude one of the endpoints, rewrite to half-open: x >= a AND x < b (closed-open, the production default), x > a AND x <= b (open-closed), or x > a AND x < b (open-open). The optimiser compiles all four forms to the same B-tree range scan, so the choice is purely about correctness.

Why do my date BETWEEN queries miss records on the last day?

Because BETWEEN '2026-01-01' AND '2026-01-31' over a TIMESTAMP column is column >= '2026-01-01 00:00:00' AND column <= '2026-01-31 00:00:00' — the right-hand date literal is implicitly cast to midnight on Jan 31, so every row from 00:00:01 through 23:59:59.999999 of that day is silently excluded. This is the #1 production query bug in the entire range-query family. The fix is either to cast the column to DATE (slow on large tables) or — preferred — use a half-open interval: column >= '2026-01-01' AND column < '2026-02-01'. The half-open form is correct regardless of whether the column is DATE, TIMESTAMP, or TIMESTAMPTZ.

Is BETWEEN slower than >= AND <=?

No — they are bit-for-bit identical at the plan level. Every modern SQL optimiser unfolds x BETWEEN a AND b to x >= a AND x <= b before planning, so the choice is a readability and convention decision, not a performance one. The performance question that does matter is whether a function or CAST has been applied to the column — WHERE created_at::date BETWEEN ... disables B-tree index usage even though BETWEEN itself is free. Keep functions off the column, then BETWEEN and >= AND <= are interchangeable.

Can BETWEEN use an index?

Yes — WHERE indexed_col BETWEEN a AND b is a textbook B-tree range scan with cost O(log N + K), where N is the table size and K is the matching row count. For very large, time-ordered, append-only tables (think 100M+ rows), evaluate a BRIN (block-range index) on the timestamp column — it's often 1000× smaller than the B-tree and faster on "recent window" scans because of sequential I/O. For declaratively partitioned tables (Postgres native partitioning, BigQuery, Snowflake), a BETWEEN on the partition key triggers partition pruning — the planner skips every partition that doesn't overlap the predicate. All three strategies require the WHERE clause to reference the raw column, not a transformed expression.

Does BETWEEN work with TIMESTAMP and TIMESTAMPTZ?

Yes, but with two caveats. First, BETWEEN over a TIMESTAMP column is inclusive on both ends, so the same time-truncation pitfall applies: BETWEEN '2026-01-01' AND '2026-01-31' will silently drop 23h 59m of Jan 31. Use a half-open interval (>= '2026-01-01' AND < '2026-02-01') instead. Second, with TIMESTAMPTZ the right-hand date literal is interpreted in the session's time zone before being converted to UTC for storage and comparison — so a '2026-01-31' in a session set to America/New_York becomes 2026-01-31 05:00:00 UTC, which can shift the boundary by a full calendar day. When timezones matter, write the boundaries as fully-qualified TIMESTAMP WITH TIME ZONE literals ('2026-01-01 00:00:00-05:00') or use date_trunc('day', col AT TIME ZONE 'America/New_York') for an explicit local-day window.

Does BETWEEN match NULL values?

No. SQL is three-valued — NULL is neither true nor false; comparing NULL against anything yields NULL (unknown). So NULL BETWEEN 1 AND 10 evaluates to NULL, which fails the WHERE clause (only rows that evaluate to true are returned). If you want to include rows where the column is NULL, write it explicitly: WHERE (x BETWEEN 1 AND 10) OR x IS NULL. The same rule applies to NOT BETWEENNULL NOT BETWEEN 1 AND 10 is also NULL, so neither BETWEEN nor NOT BETWEEN will pick up NULL rows on its own. This is a frequent interview gotcha: the candidate writes WHERE amount NOT BETWEEN 100 AND 200 to filter out a range and is surprised that NULL amounts also disappear.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every BETWEEN, half-open, and partition-pruning concept above ships with hands-on practice rooms where you run real SQL against real data sets and see exactly which rows your query missed. Start with the SQL filtering library and work outward; PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine.

Practice SQL filtering now →
Date-function drills →

Source: dev.to

arrow_back Back to Tutorials