ETL Testing Interview Questions & Answers — A Complete Deep-Dive Guide

python dev.to

ETL Testing Interview Questions & Answers — A Complete Deep-Dive Guide

etl testing interview questions sit at the intersection of seven testable disciplines: metadata testing and schema parity, data completeness testing and row count parity, transformation logic testing with reference tables and rule-based assertions, performance testing (concurrency, volume, stress, endurance), data reconciliation and tolerance thresholds, regression testing with snapshot diffs, and the data quality frameworks (Great Expectations, Soda Core, dbt tests, Monte Carlo) that operationalise the whole stack. Together those seven stages form the deep-guide curriculum every modern etl tester interview questions loop circles back to — and the curriculum this guide walks through, end to end, in seven numbered sections.

This is the deep-dive companion to a tighter Q&A round-up: where a 5-section interview cheat sheet covers the eight etl testing types and the four validation rule families, this guide widens the surface into seven full teaching sectionsmetadata + schema testing (the cheapest tests that catch the loudest bugs), data completeness + row count testing (raw counts, GROUP BY counts, aggregate parity, tolerance thresholds), transformation logic testing (reference tables, business-rule assertions, snapshot diffs, dbt audit_helper, datafold data-diff), performance + reconciliation + regression (the senior tester rounds), DQ frameworks (4-tool decision tree), and the ETL tester career path (behavioural questions, the etl tester to data engineer transition, certification ROI, interview-day playbook). Each section ends as etl testing questions and answers: a question, a SQL or Python snippet, a traced execution, a sample output, and a concept-by-concept why this works breakdown — the exact shape etl testing interview questions for experienced rounds reward.

When you want hands-on reps immediately after reading, browse SQL practice library →, drill data-validation problems →, sharpen ETL drills →, rehearse aggregation reconciliation patterns →, reinforce database problems →, or widen coverage on the full Python practice library →.


On this page


1. Why ETL testing is its own interview track

etl testing interview questions — a distinct discipline from QA and pipeline engineering

The one-sentence invariant: etl testing is a distinct discipline because the bugs it catches — schema drift, row drops, transform errors, reconciliation gaps, regression breaks — are silent until they hit production reports, and the assertions that catch them are data-shaped, not *code-shaped*. A senior data-engineering QA engineer is not a generalist tester with SQL skills; they are a specialist who thinks in row count parity, aggregate reconciliation, null-rate thresholds, and schema drift detection, and who automates those checks as first-class artefacts in CI / CD.

What interviewers actually score on etl tester interview questions.

  • Test-taxonomy fluency — can you map the seven stages of this guide (metadata, completeness, transformation, performance, reconciliation, regression, DQ) onto a real pipeline?
  • SQL fluency on data quality testing — can you write a COUNT(*), GROUP BY, SUM, and FULL OUTER JOIN check in 60 seconds, on paper, with no editor?
  • Tooling familiarity on data quality frameworks — can you name one strength and one weakness of Great Expectations, Soda Core, dbt tests, and Monte Carlo?
  • reconciliation testing mental model — given a source ledger of $1.2M and a target ledger of $1.199M, what's your investigation playbook?
  • regression testing discipline — when you change a transform, what tests do you re-run, and why?
  • Production-safety patternsidempotency, dead-letter queues, late-arriving data, slowly changing dimensions — do you handle them in test logic or only in code?

The 7-stage map this guide walks through.

  • Stage 1 — metadata + schema testing — column existence, type match, nullability, PK / FK preserved, information_schema queries, schema-drift detection.
  • Stage 2 — data completeness testing — raw row counts, GROUP BY row counts, aggregate parity (MIN / MAX / SUM / AVG), tolerance thresholds, FULL OUTER JOIN technique.
  • Stage 3 — transformation logic testing — reference-table comparison, rule-based assertions, snapshot diffs, dbt audit_helper, datafold data-diff.
  • Stage 4 — performance testing — concurrency, volume, stress, endurance, SLA gates.
  • Stage 5 — data reconciliation — source vs target ledger, tolerance bands, drilldown queries.
  • Stage 6 — regression testing — snapshot-based comparison, PR-triggered re-runs, break-on-fail gates.
  • Stage 7 — DQ frameworksGreat Expectations, Soda Core, dbt tests, Monte Carlo — pick-the-tool decision tree.

Why this is its own interview track and not a SQL round.

  • data quality testing is not feature testing — the system under test is data, not code; assertions are statistical and set-based, not procedural.
  • The bugs are silent — a dropped row in a JOIN does not throw an exception; only a COUNT(*) parity test surfaces it.
  • schema drift propagates — an upstream column rename breaks 50 downstream models if no schema parity test runs on every PR.
  • reconciliation is the senior round — junior testers run COUNT(*) parity; senior testers reconcile SUM(amount) GROUP BY region and explain a $1k delta.
  • regression testing is the discipline gate — promoting a transform without re-running the prior suite is the bug that gets you fired; the discipline of always re-running the suite is the senior signal interviewers chase.

Worked example — map a single ETL bug onto all seven stages

Detailed explanation. Real interviews probe whether you can think across stages on a single bug. Below is a canonical failure mode — a daily orders load drops every row where currency IS NULL — and how it surfaces (or escapes) at each of the seven stages.

Question. A daily load of orders from the OLTP source into the fact_orders warehouse table silently drops every row where currency IS NULL because of an INNER JOIN against a dim_currency lookup. Which of the seven test stages catches it, and how?

Input. Source orders has 1,234,567 rows today. After load, fact_orders has 1,231,402 rows — 3,165 missing. The bug only surfaces in next-day reconciliation when finance notices revenue is $48k below the source ledger.

Code.

-- Stage 2 — Completeness check that catches it
SELECT
    (SELECT COUNT(*) FROM stg_orders)  AS source_rows,
    (SELECT COUNT(*) FROM fact_orders) AS target_rows,
    (SELECT COUNT(*) FROM stg_orders)
    - (SELECT COUNT(*) FROM fact_orders) AS missing_rows;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. Stage 1 (metadata) — does not catch it; the schema is unchanged.
  2. Stage 2 (completeness)catches it the moment source_rows != target_rows; the raw row-count parity test fires.
  3. Stage 3 (transformation logic)catches it if a LEFT JOIN snapshot diff is run against yesterday's fact_orders; the 3,165 missing rows show up as removed.
  4. Stage 4 (performance) — does not catch it; the load actually completes faster because it processes fewer rows.
  5. Stage 5 (reconciliation)catches it when source SUM(amount) = $1.200M and target SUM(amount) = $1.152M; the $48k delta exceeds the 0.01% tolerance band.
  6. Stage 6 (regression)catches it if a prior PR introduced the bad INNER JOIN; snapshot diff against the prior main branch fires on the row-count delta.
  7. Stage 7 (DQ framework)dbt test row_count_match or great_expectations expect_table_row_count_to_equal_other_table codifies the assertion so it runs on every load.

Output (the completeness query's result).

source_rows target_rows missing_rows
1234567 1231402 3165

Rule of thumb: one bug touches multiple stages. Senior testers think across the seven stages on every failure; junior testers fix the one stage that fired and miss the others.

etl testing interview questions for experienced — the four senior signals

Signal 1 — opinionated tooling choices. Senior testers do not say "all four DQ tools are good"; they say "I use dbt tests for column-level invariants because they live next to the model, Great Expectations for distribution checks because the 40+ built-in expectations are unmatched, and Monte Carlo for cross-cutting freshness alerting on > 100 tables."

Signal 2 — tolerance thresholds, not exact equality. Junior testers assert source_total = target_total. Senior testers assert ABS(source_total - target_total) / source_total < 0.0001 because floating-point arithmetic, late-arriving data, and rounding all introduce noise.

Signal 3 — regression-aware change discipline. Senior testers do not ship a transformation change without re-running the prior test suite on the new output; the discipline is every PR re-runs the full prior suite plus the new tests, and the CI gate is break-on-fail.

Signal 4 — incident reasoning, not just incident detection. When a test fails, junior testers report "the test failed". Senior testers report "the test failed because upstream dim_currency was repointed to a new source; here's the rollback PR, here's the 15-minute fix, and here's the new test that would have caught it one stage earlier".

SQL
Topic — etl
ETL testing drills

Practice →

SQL
Topic — data-validation
Data validation practice

Practice →

Solution Using a 7-stage test-coverage matrix

Code.

-- One canonical coverage matrix — every row maps a stage to a test artefact.
CREATE TABLE etl_test_coverage AS
SELECT * FROM (VALUES
    (1, 'metadata',        'schema_parity_orders',         'information_schema.columns',          'every PR'),
    (2, 'completeness',    'row_count_orders',             'COUNT(*) source vs target',           'every load'),
    (2, 'completeness',    'group_by_count_orders_region', 'COUNT(*) GROUP BY region',            'every load'),
    (3, 'transformation',  'fact_orders_rule_total',       'order_total = unit_price * qty - discount', 'every load'),
    (4, 'performance',     'fact_orders_sla',              'load_seconds < 1800',                 'every load'),
    (5, 'reconciliation',  'fact_orders_amount_recon',     'ABS(SUM(amount) src - tgt) / src < 0.0001', 'every load'),
    (6, 'regression',      'fact_orders_snapshot_diff',    'datafold diff vs prior snapshot',     'every PR'),
    (7, 'dq_framework',    'fact_orders_ge_suite',         'great_expectations expectation_suite','every load')
) AS t(stage_id, stage_name, test_name, assertion, cadence);
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

stage_id stage_name test_name assertion cadence
1 metadata schema_parity_orders information_schema.columns every PR
2 completeness row_count_orders COUNT(*) source vs target every load
2 completeness group_by_count_orders_region COUNT(*) GROUP BY region every load
3 transformation fact_orders_rule_total order_total = unit_price * qty - discount every load
4 performance fact_orders_sla load_seconds < 1800 every load
5 reconciliation fact_orders_amount_recon ABS(SUM(amount) src - tgt) / src < 0.0001 every load
6 regression fact_orders_snapshot_diff datafold diff vs prior snapshot every PR
7 dq_framework fact_orders_ge_suite great_expectations expectation_suite every load
  1. Row 1 — metadata runs on every PR; it's the cheapest gate and the first one to fire when an upstream column is renamed.
  2. Rows 2-3 — completeness runs on every load; raw counts plus GROUP BY counts together catch missing partitions and skewed transforms.
  3. Row 4 — transformation codifies one business rule; the same model has 5-15 rule assertions in production.
  4. Row 5 — performance gates on a 30-minute SLA; volume-load failures show up here long before reconciliation does.
  5. Row 6 — reconciliation is the senior assertion; a 0.0001 tolerance band tolerates floating-point noise but flags real $100+ deltas.
  6. Row 7 — regression runs on every PR via datafold data-diff; it catches transformation drift that completeness alone misses.
  7. Row 8 — the DQ framework wraps every assertion so the cadence and ownership are uniform.

Output.

stage_id stage_name test_name cadence
1 metadata schema_parity_orders every PR
2 completeness row_count_orders every load
3 transformation fact_orders_rule_total every load
4 performance fact_orders_sla every load
5 reconciliation fact_orders_amount_recon every load
6 regression fact_orders_snapshot_diff every PR
7 dq_framework fact_orders_ge_suite every load

Why this works — concept by concept:

  • Stage coverage matrix — turns the 7-stage map into an auditable artefact; every test is owned by exactly one stage, so you can talk to coverage gaps in one query.
  • Cadence binding — pairs each test with its run cadence (every PR vs every load); senior testers explicitly assign cadence per assertion, not "run everything always".
  • Assertion column — codifies the predicate in plain SQL / English; interviewers love a tester who can recite the predicate, not just the test name.
  • Tolerance bands — reconciliation row uses ABS(...) / src < 0.0001, never raw equality; this single decision separates senior testers from junior ones.
  • CostO(1) to read the coverage matrix; the actual tests are O(N) over the underlying tables but parallelisable across the seven stages.

2. Metadata + schema testing — the cheapest tests that catch the loudest bugs

metadata testing and schema parity — four invariants that gate every PR

metadata testing is the cheapest test you can write and the loudest bug-catcher in the stack: a 10-line query against information_schema.columns runs in milliseconds, and a single missing column or changed type is the bug that breaks fifty downstream dashboards. Every ETL pipeline should gate every PR on four invariants — column existence, type match, nullability, and PK / FK preservation — and schema drift detection is the discipline of running those four checks on every load.

The four schema parity invariants.

  • column existence — every column in the source contract exists in the target with the same name; renames break downstream SQL silently.
  • type matchINT → INT, VARCHAR(50) → VARCHAR(50), TIMESTAMP → TIMESTAMP; an implicit INT → VARCHAR cast loses ordering, aggregations, and date arithmetic.
  • nullability match — if the source contract says customer_id NOT NULL and the target stores customer_id NULL, downstream joins will silently drop rows.
  • PK / FK preservation — primary keys must be unique and non-null in the target; foreign keys must reference an existing key in the parent table.

information_schema.columns — the standard-library inspection query.

-- Source vs target schema parity
WITH src AS (
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'staging' AND table_name = 'orders'
), tgt AS (
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
)
SELECT
    COALESCE(s.column_name, t.column_name) AS column_name,
    s.data_type   AS src_type, t.data_type   AS tgt_type,
    s.is_nullable AS src_null, t.is_nullable AS tgt_null,
    CASE
        WHEN s.column_name IS NULL THEN 'missing in source'
        WHEN t.column_name IS NULL THEN 'missing in target'
        WHEN s.data_type   != t.data_type   THEN 'type mismatch'
        WHEN s.is_nullable != t.is_nullable THEN 'nullability mismatch'
        ELSE 'ok'
    END AS verdict
FROM src s
FULL OUTER JOIN tgt t USING (column_name)
WHERE verdict != 'ok';
Enter fullscreen mode Exit fullscreen mode
  • FULL OUTER JOIN — surfaces both missing-in-source and missing-in-target in one pass.
  • USING (column_name) — equi-join on the column name only; PostgreSQL syntax.
  • verdict CASE — single-column summary; one row per anomaly.
  • Empty result = pass — zero rows means schema parity holds.

schema drift detection in production.

  • Snapshot the schema — persist a schemas table keyed by (table_name, snapshot_ts, column_name, data_type, is_nullable) on every load.
  • Diff against yesterdayLEFT JOIN today's snapshot against yesterday's; non-matching rows are drift events.
  • Alert on drift — page on-call for any unexpected schema change; whitelist intentional changes via PR-tracked allowlist.
  • The 2 a.m. bug — an upstream OLTP team renames cust_id → customer_id on a Friday; without drift detection, your Monday dashboards are wrong and finance is upset.

Worked example — write the 4-rule schema parity loop in one SQL block

Detailed explanation. Real interviews ask you to write the full schema parity check from scratch, not just the row-count one. Below is the canonical four-rule loop that fits on a whiteboard.

Question. Write a single SQL query that surfaces every column-level violation between staging.orders and warehouse.fact_orders for all four schema parity invariants (column existence, type match, nullability match, PK preservation).

Input. Two information_schema.columns row-sets plus two information_schema.table_constraints row-sets for PK introspection.

Code.

WITH src_cols AS (
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'staging' AND table_name = 'orders'
), tgt_cols AS (
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
), col_diff AS (
    SELECT
        COALESCE(s.column_name, t.column_name) AS column_name,
        CASE
            WHEN s.column_name IS NULL THEN 'missing_in_source'
            WHEN t.column_name IS NULL THEN 'missing_in_target'
            WHEN s.data_type   != t.data_type   THEN 'type_mismatch'
            WHEN s.is_nullable != t.is_nullable THEN 'nullability_mismatch'
            ELSE 'ok'
        END AS verdict
    FROM src_cols s
    FULL OUTER JOIN tgt_cols t USING (column_name)
), pk_diff AS (
    SELECT 'pk_count_mismatch' AS verdict
    WHERE (
        SELECT COUNT(*) FROM information_schema.key_column_usage
        WHERE table_name = 'orders' AND constraint_name LIKE '%pkey%'
    ) != (
        SELECT COUNT(*) FROM information_schema.key_column_usage
        WHERE table_name = 'fact_orders' AND constraint_name LIKE '%pkey%'
    )
)
SELECT column_name, verdict FROM col_diff WHERE verdict != 'ok'
UNION ALL
SELECT NULL, verdict FROM pk_diff;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. src_cols and tgt_cols materialise the source and target column metadata.
  2. col_diff FULL OUTER JOINs them on column_name and emits one of five verdicts per column.
  3. pk_diff separately compares PK column counts via information_schema.key_column_usage.
  4. The final UNION ALL returns one row per violation across all four invariants.
  5. An empty result set = pass; any rows = block the PR until reconciled.

Output (when a single type mismatch exists).

column_name verdict
amount type_mismatch

Rule of thumb: schema parity is the first test that runs in CI and the cheapest one to write — never skip it. Junior testers forget to add it; senior testers refuse to merge without it.

schema drift detection in modern stacks — dbt, Great Expectations, datafold

  • dbt source freshness + dbt testdbt source freshness flags stale upstreams; column-level tests (not_null, unique, accepted_values, relationships) run on every dbt build.
  • Great Expectationsexpect_table_columns_to_match_ordered_list — codifies the column contract so any added or removed column fails the suite.
  • datafold schema diff — surfaces schema deltas as part of every PR review; visualises type and nullability changes in the GitHub UI.
  • Monte Carlo schema observability — auto-detects schema changes across hundreds of tables; alerts on unexpected drift without you writing the check.

SQL
Topic — database
Database / schema drills

Practice →

SQL
Topic — data-validation
Schema validation practice

Practice →

Solution Using information_schema.columns + a frozen contract table

Code.

-- Freeze the contract in a versioned table; diff every load against it.
CREATE TABLE schema_contract_fact_orders AS
SELECT * FROM (VALUES
    ('order_id',    'integer',          'NO'),
    ('customer_id', 'integer',          'NO'),
    ('region',      'character varying','NO'),
    ('amount',      'numeric',          'NO'),
    ('order_ts',    'timestamp with time zone','NO')
) AS t(column_name, data_type, is_nullable);

SELECT
    COALESCE(c.column_name, a.column_name) AS column_name,
    c.data_type   AS contract_type, a.data_type   AS actual_type,
    c.is_nullable AS contract_null, a.is_nullable AS actual_null,
    CASE
        WHEN a.column_name IS NULL THEN 'missing_in_actual'
        WHEN c.column_name IS NULL THEN 'unexpected_in_actual'
        WHEN c.data_type   != a.data_type   THEN 'type_drift'
        WHEN c.is_nullable != a.is_nullable THEN 'nullability_drift'
        ELSE 'ok'
    END AS verdict
FROM schema_contract_fact_orders c
FULL OUTER JOIN (
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_schema = 'warehouse' AND table_name = 'fact_orders'
) a USING (column_name)
WHERE verdict != 'ok';
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

column_name contract_type actual_type verdict
amount numeric character varying type_drift
  1. schema_contract_fact_orders is the frozen contract — checked into git, owned by the test suite, never auto-updated.
  2. The FULL OUTER JOIN against information_schema.columns surfaces every drift between contract and reality.
  3. The verdict column emits exactly one drift type per row; empty result = pass.
  4. Any drift row fails the CI gate and pages the owning team.

Output.

column_name contract_type actual_type contract_null actual_null verdict
amount numeric character varying NO NO type_drift

Why this works — concept by concept:

  • Frozen contract — the contract is a git artefact, not a probe; this makes drift a binary, auditable event rather than a fuzzy comparison.
  • FULL OUTER JOIN — catches both missing-in-actual (column dropped) and unexpected-in-actual (column added) in one pass; LEFT JOIN alone would miss additions.
  • Verdict enum — one of five strings, never a free-text reason; downstream alerting can WHERE verdict = 'type_drift' and route accordingly.
  • Nullability matters — flipping a column from NOT NULL to NULL is silent in most ORMs; the contract catches it before downstream joins drop rows.
  • CostO(C) where C = column count (typically 10-100); runs in milliseconds and is the cheapest gate in the suite.

3. Data completeness + row-count testing — count parity, group-by counts, aggregate parity

data completeness testing — three test families, one tolerance band

data completeness testing is the bedrock of ETL QA: every source row that should land in the target must actually land, and every aggregate computed downstream must match the source. There are exactly three test families — raw row count parity, GROUP BY row counts, and aggregate parity (MIN, MAX, SUM, AVG) — and one universal tolerance pattern that wraps all three.

Family 1 — row count parity.

SELECT
    (SELECT COUNT(*) FROM staging.orders)        AS src_rows,
    (SELECT COUNT(*) FROM warehouse.fact_orders) AS tgt_rows,
    (SELECT COUNT(*) FROM staging.orders)
    - (SELECT COUNT(*) FROM warehouse.fact_orders) AS delta;
Enter fullscreen mode Exit fullscreen mode
  • The cheapest test in the suite — runs in milliseconds even on billion-row tables (count metadata is cached).
  • Catches the loudest bug — a LEFT JOIN that should be INNER JOIN and vice versa, a missing partition, a WHERE predicate that drops rows.
  • Tolerance — for non-incremental loads, delta = 0 is the gate; for incremental loads, delta = ROWS_IN_INCREMENT is the gate.

Family 2 — GROUP BY row counts.

SELECT
    s.region,
    COUNT(*)             AS src_rows,
    COALESCE(t.cnt, 0)   AS tgt_rows,
    COUNT(*) - COALESCE(t.cnt, 0) AS delta
FROM staging.orders s
LEFT JOIN (
    SELECT region, COUNT(*) AS cnt
    FROM warehouse.fact_orders
    GROUP BY region
) t USING (region)
GROUP BY s.region, t.cnt
HAVING COUNT(*) - COALESCE(t.cnt, 0) != 0;
Enter fullscreen mode Exit fullscreen mode
  • Catches skewed transforms — a transform that drops only EU rows passes COUNT(*) parity within tolerance if EU is small, but fails per-region.
  • Run on every grouping dimensionregion, currency, status, partition_date; one query per dimension is fine.
  • Empty result = pass — only mismatched groups show up.

Family 3 — aggregate parity (MIN, MAX, SUM, AVG).

SELECT
    'amount' AS metric,
    (SELECT SUM(amount)::numeric(18,4) FROM staging.orders)        AS src_sum,
    (SELECT SUM(amount)::numeric(18,4) FROM warehouse.fact_orders) AS tgt_sum,
    ABS(
        (SELECT SUM(amount) FROM staging.orders)
        - (SELECT SUM(amount) FROM warehouse.fact_orders)
    ) /
    NULLIF((SELECT SUM(amount) FROM staging.orders), 0) AS rel_delta;
Enter fullscreen mode Exit fullscreen mode
  • Catches arithmetic transform bugs — a CAST from numeric(18,4) to numeric(18,2) quietly truncates pennies; SUM parity catches it.
  • Tolerance bandrel_delta < 0.0001 (0.01%) is the canonical pass band; tighter for finance, looser for clickstream.
  • Run for every numeric columnMIN, MAX, SUM, AVG, and COUNT(DISTINCT) per column.
  • NULLIF(...) guards — protects against divide-by-zero when the source is empty.

The FULL OUTER JOIN reconciliation technique.

SELECT
    COALESCE(s.order_id, t.order_id) AS order_id,
    CASE
        WHEN t.order_id IS NULL THEN 'missing_in_target'
        WHEN s.order_id IS NULL THEN 'unexpected_in_target'
        WHEN s.amount  != t.amount  THEN 'amount_mismatch'
        ELSE 'ok'
    END AS verdict
FROM staging.orders s
FULL OUTER JOIN warehouse.fact_orders t USING (order_id)
WHERE verdict != 'ok';
Enter fullscreen mode Exit fullscreen mode
  • Surfaces every diverging row — not just the count, but the actual order_ids.
  • Catches all three failure modes in one query — missing rows, extra rows, and value mismatches.
  • COALESCE(s.order_id, t.order_id) — works whether the row exists in source, target, or both.
  • The investigation query — when row_count parity fails, this is the second query you run.

Worked example — assemble the three completeness families into one SQL block

Detailed explanation. A real interview asks you to write the full completeness loop in one go, not just one family. Below is the canonical block that combines raw count, group-by count, and aggregate parity into a single CI gate.

Question. Write a single CI-friendly SQL block that gates fact_orders on (a) raw row count parity, (b) per-region GROUP BY row counts, and (c) SUM(amount) parity within a 0.01% tolerance.

Input. staging.orders (1,234,567 rows; SUM(amount) = 12,500,000.00) and warehouse.fact_orders (1,234,567 rows; SUM(amount) = 12,500,000.00).

Code.

WITH counts AS (
    SELECT
        (SELECT COUNT(*) FROM staging.orders)        AS src,
        (SELECT COUNT(*) FROM warehouse.fact_orders) AS tgt
), grp AS (
    SELECT region, COUNT(*) AS delta
    FROM (
        SELECT region, COUNT(*) AS cnt FROM staging.orders        GROUP BY region
    ) s
    FULL OUTER JOIN (
        SELECT region, COUNT(*) AS cnt FROM warehouse.fact_orders GROUP BY region
    ) t USING (region)
    WHERE s.cnt IS DISTINCT FROM t.cnt
    GROUP BY region
), amt AS (
    SELECT
        ABS(
            (SELECT SUM(amount) FROM staging.orders)
            - (SELECT SUM(amount) FROM warehouse.fact_orders)
        ) /
        NULLIF((SELECT SUM(amount) FROM staging.orders), 0) AS rel_delta
)
SELECT
    CASE WHEN (SELECT src FROM counts) != (SELECT tgt FROM counts) THEN 'FAIL row_count'
         WHEN (SELECT COUNT(*) FROM grp) > 0                       THEN 'FAIL group_by_count'
         WHEN (SELECT rel_delta FROM amt) > 0.0001                 THEN 'FAIL sum_amount'
         ELSE 'PASS'
    END AS verdict;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. counts materialises source and target row counts side by side.
  2. grp materialises every region whose per-group count diverges.
  3. amt materialises the relative delta of SUM(amount) against tolerance.
  4. The final SELECT collapses the three checks into a single PASS / FAIL verdict.
  5. The CI gate is WHERE verdict = 'PASS' — non-pass blocks the PR.

Output (when all three pass).

verdict
PASS

Rule of thumb: run all three families on every load; raw count is fast, GROUP BY count is slightly slower, aggregate parity is the most expensive but the most informative.

data completeness testing — the four senior nuances

  • tolerance threshold selection0.0001 (0.01%) for finance, 0.001 (0.1%) for marketing, 0.01 (1%) for clickstream; never raw equality on floating-point.
  • late-arriving data — incremental loads where today's count includes yesterday's late rows; the gate is delta IN (expected_set), not delta = 0.
  • NULL handling in COUNTCOUNT(*) counts all rows, COUNT(col) skips NULLs; senior testers state which one they mean.
  • incremental vs full-refresh — incremental loads gate on partition_count = expected_partition_count, not on total rows.

SQL
Topic — aggregation
Aggregation parity drills

Practice →

SQL
Topic — etl
ETL completeness practice

Practice →

Solution Using a FULL OUTER JOIN + tolerance band

Code.

WITH src AS (
    SELECT order_id, amount FROM staging.orders
), tgt AS (
    SELECT order_id, amount FROM warehouse.fact_orders
), diff AS (
    SELECT
        COALESCE(s.order_id, t.order_id) AS order_id,
        s.amount AS src_amount,
        t.amount AS tgt_amount,
        CASE
            WHEN t.order_id IS NULL THEN 'missing_in_target'
            WHEN s.order_id IS NULL THEN 'unexpected_in_target'
            WHEN ABS(s.amount - t.amount) / NULLIF(ABS(s.amount), 0) > 0.0001
                                    THEN 'amount_drift'
            ELSE 'ok'
        END AS verdict
    FROM src s
    FULL OUTER JOIN tgt t USING (order_id)
)
SELECT verdict, COUNT(*) AS n
FROM diff
WHERE verdict != 'ok'
GROUP BY verdict;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

order_id src_amount tgt_amount verdict
1001 99.50 99.50 ok
1002 250.00 NULL missing_in_target
1003 100.00 100.01 ok (within 0.01%)
1004 75.25 80.00 amount_drift
  1. The FULL OUTER JOIN walks every order_id that appears in either side.
  2. t.order_id IS NULL flags rows present in source but absent in target — the classic "row drop".
  3. s.order_id IS NULL flags rows present in target but absent in source — the classic "duplicate insert" or "stale row".
  4. ABS(s.amount - t.amount) / NULLIF(ABS(s.amount), 0) > 0.0001 flags value drift outside the 0.01% tolerance.
  5. The final GROUP BY verdict aggregates the failures by type — one row per failure class.

Output.

verdict n
missing_in_target 1
amount_drift 1

Why this works — concept by concept:

  • FULL OUTER JOIN — only operator that surfaces missing-in-source and missing-in-target and value-drift in one pass.
  • Tolerance band0.0001 relative delta is the industry-standard finance tolerance; protects against floating-point noise without masking real drift.
  • NULLIF guard/ NULLIF(ABS(s.amount), 0) prevents divide-by-zero when the source amount is zero; the row falls into "ok" if both sides are zero, "drift" otherwise.
  • Verdict enum — discrete failure classes (missing_in_target, unexpected_in_target, amount_drift) feed clean alerting routes.
  • CostO((N + M) log (N + M)) for the hash-join-or-sort plan; on 10M-row tables this completes in seconds.

4. Transformation logic testing — reference tables, rule-based assertions, snapshots

transformation logic testing — three patterns, every transform tested by at least one

transformation logic testing is the discipline that catches arithmetic and business-rule bugs that completeness tests cannot see. There are exactly three patterns — reference-table comparison (for fixed-output transforms), rule-based assertion (for business logic), and snapshot diff (for silent drift) — and every transform should be covered by at least one.

Pattern 1 — reference-table comparison.

-- expected_output is a frozen, hand-curated table of (input, expected) pairs.
SELECT
    e.input_id,
    e.expected_amount,
    a.actual_amount,
    CASE
        WHEN ABS(e.expected_amount - a.actual_amount) > 0.01
        THEN 'fail' ELSE 'pass'
    END AS verdict
FROM expected_output e
LEFT JOIN actual_output a USING (input_id)
WHERE verdict = 'fail';
Enter fullscreen mode Exit fullscreen mode
  • Use case — fixed-output transforms where every input maps to a known output (currency rounding, tax calculation, status normalisation).
  • The contractexpected_output is a hand-curated git-tracked table; updated only via PR.
  • Pass criterion — zero rows returned.
  • Failure mode caught — any transform-logic bug that changes the mapping; the strongest possible assertion when feasible.

Pattern 2 — rule-based assertion.

-- order_total must equal unit_price * qty - discount.
SELECT order_id, unit_price, qty, discount, order_total
FROM fact_orders
WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01;
Enter fullscreen mode Exit fullscreen mode
  • Use case — business rules that hold for every row (order math, percentage calculations, tier assignments).
  • The assertion — a SQL predicate that every row must satisfy; any returned row is a violation.
  • Pass criterion — zero rows returned.
  • Failure mode caught — rule violations introduced by upstream changes, late-arriving columns, or transform refactors.

Pattern 3 — snapshot diff.

-- Compare today's fact_orders against yesterday's snapshot.
SELECT
    COALESCE(t.order_id, y.order_id) AS order_id,
    t.amount AS today_amount,
    y.amount AS yesterday_amount,
    CASE
        WHEN y.order_id IS NULL THEN 'new'
        WHEN t.order_id IS NULL THEN 'dropped'
        WHEN t.amount != y.amount THEN 'changed'
        ELSE 'unchanged'
    END AS verdict
FROM fact_orders t
FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
WHERE verdict IN ('dropped', 'changed');
Enter fullscreen mode Exit fullscreen mode
  • Use case — catching silent drift in transforms where neither a reference table nor a rule fully captures the logic.
  • The contract — yesterday's run is the baseline; today's run must differ only in expected ways.
  • Pass criteriondropped and changed counts are within expected bands (or zero for immutable rows).
  • Failure mode caught — silent drift, especially after upstream OLTP changes or transform-version bumps.

dbt audit_helper — the dbt-native pattern.

-- dbt model: tests/audit/fact_orders_audit.sql
{{ audit_helper.compare_relations(
    a_relation=ref('fact_orders'),
    b_relation=ref('fact_orders_old'),
    primary_key='order_id'
) }}
Enter fullscreen mode Exit fullscreen mode
  • compare_relations — compares two dbt models row-by-row, column-by-column; surfaces every diff in a single result set.
  • compare_column_values — narrower comparison, single-column.
  • compare_queries — compares two arbitrary SQL queries; useful for testing model refactors.
  • Ships with dbt-labs — install via dbt deps, configure in packages.yml, run via dbt build.

datafold data-diff — the standalone diff tool.

  • CLI tooldata-diff postgresql://... postgresql://... -k order_id -c amount.
  • PR-integrated — comments on every GitHub PR with the row-level diff.
  • Cross-database — diffs PostgreSQL ↔ Snowflake ↔ BigQuery row-by-row.
  • Reach for it — when reviewing transform refactors; the diff in the PR replaces eyeballing 50 rows.

Worked example — write the three patterns for the same fact_orders model

Detailed explanation. Real interviews want to see that you can write all three transformation tests on the same model. Below is the canonical block.

Question. Write three tests for fact_orders: (a) a reference-table comparison for the status normalisation, (b) a rule-based assertion for order_total = unit_price * qty - discount, (c) a snapshot diff against yesterday.

Input. fact_orders (today's), fact_orders_snapshot_yesterday (yesterday's), and status_reference (a frozen mapping of raw status → normalised status).

Code.

-- (a) reference-table comparison
SELECT f.order_id, f.status_normalised, r.expected
FROM fact_orders f
LEFT JOIN status_reference r ON r.raw = f.status_raw
WHERE f.status_normalised IS DISTINCT FROM r.expected;

-- (b) rule-based assertion
SELECT order_id, unit_price, qty, discount, order_total
FROM fact_orders
WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01;

-- (c) snapshot diff
SELECT
    COALESCE(t.order_id, y.order_id) AS order_id,
    CASE
        WHEN y.order_id IS NULL THEN 'new'
        WHEN t.order_id IS NULL THEN 'dropped'
        WHEN t.amount != y.amount THEN 'changed'
        ELSE 'unchanged'
    END AS verdict
FROM fact_orders t
FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
WHERE verdict IN ('dropped', 'changed');
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. (a) — joins fact_orders to a frozen reference; any row where the normalised status disagrees fails.
  2. (b) — re-derives order_total from the source columns; any row where the stored value diverges by more than 1 cent fails.
  3. (c)FULL OUTER JOIN against yesterday; surfaces every dropped, new, or changed row.
  4. Each query is a CI gate; an empty result is a pass.

Output (when all three pass).

query rows_returned verdict
(a) reference 0 PASS
(b) rule 0 PASS
(c) snapshot 0 PASS

Rule of thumb: every transform deserves at least one of the three; the strongest models have all three.

SQL
Topic — data-validation
Transformation logic drills

Practice →

SQL
Topic — sql
SQL rule-assertion practice

Practice →

Solution Using a combined rule-assertion + snapshot-diff pattern

Code.

-- Combined transformation-logic gate in one CTE chain.
WITH rule_violations AS (
    SELECT order_id, 'rule_total_mismatch' AS verdict
    FROM fact_orders
    WHERE ABS(order_total - (unit_price * qty - discount)) > 0.01
), snapshot_diffs AS (
    SELECT
        COALESCE(t.order_id, y.order_id) AS order_id,
        CASE
            WHEN y.order_id IS NULL THEN 'snapshot_new'
            WHEN t.order_id IS NULL THEN 'snapshot_dropped'
            WHEN t.amount != y.amount THEN 'snapshot_changed'
            ELSE 'unchanged'
        END AS verdict
    FROM fact_orders t
    FULL OUTER JOIN fact_orders_snapshot_yesterday y USING (order_id)
)
SELECT order_id, verdict FROM rule_violations
UNION ALL
SELECT order_id, verdict FROM snapshot_diffs
WHERE verdict != 'unchanged';
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

order_id verdict
1004 rule_total_mismatch
1008 snapshot_changed
1010 snapshot_dropped
  1. rule_violations emits every row where the business rule fails — one row per violation.
  2. snapshot_diffs emits every row whose state diverged from yesterday — one row per drift event.
  3. The UNION ALL flattens both into a single CI artefact; downstream alerting can WHERE verdict LIKE 'snapshot_%' or WHERE verdict LIKE 'rule_%'.
  4. An empty result is the pass condition; any rows block the PR.

Output.

order_id verdict
1004 rule_total_mismatch
1008 snapshot_changed
1010 snapshot_dropped

Why this works — concept by concept:

  • Rule re-derivationorder_total = unit_price * qty - discount is re-computed from source columns; this is the strongest assertion because it doesn't trust the stored value.
  • Snapshot baseline — yesterday's snapshot is the truth until proven otherwise; today's run must be a known-good superset.
  • Verdict namespacingrule_* and snapshot_* prefixes make alert routing trivial.
  • UNION ALL — preferred over UNION here because we want every row, not deduplicated; faster and simpler.
  • CostO(N) for the rule scan, O((N + M) log (N + M)) for the snapshot join; both index-friendly on order_id.

5. Performance, reconciliation, regression — the senior tester rounds

etl performance testing + data reconciliation + regression testing — the three senior rounds

The senior tester rounds are the three test families that distinguish a junior tester from a senior one — performance testing (does the pipeline finish inside its SLA?), data reconciliation (do source and target ledgers agree within tolerance?), and regression testing (does today's run still produce yesterday's expected output?). Every modern ETL suite ships all three; the etl testing interview questions for experienced loop probes all three.

performance testing — four sub-families.

  • volume testing — load the pipeline with 10x, 100x production data; the SLA must still hold.
  • stress testing — load until something breaks; characterise the failure mode.
  • concurrency testing — run N parallel loads; surface lock contention, deadlocks, slot exhaustion.
  • endurance testing — run the pipeline continuously for 24 hours; surface memory leaks and slow degradation.

Performance gate in SQL.

-- Persist load metrics every run; gate on the SLA.
INSERT INTO pipeline_metrics (run_id, started_at, ended_at, row_count, sla_seconds)
VALUES (...);

SELECT
    run_id,
    EXTRACT(EPOCH FROM (ended_at - started_at)) AS load_seconds,
    sla_seconds,
    CASE
        WHEN EXTRACT(EPOCH FROM (ended_at - started_at)) > sla_seconds
        THEN 'FAIL' ELSE 'PASS'
    END AS verdict
FROM pipeline_metrics
WHERE run_id = :current_run;
Enter fullscreen mode Exit fullscreen mode
  • load_seconds — wall-clock time from start to end of load.
  • sla_seconds — the contract; typically 30 minutes for hourly loads, 4 hours for daily.
  • pipeline_metrics — persisted table; trend analysis over weeks shows slow regression.
  • Alert on trend — page on-call when load_seconds exceeds 0.8 * sla_seconds for 3 consecutive runs.

data reconciliation — the source ledger / target ledger pattern.

WITH src AS (
    SELECT region, SUM(amount) AS total FROM staging.orders GROUP BY region
), tgt AS (
    SELECT region, SUM(amount) AS total FROM warehouse.fact_orders GROUP BY region
)
SELECT
    COALESCE(s.region, t.region) AS region,
    s.total AS src_total,
    t.total AS tgt_total,
    ABS(s.total - t.total) AS abs_delta,
    ABS(s.total - t.total) / NULLIF(ABS(s.total), 0) AS rel_delta,
    CASE
        WHEN ABS(s.total - t.total) / NULLIF(ABS(s.total), 0) > 0.0001
        THEN 'FAIL' ELSE 'PASS'
    END AS verdict
FROM src s FULL OUTER JOIN tgt t USING (region);
Enter fullscreen mode Exit fullscreen mode
  • Source ledgerSUM(amount) GROUP BY region on staging.
  • Target ledgerSUM(amount) GROUP BY region on warehouse.
  • abs_delta — raw dollar / unit difference; useful for the alerting payload.
  • rel_delta — relative difference; the actual gate.
  • Tolerance band0.0001 (0.01%); tighter for finance, looser for analytics.

regression testing — snapshot-based comparison + PR triggers.

-- Snapshot the model on every PR merge; diff every new PR against the latest snapshot.
SELECT
    COALESCE(c.order_id, b.order_id) AS order_id,
    CASE
        WHEN b.order_id IS NULL THEN 'regression_new'
        WHEN c.order_id IS NULL THEN 'regression_dropped'
        WHEN c.amount != b.amount THEN 'regression_amount_change'
        ELSE 'unchanged'
    END AS verdict
FROM fact_orders_candidate c
FULL OUTER JOIN fact_orders_baseline b USING (order_id)
WHERE verdict != 'unchanged';
Enter fullscreen mode Exit fullscreen mode
  • fact_orders_candidate — the model output under the new PR.
  • fact_orders_baseline — the model output under main.
  • PR gate — non-empty result blocks merge; reviewer must approve the deltas explicitly.
  • break-on-fail — CI fails the build the moment regression rows appear; senior testers never override.

Worked example — assemble a performance + reconciliation + regression suite in Python

Detailed explanation. Real senior loops ask you to wire all three into a single Python harness that runs on every PR. Below is the canonical Python orchestration.

Question. Write a Python script that runs (a) the performance SLA gate, (b) the per-region reconciliation, and (c) the regression diff against main, and exits non-zero on any failure.

Input. A psycopg2 connection to the warehouse, a current run_id, and the baseline snapshot table name.

Code.

import psycopg2, sys, time

conn = psycopg2.connect(host="warehouse", database="prod")
cur = conn.cursor()
failures = []

# (a) performance SLA
cur.execute("""
    SELECT EXTRACT(EPOCH FROM (ended_at - started_at)), sla_seconds
    FROM pipeline_metrics WHERE run_id = %s
""", (run_id,))
load_s, sla_s = cur.fetchone()
if load_s > sla_s:
    failures.append(f"perf: {load_s:.0f}s > sla {sla_s}s")

# (b) reconciliation
cur.execute("""
    SELECT COUNT(*) FROM (
        SELECT region,
               ABS(SUM(s.amount) - SUM(t.amount))
               / NULLIF(ABS(SUM(s.amount)), 0) AS rel
        FROM staging.orders s
        FULL OUTER JOIN warehouse.fact_orders t USING (region)
        GROUP BY region
        HAVING ABS(SUM(s.amount) - SUM(t.amount))
               / NULLIF(ABS(SUM(s.amount)), 0) > 0.0001
    ) x
""")
recon_failures = cur.fetchone()[0]
if recon_failures > 0:
    failures.append(f"recon: {recon_failures} regions outside tolerance")

# (c) regression
cur.execute("""
    SELECT COUNT(*) FROM fact_orders_candidate c
    FULL OUTER JOIN fact_orders_baseline b USING (order_id)
    WHERE b.order_id IS NULL OR c.order_id IS NULL OR c.amount != b.amount
""")
reg_failures = cur.fetchone()[0]
if reg_failures > 0:
    failures.append(f"regression: {reg_failures} rows diverge from baseline")

if failures:
    for f in failures:
        print(f"FAIL: {f}")
    sys.exit(1)
print("ALL PASS")
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The script collects failures instead of short-circuiting; this surfaces all failing rounds per run, not just the first.
  2. (a) compares wall-clock against SLA; one row per run.
  3. (b) counts regions outside the 0.01% reconciliation band; one row per region.
  4. (c) counts every regression row in the snapshot diff; one row per drift.
  5. The script exits non-zero only when at least one round fails; CI gates on the exit code.

Output (when one reconciliation region fails).

FAIL: recon: 1 regions outside tolerance
exit code: 1
Enter fullscreen mode Exit fullscreen mode

Rule of thumb: always run all three rounds on every load; never short-circuit on the first failure — collect them all and report once.

SQL
Topic — aggregation
Reconciliation aggregation drills

Practice →

Python
Topic — etl
Regression + perf Python practice

Practice →

Solution Using a triple-gate harness + non-zero exit on any failure

Code.

def run_triple_gate(cur, run_id: int) -> int:
    """Returns 0 if all three rounds pass, 1 otherwise."""
    failures: list[str] = []

    # 1. Performance
    cur.execute(
        "SELECT EXTRACT(EPOCH FROM (ended_at - started_at)), sla_seconds "
        "FROM pipeline_metrics WHERE run_id = %s",
        (run_id,),
    )
    load_s, sla_s = cur.fetchone()
    if load_s > sla_s:
        failures.append(f"perf: {load_s:.0f}s > sla {sla_s}s")

    # 2. Reconciliation
    cur.execute("""
        SELECT COUNT(*) FROM (
            SELECT region
            FROM staging.orders s FULL OUTER JOIN warehouse.fact_orders t USING (region)
            GROUP BY region
            HAVING ABS(SUM(s.amount) - SUM(t.amount))
                   / NULLIF(ABS(SUM(s.amount)), 0) > 0.0001
        ) x
    """)
    if (recon := cur.fetchone()[0]) > 0:
        failures.append(f"recon: {recon} regions out of tolerance")

    # 3. Regression
    cur.execute("""
        SELECT COUNT(*) FROM fact_orders_candidate c
        FULL OUTER JOIN fact_orders_baseline b USING (order_id)
        WHERE b.order_id IS NULL OR c.order_id IS NULL OR c.amount != b.amount
    """)
    if (reg := cur.fetchone()[0]) > 0:
        failures.append(f"regression: {reg} rows diverge from baseline")

    for f in failures:
        print(f"FAIL: {f}")
    return 1 if failures else 0
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step check observed gate verdict
1 perf load_s=1700, sla_s=1800 load_s <= sla_s PASS
2 recon regions_failing=0 == 0 PASS
3 regression rows_diverging=3 == 0 FAIL
return non-zero failures = [regression] exit 1
  1. Step 1 reads pipeline_metrics; load completed in 1700s < 1800s SLA — PASS.
  2. Step 2 counts regions outside the 0.01% tolerance band; zero — PASS.
  3. Step 3 counts regression rows against baseline; three rows — FAIL.
  4. The function appends to failures, prints each one, and returns 1 to signal CI failure.
  5. The CI gate is if run_triple_gate(...) != 0: sys.exit(1).

Output.

FAIL: regression: 3 rows diverge from baseline
return: 1
Enter fullscreen mode Exit fullscreen mode

Why this works — concept by concept:

  • Triple-gate collection — collects all failures per run, never short-circuits; this is the senior pattern because it surfaces correlated bugs (perf + recon together usually = a transform that runs longer because it's processing extra rows).
  • Tolerance bands everywhere0.0001 on reconciliation, <= sla_seconds on performance, exact equality on regression (transforms must be deterministic).
  • Walrus operator (:=) — captures the count into a variable and tests it in one line; idiomatic Python 3.8+.
  • Non-zero exit — the CI contract; 0 = pass, non-zero = fail. The pipeline blocks until fixed.
  • Cost — each check is a single round-trip to the warehouse; total runtime ~1-3s on top of the load itself; negligible.

6. DQ frameworks — Great Expectations, Soda Core, dbt tests, Monte Carlo

data quality frameworks — four tools, one decision tree

The data quality frameworks market has converged on four production-grade tools: dbt tests (the SQL-native default), Great Expectations (the most expressive), Soda Core (the most analyst-friendly), and Monte Carlo (the leading paid observability platform). Every senior etl tester interview questions loop asks "which would you pick, and why?" The answer is never "all four are great" — it's a decision tree based on stack + team skillset + table count.

Tool 1 — dbt tests (built-in + dbt-expectations).

# models/marts/fact_orders.yml
version: 2
models:
  - name: fact_orders
    columns:
      - name: order_id
        tests: [not_null, unique]
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customer')
              field: customer_id
      - name: amount
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
              max_value: 1000000
Enter fullscreen mode Exit fullscreen mode
  • Strengths — co-located with the model; runs as part of dbt build; integrates with manifest.json and CI; the strongest default if you already use dbt.
  • Weaknesses — limited to column-level assertions; weak on distribution and statistical tests (use dbt-expectations to extend).
  • Pick it when — your transformation layer is dbt; you want tests as code, in the same repo as the model.
  • Built-in testsnot_null, unique, accepted_values, relationships; four assertions cover ~70% of column invariants.

Tool 2 — Great Expectations.

import great_expectations as ge

context = ge.get_context()
suite = context.add_or_update_expectation_suite("fact_orders_suite")

batch = context.sources.add_postgres("warehouse")\
    .add_table_asset("fact_orders")\
    .build_batch_request()

validator = context.get_validator(batch_request=batch, expectation_suite_name="fact_orders_suite")
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_unique("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
validator.expect_column_distinct_values_to_be_in_set("region", ["US","EU","APAC","LATAM"])
validator.expect_column_mean_to_be_between("amount", min_value=50, max_value=500)
Enter fullscreen mode Exit fullscreen mode
  • Strengths — 50+ built-in expectations covering null, uniqueness, range, regex, distribution, statistical checks; auto-generates HTML Data Docs.
  • Weaknesses — heavier setup; Python-first (not SQL-first); steeper learning curve than dbt tests.
  • Pick it when — you need distribution / statistical assertions, or you're outside the dbt world.
  • Key concept — ExpectationSuite — a versioned, JSON-serialised contract.

Tool 3 — Soda Core.

# soda/fact_orders.yml
checks for fact_orders:
  - row_count > 0
  - missing_count(order_id) = 0
  - duplicate_count(order_id) = 0
  - min(amount) >= 0
  - max(amount) <= 1000000
  - failed rows:
      name: status_in_set
      fail query: |
        SELECT * FROM fact_orders
        WHERE status NOT IN ('paid','pending','refunded')
Enter fullscreen mode Exit fullscreen mode
  • Strengths — YAML-first, English-like assertions; fast to onboard non-engineers; SQL fallback for custom checks.
  • Weaknesses — smaller built-in library than Great Expectations; less mature ecosystem.
  • Pick it when — your team includes analysts who'll author tests; you want tests outside the dbt repo.
  • Cloud add-on — Soda Cloud adds incident triage + ownership routing on top of the open-source core.

Tool 4 — Monte Carlo (observability platform).

  • Strengths — auto-detects schema changes, freshness anomalies, volume anomalies, and field-level health across 100s of tables with zero config; lineage-aware alerting.
  • Weaknesses — paid SaaS only; per-table licensing; not a write-tests-in-code platform.
  • Pick it when — you have > 100 tables, multiple teams, and need reactive alerting on unknown failure modes.
  • The complement — Monte Carlo sits above dbt tests + Great Expectations, not instead of them; it observes outputs, others assert invariants.

The 4-tool decision tree.

  • You use dbt and want tests next to models → dbt tests (+ dbt-expectations for advanced).
  • You need distribution / statistical checks → Great Expectations.
  • Analysts will author the tests → Soda Core.
  • You have > 100 tables and need observability on unknowns → Monte Carlo (on top of one of the others).
  • Modern stacks combinedbt tests for column invariants + Great Expectations for distributions + Monte Carlo for observability.

dbt audit_helper and datafold data-diff — the diff-tool category

  • dbt audit_helpercompare_relations, compare_column_values, compare_queries; native to dbt.
  • datafold data-diff — CLI + GitHub-bot; cross-database row-level diff; killer for PR review.
  • Pick dbt audit_helper for in-dbt refactor testing.
  • Pick datafold when reviewing PRs across heterogeneous warehouses or wanting GitHub-comment UX.

The build-vs-buy choice — open source vs paid

  • Build (OS)dbt tests + Great Expectations + Soda Core cost zero in licensing, all developer time; full control.
  • Buy (SaaS)Monte Carlo, Bigeye, Anomalo, Lightup; pay per table or per ingestion volume; faster time-to-value at scale.
  • The hybrid (most common) — open-source tools for write-tests-in-code invariants + one SaaS observability tool for cross-cutting alerting.
  • Interview signal — name your stack's actual mix; "we use dbt tests + Great Expectations + Monte Carlo" is a senior answer.

SQL
Topic — data-validation
DQ framework practice

Practice →

SQL
Topic — etl
ETL DQ drills

Practice →

Solution Using a tool-selection matrix

Code.

-- Materialise the decision tree as a query you can paste into any architecture doc.
CREATE TABLE dq_tool_choice AS
SELECT * FROM (VALUES
    ('dbt tests',         'open source', 'column invariants',       'use dbt',          'low'),
    ('dbt-expectations',  'open source', 'extended dbt assertions', 'use dbt + need range/regex/distribution', 'low'),
    ('Great Expectations','open source', 'distribution + statistical', 'need 50+ expectation types', 'medium'),
    ('Soda Core',         'open source', 'analyst-authored checks', 'analysts author tests',  'low'),
    ('Monte Carlo',       'SaaS',        'observability + lineage', '>100 tables, multi-team', 'high'),
    ('Datafold',          'SaaS',        'PR-gated data-diff',      'reviewing transform PRs', 'medium'),
    ('Bigeye / Anomalo',  'SaaS',        'anomaly detection',       'analytics team, low-code', 'high')
) AS t(tool, licensing, primary_strength, pick_when, setup_cost);
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

tool licensing primary_strength pick_when
dbt tests open source column invariants use dbt
Great Expectations open source distribution + statistical need 50+ expectation types
Soda Core open source analyst-authored checks analysts author tests
Monte Carlo SaaS observability + lineage >100 tables, multi-team
Datafold SaaS PR-gated data-diff reviewing transform PRs
  1. Row 1 — dbt tests is always the cheapest path when you already run dbt.
  2. Row 3 — Great Expectations shines for distribution / statistical checks the dbt core can't express.
  3. Row 4 — Soda Core is the YAML-first, English-like option for analyst teams.
  4. Row 5 — Monte Carlo is the paid layer on top of the others; pick it when unknowns dominate.
  5. Row 6 — Datafold is the PR-review tool; complements but doesn't replace the others.

Output.

tool licensing pick_when
dbt tests open source use dbt
Great Expectations open source need distribution / statistical
Soda Core open source analysts author tests
Monte Carlo SaaS >100 tables, multi-team
Datafold SaaS PR-gated diff

Why this works — concept by concept:

  • Decision matrix — turns a vague "which tool?" into a one-row lookup; interviewers love a candidate who has internalised the tradeoffs as data, not opinion.
  • Primary strength column — each row codifies the job the tool is best at; avoids the "all four are good" answer.
  • Setup cost — surfaces the real onboarding cost; senior testers weigh setup_cost against table count and team size.
  • Open source vs SaaS — the licensing column is the budget gate; some orgs cannot adopt SaaS at all.
  • CostO(1) to read the matrix; the actual tools have their own runtime costs but those don't belong in the selection step.

7. ETL tester career path + interview-day playbook

etl tester career path — three rungs and a transition

The etl tester to data engineer transition is the most-asked career question in this interview track. The market splits roles into three rungs — junior ETL tester (0-2 years; manual SQL queries, ad-hoc reconciliation), senior ETL tester (3-7 years; owns the test suite, picks tools, mentors juniors), and data engineer / analytics engineer (the transition target; owns the pipeline plus the tests). The transition path is well-trodden and the interview-day playbook below is the canonical preparation framework.

Rung 1 — junior ETL tester (0-2 years).

  • Daily work — run pre-written SQL queries; reconcile source vs target; file Jira tickets on failures.
  • Tools mastered — SQL, Excel, Jira, one BI tool (Tableau / Power BI).
  • Comp band (US) — $60-90K base.
  • Interview focus — the eight test types, basic SQL (joins, GROUP BY, window functions), 1-2 DQ tool names.

Rung 2 — senior ETL tester (3-7 years).

  • Daily work — design the test suite for new pipelines; pick the DQ tool stack; mentor juniors; own CI gates.
  • Tools mastered — SQL, Python (pytest + pandas + requests), dbt tests, Great Expectations or Soda Core, Airflow basics.
  • Comp band (US) — $90-140K base.
  • Interview focus — opinionated tooling answers, the seven stages, reconciliation tolerance bands, regression discipline.

Rung 3 — data engineer / analytics engineer (the transition).

  • Daily work — own pipelines end-to-end (ingest + transform + test + deploy); run on-call.
  • Tools mastered — SQL, Python, dbt (models + tests + macros), Airflow / Dagster (DAGs + sensors), one warehouse deeply (Snowflake / BigQuery / Redshift), one cloud (AWS / GCP / Azure).
  • Comp band (US) — $130-220K base (much higher at FAANG / fintech).
  • The transition criterion — you ship a production pipeline you own (not just a test suite); promote yourself by demonstrating it.

The etl tester to data engineer transition recipe.

  • Step 1 — deepen Python beyond pytest into pandas, requests, SQLAlchemy; ship a 200-line Python script that does ingest + transform + load.
  • Step 2 — own a dbt project end-to-end; models + tests + docs + CI; check it into a public GitHub repo.
  • Step 3 — add an orchestrator (Airflow or Dagster); ship a DAG with sensors, retries, and SLAs.
  • Step 4 — learn one warehouse deeply; performance-tune one model, document the before/after, write it up.
  • Step 5 — add a cloud certification (AWS Data Engineer Associate, GCP PDE, Azure DP-203); helps clear ATS filters at larger orgs.
  • Step 6 — apply for analytics engineer roles first; the role bridges testing + transformation and is the natural next step.

Certification ROI — the honest answer.

  • Highest ROI — a public portfolio (GitHub repo with dbt project, GE suite, Airflow DAG); demonstrably > any cert.
  • Mid ROIdbt Analytics Engineering Certification (free); AWS Certified Data Engineer Associate; GCP Professional Data Engineer; Azure DP-203.
  • Lower ROI but adjacentISTQB Foundation / Advanced (signals test-design fluency; not data-specific).
  • The bar — at smaller / mid-size orgs, your portfolio matters more; at FAANG / large enterprises, a cert clears the ATS filter before the portfolio matters.
  • Rule of thumb — pick one cert that matches the cloud / warehouse stack you're targeting; spend the rest of the budget on shipping a portfolio.

etl testing interview questions — interview-day playbook

The day-of preparation checklist.

  • The 90-second pitch"I'm an ETL tester transitioning into data engineering. I've shipped (X dbt models / Y GE suites / one production Airflow DAG); the bug I'm proudest of catching was (Z)."
  • The seven-stage map — be able to draw it on a whiteboard from memory; this is the structural answer to most behavioural prompts.
  • One bug story — pick a real bug you caught; have the source data, the failing test query, the fix, and the new regression test ready as a 3-minute story.
  • One opinionated tooling answer — "I prefer dbt tests + Great Expectations over Soda Core because (X)"; never say "all four are good".
  • Tolerance band answers — when asked "how do you compare two amounts?", say "ABS(a - b) / NULLIF(ABS(a), 0) < 0.0001, never raw equality"; this is the senior signal.

Behavioural questions and the STAR-with-data spin.

  • "Tell me about a bug you caught." — STAR + the actual SQL query that surfaced it; bring the predicate.
  • "Tell me about a time you missed a bug." — credibility comes from owning misses; describe the test that would have caught it.
  • "How do you keep your test suite from becoming flaky?" — discuss tolerance bands, idempotency, deterministic transforms, snapshot baselines.
  • "How do you onboard a new pipeline?" — walk the seven-stage map; assert all seven gates on day one.
  • "How would you test late-arriving data?"delta IN (expected_set) rather than delta = 0; partitioned snapshots; tolerance windows that widen for older partitions.

Coding-round prep.

  • SQL — practice the four-rule schema parity loop, the three-family completeness loop, the rule-based assertion, and the snapshot diff until you can write each on paper in 60 seconds.
  • Python — practice the triple-gate harness (perf + recon + regression) from section 5; this is the canonical senior coding question.
  • System design — practice "design the test suite for a new fact_orders pipeline" — walk the seven stages, name tools per stage, draw the CI gates.

Final-round signals interviewers chase.

  • You name tradeoffs, not just tools — every tool has a strength and a weakness, name both.
  • You quote tolerance bands, not exact equality — finance: 0.0001, marketing: 0.001, clickstream: 0.01.
  • You re-derive in your assertionsorder_total = unit_price * qty - discount, not just "trust the stored value".
  • You have a regression story — every PR re-runs the prior suite; you never short-circuit.
  • You can talk to ownership — who owns the test, who owns the alert, who owns the runbook; senior testers always know.

SQL
Topic — etl
ETL interview-day drills

Practice →

SQL
Topic — sql
SQL practice library

Practice →

Solution Using a six-step transition roadmap + day-of checklist

Code.

-- Materialise the career path as a roadmap table; review weekly.
CREATE TABLE etl_to_de_roadmap AS
SELECT * FROM (VALUES
    (1, 'Python beyond pytest',  'pandas + requests + SQLAlchemy', 'ship a 200-line ingest+transform+load script'),
    (2, 'Own a dbt project',     'models + tests + docs + CI',     'public GitHub repo, README + screenshots'),
    (3, 'Add an orchestrator',   'Airflow or Dagster',             'DAG with sensors + retries + SLA'),
    (4, 'Deep-dive one warehouse','Snowflake or BigQuery or Redshift','perf-tune one model, document before/after'),
    (5, 'One cloud cert',        'AWS DEA / GCP PDE / Azure DP-203','clears ATS at larger orgs'),
    (6, 'Apply analytics engineer first','bridges testing + transformation','natural next step from senior tester')
) AS t(step, milestone, tools, artefact);
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

step milestone tools artefact
1 Python beyond pytest pandas + requests + SQLAlchemy 200-line ingest+transform+load script
2 Own a dbt project models + tests + docs + CI public GitHub repo
3 Add an orchestrator Airflow or Dagster DAG with sensors + retries + SLA
4 Deep-dive one warehouse Snowflake / BigQuery / Redshift perf-tune one model, document
5 One cloud cert AWS DEA / GCP PDE / Azure DP-203 clears ATS
6 Apply analytics engineer first bridges testing + transformation natural next step
  1. Step 1 — Python skills beyond pytest are the floor; without pandas + requests, you're a tester, not an engineer.
  2. Step 2 — a public dbt project is the strongest single artefact you can ship for the transition.
  3. Step 3 — an orchestrator DAG turns "I ran the script" into "I own the pipeline".
  4. Step 4 — depth on one warehouse beats breadth across five; pick the one your target companies use.
  5. Step 5 — one cert clears the ATS at larger orgs; more than one is diminishing returns.
  6. Step 6 — analytics engineer is the bridging role; apply there first.

Output.

step milestone artefact
1 Python beyond pytest 200-line script
2 Own a dbt project public repo
3 Add an orchestrator DAG with SLAs
4 Deep-dive one warehouse perf-tune writeup
5 One cloud cert ATS clear
6 Apply analytics engineer natural next step

Why this works — concept by concept:

  • Artefact-driven — every step produces a thing you can put in a portfolio; "I learned X" is weaker than "here is the GitHub link".
  • Sequenced — Python before dbt before Airflow; the order matters because each step builds on the previous.
  • Pick one warehouse + one cloud — depth over breadth; interviewers respect depth.
  • Analytics engineer first — the bridging role is easier to land than full data engineer; once you're an AE, the DE jump is internal.
  • Cost — 6-12 months of nights-and-weekends to complete the roadmap; faster if your day job overlaps any step.

Choosing the right ETL test (cheat sheet)

A one-screen cheat sheet for etl testing interview questions — pick the test that matches the failure mode you're worried about.

You want to catch … Test family Canonical query / tool Cadence
Renamed / dropped column metadata + schema parity information_schema.columns + frozen contract every PR
Type drift (INT → VARCHAR) metadata + schema parity information_schema.columns every PR
Missing rows / row drops row count parity COUNT(*) source vs target every load
Skewed transforms (one region drops) GROUP BY row count COUNT(*) GROUP BY region every load
Arithmetic transform bugs aggregate parity SUM / MIN / MAX / AVG with tolerance every load
Status normalisation bugs reference-table comparison LEFT JOIN expected_output every load
Business-rule violations rule-based assertion WHERE order_total != unit_price * qty - discount every load
Silent drift (no rule, no reference) snapshot diff FULL OUTER JOIN yesterday's snapshot every load
SLA breach (load too slow) performance gate load_seconds < sla_seconds every load
Source vs target ledger drift reconciliation SUM(amount) GROUP BY region with 0.0001 tol every load
Regression on PR merge regression diff FULL OUTER JOIN candidate vs baseline every PR
Distribution shift on a column DQ framework (statistical) Great Expectations expect_column_mean_to_be_between every load
Cross-table observability DQ framework (observability) Monte Carlo, Bigeye, Anomalo continuous
PR-gated row-level diff data-diff tool dbt audit_helper or datafold data-diff every PR
Late-arriving data tolerance partitioned snapshot diff delta IN (expected_set) every load

Frequently asked questions

How is this deep-dive different from a quick ETL testing Q&A round-up?

A quick etl testing interview questions cheat sheet covers the eight etl testing types, the four validation rule families, and the canonical eight-table SQL loop in one sitting — perfect for last-minute review. This deep-dive walks the same territory at 7 numbered teaching stages (metadata + schema, completeness + row count, transformation logic, performance + reconciliation + regression, DQ frameworks, ETL tester career path), with full worked examples, tolerance-band derivations, and tool-by-tool tradeoffs. Pick the deep-dive when you have a week to prepare, want to teach the material in a senior loop, or need the seven-stage map memorised. Pick the cheat-sheet round-up the night before. The two formats are complements, not duplicates — same topic, different depth and structure.

What's the canonical seven-stage map of ETL testing?

The seven stages are: (1) metadata + schema testing — column existence, type match, nullability, PK / FK; (2) completeness + row-count testing — raw COUNT(*), GROUP BY counts, aggregate parity with tolerance; (3) transformation logic testing — reference-table comparison, rule-based assertion, snapshot diff; (4) performance testing — volume, stress, concurrency, endurance vs SLA; (5) data reconciliation — source vs target ledger with 0.0001 tolerance band; (6) regression testing — snapshot-based diff against main, run on every PR; (7) DQ frameworksdbt tests, Great Expectations, Soda Core, Monte Carlo. Memorise the seven; every senior interview-day question maps to one of them, and most bugs touch three or four at once. The seven-stage map is the structural answer interviewers reward when they ask "walk me through your test suite".

What tolerance band should I use for ETL reconciliation testing?

The industry default is 0.0001 (0.01%) relative tolerance for financial reconciliation; that is, ABS(src - tgt) / NULLIF(ABS(src), 0) < 0.0001. Use 0.001 (0.1%) for marketing / engagement metrics where late-arriving data and minor rounding are acceptable. Use 0.01 (1%) for clickstream / event-stream data where high-cardinality late events are normal. Never use raw equality on floating-point aggregates — floating-point arithmetic, currency conversions, and timing windows all introduce tiny noise that masks no real bug. Tighten the band on financial models (some teams go to 0.00001 on tax + GAAP-reported numbers); loosen it on analytical pipelines. The canonical interview answer is "I use a 0.01% relative tolerance band on reconciliation, with NULLIF to guard against divide-by-zero, and I tighten or loosen by sector" — that single sentence is the senior signal.

How does etl tester differ from data engineer on the org chart?

ETL tester owns the assertions, the reconciliation queries, the CI gates, and the bug-triage runbook. Data engineer owns the pipeline itself — the ingestion, the transformation, the deployment, the on-call rotation — plus collaborates with the tester on the test suite. At smaller orgs the two roles are one person; at larger orgs they're separate. The day-to-day differences: testers spend most of their time in SQL + DQ tool YAML + Jira; engineers spend most of theirs in dbt models + airflow DAGs + terraform + python. Comp differences in 2026: testers ~$70-140K base, engineers ~$130-220K base (US, much higher at FAANG / fintech). The transition path covered in section 7 — Python, dbt, Airflow, warehouse depth, one cloud cert — is well-trodden; most senior testers can move into analytics engineering within 9-12 months of focused work.

Which DQ framework should I pick — Great Expectations, Soda Core, dbt tests, or Monte Carlo?

There is no single winner — pick the tool that matches your stack and team. dbt tests is the strongest default if you already use dbt; built-in tests (not_null, unique, accepted_values, relationships) cover ~70% of column invariants, and dbt-expectations extends to range / regex / distribution. Great Expectations wins on expressive power — 50+ built-in expectations covering null, uniqueness, range, regex, distribution, statistical checks — and auto-generates HTML Data Docs; reach for it when dbt's built-ins aren't enough. Soda Core is the most analyst-friendly — YAML-first English-like assertions; great for hybrid teams. Monte Carlo is the leading paid observability platform — anomaly detection + lineage + incident triage; reach for it when you have > 100 tables and need reactive alerting on unknown failure modes. Most modern stacks combine dbt tests for column invariants + Great Expectations for distributions + Monte Carlo for cross-cutting observability — the hybrid is the senior answer.

What's the fastest path from etl tester to data engineer in 2026?

Six steps, 9-12 months. (1) Deepen Python beyond pytest into pandas, requests, SQLAlchemy; ship a 200-line ingest + transform + load script in a public GitHub repo. (2) Own a dbt project end-to-end — models + tests + docs + CI; the repo + a README with screenshots is your strongest single portfolio artefact. (3) Add an orchestrator — Airflow or Dagster; ship a DAG with sensors, retries, and an SLA. (4) Deep-dive one warehouse — Snowflake, BigQuery, or Redshift; performance-tune one model, document before/after. (5) Pick one cloud cert — AWS Certified Data Engineer Associate, GCP Professional Data Engineer, or Azure DP-203; one cert clears ATS at larger orgs, more than one is diminishing returns. (6) Apply for analytics engineer roles first; the role bridges testing + transformation and is the natural next step from senior tester. The single highest-ROI investment is not more certs — it's the public portfolio.


Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including SQL + Python drills keyed to the same etl testing interview questions skill set this guide teaches (schema parity, row count parity, aggregate reconciliation, regression snapshots, DQ framework wiring, performance gates). Whether you're drilling etl testing questions and answers the night before a screen or grinding the etl tester to data engineer transition over 12 months, the practice library mirrors the same seven-stage mental model — plus the dbt tests + Great Expectations + Soda Core + Monte Carlo tooling you'll wire into your production suite.

Kick off via Explore practice →; drill the SQL practice lane →; fan out into the data-validation lane →; rehearse ETL drills →; reinforce aggregation reconciliation patterns →; widen coverage on the full Python practice library →.

Source: dev.to

arrow_back Back to Tutorials