sql server 2025 is the biggest single release of the engine since the 2016 era — and interviewers know it. The release lands four story arcs in the same compat-level flip: a JSON-native type system that finally retires the FOR JSON PATH string-concat hacks, an ANSI regex family that ends a decade of CLR UDF crutches, a first-class vector data type with similarity functions that pulls hybrid search inside the engine, and a wave of optimizer upgrades — Optional Parameter Plan Optimization, adaptive joins everywhere, percentile memory grant feedback, batch mode on rowstore, optimized locking, and DOP feedback — that quietly retune every plan you wrote before compatibility level 170.
This guide is the interview-shaped tour of those changes. It walks through the T-SQL syntax cheat sheet (JSON_OBJECT, JSON_ARRAY, OPENJSON improvements, REGEXP_LIKE, GREATEST / LEAST, STRING_SPLIT with enable_ordinal), the performance posture you now have to defend (optional parameter plan optimization, adaptive joins, optimized locking), and the AI-and-security surface that hiring managers ask about today — the vector data type, secure enclaves with Always Encrypted, change event streaming as the outbox-style Service Broker replacement, and Copilot in SSMS with its production-trust caveat. Every H2 closes with a Solution-Tail interview answer: code, a step-by-step trace, an output table, and a concept-by-concept breakdown of why it works.
When you want hands-on reps immediately after reading, drill the JSON practice library → for the new JSON primitives, sharpen the window functions library → for the analytical patterns batch-mode-on-rowstore now accelerates, and stack the aggregation library → for the kind of group-by workloads that surface OPO regressions in interview whiteboarding.
On this page
- Why SQL Server 2025 is the biggest release since 2016
- Feature map — JSON, regex, vector, optional-plan, change-feed
- T-SQL new syntax cheat sheet
- Performance — optional plan optimization, parallel plans, IQP wave 4
- AI + security features — vector search, secure enclaves, Copilot, change feed
- Cheat sheet — SQL Server 2025 recipes
- Frequently asked questions
- Practice on PipeCode
1. Why SQL Server 2025 is the biggest release since 2016
The four story arcs — JSON-native, regex-native, vector-native, AI-native plans — converge in a single compat-level flip
The one-sentence invariant: SQL Server 2025 is the first release where compatibility level 170 changes the answers a query returns, not just the plan shape — JSON / regex / vector primitives become real types, three-valued NULL rules continue to apply, and every pre-2025 plan needs to be re-read at least once during the upgrade. Interviewers who lived through the 2016 / 2017 / 2019 cycles use 2025 to separate candidates who can name a feature from candidates who can also name its trade-off.
The four story arcs.
-
JSON-native.
JSON_OBJECT,JSON_ARRAY,JSON_OBJECTAGG,JSON_ARRAYAGG,OPENJSONimprovements, and an actualjsonstorage type. Replaces a decade ofFOR JSON PATH, WITHOUT_ARRAY_WRAPPERplus string concatenation, plus theWITH (col path '$.x')re-shredding pattern that every junior wrote during their first integration project. -
Regex-native. ANSI
REGEXP_LIKE,REGEXP_REPLACE,REGEXP_SUBSTR,REGEXP_COUNT,REGEXP_INSTR. The functions you previously wrote as CLR UDFs (and then could not deploy because the DBA disabled CLR) ship as first-class T-SQL functions backed by the engine's native regex runtime. -
Vector-native.
vector(n)data type, plusVECTOR_DISTANCE(a, b, 'cosine' | 'euclidean' | 'dot')for kNN-style retrieval. Embeddings can live inside the OLTP database and be filtered byREGEXP_LIKEand ordered by similarity in a single query. - AI-native plans. Optional Parameter Plan Optimization (OPO), Intelligent Query Processing wave 4 (adaptive joins everywhere, percentile memory grant feedback, batch mode on rowstore, optimized locking, DOP feedback), and Copilot in SSMS. The optimizer becomes statistical and reactive in the same release.
What got removed or deprecated.
- Service Broker is not deleted, but Change Event Streaming is the new recommended pattern for outbox / event-bus integration. New code rarely justifies Service Broker; existing code keeps working at compat 170.
-
R Services / Machine Learning Services external scripts were already on the deprecation path; ML in SQL Server 2025 leans on the in-engine
vectortype and external model endpoints called via REST. - The 1998-era cardinality estimator is no longer the legacy fallback by default at compat 170 — the new CE model is, and several plan shapes change. Read every plan twice during the upgrade.
Editions, licensing, Azure parity — the lines interviewers care about.
- Standard vs Enterprise. Most 2025 features are enabled at compat 170 on both editions. The historical edition gaps (intelligent query processing for analytics, batch mode, certain partition operations) are smaller than they were in 2019 / 2022. Always read the SKU matrix for the specific feature; interviewers respect "I'd check the doc" more than a confidently-wrong claim.
- Azure SQL parity. Vector type and Copilot ship in Azure SQL Database first; secure enclaves and change event streaming reach the box product faster than the previous release cycles. The general rule of thumb: Azure SQL DB first, Azure SQL MI second, on-prem box product third — but the gap is now measured in quarters rather than years.
- Compatibility level 170. The flip is opt-in at upgrade time. New databases default to 170; upgraded databases preserve their existing compat level until you change it. The expensive interview answer is "I'd test the workload twice — once at the old compat level, once at 170 — and only flip after I've reviewed the plan changes."
The 2025 interview signal.
- Do you name both the feature and the trade-off? — senior signal.
- Do you mention compat level 170 when asked about default behaviour changes? — required answer.
- Do you say "vector type ships in Azure SQL DB first" when asked about parity? — senior signal.
- Do you know that
OPTION (PARAMETER SENSITIVE PLAN)is the optimizer hint name for OPO? — required answer.
Worked example — name the feature, name the trade-off
Detailed explanation. A common opener: "Pick the three SQL Server 2025 features you would actually use in your current job and explain the trade-off for each." Interviewers are not looking for a recital of the release notes — they want to hear that you have thought about what each feature replaces, what it costs, and what could go wrong.
Question. Pick three features from SQL Server 2025 — one T-SQL surface change, one performance change, one AI / security change — and for each one name (a) the legacy pattern it replaces and (b) the trade-off you would warn the team about.
Input — your three picks.
| Feature | Story arc |
|---|---|
JSON_OBJECT / JSON_ARRAYAGG
|
T-SQL syntax |
| Optional Parameter Plan Optimization | Performance |
vector data type |
AI / security |
Code.
-- 1) JSON_OBJECT — replaces FOR JSON PATH + string concat
SELECT user_id,
JSON_OBJECT('id': user_id, 'name': display_name) AS payload
FROM users
WHERE region = 'EU';
-- 2) Optional Plan Optimization — multi-plan cache per parameter shape
SELECT order_id, total
FROM orders
WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
-- 3) vector data type — embeddings inside the engine
SELECT TOP (10) doc_id, title
FROM documents
ORDER BY VECTOR_DISTANCE(embedding, @query_vector, 'cosine');
Step-by-step explanation.
-
JSON_OBJECTreturns ajsonvalue directly — no intermediate string concat, no manual quoting, no need to callJSON_QUERYto re-parse. The trade-off: any client that was reading the column asnvarcharmay need adjustment because the newjsontype has its own ordering and comparison rules. -
OPTION (PARAMETER SENSITIVE PLAN)instructs the optimizer to cache multiple plans for the same query text, one per parameter shape bucket. The trade-off: plan cache memory grows; on memory-tight servers, watch the size of the cache after enabling. -
vector(n)stores an embedding column natively.VECTOR_DISTANCEruns the similarity inside the engine — no round trip to a separate vector store. The trade-off: indexing the column for sub-linear kNN is a separate feature (vector indexes ship initially on a subset of editions / Azure SKUs); without the index, every similarity query is a full scan.
Output.
| Feature | Replaces | Trade-off |
|---|---|---|
JSON_OBJECT |
FOR JSON PATH + concat |
clients reading nvarchar need type review |
OPO (PARAMETER SENSITIVE PLAN) |
one cached plan + parameter sniffing pain | larger plan cache, watch memory |
vector data type |
round trip to external vector DB | full scan without a vector index |
Rule of thumb. When you list a 2025 feature in an interview, lead with the legacy pattern it replaces and close with the trade-off. The pattern is "X replaces Y, but watch Z." It signals you have weighed adoption, not just read the press release.
Worked example — compat level 170 changes the answer, not just the plan
Detailed explanation. The biggest upgrade risk is when the new CE model shifts row estimates enough to flip a plan shape, and the new plan exposes a previously-masked correctness bug — for example a non-deterministic STRING_AGG order, or a TOP (1) without an ORDER BY that previously happened to read in clustered-index order and now reads in parallel scan order. Compat level 170 turns several of these latent bugs into visible regressions.
Question. A team upgrades a database from compat level 150 to 170. The same query starts returning rows in a different order. Explain in three steps why that is not a bug in SQL Server 2025 and what the team should fix.
Input — the query that "broke."
| Query | Pre-170 result order | Post-170 result order |
|---|---|---|
SELECT TOP (1) order_id FROM orders |
always 1
|
sometimes 7
|
Code.
-- Latent bug — TOP without ORDER BY
SELECT TOP (1) order_id
FROM orders;
-- Fix — make the order deterministic
SELECT TOP (1) order_id
FROM orders
ORDER BY order_id;
Step-by-step explanation.
-
TOP (n)withoutORDER BYreturns "any n rows" — the SQL standard explicitly says the order is not defined. Pre-170 the plan happened to be a serial clustered-index scan, so the rows came out in clustered-index order by accident. - At compat 170, the new CE model and the wave-4 IQP plan shapes can pick a parallel scan or a different access path. The "any one row" the engine now returns is from whichever thread finishes its slice first.
- The fix is not to roll back the compat level — it is to add the
ORDER BYthat should have been there from day one. Compat 170 made an existing non-deterministic query observably non-deterministic, which is the system working correctly.
Output.
| Aspect | Pre-170 | Post-170 |
|---|---|---|
| Engine guarantees order? | no | no |
| Observed order? | stable by accident | varies |
| Fix | add ORDER BY
|
add ORDER BY
|
Rule of thumb. When the team says "this broke at compat 170," start by asking "was the previous behaviour guaranteed by the standard, or was it an accident?" Two-thirds of the post-upgrade regressions you will be asked to debug are latent bugs that the new plan shape exposed.
SQL Server 2025 interview question on the four-arc storyline
A senior interviewer often opens with: "Tell me what you'd actually adopt from SQL Server 2025 in the first quarter after upgrade, what you'd defer to the second quarter, and what you'd skip entirely. Defend each call with a trade-off." It blends the feature inventory, the operational posture, and the compat-level decision into a single answer.
Solution Using a tiered-adoption matrix
-- Quarter 1 — adopt: pure-syntax wins, low blast radius
-- 1a) JSON_OBJECT / JSON_ARRAYAGG for new code paths
SELECT region,
JSON_ARRAYAGG(JSON_OBJECT('id': order_id, 'total': total)) AS orders_json
FROM orders
GROUP BY region;
-- 1b) GREATEST / LEAST in dimension fallbacks
SELECT customer_id,
GREATEST(last_web_seen, last_mobile_seen) AS last_seen
FROM customers;
-- Quarter 2 — adopt: optimizer changes, after baseline + workload replay
-- 2a) Optional Parameter Plan Optimization on hotspot queries
SELECT order_id, total
FROM orders
WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
-- Skip for now — features with operational learning curve
-- e.g. Copilot in SSMS in production; vector indexes on the box product
-- if your SKU doesn't have them GA yet.
Step-by-step trace.
| Tier | Feature | Rationale |
|---|---|---|
| Q1 — adopt |
JSON_OBJECT / JSON_ARRAYAGG
|
pure T-SQL syntax, no plan change |
| Q1 — adopt |
GREATEST / LEAST
|
replaces 3-line CASE chains, no plan change |
| Q1 — adopt |
REGEXP_LIKE for analytics filters |
replaces brittle LIKE + CLR |
| Q2 — adopt | OPO (PARAMETER SENSITIVE PLAN) |
needs baseline + plan-cache headroom |
| Q2 — adopt | batch mode on rowstore | needs analytics queries that benefit |
| Q2 — adopt | optimized locking | needs OLTP hotspots to justify |
| Skip / Q3+ | Copilot in SSMS for production | trust + audit story still maturing |
| Skip / Q3+ | vector indexes (if not GA on your SKU) | feature parity check first |
Output:
| Quarter | Feature | Risk |
|---|---|---|
| Q1 |
JSON_OBJECT / JSON_ARRAYAGG
|
low — syntax sugar |
| Q1 |
GREATEST / LEAST
|
low — syntax sugar |
| Q1 | REGEXP_LIKE |
low — additive |
| Q2 | OPO | medium — plan cache memory |
| Q2 | batch mode on rowstore | medium — query-shape dependent |
| Q2 | optimized locking | medium — workload-dependent |
| Q3+ | Copilot in prod | high — trust / audit story |
Why this works — concept by concept:
- Tier by blast radius — pure syntax changes (JSON, regex, GREATEST/LEAST) are additive: they introduce new code paths without changing the existing ones. Optimizer-level changes (OPO, batch mode, optimized locking) require baseline and replay because they alter how existing queries execute.
- Quarter-1 wins are diff-only — JSON_OBJECT / JSON_ARRAYAGG and GREATEST / LEAST shrink existing CASE / FOR JSON PATH code by 30-60% without touching plan shape. Easy to ship, easy to revert.
- Quarter-2 wins need replay — OPO, batch mode on rowstore, and optimized locking change plan shape; the safe path is a workload replay (Query Store + Distributed Replay) at the new compat level before enabling them in production.
- Skip tier is about audit, not capability — Copilot in SSMS works fine technically; the reason to defer is the team's audit / change-control story. New tooling that writes T-SQL into production needs the same review gates as any other code change.
- Cost — Q1 changes are O(diff) in code-review effort; Q2 changes are O(workload size) in replay effort; Q3+ changes are O(team maturity) in process design.
SQL
Topic — aggregation
Aggregation problems (SQL)
2. Feature map — JSON, regex, vector, optional-plan, change-feed
Twelve flagship features sit inside four quadrants — interviewers want the quadrant and the legacy workaround it kills
The mental model in one line: the 2025 release lives inside four quadrants — JSON-native, regex-native, vector + AI, performance + change feed — and every interview question maps to one of those quadrants plus a "what did this replace?" follow-up. Once you can sketch the quadrant on a whiteboard and slot three features under each ribbon, the broad questions ("what's new in 2025?") become a 90-second structured answer instead of a recital.
Quadrant 1 — JSON-native.
-
jsonstorage type. Replacesnvarchar(max)columns marked withCHECK (ISJSON(...) = 1). The new type has dedicated comparison, ordering, and index support. -
JSON_OBJECT(key: value, ...)andJSON_ARRAY(value, value, ...). ReplaceFOR JSON PATH+ string concat for hand-built payloads. -
JSON_OBJECTAGG(key VALUE value)andJSON_ARRAYAGG(value). Aggregate forms — replaceSTRING_AGGwrappers around'[' + ... + ']'. -
OPENJSON ... WITH (... PATH ...)improvements. Path expressions, indexed shred columns, and better plan integration. Replaces hand-rolled string splits and CLR JSON parsers.
Quadrant 2 — Regex-native.
-
REGEXP_LIKE(col, pattern, flags)— boolean match. Replacescol LIKE '%pat%'plus CLR UDF for anything past trivial wildcards. -
REGEXP_REPLACE(col, pattern, replacement, flags)— pattern-based substitution. -
REGEXP_SUBSTR(col, pattern, position, occurrence, flags)— extract the first / nth match. -
REGEXP_COUNT(col, pattern, flags)andREGEXP_INSTR(col, pattern, ...)— match counting and positional lookup. - All five follow the ANSI signature used in Oracle, Postgres (via the
~family equivalents), and Snowflake. Patterns travel.
Quadrant 3 — Vector + AI.
-
vector(n)data type. Stores a fixed-dimension embedding (e.g.vector(1536)for a small embedding model). First-class type — typed comparisons, valid distance functions, plan integration. -
VECTOR_DISTANCE(a, b, 'cosine' | 'euclidean' | 'dot')— similarity. Use withORDER BY ... OFFSET 0 ROWS FETCH NEXT k ROWS ONLYfor kNN. Hybrid retrieval combinesREGEXP_LIKE/WHEREfilters withVECTOR_DISTANCEordering in one query. - Copilot in SSMS / Azure Data Studio. Schema-aware natural-language to T-SQL. The interview answer is always "Show SQL — human reviews before run."
Quadrant 4 — Performance + change feed.
-
Optional Parameter Plan Optimization (OPO).
OPTION (PARAMETER SENSITIVE PLAN)hint, plus a database-scoped option. Caches multiple plans per parameter shape — kills the worst class of parameter sniffing bugs. - Intelligent Query Processing wave 4. Adaptive joins everywhere (not just batch mode); percentile memory grant feedback (fewer tempdb spills); batch mode on rowstore (analytical queries without a columnstore index); DOP feedback (degree of parallelism auto-tunes across runs).
- Optimized locking. TID (transaction ID) locks + lock-after-qualification — reduces blocking on hot rows in OLTP.
- Change Event Streaming. Outbox-style event log on tables; downstream Kafka / Event Hubs / Service Bus integration. Replaces Service Broker for many new projects.
The side rail — compatibility and parity.
- Compatibility level 170 — new databases default here; upgraded databases stay on their previous level until you flip. Several wave-4 features are gated by 170.
- Azure SQL parity — vector type and Copilot ship in Azure SQL DB first; secure enclaves and Change Event Streaming reach the box product GA within the same release cycle; managed instance lags DB by one or two quarters typically.
Worked example — quadrant your way through "what's new in 2025?"
Detailed explanation. A coherent, 90-second answer to "what's new in SQL Server 2025?" follows the four-quadrant outline: one sentence on each, then a single closing sentence on compatibility level 170. This structure lets the interviewer interrupt at any quadrant for depth without breaking your narrative.
Question. Outline the four quadrants of SQL Server 2025 in roughly 90 seconds. Name the flagship feature in each quadrant and the legacy pattern it replaces. Close with the compat-level point.
Input. (your mental model)
| Quadrant | Flagship | Replaces |
|---|---|---|
| JSON-native | JSON_OBJECT |
FOR JSON PATH + concat |
| Regex-native | REGEXP_LIKE |
LIKE + CLR UDF |
| Vector + AI |
vector / VECTOR_DISTANCE
|
external vector store round trip |
| Performance + change feed | OPO + Change Event Streaming | parameter sniffing + Service Broker |
Code.
-- One illustrative query per quadrant — the kind a senior writes on the whiteboard
-- JSON-native
SELECT JSON_OBJECT('id': id, 'name': name) AS payload FROM users;
-- Regex-native
SELECT email FROM users
WHERE REGEXP_LIKE(email, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+$', 'i');
-- Vector + AI
SELECT TOP (10) doc_id
FROM documents
ORDER BY VECTOR_DISTANCE(embedding, @q, 'cosine');
-- Performance — OPO hint
SELECT order_id FROM orders WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
Step-by-step explanation.
- Name the quadrant first ("JSON-native"), then the flagship function (
JSON_OBJECT), then the legacy pattern it replaces (FOR JSON PATH+ string concat). Same structure for every quadrant. - Each quadrant gets one query — short enough to whiteboard, expressive enough to show the new syntax in context.
- The closing sentence ties it all together: "All four quadrants light up at compatibility level 170 — and 170 is the compat level new databases default to, so upgraded databases need an explicit flip plus a workload replay."
Output.
| Quadrant | One-line summary |
|---|---|
| JSON-native | "Replaces FOR JSON PATH + concat with proper builders and an actual json type." |
| Regex-native | "Replaces LIKE and CLR UDFs with five ANSI regex functions." |
| Vector + AI | "Embeddings + similarity inside the engine, no round trip to a vector store." |
| Performance + change feed | "OPO + IQP wave 4 fix parameter sniffing and tempdb spills; Change Event Streaming replaces Service Broker outbox patterns." |
Rule of thumb. Memorise one query per quadrant. The interviewer can stop you mid-quadrant and drill deeper — you can always go back up to the four-arc structure to recover your narrative.
Worked example — pick the right quadrant for a real workload
Detailed explanation. "Where would SQL Server 2025 help you the most right now?" is the senior probe. The answer should map a specific pain point to the quadrant — not list every feature. If the team's pain is "we shred JSON poorly," the quadrant is JSON-native. If the pain is "queries with parameter @region are sometimes 10ms and sometimes 30s," the quadrant is performance.
Question. Given the three pain points below, pick the SQL Server 2025 feature that most directly addresses each — and justify the choice in one sentence.
Input.
| Pain point | Feature pick | Justification |
|---|---|---|
"We rebuild a JSON payload column with STUFF + FOR JSON PATH" |
JSON_OBJECT / JSON_ARRAYAGG
|
direct builder, 30-60% shorter |
"SELECT … WHERE region = @r is sometimes 10 ms, sometimes 30 s" |
OPO (PARAMETER SENSITIVE PLAN) |
multi-plan cache fixes sniffing |
| "We export embeddings to a separate vector store" |
vector + VECTOR_DISTANCE
|
embeddings stay in the engine |
Code.
-- Old JSON build
SELECT STUFF((
SELECT ',' + '{' +
'"id":' + CONVERT(varchar, id) + ',' +
'"name":"' + name + '"' +
'}'
FROM users
FOR XML PATH('')
), 1, 1, '');
-- New JSON build
SELECT JSON_ARRAYAGG(JSON_OBJECT('id': id, 'name': name)) AS payload
FROM users;
-- Old OPO workaround: OPTION (RECOMPILE)
SELECT order_id, total
FROM orders WHERE region = @region
OPTION (RECOMPILE);
-- New OPO
SELECT order_id, total
FROM orders WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
Step-by-step explanation.
- The old JSON build uses XML PATH as a string-concat trick, then
STUFFto chop the leading comma. The new build is one nested call:JSON_ARRAYAGG(JSON_OBJECT(...)). Same output, ~70% less code, and the optimizer can plan it. - The old OPO workaround was
OPTION (RECOMPILE)— which fixes the sniffing problem by paying the recompile cost on every execution. OPO instead caches multiple plans, one per parameter shape, and re-uses them across executions. - The vector example would replace the round-trip-to-external-vector-DB topology with a
vectorcolumn andVECTOR_DISTANCE. The trade-off is the vector index story (above) — without an index, the query is a full scan, so this is for moderate-volume hybrid search rather than billion-vector indexes.
Output.
| Pain | Feature | Code-size change | Performance change |
|---|---|---|---|
| JSON build | JSON_ARRAYAGG(JSON_OBJECT(...)) |
~70% shorter | same / better |
| Parameter sniffing | OPO | unchanged | p95 ↓ on skewed predicates |
| External vector store |
vector + VECTOR_DISTANCE
|
one less hop | depends on index availability |
Rule of thumb. Don't pick a feature first and then look for a pain point. Pick the pain point first, name the quadrant, and pick the flagship feature inside it.
SQL Server 2025 interview question on the feature-map narrative
A senior interviewer often frames this as: "Take ten minutes and explain SQL Server 2025 to someone whose last major SQL Server release was 2016. What changed, what survived, and what should they re-learn first?"
Solution Using the four-quadrant narrative + a "what to re-learn first" tail
-- The narrative in code: one example per quadrant, in the order you'd teach them.
-- 1) JSON-native — the syntax win they will use every day
SELECT user_id,
JSON_OBJECT('id': user_id, 'name': name, 'active': is_active) AS payload
FROM users;
-- 2) Regex-native — the validation win that finally retires CLR
SELECT email FROM users
WHERE REGEXP_LIKE(email, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$', 'i');
-- 3) Vector + AI — the new shape of search inside the engine
SELECT TOP (5) doc_id, title
FROM documents
WHERE REGEXP_LIKE(title, 'sql', 'i') -- filter
ORDER BY VECTOR_DISTANCE(embedding, @q, 'cosine'); -- rank
-- 4) Performance — the hint name they need to recognise in plans
SELECT order_id, total
FROM orders
WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
Step-by-step trace.
| Step | What changed since 2016 | What survived |
|---|---|---|
| 1 | JSON has a real type + builders |
FOR JSON PATH still compiles |
| 2 | Regex is first-class T-SQL |
LIKE and CLR still work |
| 3 | Embeddings live in the engine | external stores still viable for billion-scale |
| 4 | OPO + IQP wave 4 retune plans | every pre-2025 hint still parses |
The narrative tells a 2016-era engineer: every old idiom still works, but the 2025 idioms are shorter, more portable, and integrate with the new optimizer. The "re-learn first" list is JSON builders → regex functions → OPO → vector type.
Output:
| Re-learn order | Feature | Why first |
|---|---|---|
| 1 |
JSON_OBJECT / JSON_ARRAYAGG
|
daily-use syntax win |
| 2 |
REGEXP_LIKE family |
retires CLR + brittle LIKE |
| 3 | OPO (PARAMETER SENSITIVE PLAN) |
optimizer name to recognise in plans |
| 4 |
vector + VECTOR_DISTANCE
|
new search topology |
Why this works — concept by concept:
- Quadrant ordering matches teaching difficulty — JSON syntax is easiest, regex is similar to other dialects, OPO is conceptual (parameter sniffing), vector is new topology. Teaching them in this order means each new concept builds on the previous one.
- "Everything old still works" — interviewers respect the candidate who acknowledges backward compatibility. Compat level 170 changes some defaults, but the 2016-era code does not stop compiling.
- One example per quadrant — the four queries above are short enough to fit on a whiteboard yet expressive enough to show the new surface. They double as the "if you only remember one query per feature" mental anchor.
-
The performance hint name matters — being able to say
OPTION (PARAMETER SENSITIVE PLAN)out loud is a signal you have read plan XML in production, not just the marketing page. - Cost — pure pedagogy; zero runtime cost. The interview value is the structure of the answer.
SQL
Topic — JSON
JSON problems (SQL)
3. T-SQL new syntax cheat sheet
T-SQL finally caught up to ANSI — JSON builders, regex functions, GREATEST / LEAST, STRING_SPLIT with ordinal, and DATETRUNC GA
The mental model in one line: every one of the new T-SQL functions in 2025 has a "before" — a workaround you had to write in earlier releases — and a "after" — the one-liner that replaces it; the cheat sheet is just the before/after table memorised cold. Once you can recite "the old way was FOR JSON PATH + STUFF, the new way is JSON_OBJECT," every syntax interview question becomes a translation drill.
The before / after matrix.
| Task | Pre-2025 idiom | 2025 idiom |
|---|---|---|
| Build a JSON object literal | (SELECT id, name FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) |
JSON_OBJECT('id': id, 'name': name) |
| Aggregate to a JSON array | '[' + STRING_AGG(name, ',') + ']' |
JSON_ARRAYAGG(name) |
| Aggregate to a JSON object map | hand-rolled STRING_AGG
|
JSON_OBJECTAGG(key VALUE value) |
| Regex match |
LIKE '%pat%' + CLR UDF |
REGEXP_LIKE(col, '^A.*Z$', 'i') |
| Regex replace | CLR UDF or REPLACE chains |
REGEXP_REPLACE(col, '\\s+', ' ') |
| Pick max of N expressions | nested CASE WHEN a >= b AND a >= c ...
|
GREATEST(a, b, c) |
| Pick min of N expressions | nested CASE | LEAST(a, b, c) |
| Split with position |
STRING_SPLIT(s, ',') then row_number |
STRING_SPLIT(s, ',', 1) with enable_ordinal=1
|
| Truncate to interval | DATEADD(day, DATEDIFF(day, 0, dt), 0) |
DATETRUNC(day, dt) (now GA) |
| Shred JSON with indexed columns |
OPENJSON + manual WITH(...) table |
OPENJSON ... WITH(... PATH '$.x') + indexed columns |
JSON builders in detail.
-
JSON_OBJECT('key': value, ...)— comma-separated key/value pairs; keys are string literals; values can be any expression. Returns ajsonvalue. -
JSON_ARRAY(value, value, ...)— positional values, returns a JSON array. -
JSON_OBJECTAGG(key VALUE value)— aggregate; produces a JSON object from grouped rows. Key uniqueness is the caller's responsibility — duplicate keys are merged by last-write-wins. -
JSON_ARRAYAGG(value)— aggregate; produces a JSON array from grouped rows inORDER BYorder if specified.
Regex family in detail.
-
REGEXP_LIKE(string, pattern, flags)— boolean match. Flags:'i'case-insensitive,'s'dotall,'m'multiline,'x'extended whitespace,'c'case-sensitive (default). -
REGEXP_REPLACE(string, pattern, replacement, position, occurrence, flags)— replace all (or nth) occurrences. Replacement supports back-references like\\1. -
REGEXP_SUBSTR(string, pattern, position, occurrence, flags, subexpression)— extract a matching substring. -
REGEXP_COUNT(string, pattern, position, flags)— count occurrences. -
REGEXP_INSTR(string, pattern, position, occurrence, return_option, flags, subexpression)— position of nth match.
Other syntax wins.
-
GREATEST(a, b, c, ...)andLEAST(a, b, c, ...)— ANSI, accept N arguments, propagate NULL only when every argument is NULL by default (configurable in some dialects viaIGNORE NULLS). Use them for "most recent activity across columns" or "smallest non-NULL price" idioms. -
STRING_SPLIT(string, separator, enable_ordinal)— the third argument (0 / 1) makes the function return anordinalcolumn. Replaces the row_number trick of usingOVER (ORDER BY (SELECT NULL)). -
DATETRUNC(datepart, date)— was preview in 2022, now GA in 2025. Replaces theDATEADD(day, DATEDIFF(day, 0, x), 0)idiom for date / hour / minute truncation.
Worked example — JSON build / aggregate / shred end-to-end
Detailed explanation. A reporting team needs three things from the orders table: a JSON object per row, a JSON array per region, and the same JSON array re-shredded back into a tabular result. Each step is a one-liner in 2025 — every step was a multi-line workaround in 2022.
Question. Given the orders table, (a) build a per-row JSON payload with JSON_OBJECT, (b) aggregate it per region with JSON_ARRAYAGG, and (c) shred the aggregated JSON back to a table with the new OPENJSON ... WITH (... PATH ...). Show the queries side by side.
Input.
| order_id | region | total |
|---|---|---|
| 1 | EU | 100 |
| 2 | EU | 50 |
| 3 | US | 200 |
| 4 | US | 30 |
Code.
-- (a) Build JSON per row
SELECT order_id,
JSON_OBJECT('id': order_id, 'region': region, 'total': total) AS payload
FROM orders;
-- (b) Aggregate JSON per region
SELECT region,
JSON_ARRAYAGG(JSON_OBJECT('id': order_id, 'total': total)) AS region_orders
FROM orders
GROUP BY region;
-- (c) Shred a JSON array back to columns
SELECT j.order_id, j.total
FROM (
SELECT JSON_ARRAYAGG(JSON_OBJECT('id': order_id, 'total': total)) AS arr
FROM orders
WHERE region = 'EU'
) AS src
CROSS APPLY OPENJSON(src.arr)
WITH (order_id INT '$.id',
total INT '$.total') AS j;
Step-by-step explanation.
- The per-row builder calls
JSON_OBJECTonce per row. Each call takes two pairs ('id'and'total') and returns ajsonvalue — no string concat, no manual quoting, no escape pain. - The aggregator nests
JSON_OBJECTinsideJSON_ARRAYAGG— for eachregiongroup, the engine collects every row's JSON object and emits a JSON array. - The shredder uses the new
OPENJSON ... WITH (... PATH ...)form. TheWITHclause names the output columns and their JSON paths; the parser produces a relational table that you can JOIN, GROUP, or write back to disk. - The round trip — build → aggregate → shred — is the canonical pattern for "denormalise into JSON for the wire, normalise back for analytics." In 2025 it is three small queries; pre-2025 it was three pages.
Output (a).
| order_id | payload |
|---|---|
| 1 | {"id":1,"region":"EU","total":100} |
| 2 | {"id":2,"region":"EU","total":50} |
| 3 | {"id":3,"region":"US","total":200} |
| 4 | {"id":4,"region":"US","total":30} |
Output (b).
| region | region_orders |
|---|---|
| EU | [{"id":1,"total":100},{"id":2,"total":50}] |
| US | [{"id":3,"total":200},{"id":4,"total":30}] |
Output (c).
| order_id | total |
|---|---|
| 1 | 100 |
| 2 | 50 |
Rule of thumb. When the team starts a new integration that exchanges JSON, the default toolkit is JSON_OBJECT (build), JSON_ARRAYAGG (aggregate), OPENJSON ... WITH (... PATH ...) (shred). Old patterns work, but new code should use the new builders — code-review will catch every regression.
Worked example — regex validation and extraction in one query
Detailed explanation. Validate an email column with REGEXP_LIKE, then extract the domain with REGEXP_SUBSTR, then count rows per domain. Pre-2025 this took a CLR UDF deploy and a privileged sysadmin to enable CLR; in 2025 it is three function calls.
Question. Given the users table, return one row per email with: (1) a boolean is_valid_email, (2) the extracted domain, (3) the count of users per domain.
Input.
| user_id | |
|---|---|
| 1 | alice@example.com |
| 2 | bob@example.com |
| 3 | carol@x.io |
| 4 | not-an-email |
Code.
WITH parsed AS (
SELECT user_id,
email,
REGEXP_LIKE(email, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$', 'i') AS is_valid_email,
REGEXP_SUBSTR(email, '@(.+)$', 1, 1, 'i', 1) AS domain
FROM users
)
SELECT p.email,
p.is_valid_email,
p.domain,
COUNT(*) OVER (PARTITION BY p.domain) AS users_per_domain
FROM parsed p;
Step-by-step explanation.
-
REGEXP_LIKEreturns1(TRUE) or0(FALSE) — boolean over the email pattern. The'i'flag makes it case-insensitive. Row 4 (not-an-email) returns FALSE because there is no@and no top-level domain. -
REGEXP_SUBSTRextracts the capture group after@. The 6th argument (subexpression = 1) tells it to return the first parenthesised capture rather than the full match. For row 4, no match → returns NULL. - The CTE materialises the parsed columns once; the outer SELECT adds a window count per domain. The output mixes valid and invalid rows with the appropriate domain.
- Pre-2025 this required either a CLR UDF (deploy, sign, mark assembly trustworthy) or a brittle chain of
CHARINDEX+SUBSTRING. In 2025 it is one regex call per column.
Output.
| is_valid_email | domain | users_per_domain | |
|---|---|---|---|
| alice@example.com | 1 | example.com | 2 |
| bob@example.com | 1 | example.com | 2 |
| carol@x.io | 1 | x.io | 1 |
| not-an-email | 0 | NULL | (NULL group: 1) |
Rule of thumb. When the team is validating a column with LIKE patterns longer than five characters, that is a code smell — switch to REGEXP_LIKE with a single regex source-of-truth, and store the regex itself in a documented constant or table.
SQL Server 2025 interview question on T-SQL syntax modernisation
A senior interviewer often frames this as: "Take a 2019-era stored procedure that builds a JSON payload, picks the most recent timestamp across three columns, and validates an email — and rewrite it using SQL Server 2025 syntax." The interviewer is testing whether you can name the right new function in each of the three places.
Solution Using JSON_OBJECT, GREATEST, and REGEXP_LIKE
-- Pre-2025 version
CREATE OR ALTER PROCEDURE dbo.user_payload_legacy
@user_id INT
AS
BEGIN
SELECT
(SELECT u.user_id, u.name
FROM users u WHERE u.user_id = @user_id
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS payload,
(SELECT CASE
WHEN u.web_last >= u.mobile_last
AND u.web_last >= u.api_last THEN u.web_last
WHEN u.mobile_last >= u.api_last THEN u.mobile_last
ELSE u.api_last END
FROM users u WHERE u.user_id = @user_id) AS last_seen,
(SELECT CASE WHEN u.email LIKE '%_@_%._%'
AND CHARINDEX('@', u.email) > 1
THEN 1 ELSE 0 END
FROM users u WHERE u.user_id = @user_id) AS is_valid_email;
END;
-- 2025 version
CREATE OR ALTER PROCEDURE dbo.user_payload_v2025
@user_id INT
AS
BEGIN
SELECT
JSON_OBJECT('id': u.user_id, 'name': u.name) AS payload,
GREATEST(u.web_last, u.mobile_last, u.api_last) AS last_seen,
REGEXP_LIKE(u.email,
'^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$', 'i') AS is_valid_email
FROM users u
WHERE u.user_id = @user_id;
END;
Step-by-step trace.
| Concern | Pre-2025 lines | 2025 lines | Shrink |
|---|---|---|---|
| JSON payload | 3 (subquery + FOR JSON PATH) |
1 | 67% |
| Max across 3 columns | 4 (nested CASE) | 1 (GREATEST) |
75% |
| Email validation | 3 (LIKE + CHARINDEX) | 1 (REGEXP_LIKE) |
67% |
The legacy procedure leaned on three separate subqueries because each "build / compute / validate" idiom needed its own scope. The 2025 procedure is a single SELECT — every primitive now lives at the column level.
Output: (example for @user_id = 1)
| payload | last_seen | is_valid_email |
|---|---|---|
{"id":1,"name":"Alice"} |
2026-05-12 09:30:00 | 1 |
Why this works — concept by concept:
-
JSON_OBJECT lifts the builder out of a subquery — the pre-2025 idiom needed a correlated subquery because
FOR JSON PATHonly attaches to aSELECT. The 2025 form is a scalar function call that lives next to any other column expression. -
GREATEST is N-arg — the nested CASE chain becomes a single function call. The trade-off worth naming: NULL propagation —
GREATEST(x, NULL, y)behaves per the dialect's NULL rule (SQL Server 2025 propagates NULL by default unless you opt intoIGNORE NULLS). -
REGEXP_LIKE collapses the validation — replaces the brittle pair
LIKE '%_@_%._%' AND CHARINDEX(...)with one expression. The regex is also portable to Oracle / Postgres / Snowflake interview environments. -
One SELECT per row — the 2025 procedure scans
usersonce for the user_id; the pre-2025 procedure scanned it three times. The plan shrinks proportionally. - Cost — three scalar function calls vs three correlated subqueries; the optimiser collapses the new form to a single index seek + three scalar ops. The legacy form had three seeks and a Compute Scalar nest.
SQL
Topic — case expression
CASE expression problems (SQL)
4. Performance — optional plan optimization, parallel plans, IQP wave 4
Optional parameter plan optimization plus IQP wave 4 means the plan you saw yesterday may not be the plan you see today — and that is the feature, not the bug
The mental model in one line: 2025's performance posture is reactive — the optimizer caches multiple plans per parameter shape, feeds memory grant statistics back into future grants, switches join types at runtime, and auto-tunes parallelism across runs; the candidate who reads plans now needs to read them at least twice to catch the feedback behaviour. The interview probes converge on six features: OPO, adaptive joins, percentile memory grant feedback, batch mode on rowstore, optimized locking, and DOP feedback.
The six headline performance changes.
-
Optional Parameter Plan Optimization (OPO).
OPTION (PARAMETER SENSITIVE PLAN)hint + a database-scoped option. The optimizer recognises that the same query text has very different row counts for different parameter values and caches a small set of plans (typically up to three buckets per parameter), routing each execution to the right plan based on parameter statistics. - Adaptive joins everywhere. Previously batch-mode-only; in wave 4 the runtime can switch between hash join and nested-loop join based on the actual row count observed at runtime, not just the estimate from statistics. Visible in plans as the "Adaptive Join" operator.
- Percentile memory grant feedback. Memory grant feedback was introduced earlier but used average behaviour; the 2025 percentile version reduces tempdb spills on workloads with skewed data sizes by tracking p50 / p95 grants per query.
- Batch mode on rowstore. Vectorised batch-mode operators on plain rowstore indexes — no columnstore required. Analytical queries on OLTP tables get a 2-5x improvement without changing the schema.
-
Optimized locking. TID (transaction ID) locks and lock-after-qualification. Reduces blocking on hot rows in OLTP workloads — fewer "lock waits" in
sys.dm_exec_requests. - DOP feedback. Degree of parallelism auto-tunes across runs. If a parallel plan with DOP 16 wasted threads, the optimizer reduces the DOP on the next execution and remembers across recompiles.
The cardinality estimator update.
- The new CE model at compat level 170 picks up where the 2014 / 2017 / 2019 models left off. It is not the legacy 70-CE; it is an evolution of the modern model. Some predicates change estimates by 10-30% — enough to flip plan shapes, especially for joins.
- The
CE_FEEDBACKhint (and database-scoped config) lets the optimizer remember actual row counts and feed them into future cardinality estimates. The result is a slow but steady reduction in "the plan was based on stale stats" regressions.
Reading 2025 plans — what to look for.
-
PARAMETER_SENSITIVE_PLANinOPTION (...)or in plan XML as a plan attribute → OPO active. - Adaptive Join operator → runtime hash/NL switch.
-
MemoryGrantInfo node with
IsMemoryGrantFeedbackAdjusted="true"→ feedback applied. - BatchMode attribute on rowstore scans → batch mode on rowstore.
-
Optimized locking is plumbing — its signal is in waits (
LCK_*waits trending down) and insys.dm_tran_locks(TID locks visible). - DOP feedback shows up as a lower-than-MAXDOP actual degree of parallelism on subsequent runs of the same query.
Common interview probes on performance.
- "What is parameter sniffing and how does SQL Server 2025 fix it?" — sniffing is "the optimizer compiles one plan for the first parameter value seen and re-uses it for everyone." 2025 fixes it with OPO — multiple cached plans per parameter shape.
- "What is the difference between adaptive joins in batch mode (2019) and adaptive joins in 2025?" — 2025 extends them to rowstore plans and to nested-loop ↔ hash transitions in either direction.
- "When would batch mode on rowstore not help?" — when the query is point-lookup-shaped (a few rows), not analytical (many rows aggregated). Batch mode is per-batch vectorisation; tiny result sets don't benefit.
- "What is the cost of optimized locking?" — a small CPU overhead on the lock manager; payback is "fewer blocking incidents in OLTP." Usually a net win.
Worked example — diagnose a parameter sniffing regression and fix it with OPO
Detailed explanation. A team observes a stored procedure that runs in 10 ms for the EU region and in 30 seconds for the US region. The plan is the same. The cause is parameter sniffing: the optimizer compiled the plan for whichever region was passed first and re-uses it. OPO fixes this by caching multiple plans.
Question. Given the procedure below, explain why the runtime varies wildly, then rewrite it to use OPO. Show what changes in the plan cache.
Input.
| Region | Row count in orders
|
|---|---|
| EU | 10,000 |
| US | 50,000,000 |
Code.
-- Pre-2025 - one plan for all parameter values
CREATE OR ALTER PROCEDURE dbo.get_orders_by_region
@region varchar(10)
AS
BEGIN
SELECT order_id, total
FROM orders
WHERE region = @region;
END;
-- 2025 - opt into Optional Parameter Plan Optimization
CREATE OR ALTER PROCEDURE dbo.get_orders_by_region
@region varchar(10)
AS
BEGIN
SELECT order_id, total
FROM orders
WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
END;
Step-by-step explanation.
- The legacy procedure compiles its plan on the first execution. If EU was first (10 K rows), the optimizer picked an index seek + key lookup — great for small result sets, terrible for the 50 M-row US case.
- The cached plan is re-used for every subsequent execution including the US call. The seek + key lookup pattern degenerates into 50 M key-lookup IOs — hence the 30-second runtime.
- The 2025 procedure adds
OPTION (PARAMETER SENSITIVE PLAN). On the first EU call, the optimizer caches a seek + lookup plan in the "small" bucket. On the first US call, it caches a scan + hash aggregate plan in the "large" bucket. Subsequent executions route to the right bucket based on parameter statistics. - The plan cache now holds two plans for the same query text. Watch the cache size after enabling OPO — on memory-tight servers, the trade-off is real.
Output.
| Region | Pre-2025 runtime | 2025 (OPO) runtime |
|---|---|---|
| EU | 10 ms | 10 ms |
| US | 30,000 ms | ~2,000 ms |
Rule of thumb. Add OPTION (PARAMETER SENSITIVE PLAN) to any query where the same parameter column has wildly different selectivities across values. The cost is plan-cache memory; the payback is "p95 latency stops spiking by 100x on skewed inputs."
Worked example — adaptive joins switch hash to nested loop at runtime
Detailed explanation. The optimizer's estimate said "100,000 rows" but the actual row count was "37 rows" because the upstream filter ended up much more selective than the statistics suggested. In 2019 the engine was stuck with the hash join it compiled; in 2025 the adaptive join operator switches to a nested loop at runtime when the actual row count is below a threshold.
Question. Given a query joining a filtered orders to customers, explain how the adaptive join chooses between hash and nested loop, and show how to verify it from the plan.
Input — orders (after filter).
| order_id | customer_id |
|---|---|
| 17 | 100 |
| 88 | 200 |
| 145 | 300 |
| (32 more rows) | ... |
Input — customers. (1 M rows; primary key on customer_id)
Code.
SELECT o.order_id, o.total, c.name
FROM orders o
JOIN customers c
ON c.customer_id = o.customer_id
WHERE o.placed_at >= '2026-05-01'
AND o.region = 'EU'
AND o.amount > 1000;
-- No special hint — adaptive join is the default plan shape at compat 170
Step-by-step explanation.
- The optimizer estimates the filtered
ordersrowcount at 100,000 — based on stats. It picks a hash join because the expected outer is too large for nested loops. - At runtime, the actual filtered rowcount is 37. The adaptive join operator measures the build-side row count and compares against an "adaptive threshold" computed at compile time.
- The actual count is below the threshold → the operator switches to a nested loop, doing 37 seeks against the
customersPK index instead of building a 1 M-row hash table. - The Showplan XML records
EstimatedJoinType="Hash"andActualJoinType="NestedLoop". SSMS displays both. If you see the same query executing with different actual join types across runs, that is adaptive joins working.
Output (plan attributes — illustrative).
| Attribute | Value |
|---|---|
EstimatedJoinType |
Hash |
ActualJoinType |
NestedLoop |
AdaptiveThresholdRows |
1,250 |
ActualRows |
37 |
Rule of thumb. Don't add OPTION (LOOP JOIN) or OPTION (HASH JOIN) hints reflexively in 2025 — they disable the adaptive join. Read the plan first; only force a join type if the adaptive switch is consistently wrong (rare).
SQL Server 2025 interview question on the performance regression playbook
A senior interviewer often frames this as: "Your team upgrades to SQL Server 2025 at compat level 170 and three procedures regress. Walk me through your debug playbook — what do you measure, what do you look at in the plan, and how do you decide between OPO, recompile, and a query rewrite?"
Solution Using a baseline + Query Store + targeted hints playbook
-- 1) Capture baseline before upgrade (Query Store ON)
ALTER DATABASE current
SET QUERY_STORE = ON (
OPERATION_MODE = READ_WRITE,
QUERY_CAPTURE_MODE = AUTO,
INTERVAL_LENGTH_MINUTES = 15
);
-- 2) After upgrade — identify regressions with the regressed-queries view
SELECT TOP (20)
q.query_id,
qt.query_sql_text,
AVG(rs.avg_duration) AS avg_duration_us
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
WHERE rs.last_execution_time >= DATEADD(day, -1, SYSUTCDATETIME())
GROUP BY q.query_id, qt.query_sql_text
ORDER BY avg_duration_us DESC;
-- 3) Force pre-2025 plan if the regression is plan-shape
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 17;
-- 4) Add OPO where the regression is parameter sniffing
ALTER PROCEDURE dbo.get_orders_by_region
@region varchar(10)
AS
BEGIN
SELECT order_id, total
FROM orders WHERE region = @region
OPTION (PARAMETER SENSITIVE PLAN);
END;
Step-by-step trace.
| Step | Tool | Decision |
|---|---|---|
| 1 | Query Store ON before upgrade | baseline plans + runtimes captured |
| 2 |
sys.query_store_* views |
rank regressed queries by avg duration |
| 3 | sp_query_store_force_plan |
pin pre-2025 plan if shape is the problem |
| 4 | OPTION (PARAMETER SENSITIVE PLAN) |
enable OPO if sniffing is the problem |
| 5 | Targeted rewrite | only if hints + Query Store don't fix it |
The playbook never starts with a rewrite — it starts with measurement. Force-plan is the cheapest fix; OPO is the next cheapest; a rewrite is the last resort because it requires re-testing.
Output:
| Regression cause | First fix | Cost |
|---|---|---|
| Plan shape changed (new CE) | force pre-2025 plan in Query Store | seconds |
| Parameter sniffing | add OPTION (PARAMETER SENSITIVE PLAN)
|
minutes |
| Stale stats | UPDATE STATISTICS … WITH FULLSCAN |
minutes |
| Workload shift | targeted rewrite | hours/days |
Why this works — concept by concept:
- Baseline before upgrade — Query Store's "before" snapshot is the only artefact that lets you measurably say "this query regressed." Without it, every claim is anecdotal.
- Force-plan is reversible — pinning a pre-2025 plan is a targeted compat-level rollback for one query. It does not bring back the old CE for the whole database; it only fixes the one query that misbehaves.
- OPO is the right fix for sniffing-shaped regressions — if the query's duration varies wildly across parameter values, the cause is sniffing and the fix is OPO. If the duration is flat-but-slow, the cause is plan shape and the fix is force-plan or a rewrite.
- Rewrite is the last resort — rewrites require re-testing, re-deployment, and downstream coordination. Hints and Query Store actions are reversible inside the database; rewrites cross the team boundary.
-
Cost — Query Store overhead is typically 1-3% on CPU;
sp_query_store_force_planis O(1); OPO costs additional plan-cache memory proportional to the parameter buckets it caches.
SQL
Topic — window functions
Window functions problems (SQL)
5. AI + security features — vector search, secure enclaves, Copilot, change feed
Vector data type + secure enclaves + change event streaming + Copilot in SSMS = the engine becomes an AI surface — but the trust rule is unchanged
The mental model in one line: the four AI + security flagships — vector data type, secure enclaves, change event streaming, Copilot in SSMS — share the same headline rule: the database is the new AI surface, and every AI answer still needs human review before it hits production. Interviewers reward candidates who pair the feature description with the operational guardrail.
Vector data type and VECTOR_DISTANCE.
-
Storage.
vector(n)stores a fixed-dimension floating-point array. Common dimensions are 384, 768, 1024, 1536, 3072 — match your embedding model. -
Similarity.
VECTOR_DISTANCE(a, b, 'cosine' | 'euclidean' | 'dot')returns a scalar distance. Cosine is the most common for text embeddings; dot is appropriate when vectors are normalised. -
Hybrid search. Combine
WHEREfilters (includingREGEXP_LIKE) withORDER BY VECTOR_DISTANCE. The optimizer plans both — filter first when the predicate is selective, similarity-sort first when the filter is broad. - Indexes. Approximate-nearest-neighbour (ANN) indexes for vector columns are the kNN-at-scale story. Their availability varies by SKU and platform — check the parity table on the doc page when you defend your design in an interview.
Secure enclaves with Always Encrypted.
- Always Encrypted (the 2016 feature). Column-level encryption with client-side keys; the server never sees plaintext. Trade-off: only equality and prefix queries on encrypted columns.
-
Secure enclaves (the 2019+ evolution). A trusted execution environment (Intel SGX, virtualisation-based security on Azure) inside
sqlservr.exethat can decrypt and operate on the data inside the enclave, allowing range queries, pattern matches, and even some joins on encrypted columns. - Attestation. The client verifies the enclave's identity before sending the column-encryption key. The attestation service is the operational piece that often distinguishes "I've configured this" from "I've read the doc."
Change Event Streaming.
-
Outbox per table.
ALTER TABLE orders ENABLE CHANGE_EVENT_STREAMING(illustrative syntax — feature switch on at compat 170) registers an append-only event log that captures INSERT / UPDATE / DELETE events with the full or projected payload. - Downstream sinks. Built-in connectors for Kafka, Azure Event Hubs, and Service Bus. Some shops still consume via a custom poll; the engine ships the canonical pattern.
- vs Change Data Capture (CDC). CDC keeps state in shadow tables and polls; Change Event Streaming pushes — lower latency, simpler downstream code.
- vs Service Broker. Service Broker is a full asynchronous messaging system inside the database; Change Event Streaming is a focused outbox. For new projects, Change Event Streaming is the default; Service Broker remains for legacy workloads.
Copilot in SSMS and Azure Data Studio.
-
Schema-aware completion. Copilot reads the schema (subject to permissions) and proposes T-SQL. Natural-language prompts ("average revenue per region last quarter") return a SELECT that you can
Show SQL, review, and execute manually. - Trust caveat. Never run an unreviewed Copilot answer on a production schema. The interview answer is always "Show SQL, review, then run."
- Audit. Every Copilot-generated query that ends up in source control needs the same review gate as any other code change. The convenience does not change the audit story.
Azure SQL parity table — what ships first.
- Vector type. Azure SQL DB first → box product GA same release → managed instance one quarter later (typical pattern).
- Secure enclaves. Available on both box product and Azure; attestation services differ.
- Copilot. SSMS + ADS first; Copilot inside the database (not just the tool) is the next frontier.
- Change Event Streaming. Both, behind a feature switch — read the SKU matrix on the doc page when defending architecture choices.
Worked example — hybrid retrieval with REGEXP_LIKE filter + VECTOR_DISTANCE ranking
Detailed explanation. A documentation site wants to retrieve articles about "SQL Server" that are most similar to a user's query. The team stores per-article embeddings as vector(1536). The right shape is "filter by topic with REGEXP_LIKE, rank by similarity with VECTOR_DISTANCE, return top K."
Question. Given the documents table with columns (doc_id, title, body, embedding vector(1536)), write a hybrid retrieval query that (a) keeps only documents whose title matches a topic regex and (b) ranks the survivors by cosine similarity to a query vector.
Input.
| doc_id | title | embedding |
|---|---|---|
| 1 | SQL Server 2025 overview | (1536-d vector) |
| 2 | Postgres performance tips | (1536-d vector) |
| 3 | SQL Server vector type | (1536-d vector) |
| 4 | BigQuery cost control | (1536-d vector) |
Code.
DECLARE @q vector(1536) = /* embedding produced by your model */;
SELECT TOP (10) doc_id, title,
VECTOR_DISTANCE(embedding, @q, 'cosine') AS distance
FROM documents
WHERE REGEXP_LIKE(title, 'sql server', 'i')
ORDER BY VECTOR_DISTANCE(embedding, @q, 'cosine') ASC;
Step-by-step explanation.
- The
WHERE REGEXP_LIKE(title, 'sql server', 'i')predicate filters titles case-insensitively for the substring "sql server" — narrows the candidate set from millions to (in this example) two rows:doc_id=1anddoc_id=3. - The optimiser evaluates the predicate first because it is selective. The remaining rows enter the sort step.
-
VECTOR_DISTANCE(embedding, @q, 'cosine')computes the cosine distance between each row's embedding and the query vector. Smaller distance = more similar. -
ORDER BY ... ASCputs the smallest distance first;TOP (10)keeps the top K. Even though only 2 rows survive the filter, the query gracefully returns 2.
Output (illustrative).
| doc_id | title | distance |
|---|---|---|
| 3 | SQL Server vector type | 0.082 |
| 1 | SQL Server 2025 overview | 0.157 |
Rule of thumb. For hybrid retrieval, always filter first when the predicate is selective; let VECTOR_DISTANCE handle the ranking on a smaller set. When the filter is broad (most rows survive), an ANN index becomes the deciding factor — that is where the SKU matrix matters.
Worked example — outbox pattern with Change Event Streaming
Detailed explanation. A microservice publishes "order placed" events to Kafka. Pre-2025 the team rolled their own outbox: a trigger writes to an outbox table, a polling job reads and publishes. In 2025 Change Event Streaming registers the outbox at the engine level and ships events directly to a downstream sink.
Question. Show the legacy outbox trigger + polling pattern, then show the 2025 Change Event Streaming equivalent. Explain the operational differences.
Input. orders table — INSERTs trigger downstream events.
Code.
-- Pre-2025: trigger + outbox table + poll
CREATE TABLE outbox_orders (
event_id BIGINT IDENTITY PRIMARY KEY,
order_id INT NOT NULL,
event_type VARCHAR(20) NOT NULL,
payload NVARCHAR(MAX) NOT NULL,
created_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
published_at DATETIME2 NULL
);
CREATE OR ALTER TRIGGER trg_orders_insert
ON orders AFTER INSERT
AS
BEGIN
INSERT INTO outbox_orders (order_id, event_type, payload)
SELECT order_id, 'order.placed',
(SELECT * FROM inserted i2 WHERE i2.order_id = i.order_id
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM inserted i;
END;
-- A separate poller reads unpublished rows and ships them to Kafka.
-- 2025: Change Event Streaming at the engine level
ALTER TABLE orders
ENABLE CHANGE_EVENT_STREAMING
WITH (
SINK = 'kafka://broker:9092/orders.events',
CAPTURE = INSERT,
PAYLOAD = ALL_COLUMNS_JSON
);
-- (illustrative; exact syntax per release notes)
Step-by-step explanation.
- The legacy pattern needs a trigger + an outbox table + a polling job. Three moving parts, three failure surfaces, and the polling job's latency floor is the polling interval.
- The 2025 pattern is a single
ALTER TABLE. The engine maintains the outbox internally; the sink connector ships rows to Kafka as they commit. - Downstream consumers see lower end-to-end latency (no poll interval), and the operations team has one fewer service to monitor.
- Trade-off: backpressure is now the engine's problem. If the sink is unavailable, events queue inside the engine — capacity planning matters.
Output (operational).
| Aspect | Legacy outbox | Change Event Streaming |
|---|---|---|
| Moving parts | trigger + table + poller | one ALTER TABLE |
| End-to-end latency | poll interval (seconds) | per-commit |
| Downstream tail | poller crash blocks | sink connector backpressure |
| Audit | trigger + table | engine-level event log |
Rule of thumb. For new outbox topologies in 2025, default to Change Event Streaming. Keep Service Broker only for in-database message routing (which Change Event Streaming does not replace). Validate sink connector behaviour under broker outage before going to production.
SQL Server 2025 interview question on AI + security adoption
A senior interviewer often frames this as: "Your team wants to add a 'semantic search across documents' feature to the existing OLTP database. You have an embedding model already. Walk me through the design — schema, query shape, security and audit story — using SQL Server 2025 primitives."
Solution Using vector storage + VECTOR_DISTANCE retrieval + secure enclaves for confidential payloads
-- 1) Schema — embedding lives next to the document
CREATE TABLE documents (
doc_id BIGINT IDENTITY PRIMARY KEY,
title NVARCHAR(400) NOT NULL,
body NVARCHAR(MAX) NOT NULL,
embedding vector(1536) NOT NULL,
created_at DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
);
-- (Optional) confidential PII column behind Always Encrypted + enclave
ALTER TABLE documents
ADD owner_email NVARCHAR(320)
COLLATE Latin1_General_BIN2
ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = CEK_Auto1,
ENCRYPTION_TYPE = Randomized,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
);
-- 2) Retrieval — filter then rank
DECLARE @q vector(1536) = /* query embedding */;
SELECT TOP (10) doc_id, title,
VECTOR_DISTANCE(embedding, @q, 'cosine') AS distance
FROM documents
WHERE REGEXP_LIKE(title, 'sql', 'i') -- topic filter
ORDER BY VECTOR_DISTANCE(embedding, @q, 'cosine') ASC;
-- 3) Audit — log every semantic query for review
INSERT INTO semantic_query_log (asked_by, query_text, embedding_hash, result_doc_ids)
VALUES (SUSER_SNAME(), @prompt_text, HASHBYTES('SHA2_256', CAST(@q AS varbinary(max))), @result_ids);
Step-by-step trace.
| Step | What it does | Why it matters |
|---|---|---|
| 1 schema | embedding stored next to the doc | one query reads both |
| 1 enclave | sensitive column behind Always Encrypted | rich queries on encrypted PII |
| 2 retrieval | REGEXP_LIKE filter + VECTOR_DISTANCE rank | hybrid search inside the engine |
| 3 audit | log prompt + result IDs | review surface for AI output |
The schema groups what travels together; the retrieval is a single query; the audit table makes the AI surface reviewable. No data leaves the engine, and every answer is logged for human review.
Output: (illustrative)
| doc_id | title | distance |
|---|---|---|
| 17 | SQL Server 2025 overview | 0.071 |
| 22 | SQL Server vector type | 0.094 |
| 9 | SQL Server tuning notes | 0.112 |
Why this works — concept by concept:
- Embedding-next-to-document — co-locating the embedding with the source row eliminates round trips to a separate vector store. The trade-off is that the OLTP table grows by the vector size per row; budget storage accordingly.
-
Filter-then-rank —
REGEXP_LIKEis a cheap selective filter;VECTOR_DISTANCEis the more expensive scoring function. Pushing the cheap filter first keeps the scoring workload small. - Secure enclaves for sensitive payloads — Always Encrypted with secure enclaves allows pattern-matching and range queries on encrypted columns without exposing plaintext on the wire or in the buffer pool.
- Audit log is non-negotiable — every AI surface needs a "what was asked, what was answered" log. The audit table is also the data set you use to evaluate retrieval quality over time.
- Trust rule — Copilot output, semantic retrieval output, and any "AI answer" are suggestions. The human reviews before production action. The interview answer never skips this line.
-
Cost — vector storage is O(rows × dim);
VECTOR_DISTANCEis O(filtered_rows × dim) without an ANN index, O(log rows × dim) with one.REGEXP_LIKEis O(filtered_rows × pattern_length) and is cheap relative to the similarity scoring.
SQL
Topic — joins
JOIN problems (SQL)
Cheat sheet — SQL Server 2025 recipes
-
JSON build per row.
SELECT JSON_OBJECT('id': id, 'name': name) FROM users— replacesFOR JSON PATH, WITHOUT_ARRAY_WRAPPER. UseJSON_ARRAY(a, b, c)for positional arrays. -
JSON aggregate.
SELECT dept, JSON_ARRAYAGG(name) FROM emp GROUP BY dept— replaces'[' + STRING_AGG(name, ',') + ']'. UseJSON_OBJECTAGG(key VALUE value)for grouped key/value maps. -
Regex filter.
WHERE REGEXP_LIKE(email, '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}$', 'i')— replaces brittleLIKEpatterns and CLR UDFs. -
Regex extract.
REGEXP_SUBSTR(s, '@(.+)$', 1, 1, 'i', 1)— extract the first capture group; replacesSUBSTRING + CHARINDEXchains. -
GREATEST / LEAST.
SELECT GREATEST(a, b, c), LEAST(a, b, c)— replaces nestedCASE WHENfor max / min across columns; watch NULL propagation. -
STRING_SPLIT with ordinal.
SELECT value, ordinal FROM STRING_SPLIT(@csv, ',', 1)— replaces row_number tricks; the third argument enables the ordinal column. -
DATETRUNC (GA).
SELECT DATETRUNC(day, event_at), COUNT(*) FROM events GROUP BY DATETRUNC(day, event_at)— replacesDATEADD(day, DATEDIFF(day, 0, event_at), 0). -
Vector similarity (Top K).
ORDER BY VECTOR_DISTANCE(embedding, @q, 'cosine') OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY— kNN-style retrieval. -
OPO hint.
OPTION (PARAMETER SENSITIVE PLAN)— opt into multi-plan caching per parameter shape; fixes parameter sniffing without the recompile tax. -
Change feed enable.
ALTER TABLE orders ENABLE CHANGE_EVENT_STREAMING(per release-notes syntax) — replaces trigger + outbox + poller for new outbox topologies. -
OPENJSON with path.
OPENJSON(@json) WITH (id INT '$.id', name NVARCHAR(100) '$.name')— preferred shred form; supports indexed columns and plan integration. -
NULL-safe maximum across columns.
GREATEST(ISNULL(a, -1e9), ISNULL(b, -1e9), ISNULL(c, -1e9))— explicit sentinel when you want NULL to lose. OrCOALESCE(GREATEST(a, b, c), 0)when "no signal → 0" is the contract. -
Force pre-2025 plan.
EXEC sys.sp_query_store_force_plan @query_id = ?, @plan_id = ?— targeted compat-level rollback for one query during upgrade.
Frequently asked questions
What are the biggest new T-SQL features in SQL Server 2025?
The four headline T-SQL additions are JSON builders (JSON_OBJECT, JSON_ARRAY, JSON_OBJECTAGG, JSON_ARRAYAGG), ANSI regex functions (REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_COUNT, REGEXP_INSTR), the vector data type with VECTOR_DISTANCE, and small-but-useful syntax wins like GREATEST / LEAST, STRING_SPLIT with enable_ordinal, and DATETRUNC reaching general availability. The first three are quadrant-defining; the fourth set retires a long backlog of nested-CASE and row-number workarounds. All ride on compatibility level 170 — upgraded databases need an explicit compat-level flip before the new behaviours are observable in production code.
How does the new vector data type compare to a dedicated vector database?
It removes a hop — embeddings live next to the source row instead of in a separate store, and VECTOR_DISTANCE runs the similarity inside the engine. For small- to medium-scale hybrid retrieval (millions of vectors) with rich relational filters, the integrated story is usually simpler. For billion-scale kNN with strict latency targets, a dedicated vector database with mature ANN indexes is still the right call. The middle ground is the new vector ANN index support in SQL Server 2025, which varies by SKU and platform — check the Azure SQL parity matrix and the Standard-vs-Enterprise table before defending a design choice.
What is Optional Plan Optimization and how does it fix parameter sniffing?
Optional Parameter Plan Optimization (OPO) is the optimizer feature behind the OPTION (PARAMETER SENSITIVE PLAN) hint and a database-scoped configuration. It tells the engine: "this query has parameter values that behave very differently — cache multiple plans, one per parameter-shape bucket, and route each execution to the right bucket." That fixes the classic parameter sniffing bug where a plan compiled for a tiny EU result set is re-used for a huge US one (or vice versa). The trade-off is plan-cache memory — each query that opts into OPO consumes more cache slots — so reserve OPO for queries where parameter selectivity actually varies, not as a blanket switch.
Can I use REGEXP_LIKE in older compatibility levels?
The regex family is gated by compatibility level 170 in current preview / GA. Below 170 you fall back to LIKE, PATINDEX, or a CLR UDF — the pre-2025 workarounds the regex family was designed to replace. The interview-quality answer is: "If we need REGEXP_LIKE on the analytic surface, we flip compat 170 on the analytics database first, replay the workload, and migrate the analytical procedures. The OLTP databases can stay on the previous compat level until we have the Query Store baseline and the regression playbook in place."
Does SQL Server 2025 replace Service Broker with Change Event Streaming?
For new outbox-style topologies — "I write to a table, downstream services consume the event" — Change Event Streaming is the canonical 2025 pattern. It is simpler, has a built-in sink for Kafka / Event Hubs / Service Bus, and removes the trigger-plus-poller scaffolding. Service Broker, however, is a full in-database asynchronous messaging system with conversations, dialogs, and routing — Change Event Streaming does not replicate that surface. Existing Service Broker workloads keep working; new projects almost always choose Change Event Streaming first.
How does Copilot in SSMS handle hallucinations on production schemas?
Copilot in SSMS is schema-aware (it reads the catalog you have access to) and is constrained to the database surface you are connected to, so the shape of the answers is usually right. But it can still produce syntactically valid SQL that does the wrong thing — wrong JOIN direction, wrong aggregate, wrong WHERE predicate — especially on ambiguous prompts. The trust rule is unchanged: every Copilot answer is a suggestion, every suggestion is reviewed via the Show SQL action, and no Copilot-authored query runs on a production database without the same audit gate as any other code change. The convenience is real; the trust story is the human in the loop.
Practice on PipeCode
- Drill the JSON practice library → for the
JSON_OBJECT/JSON_ARRAY/OPENJSONfamily of probes that 2025 hires get on day one. - Sharpen on window functions problems → for the analytical patterns batch-mode-on-rowstore now accelerates without a columnstore.
- Stack the aggregation library → for the group-by workloads where OPO regressions surface in interview whiteboarding.
- Layer the joins practice library → for adaptive-join scenarios and the LEFT-JOIN + COALESCE patterns that survive compat-level flips.
- Add the CASE expression problems → library to internalise the GREATEST / LEAST replacement story.
- Round it out with conditional logic drills → for the IF / IIF / IFF surface that interviewers cross-reference against the dialect matrix.
- For the broader surface, read top data engineering interview questions →.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- Sharpen the dialect axis with the SQL for data engineering interviews course →.
- For long-form schema craft, work through data modelling for DE interviews →.
Pipecode.ai is Leetcode for Data Engineering — every SQL Server 2025 recipe above ships with hands-on practice rooms where you write the `JSON_OBJECT` payload, the `REGEXP_LIKE` validation, the OPO-aware procedure, and the `vector` + `VECTOR_DISTANCE` hybrid retrieval against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so you walk into the SQL Server interview with the four 2025 quadrants memorised and the trade-offs already rehearsed.