SQL Interview Questions 2026 with Runnable Answers
Fifty SQL interview questions interviewers actually ask in 2026 — sorted by level, each paired with a runnable PostgreSQL answer and the rubric an interviewer scores you on. Whether you are a data analyst or an experienced developer prepping for a senior loop, you can copy any query into AI2SQL's demo database and confirm it before your phone screen.
Why SQL Interview Prep Matters in 2026
SQL remains the most requested technical skill for data analysts, backend engineers, data engineers, and even product managers. According to Stack Overflow's 2025 survey, SQL ranked as the third most-used programming language overall and the number one language for data roles.
In 2026, SQL interview questions have shifted. Interviewers care less about memorizing syntax and more about problem-solving: can you translate a business question into a working query? Can you optimize a slow query? Can you design a schema that scales?
This guide covers 50 questions across all difficulty levels. Each answer includes the reasoning behind it, not just the query. Read through once, then practice the queries in AI2SQL with the built-in demo database.
Basic SQL Interview Questions (1-15)
These questions test your understanding of SQL fundamentals. Every candidate should nail these.
1. What is the difference between WHERE and HAVING?
WHERE filters individual rows before any grouping happens. HAVING filters groups after GROUP BY has been applied.
-- WHERE: filter rows before grouping
SELECT department, COUNT(*) AS emp_count
FROM employees
WHERE salary > 50000
GROUP BY department;
-- HAVING: filter groups after aggregation
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
2. What is the difference between DELETE, TRUNCATE, and DROP?
DELETE removes specific rows and can be rolled back. TRUNCATE removes all rows instantly and resets auto-increment counters (cannot be rolled back in most databases). DROP removes the entire table structure and all data permanently.
-- DELETE: removes rows, can use WHERE, logged
DELETE FROM orders WHERE status = 'cancelled';
-- TRUNCATE: removes all rows, faster, minimal logging
TRUNCATE TABLE temp_logs;
-- DROP: removes the entire table
DROP TABLE IF EXISTS old_backups;
3. What are the different types of JOINs?
SQL has five main JOIN types:
- INNER JOIN returns only matching rows from both tables
- LEFT JOIN returns all rows from the left table plus matching rows from the right
- RIGHT JOIN returns all rows from the right table plus matching rows from the left
- FULL OUTER JOIN returns all rows from both tables
- CROSS JOIN returns every combination (cartesian product)
-- INNER JOIN
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;
-- LEFT JOIN: includes employees with no department
SELECT e.name, d.department_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
4. What is a PRIMARY KEY?
A primary key is a column (or combination of columns) that uniquely identifies each row in a table. It enforces two constraints: uniqueness (no duplicate values) and NOT NULL (cannot be empty). Each table can have only one primary key.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL
);
5. What is the difference between UNION and UNION ALL?
UNION combines results from two queries and removes duplicate rows. UNION ALL combines results but keeps all duplicates. UNION ALL is faster because it skips the deduplication step.
-- UNION: removes duplicates
SELECT city FROM customers
UNION
SELECT city FROM suppliers;
-- UNION ALL: keeps duplicates, faster
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
6. What does DISTINCT do?
DISTINCT removes duplicate rows from the result set. It applies to all selected columns, not just the first one.
-- Unique cities
SELECT DISTINCT city FROM customers;
-- Unique city + state combinations
SELECT DISTINCT city, state FROM customers;
7. Explain NULL in SQL. How do you check for NULL values?
NULL represents a missing or unknown value. It is not equal to zero, an empty string, or false. You cannot use = to compare NULL values. Use IS NULL or IS NOT NULL instead.
-- Correct way to check for NULL
SELECT * FROM orders WHERE shipped_date IS NULL;
-- COALESCE: return first non-null value
SELECT name, COALESCE(phone, email, 'No contact') AS contact
FROM customers;
8. What is the order of execution in a SQL query?
SQL does not execute in the order you write it. The actual execution order is:
- FROM and JOIN (identify tables)
- WHERE (filter rows)
- GROUP BY (create groups)
- HAVING (filter groups)
- SELECT (choose columns)
- DISTINCT (remove duplicates)
- ORDER BY (sort results)
- LIMIT / OFFSET (paginate)
This is why you cannot use a column alias from SELECT in your WHERE clause but you can use it in ORDER BY.
9. What is the difference between CHAR and VARCHAR?
CHAR(n) is fixed-length and always stores exactly n characters, padding shorter strings with spaces. VARCHAR(n) is variable-length and stores only the actual characters. Use CHAR for fixed-format data like country codes (US, UK). Use VARCHAR for everything else.
10. Write a query to find duplicate records.
SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;
11. What is an alias and why use one?
An alias gives a temporary name to a table or column using the AS keyword. It makes queries more readable, especially when dealing with JOINs across multiple tables or computed columns.
SELECT
e.first_name AS name,
d.department_name AS dept,
e.salary * 12 AS annual_salary
FROM employees AS e
JOIN departments AS d ON e.dept_id = d.id;
12. What is the difference between IN and EXISTS?
IN checks if a value matches any value in a list or subquery result. EXISTS checks whether a subquery returns any rows at all. For large datasets, EXISTS is usually faster because it stops scanning once it finds the first match.
-- IN: evaluates entire subquery
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders);
-- EXISTS: stops at first match
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
13. How do you insert data from one table into another?
-- INSERT INTO ... SELECT
INSERT INTO archived_orders (id, customer_id, total, order_date)
SELECT id, customer_id, total_amount, order_date
FROM orders
WHERE order_date < '2025-01-01';
14. What is a FOREIGN KEY?
A foreign key is a column that references the primary key of another table. It enforces referential integrity, meaning you cannot insert a value that does not exist in the referenced table.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
15. Write a query to get the second highest salary.
-- Using LIMIT/OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Using a subquery
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
Intermediate SQL Interview Questions (16-30)
These questions go beyond syntax and test your ability to write multi-step queries.
16. What is a self-join? When would you use one?
A self-join joins a table to itself. It is used when rows in the same table have a hierarchical or comparative relationship, like employees and their managers.
-- Find each employee and their manager's name
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
17. Explain GROUP BY with an example.
GROUP BY groups rows that share the same values in specified columns, allowing you to run aggregate functions (COUNT, SUM, AVG) on each group.
-- Total revenue by product category
SELECT
category,
COUNT(*) AS total_orders,
SUM(amount) AS revenue,
AVG(amount) AS avg_order_value
FROM orders o
JOIN products p ON o.product_id = p.id
GROUP BY category
ORDER BY revenue DESC;
18. What is a subquery? What are correlated vs non-correlated subqueries?
A subquery is a query nested inside another query. A non-correlated subquery runs independently and executes once. A correlated subquery references the outer query and executes once per outer row.
-- Non-correlated: runs once
SELECT * FROM employees
WHERE dept_id IN (SELECT id FROM departments WHERE location = 'NYC');
-- Correlated: runs once per employee
SELECT * FROM employees e
WHERE salary > (
SELECT AVG(salary) FROM employees
WHERE dept_id = e.dept_id
);
19. What is a CTE (Common Table Expression)?
A CTE is a temporary named result set defined with the WITH keyword. It makes complex queries readable by breaking them into logical steps. CTEs exist only for the duration of the query.
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
ROUND((revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100, 1) AS growth_pct
FROM monthly_sales
ORDER BY month;
20. What is the difference between a VIEW and a TABLE?
A table stores data physically on disk. A view is a saved query that acts like a virtual table. Views do not store data; they execute the underlying query each time you select from them. Views are useful for simplifying complex queries, enforcing security (exposing only certain columns), and creating reusable query logic.
CREATE VIEW active_customers AS
SELECT c.id, c.name, c.email, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY c.id, c.name, c.email;
-- Use it like a table
SELECT * FROM active_customers WHERE order_count > 5;
21. Write a query to find customers who have never placed an order.
-- Using LEFT JOIN + NULL check
SELECT c.id, c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
-- Using NOT EXISTS
SELECT c.id, c.name, c.email
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
22. What is CASE WHEN and how do you use it?
CASE WHEN is SQL's conditional expression, similar to if-else in other languages. It can be used in SELECT, WHERE, ORDER BY, and even inside aggregate functions.
SELECT
name,
salary,
CASE
WHEN salary >= 120000 THEN 'Senior'
WHEN salary >= 80000 THEN 'Mid'
WHEN salary >= 50000 THEN 'Junior'
ELSE 'Entry'
END AS level
FROM employees;
-- CASE inside an aggregate
SELECT
department,
COUNT(CASE WHEN status = 'active' THEN 1 END) AS active_count,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive_count
FROM employees
GROUP BY department;
23. How do you calculate a running total?
SELECT
order_date,
total_amount,
SUM(total_amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
24. What is the difference between INNER JOIN and LEFT JOIN?
INNER JOIN returns only rows where there is a match in both tables. LEFT JOIN returns all rows from the left table, and for rows with no match in the right table, the right-side columns are filled with NULL.
-- INNER JOIN: only customers with orders
SELECT c.name, o.id AS order_id
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
-- Result: 850 rows
-- LEFT JOIN: all customers, including those without orders
SELECT c.name, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
-- Result: 1000 rows (150 with NULL order_id)
25. How do you handle duplicate rows in a result?
-- Option 1: DISTINCT
SELECT DISTINCT customer_id, product_id FROM orders;
-- Option 2: GROUP BY
SELECT customer_id, product_id, COUNT(*) AS times_ordered
FROM orders
GROUP BY customer_id, product_id;
-- Option 3: ROW_NUMBER to keep only the latest
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY customer_id, product_id
ORDER BY order_date DESC
) AS rn
FROM orders
)
SELECT * FROM ranked WHERE rn = 1;
26. What are aggregate functions? Name five.
Aggregate functions perform a calculation on a set of rows and return a single value. The five most common are:
- COUNT(*) counts rows
- SUM(column) totals numeric values
- AVG(column) calculates the mean
- MIN(column) finds the smallest value
- MAX(column) finds the largest value
SELECT
department,
COUNT(*) AS headcount,
SUM(salary) AS total_payroll,
AVG(salary) AS avg_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees
GROUP BY department;
27. What is a transaction? Explain ACID properties.
A transaction is a sequence of SQL operations executed as a single unit. It either completes fully or not at all. ACID stands for:
- Atomicity: All operations succeed or all are rolled back
- Consistency: The database moves from one valid state to another
- Isolation: Concurrent transactions do not interfere with each other
- Durability: Once committed, data survives crashes
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If both succeed:
COMMIT;
-- If anything fails:
ROLLBACK;
28. Write a query to find the top 3 products by revenue per category.
WITH product_revenue AS (
SELECT
p.category,
p.product_name,
SUM(oi.quantity * oi.unit_price) AS revenue,
RANK() OVER (
PARTITION BY p.category
ORDER BY SUM(oi.quantity * oi.unit_price) DESC
) AS rank
FROM order_items oi
JOIN products p ON oi.product_id = p.id
GROUP BY p.category, p.product_name
)
SELECT category, product_name, revenue
FROM product_revenue
WHERE rank <= 3
ORDER BY category, rank;
29. What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
All three assign a number to each row based on ordering, but they handle ties differently:
- ROW_NUMBER(): Always unique (1, 2, 3, 4). Ties get arbitrary order.
- RANK(): Ties get the same number, then skips (1, 2, 2, 4).
- DENSE_RANK(): Ties get the same number, no skip (1, 2, 2, 3).
SELECT
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
30. Write a query to pivot data (rows to columns).
-- Using CASE + GROUP BY (works in all databases)
SELECT
product_id,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 1 THEN amount ELSE 0 END) AS jan,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 2 THEN amount ELSE 0 END) AS feb,
SUM(CASE WHEN EXTRACT(MONTH FROM order_date) = 3 THEN amount ELSE 0 END) AS mar
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2026
GROUP BY product_id;
Advanced SQL Interview Questions (31-40)
These questions separate strong candidates from average ones. Expect these at senior data roles and FAANG-level interviews.
31. What are window functions? How do they differ from aggregate functions?
Window functions perform calculations across a set of rows related to the current row without collapsing them into a single result. Unlike aggregate functions with GROUP BY, window functions keep all individual rows in the output.
-- Aggregate: collapses to one row per department
SELECT department, AVG(salary) FROM employees GROUP BY department;
-- Window function: keeps all rows, adds department average
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
32. Explain LAG() and LEAD() with a practical example.
LAG() accesses data from a previous row. LEAD() accesses data from a following row. Both are window functions.
-- Month-over-month revenue comparison
SELECT
month,
revenue,
LAG(revenue, 1) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY month))
/ LAG(revenue, 1) OVER (ORDER BY month) * 100, 1
) AS mom_growth_pct,
LEAD(revenue, 1) OVER (ORDER BY month) AS next_month_revenue
FROM monthly_revenue;
33. What is a recursive CTE? Write an example.
A recursive CTE references itself to traverse hierarchical data like org charts, category trees, or graph structures.
-- Build an org chart hierarchy
WITH RECURSIVE org_tree AS (
-- Base case: top-level managers (no manager)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: find direct reports
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT id, name, level
FROM org_tree
ORDER BY level, name;
34. What are indexes? When should you NOT create an index?
An index is a data structure that speeds up data retrieval at the cost of slower writes and extra storage. You should NOT create an index when:
- The table is small (under a few thousand rows)
- The column has very low cardinality (like a boolean or status with 2-3 values)
- The table has heavy write traffic and few reads
- The column is rarely used in WHERE, JOIN, or ORDER BY
-- Create an index on frequently queried columns
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date);
-- Check if your query uses the index
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42 AND order_date >= '2026-01-01';
35. What is the difference between a clustered and non-clustered index?
A clustered index determines the physical order of rows on disk. Each table can have only one clustered index (typically the primary key). A non-clustered index creates a separate structure with pointers to the actual rows. A table can have many non-clustered indexes. Think of a clustered index as the table of contents, and non-clustered indexes as the book's index at the back.
36. Write a query to find gaps in a sequence.
-- Find missing invoice numbers using LEAD
SELECT invoice_number + 1 AS gap_start,
next_num - 1 AS gap_end
FROM (
SELECT invoice_number,
LEAD(invoice_number) OVER (ORDER BY invoice_number) AS next_num
FROM invoices
) t
WHERE next_num - invoice_number > 1;
37. How do you implement pagination efficiently on large tables?
The standard LIMIT/OFFSET approach becomes slow on large tables because the database still scans all skipped rows. Keyset pagination (also called seek method) is faster.
-- Slow: LIMIT/OFFSET (scans 10,000 rows to skip them)
SELECT * FROM products
ORDER BY id
LIMIT 20 OFFSET 10000;
-- Fast: Keyset pagination (jumps directly)
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
38. What is a materialized view?
A materialized view stores the query result physically on disk, unlike a regular view that re-executes the query each time. It is useful for expensive aggregation queries that do not need real-time data. You refresh it manually or on a schedule.
-- Create materialized view
CREATE MATERIALIZED VIEW monthly_summary AS
SELECT
DATE_TRUNC('month', order_date) AS month,
COUNT(*) AS total_orders,
SUM(total_amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
-- Refresh when needed
REFRESH MATERIALIZED VIEW monthly_summary;
39. Explain the difference between OLTP and OLAP.
OLTP (Online Transaction Processing) handles day-to-day operations: inserts, updates, short queries. It uses normalized schemas with many small tables. Examples: PostgreSQL, MySQL for your app's database.
OLAP (Online Analytical Processing) handles complex analytical queries across large datasets. It uses denormalized schemas (star/snowflake) optimized for reads. Examples: BigQuery, Snowflake, Redshift.
In interviews, this often leads to follow-up questions about data warehousing and ETL pipelines.
40. Write a query to calculate a 7-day moving average.
SELECT
order_date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM (
SELECT
order_date,
SUM(total_amount) AS daily_revenue
FROM orders
GROUP BY order_date
) daily
ORDER BY order_date;
SQL Performance & Optimization Questions (41-45)
These questions test whether you can write queries that actually run fast in production.
41. What is an execution plan and how do you read one?
An execution plan shows how the database engine will execute your query: which indexes it uses, how it joins tables, and the estimated cost. Use EXPLAIN (or EXPLAIN ANALYZE for actual timing) to see it.
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= '2026-01-01'
GROUP BY c.name
ORDER BY order_count DESC;
Key things to look for: Seq Scan (full table scan, usually bad on large tables), Index Scan (good), Nested Loop vs Hash Join (depends on data size), and the actual time vs estimated rows.
42. Why is SELECT * considered bad practice?
SELECT * causes problems in production:
- Reads unnecessary columns, wasting I/O and memory
- Prevents the optimizer from using covering indexes
- Breaks application code when columns are added or removed
- Transfers more data over the network
Always specify the exact columns you need.
43. How do you optimize a slow query?
Follow this systematic approach:
- Run
EXPLAIN ANALYZEto see the execution plan - Look for full table scans on large tables
- Add indexes on columns used in WHERE, JOIN, and ORDER BY
- Rewrite correlated subqueries as JOINs
- Replace
SELECT *with specific columns - Avoid functions on indexed columns in WHERE clauses
- Consider partitioning for very large tables
-- Bad: function on indexed column prevents index use
SELECT * FROM orders
WHERE YEAR(order_date) = 2026;
-- Good: range comparison uses the index
SELECT * FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';
44. What is database normalization? Explain 1NF, 2NF, and 3NF.
Normalization is the process of organizing data to reduce redundancy.
- 1NF (First Normal Form): Each column contains atomic values (no arrays or nested data). Each row is unique.
- 2NF (Second Normal Form): Meets 1NF plus every non-key column depends on the entire primary key (no partial dependencies).
- 3NF (Third Normal Form): Meets 2NF plus no non-key column depends on another non-key column (no transitive dependencies).
In practice, most production databases aim for 3NF. OLAP systems intentionally denormalize for query speed.
45. What is table partitioning and when should you use it?
Partitioning splits a large table into smaller physical pieces based on a column value (usually date or ID range). Queries that filter on the partition key only scan relevant partitions instead of the entire table.
-- Range partitioning by date (PostgreSQL)
CREATE TABLE orders (
id SERIAL,
customer_id INT,
total_amount DECIMAL(10,2),
order_date DATE NOT NULL
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2025 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
Use partitioning when a table exceeds millions of rows and queries consistently filter on a specific column.
Database Design Interview Questions (46-50)
These questions test your ability to think beyond queries and design systems that scale.
46. Design a schema for an e-commerce system.
A minimal e-commerce schema needs these core tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10,2) NOT NULL,
category VARCHAR(50),
stock_quantity INT DEFAULT 0
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
total_amount DECIMAL(10,2),
status VARCHAR(20) DEFAULT 'pending',
order_date TIMESTAMP DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL
);
Key design decisions: separate order_items table (many-to-many between orders and products), store unit_price in order_items (price may change later), use status for order lifecycle tracking.
47. What is the difference between a star schema and a snowflake schema?
Both are used in data warehousing. A star schema has a central fact table connected directly to dimension tables (one level). A snowflake schema normalizes dimension tables into sub-dimensions (multiple levels). Star schemas are simpler and faster for queries. Snowflake schemas save storage but require more JOINs.
48. How would you design a schema for a social media "likes" feature?
CREATE TABLE likes (
user_id INT NOT NULL REFERENCES users(id),
post_id INT NOT NULL REFERENCES posts(id),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (user_id, post_id) -- prevents duplicate likes
);
-- Create index for counting likes per post
CREATE INDEX idx_likes_post ON likes(post_id);
-- Get like count for a post
SELECT COUNT(*) FROM likes WHERE post_id = 123;
-- Check if user liked a post
SELECT EXISTS (
SELECT 1 FROM likes WHERE user_id = 42 AND post_id = 123
);
The composite primary key (user_id, post_id) prevents duplicate likes at the database level.
49. What are stored procedures and when should you use them?
Stored procedures are precompiled SQL code blocks stored in the database. They accept parameters, can contain control flow logic, and execute as a single unit. Use them for complex business logic that must run close to the data, multi-step operations that need transaction safety, and frequently executed operations that benefit from precompilation.
-- PostgreSQL function example
CREATE OR REPLACE FUNCTION transfer_funds(
sender_id INT,
receiver_id INT,
amount DECIMAL
) RETURNS VOID AS $$
BEGIN
UPDATE accounts SET balance = balance - amount
WHERE id = sender_id AND balance >= amount;
IF NOT FOUND THEN
RAISE EXCEPTION 'Insufficient funds';
END IF;
UPDATE accounts SET balance = balance + amount
WHERE id = receiver_id;
END;
$$ LANGUAGE plpgsql;
50. How do you handle soft deletes vs hard deletes?
A hard delete removes the row permanently with DELETE. A soft delete marks the row as deleted using a flag column, keeping the data for audit trails or recovery.
-- Soft delete: add a column
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;
-- "Delete" a user
UPDATE users SET deleted_at = NOW() WHERE id = 42;
-- Query only active users
SELECT * FROM users WHERE deleted_at IS NULL;
-- Create a view for convenience
CREATE VIEW active_users AS
SELECT * FROM users WHERE deleted_at IS NULL;
Soft deletes are preferred when you need audit logs, undo functionality, or regulatory compliance (GDPR data retention). Hard deletes are better when storage is a concern or when data must be permanently removed.
How to Practice SQL Interviews
Reading answers is not enough. You need to write queries with your own hands to build the muscle memory that interviews require.
- Use a real database. AI2SQL has a built-in demo database you can query instantly. No setup needed. Write the queries from this guide and verify they work.
- Practice under time pressure. Set a 5-minute timer per question. In real interviews, you will not have unlimited time.
- Explain your thought process out loud. Interviewers want to hear how you break down a problem, not just see the final query. Practice narrating: "First I need to join these two tables, then aggregate by category..."
- Study execution plans. Run
EXPLAIN ANALYZEon your queries. Understand why one approach is faster than another. - Recreate real scenarios. Take a business question ("Which customers spent the most last quarter?") and translate it to SQL. This is exactly what interviews test.
Start practicing now with AI2SQL's built-in database. Describe what you need in plain English and see the SQL generated instantly.
Common Mistakes to Avoid
These mistakes cost candidates in interviews every day. Avoid them.
- Using SELECT * in your answer. Always specify columns. It shows you understand performance and maintainability.
- Forgetting NULL handling. NULL breaks equality checks. Use
IS NULL,COALESCE, orNULLIFwhere appropriate. - Confusing WHERE and HAVING. WHERE filters rows before grouping. HAVING filters groups after aggregation. Mixing them up signals a weak foundation.
- Not using aliases. When joining multiple tables, unaliased column names create confusion. Always alias your tables and computed columns.
- Writing everything in one giant query. Break complex logic into CTEs. It makes your query readable and shows you can think in steps.
- Ignoring edge cases. What if the table is empty? What if there are ties? What about NULL values in JOIN columns? Address these before the interviewer asks.
- Not asking clarifying questions. Before writing SQL, ask: Which database? How large is the table? Are there indexes? What does "most recent" mean? Asking smart questions is a signal of seniority.
- Forgetting ORDER BY. SQL does not guarantee row order without ORDER BY. If the question asks for a "top N" result, always include it.
SQL Interview Questions for Experienced Developers
Once you have 5+ years on the job, panels stop asking what a JOIN is and start probing how you reason about concurrency, query plans, and data at scale. The five questions below come from senior backend, platform, and data-engineering loops in 2026.
1. How would you detect blocked or deadlocked sessions in a production database right now?
Why interviewers ask: they want to see whether you have actually been on call. Naming the system view is the entry ticket; explaining the wait chain is the signal.
-- PostgreSQL: pg_stat_activity + pg_locks join
SELECT
blocked.pid AS blocked_pid,
blocked.usename AS blocked_user,
blocking.pid AS blocking_pid,
blocking.usename AS blocking_user,
blocked.query AS blocked_query,
blocking.query AS blocking_query,
blocked.wait_event_type,
blocked.wait_event
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
-- SQL Server equivalent: sys.dm_exec_requests + sys.dm_tran_locks
-- MySQL 8.0 equivalent: performance_schema.data_lock_waits
The follow-up is always: "How would you resolve it without bouncing the database?" Mention pg_cancel_backend(pid) first (graceful), pg_terminate_backend(pid) only if cancel fails, and never restart unless the lock chain is rooted in an unkillable autovacuum.
2. You have an N+1 lurking in a reporting query. Rewrite it as a single window-function pass.
Why interviewers ask: N+1 is the single most common reason a senior candidate's query gets thrown out. Window functions are the canonical fix and the panel wants to see you reach for them automatically.
-- BAD: scalar subquery per row → O(N*M)
SELECT
o.id,
o.customer_id,
o.total,
(SELECT COUNT(*) FROM orders o2
WHERE o2.customer_id = o.customer_id
AND o2.created_at <= o.created_at) AS order_seq
FROM orders o;
-- GOOD: single sort, single pass
SELECT
id,
customer_id,
total,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY created_at
) AS order_seq
FROM orders;
Senior interviewers will then ask the cost difference. The scalar subquery is roughly O(N²) per partition; the window pass is O(N log N) once and indexable on (customer_id, created_at).
3. Walk an org chart of arbitrary depth using a single query.
Why interviewers ask: recursive CTEs are the standard test for whether you understand iteration inside SQL. The trap is forgetting the termination condition or the cycle guard.
-- PostgreSQL / SQL Server / MySQL 8.0
WITH RECURSIVE org AS (
SELECT id, name, manager_id, 1 AS depth, ARRAY[id] AS path
FROM employees
WHERE manager_id IS NULL -- anchor: the CEO
UNION ALL
SELECT e.id, e.name, e.manager_id, o.depth + 1, o.path || e.id
FROM employees e
JOIN org o ON e.manager_id = o.id
WHERE NOT e.id = ANY(o.path) -- cycle guard
)
SELECT depth, name, path
FROM org
ORDER BY path;
The cycle guard (NOT e.id = ANY(o.path)) is what separates juniors from seniors. Without it, a single bad row in employees sends the recursion to the planner's iteration limit and your query fails in production but passes in dev.
4. A nightly job updates 10M rows and now reads from another transaction see stale data. Diagnose.
Why interviewers ask: they want to hear "isolation level" without prompting. Bonus points for naming the actual phenomenon (non-repeatable read vs phantom).
-- Check what isolation the offending transactions ran at
-- PostgreSQL
SHOW transaction_isolation;
-- or per-transaction:
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... statements ...
COMMIT;
-- SQL Server: SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
-- MySQL InnoDB default is REPEATABLE READ (different semantics from PG!)
The senior answer connects the dots: PostgreSQL's READ COMMITTED sees each statement at a fresh snapshot, so a long report can read mid-update state. REPEATABLE READ pins one snapshot for the whole transaction and makes the report self-consistent at the cost of update conflicts. Mention SERIALIZABLE only if asked — the false-positive serialization failures are why most teams stay on REPEATABLE READ.
5. Your events table is 4 TB and growing. How do you partition it?
Why interviewers ask: partitioning is where seniors are expected to think about access patterns first and DDL second.
-- PostgreSQL 16: declarative range partitioning by month
CREATE TABLE events (
id BIGSERIAL,
user_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
payload JSONB
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2026_05 PARTITION OF events
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
CREATE INDEX ON events_2026_05 (user_id, created_at DESC);
-- Query that benefits from partition pruning
SELECT user_id, COUNT(*)
FROM events
WHERE created_at >= '2026-05-01'
AND created_at < '2026-06-01'
GROUP BY user_id;
The interviewer is testing whether you reach for partition pruning (queries with a created_at filter only scan one child) and whether you know that hash partitioning is rarely worth it for time-series. Closing strong: mention the operational story — detaching old partitions is metadata-only, deleting 4 TB of rows is hours of WAL.
SQL Interview Questions for Data Analysts
Analyst interviews are less about syntax recall and more about translating product or finance questions into SQL the team can read. The five below are taken from analyst loops at marketplace, fintech, and B2B SaaS panels in 2026.
1. Build a weekly cohort retention table from a user_events log.
Interviewer's rubric: can you anchor cohorts to first activity, can you express week diffs without DATE_TRUNC drift, and do you keep cohort size visible?
WITH first_seen AS (
SELECT
user_id,
DATE_TRUNC('week', MIN(event_at)) AS cohort_week
FROM user_events
GROUP BY user_id
),
activity AS (
SELECT
u.user_id,
f.cohort_week,
DATE_TRUNC('week', u.event_at) AS active_week
FROM user_events u
JOIN first_seen f USING (user_id)
)
SELECT
cohort_week,
(active_week - cohort_week) / 7 AS week_offset,
COUNT(DISTINCT user_id) AS active_users
FROM activity
GROUP BY cohort_week, week_offset
ORDER BY cohort_week, week_offset;
Strong candidates also volunteer the cohort-size denominator (week 0 count) so the table can be normalized to a percentage downstream.
2. Compute a 4-step funnel: view → click → add_to_cart → purchase.
Interviewer's rubric: ordered per user (not aggregate counts that double-count), and step-to-step conversion clearly labeled.
WITH steps AS (
SELECT
user_id,
MAX(CASE WHEN event_type = 'view' THEN 1 ELSE 0 END) AS s1,
MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) AS s2,
MAX(CASE WHEN event_type = 'add_to_cart' THEN 1 ELSE 0 END) AS s3,
MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) AS s4
FROM user_events
WHERE event_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
SUM(s1) AS viewed,
SUM(s1 * s2) AS clicked,
SUM(s1 * s2 * s3) AS added,
SUM(s1 * s2 * s3 * s4) AS purchased,
ROUND(100.0 * SUM(s1*s2*s3*s4) / NULLIF(SUM(s1), 0), 2) AS overall_pct
FROM steps;
Bonus points for the NULLIF guard — empty datasets crash unprotected division and that bug always ships to dashboards.
3. Show week-over-week percent change in revenue with the prior week visible.
Interviewer's rubric: use LAG(), do not self-join, and handle the first row gracefully.
WITH weekly AS (
SELECT
DATE_TRUNC('week', order_date) AS week,
SUM(total) AS revenue
FROM orders
GROUP BY 1
)
SELECT
week,
revenue,
LAG(revenue) OVER (ORDER BY week) AS prev_revenue,
ROUND(
100.0 * (revenue - LAG(revenue) OVER (ORDER BY week))
/ NULLIF(LAG(revenue) OVER (ORDER BY week), 0),
2
) AS wow_pct
FROM weekly
ORDER BY week;
If the candidate writes a self-join on week = prev.week + INTERVAL '7 days' instead of LAG(), expect a follow-up about scaling.
4. Top 3 products by revenue within each category.
Interviewer's rubric: reach for ROW_NUMBER / RANK over PARTITION BY. LIMIT 3 here is a fail.
WITH ranked AS (
SELECT
category,
product_id,
SUM(amount) AS revenue,
ROW_NUMBER() OVER (
PARTITION BY category
ORDER BY SUM(amount) DESC
) AS rn
FROM order_items
GROUP BY category, product_id
)
SELECT category, product_id, revenue
FROM ranked
WHERE rn <= 3
ORDER BY category, rn;
Use RANK() instead of ROW_NUMBER() if the prompt says "top 3 including ties"; that wording is the trap.
5. Day-1, Day-7, Day-30 retention from a single signups table.
Interviewer's rubric: set-based, not row-by-row; one query, three columns; date math correct on inclusive boundaries.
SELECT
DATE_TRUNC('day', s.signup_at)::date AS signup_day,
COUNT(DISTINCT s.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN e.event_at::date - s.signup_at::date = 1 THEN s.user_id END) AS d1,
COUNT(DISTINCT CASE
WHEN e.event_at::date - s.signup_at::date = 7 THEN s.user_id END) AS d7,
COUNT(DISTINCT CASE
WHEN e.event_at::date - s.signup_at::date = 30 THEN s.user_id END) AS d30
FROM signups s
LEFT JOIN user_events e USING (user_id)
WHERE s.signup_at >= NOW() - INTERVAL '60 days'
GROUP BY 1
ORDER BY 1;
Senior analyst panels expect you to flag the LEFT JOIN explicitly — an INNER JOIN here would silently drop the cohort denominator for any signup that never returned, breaking every retention calculation downstream.
Common Mistakes That Fail SQL Interviews
These are the patterns that get you flagged "no hire" on the panel debrief, in rough order of how often they show up.
- Using
LIMITto get top-N per group.LIMITis global. The right tool isROW_NUMBER() OVER (PARTITION BY group ORDER BY metric DESC)filtered torn <= N.LIMIT 3afterORDER BY category, revenue DESCreturns the top 3 rows of the whole table, not 3 per category. - Reaching for
INNER JOINwhen the question impliesLEFT JOIN. "Show every customer and their order count" is aLEFT JOINwithCOUNT(o.id)(notCOUNT(*)).INNER JOINdrops customers who never ordered, which is usually the cohort the question cares about. COUNT(*)vsCOUNT(column)on nullable columns.COUNT(*)counts rows;COUNT(col)ignores rows wherecol IS NULL. Ifshipped_atis nullable,COUNT(shipped_at)is "shipped orders" — different from the total. Use whichever you mean and say which.GROUP BYwithout including every non-aggregatedSELECTcolumn. PostgreSQL and SQL Server reject this outright. MySQL withONLY_FULL_GROUP_BYoff accepts it and silently picks an arbitrary row, which is the bug. Always group by every non-aggregated column or wrap it inMAX()/MIN()deliberately.- Scalar subqueries in
SELECTover thousands of rows. A pattern likeSELECT id, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) FROM customers cruns the subquery once per row. Rewrite with aLEFT JOIN ... GROUP BYor a window function. The senior loop will check the plan; the junior loop will not, but the answer is the same. - Forgetting NULL semantics in
WHEREandJOIN.WHERE col != 'x'filters out NULL rows becauseNULL != 'x'is unknown, not true. UseWHERE col IS DISTINCT FROM 'x'(PostgreSQL) orWHERE col != 'x' OR col IS NULLwhen you need NULL rows included. Same trap onNOT IN (subquery)when the subquery contains a NULL — it returns zero rows, always. IN (subquery)when the list is large. Most planners materialize the subquery; switching toEXISTSlets the engine short-circuit on the first match. For million-row anti-joins,NOT EXISTSis also safer thanNOT INbecause of the NULL trap above.- Ignoring index implications when filtering.
WHERE LOWER(email) = 'x@y.com'can not use an index onemail— it needs an index onLOWER(email)or acitextcolumn. Same withWHERE created_at::date = '2026-05-08'versusWHERE created_at >= '2026-05-08' AND created_at < '2026-05-09'. Wrapping the indexed column in a function disables the index.
Frequently Asked Questions
What SQL topics are most commonly asked in interviews?
The most commonly asked SQL interview topics are JOINs (INNER, LEFT, RIGHT, FULL), aggregation with GROUP BY and HAVING, subqueries, window functions (ROW_NUMBER, RANK), indexing, and the difference between WHERE and HAVING. Expect at least one question on each of these in a typical technical interview.
How many SQL questions are typically asked in a technical interview?
Most technical interviews include 3 to 8 SQL questions. Phone screens typically have 2-3 conceptual questions, while on-site or take-home rounds may include 4-6 hands-on query writing exercises. Data engineering roles tend to have more SQL questions than general backend roles.
Should I practice SQL on a real database before my interview?
Yes. Reading answers is not enough. Practice writing queries against a real database to build muscle memory. Tools like AI2SQL provide a built-in demo database where you can run queries instantly without any setup. Aim for at least 20-30 practice queries before your interview.
Do I need to know database-specific syntax for SQL interviews?
Usually not. Most interviews test standard ANSI SQL. However, if the job description mentions a specific database (PostgreSQL, MySQL, SQL Server), it helps to know that dialect's unique features. For example, PostgreSQL's array types, MySQL's GROUP_CONCAT, or SQL Server's TOP instead of LIMIT.
What is the best way to prepare for SQL interviews in 2026?
Start by reviewing core concepts (JOINs, aggregation, subqueries, window functions). Then practice writing queries on a real database. Study execution plans with EXPLAIN to understand performance. Finally, practice explaining your thought process out loud, since interviewers want to hear how you approach problems, not just see the final query.
What is the most common SQL interview trap question?
The Nth highest salary problem — usually phrased "find the second or third highest salary in the employees table". Juniors solve it with ORDER BY ... LIMIT 1 OFFSET 1, which silently fails on ties. The expected answer in 2026 uses DENSE_RANK() OVER (ORDER BY salary DESC) and filters WHERE rnk = N, which handles duplicates correctly. Window functions require MySQL 8.0+ and PostgreSQL 8.4+.
Should I use JOIN or subquery in SQL interviews?
Default to JOIN when you need columns from both tables in the output. Use a subquery (or EXISTS) when you only need to filter on a related condition without surfacing the other table's columns. Modern query planners in PostgreSQL 16 and MySQL 8.0 collapse most subqueries into joins anyway, but the JOIN version is easier to read and easier to defend in an interview.
How long does a SQL technical interview usually take?
Phone screens average 30 to 45 minutes with 2 to 3 SQL questions, while on-site loops budget 60 minutes per round and include 4 to 6 hands-on query tasks. FAANG-tier data engineering loops run 5 rounds totalling 4 to 5 hours, with at least one round dedicated to query optimization and execution plans rather than pure syntax.
What SQL questions are asked in interviews for experienced developers?
Senior loops in 2026 skip syntax recall and probe production reasoning. Expect to: detect a deadlock or blocked session live (pg_stat_activity + pg_blocking_pids), rewrite an N+1 scalar subquery as a single window-function pass, walk a hierarchical org chart with a recursive CTE plus a cycle guard, diagnose stale-read bugs by naming the right isolation level (READ COMMITTED vs REPEATABLE READ), and design a partitioning strategy for a multi-terabyte time-series table. Interviewers care more about the operational follow-up — how do you fix it without bouncing the database — than the initial query.
How can I practice SQL interview questions before an interview?
Reading answers builds zero muscle memory. Open a real database, copy each question from this guide into the editor, and run it. AI2SQL ships a built-in demo database so you can practice joins, window functions, and recursive CTEs without setup. Aim for 25 to 40 queries written by hand before your interview, set a 5-minute timer per question to simulate real pressure, and run EXPLAIN ANALYZE on every answer so you can defend the plan when the interviewer asks why your query is fast.