SQL Practice Exercises With Answers: 50 Beginner to Advanced Problems (2026)
All 50 exercises run on a single ecommerce + HR + events schema, so you build context once and apply it everywhere. The difficulty climbs from Beginner SELECTs through Intermediate aggregations, Advanced window functions and recursive CTEs, and Expert patterns like gaps-and-islands and sessionization. Every answer is folded behind a "Show answer" toggle and is runnable PostgreSQL — solve first, then peek.
The Sample Schema
Every exercise on this page uses the same six tables. Read them once and you'll have all the context you need for the next 50 problems.
customers
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(120) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
country VARCHAR(60),
created_at TIMESTAMP DEFAULT NOW()
);
orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
order_date DATE NOT NULL,
status VARCHAR(30) NOT NULL, -- placed | shipped | delivered | cancelled | refunded
total_amount NUMERIC(10,2) NOT NULL
);
order_items
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id),
product_id INT REFERENCES products(id),
quantity INT NOT NULL,
unit_price NUMERIC(10,2) NOT NULL
);
products
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
category VARCHAR(60),
price NUMERIC(10,2) NOT NULL,
in_stock BOOLEAN DEFAULT true
);
employees (for self-join exercises)
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(120) NOT NULL,
department VARCHAR(60),
salary NUMERIC(10,2) NOT NULL,
hired_at DATE NOT NULL,
manager_id INT REFERENCES employees(id)
);
events (for analyst / cohort exercises)
CREATE TABLE events (
user_id INT NOT NULL,
event_name VARCHAR(60) NOT NULL,
event_date TIMESTAMP NOT NULL
);
Want to actually run these? Spin up a free AI2SQL workspace — the demo database has these tables seeded.
Beginner Exercises (1-15) — SELECT, WHERE, ORDER BY, basic JOINs
The goal of this block: read tables, filter, sort, count, and run your first JOIN. If any of these takes more than 2 minutes, slow down and re-read the schema before continuing.
Exercise 1: List all customers
Prompt: Return every customer's name and email.
Difficulty: Beginner
Show answer
SELECT name, email
FROM customers;
Why this works: A bare SELECT with no WHERE returns every row. Naming the columns explicitly is preferred over SELECT * for readability and stability.
Exercise 2: Customers in Germany
Prompt: Return all columns for customers whose country is Germany.
Difficulty: Beginner
Show answer
SELECT *
FROM customers
WHERE country = 'Germany';
Why this works: Equality against a string literal is the most basic WHERE filter. String literals take single quotes in standard SQL; double quotes are reserved for identifiers such as column and table names.
Exercise 3: Most recent orders first
Prompt: List orders sorted from newest to oldest.
Difficulty: Beginner
Show answer
SELECT id, customer_id, order_date, total_amount
FROM orders
ORDER BY order_date DESC;
Why this works: ORDER BY ... DESC sorts descending; default is ascending. Sorting on a date column is index-friendly when an index exists on order_date.
Exercise 4: Total number of orders
Prompt: Count how many orders exist in total.
Difficulty: Beginner
Show answer
SELECT COUNT(*) AS total_orders
FROM orders;
Why this works: COUNT(*) counts rows including those with NULLs. COUNT(column) ignores NULLs in that column — pick deliberately.
Exercise 5: Total revenue
Prompt: Sum every order's total_amount to get lifetime revenue.
Difficulty: Beginner
Show answer
SELECT SUM(total_amount) AS lifetime_revenue
FROM orders;
Why this works: SUM ignores NULLs automatically, so you don't need COALESCE here unless you want a specific fallback.
Exercise 6: Average order value
Prompt: Return the average total_amount across all orders.
Difficulty: Beginner
Show answer
SELECT AVG(total_amount) AS avg_order_value
FROM orders;
Why this works: AVG returns SUM/COUNT of non-null values. If you want a rounded display, wrap in ROUND(AVG(...), 2).
Exercise 7: Customers with a Gmail address
Prompt: Find customers whose email ends in @gmail.com.
Difficulty: Beginner
Show answer
SELECT name, email
FROM customers
WHERE email LIKE '%@gmail.com';
Why this works: % matches any sequence of characters in LIKE. Anchor the pattern carefully — '%gmail%' would also match gmail.com.fake.example.
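If the match should ignore letter case, one sketch uses PostgreSQL's ILIKE (a Postgres extension, not standard SQL):

```sql
-- Case-insensitive variant. ILIKE is PostgreSQL-specific;
-- a portable alternative is LOWER(email) LIKE '%@gmail.com'.
SELECT name, email
FROM customers
WHERE email ILIKE '%@gmail.com';
```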
Exercise 8: Orders in three statuses
Prompt: Return orders whose status is placed, shipped, or delivered.
Difficulty: Beginner
Show answer
SELECT id, status, total_amount
FROM orders
WHERE status IN ('placed', 'shipped', 'delivered');
Why this works: IN is shorthand for multiple OR equality checks and reads more cleanly. Most planners optimize it identically.
Exercise 9: Orders in Q1 2026
Prompt: Return orders placed between January 1 and March 31, 2026.
Difficulty: Beginner
Show answer
SELECT id, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31';
Why this works: BETWEEN is inclusive on both ends. For timestamps, prefer order_date >= '2026-01-01' AND order_date < '2026-04-01' to avoid end-of-day off-by-one issues.
Exercise 10: Orders with customer names
Prompt: List each order's id, date, and the placing customer's name.
Difficulty: Beginner
Show answer
SELECT o.id, o.order_date, c.name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;
Why this works: INNER JOIN returns rows only where both sides match. Aliasing tables (o, c) keeps the column references readable.
Exercise 11: Customers who never ordered
Prompt: Return every customer that has zero orders.
Difficulty: Beginner
Show answer
SELECT c.id, c.name, c.email
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL;
Why this works: A LEFT JOIN keeps every customer; rows that didn't match get NULL on the right side. Filtering WHERE o.id IS NULL isolates the non-matchers. This is the canonical "anti-join" pattern.
Exercise 12: Distinct product categories
Prompt: Return the unique list of categories sold.
Difficulty: Beginner
Show answer
SELECT DISTINCT category
FROM products
ORDER BY category;
Why this works: DISTINCT deduplicates across the selected columns. Sorting makes the output easier to scan.
Exercise 13: Page 3 of customers (20 per page)
Prompt: Return rows 41–60 of customers, ordered by created_at descending.
Difficulty: Beginner
Show answer
SELECT id, name, created_at
FROM customers
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;
Why this works: LIMIT caps row count, OFFSET skips rows. For large pagination prefer keyset pagination (WHERE created_at < :cursor) since OFFSET scans skipped rows.
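The keyset alternative mentioned above might look like the sketch below; :last_created_at and :last_id are placeholder parameters holding the values from the final row of the previous page:

```sql
-- Keyset (cursor) pagination sketch: instead of scanning and skipping
-- 40 rows, resume directly after the last row the client saw. The
-- row-value comparison uses id as a tiebreaker so equal timestamps
-- still paginate deterministically.
SELECT id, name, created_at
FROM customers
WHERE (created_at, id) < (:last_created_at, :last_id)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

With an index on (created_at, id), this stays fast no matter how deep the page number goes.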
Exercise 14: Customers with no recorded country
Prompt: Find customers whose country is NULL.
Difficulty: Beginner
Show answer
SELECT id, name, email
FROM customers
WHERE country IS NULL;
Why this works: NULL is not equal to anything — country = NULL always returns unknown and matches no rows. Use IS NULL / IS NOT NULL.
Exercise 15: Rename columns for a report
Prompt: Return order id as order_no and total_amount as amount_usd.
Difficulty: Beginner
Show answer
SELECT
id AS order_no,
total_amount AS amount_usd
FROM orders;
Why this works: AS assigns column aliases that flow into the result set, JSON exports, and downstream BI tools. The AS keyword itself is optional but improves readability.
Intermediate Exercises (16-30) — GROUP BY, HAVING, subqueries, multi-table JOINs
This block introduces aggregation, group filtering, subqueries, and stitching three or more tables together. These are the patterns that fill 70% of dashboards and ad-hoc analyst tickets.
Exercise 16: Revenue by product category
Prompt: For each category, return total revenue across order_items.
Difficulty: Intermediate
Show answer
SELECT
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.id = oi.product_id
GROUP BY p.category
ORDER BY revenue DESC;
Why this works: Compute the per-line revenue inside SUM, then group by category. Joining first lets us aggregate across the right grouping key.
Exercise 17: Categories with more than 10 products
Prompt: Show categories that contain more than 10 distinct products.
Difficulty: Intermediate
Show answer
SELECT category, COUNT(*) AS num_products
FROM products
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY num_products DESC;
Why this works: WHERE filters rows before aggregation; HAVING filters groups after. You cannot put COUNT(*) > 10 in WHERE.
Exercise 18: Top 10 customers by lifetime spend
Prompt: Return the 10 customers with the highest total order amount.
Difficulty: Intermediate
Show answer
SELECT
c.id,
c.name,
SUM(o.total_amount) AS lifetime_spend
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY lifetime_spend DESC
LIMIT 10;
Why this works: Group by the primary key plus any non-aggregated columns (PostgreSQL allows just the PK; other dialects require all selected columns). LIMIT after ORDER BY picks the top N.
Exercise 19: Orders per month
Prompt: Count orders grouped by calendar month.
Difficulty: Intermediate
Show answer
SELECT
DATE_TRUNC('month', order_date) AS order_month,
COUNT(*) AS order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY order_month;
Why this works: DATE_TRUNC rounds a date down to a unit. MySQL equivalent: DATE_FORMAT(order_date, '%Y-%m-01'). SQL Server: DATEFROMPARTS(YEAR(order_date), MONTH(order_date), 1).
Exercise 20: Customers who spent more than the average
Prompt: Return customers whose lifetime spend exceeds the global average lifetime spend.
Difficulty: Intermediate
Show answer
SELECT
c.id,
c.name,
SUM(o.total_amount) AS lifetime_spend
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.total_amount) > (
SELECT AVG(spend)
FROM (
SELECT SUM(total_amount) AS spend
FROM orders
GROUP BY customer_id
) per_customer
);
Why this works: The inner subquery computes per-customer spend, then the next layer averages those numbers. HAVING compares each group's sum to the constant scalar.
Exercise 21: Products that have been ordered at least once
Prompt: Return products with at least one matching order_items row.
Difficulty: Intermediate
Show answer
SELECT p.id, p.name
FROM products p
WHERE EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.id
);
Why this works: EXISTS short-circuits at the first match per outer row, which beats IN for big subqueries. The SELECT 1 is convention — the value is ignored.
Exercise 22: Combined customers and employees by name
Prompt: Return a single column of names from both customers and employees.
Difficulty: Intermediate
Show answer
SELECT name FROM customers
UNION
SELECT name FROM employees
ORDER BY name;
Why this works: UNION deduplicates; UNION ALL preserves duplicates and is faster when you know there are no overlaps. Both sides must have the same column count and compatible types.
Exercise 23: Customers in both orders and events tables
Prompt: Find user IDs that appear in both orders.customer_id and events.user_id.
Difficulty: Intermediate
Show answer
-- PostgreSQL / SQL Server
SELECT customer_id AS user_id FROM orders
INTERSECT
SELECT user_id FROM events;
-- Customers who exist in events but never placed an order:
SELECT user_id FROM events
EXCEPT
SELECT customer_id FROM orders;
Why this works: INTERSECT returns rows in both queries, EXCEPT (MINUS in Oracle) returns rows in the first but not the second. MySQL versions before 8.0.31 do not support these — use semi/anti-joins instead.
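On engines without INTERSECT/EXCEPT, the same two results can be sketched as semi- and anti-joins with EXISTS (one subtlety: INTERSECT treats two NULLs as equal while = does not, which rarely matters for key columns):

```sql
-- Semi-join: user IDs present in both tables (INTERSECT equivalent)
SELECT DISTINCT e.user_id
FROM events e
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = e.user_id
);

-- Anti-join: users in events who never ordered (EXCEPT equivalent)
SELECT DISTINCT e.user_id
FROM events e
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = e.user_id
);
```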
Exercise 24: Customer + order + product report
Prompt: For every order line, show customer name, order date, product name, and line revenue.
Difficulty: Intermediate
Show answer
SELECT
c.name AS customer_name,
o.order_date,
p.name AS product_name,
oi.quantity * oi.unit_price AS line_revenue
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
ORDER BY o.order_date DESC;
Why this works: A 4-table star pattern: orders is the fact table; customers and products are dimensions; order_items is the bridge. Always join through the declared foreign keys.
Exercise 25: Employees and their managers
Prompt: List each employee with the name of their direct manager.
Difficulty: Intermediate
Show answer
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id
ORDER BY e.name;
Why this works: A self-join queries the same table twice with different aliases. LEFT JOIN keeps top-of-hierarchy employees whose manager_id is NULL.
Exercise 26: Bucket orders into size tiers
Prompt: Label each order as small (< $50), medium ($50–$200), or large (> $200).
Difficulty: Intermediate
Show answer
SELECT
id,
total_amount,
CASE
WHEN total_amount < 50 THEN 'small'
WHEN total_amount <= 200 THEN 'medium'
ELSE 'large'
END AS size_tier
FROM orders;
Why this works: CASE expressions evaluate top-to-bottom and return the first matching result. The ELSE is optional but recommended — without it, unmatched rows return NULL.
Exercise 27: Coalesce missing country to 'Unknown'
Prompt: Return customers with a non-null display country, defaulting to 'Unknown'.
Difficulty: Intermediate
Show answer
SELECT
name,
COALESCE(country, 'Unknown') AS display_country
FROM customers;
Why this works: COALESCE returns the first non-null argument. Cleaner than CASE WHEN col IS NULL THEN ... and works across dialects.
Exercise 28: Conditional counts with FILTER
Prompt: For each customer, count their total orders and their cancelled orders in one row.
Difficulty: Intermediate
Show answer
-- PostgreSQL (FILTER clause)
SELECT
customer_id,
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE status = 'cancelled') AS cancelled_orders
FROM orders
GROUP BY customer_id;
-- Cross-dialect equivalent (CASE inside SUM)
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled_orders
FROM orders
GROUP BY customer_id;
Why this works: FILTER applies a per-aggregate predicate without splitting into multiple queries. Falls back to SUM(CASE ...) in MySQL and SQL Server.
Exercise 29: Every category × every country combination
Prompt: Generate the full grid of product categories crossed with customer countries (for a coverage matrix).
Difficulty: Intermediate
Show answer
SELECT DISTINCT
p.category,
c.country
FROM products p
CROSS JOIN customers c
WHERE c.country IS NOT NULL
ORDER BY p.category, c.country;
Why this works: CROSS JOIN returns the cartesian product. Use it deliberately for grids and calendar tables; on big tables it explodes row counts (a×b).
Exercise 30: Subquery in FROM — per-customer averages
Prompt: Return the average across customers of "average order value per customer".
Difficulty: Intermediate
Show answer
SELECT AVG(per_customer_avg) AS avg_of_avgs
FROM (
SELECT customer_id, AVG(total_amount) AS per_customer_avg
FROM orders
GROUP BY customer_id
) t;
Why this works: A derived table (subquery in FROM) lets you aggregate twice. AVG(total_amount) directly would be a row-weighted average, not customer-weighted.
Stuck on a query? Describe what you need in plain English — AI2SQL writes the SQL for you and you can run it instantly against the demo schema above.
Advanced Exercises (31-42) — window functions, CTEs, recursive queries
Window functions, CTEs, and recursion are where SQL goes from "list of records" to "real analytics engine." Plan to spend more time per exercise here — thinking through the partition and frame is half the work.
Exercise 31: Top 3 most recent orders per customer
Prompt: For every customer, return their three most recent orders.
Difficulty: Advanced
Show answer
WITH ranked AS (
SELECT
o.*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS rn
FROM orders o
)
SELECT id, customer_id, order_date, total_amount
FROM ranked
WHERE rn <= 3
ORDER BY customer_id, order_date DESC;
Why this works: ROW_NUMBER with PARTITION BY resets per customer. You must wrap in a CTE/subquery because window functions are evaluated after WHERE.
Exercise 32: RANK vs DENSE_RANK on tied salaries
Prompt: Show employees with their RANK and DENSE_RANK by salary in descending order.
Difficulty: Advanced
Show answer
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS r,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
FROM employees;
Why this works: Both functions assign the same value to ties. RANK leaves a gap after a tie (1, 2, 2, 4), DENSE_RANK does not (1, 2, 2, 3). Use DENSE_RANK when "top 3 distinct salaries" is the goal.
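The "top 3 distinct salaries" use case would then be a sketch like this:

```sql
-- Everyone earning one of the three highest distinct salaries.
-- With RANK instead, a tie at the top could leave rank 3 unassigned
-- and silently drop a salary band from the result.
WITH ranked AS (
  SELECT
    name,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dr
  FROM employees
)
SELECT name, salary
FROM ranked
WHERE dr <= 3;
```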
Exercise 33: Each employee's salary as percent of department total
Prompt: For every employee, compute what percentage of their department's payroll they represent.
Difficulty: Advanced
Show answer
SELECT
name,
department,
salary,
ROUND(
100.0 * salary / SUM(salary) OVER (PARTITION BY department),
2
) AS pct_of_dept
FROM employees;
Why this works: SUM(...) OVER (PARTITION BY department) returns the department total on every row without collapsing — perfect for share-of-total calculations.
Exercise 34: Running total of daily revenue
Prompt: For each day, return that day's revenue and the cumulative revenue to date.
Difficulty: Advanced
Show answer
WITH daily AS (
SELECT
order_date,
SUM(total_amount) AS daily_revenue
FROM orders
GROUP BY order_date
)
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM daily
ORDER BY order_date;
Why this works: The explicit frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW tells the engine exactly which rows to sum. Without it, the default frame is the same here, but being explicit avoids surprises with ties.
Exercise 35: 7-day moving average of revenue
Prompt: Compute a trailing 7-day average of daily revenue (gap-safe).
Difficulty: Advanced
Show answer
WITH daily AS (
SELECT order_date, SUM(total_amount) AS daily_revenue
FROM orders
GROUP BY order_date
)
SELECT
order_date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY order_date
RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily
ORDER BY order_date;
Why this works: RANGE uses value-based windowing, so missing days don't shift the window. ROWS would average "the last 7 rows," which lies if days are missing.
Exercise 36: Month-over-month change with LAG
Prompt: For each month, show its revenue and the change versus the previous month.
Difficulty: Advanced
Show answer
WITH monthly AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
ROUND(
100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0),
2
) AS mom_pct
FROM monthly
ORDER BY month;
Why this works: LAG returns the previous row's value within the window. Wrap the denominator in NULLIF(..., 0) to avoid divide-by-zero on the first row.
Exercise 37: First and last order amount per customer
Prompt: For each customer, return their first ever order amount and their most recent order amount.
Difficulty: Advanced
Show answer
SELECT DISTINCT
customer_id,
FIRST_VALUE(total_amount) OVER w AS first_order_amount,
LAST_VALUE(total_amount) OVER w AS last_order_amount
FROM orders
WINDOW w AS (
PARTITION BY customer_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
Why this works: LAST_VALUE with the default frame stops at the current row, which would just echo total_amount. The explicit UNBOUNDED FOLLOWING frame fixes that. WINDOW aliases keep the SQL DRY.
Exercise 38: Bucket customers into spending quartiles
Prompt: Tag each customer with their lifetime-spend quartile (1 = top 25%).
Difficulty: Advanced
Show answer
WITH per_customer AS (
SELECT customer_id, SUM(total_amount) AS lifetime_spend
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
lifetime_spend,
NTILE(4) OVER (ORDER BY lifetime_spend DESC) AS revenue_quartile
FROM per_customer;
Why this works: NTILE(n) splits an ordered set into n roughly equal buckets. Use 10 for deciles, 100 for percentiles. Much simpler than a self-join with manual thresholds.
Exercise 39: Recursive CTE — org chart with reporting depth
Prompt: Walk the employees.manager_id chain and tag each employee with their depth from the CEO (top of chain = 0).
Difficulty: Advanced
Show answer
WITH RECURSIVE chain AS (
-- Anchor: top-level employees (no manager)
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recurse: join children to their parent in the CTE
SELECT e.id, e.name, e.manager_id, c.depth + 1
FROM employees e
JOIN chain c ON c.id = e.manager_id
)
SELECT id, name, depth
FROM chain
ORDER BY depth, name;
Why this works: A recursive CTE has an anchor query and a recursive query joined by UNION ALL. The recursive part keeps firing until it returns no new rows. Always keep a termination condition in mind: cyclic data causes an infinite loop.
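If you can't trust manager_id to be acyclic, one defensive sketch caps the recursion depth (the 20-level bound here is an arbitrary assumption about plausible org-chart height; PostgreSQL 14+ can also detect loops with the CYCLE clause):

```sql
-- Same traversal as above, with a hard depth limit so a cycle in
-- manager_id terminates instead of looping forever.
WITH RECURSIVE chain AS (
  SELECT id, name, manager_id, 0 AS depth
  FROM employees
  WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, c.depth + 1
  FROM employees e
  JOIN chain c ON c.id = e.manager_id
  WHERE c.depth < 20   -- stop condition independent of the data
)
SELECT id, name, depth
FROM chain
ORDER BY depth, name;
```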
Exercise 40: Recursive CTE — date series for the last 30 days
Prompt: Generate a row for every day in the last 30 days, even days with no orders, so you can left-join orders for a gap-free chart.
Difficulty: Advanced
Show answer
-- PostgreSQL: built-in generate_series is simpler
SELECT generate_series(
CURRENT_DATE - INTERVAL '29 days',
CURRENT_DATE,
INTERVAL '1 day'
)::date AS day;
-- Portable recursive CTE version
WITH RECURSIVE days(day) AS (
SELECT CURRENT_DATE - INTERVAL '29 days'
UNION ALL
SELECT day + INTERVAL '1 day'
FROM days
WHERE day < CURRENT_DATE
)
SELECT day FROM days;
Why this works: A recursive CTE is the broadly portable way to build a calendar table on the fly, though the date arithmetic itself still varies by dialect (SQL Server uses DATEADD, for example). PostgreSQL's generate_series is shorter when available.
Exercise 41: Multiple chained CTEs — active vs revenue users
Prompt: For the last 30 days, show users who logged in at least 3 times and their total spend in the same window.
Difficulty: Advanced
Show answer
WITH active_users AS (
SELECT user_id, COUNT(*) AS logins
FROM events
WHERE event_name = 'login'
AND event_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
HAVING COUNT(*) >= 3
),
recent_spend AS (
SELECT customer_id, SUM(total_amount) AS spend_30d
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
)
SELECT
a.user_id,
a.logins,
COALESCE(r.spend_30d, 0) AS spend_30d
FROM active_users a
LEFT JOIN recent_spend r ON r.customer_id = a.user_id
ORDER BY spend_30d DESC;
Why this works: Chaining CTEs lets you name intermediate result sets, which beats nested subqueries for readability. The optimizer treats them like inline views in modern PostgreSQL.
Exercise 42: Cohort retention — week 0 vs week 1 returns
Prompt: For each signup-week cohort, return how many users came back in the following week.
Difficulty: Advanced
Show answer
WITH signup_week AS (
SELECT
user_id,
DATE_TRUNC('week', MIN(event_date))::date AS cohort_week
FROM events
WHERE event_name = 'signup'
GROUP BY user_id
),
return_week AS (
SELECT DISTINCT
s.user_id,
s.cohort_week,
DATE_TRUNC('week', e.event_date)::date AS active_week
FROM signup_week s
JOIN events e ON e.user_id = s.user_id
WHERE e.event_date >= s.cohort_week + INTERVAL '7 days'
AND e.event_date < s.cohort_week + INTERVAL '14 days'
)
SELECT
s.cohort_week,
COUNT(DISTINCT s.user_id) AS cohort_size,
COUNT(DISTINCT r.user_id) AS week1_returners,
ROUND(
100.0 * COUNT(DISTINCT r.user_id) / NULLIF(COUNT(DISTINCT s.user_id), 0),
2
) AS week1_retention_pct
FROM signup_week s
LEFT JOIN return_week r
ON r.user_id = s.user_id
AND r.cohort_week = s.cohort_week
GROUP BY s.cohort_week
ORDER BY s.cohort_week;
Why this works: Cohort math is two date keys: cohort week (when the user signed up) and activity week. Bucket signups, then probe activity in week+1 with a date-range join.
Expert Exercises (43-50) — pivot, gaps & islands, percentiles, analytics
The hardest patterns in real analytics work. Each exercise here is something you'll see in a senior interview or a production dashboard at least once a quarter.
Exercise 43: Pivot — revenue by category as columns
Prompt: Build a wide table with one row per month and one column per category (electronics, books, clothing).
Difficulty: Expert
Show answer
SELECT
DATE_TRUNC('month', o.order_date)::date AS month,
SUM(CASE WHEN p.category = 'electronics'
THEN oi.quantity * oi.unit_price ELSE 0 END) AS electronics_rev,
SUM(CASE WHEN p.category = 'books'
THEN oi.quantity * oi.unit_price ELSE 0 END) AS books_rev,
SUM(CASE WHEN p.category = 'clothing'
THEN oi.quantity * oi.unit_price ELSE 0 END) AS clothing_rev
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
GROUP BY DATE_TRUNC('month', o.order_date)
ORDER BY month;
Why this works: Conditional aggregation pivots rows into columns without a dialect-specific PIVOT clause. SQL Server has a native PIVOT, PostgreSQL has the tablefunc extension's crosstab, but CASE-inside-SUM is portable.
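For reference, the SQL Server native form mentioned above might be sketched like this (T-SQL only; the bracketed column names are assumed to match the category values exactly):

```sql
-- T-SQL PIVOT sketch. The subquery flattens each order line to
-- (month, category, line_revenue); PIVOT then spreads the listed
-- category values into columns.
SELECT month, [electronics], [books], [clothing]
FROM (
  SELECT
    DATEFROMPARTS(YEAR(o.order_date), MONTH(o.order_date), 1) AS month,
    p.category,
    oi.quantity * oi.unit_price AS line_revenue
  FROM orders o
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p ON p.id = oi.product_id
) AS src
PIVOT (
  SUM(line_revenue) FOR category IN ([electronics], [books], [clothing])
) AS pvt
ORDER BY month;
```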
Exercise 44: Unpivot — turn category columns back into rows
Prompt: Given a wide products dimension with q1_sales, q2_sales, q3_sales, q4_sales, reshape into rows with quarter + sales.
Difficulty: Expert
Show answer
-- Portable UNION ALL approach
SELECT product_id, 'Q1' AS quarter, q1_sales AS sales FROM product_sales_wide
UNION ALL
SELECT product_id, 'Q2', q2_sales FROM product_sales_wide
UNION ALL
SELECT product_id, 'Q3', q3_sales FROM product_sales_wide
UNION ALL
SELECT product_id, 'Q4', q4_sales FROM product_sales_wide;
-- PostgreSQL shortcut: VALUES + LATERAL
SELECT s.product_id, t.quarter, t.sales
FROM product_sales_wide s,
LATERAL (VALUES
('Q1', s.q1_sales),
('Q2', s.q2_sales),
('Q3', s.q3_sales),
('Q4', s.q4_sales)
) AS t(quarter, sales);
Why this works: UNION ALL is the universal unpivot; LATERAL VALUES in PostgreSQL keeps it on a single source pass. SQL Server has a native UNPIVOT operator.
Exercise 45: Gaps and islands — consecutive active days
Prompt: For each user, find runs of consecutive days they had at least one event. Return the start and end of each run.
Difficulty: Expert
Show answer
WITH user_days AS (
SELECT DISTINCT user_id, event_date::date AS day
FROM events
),
flagged AS (
SELECT
user_id,
day,
day - (ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day))::int AS island
FROM user_days
)
SELECT
user_id,
MIN(day) AS run_start,
MAX(day) AS run_end,
COUNT(*) AS run_length
FROM flagged
GROUP BY user_id, island
ORDER BY user_id, run_start;
Why this works: The classic gaps-and-islands trick: subtract a row number from the date. Consecutive days have the same offset; gaps create new offsets, which become group keys.
Exercise 46: Sessionize events with a 30-minute gap rule
Prompt: Group events into sessions where any gap longer than 30 minutes starts a new session.
Difficulty: Expert
Show answer
WITH gaps AS (
SELECT
user_id,
event_name,
event_date,
CASE
WHEN event_date - LAG(event_date) OVER (
PARTITION BY user_id ORDER BY event_date
) > INTERVAL '30 minutes'
OR LAG(event_date) OVER (
PARTITION BY user_id ORDER BY event_date
) IS NULL
THEN 1 ELSE 0
END AS new_session
FROM events
),
sessionized AS (
SELECT
user_id,
event_name,
event_date,
SUM(new_session) OVER (
PARTITION BY user_id ORDER BY event_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM gaps
)
SELECT
user_id,
session_id,
MIN(event_date) AS session_start,
MAX(event_date) AS session_end,
COUNT(*) AS event_count
FROM sessionized
GROUP BY user_id, session_id
ORDER BY user_id, session_start;
Why this works: Mark every "session-starting" event with 1, then take a running sum — each new 1 increments a stable session id per user. Standard analytics primitive.
Exercise 47: Median order value per country
Prompt: Compute the median order value within each customer country.
Difficulty: Expert
Show answer
-- PostgreSQL / Oracle / SQL Server (modern): PERCENTILE_CONT
SELECT
c.country,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY o.total_amount) AS median_order_value
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.country
ORDER BY median_order_value DESC;
-- MySQL fallback: average of middle row(s) using window functions
WITH ranked AS (
SELECT
c.country,
o.total_amount,
ROW_NUMBER() OVER (PARTITION BY c.country ORDER BY o.total_amount) AS rn,
COUNT(*) OVER (PARTITION BY c.country) AS cnt
FROM customers c JOIN orders o ON o.customer_id = c.id
)
SELECT country, AVG(total_amount) AS median_order_value
FROM ranked
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY country;
Why this works: PERCENTILE_CONT is the standard median (interpolated). For older MySQL, manually pick the middle row(s) by row number and average them.
Exercise 48: Funnel conversion — signup → first_query → checkout
Prompt: For each user who signed up, return whether they hit first_query and whether they hit checkout after that.
Difficulty: Expert
Show answer
WITH first_event AS (
SELECT
user_id,
MIN(event_date) FILTER (WHERE event_name = 'signup') AS signup_at,
MIN(event_date) FILTER (WHERE event_name = 'first_query') AS first_query_at,
MIN(event_date) FILTER (WHERE event_name = 'checkout') AS checkout_at
FROM events
GROUP BY user_id
)
SELECT
COUNT(*) FILTER (WHERE signup_at IS NOT NULL) AS signups,
COUNT(*) FILTER (WHERE first_query_at >= signup_at) AS reached_first_query,
COUNT(*) FILTER (WHERE checkout_at >= first_query_at) AS reached_checkout,
ROUND(100.0 *
COUNT(*) FILTER (WHERE checkout_at >= first_query_at)
/ NULLIF(COUNT(*) FILTER (WHERE signup_at IS NOT NULL), 0), 2) AS overall_conversion_pct
FROM first_event;
Why this works: Pre-compute the first timestamp of each step per user with MIN ... FILTER, then count step-to-step adherence with the ordering constraint baked in. Funnels are easier when each user becomes one row.
Exercise 49: Year-over-year same-period revenue
Prompt: For each month in 2026, show that month's revenue, the same month's revenue in 2025, and the YoY % change.
Difficulty: Expert
Show answer
WITH monthly AS (
SELECT
EXTRACT(YEAR FROM order_date)::int AS yr,
EXTRACT(MONTH FROM order_date)::int AS mo,
SUM(total_amount) AS revenue
FROM orders
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
)
SELECT
cur.mo AS month,
cur.revenue AS revenue_2026,
prv.revenue AS revenue_2025,
ROUND(
100.0 * (cur.revenue - prv.revenue) / NULLIF(prv.revenue, 0),
2
) AS yoy_pct
FROM monthly cur
LEFT JOIN monthly prv
ON prv.mo = cur.mo
AND prv.yr = cur.yr - 1
WHERE cur.yr = 2026
ORDER BY cur.mo;
Why this works: Self-join the monthly aggregate to itself, lining up by mo and shifting the year. Wrap the denominator in NULLIF to keep months without a 2025 baseline from blowing up.
Exercise 50: Cumulative distinct customer count over time
Prompt: For each day, return the cumulative number of unique customers who have placed at least one order up to and including that day.
Difficulty: Expert
Show answer
WITH first_seen AS (
-- The day each customer placed their first order
SELECT customer_id, MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id
),
daily_new AS (
SELECT first_order_date AS day, COUNT(*) AS new_customers
FROM first_seen
GROUP BY first_order_date
)
SELECT
day,
new_customers,
SUM(new_customers) OVER (
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_unique_customers
FROM daily_new
ORDER BY day;
Why this works: Cumulative COUNT DISTINCT is not a standard window function. Trick: per customer, find the day they were first seen, count those per day, then take a running sum. The cumulative sum equals the cumulative distinct count.
How to Practice SQL Effectively
1. Solve before you peek. Spend at least 5 minutes attempting each exercise before opening the answer. The struggle is what builds intuition — reading a clean answer feels like understanding but doesn't transfer to the next problem.
2. Rewrite the answer in your own words. Once you've seen the solution, close it and recreate it from a blank editor. If you can't, you didn't actually internalize it — you recognized it. Recognition fades; reconstruction sticks.
3. Vary the prompt slightly. If the exercise asked for the top 3 per group, redo it for top 5 per group, then top 1 with ties allowed (DENSE_RANK). Tiny variations expose the parts you actually understand.
4. Predict the result before you run. Before hitting "execute," estimate how many rows you'll get and what the first row should look like. If your prediction is wrong, you misread the schema — that's the lesson, not the syntax.
5. Explain the query out loud. The rubber-duck method works for SQL too. Read the query top to bottom describing what each clause does. Window functions and recursive CTEs especially benefit from being verbalized — if you stumble on the explanation, you've found a weak spot.
Frequently Asked Questions
Where can I practice SQL with real data?
The fastest path is a hosted workspace with a pre-seeded demo database — AI2SQL gives you customers, orders, products, and event tables out of the box, so you can paste any exercise above and hit run. Alternatives include local PostgreSQL or SQLite with a public dataset (Chinook, Northwind, IMDB), DB-Fiddle, or SQLZoo. The key is to run queries against a real engine, not just stare at example output, because dialect quirks and NULL handling only show up when you execute.
How long does it take to get good at SQL?
Most working data analysts become productive in 2–4 weeks of daily practice. SELECT, WHERE, GROUP BY, and basic JOINs cover roughly 80% of day-to-day queries and can be learned in a weekend. Window functions, CTEs, and recursive queries are where the curve gets steep — plan on another 1–2 months of regular use before they feel natural. Working through the 50 exercises on this page in order, then rewriting your favorite ones from memory, is a tested 4-week path.
What's the best way to practice SQL JOINs?
Start with INNER JOIN on a 2-table relationship, then deliberately switch to LEFT JOIN and ask which rows are now NULL. The mental model that sticks is: INNER returns intersection, LEFT keeps every left-side row whether or not the right matches. Practice the "find rows in A with no match in B" pattern (LEFT JOIN ... WHERE right.id IS NULL) — it appears in dozens of real reports (inactive users, products never sold, missing translations). Exercises 10, 11, and 25 on this page drill exactly that.
Are these exercises good for SQL interviews?
Yes — the intermediate, advanced, and expert blocks (exercises 16–50) closely mirror what FAANG, fintech, and analytics-heavy companies ask. Top-N per group, gaps and islands, sessionization, cohort retention, running totals, and median calculation are interview staples. For full interview prep, pair this page with the SQL Interview Questions guide and time yourself: 5 minutes for beginner, 10 for intermediate, 15 for advanced. If you can hit those numbers without peeking, you're interview-ready.
How is this different from SQLBolt or LeetCode SQL?
SQLBolt is excellent for first-principles teaching but stops before window functions and recursive CTEs. LeetCode SQL focuses on interview puzzles, which often use contrived schemas. This page uses one consistent ecommerce + HR + events schema across all 50 problems, which is closer to how real work flows — you build context on the data once and apply it everywhere. It also covers PostgreSQL-first syntax with dialect notes, where LeetCode skews MySQL.
Can AI2SQL generate similar exercises for me?
Yes. Describe a topic in plain English (e.g. "give me 5 window function exercises on monthly cohort retention") and AI2SQL generates the prompts, runnable answers, and an explanation of why each one works. You can also point it at your own schema — paste your CREATE TABLE statements and it will tailor exercises to your actual columns, which is the fastest way to onboard onto a new database at work.