SQL Tutorial

SQL Subqueries Explained: Complete Guide with Examples (2026)

A comprehensive guide to SQL subqueries covering scalar subqueries, correlated subqueries, EXISTS, IN, NOT EXISTS, subqueries in SELECT/FROM/WHERE clauses, and when to choose subqueries over JOINs and CTEs.

Mar 23, 2026 19 min read

What Is a Subquery?

A subquery is a complete SQL query nested inside another query. The inner query runs and produces a result, which the outer query then uses. You can put subqueries in the SELECT clause, the FROM clause, the WHERE clause, and the HAVING clause.

Think of it like asking a question that depends on the answer to another question. "Show me employees who earn more than the average salary" requires two steps: first, calculate the average salary, then find employees above it. A subquery does both in one statement.

-- The subquery calculates the average
-- The outer query uses it as a filter
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The subquery (SELECT AVG(salary) FROM employees) runs first and returns a single value (say, 85000). The outer query then becomes WHERE salary > 85000 and returns all employees earning above that.

Here is the sample data we will use:

CREATE TABLE departments (
    id   INT PRIMARY KEY,
    name VARCHAR(50)
);

CREATE TABLE employees (
    id            INT PRIMARY KEY,
    name          VARCHAR(50),
    department_id INT REFERENCES departments(id),
    salary        DECIMAL(10,2),
    hire_date     DATE
);

INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Marketing'), (3, 'Sales'), (4, 'HR');

INSERT INTO employees VALUES
(1,  'Alice',   1, 120000, '2022-03-15'),
(2,  'Bob',     1,  75000, '2024-01-10'),
(3,  'Charlie', 1, 115000, '2021-07-22'),
(4,  'Diana',   2,  95000, '2023-05-01'),
(5,  'Eve',     2,  70000, '2024-06-15'),
(6,  'Frank',   3,  65000, '2023-11-20'),
(7,  'Grace',   3,  68000, '2022-09-01'),
(8,  'Hank',    3,  90000, '2021-02-14'),
(9,  'Ivy',     1,  72000, '2025-01-08'),
(10, 'Jack',    NULL, 85000, '2024-08-30');

Subqueries in WHERE

The most common place for a subquery is the WHERE clause. There are several patterns.

Scalar subquery (returns one value)

-- Employees earning above the company average
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- The subquery returns: 85500
-- So this becomes: WHERE salary > 85500

name	salary
Alice	120000.00
Charlie	115000.00
Diana	95000.00
Hank	90000.00

IN subquery (returns a list)

-- Employees in departments that have more than 2 people
SELECT name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    WHERE department_id IS NOT NULL
    GROUP BY department_id
    HAVING COUNT(*) > 2
);

name	department_id
Alice	1
Bob	1
Charlie	1
Frank	3
Grace	3
Hank	3
Ivy	1

The subquery returns the list (1, 3) because Engineering has 4 employees and Sales has 3. The outer query then returns all employees in those departments.

NOT IN: watch out for NULLs

-- DANGEROUS: NOT IN with NULLs
-- If the subquery returns ANY null, NOT IN returns NOTHING
SELECT name FROM employees
WHERE department_id NOT IN (
    SELECT department_id FROM employees  -- includes NULL!
);
-- Returns: empty set (because NULL is in the list)

-- SAFE: use NOT EXISTS instead
SELECT e.name FROM employees e
WHERE NOT EXISTS (
    SELECT 1 FROM employees e2
    WHERE e2.department_id = e.department_id
    AND e2.id != e.id
);

-- Or filter NULLs explicitly
SELECT name FROM employees
WHERE department_id NOT IN (
    SELECT department_id FROM employees
    WHERE department_id IS NOT NULL
);

Critical rule: Never use NOT IN with a subquery that might return NULL values. Use NOT EXISTS instead. This is one of the most common SQL bugs in production code.

EXISTS and NOT EXISTS

EXISTS checks whether a subquery returns any rows at all. It returns TRUE or FALSE and is the most efficient way to check for the presence or absence of related data.

-- Find departments that have at least one employee
SELECT d.name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.id
);

name
Engineering
Marketing
Sales

HR is excluded because no employee has department_id = 4.

-- NOT EXISTS: find departments with NO employees
SELECT d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.id
);

name
HR

EXISTS vs IN performance: EXISTS stops scanning as soon as it finds the first matching row. IN must build the complete list of values. For large subquery results, EXISTS is usually faster. For small result sets, the difference is negligible.

-- Equivalent queries, EXISTS is typically faster for large tables
-- Using IN
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE amount > 1000);

-- Using EXISTS (preferred for large tables)
SELECT name FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id AND o.amount > 1000
);

Correlated Subqueries

A correlated subquery references columns from the outer query. This means it re-executes for each row of the outer query, unlike a regular subquery which runs once.

-- Employees earning above their department's average
SELECT e.name, e.salary, e.department_id
FROM employees e
WHERE e.department_id IS NOT NULL
  AND e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);

name	salary	department_id
Alice	120000.00	1
Charlie	115000.00	1
Diana	95000.00	2
Hank	90000.00	3

For Alice (Engineering), the subquery calculates the Engineering average (95500). Since 120000 > 95500, Alice is included. The same subquery runs again for Bob with the same department, but 75000 is not above 95500 so Bob is excluded.

You can often rewrite correlated subqueries using window functions for better performance:

-- Same result using a window function (often faster)
SELECT name, salary, department_id
FROM (
    SELECT
        name, salary, department_id,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees
    WHERE department_id IS NOT NULL
) sub
WHERE salary > dept_avg;

Subqueries in SELECT

A scalar subquery in SELECT adds a computed column to each row. The subquery must return exactly one value.

-- Show each employee with their department's headcount
SELECT
    e.name,
    e.salary,
    (SELECT d.name FROM departments d WHERE d.id = e.department_id) AS dept_name,
    (SELECT COUNT(*)
     FROM employees e2
     WHERE e2.department_id = e.department_id) AS dept_headcount
FROM employees e
WHERE e.department_id IS NOT NULL;

name	salary	dept_name	dept_headcount
Alice	120000	Engineering	4
Bob	75000	Engineering	4
Diana	95000	Marketing	2
Frank	65000	Sales	3

While this works, a JOIN or window function is usually cleaner and faster for this pattern:

-- Cleaner version with JOIN
SELECT e.name, e.salary, d.name AS dept_name,
    COUNT(*) OVER (PARTITION BY e.department_id) AS dept_headcount
FROM employees e
JOIN departments d ON d.id = e.department_id;

Subqueries in FROM (Derived Tables)

A subquery in the FROM clause creates a temporary result set (derived table) that you can query like a regular table. This is useful for multi-step transformations.

-- Find departments where the top salary exceeds 100K
SELECT dept_stats.dept_name, dept_stats.max_salary, dept_stats.headcount
FROM (
    SELECT
        d.name AS dept_name,
        MAX(e.salary) AS max_salary,
        COUNT(*) AS headcount,
        AVG(e.salary) AS avg_salary
    FROM employees e
    JOIN departments d ON d.id = e.department_id
    GROUP BY d.name
) AS dept_stats
WHERE dept_stats.max_salary > 100000;

dept_name	max_salary	headcount
Engineering	120000.00	4

For readability, CTEs (Common Table Expressions) are generally preferred over derived tables:

-- Same query using a CTE (more readable)
WITH dept_stats AS (
    SELECT
        d.name AS dept_name,
        MAX(e.salary) AS max_salary,
        COUNT(*) AS headcount
    FROM employees e
    JOIN departments d ON d.id = e.department_id
    GROUP BY d.name
)
SELECT dept_name, max_salary, headcount
FROM dept_stats
WHERE max_salary > 100000;

Subquery vs JOIN vs CTE: When to Use Which

The same problem can often be solved with a subquery, a JOIN, or a CTE. Here is when to prefer each:

Use a subquery when:

You need to filter based on an aggregate value (WHERE salary > (SELECT AVG...))
You are doing an existence check (EXISTS/NOT EXISTS)
The inner query is simple and self-contained

Use a JOIN when:

You need columns from both tables in the output
You are combining rows from related tables
The relationship is straightforward (foreign key match)

Use a CTE when:

You need to reference the same subquery result multiple times
The query has multiple logical steps
Readability is important (CTEs have descriptive names)
You need recursion

-- Three ways to express the same thing:

-- 1. Subquery approach
SELECT name FROM employees
WHERE department_id IN (
    SELECT id FROM departments WHERE name = 'Engineering'
);

-- 2. JOIN approach
SELECT e.name
FROM employees e
JOIN departments d ON d.id = e.department_id
WHERE d.name = 'Engineering';

-- 3. CTE approach
WITH eng AS (
    SELECT id FROM departments WHERE name = 'Engineering'
)
SELECT e.name
FROM employees e
JOIN eng ON eng.id = e.department_id;

For this simple case, the JOIN is cleanest. All three produce the same execution plan in most databases. The choice is about readability and maintainability, not performance.

For more on JOINs, see our SQL Joins guide. For CTEs, see our CTE guide.

Real-World Subquery Patterns

1. Find customers who never ordered

-- NOT EXISTS is the best approach here
SELECT c.name, c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

2. Latest record per group

-- Most recent order per customer using a correlated subquery
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

3. Running comparison with aggregates

-- Products priced above their category average
SELECT p.name, p.price, p.category
FROM products p
WHERE p.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category = p.category
)
ORDER BY p.category, p.price DESC;

4. Multi-level aggregation

-- Average of department averages (average salary across departments)
SELECT ROUND(AVG(dept_avg), 2) AS avg_of_dept_averages
FROM (
    SELECT department_id, AVG(salary) AS dept_avg
    FROM employees
    WHERE department_id IS NOT NULL
    GROUP BY department_id
) AS dept_avgs;

5. Conditional aggregation with subqueries

-- Departments where the newest hire earns more than the oldest hire
SELECT d.name AS department
FROM departments d
WHERE (
    SELECT salary FROM employees e
    WHERE e.department_id = d.id
    ORDER BY hire_date DESC LIMIT 1
) > (
    SELECT salary FROM employees e
    WHERE e.department_id = d.id
    ORDER BY hire_date ASC LIMIT 1
);

Let AI write your subqueries. Describe the data you need in plain English, and AI2SQL generates the correct subquery, JOIN, or CTE for your database.

Frequently Asked Questions

What is a subquery in SQL?

A subquery is a SQL query nested inside another query. It can appear in the SELECT, FROM, WHERE, or HAVING clause. The inner query runs first (or for each row in correlated subqueries), and its result is used by the outer query. Subqueries let you break complex problems into smaller, logical steps and reference computed values without creating temporary tables.

What is the difference between a subquery and a JOIN?

A JOIN combines columns from two or more tables into a single result set. A subquery is a complete query nested inside another query. JOINs are better when you need columns from both tables in the output. Subqueries are better for existence checks (EXISTS), value filtering (IN), and when the inner query produces an aggregate that the outer query uses as a filter.

What is a correlated subquery?

A correlated subquery references columns from the outer query, so it must re-execute for every row of the outer query. For example, finding employees earning above their department's average requires the subquery to calculate a different average for each employee's department. Correlated subqueries can be slow on large tables, but they are sometimes the most natural way to express a condition.

When should I use EXISTS vs IN?

Use EXISTS when you only need to check whether matching rows exist, especially with large subquery results. EXISTS stops as soon as it finds the first match. Use IN when the subquery returns a small list of values. Avoid IN with large result sets. Most importantly, never use NOT IN with a subquery that might return NULL values. Use NOT EXISTS instead, because NOT IN returns no results if any NULL is in the list.

Can AI generate subqueries from plain English?

Yes. AI2SQL lets you describe what you need in plain English, such as "find customers who have never placed an order" or "show employees earning above their department average," and the AI generates the correct subquery or CTE. This is especially helpful because deciding between subqueries, JOINs, and CTEs requires understanding their trade-offs, which the AI handles automatically.