SQL Aggregate Functions Interview Questions (2026): COUNT, SUM, AVG, GROUP BY With Real Examples
SQL aggregate function interview questions with answers — COUNT(*) vs COUNT(column), GROUP BY traps, HAVING vs WHERE, DISTINCT pitfalls, ROLLUP and CUBE. Real questions from data interviews in 2026.
Overview
Aggregate functions look easy on the surface, which is exactly why interviewers use them as a junior-vs-senior filter. Subtle differences between COUNT(*) and COUNT(column), HAVING vs WHERE, and how DISTINCT interacts with GROUP BY separate candidates fast. The questions below cover the six aggregation patterns that come up most often, with the trap each one hides.
Questions & Answers
Q1. COUNT(*) vs COUNT(column) vs COUNT(DISTINCT) Common · Easy
Prompt: Given orders(id, customer_id, coupon_code) where coupon_code is sometimes NULL, explain the difference between the three counts.
SELECT
COUNT(*) AS total_orders,
COUNT(coupon_code) AS orders_with_coupon,
COUNT(DISTINCT coupon_code) AS unique_coupons_used
FROM orders;
Why this matters: COUNT(*) counts rows. COUNT(column) counts non-NULL values in that column. COUNT(DISTINCT column) counts unique non-NULL values. Mixing them up is the most common interview SQL mistake — read the question carefully and state which one you mean.
Q2. GROUP BY trap — non-aggregate column Common · Medium
Prompt: What is wrong with this query, and how do you fix it? `SELECT customer_id, name, SUM(total) FROM orders JOIN customers ON ... GROUP BY customer_id;`
-- Wrong on PostgreSQL/SQL Server, silently wrong on MySQL with ONLY_FULL_GROUP_BY off
SELECT customer_id, name, SUM(total)
FROM orders
JOIN customers ON customers.id = orders.customer_id
GROUP BY customer_id;
-- Correct: include every non-aggregate column in GROUP BY
SELECT
customers.id AS customer_id,
customers.name,
SUM(orders.total) AS total_spent
FROM orders
JOIN customers ON customers.id = orders.customer_id
GROUP BY customers.id, customers.name;
Why this matters: When MySQL has ONLY_FULL_GROUP_BY off (legacy default), the first query "works" but returns an arbitrary name per customer_id. PostgreSQL, SQL Server, and modern MySQL throw an error. Always GROUP BY every non-aggregate column you SELECT.
Q3. HAVING vs WHERE Common · Medium
Prompt: Find customers who have placed more than 5 orders this year, with each order over $100.
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
WHERE order_date >= DATE_TRUNC('year', CURRENT_DATE)
AND total > 100
GROUP BY customer_id
HAVING COUNT(*) > 5
ORDER BY order_count DESC;
Why this matters: WHERE filters rows BEFORE aggregation. HAVING filters groups AFTER aggregation. `WHERE COUNT(*) > 5` is always invalid because COUNT does not exist before GROUP BY runs. Conversely, `HAVING order_date > ...` is legal but inefficient — it filters too late.
Q4. Average that excludes outliers Common · Medium
Prompt: Compute the trimmed average order value, ignoring the top and bottom 5% of orders.
WITH bounds AS (
SELECT
PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY total) AS lo,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY total) AS hi
FROM orders
)
SELECT
ROUND(AVG(total)::numeric, 2) AS trimmed_avg
FROM orders, bounds
WHERE total >= lo AND total <= hi;
Why this matters: AVG is sensitive to outliers — a few huge orders skew the mean. Interviewers ask about trimmed averages and medians to test whether you understand this. PERCENTILE_CONT works on PostgreSQL/Snowflake/BigQuery; MySQL <8.0 needs window-function emulation.
Q5. AVG ignores NULLs (and that matters) Common · Medium
Prompt: Given employees(id, salary, bonus) where bonus may be NULL, compute average total compensation. What changes if you treat NULL bonus as 0?
-- AVG(salary + bonus) returns NULL for any row with NULL bonus,
-- and AVG SKIPS those rows entirely → biased upward
SELECT AVG(salary + bonus) AS biased_avg
FROM employees;
-- Correct: COALESCE NULL to 0 first
SELECT AVG(salary + COALESCE(bonus, 0)) AS true_avg_total_comp
FROM employees;
Why this matters: AVG ignores NULL inputs entirely. When you compute AVG(salary + bonus), rows where bonus is NULL produce NULL totals and are dropped from the average — biasing it upward toward the bonus-getters. COALESCE(bonus, 0) is the fix when "no bonus" should count as zero, not as missing.
Q6. ROLLUP and subtotals Common · Hard
Prompt: For sales(region, category, total), produce a report with subtotals per region, per category, and a grand total.
SELECT
COALESCE(region, 'ALL REGIONS') AS region,
COALESCE(category, 'ALL CATEGORIES') AS category,
SUM(total) AS revenue
FROM sales
GROUP BY ROLLUP (region, category)
ORDER BY region NULLS LAST, category NULLS LAST;
Why this matters: ROLLUP generates subtotals along the grouping hierarchy. Each NULL in the result represents an aggregate level — COALESCE makes the report human-readable. CUBE generates every combination of subtotals (not just hierarchical), useful for OLAP-style summaries.
Common Pitfalls
- Using SUM(DISTINCT col) when you mean SUM over distinct rows — SUM(DISTINCT col) sums unique values, which is rarely what you want.
- AVG of an integer column truncates the result — cast to NUMERIC or FLOAT first.
- Forgetting COUNT(*) counts NULLs but COUNT(column) does not — common source of off-by-N bugs.
- GROUP BY column names vs GROUP BY 1, 2 — the latter is positional and breaks if the SELECT list reorders. Use names in production code.
- Mixing aggregates and window functions without a CTE — window functions run AFTER GROUP BY but before ORDER BY. Stack them in a subquery to avoid surprises.
Practice With AI2SQL
The fastest way to lock in these patterns is to write them by hand, then check against AI2SQL output. Generate variations of every question above with different schemas, rewrite them from scratch, and you will have these patterns memorized within a week.
No credit card required
Frequently Asked Questions
Why does AVG return NULL when there are NULL values?
AVG ignores NULL inputs by design (per the SQL standard). It returns NULL only when ALL values are NULL or the table is empty. If you want NULL to count as zero, use AVG(COALESCE(col, 0)).
When should I use HAVING instead of WHERE?
Use HAVING when the filter depends on an aggregate (HAVING COUNT(*) > 5, HAVING SUM(total) > 1000). Use WHERE for row-level filters that do not need aggregation. Putting non-aggregate filters in HAVING works but is slower.
What is the difference between ROLLUP and CUBE?
ROLLUP generates subtotals along the grouping hierarchy: (region, category), (region), (). CUBE generates every combination: (region, category), (region), (category), (). Use ROLLUP for hierarchical reports, CUBE for full cross-tab summaries.
How can I avoid GROUP BY mistakes in interviews?
Practice the rule: every non-aggregate column in SELECT must be in GROUP BY. AI2SQL generates correct GROUP BY clauses from plain English so you can study working examples and rewrite them.