SQL Tutorial

SQL GROUP BY Explained: Complete Guide with Examples (2026)

A comprehensive guide to SQL GROUP BY with practical examples covering aggregate functions, HAVING, multiple columns, and real-world patterns used in production databases every day.

Mar 11, 2026 20 min read

What Does GROUP BY Do in SQL?

GROUP BY takes a table full of individual rows and collapses them into groups based on one or more columns. Once rows are grouped, you use aggregate functions like COUNT, SUM, and AVG to calculate a single value for each group.

Think of it like sorting a deck of playing cards by suit. You start with 52 individual cards scattered across the table. After grouping by suit, you have four neat piles: hearts, diamonds, clubs, and spades. Each pile is a group. You can then count how many cards are in each pile, find the highest card, or calculate the average value.

Without GROUP BY, aggregate functions operate on the entire table and return a single row. With GROUP BY, they operate on each group independently and return one row per group.

-- Without GROUP BY: one result for the whole table
SELECT COUNT(*) AS total_employees
FROM employees;
-- Returns: 50

-- With GROUP BY: one result per department
SELECT department, COUNT(*) AS employee_count
FROM employees
GROUP BY department;
-- Returns: one row for each department

This is the fundamental building block of data analysis in SQL. Every report, dashboard, and summary you have ever seen is powered by GROUP BY under the hood. Whether you are calculating monthly revenue, counting users by country, or finding average order values by product category, GROUP BY is what makes it possible.

Here is the sample data we will use throughout this guide:

CREATE TABLE employees (
    id         INT PRIMARY KEY,
    name       VARCHAR(50),
    department VARCHAR(30),
    job_title  VARCHAR(30),
    salary     DECIMAL(10,2),
    hire_date  DATE
);

INSERT INTO employees VALUES
(1,  'Alice',   'Engineering', 'Senior Dev',    120000, '2022-03-15'),
(2,  'Bob',     'Engineering', 'Junior Dev',     75000, '2024-01-10'),
(3,  'Charlie', 'Engineering', 'Senior Dev',    115000, '2021-07-22'),
(4,  'Diana',   'Marketing',   'Manager',        95000, '2023-05-01'),
(5,  'Eve',     'Marketing',   'Analyst',         70000, '2024-06-15'),
(6,  'Frank',   'Sales',       'Rep',             65000, '2023-11-20'),
(7,  'Grace',   'Sales',       'Rep',             68000, '2022-09-01'),
(8,  'Hank',    'Sales',       'Manager',         90000, '2021-02-14'),
(9,  'Ivy',     'Engineering', 'Junior Dev',      72000, '2025-01-08'),
(10, 'Jack',    NULL,          'Consultant',      85000, '2024-08-30');

Notice that Jack has a NULL department. We will see how GROUP BY handles this edge case later in the guide.

Basic GROUP BY Syntax

The GROUP BY clause comes after WHERE (if present) and before ORDER BY. Here is the full execution order of a SQL query:

SELECT column, AGGREGATE(column)   -- 5. Select and compute
FROM table                          -- 1. From which table
WHERE condition                     -- 2. Filter individual rows
GROUP BY column                     -- 3. Group the remaining rows
HAVING condition                    -- 4. Filter groups
ORDER BY column;                    -- 6. Sort the final result

Understanding this execution order is critical. WHERE runs before grouping, so it filters individual rows. HAVING runs after grouping, so it filters groups. We will cover HAVING in detail below.

Simple example

SELECT department, COUNT(*) AS employee_count
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

Result

department	employee_count
Engineering	4
Marketing	2
Sales	3

The database takes the 9 non-NULL rows, groups them into three buckets (Engineering, Marketing, Sales), and counts the rows in each bucket. The result is one row per department.

Key rule: Every column in your SELECT must either appear in the GROUP BY clause or be inside an aggregate function. If you try to SELECT name without grouping by it or aggregating it, you will get an error in most databases because the database does not know which name to pick from a group of rows.

GROUP BY with Aggregate Functions

Aggregate functions are the reason GROUP BY exists. They take multiple values from a group of rows and condense them into one value. Here are the five core aggregate functions with detailed examples.

COUNT(*) vs COUNT(column)

COUNT has two forms that behave differently:

COUNT(*) counts all rows in each group, including rows with NULLs
COUNT(column) counts only non-NULL values in that column

SELECT
    department,
    COUNT(*)           AS total_rows,
    COUNT(department)  AS non_null_dept
FROM employees
GROUP BY department;

Result

department	total_rows	non_null_dept
Engineering	4	4
Marketing	2	2
Sales	3	3
NULL	1	0

The NULL group has total_rows = 1 but non_null_dept = 0 because COUNT(department) skips NULL values. This distinction matters when you are counting optional fields.

You can also count distinct values within each group:

SELECT
    department,
    COUNT(*) AS total_employees,
    COUNT(DISTINCT job_title) AS unique_titles
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

department	total_employees	unique_titles
Engineering	4	2
Marketing	2	2
Sales	3	2

Engineering has 4 employees but only 2 unique job titles (Senior Dev and Junior Dev).

SUM() with Sales Data

SUM adds up numeric values within each group. Here is a practical example with an orders table:

CREATE TABLE orders (
    id          INT PRIMARY KEY,
    customer_id INT,
    region      VARCHAR(20),
    amount      DECIMAL(10,2),
    order_date  DATE
);

INSERT INTO orders VALUES
(1, 101, 'North', 250.00, '2026-01-15'),
(2, 102, 'South', 180.00, '2026-01-20'),
(3, 103, 'North', 320.00, '2026-02-05'),
(4, 101, 'North', 150.00, '2026-02-10'),
(5, 104, 'South', 420.00, '2026-02-18'),
(6, 102, 'South', 275.00, '2026-03-01'),
(7, 105, 'West',  190.00, '2026-03-05'),
(8, 103, 'North', 380.00, '2026-03-10');

SELECT
    region,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    ROUND(SUM(amount) / COUNT(*), 2) AS avg_order_value
FROM orders
GROUP BY region;

region	order_count	total_revenue	avg_order_value
North	4	1100.00	275.00
South	3	875.00	291.67
West	1	190.00	190.00

SUM ignores NULL values. If a row has amount = NULL, it is skipped in the sum calculation, not treated as zero.

AVG() with Salary Data

AVG calculates the arithmetic mean of numeric values in each group:

SELECT
    department,
    COUNT(*) AS headcount,
    ROUND(AVG(salary), 2) AS avg_salary,
    MIN(salary) AS lowest_salary,
    MAX(salary) AS highest_salary,
    MAX(salary) - MIN(salary) AS salary_spread
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

department	headcount	avg_salary	lowest_salary	highest_salary	salary_spread
Engineering	4	95500.00	72000.00	120000.00	48000.00
Marketing	2	82500.00	70000.00	95000.00	25000.00
Sales	3	74333.33	65000.00	90000.00	25000.00

This single query gives you a complete salary overview per department. Combining multiple aggregate functions in one SELECT is one of the most powerful patterns in SQL analytics.

Important: AVG skips NULL values entirely. If you have 5 rows in a group but 2 have NULL salaries, AVG computes the average of the remaining 3 values, not 5. This can lead to misleading results if you do not account for missing data.

MIN() and MAX()

MIN and MAX return the smallest and largest values in each group. They work on numbers, strings, and dates:

SELECT
    department,
    MIN(hire_date) AS first_hire,
    MAX(hire_date) AS latest_hire,
    MIN(name) AS first_alphabetically,
    MAX(salary) AS top_salary
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

department	first_hire	latest_hire	first_alphabetically	top_salary
Engineering	2021-07-22	2025-01-08	Alice	120000.00
Marketing	2023-05-01	2024-06-15	Diana	95000.00
Sales	2021-02-14	2023-11-20	Frank	90000.00

MIN and MAX on dates are especially useful for finding the earliest and latest events in each group, like first purchase date per customer or most recent login per user.

For more aggregate function patterns, see our SQL cheat sheet.

GROUP BY with HAVING

HAVING filters groups after aggregation, the same way WHERE filters rows before aggregation. This is one of the most commonly confused topics in SQL, so let us make it clear.

-- WHERE: filters rows BEFORE grouping
-- HAVING: filters groups AFTER grouping

-- Find departments with more than 2 employees
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE department IS NOT NULL
GROUP BY department
HAVING COUNT(*) > 2;

department	headcount
Engineering	4
Sales	3

Marketing (2 employees) is excluded by the HAVING filter. Here is how the execution works step by step:

FROM employees - start with all 10 rows
WHERE department IS NOT NULL - drops Jack, leaving 9 rows
GROUP BY department - creates 3 groups (Engineering: 4, Marketing: 2, Sales: 3)
HAVING COUNT(*) > 2 - removes Marketing group
SELECT - returns department and count for surviving groups

Combining WHERE and HAVING

You can use both in the same query. WHERE narrows the rows, HAVING narrows the groups:

-- Departments where the average salary of employees
-- hired after 2023 exceeds $70,000
SELECT
    department,
    COUNT(*) AS recent_hires,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
WHERE hire_date >= '2023-01-01'
  AND department IS NOT NULL
GROUP BY department
HAVING AVG(salary) > 70000;

department	recent_hires	avg_salary
Marketing	2	82500.00
Sales	1	90000.00

Multiple HAVING conditions

-- Departments with 2+ people AND total payroll over $150K
SELECT
    department,
    COUNT(*) AS headcount,
    SUM(salary) AS total_payroll
FROM employees
WHERE department IS NOT NULL
GROUP BY department
HAVING COUNT(*) >= 2
   AND SUM(salary) > 150000;

department	headcount	total_payroll
Engineering	4	382000.00
Sales	3	223000.00

Common mistake: Trying to use an aggregate in WHERE. This will always fail because WHERE runs before grouping:

-- WRONG: aggregate in WHERE
SELECT department, COUNT(*)
FROM employees
WHERE COUNT(*) > 2   -- ERROR!
GROUP BY department;

-- RIGHT: aggregate in HAVING
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;  -- Correct

If you are studying for SQL interviews, the WHERE vs HAVING distinction is a question that comes up constantly.

GROUP BY Multiple Columns

You can group by two or more columns to get finer-grained aggregations. The result contains one row for each unique combination of the grouped columns.

SELECT
    department,
    job_title,
    COUNT(*) AS headcount,
    ROUND(AVG(salary), 2) AS avg_salary
FROM employees
WHERE department IS NOT NULL
GROUP BY department, job_title
ORDER BY department, job_title;

department	job_title	headcount	avg_salary
Engineering	Junior Dev	2	73500.00
Engineering	Senior Dev	2	117500.00
Marketing	Analyst	1	70000.00
Marketing	Manager	1	95000.00
Sales	Manager	1	90000.00
Sales	Rep	2	66500.00

Instead of 3 rows (one per department), we now get 6 rows (one per department-title combination). This is how you drill down into your data. Each additional column in GROUP BY creates a more specific breakdown.

When to use multiple columns: Whenever you need a breakdown within a breakdown. Revenue by region and month. Users by country and plan type. Orders by category and status. If your stakeholder says "break it down by X and Y," you need GROUP BY x, y.

GROUP BY with ORDER BY

GROUP BY does not guarantee any particular order in the results. Always pair it with ORDER BY when you need sorted output:

-- Sort by aggregate value descending
SELECT
    department,
    SUM(salary) AS total_payroll
FROM employees
WHERE department IS NOT NULL
GROUP BY department
ORDER BY total_payroll DESC;

department	total_payroll
Engineering	382000.00
Sales	223000.00
Marketing	165000.00

TOP N patterns

Combine GROUP BY with ORDER BY and LIMIT to get the top or bottom results:

-- Top 3 customers by total spending
SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 3;

customer_id	order_count	total_spent
103	2	700.00
102	2	455.00
104	1	420.00

This pattern is the backbone of ranking reports. Top products, worst-performing regions, most active users. You will write this pattern hundreds of times in your career.

-- In SQL Server, use TOP instead of LIMIT
SELECT TOP 3
    customer_id,
    SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;

For more patterns combining GROUP BY with JOINs and subqueries, see our joins guide.

Common GROUP BY Mistakes

These are the errors that trip up everyone from beginners to experienced developers.

Mistake 1: Selecting non-aggregated columns

-- WRONG: 'name' is not in GROUP BY or an aggregate
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;
-- Error: column "name" must appear in GROUP BY clause
-- or be used in an aggregate function

-- FIX OPTION 1: add to GROUP BY (changes the result)
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department, name;

-- FIX OPTION 2: aggregate it
SELECT department, MAX(name) AS sample_name, COUNT(*)
FROM employees
GROUP BY department;

-- FIX OPTION 3: use STRING_AGG to list all names (PostgreSQL)
SELECT department, STRING_AGG(name, ', ') AS all_names, COUNT(*)
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

Warning about MySQL: With ONLY_FULL_GROUP_BY disabled, MySQL silently picks an arbitrary value for non-aggregated columns. This is technically valid MySQL but it produces unpredictable results. Always write standard SQL that works across all databases.

Mistake 2: Using WHERE instead of HAVING

-- WRONG: can't use aggregate in WHERE
SELECT department, AVG(salary)
FROM employees
WHERE AVG(salary) > 80000
GROUP BY department;
-- ERROR: aggregate functions not allowed in WHERE

-- RIGHT: use HAVING for aggregate conditions
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 80000;

Rule of thumb: If your condition involves a raw column value (like salary > 50000), use WHERE. If it involves an aggregate result (like AVG(salary) > 80000), use HAVING.

Mistake 3: Forgetting how NULLs behave

-- NULL values get their own group
SELECT department, COUNT(*) AS cnt
FROM employees
GROUP BY department;

-- Results include a row where department = NULL
-- This might skew your reports

-- Fix: filter NULLs in WHERE if you don't want them
SELECT department, COUNT(*) AS cnt
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

All NULL values are grouped together into a single group. If your table has 100 rows with department = NULL, they all land in one NULL group. This is standard SQL behavior across every database.

Mistake 4: Wrong COUNT column

-- These produce different results!
SELECT department,
    COUNT(*) AS count_all,          -- counts every row
    COUNT(manager_id) AS count_mgr  -- counts non-NULL manager_id only
FROM employees
GROUP BY department;

-- If you want to count rows, always use COUNT(*)
-- If you want to count non-NULL values in a specific column, use COUNT(column)

Using COUNT(some_nullable_column) when you meant COUNT(*) is a subtle bug that can undercount your data without any error message.

Real-World GROUP BY Patterns

These are queries you will encounter in production databases. Copy and adapt them to your own tables.

1. Monthly revenue report

SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS total_orders,
    SUM(amount) AS revenue,
    ROUND(AVG(amount), 2) AS avg_order_value,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_date >= '2026-01-01'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

month	total_orders	revenue	avg_order_value	unique_customers
2026-01-01	2	430.00	215.00	2
2026-02-01	3	890.00	296.67	3
2026-03-01	3	845.00	281.67	3

MySQL alternative: Use DATE_FORMAT(order_date, '%Y-%m-01') instead of DATE_TRUNC.

2. Top 10 customers by spend

SELECT
    c.id AS customer_id,
    c.name,
    c.email,
    COUNT(o.id) AS order_count,
    SUM(o.amount) AS total_spent,
    MAX(o.order_date) AS last_order
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name, c.email
ORDER BY total_spent DESC
LIMIT 10;

This query combines GROUP BY with a JOIN to pull customer names alongside their aggregated order data. The JOIN runs first, then GROUP BY collapses the joined rows.

3. User activity by day of week

SELECT
    EXTRACT(DOW FROM login_date) AS day_number,
    CASE EXTRACT(DOW FROM login_date)
        WHEN 0 THEN 'Sunday'
        WHEN 1 THEN 'Monday'
        WHEN 2 THEN 'Tuesday'
        WHEN 3 THEN 'Wednesday'
        WHEN 4 THEN 'Thursday'
        WHEN 5 THEN 'Friday'
        WHEN 6 THEN 'Saturday'
    END AS day_name,
    COUNT(DISTINCT user_id) AS unique_users,
    COUNT(*) AS total_logins
FROM user_logins
WHERE login_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY EXTRACT(DOW FROM login_date)
ORDER BY day_number;

This pattern helps you find when your users are most active. You can group by hour, day of week, or month to discover usage patterns. Product teams use this to decide when to schedule maintenance windows or send notification emails.

4. Category sales comparison

SELECT
    p.category,
    COUNT(DISTINCT o.id) AS orders,
    COUNT(DISTINCT o.customer_id) AS buyers,
    SUM(oi.quantity) AS units_sold,
    SUM(oi.quantity * oi.unit_price) AS gross_revenue,
    ROUND(SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.id), 2) AS revenue_per_order
FROM order_items oi
INNER JOIN orders o ON oi.order_id = o.id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2026-01-01'
GROUP BY p.category
ORDER BY gross_revenue DESC;

Multi-table GROUP BY queries like this are the foundation of every e-commerce dashboard. Notice how we use COUNT(DISTINCT) to avoid inflating numbers when a single order contains multiple items.

5. Finding duplicates

-- Find duplicate email addresses
SELECT
    email,
    COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

email	duplicate_count
john@example.com	3
jane@example.com	2

This is the simplest and most reliable way to find duplicates. GROUP BY the column you suspect has duplicates, then use HAVING COUNT(*) > 1 to show only the values that appear more than once. This is a common pattern in SQL interviews and data cleaning workflows.

To see the actual duplicate rows with all their details:

-- Get full rows for duplicate emails
SELECT *
FROM users
WHERE email IN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
)
ORDER BY email, id;

Let AI write these queries for you. Describe the report you need in plain English, and AI2SQL generates the GROUP BY query with the right aggregations.

GROUP BY vs DISTINCT

GROUP BY without aggregate functions and SELECT DISTINCT both return unique values. But they serve different purposes and have different performance characteristics.

-- These produce the same result:
SELECT DISTINCT department FROM employees;
SELECT department FROM employees GROUP BY department;

-- But only GROUP BY can aggregate:
SELECT department, COUNT(*)
FROM employees
GROUP BY department;
-- DISTINCT cannot do this

When to use DISTINCT: When you just want unique values without any aggregation. It communicates your intent more clearly.

When to use GROUP BY: When you need to compute aggregate values (COUNT, SUM, AVG, etc.) for each group. This is the only option for aggregation.

-- DISTINCT: "give me the unique departments"
SELECT DISTINCT department
FROM employees
WHERE department IS NOT NULL;

-- GROUP BY: "give me each department with its employee count"
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE department IS NOT NULL
GROUP BY department;

In terms of performance, most modern database optimizers produce the same execution plan for simple cases. For complex queries with large datasets, test both approaches with EXPLAIN and pick the faster one.

For more comparisons like this, check our SQL WHERE clause guide which covers filtering patterns in depth.

GROUP BY Performance Tips

GROUP BY operations can be expensive on large tables because the database needs to sort or hash all the rows. Here is how to keep them fast.

1. Index your GROUP BY columns

-- Create an index on columns you frequently group by
CREATE INDEX idx_orders_region ON orders(region);
CREATE INDEX idx_orders_date ON orders(order_date);

-- For multi-column GROUP BY, use a composite index
-- Column order should match your GROUP BY clause
CREATE INDEX idx_orders_region_date ON orders(region, order_date);

An index lets the database read pre-sorted data instead of sorting it at query time. For GROUP BY queries on millions of rows, this can be the difference between seconds and minutes.

2. Avoid functions on grouped columns

-- SLOW: function on column prevents index use
SELECT YEAR(order_date) AS yr, SUM(amount)
FROM orders
GROUP BY YEAR(order_date);

-- FASTER: use a range filter + direct column grouping
SELECT order_date, SUM(amount)
FROM orders
WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01'
GROUP BY order_date;

-- OR: add a computed/generated column and index it
ALTER TABLE orders ADD COLUMN order_year INT
    GENERATED ALWAYS AS (YEAR(order_date));
CREATE INDEX idx_orders_year ON orders(order_year);

When you wrap a column in a function like YEAR(), LOWER(), or DATE_TRUNC(), the database cannot use an index on that column. It must evaluate the function for every single row.

3. Filter early with WHERE

-- SLOWER: groups all rows, then filters
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING department != 'Intern';

-- FASTER: filters rows first, then groups
SELECT department, AVG(salary)
FROM employees
WHERE department != 'Intern'
GROUP BY department;

Move any condition that does not involve an aggregate from HAVING to WHERE. WHERE reduces the number of rows the database needs to group, which is always faster.

4. Use approximate counts for large datasets

-- PostgreSQL: approximate distinct count (much faster on huge tables)
SELECT
    region,
    COUNT(*) AS exact_count
    -- For very large tables, some databases support:
    -- APPROX_COUNT_DISTINCT(customer_id) AS approx_unique
FROM orders
GROUP BY region;

When exact precision is not required (dashboards, monitoring), approximate aggregation functions can be orders of magnitude faster on tables with hundreds of millions of rows.

5. Check your query plan

-- PostgreSQL
EXPLAIN ANALYZE
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department;

-- Look for:
-- "Sort" or "HashAggregate" = how the database groups
-- "Seq Scan" = no index being used
-- "Index Scan" or "Index Only Scan" = index is helping

-- MySQL
EXPLAIN
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department;

Always validate your assumptions with EXPLAIN. What you think is fast might not be. What you think needs an index might already be optimized by the query planner.

For a complete reference of SQL optimization techniques, see our guide on SQL cheat sheet.

Frequently Asked Questions

What is the difference between WHERE and HAVING in SQL?

WHERE filters individual rows before grouping happens. HAVING filters groups after the aggregation is complete. Use WHERE to exclude rows you do not want in the calculation (e.g., WHERE status = 'active'), and HAVING to filter based on aggregate results (e.g., HAVING COUNT(*) > 5). You can use both in the same query: WHERE runs first, then GROUP BY groups the surviving rows, then HAVING filters those groups.

Can I use GROUP BY without an aggregate function?

Yes, GROUP BY without an aggregate function works like SELECT DISTINCT. It returns one row per unique combination of the grouped columns. However, this is considered bad practice because it obscures your intent. If you just want unique values, use SELECT DISTINCT instead. GROUP BY is designed for aggregation, so always pair it with at least one aggregate function like COUNT, SUM, AVG, MIN, or MAX.

Why does my GROUP BY query give an error about non-aggregated columns?

In standard SQL, every column in your SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. If you SELECT name, department, COUNT(*) but only GROUP BY department, the database does not know which name to show for each department group. Fix this by either adding the column to GROUP BY or wrapping it in an aggregate like MAX(name) or ANY_VALUE(name). MySQL with ONLY_FULL_GROUP_BY disabled may silently pick an arbitrary value, which can cause subtle bugs.

How does GROUP BY handle NULL values?

GROUP BY treats all NULL values as belonging to the same group. If you have rows where the grouped column is NULL, they will all be collapsed into a single group in the result. This applies to all major databases including PostgreSQL, MySQL, SQL Server, and Oracle. If you want to exclude NULLs from the grouping, add a WHERE clause: WHERE column IS NOT NULL before the GROUP BY.

Can AI generate GROUP BY queries for me?

Yes. Tools like AI2SQL let you describe the summary you need in plain English, such as "show me total sales by region for the last quarter," and the AI generates the correct GROUP BY query with the right aggregate functions, filters, and ordering. This is especially helpful for complex aggregations involving multiple columns, HAVING clauses, and nested subqueries. You describe what you want and get working SQL in seconds.

SQL GROUP BY Explained: Complete Guide with Examples (2026)

What Does GROUP BY Do in SQL?

Basic GROUP BY Syntax

Simple example

Result

GROUP BY with Aggregate Functions

COUNT(*) vs COUNT(column)

Result

SUM() with Sales Data

AVG() with Salary Data

MIN() and MAX()

GROUP BY with HAVING

Combining WHERE and HAVING

Multiple HAVING conditions

GROUP BY Multiple Columns

GROUP BY with ORDER BY

TOP N patterns

Common GROUP BY Mistakes

Mistake 1: Selecting non-aggregated columns

Mistake 2: Using WHERE instead of HAVING

Mistake 3: Forgetting how NULLs behave

Mistake 4: Wrong COUNT column

Real-World GROUP BY Patterns

1. Monthly revenue report

2. Top 10 customers by spend

3. User activity by day of week

4. Category sales comparison

5. Finding duplicates

GROUP BY vs DISTINCT

GROUP BY Performance Tips

1. Index your GROUP BY columns

2. Avoid functions on grouped columns

3. Filter early with WHERE

4. Use approximate counts for large datasets

5. Check your query plan

Frequently Asked Questions

Generate GROUP BY Queries from Plain English