SQL Performance

SQL Indexing Best Practices: The Complete Guide (2026)

Q: What is a SQL index and how does it work?

A SQL index is a data structure (usually a B-tree) that stores a sorted copy of selected columns from a table. When you query those columns, the database can use the index to find rows quickly instead of scanning the entire table. Think of it like a book's index: instead of reading every page to find a topic, you look up the topic in the index and jump directly to the right page. Indexes speed up SELECT queries but slow down INSERT, UPDATE, and DELETE operations because the index must be maintained alongside the table data.

Q: How many indexes should a table have?

There is no fixed number. The right number depends on your workload. For read-heavy tables (reporting, analytics), more indexes help because they speed up diverse queries. For write-heavy tables (logging, event streams), fewer indexes are better because each INSERT must update every index. A good starting point: create indexes for columns used in WHERE, JOIN, and ORDER BY clauses, then monitor with EXPLAIN to remove unused ones. Most OLTP tables have between 3 and 8 indexes.

Q: What is the difference between a clustered and non-clustered index?

A clustered index determines the physical order of data in the table. Each table can have only one clustered index (usually the primary key). A non-clustered index is a separate structure that points back to the table rows. You can have many non-clustered indexes per table. In SQL Server, the primary key is clustered by default. In PostgreSQL, tables are heap-organized and all indexes are non-clustered, but you can use CLUSTER to reorder data once.

Q: What is a covering index?

A covering index includes all the columns a query needs, so the database can answer the query entirely from the index without reading the table. This eliminates the expensive 'key lookup' step. In SQL Server, you use INCLUDE to add non-key columns. In PostgreSQL, you use INCLUDE (PostgreSQL 11+). For example, if you query SELECT email FROM users WHERE status = 'active', an index on (status) INCLUDE (email) is a covering index that avoids touching the table.

Q: Can AI help me optimize my SQL indexes?

Yes. AI2SQL can analyze your queries and suggest appropriate indexes. Describe your slow query in plain English, and AI2SQL generates optimized SQL along with index recommendations. It understands different database dialects (MySQL, PostgreSQL, SQL Server) and suggests dialect-specific index syntax including composite indexes, covering indexes, and partial indexes.

Everything you need to know about SQL indexes. Covers B-tree internals, composite index column order, covering indexes, partial indexes, and the most common indexing mistakes that destroy query performance.

Mar 23, 2026 22 min read

What Is a SQL Index?

A SQL index is a data structure that helps the database find rows faster. Without an index, the database must scan every row in the table to find the ones matching your query. This is called a full table scan, and it gets slower as the table grows.

With an index, the database can jump directly to the relevant rows. The most common index type is a B-tree (balanced tree), which organizes data in a sorted, hierarchical structure. A B-tree index on a column with 10 million rows can find any value in about 3-4 page reads instead of scanning all 10 million rows.

Here is the trade-off: indexes speed up reads but slow down writes. Every INSERT, UPDATE, or DELETE must also update every index on the table. If you have 10 indexes on a table, every single INSERT must write to the table and all 10 indexes.

-- Create a basic index
CREATE INDEX idx_orders_customer_id
ON orders(customer_id);

-- Now this query uses an index seek instead of a full scan
SELECT * FROM orders WHERE customer_id = 42;

The difference is dramatic. On a table with 1 million rows, a full table scan might read 50,000 pages. An index seek reads 3-4 pages. That is a 10,000x improvement in I/O.

Types of SQL Indexes

B-tree index (default)

The B-tree is the default index type in every major database. It works for equality lookups (=), range queries (BETWEEN, >, <), sorting (ORDER BY), and prefix matching (LIKE 'abc%'). It does not help with pattern matching in the middle of strings (LIKE '%abc%').

-- B-tree index (default in all databases)
CREATE INDEX idx_users_email ON users(email);

-- These queries benefit from the B-tree:
SELECT * FROM users WHERE email = 'alice@example.com';  -- equality
SELECT * FROM users WHERE email BETWEEN 'a' AND 'd';    -- range
SELECT * FROM users ORDER BY email LIMIT 10;             -- sorting
SELECT * FROM users WHERE email LIKE 'alice%';           -- prefix

-- This does NOT benefit from the B-tree:
SELECT * FROM users WHERE email LIKE '%@gmail.com';      -- suffix

Unique index

A unique index enforces that no two rows can have the same value in the indexed columns. Primary keys automatically create a unique index.

-- Enforce uniqueness
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- This INSERT would fail if the email already exists
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');

Composite (multi-column) index

A composite index covers two or more columns. Column order matters significantly, as we will cover in detail below.

-- Composite index on two columns
CREATE INDEX idx_orders_status_date
ON orders(status, order_date);

Partial index (PostgreSQL) / Filtered index (SQL Server)

A partial index only indexes rows that match a condition. This makes the index smaller and faster.

-- PostgreSQL: partial index
CREATE INDEX idx_orders_active
ON orders(customer_id, order_date)
WHERE status = 'active';

-- SQL Server: filtered index
CREATE INDEX idx_orders_active
ON orders(customer_id, order_date)
WHERE status = 'active';

Full-text index

For searching within text content, full-text indexes tokenize text and support natural language search:

-- MySQL
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, body);
SELECT * FROM articles WHERE MATCH(title, body) AGAINST('database optimization');

-- PostgreSQL
CREATE INDEX idx_articles_search ON articles USING gin(to_tsvector('english', title || ' ' || body));
SELECT * FROM articles WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('database & optimization');

GIN and GiST indexes (PostgreSQL)

-- GIN: for JSONB, arrays, full-text
CREATE INDEX idx_products_tags ON products USING gin(tags);
SELECT * FROM products WHERE tags @> '["electronics"]';

-- GiST: for geometric, range, and proximity queries
CREATE INDEX idx_locations_coords ON locations USING gist(coordinates);
SELECT * FROM locations WHERE coordinates <-> point(40.7, -74.0) < 1.0;

Composite Index Column Order: The Most Important Decision

The single most impactful indexing decision is the column order in a composite index. The database can only use a composite index efficiently if the query filters on columns from left to right, starting at the first column. This is called the leftmost prefix rule.

-- Composite index: (status, customer_id, order_date)
CREATE INDEX idx_orders_status_cust_date
ON orders(status, customer_id, order_date);

-- Uses the full index (all 3 columns matched left-to-right)
SELECT * FROM orders
WHERE status = 'shipped'
  AND customer_id = 42
  AND order_date > '2026-01-01';

-- Uses the first 2 columns of the index
SELECT * FROM orders
WHERE status = 'shipped'
  AND customer_id = 42;

-- Uses only the first column
SELECT * FROM orders
WHERE status = 'shipped';

-- CANNOT use the index (skips the first column)
SELECT * FROM orders
WHERE customer_id = 42;

-- CANNOT use the index efficiently (skips status)
SELECT * FROM orders
WHERE customer_id = 42
  AND order_date > '2026-01-01';

Column order rules:

Equality columns first. Columns filtered with = should come before range columns (>, <, BETWEEN).
Range column next. The first range condition can use the index. Any columns after a range condition cannot.
High selectivity first. Among equality columns, put the most selective one first (the one that filters out the most rows).

-- BAD order: range column (date) before equality column (status)
CREATE INDEX idx_bad ON orders(order_date, status);
-- Query: WHERE order_date > '2026-01-01' AND status = 'shipped'
-- Only order_date can use the index; status cannot

-- GOOD order: equality column first, then range
CREATE INDEX idx_good ON orders(status, order_date);
-- Query: WHERE status = 'shipped' AND order_date > '2026-01-01'
-- Both columns use the index efficiently

Covering Indexes: Avoid Table Lookups

A covering index contains all the columns a query needs, so the database can answer the query entirely from the index without going back to the table. This eliminates the expensive "key lookup" or "table access by ROWID" step.

-- Query: find active user emails
SELECT email FROM users WHERE status = 'active';

-- Non-covering index: finds matching rows, then reads table for email
CREATE INDEX idx_users_status ON users(status);
-- Plan: Index Scan on idx_users_status → Key Lookup to get email

-- Covering index: includes email, no table access needed
CREATE INDEX idx_users_status_email ON users(status) INCLUDE (email);
-- Plan: Index Only Scan on idx_users_status_email (much faster)

The INCLUDE clause (SQL Server and PostgreSQL 11+) adds columns to the leaf level of the index without making them part of the search key. This keeps the index smaller while still covering the query.

-- SQL Server and PostgreSQL 11+
CREATE INDEX idx_orders_covering
ON orders(customer_id, order_date)
INCLUDE (amount, status);

-- This query is now fully covered:
SELECT amount, status
FROM orders
WHERE customer_id = 42
  AND order_date > '2026-01-01';

-- MySQL alternative: add columns to the index key
-- MySQL does not have INCLUDE, so add them at the end
CREATE INDEX idx_orders_covering
ON orders(customer_id, order_date, amount, status);

When to use covering indexes: For your most frequently run queries, especially dashboard queries and API endpoints that run thousands of times per day. The I/O savings compound dramatically at scale.

Indexing for JOIN Performance

JOIN performance depends heavily on indexes on the join columns. Without indexes, the database may resort to nested loop scans that are orders of magnitude slower.

-- Ensure both sides of the JOIN have indexes
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- customers.id is already indexed (primary key)

-- This JOIN now uses index lookups
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
WHERE c.country = 'US';

-- For filtered JOINs, a composite index helps even more
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);

SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'completed';

Key rules for JOIN indexes:

Index the foreign key column on the "many" side of the relationship
The primary key on the "one" side is already indexed
For multi-column JOIN conditions, create a composite index matching the join columns
If you have WHERE filters on the joined table, add those columns to the index

For more JOIN patterns, see our SQL Joins Explained guide.

Indexing for ORDER BY and GROUP BY

Indexes can eliminate the sort step in ORDER BY queries. Without an index, the database must sort all matching rows in memory (or on disk for large results). With the right index, data is already in order.

-- Without index: sort 1M rows in memory
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 20;
-- Plan: Filter → Sort → Limit (slow for large tables)

-- With composite index: no sort needed
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC);
-- Plan: Index Scan (backward) → Limit (fast)

The index must match both the WHERE filter columns and the ORDER BY columns, in that order. The sort direction (ASC/DESC) must also match, though most databases can scan an index backwards.

-- GROUP BY also benefits from indexes
-- Without index: hash or sort all rows
SELECT status, COUNT(*) FROM orders GROUP BY status;

-- With index on status: index-only scan
CREATE INDEX idx_orders_status ON orders(status);

-- For GROUP BY with aggregation on another column:
CREATE INDEX idx_orders_status_amount
ON orders(status) INCLUDE (amount);

SELECT status, SUM(amount)
FROM orders
GROUP BY status;

7 Common Indexing Mistakes

1. Indexing every column individually

-- BAD: separate index per column
CREATE INDEX idx_a ON orders(status);
CREATE INDEX idx_b ON orders(customer_id);
CREATE INDEX idx_c ON orders(order_date);

-- Query uses only ONE of these indexes
SELECT * FROM orders
WHERE status = 'shipped'
  AND customer_id = 42
  AND order_date > '2026-01-01';

-- GOOD: one composite index
CREATE INDEX idx_combined ON orders(status, customer_id, order_date);

The database typically picks only one index per table access. Three separate single-column indexes cannot be combined as effectively as one composite index.

2. Wrong column order in composite indexes

-- BAD: low selectivity column first
CREATE INDEX idx_bad ON users(is_active, email);
-- is_active has only 2 values (true/false), email is unique

-- GOOD: high selectivity column first for equality lookups
CREATE INDEX idx_good ON users(email, is_active);
-- But if you always filter by is_active first, it depends on the query

3. Functions on indexed columns

-- BAD: function on column prevents index use
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
SELECT * FROM orders WHERE YEAR(order_date) = 2026;

-- GOOD: rewrite to avoid the function
SELECT * FROM users WHERE email = 'alice@example.com';  -- if case-insensitive collation
SELECT * FROM orders WHERE order_date >= '2026-01-01' AND order_date < '2027-01-01';

-- OR: create a functional/expression index
-- PostgreSQL
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- SQL Server (computed column + index)
ALTER TABLE users ADD email_lower AS LOWER(email);
CREATE INDEX idx_users_email_lower ON users(email_lower);

4. Too many indexes on write-heavy tables

-- Check index usage in PostgreSQL
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;

-- Drop unused indexes
DROP INDEX idx_never_used;

5. Missing indexes on foreign keys

-- Foreign keys do NOT automatically create indexes in PostgreSQL
ALTER TABLE orders ADD FOREIGN KEY (customer_id) REFERENCES customers(id);
-- You MUST create the index yourself:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- In MySQL InnoDB, foreign keys DO auto-create indexes

6. Ignoring NULL in index design

-- B-tree indexes include NULL values
-- But IS NULL and IS NOT NULL queries may or may not use the index

-- PostgreSQL: partial index for non-NULL values only
CREATE INDEX idx_orders_shipped
ON orders(shipped_date)
WHERE shipped_date IS NOT NULL;

-- This query now uses the smaller, more efficient partial index
SELECT * FROM orders WHERE shipped_date IS NOT NULL AND shipped_date > '2026-01-01';

7. Not analyzing index usage

-- PostgreSQL: find unused indexes
SELECT
    schemaname || '.' || relname AS table,
    indexrelname AS index,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    idx_scan AS times_used
FROM pg_stat_user_indexes i
JOIN pg_index USING (indexrelid)
WHERE NOT indisunique
ORDER BY idx_scan ASC, pg_relation_size(i.indexrelid) DESC;

-- MySQL: check index usage
SELECT * FROM sys.schema_unused_indexes;

-- SQL Server: find missing indexes
SELECT * FROM sys.dm_db_missing_index_details;

Using EXPLAIN to Verify Index Usage

Never guess whether your index is being used. Always verify with EXPLAIN.

-- PostgreSQL: EXPLAIN ANALYZE (actually runs the query)
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42
  AND status = 'shipped';

-- Look for:
-- "Index Scan" or "Index Only Scan" = index is used (good)
-- "Seq Scan" = full table scan (probably bad for large tables)
-- "Bitmap Index Scan" = partial index use (can be good or bad)
-- Actual rows vs estimated rows = accuracy of statistics

-- MySQL: EXPLAIN with key info
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- Look for:
-- key: which index is used (NULL = no index)
-- rows: estimated rows to examine
-- Extra: "Using index" = covering index (best)
-- Extra: "Using where" = filtering after index lookup

-- SQL Server: actual execution plan
SET STATISTICS IO ON;
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

-- Look for:
-- "Index Seek" = efficient lookup (best)
-- "Index Scan" = scanning entire index (mediocre)
-- "Table Scan" = no index used (worst)
-- "Key Lookup" = covering index missing some columns

Important: Run EXPLAIN after loading realistic data volumes. The optimizer makes different decisions on 100 rows versus 10 million rows. An index that appears used on a small test table might be ignored on a production-sized table if the optimizer decides a full scan is cheaper.

Index Maintenance

Indexes degrade over time as data changes. Fragmentation, bloat, and stale statistics can make indexes less effective.

-- PostgreSQL: update statistics
ANALYZE orders;

-- PostgreSQL: rebuild bloated indexes
REINDEX INDEX idx_orders_customer_id;
-- Or non-blocking:
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;

-- MySQL: rebuild indexes
OPTIMIZE TABLE orders;
-- Or:
ALTER TABLE orders ENGINE=InnoDB;

-- SQL Server: rebuild fragmented indexes
ALTER INDEX idx_orders_customer_id ON orders REBUILD;
-- Or reorganize (less locking):
ALTER INDEX idx_orders_customer_id ON orders REORGANIZE;

For high-traffic production databases, schedule index maintenance during low-traffic windows. PostgreSQL's autovacuum handles most maintenance automatically, but heavily updated tables may need manual REINDEX. SQL Server's maintenance plans can automate rebuild/reorganize based on fragmentation levels.

Let AI optimize your queries and indexes. Describe your slow query in plain English, and AI2SQL generates optimized SQL with index recommendations for your database.

Frequently Asked Questions

What is a SQL index and how does it work?

A SQL index is a data structure (usually a B-tree) that stores a sorted copy of selected columns from a table. When you query those columns, the database can use the index to find rows quickly instead of scanning the entire table. Think of it like a book's index: instead of reading every page to find a topic, you look up the topic in the index and jump directly to the right page. Indexes speed up SELECT queries but slow down INSERT, UPDATE, and DELETE operations because the index must be maintained alongside the table data.

How many indexes should a table have?

There is no fixed number. The right number depends on your workload. For read-heavy tables (reporting, analytics), more indexes help because they speed up diverse queries. For write-heavy tables (logging, event streams), fewer indexes are better because each INSERT must update every index. A good starting point: create indexes for columns used in WHERE, JOIN, and ORDER BY clauses, then monitor with EXPLAIN to remove unused ones. Most OLTP tables have between 3 and 8 indexes.

What is the difference between a clustered and non-clustered index?

A clustered index determines the physical order of data in the table. Each table can have only one clustered index (usually the primary key). A non-clustered index is a separate structure that points back to the table rows. You can have many non-clustered indexes per table. In SQL Server, the primary key is clustered by default. In PostgreSQL, tables are heap-organized and all indexes are non-clustered, but you can use CLUSTER to reorder data once.

What is a covering index?

A covering index includes all the columns a query needs, so the database can answer the query entirely from the index without reading the table. This eliminates the expensive "key lookup" step. In SQL Server, you use INCLUDE to add non-key columns. In PostgreSQL, you use INCLUDE (PostgreSQL 11+). For example, if you query SELECT email FROM users WHERE status = 'active', an index on (status) INCLUDE (email) is a covering index that avoids touching the table.

Can AI help me optimize my SQL indexes?

Yes. AI2SQL can analyze your queries and suggest appropriate indexes. Describe your slow query in plain English, and AI2SQL generates optimized SQL along with index recommendations. It understands different database dialects (MySQL, PostgreSQL, SQL Server) and suggests dialect-specific index syntax including composite indexes, covering indexes, and partial indexes.

SQL Indexing Best Practices: The Complete Guide (2026)

What Is a SQL Index?

Types of SQL Indexes

B-tree index (default)

Unique index

Composite (multi-column) index

Partial index (PostgreSQL) / Filtered index (SQL Server)

Full-text index

GIN and GiST indexes (PostgreSQL)

Composite Index Column Order: The Most Important Decision

Covering Indexes: Avoid Table Lookups

Indexing for JOIN Performance

Indexing for ORDER BY and GROUP BY

7 Common Indexing Mistakes

1. Indexing every column individually

2. Wrong column order in composite indexes

3. Functions on indexed columns

4. Too many indexes on write-heavy tables

5. Missing indexes on foreign keys

6. Ignoring NULL in index design

7. Not analyzing index usage

Using EXPLAIN to Verify Index Usage

Index Maintenance

Frequently Asked Questions

Write Faster SQL with AI-Powered Optimization