SQL for Data Analytics: The Complete Guide for Analysts (2026)
12 essential SQL query patterns every data analyst needs: KPI dashboards, cohort analysis, funnel queries, retention metrics, revenue calculations, growth rates, segmentation, A/B test analysis, time series, ranking, correlation, and anomaly detection. Each with production-ready code.
Why SQL Is the Analyst's Most Important Tool
Every data analytics workflow starts with a question. And in almost every organization, the answer lives in a relational database. SQL is the language that gets you from question to answer.
Unlike Python or R, SQL runs directly where the data lives. There is no data export step, no CSV download, no waiting for a notebook to load gigabytes into memory. You write a query, the database engine optimizes it, and you get results in seconds. For data analysts working with production databases, data warehouses like Snowflake or BigQuery, or analytics platforms like Redshift, SQL is the fastest path from question to insight.
In 2026, over 90% of data analyst job postings list SQL as a required skill. It is the most common language on analytics team hiring assessments. Even as AI tools make query writing faster, understanding SQL patterns remains essential because you need to verify what AI generates, optimize slow queries, and handle edge cases that require domain knowledge.
This guide covers the 12 SQL query patterns that handle the vast majority of analytics work. Each pattern includes production-ready code you can adapt to your own database.
1. KPI Dashboard Queries
Dashboard queries aggregate key business metrics into a single result set. They are the foundation of every analytics team's daily workflow.
SELECT
DATE_TRUNC('day', created_at) AS report_date,
COUNT(DISTINCT user_id) AS daily_active_users,
COUNT(*) AS total_events,
COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS paying_users,
SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) AS daily_revenue,
ROUND(
SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END)::NUMERIC
/ NULLIF(COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END), 0),
2
) AS avg_revenue_per_paying_user
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY report_date DESC;
This query calculates daily active users, total events, paying users, revenue, and average revenue per paying user for the last 30 days. NULLIF prevents division by zero on days with no purchases, and the ::NUMERIC cast lets PostgreSQL's two-argument ROUND apply to the division result. Change the DATE_TRUNC granularity to 'week' or 'month' to roll the same metrics up.
2. Cohort Analysis
Cohort analysis groups users by when they signed up, then tracks their behavior over time. It reveals whether newer cohorts are performing better or worse than older ones.
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', created_at) AS cohort_month
FROM users
),
monthly_activity AS (
SELECT
e.user_id,
DATE_TRUNC('month', e.created_at) AS activity_month
FROM events e
WHERE e.event_type = 'login'
GROUP BY e.user_id, DATE_TRUNC('month', e.created_at)
)
SELECT
uc.cohort_month,
COUNT(DISTINCT uc.user_id) AS cohort_size,
COUNT(DISTINCT CASE
WHEN ma.activity_month = uc.cohort_month + INTERVAL '1 month'
THEN ma.user_id END) AS month_1,
COUNT(DISTINCT CASE
WHEN ma.activity_month = uc.cohort_month + INTERVAL '2 months'
THEN ma.user_id END) AS month_2,
COUNT(DISTINCT CASE
WHEN ma.activity_month = uc.cohort_month + INTERVAL '3 months'
THEN ma.user_id END) AS month_3,
ROUND(100.0 * COUNT(DISTINCT CASE
WHEN ma.activity_month = uc.cohort_month + INTERVAL '1 month'
THEN ma.user_id END) / COUNT(DISTINCT uc.user_id), 1) AS month_1_pct
FROM user_cohorts uc
LEFT JOIN monthly_activity ma ON uc.user_id = ma.user_id
GROUP BY uc.cohort_month
ORDER BY uc.cohort_month;
The CTE structure keeps the logic readable. user_cohorts assigns each user to their signup month. monthly_activity deduplicates activity by month. The main query uses conditional aggregation to count how many users from each cohort were active in subsequent months.
3. Funnel Analysis
Funnel queries measure conversion rates between sequential steps. They reveal where users drop off in a multi-step process like onboarding, checkout, or feature adoption.
WITH funnel AS (
SELECT
COUNT(DISTINCT CASE WHEN event_type = 'page_view' THEN user_id END) AS step_1_visitors,
COUNT(DISTINCT CASE WHEN event_type = 'signup' THEN user_id END) AS step_2_signups,
COUNT(DISTINCT CASE WHEN event_type = 'activation' THEN user_id END) AS step_3_activated,
COUNT(DISTINCT CASE WHEN event_type = 'purchase' THEN user_id END) AS step_4_purchased
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT
step_1_visitors,
step_2_signups,
ROUND(100.0 * step_2_signups / NULLIF(step_1_visitors, 0), 1) AS signup_rate,
step_3_activated,
ROUND(100.0 * step_3_activated / NULLIF(step_2_signups, 0), 1) AS activation_rate,
step_4_purchased,
ROUND(100.0 * step_4_purchased / NULLIF(step_3_activated, 0), 1) AS purchase_rate,
ROUND(100.0 * step_4_purchased / NULLIF(step_1_visitors, 0), 1) AS overall_conversion
FROM funnel;
This pattern uses conditional COUNT(DISTINCT) to count unique users at each step, then calculates step-over-step and overall conversion rates. Adjust the event types and date range for your specific funnel.
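One caveat: the conditional counts above treat each step independently, so a user whose purchase event arrived without a recorded page view still lands in step 4. When step order matters, a hedged variant (assuming the same events table and event types) compares first-occurrence timestamps per user and only counts a step when it followed the previous one:

```sql
WITH step_times AS (
    SELECT
        user_id,
        -- first time each user hit each step, NULL if never
        MIN(CASE WHEN event_type = 'page_view' THEN created_at END) AS viewed_at,
        MIN(CASE WHEN event_type = 'signup' THEN created_at END) AS signed_up_at,
        MIN(CASE WHEN event_type = 'purchase' THEN created_at END) AS purchased_at
    FROM events
    WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY user_id
)
SELECT
    COUNT(viewed_at) AS step_1_visitors,
    COUNT(CASE WHEN signed_up_at >= viewed_at THEN 1 END) AS step_2_in_order,
    COUNT(CASE WHEN purchased_at >= signed_up_at
               AND signed_up_at >= viewed_at THEN 1 END) AS step_3_in_order
FROM step_times;
```

COUNT skips NULLs, and any comparison against a NULL timestamp is NULL, so users missing an earlier step drop out of later counts automatically.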
4. Retention Analysis
Retention queries answer: "Of users who started in week/month X, what percentage came back in subsequent periods?" This is the single most important metric for subscription and SaaS businesses.
WITH first_activity AS (
SELECT
user_id,
MIN(DATE_TRUNC('week', created_at)) AS first_week
FROM events
GROUP BY user_id
),
weekly_activity AS (
SELECT DISTINCT
user_id,
DATE_TRUNC('week', created_at) AS active_week
FROM events
)
SELECT
fa.first_week,
COUNT(DISTINCT fa.user_id) AS cohort_users,
COUNT(DISTINCT CASE WHEN wa.active_week = fa.first_week + INTERVAL '1 week' THEN wa.user_id END) AS week_1,
COUNT(DISTINCT CASE WHEN wa.active_week = fa.first_week + INTERVAL '2 weeks' THEN wa.user_id END) AS week_2,
COUNT(DISTINCT CASE WHEN wa.active_week = fa.first_week + INTERVAL '4 weeks' THEN wa.user_id END) AS week_4,
COUNT(DISTINCT CASE WHEN wa.active_week = fa.first_week + INTERVAL '8 weeks' THEN wa.user_id END) AS week_8,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN wa.active_week = fa.first_week + INTERVAL '1 week' THEN wa.user_id END)
/ COUNT(DISTINCT fa.user_id), 1) AS week_1_retention
FROM first_activity fa
LEFT JOIN weekly_activity wa ON fa.user_id = wa.user_id
GROUP BY fa.first_week
ORDER BY fa.first_week DESC;
The key difference from cohort analysis: retention uses first activity date rather than signup date, and focuses on return visits rather than any specific action. The LEFT JOIN ensures users who never returned still appear in the cohort count.
5. Revenue Metrics (MRR, ARPU, LTV)
Revenue queries calculate the financial health of a business. Monthly Recurring Revenue (MRR), Average Revenue Per User (ARPU), and Lifetime Value (LTV) are the three metrics every analyst must track.
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', payment_date) AS month,
COUNT(DISTINCT user_id) AS paying_users,
SUM(amount) AS total_revenue,
SUM(CASE WHEN is_recurring = true THEN amount ELSE 0 END) AS mrr
FROM payments
WHERE status = 'succeeded'
GROUP BY DATE_TRUNC('month', payment_date)
),
user_ltv AS (
SELECT
user_id,
SUM(amount) AS lifetime_value,
MIN(payment_date) AS first_payment,
MAX(payment_date) AS last_payment,
COUNT(*) AS total_payments
FROM payments
WHERE status = 'succeeded'
GROUP BY user_id
)
SELECT
mr.month,
mr.mrr,
mr.total_revenue,
mr.paying_users,
ROUND(mr.total_revenue / NULLIF(mr.paying_users, 0), 2) AS arpu,
ROUND(AVG(ul.lifetime_value), 2) AS avg_ltv,
ROUND((PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ul.lifetime_value))::NUMERIC, 2) AS median_ltv
FROM monthly_revenue mr
CROSS JOIN LATERAL (
SELECT lifetime_value
FROM user_ltv
WHERE first_payment <= mr.month + INTERVAL '1 month'
) ul
GROUP BY mr.month, mr.mrr, mr.total_revenue, mr.paying_users
ORDER BY mr.month DESC;
This query separates recurring revenue from total revenue, calculates ARPU per month, and computes the average and median lifetime value across users who had made their first payment by the end of that month. The median matters because LTV distributions are typically skewed by a few high-value customers. Note the ::NUMERIC cast on PERCENTILE_CONT: it returns double precision, which PostgreSQL's two-argument ROUND does not accept.
6. Growth Rate Calculations
Growth queries compare a metric across time periods. Month-over-month (MoM) and week-over-week (WoW) growth rates reveal trends that raw numbers hide.
WITH monthly_metrics AS (
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(DISTINCT user_id) AS active_users,
SUM(amount) AS revenue
FROM events
WHERE event_type = 'purchase'
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
month,
active_users,
revenue,
LAG(active_users) OVER (ORDER BY month) AS prev_month_users,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(100.0 * (active_users - LAG(active_users) OVER (ORDER BY month))
/ NULLIF(LAG(active_users) OVER (ORDER BY month), 0), 1) AS user_growth_pct,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS revenue_growth_pct
FROM monthly_metrics
ORDER BY month DESC;
The LAG window function accesses the previous row's value without a self-join. This is cleaner and faster than joining the table to itself. You can calculate WoW growth by changing DATE_TRUNC to 'week'.
7. User Segmentation
Segmentation queries divide users into groups based on behavior, value, or attributes. The most common pattern is RFM (Recency, Frequency, Monetary) segmentation.
WITH user_rfm AS (
SELECT
user_id,
EXTRACT(DAY FROM NOW() - MAX(created_at)) AS recency_days,
COUNT(*) AS frequency,
SUM(amount) AS monetary
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '12 months'
GROUP BY user_id
),
scored AS (
SELECT
user_id,
recency_days,
frequency,
monetary,
NTILE(5) OVER (ORDER BY recency_days DESC) AS r_score,
NTILE(5) OVER (ORDER BY frequency ASC) AS f_score,
NTILE(5) OVER (ORDER BY monetary ASC) AS m_score
FROM user_rfm
)
SELECT
user_id,
recency_days,
frequency,
monetary,
r_score,
f_score,
m_score,
CASE
WHEN r_score >= 4 AND f_score >= 4 THEN 'Champion'
WHEN r_score >= 4 AND f_score <= 2 THEN 'New Customer'
WHEN r_score <= 2 AND f_score >= 4 THEN 'At Risk'
WHEN r_score <= 2 AND f_score <= 2 THEN 'Lost'
ELSE 'Regular'
END AS segment
FROM scored
ORDER BY monetary DESC;
NTILE(5) divides users into 5 equal-sized buckets for each dimension. Pick each ORDER BY direction so that a score of 5 consistently means "best" (most recent, most frequent, highest spend); otherwise the segment labels invert. The CASE statement maps score combinations to human-readable segments. Adjust the segment definitions to match your business model.
8. A/B Test Analysis
A/B test queries compare metrics between a control group and one or more variants. The query needs to calculate conversion rates, averages, and sample sizes for statistical significance assessment.
WITH experiment_users AS (
SELECT
user_id,
variant,
MIN(created_at) AS assignment_date
FROM experiment_assignments
WHERE experiment_name = 'checkout_redesign'
GROUP BY user_id, variant
),
experiment_results AS (
SELECT
eu.variant,
COUNT(DISTINCT eu.user_id) AS total_users,
COUNT(DISTINCT CASE WHEN o.id IS NOT NULL THEN eu.user_id END) AS converted_users,
SUM(COALESCE(o.amount, 0)) AS total_revenue,
SUM(COALESCE(o.amount, 0)) / NULLIF(COUNT(DISTINCT eu.user_id), 0) AS avg_revenue_per_user
FROM experiment_users eu
LEFT JOIN orders o ON eu.user_id = o.user_id
AND o.created_at >= eu.assignment_date
AND o.created_at <= eu.assignment_date + INTERVAL '14 days'
GROUP BY eu.variant
)
SELECT
variant,
total_users,
converted_users,
ROUND(100.0 * converted_users / NULLIF(total_users, 0), 2) AS conversion_rate,
ROUND(avg_revenue_per_user, 2) AS avg_revenue_per_user,
ROUND(total_revenue / NULLIF(converted_users, 0), 2) AS avg_order_value
FROM experiment_results
ORDER BY variant;
The 14-day window after assignment gives every user the same observation period, so users assigned early in the experiment do not accumulate extra conversions relative to late assignees. The LEFT JOIN keeps users who did not convert in the denominator. You will still need a statistical significance test outside SQL for a proper verdict, but this query produces the inputs.
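If you want at least a rough significance read without leaving the database, the standard two-proportion z-test can be computed from the same per-variant totals. This is a sketch, assuming the results above have been materialized as a table or CTE named experiment_results with exactly two variants, here hypothetically named 'control' and 'treatment':

```sql
WITH v AS (
    SELECT
        SUM(CASE WHEN variant = 'control' THEN converted_users END) AS x1,
        SUM(CASE WHEN variant = 'control' THEN total_users END) AS n1,
        SUM(CASE WHEN variant = 'treatment' THEN converted_users END) AS x2,
        SUM(CASE WHEN variant = 'treatment' THEN total_users END) AS n2
    FROM experiment_results
)
SELECT
    -- (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))
    ROUND(
        (x1::NUMERIC / n1 - x2::NUMERIC / n2)
        / SQRT(
            ((x1 + x2)::NUMERIC / (n1 + n2))
            * (1 - (x1 + x2)::NUMERIC / (n1 + n2))
            * (1.0 / n1 + 1.0 / n2)
        ),
    3) AS z_score
FROM v;
```

An absolute z-score above roughly 1.96 corresponds to p < 0.05 under the normal approximation; for small samples or exact p-values, use a statistics library instead.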
9. Time Series Analysis
Time series queries track a metric over time and compare it to previous periods. Rolling averages smooth out daily noise to reveal the underlying trend.
WITH daily_metrics AS (
SELECT
DATE_TRUNC('day', created_at) AS day,
COUNT(*) AS daily_count,
SUM(amount) AS daily_revenue
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day', created_at)
)
SELECT
day,
daily_count,
daily_revenue,
ROUND(AVG(daily_count) OVER (
ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 1) AS rolling_7d_avg_count,
ROUND(AVG(daily_revenue) OVER (
ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS rolling_7d_avg_revenue,
SUM(daily_revenue) OVER (
ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_total_revenue,
LAG(daily_count, 7) OVER (ORDER BY day) AS same_day_last_week,
ROUND(100.0 * (daily_count - LAG(daily_count, 7) OVER (ORDER BY day))
/ NULLIF(LAG(daily_count, 7) OVER (ORDER BY day), 0), 1) AS wow_change_pct
FROM daily_metrics
ORDER BY day DESC;
The ROWS BETWEEN 6 PRECEDING AND CURRENT ROW creates a 7-day rolling window. LAG(daily_count, 7) compares each day to the same day last week, which accounts for day-of-week seasonality. This is the pattern most analytics dashboards use internally.
10. Ranking and Top-N Queries
Ranking queries identify the top (or bottom) performers across any dimension: top products by revenue, top customers by order count, top pages by conversion rate.
WITH product_metrics AS (
SELECT
p.product_name,
p.category,
COUNT(DISTINCT oi.order_id) AS order_count,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
COUNT(DISTINCT o.user_id) AS unique_buyers
FROM order_items oi
JOIN products p ON oi.product_id = p.id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY p.product_name, p.category
)
SELECT
product_name,
category,
order_count,
units_sold,
total_revenue,
unique_buyers,
RANK() OVER (ORDER BY total_revenue DESC) AS overall_rank,
RANK() OVER (PARTITION BY category ORDER BY total_revenue DESC) AS category_rank,
ROUND(100.0 * total_revenue / SUM(total_revenue) OVER (), 2) AS revenue_share_pct,
SUM(total_revenue) OVER (ORDER BY total_revenue DESC) AS cumulative_revenue
FROM product_metrics
ORDER BY total_revenue DESC;
RANK() with PARTITION BY gives you rankings within each category. The SUM() OVER () without an ORDER BY gives you the grand total for calculating percentage share. The cumulative revenue column helps identify the 80/20 rule: often 20% of products generate 80% of revenue.
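A common follow-up is keeping only the top N rows per group, say the top three products in each category. Window functions cannot appear in a WHERE clause, so the ranked result has to be wrapped in an outer query. A sketch, assuming the product_metrics CTE above is available as a table or CTE:

```sql
SELECT product_name, category, total_revenue, category_rank
FROM (
    SELECT
        product_name,
        category,
        total_revenue,
        -- rank products within their category by revenue
        ROW_NUMBER() OVER (
            PARTITION BY category ORDER BY total_revenue DESC
        ) AS category_rank
    FROM product_metrics
) ranked
WHERE category_rank <= 3
ORDER BY category, category_rank;
```

ROW_NUMBER guarantees at most three rows per category even when revenues tie; swap in RANK() if tied products should all make the cut.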
11. Correlation Analysis
Correlation queries measure whether two metrics move together. PostgreSQL, BigQuery, and most cloud warehouses ship a built-in CORR() aggregate for Pearson correlation; where it is missing, you can assemble the same statistic from basic aggregate functions.
WITH user_metrics AS (
SELECT
user_id,
COUNT(DISTINCT DATE_TRUNC('day', created_at)) AS active_days,
SUM(CASE WHEN event_type = 'purchase' THEN amount ELSE 0 END) AS total_spend,
COUNT(CASE WHEN event_type = 'support_ticket' THEN 1 END) AS support_tickets,
COUNT(CASE WHEN event_type = 'feature_used' THEN 1 END) AS features_used
FROM events
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY user_id
)
SELECT
ROUND(CORR(active_days, total_spend)::NUMERIC, 3) AS activity_spend_corr,
ROUND(CORR(features_used, total_spend)::NUMERIC, 3) AS features_spend_corr,
ROUND(CORR(support_tickets, total_spend)::NUMERIC, 3) AS tickets_spend_corr,
ROUND(CORR(active_days, features_used)::NUMERIC, 3) AS activity_features_corr,
COUNT(*) AS sample_size
FROM user_metrics
WHERE active_days > 0;
PostgreSQL and BigQuery support CORR() natively. The result ranges from -1 (perfect negative correlation) to 1 (perfect positive). Values above 0.5 or below -0.5 suggest a meaningful relationship worth investigating further. Filter out zero-activity users to avoid skewing the correlation.
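For engines without CORR(), such as MySQL and SQL Server, Pearson's r can be assembled from plain aggregates: the covariance divided by the product of the population standard deviations. A sketch for one metric pair, assuming the user_metrics CTE above is available as a table or CTE:

```sql
SELECT
    -- r = (E[XY] - E[X]E[Y]) / (stddev_pop(X) * stddev_pop(Y))
    ROUND(
        ( (AVG(active_days * total_spend) - AVG(active_days) * AVG(total_spend))
          / NULLIF(STDDEV_POP(active_days) * STDDEV_POP(total_spend), 0)
        )::NUMERIC,
    3) AS activity_spend_corr
FROM user_metrics
WHERE active_days > 0;
```

The population standard deviation (STDDEV_POP; STDEVP in SQL Server) is required here because the AVG-based covariance is also a population estimate; mixing in the sample STDDEV would bias r. Drop the ::NUMERIC cast outside PostgreSQL.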
12. Anomaly Detection
Anomaly detection queries flag data points that deviate significantly from the norm. The standard approach uses z-scores: how many standard deviations a value is from the mean.
WITH daily_metrics AS (
SELECT
DATE_TRUNC('day', created_at) AS day,
COUNT(*) AS daily_orders,
SUM(amount) AS daily_revenue
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day', created_at)
),
stats AS (
SELECT
AVG(daily_orders) AS avg_orders,
STDDEV(daily_orders) AS stddev_orders,
AVG(daily_revenue) AS avg_revenue,
STDDEV(daily_revenue) AS stddev_revenue
FROM daily_metrics
)
SELECT
dm.day,
dm.daily_orders,
dm.daily_revenue,
ROUND((dm.daily_orders - s.avg_orders) / NULLIF(s.stddev_orders, 0), 2) AS orders_z_score,
ROUND((dm.daily_revenue - s.avg_revenue) / NULLIF(s.stddev_revenue, 0), 2) AS revenue_z_score,
CASE
WHEN ABS((dm.daily_orders - s.avg_orders) / NULLIF(s.stddev_orders, 0)) > 2 THEN 'ANOMALY'
WHEN ABS((dm.daily_orders - s.avg_orders) / NULLIF(s.stddev_orders, 0)) > 1.5 THEN 'WARNING'
ELSE 'NORMAL'
END AS order_status,
CASE
WHEN ABS((dm.daily_revenue - s.avg_revenue) / NULLIF(s.stddev_revenue, 0)) > 2 THEN 'ANOMALY'
WHEN ABS((dm.daily_revenue - s.avg_revenue) / NULLIF(s.stddev_revenue, 0)) > 1.5 THEN 'WARNING'
ELSE 'NORMAL'
END AS revenue_status
FROM daily_metrics dm
CROSS JOIN stats s
ORDER BY dm.day DESC;
A z-score above 2 or below -2 means the value is more than 2 standard deviations from the mean, which happens less than 5% of the time in a normal distribution. This query flags those days as anomalies. Use this to build automated alerts for unusual spikes or drops in your key metrics.
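One refinement worth considering: the global mean and standard deviation above include the anomalous days themselves, and a steady growth trend widens both until real spikes stop triggering. A hedged variant scores each day against a trailing 30-day window instead, reusing the daily_metrics CTE (which would need to pull extra history so the earliest rows have a full baseline):

```sql
SELECT
    day,
    daily_orders,
    -- z-score against the prior 30 days, excluding the current day
    ROUND(
        (daily_orders - AVG(daily_orders) OVER trailing_30)
        / NULLIF(STDDEV(daily_orders) OVER trailing_30, 0),
    2) AS rolling_z_score
FROM daily_metrics
WINDOW trailing_30 AS (ORDER BY day ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING)
ORDER BY day DESC;
```

Excluding the current day from its own baseline (ROWS ... AND 1 PRECEDING) keeps a large spike from inflating the very statistics used to judge it.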
How AI Tools Speed Up Analytics SQL
Writing analytics queries from scratch takes time. A cohort analysis query can take 30 minutes to build and debug. A funnel query with five steps requires careful conditional aggregation. These are the cases where AI SQL tools deliver the most value.
With AI2SQL, you describe what you need in plain English:
- "Show me monthly retention rates for users who signed up in Q1 2026" generates a cohort retention query with the correct date logic.
- "Calculate week-over-week revenue growth by product category" produces the right combination of window functions and GROUP BY.
- "Find anomalous days where orders dropped more than 2 standard deviations" builds the z-score calculation automatically.
AI tools are especially useful when you know what you want but are unsure about the exact window function syntax, the right date arithmetic for your database, or how to structure nested CTEs. They reduce time-to-query from minutes to seconds for experienced analysts and make these patterns accessible to team members without deep SQL backgrounds.
The queries in this guide are written in PostgreSQL syntax. Other engines need small translations: BigQuery reverses the DATE_TRUNC argument order and uses CAST() instead of ::, while MySQL and SQL Server handle functions like DATE_TRUNC, CORR, and PERCENTILE_CONT differently or not at all. An AI SQL tool handles these translations automatically when you specify your database type.
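As an illustration of the kind of translation involved, truncating a timestamp to the month looks different in each dialect. The forms below are the common ones; check your engine's version, since exact behavior varies with column types:

```sql
-- PostgreSQL / Redshift: unit as a string, column second
SELECT DATE_TRUNC('month', created_at) FROM orders;

-- BigQuery: column first, unit as a keyword (TIMESTAMP_TRUNC for timestamps)
SELECT DATE_TRUNC(created_at, MONTH) FROM orders;

-- MySQL: no DATE_TRUNC; format back to the first of the month
SELECT DATE_FORMAT(created_at, '%Y-%m-01') FROM orders;

-- SQL Server 2022+: DATETRUNC; older versions rebuild the date
SELECT DATETRUNC(month, created_at) FROM orders;
SELECT DATEFROMPARTS(YEAR(created_at), MONTH(created_at), 1) FROM orders;
```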
Try AI2SQL free to generate analytics queries from plain English. Connect your schema for accurate, database-specific output.
Frequently Asked Questions
What SQL skills do data analysts need?
Data analysts need to master SELECT queries, JOINs, GROUP BY with aggregation, window functions (ROW_NUMBER, RANK, LAG, LEAD), CTEs for readability, date functions for time-series analysis, and CASE WHEN for conditional logic. These cover the vast majority of analytics work including dashboards, cohort analysis, funnel queries, and retention metrics.
Is SQL enough for a data analytics career?
SQL is the single most important skill for data analysts and is sufficient for many analyst roles. According to job postings in 2026, over 90% of data analyst positions list SQL as a requirement. However, adding Python or R for statistical analysis, and a visualization tool like Tableau or Power BI, makes you significantly more competitive.
How do I write a cohort analysis query in SQL?
A cohort analysis query groups users by their signup month (the cohort), then tracks their activity in subsequent months. Use DATE_TRUNC to assign cohort months, a LEFT JOIN to match activity data, and GROUP BY cohort_month plus period_number. Window functions or CASE WHEN statements help calculate retention rates per cohort.
What is the difference between SQL for analytics and SQL for application development?
Analytics SQL focuses on reading and aggregating data: complex SELECT queries, window functions, CTEs, and GROUP BY operations across large datasets. Application SQL focuses on transactional operations: INSERT, UPDATE, DELETE with strict constraints, indexing for fast lookups, and connection pooling. Analysts rarely write to the database; they query read replicas or data warehouses.
Can AI tools help write analytics SQL queries?
Yes. AI SQL tools like AI2SQL let you describe your analytics question in plain English and generate the correct query automatically. This is especially useful for complex patterns like cohort analysis, funnel queries, or window function calculations. AI tools speed up query writing by 3-5x for experienced analysts and make SQL accessible to non-technical team members.