BigQuery SQL Tutorial: Complete Guide with Examples (2026)
A practical reference for BigQuery SQL covering syntax differences, ARRAY and STRUCT handling, window functions, cost optimization, and the most common mistakes engineers make when moving to BigQuery.
What Is BigQuery?
Google BigQuery is a fully managed, serverless data warehouse built for large-scale analytics. Unlike traditional databases that require you to provision servers and manage clusters, BigQuery handles all infrastructure automatically. You load data, run queries, and pay for what you scan — there are no virtual warehouses to size, no clusters to tune, and no idle compute costs when queries are not running.
Under the hood, BigQuery uses a columnar storage format called Capacitor and a distributed query engine called Dremel. This architecture makes it exceptional at scanning billions of rows across hundreds of columns in seconds. It is designed for analytical workloads — aggregations, GROUP BY-heavy reporting, time-series analysis — rather than high-concurrency transactional writes. BigQuery can hold petabytes of data in a single dataset and query across multiple projects without any data movement.
BigQuery SQL vs Standard SQL
BigQuery uses GoogleSQL (formerly BigQuery Standard SQL), which is ANSI SQL-compliant with several important differences you need to know before writing your first query.
Table references use backticks and dot notation. Every table is referenced as project_id.dataset_name.table_name wrapped in backticks. Hyphens in project names are valid but require backtick quoting.
-- Standard SQL (most databases)
SELECT * FROM orders;
-- BigQuery
SELECT * FROM `my-project.sales_data.orders`;
Native ARRAY and STRUCT types. BigQuery supports complex nested data types natively. A single row can contain an array of values or a struct (record) with multiple named fields. This is common with data exported from event tracking systems or APIs.
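As an illustrative sketch (the column names here are hypothetical), a single row can carry both an array and a struct built inline:

```sql
-- Illustrative only: hypothetical column names
SELECT
  'order_123' AS order_id,
  ['shirt', 'hat', 'socks'] AS item_names,              -- ARRAY<STRING>
  STRUCT('US' AS country, 'en' AS lang) AS locale_info; -- STRUCT<country STRING, lang STRING>
```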
UNNEST to flatten arrays. To work with array elements as individual rows, use UNNEST. This does not exist in most traditional SQL dialects.
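A minimal sketch, assuming a hypothetical orders table with an item_names ARRAY&lt;STRING&gt; column:

```sql
-- Each element of the array becomes its own row
SELECT
  order_id,
  item_name
FROM `my-project.sales_data.orders`,
  UNNEST(item_names) AS item_name;
```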
DATE_TRUNC instead of TRUNC for dates. In BigQuery, DATE_TRUNC(date_col, MONTH) truncates a date to the start of the month. TRUNC works on numeric values only, unlike in dialects such as Oracle where TRUNC also accepts dates.
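For example:

```sql
SELECT DATE_TRUNC(DATE '2026-04-09', MONTH) AS month_start;  -- 2026-04-01
SELECT TRUNC(3.7) AS n;                                      -- 3.0 (numeric only)
```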
UPDATE requires a WHERE clause. BigQuery DML enforces that every UPDATE and DELETE includes a WHERE clause. You cannot run UPDATE table SET col = val without a filter.
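For instance (hypothetical table; WHERE TRUE is how you deliberately touch every row):

```sql
-- Fails: BigQuery rejects UPDATE without a WHERE clause
UPDATE `my-project.sales_data.orders` SET status = 'archived';

-- Works: the filter makes the intent explicit
UPDATE `my-project.sales_data.orders`
SET status = 'archived'
WHERE created_at < '2025-01-01';

-- Deliberately update every row
UPDATE `my-project.sales_data.orders`
SET status = 'archived'
WHERE TRUE;
```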
EXCEPT DISTINCT instead of EXCEPT. BigQuery requires EXCEPT DISTINCT rather than bare EXCEPT for set operations.
-- BigQuery set difference
SELECT user_id FROM active_users
EXCEPT DISTINCT
SELECT user_id FROM churned_users;
Basic Queries in BigQuery
Standard SELECT, WHERE, GROUP BY, and ORDER BY all work as expected. The main adjustment is the table reference format.
-- Count orders by status for the last 30 days
SELECT
  status,
  COUNT(*) AS order_count,
  SUM(amount_usd) AS total_revenue
FROM `my-project.ecommerce.orders`
WHERE created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY status
ORDER BY total_revenue DESC;
-- Join customers to orders
SELECT
  c.customer_id,
  c.email,
  COUNT(o.order_id) AS lifetime_orders,
  SUM(o.amount_usd) AS lifetime_value
FROM `my-project.ecommerce.customers` AS c
LEFT JOIN `my-project.ecommerce.orders` AS o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.email
ORDER BY lifetime_value DESC
LIMIT 100;
CTEs work exactly as in standard SQL and are the recommended way to structure complex logic in BigQuery:
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC(created_at, MONTH) AS month,
    SUM(amount_usd) AS revenue
  FROM `my-project.ecommerce.orders`
  WHERE status = 'completed'
  GROUP BY month
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
  -- SAFE_DIVIDE returns NULL instead of erroring if a month had zero revenue
  ROUND(SAFE_DIVIDE(revenue - LAG(revenue) OVER (ORDER BY month), LAG(revenue) OVER (ORDER BY month)) * 100, 2) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month;
BigQuery-Specific Functions
BigQuery ships with a set of functions not available in most other SQL dialects. These cover date math, safe arithmetic, and advanced aggregation.
DATE_TRUNC — Truncate a timestamp or date to a time boundary:
SELECT
  DATE_TRUNC(event_timestamp, WEEK) AS week_start,
  COUNT(*) AS events
FROM `my-project.analytics.events`
GROUP BY week_start
ORDER BY week_start;
TIMESTAMP_DIFF — Calculate the difference between two timestamps in a given unit:
SELECT
  session_id,
  TIMESTAMP_DIFF(session_end, session_start, MINUTE) AS session_duration_minutes
FROM `my-project.analytics.sessions`;
ARRAY_AGG — Aggregate values from multiple rows into a single array:
SELECT
  user_id,
  ARRAY_AGG(product_id ORDER BY purchased_at) AS purchase_sequence
FROM `my-project.ecommerce.orders`
GROUP BY user_id;
STRING_AGG — Concatenate string values across rows with a delimiter:
SELECT
  order_id,
  STRING_AGG(tag, ', ' ORDER BY tag) AS tags
FROM `my-project.ecommerce.order_tags`
GROUP BY order_id;
COUNTIF — Count rows matching a condition without a CASE WHEN:
SELECT
  DATE_TRUNC(created_at, DAY) AS day,
  COUNT(*) AS total_orders,
  COUNTIF(status = 'refunded') AS refunded_orders,
  SAFE_DIVIDE(COUNTIF(status = 'refunded'), COUNT(*)) AS refund_rate
FROM `my-project.ecommerce.orders`
GROUP BY day;
SAFE_DIVIDE — Divide two numbers and return NULL instead of raising a divide-by-zero error when the denominator is zero. This replaces the common numerator / NULLIF(denominator, 0) workaround.
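A minimal sketch, assuming a hypothetical page_stats table with conversions and visitors columns:

```sql
-- Rows where visitors = 0 get a NULL rate instead of an error
SELECT
  page,
  SAFE_DIVIDE(conversions, visitors) AS conversion_rate
FROM `my-project.analytics.page_stats`;
```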
COALESCE — Return the first non-NULL value from a list of expressions. Works identically to other SQL dialects:
SELECT
  user_id,
  COALESCE(display_name, email, CAST(user_id AS STRING)) AS best_name
FROM `my-project.users.profiles`;
Working with ARRAYs and STRUCTs
BigQuery's support for nested and repeated fields is one of its most powerful — and most misunderstood — features. Data loaded from JSON APIs, Firebase, or Google Analytics 4 often contains arrays of structs within a single row.
To query inside an array, use UNNEST with a CROSS JOIN (or the shorthand comma syntax). Each element of the array becomes a new row in the result:
-- GA4-style events table with an items array
SELECT
  event_date,
  event_name,
  item.item_id,
  item.item_name,
  item.price
FROM `my-project.analytics.events`
CROSS JOIN UNNEST(items) AS item
WHERE event_name = 'purchase'
  AND event_date = '2026-04-01';
Accessing nested STRUCT fields uses dot notation. If a column user_info is of type STRUCT, access its fields with user_info.email:
SELECT
  event_timestamp,
  user_info.email,
  user_info.country,
  device.category AS device_type
FROM `my-project.analytics.sessions`
WHERE user_info.country = 'US';
You can also filter inside arrays without flattening using EXISTS or IN with a subquery over UNNEST:
-- Sessions where at least one page was the pricing page
SELECT session_id, user_id
FROM `my-project.analytics.sessions`
WHERE 'pricing' IN UNNEST(pages_visited);
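The EXISTS form expresses the same idea and is useful when the condition is more than simple equality; this sketch uses STARTS_WITH as an illustrative predicate:

```sql
-- Sessions where at least one visited page starts with 'pricing'
SELECT session_id, user_id
FROM `my-project.analytics.sessions`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(pages_visited) AS page
  WHERE STARTS_WITH(page, 'pricing')
);
```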
Working with BigQuery? Connect your project to AI2SQL and query nested data in plain English — no UNNEST syntax required.
BigQuery Window Functions
BigQuery supports the full set of SQL window functions. The syntax is identical to standard SQL, making it straightforward if you have used window functions in PostgreSQL or Snowflake. See our complete window functions guide for deeper coverage.
Common patterns in BigQuery analytics:
-- Running total and 7-day moving average of daily revenue
SELECT
  day,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY day) AS running_total,
  AVG(daily_revenue) OVER (
    ORDER BY day
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7d
FROM (
  SELECT
    DATE(created_at) AS day,
    SUM(amount_usd) AS daily_revenue
  FROM `my-project.ecommerce.orders`
  WHERE status = 'completed'
  GROUP BY day
)
ORDER BY day;
-- Rank customers by lifetime value within each country
SELECT
  customer_id,
  country,
  lifetime_value,
  RANK() OVER (PARTITION BY country ORDER BY lifetime_value DESC) AS country_rank,
  NTILE(10) OVER (ORDER BY lifetime_value DESC) AS global_decile
FROM `my-project.ecommerce.customer_summary`;
-- Find the first purchase date per user
SELECT DISTINCT
  user_id,
  FIRST_VALUE(created_at) OVER (
    PARTITION BY user_id
    ORDER BY created_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first_purchase_date
FROM `my-project.ecommerce.orders`;
Cost Optimization
BigQuery pricing is based on bytes processed (on-demand pricing) or reserved slots. Because it is columnar, the most impactful optimization is column selection. Every column you omit from your SELECT clause is a column BigQuery does not read.
Avoid SELECT *. On a wide table with 100 columns, selecting all of them forces a full column scan. If you only need 5 columns, you pay for 5 columns.
-- Expensive: scans all columns
SELECT * FROM `my-project.ecommerce.orders`;
-- Cheap: scans only what you need
SELECT order_id, customer_id, amount_usd, status
FROM `my-project.ecommerce.orders`;
Partition by date. BigQuery supports partitioned tables, most commonly by a DATE or TIMESTAMP column. When you filter on the partition column, BigQuery skips entire partitions rather than scanning the full table. Always include a partition filter in your WHERE clause when querying partitioned tables.
-- Partition pruning: only reads data for the specified date range
SELECT event_name, COUNT(*) AS cnt
FROM `my-project.analytics.events`
WHERE event_date BETWEEN '2026-03-01' AND '2026-03-31'
GROUP BY event_name;
Use clustering keys. Clustering sorts table data by one or more columns within each partition. Queries that filter on cluster keys skip blocks of data, reducing bytes scanned without the overhead of a full table scan.
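As a sketch, a hypothetical events table partitioned by date and clustered on two columns could be defined like this:

```sql
CREATE TABLE `my-project.analytics.events_clustered` (
  event_date DATE,
  event_name STRING,
  user_id STRING,
  payload JSON
)
PARTITION BY event_date          -- queries filtering on event_date prune partitions
CLUSTER BY event_name, user_id;  -- queries filtering on these keys skip blocks
```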
Preview bytes before running. In the BigQuery Console, the query validator shows estimated bytes processed before you click Run. Use this to catch expensive queries before they execute.
Use approximate aggregation for large datasets. Functions like APPROX_COUNT_DISTINCT are significantly faster and cheaper than exact COUNT(DISTINCT) and typically land within about 1% of the exact value, which is close enough for most dashboards and exploratory analysis.
-- Exact (slower, more expensive on large tables)
SELECT COUNT(DISTINCT user_id) FROM `my-project.analytics.events`;
-- Approximate (faster, cheaper, ~1% error)
SELECT APPROX_COUNT_DISTINCT(user_id) FROM `my-project.analytics.events`;
BigQuery vs Snowflake vs Redshift
All three platforms handle analytical SQL workloads well. The right choice depends on your cloud provider, data volume patterns, and team preferences. See also our Snowflake SQL tutorial for side-by-side syntax comparisons.
| Factor | BigQuery | Snowflake | Redshift |
|---|---|---|---|
| Cloud | Google Cloud (GCP) | Multi-cloud (AWS, GCP, Azure) | AWS |
| Pricing model | Bytes scanned or slots | Compute credits by second | Node-based or Serverless |
| Infrastructure | Fully serverless | Virtual warehouses (managed) | Clusters or Serverless |
| Nested data | Native ARRAY/STRUCT | VARIANT (semi-structured) | SUPER type |
| Best for | GCP teams, ad hoc analytics | Cross-cloud data sharing | AWS-native steady workloads |
| SQL dialect | GoogleSQL | Snowflake SQL | PostgreSQL-compatible |
Common BigQuery SQL Mistakes
These are the errors that trip up engineers moving to BigQuery from other platforms.
Forgetting backticks on table names. Fully qualified BigQuery table references need backticks whenever an identifier contains characters that are not legal in a bare identifier, most commonly a hyphen in the project ID. Omitting them causes a syntax error.
-- Wrong
SELECT * FROM my-project.dataset.table;
-- Correct
SELECT * FROM `my-project.dataset.table`;
Using LIMIT to reduce cost. LIMIT controls the number of rows returned but does not reduce bytes scanned. BigQuery scans the full dataset before applying LIMIT. Use WHERE filters and column selection to reduce cost.
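To illustrate (hypothetical table), both of these queries process the same bytes from the event_name column; only the number of rows returned to the client differs:

```sql
-- Scans the full event_name column
SELECT event_name FROM `my-project.analytics.events`;

-- Scans the same bytes; LIMIT only trims the rows returned
SELECT event_name FROM `my-project.analytics.events` LIMIT 10;
```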
Implicit cross joins on arrays. If you reference an array field in the SELECT without UNNEST, BigQuery returns the entire array as a single value. Use CROSS JOIN UNNEST when you want one row per array element.
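A sketch of the difference, assuming a hypothetical items ARRAY column on the orders table:

```sql
-- One row per order; items arrives as a single ARRAY value
SELECT order_id, items
FROM `my-project.ecommerce.orders`;

-- One row per array element
SELECT order_id, item
FROM `my-project.ecommerce.orders`
CROSS JOIN UNNEST(items) AS item;
```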
Using non-partition filters on partitioned tables. If your WHERE clause filters on a non-partition column, BigQuery cannot prune partitions and will scan the full table. Check that your WHERE clause includes the partition column directly.
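For example, assuming the events table is partitioned on event_date:

```sql
-- Prunes: filters directly on the partition column
SELECT event_name
FROM `my-project.analytics.events`
WHERE event_date BETWEEN '2026-03-01' AND '2026-03-31';

-- Full scan: the filter touches no partition column
SELECT event_name
FROM `my-project.analytics.events`
WHERE user_id = 'abc123';
```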
Treating DATE and TIMESTAMP as interchangeable. BigQuery has separate DATE, TIME, DATETIME, and TIMESTAMP types. TIMESTAMP stores UTC time with timezone awareness; DATETIME does not. Mixing them in comparisons produces type errors or unexpected results. Use explicit casting when needed:
-- Cast a TIMESTAMP to DATE for partition filtering
SELECT *
FROM `my-project.analytics.events`
WHERE DATE(event_timestamp) = '2026-04-09';
Writing UPDATE without WHERE. BigQuery requires a WHERE clause on every UPDATE and DELETE. A bare UPDATE fails with an error telling you the WHERE clause is missing. If you genuinely need to update all rows, use WHERE TRUE.
For more on joining patterns, see our guide on SQL JOINs explained.
Frequently Asked Questions
Is BigQuery SQL the same as standard SQL?
BigQuery uses GoogleSQL, which is ANSI SQL-compliant with extensions. Most standard SQL knowledge transfers directly. The main differences are the backtick table naming convention, native ARRAY/STRUCT types with UNNEST, and functions like DATE_TRUNC and TIMESTAMP_DIFF.
How do I write a basic SELECT query in BigQuery?
Reference the table using its fully qualified name in backticks: SELECT col1, col2 FROM `project_id.dataset_name.table_name` WHERE condition LIMIT 100. If you are working within the same project, you can omit the project prefix.
Why does SELECT * cost more in BigQuery?
BigQuery charges by bytes scanned, not rows. Because it is columnar, each column is stored separately. Selecting all columns forces BigQuery to read every column's data. Selecting only the columns you need can reduce query costs by 50-90% on wide tables.
What is UNNEST in BigQuery?
UNNEST flattens an ARRAY column into individual rows. A row with an array of 5 items produces 5 output rows after UNNEST. It is commonly paired with CROSS JOIN to expand repeated fields in nested data loaded from JSON or event tracking systems.
How does BigQuery compare to Snowflake and Redshift?
BigQuery is best for GCP-native teams and ad hoc large-scale analytics with no infrastructure management. Snowflake excels at cross-cloud workloads and real-time data sharing. Redshift is the natural fit for AWS-centric stacks with predictable, steady query workloads. All three support standard SQL with dialect-specific differences.