BigQuery SQL Tutorial: Complete Guide with Examples (2026)
A practical reference for BigQuery SQL covering syntax differences, ARRAY and STRUCT handling, window functions, cost optimization, and the most common mistakes engineers make when moving to BigQuery.
What Is BigQuery?
Google BigQuery is a fully managed, serverless data warehouse built for large-scale analytics. Unlike traditional databases that require you to provision servers and manage clusters, BigQuery handles all infrastructure automatically. You load data, run queries, and pay for what you scan — there are no virtual warehouses to size, no clusters to tune, and no idle compute costs when queries are not running.
Under the hood, BigQuery uses a columnar storage format called Capacitor and a distributed query engine called Dremel. This architecture makes it exceptional at scanning billions of rows across hundreds of columns in seconds. It is designed for analytical workloads — aggregations, GROUP BY-heavy reporting, time-series analysis — rather than high-concurrency transactional writes. BigQuery can hold petabytes of data in a single dataset and query across multiple projects without any data movement.
BigQuery SQL vs Standard SQL
BigQuery uses GoogleSQL (formerly BigQuery Standard SQL), which is ANSI SQL-compliant with several important differences you need to know before writing your first query.
Table references use backticks and dot notation. Every table is referenced as project_id.dataset_name.table_name wrapped in backticks. Hyphens in project names are valid but require backtick quoting.
-- Standard SQL (most databases)
SELECT * FROM orders;
-- BigQuery
SELECT * FROM `my-project.sales_data.orders`;
Native ARRAY and STRUCT types. BigQuery supports complex nested data types natively. A single row can contain an array of values or a struct (record) with multiple named fields. This is common with data exported from event tracking systems or APIs.
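As an illustrative sketch (the column names here are hypothetical), a single row can carry both an array and a struct built inline:

```sql
-- Illustrative only: hypothetical column names
SELECT
  'order_123' AS order_id,
  ['shirt', 'hat', 'socks'] AS item_names,              -- ARRAY<STRING>
  STRUCT('US' AS country, 'en' AS lang) AS locale_info; -- STRUCT<country STRING, lang STRING>
```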
UNNEST to flatten arrays. To work with array elements as individual rows, use UNNEST. This does not exist in most traditional SQL dialects.
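A minimal sketch, assuming a hypothetical orders table with an item_names ARRAY&lt;STRING&gt; column:

```sql
-- Each element of the array becomes its own row
SELECT
  order_id,
  item_name
FROM `my-project.sales_data.orders`,
  UNNEST(item_names) AS item_name;
```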
DATE_TRUNC instead of TRUNC for dates. In BigQuery, DATE_TRUNC(date_col, MONTH) truncates a date to the start of the month. TRUNC works on numeric values only, unlike in dialects such as Oracle where TRUNC also accepts dates.
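For example:

```sql
SELECT DATE_TRUNC(DATE '2026-04-09', MONTH) AS month_start;  -- 2026-04-01
SELECT TRUNC(3.7) AS n;                                      -- 3.0 (numeric only)
```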
UPDATE requires a WHERE clause. BigQuery DML enforces that every UPDATE and DELETE includes a WHERE clause. You cannot run UPDATE table SET col = val without a filter.
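For instance (hypothetical table; WHERE TRUE is how you deliberately touch every row):

```sql
-- Fails: BigQuery rejects UPDATE without a WHERE clause
UPDATE `my-project.sales_data.orders` SET status = 'archived';

-- Works: the filter makes the intent explicit
UPDATE `my-project.sales_data.orders`
SET status = 'archived'
WHERE created_at < '2025-01-01';

-- Deliberately update every row
UPDATE `my-project.sales_data.orders`
SET status = 'archived'
WHERE TRUE;
```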
EXCEPT DISTINCT instead of EXCEPT. BigQuery requires EXCEPT DISTINCT rather than bare EXCEPT for set operations.
-- BigQuery set difference
SELECT user_id FROM active_users
EXCEPT DISTINCT
SELECT user_id FROM churned_users;
Basic Queries in BigQuery
Standard SELECT, WHERE, GROUP BY, and ORDER BY all work as expected. The main adjustment is the table reference format.
-- Count orders by status for the last 30 days
SELECT
  status,
  COUNT(*) AS order_count,
  SUM(amount_usd) AS total_revenue
FROM `my-project.ecommerce.orders`
WHERE created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY status
ORDER BY total_revenue DESC;
-- Join customers to orders
SELECT
  c.customer_id,
  c.email,
  COUNT(o.order_id) AS lifetime_orders,
  SUM(o.amount_usd) AS lifetime_value
FROM `my-project.ecommerce.customers` AS c
LEFT JOIN `my-project.ecommerce.orders` AS o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.email
ORDER BY lifetime_value DESC
LIMIT 100;
CTEs work exactly as in standard SQL and are the recommended way to structure complex logic in BigQuery:
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC(created_at, MONTH) AS month,
    SUM(amount_usd) AS revenue
  FROM `my-project.ecommerce.orders`
  WHERE status = 'completed'
  GROUP BY month
)
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
  -- SAFE_DIVIDE returns NULL instead of erroring if a month had zero revenue
  ROUND(SAFE_DIVIDE(revenue - LAG(revenue) OVER (ORDER BY month), LAG(revenue) OVER (ORDER BY month)) * 100, 2) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month;
BigQuery-Specific Functions
BigQuery ships with a set of functions not available in most other SQL dialects. These cover date math, safe arithmetic, and advanced aggregation.
DATE_TRUNC — Truncate a timestamp or date to a time boundary:
SELECT
  DATE_TRUNC(event_timestamp, WEEK) AS week_start,
  COUNT(*) AS events
FROM `my-project.analytics.events`
GROUP BY week_start
ORDER BY week_start;
TIMESTAMP_DIFF — Calculate the difference between two timestamps in a given unit:
SELECT
  session_id,
  TIMESTAMP_DIFF(session_end, session_start, MINUTE) AS session_duration_minutes
FROM `my-project.analytics.sessions`;
ARRAY_AGG — Aggregate values from multiple rows into a single array:
SELECT
  user_id,
  ARRAY_AGG(product_id ORDER BY purchased_at) AS purchase_sequence
FROM `my-project.ecommerce.orders`
GROUP BY user_id;
STRING_AGG — Concatenate string values across rows with a delimiter:
SELECT
  order_id,
  STRING_AGG(tag, ', ' ORDER BY tag) AS tags
FROM `my-project.ecommerce.order_tags`
GROUP BY order_id;
COUNTIF — Count rows matching a condition without a CASE WHEN:
SELECT
  DATE_TRUNC(created_at, DAY) AS day,
  COUNT(*) AS total_orders,
  COUNTIF(status = 'refunded') AS refunded_orders,
  SAFE_DIVIDE(COUNTIF(status = 'refunded'), COUNT(*)) AS refund_rate
FROM `my-project.ecommerce.orders`
GROUP BY day;
SAFE_DIVIDE — Divide two numbers and return NULL instead of raising a divide-by-zero error when the denominator is zero. This replaces the common numerator / NULLIF(denominator, 0) workaround.
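A minimal sketch, assuming a hypothetical page_stats table with conversions and visitors columns:

```sql
-- Rows where visitors = 0 get a NULL rate instead of an error
SELECT
  page,
  SAFE_DIVIDE(conversions, visitors) AS conversion_rate
FROM `my-project.analytics.page_stats`;
```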
COALESCE — Return the first non-NULL value from a list of expressions. Works identically to other SQL dialects:
SELECT
  user_id,
  COALESCE(display_name, email, CAST(user_id AS STRING)) AS best_name
FROM `my-project.users.profiles`;
Working with ARRAYs and STRUCTs
BigQuery's support for nested and repeated fields is one of its most powerful — and most misunderstood — features. Data loaded from JSON APIs, Firebase, or Google Analytics 4 often contains arrays of structs within a single row.
To query inside an array, use UNNEST with a CROSS JOIN (or the shorthand comma syntax). Each element of the array becomes a new row in the result:
-- GA4-style events table with an items array
SELECT
  event_date,
  event_name,
  item.item_id,
  item.item_name,
  item.price
FROM `my-project.analytics.events`
CROSS JOIN UNNEST(items) AS item
WHERE event_name = 'purchase'
  AND event_date = '2026-04-01';
Accessing nested STRUCT fields uses dot notation. If a column user_info is of type STRUCT, access its fields with user_info.email:
SELECT
  event_timestamp,
  user_info.email,
  user_info.country,
  device.category AS device_type
FROM `my-project.analytics.sessions`
WHERE user_info.country = 'US';
You can also filter inside arrays without flattening using EXISTS or IN with a subquery over UNNEST:
-- Sessions where at least one page was the pricing page
SELECT session_id, user_id
FROM `my-project.analytics.sessions`
WHERE 'pricing' IN UNNEST(pages_visited);
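The EXISTS form expresses the same idea and is useful when the condition is more than simple equality; this sketch uses STARTS_WITH as an illustrative predicate:

```sql
-- Sessions where at least one visited page starts with 'pricing'
SELECT session_id, user_id
FROM `my-project.analytics.sessions`
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(pages_visited) AS page
  WHERE STARTS_WITH(page, 'pricing')
);
```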
Working with BigQuery? Connect your project to AI2SQL and query nested data in plain English — no UNNEST syntax required.
BigQuery Window Functions
BigQuery supports the full set of SQL window functions. The syntax is identical to standard SQL, making it straightforward if you have used window functions in PostgreSQL or Snowflake. See our complete window functions guide for deeper coverage.
Common patterns in BigQuery analytics:
-- Running total and 7-day moving average of daily revenue
SELECT
  day,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY day) AS running_total,
  AVG(daily_revenue) OVER (
    ORDER BY day
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS moving_avg_7d
FROM (
  SELECT
    DATE(created_at) AS day,
    SUM(amount_usd) AS daily_revenue
  FROM `my-project.ecommerce.orders`
  WHERE status = 'completed'
  GROUP BY day
)
ORDER BY day;
-- Rank customers by lifetime value within each country
SELECT
  customer_id,
  country,
  lifetime_value,
  RANK() OVER (PARTITION BY country ORDER BY lifetime_value DESC) AS country_rank,
  NTILE(10) OVER (ORDER BY lifetime_value DESC) AS global_decile
FROM `my-project.ecommerce.customer_summary`;
-- Find the first purchase date per user
SELECT DISTINCT
  user_id,
  FIRST_VALUE(created_at) OVER (
    PARTITION BY user_id
    ORDER BY created_at
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS first_purchase_date
FROM `my-project.ecommerce.orders`;
Cost Optimization
BigQuery pricing is based on bytes processed (on-demand pricing) or reserved slots. Because it is columnar, the most impactful optimization is column selection. Every column you omit from your SELECT clause is a column BigQuery does not read.
Avoid SELECT *. On a wide table with 100 columns, selecting all of them forces a full column scan. If you only need 5 columns, you pay for 5 columns.
-- Expensive: scans all columns
SELECT * FROM `my-project.ecommerce.orders`;
-- Cheap: scans only what you need
SELECT order_id, customer_id, amount_usd, status
FROM `my-project.ecommerce.orders`;
Partition by date. BigQuery supports partitioned tables, most commonly by a DATE or TIMESTAMP column. When you filter on the partition column, BigQuery skips entire partitions rather than scanning the full table. Always include a partition filter in your WHERE clause when querying partitioned tables.
-- Partition pruning: only reads data for the specified date range
SELECT event_name, COUNT(*) AS cnt
FROM `my-project.analytics.events`
WHERE event_date BETWEEN '2026-03-01' AND '2026-03-31'
GROUP BY event_name;
Use clustering keys. Clustering sorts table data by one or more columns within each partition. Queries that filter on cluster keys skip blocks of data, reducing bytes scanned without the overhead of a full table scan.
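As a sketch, a hypothetical events table partitioned by date and clustered on two columns could be defined like this:

```sql
CREATE TABLE `my-project.analytics.events_clustered` (
  event_date DATE,
  event_name STRING,
  user_id STRING,
  payload JSON
)
PARTITION BY event_date          -- queries filtering on event_date prune partitions
CLUSTER BY event_name, user_id;  -- queries filtering on these keys skip blocks
```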
Preview bytes before running. In the BigQuery Console, the query validator shows estimated bytes processed before you click Run. Use this to catch expensive queries before they execute.
Use approximate aggregation for large datasets. Functions like APPROX_COUNT_DISTINCT are significantly faster and cheaper than exact COUNT(DISTINCT) and typically land within about 1% of the exact value, which is close enough for most dashboards and exploratory analysis.
-- Exact (slower, more expensive on large tables)
SELECT COUNT(DISTINCT user_id) FROM `my-project.analytics.events`;
-- Approximate (faster, cheaper, ~1% error)
SELECT APPROX_COUNT_DISTINCT(user_id) FROM `my-project.analytics.events`;
BigQuery vs Snowflake vs Redshift
All three platforms handle analytical SQL workloads well. The right choice depends on your cloud provider, data volume patterns, and team preferences. See also our Snowflake SQL tutorial for side-by-side syntax comparisons.
| Factor | BigQuery | Snowflake | Redshift |
|---|---|---|---|
| Cloud | Google Cloud (GCP) | Multi-cloud (AWS, GCP, Azure) | AWS |
| Pricing model | Bytes scanned or slots | Compute credits by second | Node-based or Serverless |
| Infrastructure | Fully serverless | Virtual warehouses (managed) | Clusters or Serverless |
| Nested data | Native ARRAY/STRUCT | VARIANT (semi-structured) | SUPER type |
| Best for | GCP teams, ad hoc analytics | Cross-cloud data sharing | AWS-native steady workloads |
| SQL dialect | GoogleSQL | Snowflake SQL | PostgreSQL-compatible |
Common BigQuery SQL Mistakes
These are the errors that trip up engineers moving to BigQuery from other platforms.
Forgetting backticks on table names. Fully qualified BigQuery table references need backticks whenever an identifier contains characters that are not legal in a bare identifier, most commonly a hyphen in the project ID. Omitting them causes a syntax error.
-- Wrong
SELECT * FROM my-project.dataset.table;
-- Correct
SELECT * FROM `my-project.dataset.table`;
Using LIMIT to reduce cost. LIMIT controls the number of rows returned but does not reduce bytes scanned. BigQuery scans the full dataset before applying LIMIT. Use WHERE filters and column selection to reduce cost.
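To illustrate (hypothetical table), both of these queries process the same bytes from the event_name column; only the number of rows returned to the client differs:

```sql
-- Scans the full event_name column
SELECT event_name FROM `my-project.analytics.events`;

-- Scans the same bytes; LIMIT only trims the rows returned
SELECT event_name FROM `my-project.analytics.events` LIMIT 10;
```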
Implicit cross joins on arrays. If you reference an array field in the SELECT without UNNEST, BigQuery returns the entire array as a single value. Use CROSS JOIN UNNEST when you want one row per array element.
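A sketch of the difference, assuming a hypothetical items ARRAY column on the orders table:

```sql
-- One row per order; items arrives as a single ARRAY value
SELECT order_id, items
FROM `my-project.ecommerce.orders`;

-- One row per array element
SELECT order_id, item
FROM `my-project.ecommerce.orders`
CROSS JOIN UNNEST(items) AS item;
```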
Using non-partition filters on partitioned tables. If your WHERE clause filters on a non-partition column, BigQuery cannot prune partitions and will scan the full table. Check that your WHERE clause includes the partition column directly.
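For example, assuming the events table is partitioned on event_date:

```sql
-- Prunes: filters directly on the partition column
SELECT event_name
FROM `my-project.analytics.events`
WHERE event_date BETWEEN '2026-03-01' AND '2026-03-31';

-- Full scan: the filter touches no partition column
SELECT event_name
FROM `my-project.analytics.events`
WHERE user_id = 'abc123';
```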
Treating DATE and TIMESTAMP as interchangeable. BigQuery has separate DATE, TIME, DATETIME, and TIMESTAMP types. TIMESTAMP stores UTC time with timezone awareness; DATETIME does not. Mixing them in comparisons produces type errors or unexpected results. Use explicit casting when needed:
-- Cast a TIMESTAMP to DATE for partition filtering
SELECT *
FROM `my-project.analytics.events`
WHERE DATE(event_timestamp) = '2026-04-09';
Writing UPDATE without WHERE. BigQuery requires a WHERE clause on every UPDATE and DELETE. A bare UPDATE fails with an error telling you the WHERE clause is missing. If you genuinely need to update all rows, use WHERE TRUE.
For more on joining patterns, see our guide on SQL JOINs explained.
Frequently Asked Questions
Is BigQuery SQL the same as standard SQL?
BigQuery uses GoogleSQL, which is ANSI SQL-compliant with extensions. Most standard SQL knowledge transfers directly. The main differences are the backtick table naming convention, native ARRAY/STRUCT types with UNNEST, and functions like DATE_TRUNC and TIMESTAMP_DIFF.
How do I write a basic SELECT query in BigQuery?
Reference the table using its fully qualified name in backticks: SELECT col1, col2 FROM `project_id.dataset_name.table_name` WHERE condition LIMIT 100. If you are working within the same project, you can omit the project prefix.
Why does SELECT * cost more in BigQuery?
BigQuery charges by bytes scanned, not rows. Because it is columnar, each column is stored separately. Selecting all columns forces BigQuery to read every column's data. Selecting only the columns you need can reduce query costs by 50-90% on wide tables.
What is UNNEST in BigQuery?
UNNEST flattens an ARRAY column into individual rows. A row with an array of 5 items produces 5 output rows after UNNEST. It is commonly paired with CROSS JOIN to expand repeated fields in nested data loaded from JSON or event tracking systems.
How does BigQuery compare to Snowflake and Redshift?
BigQuery is best for GCP-native teams and ad hoc large-scale analytics with no infrastructure management. Snowflake excels at cross-cloud workloads and real-time data sharing. Redshift is the natural fit for AWS-centric stacks with predictable, steady query workloads. All three support standard SQL with dialect-specific differences.