Snowflake SQL Tutorial: Complete Guide with Examples (2026)
A practical Snowflake SQL guide covering everything from basic queries to VARIANT, FLATTEN, QUALIFY, TIME TRAVEL, window functions, and performance optimization — with working examples for each feature.
What is Snowflake?
Snowflake is a cloud-native data warehouse that separates storage and compute, letting you scale each independently. Unlike traditional on-premise databases, Snowflake runs entirely in the cloud on AWS, Azure, or GCP and charges only for the compute you actually use. Data teams at companies of all sizes use it to centralize analytics, run ETL pipelines, and query terabytes of structured and semi-structured data without managing infrastructure.
If you already know SQL, most of what you know transfers directly to Snowflake. Standard DML, DDL, CTEs, joins, aggregations, and subqueries all work as expected. Where Snowflake diverges is in its extensions for semi-structured data, historical queries, and filtering patterns — and those differences are worth learning specifically because they unlock capabilities that standard SQL cannot match.
Snowflake SQL vs Standard SQL
Snowflake is largely ANSI SQL compatible, so queries you already know will generally run without modification. But Snowflake adds several first-class extensions that distinguish it from traditional databases:
| Feature | Standard SQL | Snowflake |
|---|---|---|
| Semi-structured data | JSON stored as text | Native VARIANT type |
| Array expansion | Not standard | FLATTEN function |
| Window filter | Subquery required | QUALIFY clause |
| Historical queries | Not available | TIME TRAVEL (AT / BEFORE) |
| Pivot | Vendor-specific | Native PIVOT / UNPIVOT |
| Null-safe arithmetic | COALESCE / NULLIF | ZEROIFNULL, NULLIFZERO |
The biggest practical differences appear when your data includes JSON payloads or when you need to debug a query that ran an hour ago. The rest of this guide walks through each of these areas with runnable examples.
Getting Started: Basic Queries
If you are new to Snowflake, the fastest way to start is the Snowflake web UI (Snowsight) or any JDBC/ODBC client. Once connected, standard SQL works immediately:
-- Create a simple orders table
CREATE TABLE orders (
order_id INT AUTOINCREMENT PRIMARY KEY,
customer_id INT,
product VARCHAR(100),
amount DECIMAL(10,2),
created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Insert sample rows
INSERT INTO orders (customer_id, product, amount) VALUES
(101, 'Pro Plan', 99.00),
(102, 'Starter Plan', 29.00),
(101, 'Pro Plan', 99.00),
(103, 'Team Plan', 199.00),
(102, 'Pro Plan', 99.00);
-- Basic aggregation — identical to standard SQL
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;
Snowflake uses TIMESTAMP_NTZ (no timezone), TIMESTAMP_LTZ (local timezone), and TIMESTAMP_TZ (with timezone offset) as distinct types, unlike most databases that have a single TIMESTAMP. For analytics work, TIMESTAMP_NTZ is usually the right choice.
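For illustration, the three variants can be compared side by side (a sketch; the LTZ and TZ results depend on your session timezone):

```sql
-- The same wall-clock value stored three ways
SELECT
    '2026-04-09 08:00:00'::TIMESTAMP_NTZ AS ts_ntz,        -- no timezone semantics
    '2026-04-09 08:00:00'::TIMESTAMP_LTZ AS ts_ltz,        -- interpreted in the session timezone
    '2026-04-09 08:00:00 -0700'::TIMESTAMP_TZ AS ts_tz;    -- stores the UTC offset
```

Comparing or joining across these types is where the off-by-hour bugs come from, so pick one convention per table.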
Snowflake-Specific Functions
Several functions in Snowflake have no direct equivalent in standard SQL. These come up frequently in real-world queries:
IFF — Inline conditional
IFF(condition, true_val, false_val) is a shorthand for a single-branch CASE expression. It makes simple conditionals more readable:
SELECT
order_id,
amount,
IFF(amount >= 99, 'high_value', 'standard') AS tier
FROM orders;
ZEROIFNULL and NULLIFZERO
ZEROIFNULL(expr) returns 0 when the expression is NULL, avoiding broken arithmetic in aggregations. NULLIFZERO(expr) does the inverse — useful for preventing division-by-zero errors:
SELECT
customer_id,
ZEROIFNULL(SUM(amount)) AS total_spent,
100 / NULLIFZERO(COUNT(DISTINCT product)) AS revenue_per_product
FROM orders
GROUP BY customer_id;
TO_DATE and date conversion
Snowflake's TO_DATE accepts both standard ISO strings and custom format patterns:
-- Convert string to date with explicit format
SELECT TO_DATE('09/04/2026', 'DD/MM/YYYY');
-- Extract date parts
SELECT
DATE_TRUNC('month', created_at) AS month,
EXTRACT(dow FROM created_at) AS day_of_week,
DATEDIFF('day', '2026-01-01', created_at) AS days_since_jan
FROM orders;
PARSE_JSON
PARSE_JSON converts a VARCHAR containing JSON into a VARIANT value so you can use Snowflake's native path operators on it:
SELECT PARSE_JSON('{"plan": "pro", "seats": 5}'):plan::VARCHAR AS plan_name;
-- Returns: pro
Working with Semi-Structured Data (JSON / VARIANT)
Snowflake's VARIANT type is its most distinctive feature. You can store JSON, arrays, or any nested structure in a single column and query it using colon notation. This is far more efficient than storing JSON as text and parsing it in application code.
-- Table with a VARIANT column for raw event data
CREATE TABLE events (
event_id INT AUTOINCREMENT,
user_id INT,
event_data VARIANT,
logged_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);
-- Insert JSON directly — Snowflake parses it automatically
INSERT INTO events (user_id, event_data) SELECT
201,
PARSE_JSON('{"event":"page_view","page":"/pricing","duration_ms":1240,"tags":["paid","trial"]}');
INSERT INTO events (user_id, event_data) SELECT
202,
PARSE_JSON('{"event":"signup","plan":"pro","referrer":"google","tags":["organic","new_user"]}');
Once the data is stored, you navigate nested fields using the colon operator and cast to a concrete type with the :: operator:
-- Access nested fields
SELECT
user_id,
event_data:event::VARCHAR AS event_name,
event_data:page::VARCHAR AS page,
event_data:duration_ms::INT AS duration_ms
FROM events
WHERE event_data:event::VARCHAR = 'page_view';
FLATTEN — expanding arrays
When a VARIANT column contains an array, FLATTEN turns each array element into its own row. This is the standard way to unnest in Snowflake:
-- Expand the "tags" array: each tag becomes a separate row
SELECT
e.user_id,
e.event_data:event::VARCHAR AS event_name,
f.value::VARCHAR AS tag
FROM events e,
LATERAL FLATTEN(INPUT => e.event_data:tags) f;
The result has one row per tag per event, making it easy to count or filter by individual tag values without any string splitting.
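With the array unnested, a plain GROUP BY over the flattened rows gives tag frequencies:

```sql
-- Count how often each tag appears across all events
SELECT
    f.value::VARCHAR AS tag,
    COUNT(*) AS tag_count
FROM events e,
    LATERAL FLATTEN(INPUT => e.event_data:tags) f
GROUP BY tag
ORDER BY tag_count DESC;
```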
Working with Snowflake? Connect your Snowflake warehouse to AI2SQL and query it in plain English — no SQL required.
Window Functions in Snowflake
Snowflake supports the full range of standard SQL window functions, including ranking, offset, and aggregate variants. The syntax is identical to PostgreSQL: function() OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...).
-- Running total of revenue and rank per customer
SELECT
order_id,
customer_id,
amount,
SUM(amount) OVER (PARTITION BY customer_id ORDER BY created_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
RANK() OVER (ORDER BY amount DESC) AS amount_rank,
LAG(amount) OVER (PARTITION BY customer_id ORDER BY created_at) AS prev_order_amount
FROM orders;
One practical note: like any Snowflake query, window-function queries are eligible for result caching. If the underlying data has not changed, re-running the same window query returns the cached result almost instantly, without spinning up the warehouse.
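When benchmarking window queries, the result cache can mask real execution time. Snowflake's USE_CACHED_RESULT session parameter turns it off for the current session:

```sql
-- Disable the result cache while measuring actual execution time
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```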
Time Travel Queries
Snowflake TIME TRAVEL is one of its most operationally useful features. It lets you query any table as it existed at a past point in time, using a relative offset, an absolute timestamp, or the ID of a specific statement. Retention defaults to 1 day and can be extended up to 90 days on Enterprise edition and above.
-- Query the orders table as it was 2 hours ago
SELECT *
FROM orders
AT (OFFSET => -60 * 60 * 2);
-- Query as it was at a specific timestamp
SELECT *
FROM orders
AT (TIMESTAMP => '2026-04-09 08:00:00'::TIMESTAMP_NTZ);
-- Query the state just BEFORE a specific query ran
-- (useful for recovering from accidental deletes or updates)
SELECT *
FROM orders
BEFORE (STATEMENT => '01b33e2e-0001-b9e4-0000-0001a8f26b1e');
A common production use case is recovering from an accidental DELETE. Find the query ID of the delete statement in Snowflake's query history, then use BEFORE (STATEMENT => ...) to restore the rows:
-- Restore rows deleted by a specific statement
INSERT INTO orders
SELECT *
FROM orders
BEFORE (STATEMENT => '<query_id_of_the_delete>')
WHERE order_id NOT IN (SELECT order_id FROM orders);
The QUALIFY Clause
QUALIFY is a clause that filters the results of a window function in the same query, without requiring a CTE or subquery. It is not part of the ANSI standard (Snowflake adopted it from Teradata, and BigQuery and DuckDB support it as well). It is evaluated after window functions are computed but before DISTINCT, ORDER BY, and LIMIT, making it the cleanest way to write "top N per group" queries.
Without QUALIFY, you need a wrapper subquery:
-- Standard SQL: requires CTE or subquery
WITH ranked AS (
SELECT
customer_id,
order_id,
amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders
)
SELECT customer_id, order_id, amount
FROM ranked
WHERE rn = 1;
With QUALIFY, you write it inline:
-- Snowflake QUALIFY: no CTE needed
SELECT
customer_id,
order_id,
amount
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) = 1;
QUALIFY works with any window function, not just ROW_NUMBER. You can also use it to keep only rows where the running total exceeds a threshold, or where the current row is above the partition average:
-- Keep only rows where the order amount is above the customer's own average
SELECT customer_id, order_id, amount
FROM orders
QUALIFY amount > AVG(amount) OVER (PARTITION BY customer_id);
Performance Tips
Snowflake handles most query optimization automatically, but a few patterns have a significant impact on query speed and credit consumption.
Result caching
Snowflake caches query results for 24 hours. If you re-run the exact same query and the underlying data has not changed, Snowflake returns the cached result without using any warehouse credits. This works across sessions and users, which is highly cost-effective for dashboards that query the same data repeatedly.
-- This second run will hit the result cache (zero credits used)
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
Clustering keys
By default, Snowflake prunes micro-partitions based on ingestion order. If your queries frequently filter on a non-ingestion-order column (like created_at when data was loaded out of order), defining a clustering key lets Snowflake skip entire micro-partitions:
-- Define a clustering key on the date column
ALTER TABLE orders CLUSTER BY (DATE_TRUNC('day', created_at));
-- Check clustering health
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(DATE_TRUNC(''day'', created_at))');
LIMIT and early termination
Snowflake applies LIMIT after the rest of the query is evaluated, so on queries that sort or aggregate, adding LIMIT 10 does not avoid processing the full input. For exploratory work, use SAMPLE to read a statistical subset of rows:
-- Return a random ~1% of rows
SELECT * FROM orders SAMPLE (1);
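Note that row-based sampling still reads every micro-partition. On very large tables, block sampling is cheaper because it samples whole micro-partitions instead of individual rows, at the cost of a less statistically uniform sample:

```sql
-- Sample ~1% of micro-partitions rather than 1% of rows;
-- touches far less data on large tables
SELECT * FROM orders SAMPLE BLOCK (1);
```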
Common Snowflake SQL Mistakes
These are the errors most teams hit when they first move workloads to Snowflake:
- Forgetting to cast VARIANT fields. Snowflake does not auto-cast when you access a VARIANT path. `event_data:amount` is still VARIANT; you must write `event_data:amount::DECIMAL(10,2)` for arithmetic to work correctly.
- Using TIMESTAMP without specifying the timezone variant. Snowflake has three TIMESTAMP types. Mixing `TIMESTAMP_NTZ` and `TIMESTAMP_TZ` in joins or comparisons can produce subtle off-by-hour bugs, especially across daylight saving time boundaries.
- Treating FLATTEN output as a single row. FLATTEN produces one row per array element with several columns: `seq`, `key`, `path`, `index`, `value`, and `this`. The element you want is in `value`, so always reference `f.value`, not just `f`.
- Running heavy transformations without a large warehouse. Snowflake compute is elastic but not automatic: if a transformation is slow on an X-Small warehouse, resizing to Medium can cut the runtime by roughly 4x at the same total cost, since credits per hour scale with warehouse size.
- Expecting QUALIFY to work everywhere. QUALIFY is not part of standard SQL (Teradata, BigQuery, and DuckDB also support it). In PostgreSQL or MySQL, use a subquery or CTE with a WHERE filter on the window function alias.
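The first mistake above is easy to reproduce. A minimal sketch against the events table defined earlier:

```sql
-- Without the cast, the path expression is still VARIANT and numeric
-- comparisons can behave unexpectedly; the explicit ::INT makes it reliable
SELECT
    user_id,
    event_data:duration_ms::INT AS duration_ms
FROM events
WHERE event_data:duration_ms::INT > 1000;
```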
FAQ
Is Snowflake SQL the same as standard SQL?
Snowflake SQL is largely ANSI SQL compatible, so most standard queries work without changes. Snowflake adds non-standard extensions including the VARIANT type, FLATTEN, QUALIFY, TIME TRAVEL syntax, and convenience functions like IFF and ZEROIFNULL. These additions make Snowflake significantly more capable for modern analytics workloads, but they will not run in MySQL or PostgreSQL without modification.
What is the VARIANT type in Snowflake?
VARIANT is a flexible Snowflake data type that can store any JSON, Avro, Parquet, or XML value — including deeply nested objects and arrays. You access nested fields with colon notation (column:field::TYPE) and expand arrays with FLATTEN. A VARIANT column is ideal for raw event logs, API responses, and any data where the schema is not fully known at ingestion time.
How far back can TIME TRAVEL go in Snowflake?
The default retention period is 1 day (24 hours). Snowflake Enterprise edition allows up to 90 days of TIME TRAVEL retention per table, configured with DATA_RETENTION_TIME_IN_DAYS. After the retention period expires, the historical data is permanently purged and cannot be recovered through TIME TRAVEL. For longer-term audit trails, use Snowflake Fail-Safe or replicate to an archival table.
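Retention is configured per table (it can also be set at the schema or database level). For example:

```sql
-- Extend TIME TRAVEL retention to 30 days
-- (values above 1 day require Enterprise edition or higher)
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;
```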
When should I use QUALIFY instead of a subquery?
Use QUALIFY any time you need to filter on a window function result and the logic fits in a single SELECT statement. It is most useful for deduplication (keep the latest row per key), top-N-per-group queries, and filtering rows that are above/below a partition-level aggregate. If the filtering logic is complex or needs to be reused across multiple queries, a named CTE is more readable.
Can AI write Snowflake SQL queries?
Yes. AI2SQL supports Snowflake natively and understands Snowflake-specific syntax including QUALIFY, FLATTEN, VARIANT colon notation, and TIME TRAVEL. You describe what you need in plain English — "find the most recent order per customer" or "expand the tags array and count by tag" — and AI2SQL generates the correct Snowflake SQL. This is especially useful for analysts who are new to Snowflake or who need to write complex queries quickly.