Snowflake18 min read

Snowflake File Formats Explained for Data Engineers

Learn Snowflake file formats with real-world examples, CSV, JSON, Parquet, COPY INTO, file format parameters, compression, error handling, and enterprise ETL best practices.

2026-05-22

Part of Series

Snowflake Fundamentals

Progress

9/9

← Previous Article

Snowflake Table Types Explained

Current Article

Snowflake File Formats Explained for Data Engineers

Part 9

Snowflake File Formats Explained for Data Engineers

When loading data into Snowflake, one of the most important concepts is:

File Formats

Because before Snowflake loads any file, it must understand:

  • How data is structured
  • How columns are separated
  • How dates are formatted
  • How null values are represented
  • Whether files are compressed
  • How JSON or Parquet should be interpreted

Without correct file formats:

  • Data loads incorrectly
  • Pipelines fail
  • Columns shift
  • Parsing errors occur
  • ETL jobs become unreliable

Understanding Snowflake file formats is extremely important for:

  • Data ingestion
  • ETL pipelines
  • Batch processing
  • API ingestion
  • Cloud storage integration
  • Real-world Data Engineering projects

Let’s understand everything step-by-step in the simplest possible way with practical examples.


Why File Formats Matter

Think of file formats like reading instructions.

Suppose someone gives you a recipe.

Recipe says:

1 cup sugar

But you read it as:

1 spoon sugar

Result:

  • Entire dish becomes incorrect

Data loading works similarly.

If Snowflake misunderstands the structure of incoming files:

  • Data becomes corrupted
  • Columns shift incorrectly
  • Parsing errors happen

This is why file formats are critical in production pipelines.


What is a File Format in Snowflake?

A File Format tells Snowflake:

How to interpret incoming files.

It defines:

  • File type
  • Delimiters
  • Compression
  • Date format
  • Null handling
  • Quoting rules
  • Escape characters
  • JSON parsing behavior

Snowflake uses this information during:

  • COPY INTO
  • External stage loading
  • Snowpipe ingestion
  • ETL pipelines

Supported File Formats in Snowflake

Snowflake supports multiple file formats.


Structured File Formats

FormatUsage
CSVTraditional tabular data
TSVTab-separated files

Semi-Structured File Formats

FormatUsage
JSONAPI payloads, logs
XMLEnterprise integrations
AvroStreaming systems
ORCHadoop ecosystem
ParquetBig Data analytics

Most Commonly Used Formats

In real-world Data Engineering:

FormatMost Common Use
CSVClient files, reports
JSONAPIs, event logs
ParquetBig Data pipelines

These are the most important formats to understand.


Why Named File Formats Are Important

Many beginners directly write parameters inside COPY INTO statements.

This becomes messy in enterprise systems.

Instead:

Create reusable named file formats.


Benefits of Named File Formats

1. Reusable

Same format can be used across multiple pipelines.


2. Cleaner COPY INTO Commands

ETL code becomes easier to maintain.


3. Reduces Errors

Avoid repeated parameter mistakes.


4. Centralized Management

Update format once → all pipelines benefit.


Real-World Example

Suppose:

  • Daily marketing CSV files arrive
  • All have same structure

Instead of rewriting parameters daily:

Create reusable file format once.

Very common in enterprise ETL pipelines.


Basic File Format Syntax

Example

CREATE FILE FORMAT my_csv_format TYPE = CSV;

This creates a simple CSV file format.

But in real-world systems:

More parameters are usually required.


Important File Format Parameters

These are extremely important in real Data Engineering projects.


1. TYPE

Defines file type.


Examples

TYPE = CSV
TYPE = JSON
TYPE = PARQUET

This is mandatory.


2. FIELD_DELIMITER

Used mainly in CSV files.

Defines how columns are separated.


Example

FIELD_DELIMITER = ','

Other Common Delimiters

DelimiterUsage
,CSV
|Pipe-separated
\tTab-separated

Real-World Example

Client files often use:

|

instead of commas.

Very common in banking and telecom projects.


3. SKIP_HEADER

Used to skip header rows.


Example

SKIP_HEADER = 1

This ignores:

name,country,revenue

during loading.


Real-World Example

Almost all client CSV files contain headers.

This parameter is extremely common.


4. FIELD_OPTIONALLY_ENCLOSED_BY

Handles quoted text fields.


Example

FIELD_OPTIONALLY_ENCLOSED_BY = '"'

Example Input File

"Soumya","India","1000"

Without proper handling:

  • Quotes may load incorrectly
  • Parsing errors may occur

Very important in messy CSV files.


5. NULL_IF

Defines which values should be treated as NULL.


Example

NULL_IF = ('NULL', 'null', '')

Real-World Example

Client files may contain:

NULL N/A blank values

This parameter helps standardize missing values.

Very important for data quality.


6. COMPRESSION

Defines whether files are compressed.

Compressed files:

  • Reduce storage
  • Improve transfer speed
  • Improve loading performance

Example

COMPRESSION = GZIP

Supported Compression Types

CompressionUsage
GZIPMost common
BZIP2Large compressed files
AUTOAuto-detection

Best Practice

Use:

COMPRESSION = AUTO

in most pipelines.


7. DATE_FORMAT / TIME_FORMAT

Defines how dates should be parsed.


Example

DATE_FORMAT = 'YYYY-MM-DD'

Real-World Example

In enterprise systems:

  • Different countries send different formats
  • Some use DD/MM/YYYY
  • Some use MM/DD/YYYY

Incorrect parsing can create huge data quality problems.


8. ON_ERROR (Error Handling)

Defines behavior when bad records appear.


Example

ON_ERROR = CONTINUE

Common Options

OptionBehavior
ABORT_STATEMENTStop entire load
CONTINUESkip bad rows
SKIP_FILESkip entire file

Real-World Example

Suppose:

  • One row corrupted in 10 million rows

You may prefer:

CONTINUE

instead of failing entire ETL job.

Very common enterprise decision.


Full Real-World CSV File Format Example

CREATE OR REPLACE FILE FORMAT marketing_csv_format TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"' NULL_IF = ('NULL', '') COMPRESSION = AUTO;

This is a production-style reusable file format.


Using File Format with COPY INTO

File formats are commonly used with:

COPY INTO

which loads data into Snowflake tables.


Example

COPY INTO sales_data FROM @my_stage FILE_FORMAT = ( FORMAT_NAME = marketing_csv_format );

This is one of the most common Snowflake ingestion patterns.


Real-World ETL Flow

Typical enterprise loading flow:

S3 / Azure Blob / GCS Stage File Format COPY INTO Snowflake Table

Understanding this flow is critical for Data Engineers.


JSON File Format Example

Snowflake handles JSON extremely well.


Example

CREATE FILE FORMAT json_format TYPE = JSON STRIP_OUTER_ARRAY = TRUE;

Querying JSON

SELECT data:country FROM raw_json;

Snowflake directly reads JSON fields.

Very powerful feature.


Real-World Example

Used for:

  • API ingestion
  • Kafka events
  • Event logs
  • Application telemetry
  • Streaming data

Very common in modern cloud architectures.


What is STRIP_OUTER_ARRAY?

Suppose JSON looks like:

[ { "id": 1 }, { "id": 2 } ]

Snowflake can flatten outer array automatically using:

STRIP_OUTER_ARRAY = TRUE

Very common interview topic.


Parquet File Format Example

Parquet is extremely important in Big Data systems.


Example

CREATE FILE FORMAT parquet_format TYPE = PARQUET;

Parquet provides:

  • Columnar storage
  • Compression
  • Faster analytics
  • Better performance
  • Reduced storage cost

Very common in:

  • Spark
  • Databricks
  • Data Lakes
  • Lakehouse architectures

Real-World Example

Modern pipelines often move:

Spark → Parquet → Snowflake

This is extremely common in enterprise Data Engineering.


Common File Loading Problems

1. Wrong Delimiter

Columns shift incorrectly.


2. Date Parsing Failure

Incorrect date formats cause load errors.


3. Null Handling Issues

Blank values may become incorrect strings.


4. Quoted Text Parsing Problems

Improper enclosure handling breaks CSV parsing.


5. Compression Mismatch

Wrong compression setting causes load failure.

Understanding file formats properly avoids these problems.


Best Practices for Snowflake File Formats

1. Always Use Named File Formats

Cleaner and reusable.


2. Test with Small Files First

Avoid large production failures.


3. Use Compression

Improves performance and storage efficiency.


4. Handle NULL Values Carefully

Important for data quality.


5. Define Proper Date Formats

Avoid regional parsing issues.


6. Use ON_ERROR Carefully

Depends on business requirements.


7. Standardize Enterprise File Formats

Very important for scalable ETL systems.


Real-World Enterprise Scenario

Suppose company receives:

  • CSV files from vendors
  • JSON from APIs
  • Parquet from Spark pipelines

Each source requires:

  • Different parsing rules
  • Different file formats
  • Different ingestion logic

This is why reusable file format design is critical in enterprise Snowflake platforms.


Common Snowflake Interview Questions

Beginner Level

  1. What is a file format in Snowflake?
  2. Why use named file formats?
  3. What is FIELD_DELIMITER?
  4. What is SKIP_HEADER?
  5. How does COPY INTO work?

Intermediate Level

  1. What is FIELD_OPTIONALLY_ENCLOSED_BY?
  2. What is STRIP_OUTER_ARRAY?
  3. Why use compression?
  4. Difference between CSV and Parquet?
  5. How does ON_ERROR work?

Scenario-Based Questions

  1. How would you handle messy client CSV files?
  2. Why did columns shift during loading?
  3. How would you load JSON into Snowflake?
  4. Why is Parquet preferred in Big Data pipelines?
  5. How would you design reusable enterprise file formats?

These are very common Snowflake interview topics.


What’s Next?

In the next article, we will learn:

Snowflake Stages Explained

We will cover:

  • Internal stages
  • External stages
  • Named stages
  • User stages
  • Table stages
  • S3 integration
  • Azure Blob integration
  • Real-world ingestion architecture

because stages are the backbone of Snowflake data loading pipelines.


If this article helped you, continue following the Snowflake Fundamentals learning path to master Snowflake step-by-step with practical examples, interview-focused concepts, architecture understanding, and real-world Data Engineering scenarios.

Soumya Ranjan Bisoyi

Written By

Soumya Ranjan Bisoyi

Data Engineer • Mentor • Educator

Helping aspiring Data Engineers learn SQL, Spark, Snowflake, Azure, and real-world Data Engineering concepts through practical, beginner-friendly content.