columnar

What is columnar?

Columnar (or column‑oriented) refers to a way of storing data where all the values of a single column are kept together, instead of storing complete rows one after another. Think of a spreadsheet turned on its side: each column’s data is stored in one contiguous block.

Let's break it down

Row‑based storage: A row contains all fields for one record (e.g., name, age, city) stored together.
Columnar storage: All values for the same field (e.g., all names, then all ages, then all cities) are stored in separate, contiguous blocks.
Because values of the same type sit side by side, the system can compress them more efficiently and read only the columns a query needs.
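
The two layouts can be sketched in Python with plain lists and dictionaries. This is only an illustration, not how a real storage engine is built; the sample records and the helper name to_columns are made up for the example.

```python
# Row-based layout: each record keeps all of its fields together.
rows = [
    {"name": "Ana", "age": 31, "city": "Lisbon"},
    {"name": "Ben", "age": 45, "city": "Oslo"},
    {"name": "Cy", "age": 27, "city": "Lisbon"},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Columnar layout: one list per field, holding that field's values for every record.
columns = to_columns(rows)

# A query that only needs ages touches a single list and ignores names and cities.
average_age = sum(columns["age"]) / len(columns["age"])
```

Here columns["age"] is [31, 45, 27], so computing the average never looks at a name or a city.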

Why does it matter?

When you run analytical queries that only need a few columns from a huge table, a columnar layout lets the engine skip the irrelevant data, dramatically reducing I/O and speeding up results. It also enables better compression, saving storage space and further improving performance.
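
To make the I/O saving concrete, here is a rough sketch that counts how many bytes a "sum of ages" query has to scan in each layout. It assumes a hypothetical fixed-width record (16-byte name, 4-byte age, 16-byte city); real engines use more elaborate encodings, so treat the numbers as illustrative only.

```python
import struct

# Hypothetical fixed-width record: 16-byte name, 4-byte int age, 16-byte city.
ROW_FORMAT = "<16si16s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 36 bytes per record

records = [(b"Ana", 31, b"Lisbon"), (b"Ben", 45, b"Oslo"), (b"Cy", 27, b"Lisbon")]

# Row layout: whole records packed one after another.
row_buffer = b"".join(struct.pack(ROW_FORMAT, *r) for r in records)

# Columnar layout: the age column lives in its own buffer.
age_buffer = struct.pack(f"<{len(records)}i", *(r[1] for r in records))


def sum_ages_row_layout(buffer):
    """Sum ages from the row buffer; every full record must be read."""
    total, bytes_read = 0, 0
    for offset in range(0, len(buffer), ROW_SIZE):
        _name, age, _city = struct.unpack_from(ROW_FORMAT, buffer, offset)
        total += age
        bytes_read += ROW_SIZE
    return total, bytes_read


def sum_ages_column_layout(buffer):
    """Sum ages from the age column buffer; only that column is read."""
    count = len(buffer) // 4
    ages = struct.unpack(f"<{count}i", buffer)
    return sum(ages), len(buffer)


row_result = sum_ages_row_layout(row_buffer)        # (103, 108)
column_result = sum_ages_column_layout(age_buffer)  # (103, 12)
```

Both layouts give the same answer, but under these assumptions the row layout scans 108 bytes and the columnar layout scans 12. The gap widens as the table gets wider and the query touches fewer of its columns.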

Where is it used?

Data warehouses (e.g., Amazon Redshift, Snowflake, Google BigQuery)
Analytical databases (e.g., ClickHouse) and columnar file formats (e.g., Apache Parquet, Apache ORC)
Business intelligence tools that query large datasets for reporting and dashboards
Big‑data processing frameworks such as Apache Spark, which can read columnar file formats

Good things about it

Fast read performance for analytics and reporting
High compression ratios because similar values are stored together
Selective column reading reduces I/O and network traffic
Efficient for aggregations (sums, averages, counts) on large datasets
Works well with modern hardware (CPU caches, vectorized execution, parallel processing)
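
The compression point can be illustrated with run-length encoding, one simple scheme that benefits from repeated values sitting next to each other. This is a minimal sketch with made-up data; real columnar formats typically combine several encodings, and the function names here are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


# A column stored on its own often contains long runs of the same value.
city_column = ["Lisbon", "Lisbon", "Lisbon", "Oslo", "Oslo", "Lisbon"]

encoded = run_length_encode(city_column)
# encoded == [("Lisbon", 3), ("Oslo", 2), ("Lisbon", 1)]
```

Six stored values shrink to three pairs. In a row-based layout the city values would be interleaved with names and ages, so runs like these would not form.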

Not-so-good things

Slower write and update performance; inserting or modifying a row can be costly because its values are spread across separate column blocks.
Not ideal for transactional workloads (OLTP) that need frequent single‑row inserts/updates.
Complexity in data loading pipelines; may require batch loading or special tools.
Potentially higher memory and CPU usage during query execution when many columns are accessed at once, because the engine has to reassemble rows from separate column blocks.
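
The write cost can be sketched with two toy in-memory tables. The classes RowTable and ColumnTable are hypothetical and exist only for this illustration; each insert returns how many separate storage structures it had to touch.

```python
class RowTable:
    """Toy row-based table: one list of complete records."""

    def __init__(self, fields):
        self.fields = fields
        self.rows = []

    def insert(self, record):
        # The whole record lands in one place.
        self.rows.append(tuple(record[f] for f in self.fields))
        return 1  # structures touched


class ColumnTable:
    """Toy columnar table: one list per field."""

    def __init__(self, fields):
        self.columns = {f: [] for f in fields}

    def insert(self, record):
        # A single logical row has to be split across every column.
        for field, values in self.columns.items():
            values.append(record[field])
        return len(self.columns)  # structures touched


fields = ["name", "age", "city"]
record = {"name": "Dee", "age": 38, "city": "Oslo"}

row_touches = RowTable(fields).insert(record)        # 1
column_touches = ColumnTable(fields).insert(record)  # 3
```

With three columns the difference is small, but on a wide table a single-row insert touches every column block. Batching many rows into one load spreads that cost across the batch.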