What is columnar?

Columnar (or column‑oriented) refers to a way of storing data where all the values of a single column are kept together, instead of storing complete rows one after another. Think of a spreadsheet turned on its side: each column’s data is stored in a continuous block.

Let's break it down

  • Row‑based storage: A row contains all fields for one record (e.g., name, age, city) stored together.
  • Columnar storage: All values for the same field (e.g., all names, then all ages, then all cities) are stored in separate, contiguous blocks.
  • Because similar data types sit side by side, the system can compress them more efficiently and read only the columns needed for a query.

Why does it matter?

When you run analytical queries that only need a few columns from a huge table, a columnar layout lets the engine skip the irrelevant data, dramatically reducing I/O and speeding up results. It also enables better compression, saving storage space and further improving performance.

Where is it used?

  • Data warehouses (e.g., Amazon Redshift, Snowflake, Google BigQuery)
  • Analytical databases (e.g., ClickHouse, Apache Parquet files, Apache ORC)
  • Business intelligence tools that query large datasets for reporting and dashboards
  • Big‑data processing frameworks like Apache Spark when reading columnar file formats.

Good things about it

  • Fast read performance for analytics and reporting
  • High compression ratios because similar values are stored together
  • Selective column reading reduces I/O and network traffic
  • Efficient for aggregations (sums, averages, counts) on large datasets
  • Works well with modern hardware (SSD, parallel processing).

Not-so-good things

  • Slower write and update performance; inserting or modifying rows can be costly.
  • Not ideal for transactional workloads (OLTP) that need frequent single‑row inserts/updates.
  • Complexity in data loading pipelines; may require batch loading or special tools.
  • Potentially higher memory usage during query execution if many columns are accessed simultaneously.