What is gz?

A .gz file is a file compressed in the gzip format. The gzip tool takes a regular file (such as a text document, image, or program), shrinks its data with the DEFLATE compression algorithm, and appends a .gz extension to the name to show that it has been compressed.

Let's break it down

  • Original file: Any type of data you want to shrink.
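  • Compression algorithm (DEFLATE): Combines two techniques: LZ77, which replaces repeated byte sequences with short references to earlier copies, and Huffman coding, which gives frequently occurring symbols shorter bit codes.
  • gzip wrapper: Adds a small header (magic number, compression method, and metadata such as a timestamp) and an 8-byte trailer (CRC-32 checksum plus the original size modulo 2^32) around the compressed data so programs can recognize the format and verify the decompressed result.
  • Result: A single .gz file that is usually much smaller than the original when the data is repetitive, as text is. Input that is already compressed (such as JPEG images or video) shrinks very little.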
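
The wrapper described above can be inspected directly. The following is a minimal sketch using Python's standard gzip, struct, and zlib modules; the sample data and variable names are made up for illustration, and the mtime argument assumes Python 3.8 or later.

```python
import gzip
import struct
import zlib

# Illustrative sample data: highly repetitive, so it compresses well.
original = b"hello hello hello hello hello\n" * 100

# mtime=0 fixes the header timestamp so the output is reproducible.
compressed = gzip.compress(original, mtime=0)

# Header: the first two bytes are the gzip magic number (0x1f 0x8b),
# and the third byte is the compression method (8 means DEFLATE).
magic = compressed[:2]
method = compressed[2]

# Trailer: the last 8 bytes hold the CRC-32 of the original data and the
# original size modulo 2**32, both as little-endian unsigned 32-bit integers.
crc32, isize = struct.unpack("<II", compressed[-8:])

print("magic number ok:", magic == b"\x1f\x8b")
print("method is DEFLATE:", method == 8)
print("checksum matches:", crc32 == zlib.crc32(original))
print("stored size matches:", isize == len(original) % 2**32)
print(len(original), "bytes ->", len(compressed), "bytes")
```

Everything between the header and the trailer is the DEFLATE stream itself.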

Why does it matter?

  • Saves storage: Smaller files take up less disk space.
  • Speeds up transfers: Less data to send over the internet or network means faster downloads/uploads.
  • Preserves data: Compression is lossless, so the original file can be perfectly restored.
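
A short round trip makes these points concrete. This is a sketch with invented sample data, not a benchmark: it compresses a repetitive log-like text, restores it, and then tries the same on random bytes, which typically do not shrink at all.

```python
import gzip
import os

# Illustrative, highly repetitive text (similar to a log file).
text = ("2024-01-01 INFO request served in 12 ms\n" * 1000).encode("utf-8")

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

# Lossless: the restored bytes are identical to the input.
print("round trip identical:", restored == text)
print(f"text:  {len(text)} bytes -> {len(compressed)} bytes")

# Random bytes have no repeating patterns for DEFLATE to exploit, so the
# output is typically slightly larger than the input (header and trailer).
noise = os.urandom(10_000)
noise_gz = gzip.compress(noise)
print(f"noise: {len(noise)} bytes -> {len(noise_gz)} bytes")
```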

Where is it used?

  • Web servers: HTML, CSS, and JavaScript files are often served gzipped to browsers for quicker page loads.
  • Linux/Unix tools: Commands like gzip, gunzip, zcat, and tar (with the -z flag) handle .gz files for backups and archiving.
  • Software distribution: Source code packages (e.g., .tar.gz) are common in open‑source projects.
  • Data pipelines: Log files and large datasets are compressed with gzip to reduce storage and I/O costs.
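
For the data-pipeline case, a compressed log can be written and read back without ever storing the uncompressed file on disk. The sketch below uses Python's gzip.open in text mode; the file name app.log.gz and the log lines are hypothetical.

```python
import gzip
import os
import tempfile

# Hypothetical log lines for illustration.
lines = [f"line {i}: status=200\n" for i in range(5)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log.gz")  # hypothetical file name

    # "wt" opens the file for writing text; data is compressed as it is written.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.writelines(lines)

    # "rt" decompresses transparently, so the file can be read line by line.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        read_back = [line for line in f]

print("lines preserved:", read_back == lines)
```

On the command line, zcat plays a similar role by printing the decompressed contents of a .gz file.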

Good things about it

  • Widely supported: Almost every operating system and programming language can read/write .gz files.
  • Fast compression/decompression: Good balance between speed and size reduction.
  • Streaming friendly: Can compress or decompress data on the fly without needing the whole file in memory.
  • Standardized: The format is defined by RFC 1952, ensuring consistent behavior across tools.
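
The streaming property can be sketched with Python's zlib module, which produces a gzip stream when the window-bits argument is 16 + zlib.MAX_WBITS. The helper names gzip_stream and gunzip_stream are invented for this example; each one handles a chunk at a time, so the whole input never has to sit in memory.

```python
import gzip
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to use the gzip container


def gzip_stream(chunks):
    """Yield gzip-compressed bytes for an iterable of byte chunks."""
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, GZIP_WBITS
    )
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer input and return nothing yet
            yield out
    yield compressor.flush()  # emits remaining data and the gzip trailer


def gunzip_stream(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks."""
    decompressor = zlib.decompressobj(GZIP_WBITS)
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


# Illustrative input: a generator, so records are produced one at a time.
source = (f"record {i}\n".encode("utf-8") for i in range(10_000))
expected = b"".join(f"record {i}\n".encode("utf-8") for i in range(10_000))

compressed_chunks = list(gzip_stream(source))
restored = b"".join(gunzip_stream(compressed_chunks))

print("stream round trip ok:", restored == expected)
# The streamed output is an ordinary gzip stream that other tools can read.
print("readable by gzip module:", gzip.decompress(b"".join(compressed_chunks)) == expected)
```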

Not-so-good things

  • Compression ratio: Newer algorithms (e.g., Brotli, Zstandard) often produce smaller files for the same data.
  • Single‑file focus: gzip compresses one file at a time; to bundle many files you need an extra step like creating a tar archive first.
  • Limited metadata: Only basic information (original name, timestamp, checksum) is stored; no permissions or extended attributes.
  • No built‑in encryption: Data is only compressed, not protected, so you need separate tools for security.
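
The single-file limitation is usually worked around by bundling files into a tar archive and gzipping the result, which is what a .tar.gz file is. Here is a minimal in-memory sketch with Python's tarfile module; the file names and contents are hypothetical.

```python
import io
import tarfile

# Hypothetical files to bundle, given as name -> contents.
files = {
    "README.txt": b"read me\n",
    "src/main.py": b"print('hi')\n",
}

# Write a gzip-compressed tar archive into an in-memory buffer.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mode = 0o644  # tar, unlike gzip alone, records permissions
        archive.addfile(info, io.BytesIO(data))

# Read the archive back and recover every file.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    names = archive.getnames()
    restored = {name: archive.extractfile(name).read() for name in names}

print(names)
print("contents preserved:", restored == files)
```

The tar layer also carries the permissions that gzip on its own does not store. Encryption still needs a separate tool.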