What is CVAT?

CVAT (Computer Vision Annotation Tool) is a free, open-source web application that lets people draw boxes, lines, masks, and other labels on images and videos so computers can learn to recognize what’s in them.

Let's break it down

  • Computer Vision: teaching computers to see and understand pictures or video.
  • Annotation: adding notes or marks (like boxes or outlines) that tell the computer what each part of the picture is.
  • Tool: a software program you use to do the work.
  • Open-source: anyone can look at the code, use it for free, and change it if they want.
  • Web application: you use it through a browser instead of installing a standalone desktop program.
  • Images and videos: the kinds of visual data you can label.
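To make "annotation" concrete: a box label is essentially a class name plus four numbers. The sketch below is illustrative only and is not CVAT's internal data model. The `BoxAnnotation` class and the `to_normalized_center` helper are hypothetical names introduced for this example. It stores a box as a top-left corner plus size in pixels (the convention COCO uses) and converts it to a center point plus size expressed as fractions of the image (the convention YOLO uses).

```python
from dataclasses import dataclass


@dataclass
class BoxAnnotation:
    """One labeled box (hypothetical structure, not CVAT's own model)."""
    label: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float   # box width, in pixels
    height: float  # box height, in pixels


def to_normalized_center(box, image_width, image_height):
    """Return (x_center, y_center, width, height) as fractions of the image size."""
    x_center = (box.x + box.width / 2) / image_width
    y_center = (box.y + box.height / 2) / image_height
    return (x_center, y_center, box.width / image_width, box.height / image_height)


# A 200x100 pixel box around a pedestrian in an 800x400 image.
box = BoxAnnotation(label="pedestrian", x=100, y=50, width=200, height=100)
print(to_normalized_center(box, image_width=800, image_height=400))
# prints (0.25, 0.25, 0.25, 0.25)
```

The same box can be written down in more than one way, which is why annotation tools offer several export formats.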

Why does it matter?

Good labels are the foundation of any computer-vision model; without accurate annotations the AI will learn the wrong things. CVAT makes labeling faster, cheaper, and easier for teams, helping projects move from data collection to a working model more quickly.

Where is it used?

  • Autonomous driving: labeling street-level photos and video to teach cars to spot pedestrians, traffic signs, and other vehicles.
  • Medical imaging: outlining tumors or organs in scans so AI can assist doctors with diagnosis.
  • Retail & e-commerce: marking products on shelf photos for inventory tracking and visual search.
  • Robotics: annotating objects in a robot’s camera view so it can grasp or avoid them safely.

Good things about it

  • Free and open-source - no licensing fees.
  • Runs in a browser, so multiple users can work together on the same project.
  • Supports many annotation types (boxes, polygons, masks, points, tracks) and many file formats.
  • Built-in interpolation of annotations between video keyframes speeds up labeling of video sequences.
  • Easy integration with popular machine-learning pipelines (e.g., exporting to COCO, YOLO, TFRecord).
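The interpolation idea in the list above can be sketched in a few lines. This is a minimal illustration, assuming simple linear interpolation between two hand-drawn keyframes; CVAT's own implementation may work differently. The function name `interpolate_box` and the (x, y, width, height) tuple layout are assumptions made for this example.

```python
def interpolate_box(start_frame, start_box, end_frame, end_box, frame):
    """Linearly interpolate a box (x, y, width, height) between two keyframes."""
    if not start_frame <= frame <= end_frame:
        raise ValueError("frame must lie between the two keyframes")
    if start_frame == end_frame:
        return tuple(start_box)
    # t runs from 0.0 at the first keyframe to 1.0 at the second.
    t = (frame - start_frame) / (end_frame - start_frame)
    return tuple(a + t * (b - a) for a, b in zip(start_box, end_box))


# The annotator draws the box only on frames 0 and 10.
first = (100.0, 50.0, 40.0, 80.0)
last = (200.0, 70.0, 40.0, 80.0)

# Boxes for the frames in between are computed instead of drawn by hand.
print(interpolate_box(0, first, 10, last, 5))
# prints (150.0, 60.0, 40.0, 80.0)
```

With this approach a person labels two frames and gets the frames between them filled in, then corrects only the ones where the object did not move in a straight line.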

Not-so-good things

  • Requires setting up a server (Docker or similar), which can be a hurdle for non-technical users.
  • Can become slow or laggy when handling very large datasets or high-resolution video.
  • The user interface, while functional, feels less polished than some commercial alternatives.
  • Advanced features like semi-automatic labeling or AI-assisted suggestions are limited compared to paid tools.