CVAT

What is CVAT?

CVAT (Computer Vision Annotation Tool) is a free, open-source web application that lets people draw boxes, lines, masks, and other labels on images and videos so computers can learn to recognize what’s in them.

Let's break it down

Computer Vision: teaching computers to see and understand pictures or video.
Annotation: adding notes or marks (like boxes or outlines) that tell the computer what each part of the picture is.
Tool: a software program you use to do the work.
Open-source: anyone can look at the code, use it for free, and change it if they want.
Web application: you use it through a browser instead of installing a separate desktop program on each computer.
Images and videos: the kinds of visual data you can label.
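
To make "annotation" concrete, here is a minimal sketch of how one box label could be stored as data. The class name BoxAnnotation and its fields are made up for illustration; this is not CVAT's own storage format.

```python
from dataclasses import dataclass


@dataclass
class BoxAnnotation:
    """One hypothetical box label: a class name plus pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Width times height in pixels; clamp at zero for malformed boxes.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


# Two labels drawn on the same street photo (coordinates are invented).
annotations = [
    BoxAnnotation("pedestrian", 120, 80, 180, 240),
    BoxAnnotation("traffic sign", 400, 30, 440, 70),
]

for ann in annotations:
    print(ann.label, ann.area())
```

A model trained on many records like these learns to connect the pixels inside each box with the label attached to it.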

Why does it matter?

Good labels are the foundation of any computer-vision model; without accurate annotations the AI will learn the wrong things. CVAT makes labeling faster, cheaper, and easier for teams, helping projects move from data collection to a working model more quickly.

Where is it used?

Autonomous driving: labeling street-level photos and video to teach cars to spot pedestrians, traffic signs, and other vehicles.
Medical imaging: outlining tumors or organs in scans so AI can assist doctors with diagnosis.
Retail & e-commerce: marking products on shelf photos for inventory tracking and visual search.
Robotics: annotating objects in a robot’s camera view so it can grasp or avoid them safely.

Good things about it

Free and open-source - no licensing fees.
Runs in a browser, so multiple users can work together on the same project.
Supports many annotation types (boxes, polygons, masks, points, tracks) and many file formats.
Built-in interpolation between keyframes speeds up video labeling: you draw a box on a few frames and the tool fills in the frames between them.
Easy integration with popular machine-learning pipelines (e.g., exporting to COCO, YOLO, TFRecord).
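
The idea behind interpolation in video labeling can be sketched in a few lines. The example below assumes simple linear interpolation between two hand-labeled keyframes; the function name interpolate_box is hypothetical, and CVAT's actual method may differ in its details.

```python
def interpolate_box(start_frame, start_box, end_frame, end_box, frame):
    """Estimate a box on `frame` from two hand-labeled keyframes.

    Boxes are (x_min, y_min, x_max, y_max) tuples in pixels.
    Assumes the object moves in a straight line at constant speed.
    """
    if not start_frame <= frame <= end_frame:
        raise ValueError("frame must lie between the two keyframes")
    if start_frame == end_frame:
        return tuple(float(v) for v in start_box)
    # Fraction of the way from the first keyframe to the second.
    t = (frame - start_frame) / (end_frame - start_frame)
    return tuple(a + t * (b - a) for a, b in zip(start_box, end_box))


# The annotator labels frames 0 and 10; frame 5 is filled in automatically.
box_at_5 = interpolate_box(0, (0, 0, 10, 10), 10, (10, 20, 20, 30), 5)
print(box_at_5)  # (5.0, 10.0, 15.0, 20.0)
```

With this approach an annotator only corrects the frames where the estimate drifts, instead of drawing every box by hand.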

Not-so-good things

Hosting it yourself means setting up a server (typically with Docker), which can be a hurdle for non-technical users.
Can become slow or laggy when handling very large datasets or high-resolution video.
The user interface, while functional, feels less polished than some commercial alternatives.
Advanced features like semi-automatic labeling or AI-assisted suggestions can take extra setup on a self-hosted install and may feel limited compared to paid tools.