YOLOv9 Object Detection Format
Overview
YOLOv9 is a well-known object detection model in the You Only Look Once (YOLO) series, introduced in the paper YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. Building upon the foundations of YOLOv5 through YOLOv8, YOLOv9 introduces significant advancements in model architecture and training methodologies, enhancing both accuracy and efficiency. Despite these improvements, YOLOv9 retains the same object detection format as its predecessors, utilizing normalized coordinates in text files. This consistency ensures seamless integration and compatibility with existing workflows while benefiting from YOLOv9's enhanced performance.
Specification of YOLOv9 Detection Format
The YOLOv9 detection format remains consistent with previous versions (v5-v8), ensuring ease of adoption and compatibility. Below are the detailed specifications:
- One Text File per Image: For every image in your dataset, there exists a corresponding `.txt` file containing annotation data.
- Object Representation: Each line in the text file represents a single object in the image (see the parsing sketch after this list), following the format `<class_id> <x_center> <y_center> <width> <height>`:
    - `<class_id>` (Integer): An integer representing the object's class.
    - `<x_center>` and `<y_center>` (Float): The normalized coordinates of the object's center relative to the image's width and height.
    - `<width>` and `<height>` (Float): The normalized width and height of the bounding box encompassing the object.
- Normalization of Values: All coordinate and size values are normalized to a range between `0.0` and `1.0`.
- Normalization Formula: To convert pixel values to normalized coordinates:
    - `normalized_x = x_pixel / image_width`
    - `normalized_y = y_pixel / image_height`
    - `normalized_width = box_width_pixel / image_width`
    - `normalized_height = box_height_pixel / image_height`
- Class ID Indexing: Class IDs start from `0`, with each ID corresponding to a specific object category defined in the `data.yaml` configuration file. Class IDs must be contiguous integers (e.g., 0, 1, 2 and not 0, 2, 3) to ensure proper model training and inference.
- Configuration via `data.yaml`: The `data.yaml` file contains essential configuration settings, including paths to the training and validation datasets, the number of classes (`nc`), and a mapping of class IDs to class names (`names`).
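To make the format concrete, here is a minimal sketch (independent of Labelformat, with made-up variable names) that parses a single annotation line into its components:

```python
# Minimal sketch: parse one YOLO-style annotation line.
line = "0 0.716797 0.395833 0.216406 0.147222"
fields = line.split()

class_id = int(fields[0])                                    # integer class ID
x_center, y_center, width, height = (float(v) for v in fields[1:])

print(class_id, x_center, y_center, width, height)
# 0 0.716797 0.395833 0.216406 0.147222
```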
Directory Structure of YOLOv9 Dataset
The dataset must maintain a parallel directory structure for images and their corresponding label files. There are two common organizational patterns:
Pattern 1: Images and Labels as Root Directories
dataset/
├── data.yaml
├── images/
│ ├── train/
│ │ ├── image1.jpg
│ │ └── image2.jpg
│ └── val/
│ ├── image3.jpg
│ └── image4.jpg
└── labels/
├── train/
│ ├── image1.txt
│ └── image2.txt
└── val/
├── image3.txt
└── image4.txt
Pattern 2: Train/Val as Root Directories
dataset/
├── data.yaml
├── train/
│ ├── images/
│ │ ├── image1.jpg
│ │ └── image2.jpg
│ └── labels/
│ ├── image1.txt
│ └── image2.txt
└── val/
├── images/
│ ├── image3.jpg
│ └── image4.jpg
└── labels/
├── image3.txt
└── image4.txt
The corresponding `data.yaml` configuration should match your chosen structure:
# For Pattern 1:
path: . # Optional - defaults to current directory if omitted
train: images/train # Path to training images
val: images/val # Path to validation images
# For Pattern 2:
path: . # Optional - defaults to current directory if omitted
train: train/images # Path to training images
val: val/images # Path to validation images
Important: Label files must have the same name as their corresponding image files (excluding the file extension) and must maintain the parallel directory structure, only replacing `images` with `labels` in the path.
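This naming rule can be expressed as a small path transformation. The helper below is a hypothetical sketch (not part of Labelformat) that derives the label path for a given image path:

```python
from pathlib import Path

def label_path_for(image_path: Path) -> Path:
    # Hypothetical helper: swap the "images" directory for "labels"
    # and replace the image extension with ".txt".
    parts = [("labels" if part == "images" else part) for part in image_path.parts]
    return Path(*parts).with_suffix(".txt")

print(label_path_for(Path("dataset/images/train/image1.jpg")))
# dataset/labels/train/image1.txt
```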
Benefits of YOLOv9 Format
- Simplicity: Easy to read and write, facilitating quick dataset preparation.
- Efficiency: Compact representation reduces storage requirements.
- Compatibility: Maintains consistency across YOLO versions, ensuring seamless integration with various tools and frameworks.
Example of YOLOv9 Format
Example of `data.yaml`:
path: . # Dataset root directory (defaults to current directory if omitted). Can also be `../dataset`.
train: images/train # Directory for training images
val: images/val # Directory for validation images
test: images/test # Directory for test images (optional)
names:
0: cat
1: dog
2: person
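If you want to inspect this configuration programmatically, a minimal sketch using PyYAML (an assumption; install it with `pip install pyyaml` and adjust the path to your dataset) looks like this:

```python
import yaml  # PyYAML, assumed to be installed separately

# Load the dataset configuration; the path is an assumption.
with open("dataset/data.yaml") as f:
    config = yaml.safe_load(f)

print(config["names"])  # {0: 'cat', 1: 'dog', 2: 'person'}
print(config["train"])  # images/train
```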
Example Annotation
For an image named `image1.jpg`, the corresponding `image1.txt` might contain:
0 0.716797 0.395833 0.216406 0.147222
1 0.687500 0.379167 0.255208 0.175000
Explanation:
- The first line represents an object of class `0` (e.g., `cat`) with its bounding box centered at `(0.716797, 0.395833)` relative to the image dimensions, and a width and height of `0.216406` and `0.147222`, respectively.
- The second line represents an object of class `1` (e.g., `dog`) with its own bounding box specifications.
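To see what these normalized values mean in pixel space, the sketch below converts the first annotation back to a pixel-space box, assuming a hypothetical image size of 640x480:

```python
# Convert the first example annotation back to pixel coordinates,
# assuming (hypothetically) that image1.jpg is 640x480 pixels.
image_width, image_height = 640, 480
class_id, x_center, y_center, width, height = 0, 0.716797, 0.395833, 0.216406, 0.147222

box_width = width * image_width                        # ~138.5 px
box_height = height * image_height                     # ~70.7 px
x_top_left = x_center * image_width - box_width / 2    # ~389.5 px
y_top_left = y_center * image_height - box_height / 2  # ~154.7 px

print(x_top_left, y_top_left, box_width, box_height)
```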
Normalizing Bounding Box Coordinates for YOLOv9
To convert pixel values to normalized values required by YOLOv9:
# Given pixel values and image dimensions
x_top_left = 150 # x coordinate of top-left corner of bounding box
y_top_left = 200 # y coordinate of top-left corner of bounding box
width_pixel = 50 # width of bounding box
height_pixel = 80 # height of bounding box
image_width = 640
image_height = 480
# 1. Convert top-left coordinates to center coordinates (in pixels)
x_center_pixel = x_top_left + (width_pixel / 2) # 150 + (50/2) = 175
y_center_pixel = y_top_left + (height_pixel / 2) # 200 + (80/2) = 240
# 2. Normalize all values (divide by image dimensions)
x_center = x_center_pixel / image_width # 175 / 640 = 0.273438
y_center = y_center_pixel / image_height # 240 / 480 = 0.500000
width = width_pixel / image_width # 50 / 640 = 0.078125
height = height_pixel / image_height # 80 / 480 = 0.166667
# Annotation line format: <class_id> <x_center> <y_center> <width> <height>
annotation = f"0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
# Output: "0 0.273438 0.500000 0.078125 0.166667"
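When an image contains several boxes, the same steps can be wrapped in a small reusable helper. The function and file names below are illustrative only, not part of Labelformat:

```python
def to_yolo_line(class_id, x_top_left, y_top_left, box_w, box_h, img_w, img_h):
    # Hypothetical helper: pixel top-left box -> normalized YOLO annotation line.
    x_center = (x_top_left + box_w / 2) / img_w
    y_center = (y_top_left + box_h / 2) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w / img_w:.6f} {box_h / img_h:.6f}"

# Write a label file for one image with two made-up boxes (class, x, y, w, h in pixels).
boxes = [(0, 150, 200, 50, 80), (1, 300, 120, 64, 96)]
with open("image1.txt", "w") as f:
    for class_id, x, y, w, h in boxes:
        f.write(to_yolo_line(class_id, x, y, w, h, 640, 480) + "\n")
```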
Converting Annotations to YOLOv9 Format with Labelformat
Our Labelformat framework simplifies the process of converting various annotation formats to the YOLOv9 detection format. Below is a step-by-step guide to perform this conversion.
Installation
First, ensure that Labelformat is installed. You can install it via pip:
pip install labelformat
Conversion Example: COCO to YOLOv9
Assume you have annotations in the COCO format and wish to convert them to YOLOv9. Here’s how you can achieve this using Labelformat.
Step 1: Prepare Your Dataset
Ensure your dataset follows the standard COCO structure:
- You have a `.json` file with the COCO annotations (e.g. `annotations/instances_train.json`).
- You have a directory with the images (e.g. `images/`).
Full example:
dataset/
├── annotations/
│ └── instances_train.json
├── images/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
Step 2: Run the Conversion Command
Use the Labelformat CLI to convert COCO annotations to YOLOv9:
labelformat convert \
--task object-detection \
--input-format coco \
--input-file dataset/annotations/instances_train.json \
--output-format yolov9 \
--output-folder dataset/yolov9_labels \
--output-split train
Step 3: Verify the Converted Annotations
After conversion, your dataset structure will be:
dataset/
├── yolov9_labels/
│ ├── data.yaml
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── ...
│ └── labels/
│ ├── image1.txt
│ ├── image2.txt
│ └── ...
Contents of `data.yaml`:
path: . # Dataset root directory
train: images # Directory for training images
nc: 3 # Number of classes
names: # Class name mapping
0: cat
1: dog
2: person
Contents of `image1.txt`:
0 0.234375 0.416667 0.078125 0.166667
1 0.500000 0.500000 0.100000 0.200000
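As an optional sanity check (the paths below follow the structure shown above and are assumptions), you can verify that every converted image has a matching label file:

```python
from pathlib import Path

# Optional sanity check: every image should have a matching label file.
images_dir = Path("dataset/yolov9_labels/images")
labels_dir = Path("dataset/yolov9_labels/labels")

for image_path in sorted(images_dir.glob("*.jpg")):
    label_path = labels_dir / (image_path.stem + ".txt")
    if not label_path.exists():
        print(f"Missing label for {image_path.name}")
```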
Error Handling in Labelformat
The format implementation includes several safeguards:
- Missing Label Files:
    - A warning is logged if a label file doesn't exist for an image
    - The image is skipped from processing
- File Access Issues:
    - Errors are logged if label files cannot be read due to permissions or other OS issues
    - Affected images are skipped from processing
- Label Format Validation:
    - Each line must contain exactly 5 space-separated values
    - Invalid lines are logged as warnings and skipped
    - All values must be convertible to the appropriate types:
        - Category ID must be a valid integer
        - Coordinates and dimensions must be valid floats
    - Category IDs must exist in the category mapping
Example of a properly formatted label file, where each line contains exactly 5 space-separated values and all coordinates and dimensions fall within the [0, 1] range:
0 0.716797 0.395833 0.216406 0.147222
1 0.687500 0.379167 0.255208 0.175000
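For illustration only, the standalone sketch below applies the same kind of checks to a single label file. It is not Labelformat's implementation, and the category mapping is assumed:

```python
import logging

categories = {0: "cat", 1: "dog", 2: "person"}  # assumed mapping from data.yaml

def validate_label_file(path: str) -> None:
    # Standalone illustration of the validation rules described above.
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            parts = line.split()
            if len(parts) != 5:
                logging.warning("%s:%d: expected 5 values, got %d", path, line_number, len(parts))
                continue
            try:
                class_id = int(parts[0])
                x_center, y_center, width, height = (float(v) for v in parts[1:])
            except ValueError:
                logging.warning("%s:%d: values are not valid numbers", path, line_number)
                continue
            if class_id not in categories:
                logging.warning("%s:%d: unknown category id %d", path, line_number, class_id)
            if not all(0.0 <= v <= 1.0 for v in (x_center, y_center, width, height)):
                logging.warning("%s:%d: values outside the [0, 1] range", path, line_number)

validate_label_file("image1.txt")
```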