YOLOv5 Object Detection Format¶
Overview¶
YOLOv5 is a well-known object detection model in the You Only Look Once (YOLO) series, renowned for its real-time object detection capabilities. Building upon the foundations of YOLO series, YOLOv5 introduces significant advancements in model architecture and training methodologies, enhancing both accuracy and efficiency. Despite these improvements, YOLOv5 retains the same object detection format as its predecessors, utilizing normalized coordinates in text files. This consistency ensures seamless integration and compatibility with existing workflows while benefiting from YOLOv5's enhanced performance.
Info: Unlike other YOLO versions, YOLOv5 was not introduced through an academic paper. The most commonly referenced implementation is the Ultralytics YOLOv5 repository on GitHub, which has become the de facto standard implementation. For implementation details and specifications, see: GitHub Repository: ultralytics/yolov5
Specification of YOLOv5 Detection Format¶
The YOLOv5 detection format remains consistent with previous versions, ensuring ease of adoption and compatibility. Below are the detailed specifications:
-
One Text File per Image:
For every image in your dataset, there exists a corresponding.txt
file containing annotation data. -
Object Representation:
Each line in the text file represents a single object detected within the image, following the format:<class_id> <x_center> <y_center> <width> <height>
<class_id>
(Integer): An integer representing the object's class.<x_center>
and<y_center>
(Float): The normalized coordinates of the object's center relative to the image's width and height.<width>
and<height>
(Float): The normalized width and height of the bounding box encompassing the object.
-
Normalization of Values:
All coordinate and size values are normalized to a range between0.0
and1.0
. -
Normalization Formula:
To convert pixel values to normalized coordinates:
normalized_x = x_pixel / image_width normalized_y = y_pixel / image_height normalized_width = box_width_pixel / image_width normalized_height = box_height_pixel / image_height
-
Class ID Indexing:
Class IDs start from0
, with each ID corresponding to a specific object category defined in thedata.yaml
configuration file. Class IDs must be contiguous integers (e.g., 0,1,2 and not 0,2,3) to ensure proper model training and inference. -
Configuration via
data.yaml
:
Thedata.yaml
file contains essential configuration settings, including paths to training and validation datasets, number of classes (nc
), and a mapping of class names to their respective IDs (names
).
Directory Structure Requirements¶
The dataset must maintain a parallel directory structure for images and their corresponding label files. There are two common organizational patterns:
Pattern 1: Images and Labels as Root Directories¶
dataset/
├── data.yaml
├── images/
│ ├── train/
│ │ ├── image1.jpg
│ │ └── image2.jpg
│ └── val/
│ ├── image3.jpg
│ └── image4.jpg
└── labels/
├── train/
│ ├── image1.txt
│ └── image2.txt
└── val/
├── image3.txt
└── image4.txt
Pattern 2: Train/Val as Root Directories¶
dataset/
├── data.yaml
├── train/
│ ├── images/
│ │ ├── image1.jpg
│ │ └── image2.jpg
│ └── labels/
│ ├── image1.txt
│ └── image2.txt
└── val/
├── images/
│ ├── image3.jpg
│ └── image4.jpg
└── labels/
├── image3.txt
└── image4.txt
The corresponding data.yaml
configuration should match your chosen structure:
# For Pattern 1:
path: . # Optional - defaults to current directory if omitted
train: images/train # Path to training images
val: images/val # Path to validation images
# For Pattern 2:
path: . # Optional - defaults to current directory if omitted
train: train/images # Path to training images
val: val/images # Path to validation images
Important: Label files must have the same name as their corresponding image files (excluding the file extension) and must maintain the parallel directory structure, only replacing images
with labels
in the path.
Benefits of YOLOv5 Format¶
- Simplicity: Easy to read and write, facilitating quick dataset preparation.
- Efficiency: Compact representation reduces storage requirements.
- Compatibility: Maintains consistency across YOLO versions, ensuring seamless integration with various tools and frameworks.
Example of YOLOv5 Format¶
Example of data.yaml
:
path: . # Dataset root directory (defaults to current directory if omitted). Can also be `../dataset`.
train: images/train # Directory for training images
val: images/val # Directory for validation images
test: images/test # Directory for test images (optional)
names:
0: cat
1: dog
2: person
Example Annotation
For an image named image1.jpg
, the corresponding image1.txt
might contain:
0 0.716797 0.395833 0.216406 0.147222
1 0.687500 0.379167 0.255208 0.175000
Explanation:
- The first line represents an object of class 0
(e.g., cat
) with its bounding box centered at (0.716797, 0.395833)
relative to the image dimensions, and a width and height of 0.216406
and 0.147222
respectively.
- The second line represents an object of class 1
(e.g., dog
) with its own bounding box specifications.
Normalizing Bounding Box Coordinates for YOLOv5¶
To convert pixel values to normalized values required by YOLOv5:
# Given pixel values and image dimensions
x_top_left = 150 # x coordinate of top-left corner of bounding box
y_top_left = 200 # y coordinate of top-left corner of bounding box
width_pixel = 50 # width of bounding box
height_pixel = 80 # height of bounding box
image_width = 640
image_height = 480
# 1. Convert top-left coordinates to center coordinates (in pixels)
x_center_pixel = x_top_left + (width_pixel / 2) # 150 + (50/2) = 175
y_center_pixel = y_top_left + (height_pixel / 2) # 200 + (80/2) = 240
# 2. Normalize all values (divide by image dimensions)
x_center = x_center_pixel / image_width # 175 / 640 = 0.273438
y_center = y_center_pixel / image_height # 240 / 480 = 0.500000
width = width_pixel / image_width # 50 / 640 = 0.078125
height = height_pixel / image_height # 80 / 480 = 0.166667
# Annotation line format: <class_id> <x_center> <y_center> <width> <height>
annotation = f"0 {x_center} {y_center} {width} {height}"
# Output: "0 0.273438 0.500000 0.078125 0.166667"
Converting Annotations to YOLOv5 Format with Labelformat¶
Our Labelformat framework simplifies the process of converting various annotation formats to the YOLOv5 detection format. Below is a step-by-step guide to perform this conversion.
Installation¶
First, ensure that Labelformat is installed. You can install it via pip:
pip install labelformat
Conversion Example: COCO to YOLOv5¶
Assume you have annotations in the COCO format and wish to convert them to YOLOv5. Here’s how you can achieve this using Labelformat.
Step 1: Prepare Your Dataset
Ensure your dataset follows the standard COCO structure:
- You have a
.json
file with the COCO annotations. (e.g.annotations/instances_train.json
) - You have a directory with the images. (e.g.
images/
)
Full example:
dataset/
├── annotations/
│ └── instances_train.json
├── images/
│ ├── image1.jpg
│ ├── image2.jpg
│ └── ...
Step 2: Run the Conversion Command
Use the Labelformat CLI to convert COCO annotations to YOLOv5:
labelformat convert \
--task object-detection \
--input-format coco \
--input-file dataset/annotations/instances_train.json \
--output-format yolov5 \
--output-folder dataset/yolov5_labels \
--output-split train
Step 3: Verify the Converted Annotations
After conversion, your dataset structure will be:
dataset/
├── yolov5_labels/
│ ├── data.yaml
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│ │ └── ...
│ └── labels/
│ ├── image1.txt
│ ├── image2.txt
│ └── ...
Contents of data.yaml
:
path: . # Dataset root directory
train: images # Directory for training images
nc: 3 # Number of classes
names: # Class name mapping
0: cat
1: dog
2: person
Contents of image1.txt
:
0 0.234375 0.416667 0.078125 0.166667
1 0.500000 0.500000 0.100000 0.200000
Error Handling in Labelformat¶
The format implementation includes several safeguards:
- Missing Label Files:
- Warning is logged if a label file doesn't exist for an image
-
Image is skipped from processing
-
File Access Issues:
- Errors are logged if label files cannot be read due to permissions or other OS issues
-
Affected images are skipped from processing
-
Label Format Validation:
- Each line must contain exactly 5 space-separated values
- Invalid lines are logged as warnings and skipped
- All values must be convertible to appropriate types:
- Category ID must be a valid integer
- Coordinates and dimensions must be valid floats
- Category IDs must exist in the category mapping
Example of a properly formatted label file:
0 0.716797 0.395833 0.216406 0.147222 # Each line must have exactly 5 space-separated values
1 0.687500 0.379167 0.255208 0.175000 # All values must be valid numbers within [0,1] range