mmyolo.datasets¶

datasets¶

class mmyolo.datasets.BatchShapePolicy(batch_size: int = 32, img_size: int = 640, size_divisor: int = 32, extra_pad_ratio: float = 0.5)[source]¶

BatchShapePolicy is used only in the testing phase, where it reduces the number of padded pixels during batch inference.

Parameters

batch_size (int) – Single GPU batch size during batch inference. Defaults to 32.
img_size (int) – Expected output image size. Defaults to 640.
size_divisor (int) – Divisor that the height and width of each batch shape must be a multiple of. Defaults to 32.
extra_pad_ratio (float) – Extra pad ratio. Defaults to 0.5.
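To illustrate how padding can be reduced, the sketch below sorts images by aspect ratio and gives each batch one shared shape rounded up to a multiple of size_divisor. It is a minimal pure-Python sketch under stated assumptions, not mmyolo's implementation: the helper name `sketch_batch_shapes` and the exact rounding formula are hypothetical.

```python
import math


def sketch_batch_shapes(sizes, batch_size=32, img_size=640,
                        size_divisor=32, extra_pad_ratio=0.5):
    """Return one (height, width) shape per batch of images.

    `sizes` is a list of (height, width) tuples. Hypothetical helper:
    the formula below is an assumption, not taken from mmyolo.
    """
    # Sort images by aspect ratio so similar shapes share a batch.
    ratios = sorted(h / w for h, w in sizes)
    shapes = []
    for start in range(0, len(ratios), batch_size):
        batch = ratios[start:start + batch_size]
        lo, hi = min(batch), max(batch)
        if hi < 1:        # every image is wider than it is tall
            rel_h, rel_w = hi, 1.0
        elif lo > 1:      # every image is taller than it is wide
            rel_h, rel_w = 1.0, 1.0 / lo
        else:             # mixed batch: fall back to a square shape
            rel_h, rel_w = 1.0, 1.0
        # Round each side up to a multiple of size_divisor.
        shapes.append(tuple(
            math.ceil(rel * img_size / size_divisor + extra_pad_ratio)
            * size_divisor
            for rel in (rel_h, rel_w)))
    return shapes


wide_images = [(480, 640)] * 4   # aspect ratio h/w = 0.75
print(sketch_batch_shapes(wide_images, batch_size=4))
```

With these inputs the batch is padded to a landscape shape rather than to a full square, which is where the saving in padded pixels comes from.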

class mmyolo.datasets.YOLOv5CocoDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]¶

Dataset for YOLOv5 COCO Dataset.

The only addition compared with CocoDataset is support for BatchShapePolicy. See mmyolo/datasets/utils.py#BatchShapePolicy for details.
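A configuration along the following lines could attach the policy to a validation dataset. This is a sketch only: the `data_root`, `ann_file` and `data_prefix` values are placeholders, and the keys of `batch_shapes_cfg` simply mirror the BatchShapePolicy signature.

```python
# Keys of batch_shapes_cfg mirror the BatchShapePolicy signature.
batch_shapes_cfg = dict(
    type='BatchShapePolicy',
    batch_size=1,
    img_size=640,
    size_divisor=32,
    extra_pad_ratio=0.5)

val_dataset = dict(
    type='YOLOv5CocoDataset',
    data_root='data/coco/',                         # placeholder path
    ann_file='annotations/instances_val2017.json',  # placeholder path
    data_prefix=dict(img='val2017/'),               # placeholder path
    test_mode=True,  # the policy is only used in the testing phase
    batch_shapes_cfg=batch_shapes_cfg)
```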

class mmyolo.datasets.YOLOv5CrowdHumanDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]¶

Dataset for YOLOv5 CrowdHuman Dataset.

The only addition compared with CrowdHumanDataset is support for BatchShapePolicy. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

class mmyolo.datasets.YOLOv5DOTADataset(*args, **kwargs)[source]¶

Dataset for YOLOv5 DOTA Dataset.

The only addition compared with DOTADataset is support for BatchShapePolicy. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

class mmyolo.datasets.YOLOv5VOCDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]¶

Dataset for YOLOv5 VOC Dataset.

The only addition compared with VOCDataset is support for BatchShapePolicy. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

mmyolo.datasets.yolov5_collate(data_batch: Sequence, use_ms_training: bool = False) → dict[source]¶

A rewritten collate_fn that gives faster training speed.

Parameters

data_batch (Sequence) – Batch of data.
use_ms_training (bool) – Whether to use multi-scale training.
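One common way for a detection collate function to keep batching cheap is to flatten all boxes into a single table whose first column records the image each box came from. The sketch below assumes that layout; the sample format and the helper name `sketch_collate` are hypothetical and this is not mmyolo's code.

```python
def sketch_collate(data_batch):
    """Merge per-image samples into one batch dict.

    Each sample is assumed (hypothetically) to look like
    {'img': ..., 'boxes': [[label, x1, y1, x2, y2], ...]}.
    """
    images, rows = [], []
    for batch_idx, sample in enumerate(data_batch):
        images.append(sample['img'])
        for box in sample['boxes']:
            # Prefix every box with the index of the image it came from.
            rows.append([batch_idx, *box])
    return {'inputs': images, 'boxes_with_index': rows}


batch = sketch_collate([
    {'img': 'img0', 'boxes': [[0, 10, 10, 50, 50]]},
    {'img': 'img1', 'boxes': [[2, 5, 5, 20, 20], [1, 0, 0, 8, 8]]},
])
print(batch['boxes_with_index'])
```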

transforms¶

class mmyolo.datasets.transforms.FilterAnnotations(by_keypoints: bool = False, **kwargs)[source]¶

Filter invalid annotations.

In addition to the conditions checked by FilterDetAnnotations, this filter adds a new condition requiring instances to have at least one visible keypoint.

class mmyolo.datasets.transforms.LetterResize(scale: Union[int, Tuple[int, int]], pad_val: dict = {'img': 0, 'mask': 0, 'seg': 255}, use_mini_pad: bool = False, stretch_only: bool = False, allow_scale_up: bool = True, half_pad_param: bool = False, **kwargs)[source]¶

Resize and pad image while meeting stride-multiple constraints.

Required Keys:

img (np.uint8)
batch_shape (np.int64) (optional)

Modified Keys:

img (np.uint8)
img_shape (tuple)
gt_bboxes (optional)

Added Keys:

pad_param (np.float32)

Parameters

scale (Union[int, Tuple[int, int]]) – Images scales for resizing.
pad_val (dict) – Padding value. Defaults to dict(img=0, mask=0, seg=255).
use_mini_pad (bool) – Whether to use minimum rectangle padding. Defaults to False.
stretch_only (bool) – Whether to stretch to the specified size directly. Defaults to False.
allow_scale_up (bool) – Allow scale up when ratio > 1. Defaults to True.
half_pad_param (bool) – If set to True, left and right pad_param will be given by dividing padding_h by 2. If set to False, pad_param is in int format. We recommend setting this to False for object detection tasks, and True for instance segmentation tasks. Defaults to False.
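The interaction between use_mini_pad, stretch_only and allow_scale_up can be sketched as plain arithmetic on image sizes. This is a minimal sketch under assumptions: the helper `sketch_letterbox` is hypothetical, the stride of 32 is an assumed constant, and only sizes are computed (no pixels are resized).

```python
def sketch_letterbox(img_hw, scale_hw, use_mini_pad=False,
                     stretch_only=False, allow_scale_up=True, stride=32):
    """Return ((new_h, new_w), (pad_h, pad_w)) for a letterbox resize.

    Hypothetical helper; `stride` is an assumed constant.
    """
    img_h, img_w = img_hw
    tgt_h, tgt_w = scale_hw
    # Keep the aspect ratio: use the smaller of the two scale ratios.
    ratio = min(tgt_h / img_h, tgt_w / img_w)
    if not allow_scale_up:
        ratio = min(ratio, 1.0)
    new_h, new_w = round(img_h * ratio), round(img_w * ratio)
    pad_h, pad_w = tgt_h - new_h, tgt_w - new_w
    if use_mini_pad:
        # Pad only up to the next multiple of the stride.
        pad_h, pad_w = pad_h % stride, pad_w % stride
    elif stretch_only:
        # Ignore the aspect ratio and fill the target exactly.
        new_h, new_w, pad_h, pad_w = tgt_h, tgt_w, 0, 0
    return (new_h, new_w), (pad_h, pad_w)


print(sketch_letterbox((480, 640), (640, 640)))
print(sketch_letterbox((480, 640), (640, 640), use_mini_pad=True))
```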

transform(results: dict) → dict[source]¶

Transform function to resize images, bounding boxes, semantic segmentation map and keypoints.

Parameters: results (dict) – Result dict from loading pipeline.
Returns: Resized results; the ‘img’, ‘gt_bboxes’, ‘gt_seg_map’, ‘gt_keypoints’, ‘scale’, ‘scale_factor’, ‘img_shape’, and ‘keep_ratio’ keys are updated in the result dict.
Return type: dict

class mmyolo.datasets.transforms.LoadAnnotations(mask2bbox: bool = False, poly2mask: bool = False, merge_polygons: bool = True, **kwargs)[source]¶

Because the YOLO series does not currently need to consider ignored bboxes, they can be excluded in advance to speed up the pipeline.

Parameters

mask2bbox (bool) – Whether to use mask annotation to get bbox. Defaults to False.
poly2mask (bool) – Whether to transform the polygons to bitmaps. Defaults to False.
merge_polygons (bool) – Whether to merge polygons into one polygon. If merged, the storage structure is simpler and training is more efficient, especially if the mask inside a bbox is divided into multiple polygons. Defaults to True.

merge_multi_segment(gt_masks: List[numpy.ndarray]) → List[numpy.ndarray][source]¶

Merge multi segments to one list.

Find the coordinates with min distance between each segment, then connect these coordinates with one thin line to merge all segments into one.

Parameters: gt_masks (List[np.ndarray]) – Original segmentations in COCO’s JSON file, like [segmentation1, segmentation2, …]; each segmentation is a list of coordinates.
Returns: Merged gt_masks.
Return type: List[np.ndarray]

min_index(arr1: numpy.ndarray, arr2: numpy.ndarray) → Tuple[int, int][source]¶

Find a pair of indexes with the shortest distance.

Parameters

arr1 – (N, 2).
arr2 – (M, 2).

Returns

a pair of indexes.

Return type

tuple
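The idea of finding the closest pair of points between two segments can be sketched with a brute-force search. The helper name `sketch_min_index` is hypothetical, and plain lists of (x, y) tuples stand in for the (N, 2) and (M, 2) arrays.

```python
def sketch_min_index(points_a, points_b):
    """Return (i, j) so that points_a[i] and points_b[j] are closest."""
    best = None
    for i, (ax, ay) in enumerate(points_a):
        for j, (bx, by) in enumerate(points_b):
            # Squared distance is enough for comparing pairs.
            dist_sq = (ax - bx) ** 2 + (ay - by) ** 2
            if best is None or dist_sq < best[0]:
                best = (dist_sq, i, j)
    return best[1], best[2]


seg_a = [(0, 0), (4, 0), (4, 4)]
seg_b = [(10, 10), (5, 5), (10, 0)]
print(sketch_min_index(seg_a, seg_b))
```

The returned pair marks where two segments could be joined by a thin connecting line when merging them.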

transform(results: dict) → dict[source]¶

Function to load multiple types of annotations.

Parameters: results (dict) – Result dict from mmengine.BaseDataset.
Returns: The dict contains the loaded bounding boxes, labels and semantic segmentation.
Return type: dict

class mmyolo.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 40, random_pop: bool = True, max_refetch: int = 15)[source]¶

Mosaic augmentation.

Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |           |
           |      +-----------+    pad    |
           |      |           |           |
           |      |  image1   +-----------+
           |      |           |           |
           |      |           |   image2  |
center_y   |----+-+-----------+-----------+
           |    |   cropped   |           |
           |pad |   image3    |   image4  |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the top-left image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. A sub-image is cropped if it is larger than its mosaic patch.
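The first and third steps can be sketched as coordinate arithmetic. The sketch assumes the mosaic canvas is twice the size of img_scale and that the top-left image is anchored with its bottom-right corner at the mosaic center; both helper names are hypothetical.

```python
import random


def sketch_mosaic_center(img_scale=(640, 640),
                         center_ratio_range=(0.5, 1.5), rng=random):
    """Sample the mosaic center on a canvas assumed to be 2x img_scale."""
    width, height = img_scale
    center_x = int(rng.uniform(*center_ratio_range) * width)
    center_y = int(rng.uniform(*center_ratio_range) * height)
    return center_x, center_y


def sketch_top_left_paste(center_xy, sub_img_wh):
    """Return the canvas region (x1, y1, x2, y2) for the top-left image.

    The sub-image is anchored with its bottom-right corner at the center
    and cropped where it would extend past the canvas origin.
    """
    center_x, center_y = center_xy
    sub_w, sub_h = sub_img_wh
    return (max(center_x - sub_w, 0), max(center_y - sub_h, 0),
            center_x, center_y)


center = sketch_mosaic_center(rng=random.Random(0))
print(center, sketch_top_left_paste(center, (640, 480)))
```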

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)

Parameters

img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets, such as MOT17, the gt bboxes are allowed to cross the border of images, so they do not need to be clipped in these cases. Defaults to True.
pad_val (float) – Pad value. Defaults to 114.0.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list[source]¶

Call function to collect indexes.

Parameters: dataset (Dataset or list) – The dataset or cached list.
Returns: indexes.
Return type: list

mix_img_transform(results: dict) → dict[source]¶

Mixed image data transformation.

Parameters: results (dict) – Result dict.
Returns: Updated result dict.
Return type: dict

class mmyolo.datasets.transforms.Mosaic9(img_scale: Tuple[int, int] = (640, 640), bbox_clip_border: bool = True, pad_val: Union[float, int] = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 50, random_pop: bool = True, max_refetch: int = 15)[source]¶

Mosaic9 augmentation.

Given 9 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

           +-------------------------------+------------+
           | pad           |      pad      |            |
           |    +----------+               |            |
           |    |          +---------------+  top_right |
           |    |          |      top      |   image2   |
           |    | top_left |     image1    |            |
           |    |  image8  o--------+------+--------+---+
           |    |          |        |               |   |
           +----+----------+        |     right     |pad|
           |               | center |     image3    |   |
           |     left      | image0 +---------------+---|
           |    image7     |        |               |   |
       +---+-----------+---+--------+               |   |
       |   |  cropped  |            |  bottom_right |pad|
       |   |bottom_left|            |    image4     |   |
       |   |  image6   |   bottom   |               |   |
       +---|-----------+   image5   +---------------+---|
           |    pad    |            |        pad        |
           +-----------+------------+-------------------+

The mosaic transform steps are as follows:

    1. Get the center image according to the index, and randomly
       sample another 8 images from the custom dataset.
    2. Randomly offset the image after Mosaic

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)

Parameters

img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets, such as MOT17, the gt bboxes are allowed to cross the border of images, so they do not need to be clipped in these cases. Defaults to True.
pad_val (Union[float, int]) – Pad value. Defaults to 114.0.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 5 caches for each image suffices for randomness. Defaults to 50.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list[source]¶

Call function to collect indexes.

Parameters: dataset (Dataset or list) – The dataset or cached list.
Returns: indexes.
Return type: list

mix_img_transform(results: dict) → dict[source]¶

Mixed image data transformation.

Parameters: results (dict) – Result dict.
Returns: Updated result dict.
Return type: dict

class mmyolo.datasets.transforms.PPYOLOERandomCrop(aspect_ratio: List[float] = [0.5, 2.0], thresholds: List[float] = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9], scaling: List[float] = [0.3, 1.0], num_attempts: int = 50, allow_no_crop: bool = True, cover_all_box: bool = False)[source]¶

Randomly crop the image and bboxes. Different thresholds are used in PPYOLOE to judge whether the cropped image meets the requirements. This implementation differs from the RandomCrop implementation in mmdet.

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)

Added Keys:

pad_param (np.float32)

Parameters

aspect_ratio (List[float]) – Aspect ratio of cropped region. Defaults to [.5, 2].
thresholds (List[float]) – IoU thresholds for deciding a valid bbox crop in [min, max] format. Defaults to [.0, .1, .3, .5, .7, .9].
scaling (List[float]) – Ratio between a cropped region and the original image in [min, max] format. Defaults to [.3, 1.].
num_attempts (int) – Number of tries for each threshold before giving up. Defaults to 50.
allow_no_crop (bool) – Allow returning the input without cropping it. Defaults to True.
cover_all_box (bool) – Ensure all bboxes are covered in the final crop. Defaults to False.

class mmyolo.datasets.transforms.PPYOLOERandomDistort(hue_cfg: dict = {'max': 18, 'min': - 18, 'prob': 0.5}, saturation_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, contrast_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, brightness_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, num_distort_func: int = 4)[source]¶

Random hue, saturation, contrast and brightness distortion.

Required Keys:

img

Modified Keys:

img (np.float32)

Parameters

hue_cfg (dict) – Hue settings. Defaults to dict(min=-18, max=18, prob=0.5).
saturation_cfg (dict) – Saturation settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
contrast_cfg (dict) – Contrast settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
brightness_cfg (dict) – Brightness settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
num_distort_func (int) – The number of distortion functions. Defaults to 4.

transform(results: dict) → dict[source]¶

The hue, saturation, contrast and brightness distortion function.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

transform_brightness(results)[source]¶

Transform brightness randomly.

transform_contrast(results)[source]¶

Transform contrast randomly.

transform_hue(results)[source]¶

Transform hue randomly.

transform_saturation(results)[source]¶

Transform saturation randomly.

class mmyolo.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]¶

Pack the inputs data for the detection / semantic segmentation / panoptic segmentation.

Compared to mmdet, we just add the gt_panoptic_seg field and logic.

transform(results: dict) → dict[source]¶

Method to pack the input data.

Parameters: results (dict) – Result dict from the data pipeline.

Returns

‘inputs’ (torch.Tensor): The forward data of models.
‘data_sample’ (DetDataSample): The annotation info of the sample.

Return type

dict

class mmyolo.datasets.transforms.Polygon2Mask(downsample_ratio: int = 4, mask_overlap: bool = True, coco_style: bool = False)[source]¶

Polygons to bitmaps in YOLOv5.

Parameters

downsample_ratio (int) – Downsample ratio of mask.
mask_overlap (bool) – Whether to use mask overlap in mask processing. When set to True, the implementation matches the official one and trains faster: all gt masks are compressed into one overlap mask, and each value in that mask indicates the index of a gt mask. If set to False, each mask is a binary mask. Defaults to True.
coco_style (bool) – Whether to use coco_style to convert the polygons to bitmaps. Note that this option is only used to test if there is an improvement in training speed and we recommend setting it to False.
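The difference between binary masks and a single overlap mask can be sketched with small nested lists. The sketch assumes masks are painted in order of decreasing area so that smaller instances stay visible on top; the helper name `sketch_overlap_mask` is hypothetical and no polygon rasterization is performed here.

```python
def sketch_overlap_mask(binary_masks):
    """Compress binary masks (lists of 0/1 rows) into one index mask."""
    areas = [sum(map(sum, mask)) for mask in binary_masks]
    # Paint large instances first so small ones stay visible on top.
    order = sorted(range(len(binary_masks)), key=lambda i: -areas[i])
    height, width = len(binary_masks[0]), len(binary_masks[0][0])
    overlap = [[0] * width for _ in range(height)]
    for new_idx, old_idx in enumerate(order, start=1):
        for y in range(height):
            for x in range(width):
                if binary_masks[old_idx][y][x]:
                    # Pixel value is the 1-based index of the gt mask.
                    overlap[y][x] = new_idx
    return overlap, order


small = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
large = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
overlap, order = sketch_overlap_mask([small, large])
print(overlap, order)
```

A value of 0 marks background, so one integer array can carry every instance of the image.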

polygon2mask(img_shape: Tuple[int, int], polygons: numpy.ndarray, color: int = 1) → numpy.ndarray[source]¶

Parameters

img_shape (tuple) – The image size.
polygons (np.ndarray) – [N, M], where N is the number of polygons and M is the number of coordinate values per polygon (twice the number of points).
color (int) – color in fillPoly.

Returns

the overlap mask.

Return type

np.ndarray

polygons2masks(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks, color: int = 1) → numpy.ndarray[source]¶

Return a list of bitmap masks.

Parameters

img_shape (tuple) – The image size.
polygons (PolygonMasks) – The mask annotations.
color (int) – color in fillPoly.

Returns

the list of masks in bitmaps.

Return type

List[np.ndarray]

polygons2masks_overlap(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks) → Tuple[numpy.ndarray, numpy.ndarray][source]¶

Return an overlap mask and the sorted idx of area.

Parameters

img_shape (tuple) – The image size.
polygons (PolygonMasks) – The mask annotations.
color (int) – color in fillPoly.

Returns

the overlap mask and the sorted idx of area.

Return type

Tuple[np.ndarray, np.ndarray]

transform(results: dict) → dict[source]¶

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmyolo.datasets.transforms.RandomAffine(**kwargs)[source]¶

class mmyolo.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]¶

class mmyolo.datasets.transforms.RegularizeRotatedBox(angle_version='le90')[source]¶

Regularize rotated boxes.

Due to the angle periodicity, one rotated box can be represented in many different (x, y, w, h, t). To make each rotated box unique, regularize_boxes will take the remainder of the angle divided by 180 degrees.

For convenience, three angle_version values can be used here:

‘oc’: OpenCV Definition. Has the same box representation as cv2.minAreaRect; the angle ranges in [-90, 0).
‘le90’: Long Edge Definition (90). The angle ranges in [-90, 90). The width is always longer than the height.
‘le135’: Long Edge Definition (135). The angle ranges in [-45, 135). The width is always longer than the height.

Required Keys:

gt_bboxes (RotatedBoxes[torch.float32])

Modified Keys:

gt_bboxes

Parameters: angle_version (str) – Angle version. Can only be ‘oc’, ‘le90’, or ‘le135’. Defaults to ‘le90’.
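For the ‘le90’ definition, the regularization can be sketched as two steps: make the width the long edge, then wrap the angle using the 180-degree periodicity. This is a minimal sketch in degrees with a hypothetical helper name `sketch_regularize_le90`; the box center is unaffected and therefore omitted.

```python
def sketch_regularize_le90(w, h, angle_deg):
    """Return (w, h, angle) with w >= h and angle in [-90, 90)."""
    if w < h:
        # Swapping the edges rotates the box description by 90 degrees.
        w, h = h, w
        angle_deg += 90.0
    # Wrap the angle into [-90, 90) using the 180-degree periodicity.
    angle_deg = (angle_deg + 90.0) % 180.0 - 90.0
    return w, h, angle_deg


print(sketch_regularize_le90(10.0, 20.0, 30.0))
print(sketch_regularize_le90(20.0, 10.0, 200.0))
```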

transform(results: dict) → dict[source]¶

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmyolo.datasets.transforms.RemoveDataElement(keys: Union[str, Sequence[str]])[source]¶

Remove unnecessary data elements in results.

Parameters: keys (Union[str, Sequence[str]]) – Keys to be removed.

transform(results: dict) → dict[source]¶

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmyolo.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]¶

class mmyolo.datasets.transforms.YOLOXMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, bbox_clip_border: bool = True, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]¶

MixUp data augmentation for YOLOX.

         mixup transform
+---------------+--------------+
| mixup image   |              |
|      +--------|--------+     |
|      |        |        |     |
+---------------+        |     |
|      |                 |     |
|      |      image      |     |
|      |                 |     |
|      |                 |     |
|      +-----------------+     |
|             pad              |
+------------------------------+

The mixup transform steps are as follows:

Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).

The target of mixup transform is the weighted average of mixup image and origin image.

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)

Parameters

img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.
pad_val (float) – Pad value. Defaults to 114.0.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets, such as MOT17, the gt bboxes are allowed to cross the border of images, so they do not need to be clipped in these cases. Defaults to True.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int[source]¶

Call function to collect indexes.

Parameters: dataset (Dataset or list) – The dataset or cached list.
Returns: indexes.
Return type: int

mix_img_transform(results: dict) → dict[source]¶

YOLOX MixUp transform function.

Parameters: results (dict) – Result dict.
Returns: Updated result dict.
Return type: dict

class mmyolo.datasets.transforms.YOLOv5CopyPaste(ioa_thresh: float = 0.3, prob: float = 0.5)[source]¶

Copy-Paste used in YOLOv5 and YOLOv8.

This transform randomly copies some objects in the image to the mirror position of the image. It is different from the CopyPaste in mmdet.

Required Keys:

img (np.uint8)
gt_bboxes (BaseBoxes[torch.float32])
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (PolygonMasks) (optional)

Modified Keys:

img
gt_bboxes
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (optional)
gt_masks (optional)

Parameters

ioa_thresh (float) – IoA threshold for deciding whether a bbox is valid. Defaults to 0.3.
prob (float) – Probability of choosing objects. Defaults to 0.5.

static bbox_ioa(gt_bboxes_flip: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, gt_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, eps: float = 1e-07) → numpy.ndarray[source]¶

Calculate ioa between gt_bboxes_flip and gt_bboxes.

Parameters

gt_bboxes_flip (HorizontalBoxes) – Flipped ground truth bounding boxes.
gt_bboxes (HorizontalBoxes) – Ground truth bounding boxes.
eps (float) – Defaults to 1e-7.

Returns

Ioa.

Return type

(Tensor)
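The IoA check can be sketched for a single pair of boxes. The sketch assumes IoA means the intersection area divided by the area of the second box, with boxes in (x1, y1, x2, y2) format; both helper names are hypothetical.

```python
def sketch_bbox_ioa(box_a, box_b, eps=1e-7):
    """Intersection area divided by the area of box_b (xyxy boxes)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter_w * inter_h / (area_b + eps)


def sketch_flip_box(box, img_width):
    """Mirror an xyxy box horizontally inside an image of given width."""
    x1, y1, x2, y2 = box
    return (img_width - x2, y1, img_width - x1, y2)


flipped = sketch_flip_box((0, 0, 40, 40), img_width=100)
print(flipped, sketch_bbox_ioa(flipped, (50, 0, 100, 50)))
```

A mirrored object would then be pasted only if its IoA with every existing box stays below ioa_thresh, so that pasted objects do not cover too much of existing ones.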

class mmyolo.datasets.transforms.YOLOv5HSVRandomAug(hue_delta: Union[int, float] = 0.015, saturation_delta: Union[int, float] = 0.7, value_delta: Union[int, float] = 0.4)[source]¶

Apply HSV augmentation to image sequentially.

Required Keys:

img

Modified Keys:

img

Parameters

hue_delta ([int, float]) – delta of hue. Defaults to 0.015.
saturation_delta ([int, float]) – delta of saturation. Defaults to 0.7.
value_delta ([int, float]) – delta of value. Defaults to 0.4.
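One plausible reading of the three deltas is that each is the half-width of a uniform range of multiplicative gains around 1.0, as in the original YOLOv5 augmentation. The sketch below only draws the gains under that assumption; the helper name `sketch_hsv_gains` is hypothetical and no image is modified.

```python
import random


def sketch_hsv_gains(hue_delta=0.015, saturation_delta=0.7,
                     value_delta=0.4, rng=random):
    """Draw one multiplicative gain per HSV channel around 1.0."""
    deltas = (hue_delta, saturation_delta, value_delta)
    return tuple(rng.uniform(-1.0, 1.0) * d + 1.0 for d in deltas)


hue_gain, sat_gain, val_gain = sketch_hsv_gains(rng=random.Random(0))
print(hue_gain, sat_gain, val_gain)
```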

transform(results: dict) → dict[source]¶

The HSV augmentation transform function.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

class mmyolo.datasets.transforms.YOLOv5KeepRatioResize(scale: Union[int, Tuple[int, int]], keep_ratio: bool = True, **kwargs)[source]¶

Resize images and bboxes (if they exist).

This transform resizes the input image according to scale. Bboxes (if they exist) are then resized with the same scale factor.

Required Keys:

img (np.uint8)
gt_bboxes (BaseBoxes[torch.float32]) (optional)

Modified Keys:

img (np.uint8)
img_shape (tuple)
gt_bboxes (optional)
scale (float)

Added Keys:

scale_factor (np.float32)

Parameters: scale (Union[int, Tuple[int, int]]) – Images scales for resizing.

class mmyolo.datasets.transforms.YOLOv5MixUp(alpha: float = 32.0, beta: float = 32.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]¶

MixUp data augmentation for YOLOv5.

The mixup transform steps are as follows:

Another random image is picked by dataset.

Randomly obtain the fusion ratio from the beta distribution, then fuse the target of the original image and mixup image through this ratio.
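The fusion step can be sketched with the standard library's Beta sampler. This is a minimal sketch that blends two flat lists of pixel values; the helper name `sketch_mixup` is hypothetical, and label handling is left out.

```python
import random


def sketch_mixup(pixels_a, pixels_b, alpha=32.0, beta=32.0, rng=random):
    """Blend two equally sized flat pixel lists with a Beta-sampled ratio."""
    ratio = rng.betavariate(alpha, beta)
    mixed = [a * ratio + b * (1.0 - ratio)
             for a, b in zip(pixels_a, pixels_b)]
    return mixed, ratio


mixed, ratio = sketch_mixup([0.0, 100.0], [200.0, 100.0],
                            rng=random.Random(0))
print(ratio, mixed)
```

With alpha and beta both at 32, the Beta distribution is concentrated around 0.5, so the two images contribute roughly equally in most draws.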

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)

Parameters

alpha (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.
beta (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int[source]¶

Call function to collect indexes.

Parameters: dataset (Dataset or list) – The dataset or cached list.
Returns: indexes.
Return type: int

mix_img_transform(results: dict) → dict[source]¶

YOLOv5 MixUp transform function.

Parameters: results (dict) – Result dict.
Returns: Updated result dict.
Return type: dict

class mmyolo.datasets.transforms.YOLOv5RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True, min_bbox_size: int = 2, min_area_ratio: float = 0.1, use_mask_refine: bool = False, max_aspect_ratio: float = 20.0, resample_num: int = 1000)[source]¶

Random affine transform data augmentation in YOLOv5 and YOLOv8. It is different from the implementation in YOLOX.

This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms. If you set use_mask_refine == True, the code will use the mask annotations to refine the bboxes. Our implementation is slightly different from the official one. In the COCO dataset, a gt may have multiple mask tags. The official YOLOv5 annotation file already combines the masks that an object has, but our code takes into account the fact that an object has multiple masks.

Required Keys:

img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (PolygonMasks) (optional)

Modified Keys:

img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
gt_masks (PolygonMasks) (optional)

Parameters

max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.
max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.
border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets, such as MOT17, the gt bboxes are allowed to cross the border of images, so they do not need to be clipped in these cases. Defaults to True.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Defaults to 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Defaults to 0.1.
use_mask_refine (bool) – Whether to refine bbox by mask. Deprecated.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Defaults to 20.
resample_num (int) – Number of points to resample each polygon to. Defaults to 1000.
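How the rotation, scaling, shear and translation parameters could combine into one matrix can be sketched with plain 3x3 lists. The composition order (translation, then shear, rotation, scaling from left to right) and the helper names are assumptions for illustration; the border offset is left out.

```python
import math
import random


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def sketch_warp_matrix(img_w, img_h, max_rotate_degree=10.0,
                       max_translate_ratio=0.1,
                       scaling_ratio_range=(0.5, 1.5),
                       max_shear_degree=2.0, rng=random):
    """Compose translation @ shear @ rotation @ scaling (assumed order)."""
    angle = math.radians(rng.uniform(-max_rotate_degree, max_rotate_degree))
    scale = rng.uniform(*scaling_ratio_range)
    shear_x = math.radians(rng.uniform(-max_shear_degree, max_shear_degree))
    shear_y = math.radians(rng.uniform(-max_shear_degree, max_shear_degree))
    trans_x = rng.uniform(-max_translate_ratio, max_translate_ratio) * img_w
    trans_y = rng.uniform(-max_translate_ratio, max_translate_ratio) * img_h

    rotation = [[math.cos(angle), -math.sin(angle), 0.0],
                [math.sin(angle), math.cos(angle), 0.0],
                [0.0, 0.0, 1.0]]
    scaling = [[scale, 0.0, 0.0], [0.0, scale, 0.0], [0.0, 0.0, 1.0]]
    shear = [[1.0, math.tan(shear_x), 0.0],
             [math.tan(shear_y), 1.0, 0.0],
             [0.0, 0.0, 1.0]]
    translation = [[1.0, 0.0, trans_x], [0.0, 1.0, trans_y],
                   [0.0, 0.0, 1.0]]
    return matmul3(translation, matmul3(shear, matmul3(rotation, scaling)))


def warp_point(matrix, x, y):
    """Apply a 3x3 affine matrix to one point."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


matrix = sketch_warp_matrix(640, 640, rng=random.Random(0))
print(warp_point(matrix, 320.0, 320.0))
```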

clip_polygons(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.mask.structures.PolygonMasks[source]¶

Function to clip points of polygons with height and width.

Parameters

gt_masks (PolygonMasks) – Annotations of instance segmentation.
height (int) – height of clip border.
width (int) – width of clip border.

Returns

Clipped annotations of instance segmentation.

Return type

PolygonMasks
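The clipping step can be sketched as follows. This is a minimal illustration, assuming each polygon is a flat NumPy array in (x1, y1, x2, y2, …) order; the helper name clip_polygon is hypothetical and is not part of mmyolo.

```python
import numpy as np


def clip_polygon(poly: np.ndarray, height: int, width: int) -> np.ndarray:
    """Clip a flat (x1, y1, x2, y2, ...) polygon to the image border."""
    clipped = poly.astype(np.float64).copy()
    clipped[0::2] = np.clip(clipped[0::2], 0, width)   # x coordinates
    clipped[1::2] = np.clip(clipped[1::2], 0, height)  # y coordinates
    return clipped


poly = np.array([-5.0, 10.0, 700.0, 20.0, 320.0, 650.0])
clipped = clip_polygon(poly, height=640, width=640)
```

Only coordinates that fall outside the border change; points already inside the image are left as they are.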

filter_gt_bboxes(origin_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, wrapped_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes) → torch.Tensor[source]¶

Filter gt bboxes.

Parameters

origin_bboxes (HorizontalBoxes) – Origin bboxes.
wrapped_bboxes (HorizontalBoxes) – Wrapped bboxes

Returns

A tensor indicating which bboxes pass the filters and are kept.

Return type

torch.Tensor
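The three filters described by min_bbox_size, min_area_ratio and max_aspect_ratio can be combined roughly as below. This is a sketch on plain NumPy arrays rather than HorizontalBoxes; the function name keep_mask and the eps guard are assumptions made for illustration.

```python
import numpy as np


def keep_mask(origin, warped, min_bbox_size=2.0, min_area_ratio=0.1,
              max_aspect_ratio=20.0, eps=1e-16):
    """Return a boolean mask of boxes that pass all three filters.

    Both inputs are (N, 4) arrays in (x1, y1, x2, y2) format.
    """
    ow, oh = origin[:, 2] - origin[:, 0], origin[:, 3] - origin[:, 1]
    ww, wh = warped[:, 2] - warped[:, 0], warped[:, 3] - warped[:, 1]
    big_enough = (ww > min_bbox_size) & (wh > min_bbox_size)
    area_ok = (ww * wh) / (ow * oh + eps) > min_area_ratio
    aspect = np.maximum(ww / (wh + eps), wh / (ww + eps))
    return big_enough & area_ok & (aspect < max_aspect_ratio)


origin = np.array([[0, 0, 100, 100]] * 3, dtype=float)
warped = np.array([[0, 0, 80, 90],    # passes every filter
                   [0, 0, 1, 50],     # too narrow
                   [0, 0, 20, 20]],   # area shrank below 10 %
                  dtype=float)
mask = keep_mask(origin, warped)
```

The mask can then be used to index bboxes, labels and masks together so that they stay aligned.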

resample_masks(gt_masks: mmdet.structures.mask.structures.PolygonMasks) → mmdet.structures.mask.structures.PolygonMasks[source]¶

Function to resample each mask annotation with shape (2 * n, ) to shape (resample_num * 2, ).

Parameters: gt_masks (PolygonMasks) – Annotations of semantic segmentation.
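One way to resample a polygon from shape (2 * n, ) to shape (resample_num * 2, ) is linear interpolation along the vertex index. The sketch below assumes that scheme and a closed contour; resample_polygon is a hypothetical helper, and the library's exact interpolation is not stated on this page.

```python
import numpy as np


def resample_polygon(poly: np.ndarray, resample_num: int) -> np.ndarray:
    """Resample a flat (2 * n,) polygon to a flat (resample_num * 2,) one."""
    pts = poly.reshape(-1, 2)
    pts = np.concatenate([pts, pts[:1]], axis=0)  # close the contour
    src = np.arange(len(pts))                     # original vertex index
    dst = np.linspace(0, len(pts) - 1, resample_num)
    xs = np.interp(dst, src, pts[:, 0])
    ys = np.interp(dst, src, pts[:, 1])
    return np.stack([xs, ys], axis=1).reshape(-1)


square = np.array([0.0, 0.0, 10.0, 0.0, 10.0, 10.0, 0.0, 10.0])
resampled = resample_polygon(square, resample_num=9)
```

Giving every polygon the same number of points lets the masks of one image be stacked into a single array.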

segment2box(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes[source]¶

Convert 1 segment label to 1 box label, applying the inside-image constraint, i.e. (xy1, xy2, …) to (xyxy).

Parameters

gt_masks (PolygonMasks) – The segment label.
height (int) – The height of the image.
width (int) – The width of the image.

Returns: The clipped bboxes from gt_masks.
Return type: HorizontalBoxes
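The segment-to-box conversion can be illustrated on a single polygon. This is a sketch under the assumption that points outside the image are discarded before taking the extremes; polygon_to_box is a hypothetical name, and the all-zero box returned for an empty polygon is a choice made here for illustration.

```python
import numpy as np


def polygon_to_box(poly: np.ndarray, height: int, width: int) -> np.ndarray:
    """Return the (x1, y1, x2, y2) box around the in-image points of a polygon."""
    x, y = poly[0::2], poly[1::2]
    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)
    x, y = x[inside], y[inside]
    if x.size == 0:
        return np.zeros(4)  # no point survives, so return an empty box
    return np.array([x.min(), y.min(), x.max(), y.max()])


poly = np.array([10.0, 20.0, 300.0, 50.0, 700.0, 400.0, 120.0, 600.0])
box = polygon_to_box(poly, height=640, width=640)
```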

warp_mask(gt_masks: mmdet.structures.mask.structures.PolygonMasks, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → mmdet.structures.mask.structures.PolygonMasks[source]¶

Warp masks by warp_matrix and retain masks inside image after warping.

Parameters

gt_masks (PolygonMasks) – Annotations of semantic segmentation.
warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).
img_w (int) – Width of output image.
img_h (int) – Height of output image.

Returns

Masks after warping.

Return type

PolygonMasks

static warp_poly(poly: numpy.ndarray, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → numpy.ndarray[source]¶

Function to warp one mask and filter points outside image.

Parameters

poly (np.ndarray) – Segmentation annotation with shape (n, ) and with format (x1, y1, x2, y2, …).
warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).
img_w (int) – Width of output image.
img_h (int) – Height of output image.
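Applying a (3, 3) warp matrix to a flat polygon can be sketched with homogeneous coordinates. The helper warp_points below is hypothetical; it covers only the coordinate transform and leaves out the filtering of points that fall outside (img_w, img_h).

```python
import numpy as np


def warp_points(poly: np.ndarray, warp_matrix: np.ndarray) -> np.ndarray:
    """Apply a 3x3 warp matrix to a flat (x1, y1, x2, y2, ...) polygon."""
    pts = poly.reshape(-1, 2)
    homo = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)  # (n, 3)
    warped = homo @ warp_matrix.T
    warped = warped[:, :2] / warped[:, 2:3]  # back from homogeneous coords
    return warped.reshape(-1)


# Scale by 2, then translate by (5, -3).
matrix = np.array([[2.0, 0.0, 5.0],
                   [0.0, 2.0, -3.0],
                   [0.0, 0.0, 1.0]])
warped = warp_points(np.array([1.0, 1.0, 10.0, 4.0]), matrix)
```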

mmyolo.engine¶

hooks¶

optimizers¶

mmyolo.models¶

backbones¶

class mmyolo.models.backbones.BaseBackbone(arch_setting: list, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

BaseBackbone backbone used in YOLO series.

Backbone model structure diagram
+-----------+
|   input   |
+-----------+
      v
+-----------+
|   stem    |
|   layer   |
+-----------+
      v
+-----------+
|   stage   |
|  layer 1  |
+-----------+
      v
+-----------+
|   stage   |
|  layer 2  |
+-----------+
      v
    ......
      v
+-----------+
|   stage   |
|  layer n  |
+-----------+
In P5 model, n=4
In P6 model, n=5

Parameters

arch_setting (list) – Architecture of BaseBackbone.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to None.
act_cfg (dict) – Config dict for activation layer. Defaults to None.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
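The effect of deepen_factor and widen_factor can be illustrated with two small helpers. This is a sketch only: the names scale_channels and scale_blocks are hypothetical, and rounding channels up to a multiple of 8 and keeping at least one block are assumptions about a typical implementation, not a statement of what BaseBackbone does.

```python
import math


def scale_channels(channels: int, widen_factor: float, divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_blocks(num_blocks: int, deepen_factor: float) -> int:
    """Scale a block count, keeping at least one block."""
    return max(round(num_blocks * deepen_factor), 1)


# Illustrative base settings; the factor values are examples only.
channels = [scale_channels(c, widen_factor=0.5) for c in (256, 512, 1024)]
blocks = [scale_blocks(n, deepen_factor=0.33) for n in (3, 6, 9)]
```

With both factors left at their default of 1.0, the base architecture is used unchanged.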

abstract build_stage_layer(stage_idx: int, setting: list)[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

abstract build_stem_layer()[source]¶: Build a stem layer.

forward(x: torch.Tensor) → tuple[source]¶: Forward batch_inputs from the data_preprocessor.

make_stage_plugins(plugins, stage_idx, setting)[source]¶

Make plugins for backbone stage_idx th stage.

Currently, context_block, empirical_attention_block, nonlocal_block and dropout_block can be inserted into the backbone.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True)),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True)),
... ]
>>> model = YOLOv5CSPDarknet()
>>> stage_plugins = model.make_stage_plugins(plugins, 0, setting)
>>> assert len(stage_plugins) == 1

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> xxx -> yyy

Parameters

plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of the stage to build. If stages is missing, the plugin is applied to all stages.
setting (list) – The architecture setting of a stage layer.

Returns

Plugins for current stage

Return type

list[nn.Module]
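The per-stage selection described above can be sketched without building any modules. The helper select_stage_plugins is hypothetical and returns the cfg dicts rather than nn.Module instances; the default of four stages is taken from the example tuples.

```python
def select_stage_plugins(plugins, stage_idx, num_stages=4):
    """Return the cfg dicts of the plugins enabled for one stage."""
    selected = []
    for plugin in plugins:
        stages = plugin.get('stages')
        if stages is not None and len(stages) != num_stages:
            raise ValueError('stages must have one entry per stage')
        # A missing `stages` entry means the plugin applies to every stage.
        if stages is None or stages[stage_idx]:
            selected.append(plugin['cfg'])
    return selected


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'), stages=(False, True, True, True)),
    dict(cfg=dict(type='yyy'), stages=(True, True, True, True)),
]
stage0 = [cfg['type'] for cfg in select_stage_plugins(plugins, 0)]
stage1 = [cfg['type'] for cfg in select_stage_plugins(plugins, 1)]
```

This reproduces the two block structures listed above: stage 0 receives only yyy, while stage 1 receives xxx followed by yyy.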

train(mode: bool = True)[source]¶: Convert the model into training mode while keeping normalization layers frozen.

class mmyolo.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶

CSPNeXt backbone used in RTMDet.

Parameters

arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.
channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN').
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

class mmyolo.models.backbones.PPYOLOECSPResNet(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, arch_ovewrite: Optional[dict] = None, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': True, 'type': 'PPYOLOEBasicBlock', 'use_alpha': True}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, attention_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_cfg': {'type': 'HSigmoid'}, 'type': 'EffectiveSELayer'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_large_stem: bool = False)[source]¶

CSP-ResNet backbone used in PPYOLOE.

Parameters

arch (str) – Architecture of CSP-ResNet, from {P5, P6}. Defaults to P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.
block_cfg (dict) – Config dict for block. Defaults to dict(type=’PPYOLOEBasicBlock’, shortcut=True, use_alpha=True)
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
attention_cfg (dict) – Config dict for EffectiveSELayer. Defaults to dict(type=’EffectiveSELayer’, act_cfg=dict(type=’HSigmoid’)).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to None.
use_large_stem (bool) – Whether to use large stem layer. Defaults to False.

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

class mmyolo.models.backbones.YOLOXCSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, spp_kernal_sizes: Tuple[int] = (5, 9, 13), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

CSP-Darknet backbone used in YOLOX.

Parameters

arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOXCSPDarknet
>>> import torch
>>> model = YOLOXCSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
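The spatial sizes printed in the example are consistent with output strides of 8, 16 and 32 for the default out_indices=(2, 3, 4). The sketch below makes that arithmetic explicit; the stride values are inferred from the printed shapes, and feature_map_sizes is a hypothetical helper.

```python
def feature_map_sizes(input_size: int, strides=(8, 16, 32)):
    """Return the side length of each output feature map."""
    return [input_size // stride for stride in strides]


sizes_416 = feature_map_sizes(416)
sizes_640 = feature_map_sizes(640)
```

For a 416 x 416 input this gives 52, 26 and 13, matching the shapes above.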

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

class mmyolo.models.backbones.YOLOv5CSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

CSP-Darknet backbone used in YOLOv5.

Parameters

arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.
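In a config file, the parameters above would typically be passed as a dict. The fragment below is a sketch that uses only arguments from the signature; the deepen_factor and widen_factor values are illustrative and not taken from this page.

```python
# Hypothetical backbone entry for a model config.
backbone = dict(
    type='YOLOv5CSPDarknet',
    arch='P5',
    deepen_factor=0.33,   # illustrative value for a smaller model
    widen_factor=0.5,     # illustrative value for a smaller model
    out_indices=(2, 3, 4),
    frozen_stages=-1,     # -1 keeps every stage trainable
    norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
    act_cfg=dict(type='SiLU', inplace=True),
)
```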

Example

>>> from mmyolo.models import YOLOv5CSPDarknet
>>> import torch
>>> model = YOLOv5CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

init_weights()[source]¶: Initialize the parameters.

class mmyolo.models.backbones.YOLOv6CSPBep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, hidden_ratio: float = 0.5, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'ConvWrapper'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

CSPBep backbone used in YOLOv6.

Parameters

arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='ConvWrapper').
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv6CSPBep
>>> import torch
>>> model = YOLOv6CSPBep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

class mmyolo.models.backbones.YOLOv6EfficientRep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

EfficientRep backbone used in YOLOv6.

Parameters

arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv6EfficientRep
>>> import torch
>>> model = YOLOv6EfficientRep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

init_weights()[source]¶: Initialize the weights.

class mmyolo.models.backbones.YOLOv7Backbone(arch: str = 'L', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Backbone used in YOLOv7.

Parameters

arch (str) – Architecture of YOLOv7. Defaults to L.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to None.

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

class mmyolo.models.backbones.YOLOv8CSPDarknet(arch: str = 'P5', last_stage_out_channels: int = 1024, plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

CSP-Darknet backbone used in YOLOv8.

Parameters

arch (str) – Architecture of CSP-Darknet, from {P5}. Defaults to P5.
last_stage_out_channels (int) – Final layer output channel. Defaults to 1024.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv8CSPDarknet
>>> import torch
>>> model = YOLOv8CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]¶

Build a stage layer.

Parameters

stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]¶: Build a stem layer.

init_weights()[source]¶: Initialize the parameters.

data_preprocessor¶

dense_heads¶

class mmyolo.models.dense_heads.PPYOLOEHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.125, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

PPYOLOEHead head used in PPYOLOE. The YOLOv6 head and the PPYOLOE head differ only slightly: PPYOLOE additionally uses distribution focal loss, while YOLOv6 does not.

Parameters

head_module (ConfigType) – Base module used for PPYOLOEHead.
prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_dfl (ConfigDict or dict) – Config of distribution focal loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
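The default prior_generator uses strides [8, 16, 32] with offset 0.5, which places one prior point at the centre of every feature-map cell. The sketch below shows that layout for a single level; point_priors is a hypothetical helper written with NumPy and is not the mmdet.MlvlPointGenerator API.

```python
import numpy as np


def point_priors(feat_h: int, feat_w: int, stride: int, offset: float = 0.5):
    """Return (feat_h * feat_w, 2) prior centres in input-image pixels."""
    xs = (np.arange(feat_w) + offset) * stride
    ys = (np.arange(feat_h) + offset) * stride
    grid_x, grid_y = np.meshgrid(xs, ys)  # row-major: x varies fastest
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)


priors = point_priors(feat_h=2, feat_w=3, stride=8)
```

With offset 0 the points would sit on the top-left corner of each cell instead of its centre.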

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]
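Before a box loss can be computed, distance-style predictions have to be turned into corner boxes. The sketch below shows the usual decoding of (left, top, right, bottom) distances around prior points, as the default DistancePointBBoxCoder type suggests; distance_to_bbox here is a standalone NumPy illustration, not the coder's actual interface.

```python
import numpy as np


def distance_to_bbox(points: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Decode (left, top, right, bottom) distances into (x1, y1, x2, y2) boxes."""
    x1 = points[:, 0] - distances[:, 0]
    y1 = points[:, 1] - distances[:, 1]
    x2 = points[:, 0] + distances[:, 2]
    y2 = points[:, 1] + distances[:, 3]
    return np.stack([x1, y1, x2, y2], axis=1)


points = np.array([[100.0, 60.0]])
boxes = distance_to_bbox(points, np.array([[10.0, 20.0, 30.0, 40.0]]))
```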

class mmyolo.models.dense_heads.PPYOLOEHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

PPYOLOEHead head module used in PPYOLOE (https://arxiv.org/abs/2203.16250).

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (int or Sequence) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
reg_max (int) – Max value of the integral set {0, ..., reg_max} in the QFL setting. Defaults to 16.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
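The role of reg_max can be illustrated with a small sketch. It assumes the usual distribution-based regression scheme, in which each box side is predicted as reg_max + 1 logits over the integral set {0, ..., reg_max} and the distance is the softmax-weighted mean of that set. This is an assumption about how the logits are consumed, and expected_distance is a hypothetical helper, not an mmyolo function.

```python
import math


def expected_distance(logits):
    """Softmax-weighted mean over the integral set {0, ..., reg_max}.

    `logits` holds reg_max + 1 values for one side of one box.
    Illustrative helper only; it is not part of mmyolo.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(i * p for i, p in enumerate(probs))


reg_max = 16
uniform = [0.0] * (reg_max + 1)  # no preference: every bin equally likely
peaked = [0.0] * (reg_max + 1)
peaked[4] = 50.0                 # nearly all probability mass on bin 4

print(expected_distance(uniform))  # the mean of 0..16
print(expected_distance(peaked))   # very close to 4
```

A larger reg_max widens the range of distances (in stride units) that one side can represent, at the cost of more output channels per side.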

forward(x: Tuple[torch.Tensor]) → torch.Tensor[source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions.
Return type: Tuple[List]

forward_single(x: torch.Tensor, cls_stem: torch.nn.modules.container.ModuleList, cls_pred: torch.nn.modules.container.ModuleList, reg_stem: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList) → torch.Tensor[source]¶

Forward feature of a single scale level.

init_weights(prior_prob=0.01)[source]¶

Initialize the weights and biases of the PPYOLOE head.
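A prior probability such as prior_prob=0.01 is commonly turned into a classification bias so that the initial sigmoid output equals that prior. The sketch below shows this standard relationship. Whether init_weights applies exactly this formula is an assumption to verify against the source, and both helper names are hypothetical.

```python
import math


def bias_from_prior(prior_prob):
    """Return the bias b for which sigmoid(b) == prior_prob."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


bias = bias_from_prior(0.01)
print(round(bias, 4))           # -4.5951
print(round(sigmoid(bias), 4))  # 0.01
```

Starting every class score near 0.01 keeps the classification loss from being dominated by the many background locations in the first iterations.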

class mmyolo.models.dense_heads.RTMDetHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

RTMDet head.

Parameters

head_module (ConfigType) – Base module used for RTMDetHead.
prior_generator (ConfigDict or dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
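As an illustration of how these arguments fit together, the following config fragment spells out the signature defaults as a plain dict. The num_classes and in_channels values are placeholders chosen for the example, not defaults, and the final check reflects the assumption that the prior generator strides should match the head module's featmap_strides.

```python
# Illustrative bbox_head config; the loss, coder and prior generator
# entries mirror the signature defaults documented above.
# num_classes=80 and in_channels=256 are placeholder values.
bbox_head = dict(
    type='RTMDetHead',
    head_module=dict(
        type='RTMDetSepBNHeadModule',
        num_classes=80,
        in_channels=256,
        feat_channels=256,
        stacked_convs=2,
        featmap_strides=[8, 16, 32],
    ),
    prior_generator=dict(
        type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
    bbox_coder=dict(type='DistancePointBBoxCoder'),
    loss_cls=dict(
        type='mmdet.QualityFocalLoss',
        use_sigmoid=True,
        beta=2.0,
        loss_weight=1.0),
    loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=2.0),
)

# Assumption: one set of priors is generated per feature level, so the
# two stride lists are expected to agree.
assert (bbox_head['prior_generator']['strides']
        == bbox_head['head_module']['featmap_strides'])
```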

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores and bbox predictions.
Return type: Tuple[List]

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Compute losses of the head.

Parameters

cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

special_init()[source]¶

YOLO series algorithms inherit from YOLOv5Head, but different algorithms have their own special initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.RTMDetInsSepBNHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, loss_mask={'eps': 5e-06, 'loss_weight': 2.0, 'reduction': 'mean', 'type': 'mmdet.DiceLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

RTMDet Instance Segmentation head.

Parameters

head_module (ConfigType) – Base module used for RTMDetInsSepBNHead.
prior_generator (ConfigDict or dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_mask (ConfigDict or dict) – Config of mask loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Compute losses of the head.

Parameters

cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

parse_dynamic_params(flatten_kernels: torch.Tensor) → tuple[source]¶

Split kernel head prediction to conv weight and bias.
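The idea of splitting a flat kernel prediction into per-layer weights and biases can be sketched in plain Python. The layout below is an assumption made for illustration: the first dynamic conv sees the mask prototypes plus two coordinate channels, the last one outputs a single mask channel, and all weights precede all biases in the flat vector. Both helpers are hypothetical and operate on lists rather than tensors.

```python
def dynamic_param_counts(num_prototypes=8, dyconv_channels=8, num_dyconvs=3,
                         extra_in_channels=2):
    """Per-layer weight and bias sizes for a stack of 1x1 dynamic convs.

    Assumed layout (not taken from the mmyolo source): the first layer
    takes num_prototypes + extra_in_channels inputs and the last layer
    produces one output channel.
    """
    weight_nums, bias_nums = [], []
    for i in range(num_dyconvs):
        in_ch = (num_prototypes + extra_in_channels) if i == 0 else dyconv_channels
        out_ch = 1 if i == num_dyconvs - 1 else dyconv_channels
        weight_nums.append(in_ch * out_ch)
        bias_nums.append(out_ch)
    return weight_nums, bias_nums


def split_flat_kernel(flat, weight_nums, bias_nums):
    """Split one flat kernel vector into per-layer weights and biases.

    Assumes all weights come first, followed by all biases.
    """
    sizes = weight_nums + bias_nums
    if len(flat) != sum(sizes):
        raise ValueError('flat kernel length does not match the layer sizes')
    chunks, start = [], 0
    for size in sizes:
        chunks.append(flat[start:start + size])
        start += size
    n = len(weight_nums)
    return chunks[:n], chunks[n:]


weight_nums, bias_nums = dynamic_param_counts()
flat = list(range(sum(weight_nums) + sum(bias_nums)))
weights, biases = split_flat_kernel(flat, weight_nums, bias_nums)
print(weight_nums, bias_nums, len(flat))  # [80, 64, 8] [8, 8, 1] 169
```

Under these assumptions the defaults num_prototypes=8, dyconv_channels=8 and num_dyconvs=3 give 169 generated parameters per instance.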

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], kernel_preds: List[torch.Tensor], mask_feats: torch.Tensor, score_factors: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS.

Parameters

cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
kernel_preds (list[Tensor]) – Kernel predictions of dynamic convs for all scale levels, each is a 4D-tensor, has shape (batch_size, num_params, H, W).
mask_feats (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, num_prototypes, H, W).
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration. If None, test_cfg is used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection and instance segmentation results of each image after post-processing. Each item usually contains the following keys.

scores (Tensor): Classification scores, has a shape (num_instances, ).

labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

masks (Tensor): Has a shape (num_instances, h, w).

Return type

list[InstanceData]

class mmyolo.models.dense_heads.RTMDetInsSepBNHeadModule(num_classes: int, *args, num_prototypes: int = 8, dyconv_channels: int = 8, num_dyconvs: int = 3, use_sigmoid_cls: bool = True, **kwargs)[source]¶

Detection and Instance Segmentation Head of RTMDet.

Parameters

num_classes (int) – Number of categories excluding the background category.
num_prototypes (int) – Number of mask prototype features extracted from the mask head. Defaults to 8.
dyconv_channels (int) – Channel of the dynamic conv layers. Defaults to 8.
num_dyconvs (int) – Number of the dynamic convolution layers. Defaults to 3.
use_sigmoid_cls (bool) – Use sigmoid for class prediction. Defaults to True.

forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores, bbox predictions, kernel predictions and mask features.

cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
kernel_preds (list[Tensor]): Dynamic conv kernels for all scale levels, each is a 4D-tensor, the channels number is num_gen_params.
mask_feat (Tensor): Mask prototype features. Has shape (batch_size, num_prototypes, H, W).

Return type

tuple

init_weights() → None[source]¶

Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetRotatedHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistanceAnglePointCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'mode': 'linear', 'type': 'mmrotate.RotatedIoULoss'}, angle_version: str = 'le90', use_hbbox_loss: bool = False, angle_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmrotate.PseudoAngleCoder'}, loss_angle: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

RTMDet-R head.

Compared with RTMDetHead, RTMDetRotatedHead adds some args to support rotated object detection.

angle_version is used to limit the angle range during training.
angle_coder is used to encode and decode angles, similar to bbox_coder.

use_hbbox_loss and loss_angle allow custom regression loss calculation for rotated boxes.

There are three combination options for regression:

use_hbbox_loss=False and loss_angle is None.

bbox_pred────(tblr)───┐
                      ▼
angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
    │                 ▲
    └────►decode──(a)─┘

use_hbbox_loss=False and loss_angle is specified. An angle loss is added on angle_pred.

bbox_pred────(tblr)───┐
                      ▼
angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
    │                 ▲
    ├────►decode──(a)─┘
    │
    └───────────────────────────────────────────►loss_angle

use_hbbox_loss=True and loss_angle is specified. In this case loss_angle must be set.

bbox_pred──(tblr)──►decode──►hbox_pred──(xyxy)──►loss_bbox

angle_pred──────────────────────────────────────►loss_angle

There is also a decoded_with_angle flag in test_cfg, which selects the decoding path at test time in a way similar to the training process.

When decoded_with_angle=True:

bbox_pred────(tblr)───┐
                      ▼
angle_pred          decode──(xywha)──►rbox_pred
    │                 ▲
    └────►decode──(a)─┘

When decoded_with_angle=False:

bbox_pred──(tblr)─►decode
                      │ (xyxy)
                      ▼
                   format───(xywh)──►concat──(xywha)──►rbox_pred
                                       ▲
angle_pred────────►decode────(a)───────┘
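The two decoding paths in the diagrams above can be sketched for a single prior point. This is a simplified illustration: the distance order (left, top, right, bottom), the rotation convention and both function names are assumptions, and the real coders operate on batched tensors.

```python
import math


def decode_rbox(point, distances, angle):
    """decoded_with_angle=True path: decode distances and angle together.

    The offset from the prior point to the box centre is rotated by the
    angle before it is added to the point (assumed convention).
    """
    px, py = point
    left, top, right, bottom = distances
    w, h = left + right, top + bottom
    # Offset of the box centre from the prior point, in the box's own frame.
    dx, dy = (right - left) / 2.0, (bottom - top) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cx = px + cos_a * dx - sin_a * dy
    cy = py + sin_a * dx + cos_a * dy
    return (cx, cy, w, h, angle)


def decode_hbox_then_concat(point, distances, angle):
    """decoded_with_angle=False path: decode to xyxy, format as xywh,
    then append the separately decoded angle."""
    px, py = point
    left, top, right, bottom = distances
    x1, y1, x2, y2 = px - left, py - top, px + right, py + bottom
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1, angle)


point, distances = (100.0, 100.0), (10.0, 20.0, 30.0, 40.0)
print(decode_rbox(point, distances, 0.0))
print(decode_hbox_then_concat(point, distances, 0.0))
print(decode_rbox(point, distances, math.pi / 2))
print(decode_hbox_then_concat(point, distances, math.pi / 2))
```

In this sketch the two paths agree when the angle is zero or the distances are symmetric. Otherwise the box centres differ, because only the first path rotates the centre offset.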

Parameters

head_module (ConfigType) – Base module used for RTMDetRotatedHead.
prior_generator (ConfigDict or dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
angle_version (str) – Angle representations. Defaults to 'le90'.
use_hbbox_loss (bool) – If True, use horizontal bbox loss, and loss_angle must not be None. Defaults to False.
angle_coder (ConfigDict or dict) – Config of angle coder.
loss_angle (ConfigDict or dict, optional) – Config of angle loss. Defaults to None.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
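The interaction between use_hbbox_loss and loss_angle can be summarized with a small checker. regression_mode is a hypothetical helper written for this example, and the mmdet.L1Loss entry is only an illustrative choice of angle loss, not a recommended setting.

```python
def regression_mode(use_hbbox_loss=False, loss_angle=None):
    """Name which of the three documented regression setups a config selects."""
    if use_hbbox_loss:
        if loss_angle is None:
            raise ValueError('use_hbbox_loss=True requires loss_angle to be set')
        return 'hbox loss_bbox + loss_angle'
    if loss_angle is None:
        return 'rbox loss_bbox only'
    return 'rbox loss_bbox + loss_angle'


# Illustrative angle loss config; the type and weight are placeholders.
angle_loss = dict(type='mmdet.L1Loss', loss_weight=0.2)

print(regression_mode())                                # rbox loss_bbox only
print(regression_mode(loss_angle=angle_loss))           # rbox loss_bbox + loss_angle
print(regression_mode(True, loss_angle=angle_loss))     # hbox loss_bbox + loss_angle
```

Calling regression_mode(use_hbbox_loss=True) without a loss_angle raises ValueError, which matches the requirement stated for use_hbbox_loss.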

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Compute losses of the head.

Parameters

cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.
angle_preds (list[Tensor]) – Angle prediction for each scale level with shape (N, num_anchors * angle_out_dim, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶

Transform a batch of output features extracted by the head into bbox results.

Parameters

cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W).
objectnesses (list[Tensor], optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W). Defaults to None.
batch_img_metas (list[dict], optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration. If None, test_cfg is used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection results of each image after post-processing. Each item usually contains the following keys.

scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 is arranged as (x, y, w, h, angle).

Return type

list[InstanceData]
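For visualization or export it is often convenient to convert an (x, y, w, h, angle) box into its four corner points. The sketch below assumes that (x, y) is the box centre and that the angle is given in radians, with a positive angle rotating the x axis toward the y axis. These conventions are assumptions for the example, and rbox_to_corners is a hypothetical helper.

```python
import math


def rbox_to_corners(x, y, w, h, angle):
    """Return the four corners of a rotated box as (x, y) tuples.

    Corner order follows the unrotated box: top-left, top-right,
    bottom-right, bottom-left (in image coordinates).
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        dx, dy = sx * w / 2.0, sy * h / 2.0
        corners.append((x + cos_a * dx - sin_a * dy,
                        y + sin_a * dx + cos_a * dy))
    return corners


print(rbox_to_corners(10.0, 20.0, 4.0, 2.0, 0.0))
print(rbox_to_corners(10.0, 20.0, 4.0, 2.0, math.pi / 4))
```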

class mmyolo.models.dense_heads.RTMDetRotatedSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, angle_out_dim: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Detection Head Module of RTMDet-R.

Compared with RTMDet Detection Head Module, RTMDet-R adds a conv for angle prediction. An angle_out_dim arg is added, which is generated by the angle_coder module and controls the angle pred dim.

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].
share_conv (bool) – Whether to share conv layers between stages. Defaults to True.
pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.
angle_out_dim (int) – Encoded length of angle, which will be passed in by the head. Defaults to 1.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores, bbox predictions and angle predictions.

cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_out_dim.

Return type

tuple

init_weights() → None[source]¶

Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Detection Head of RTMDet.

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].
share_conv (bool) – Whether to share conv layers between stages. Defaults to True.
pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox prediction.

cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

Return type

tuple
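The channel counts described above translate into concrete per-level output shapes. The helper below is a hypothetical sketch that assumes each feature map is the input size divided by its stride, which holds when the input height and width are divisible by every stride.

```python
def head_output_shapes(batch_size, img_h, img_w, num_classes,
                       num_base_priors=1, featmap_strides=(8, 16, 32)):
    """Expected (cls_score, bbox_pred) shapes for each scale level.

    Assumes img_h and img_w are divisible by every stride.
    """
    shapes = []
    for stride in featmap_strides:
        h, w = img_h // stride, img_w // stride
        cls_shape = (batch_size, num_base_priors * num_classes, h, w)
        bbox_shape = (batch_size, num_base_priors * 4, h, w)
        shapes.append((cls_shape, bbox_shape))
    return shapes


for cls_shape, bbox_shape in head_output_shapes(2, 640, 640, num_classes=80):
    print(cls_shape, bbox_shape)
# (2, 80, 80, 80) (2, 4, 80, 80)
# (2, 80, 40, 40) (2, 4, 40, 40)
# (2, 80, 20, 20) (2, 4, 20, 20)
```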

init_weights() → None[source]¶

Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOXHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOXBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-16, 'loss_weight': 5.0, 'mode': 'square', 'reduction': 'sum', 'type': 'mmdet.IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox_aux: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOXHead head used in YOLOX.

Parameters

head_module (ConfigType) – Base module used for YOLOXHead.
prior_generator (ConfigDict or dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_obj (ConfigDict or dict) – Config of objectness loss.
loss_bbox_aux (ConfigDict or dict) – Config of bbox aux loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, and objectnesses.
Return type: Tuple[List]

static gt_instances_preprocess(batch_gt_instances: torch.Tensor, batch_size: int) → List[mmengine.structures.instance_data.InstanceData][source]¶

Split batch_gt_instances with batch size.

Parameters

batch_gt_instances (Tensor) – Ground truth instances as a 2D tensor for the whole batch, with shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.

Returns

Batch gt instances data, a list of length batch_size whose items are InstanceData.

Return type

List
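The splitting step can be illustrated with plain Python lists in place of tensors. The row layout (batch_idx, label, x1, y1, x2, y2) is an assumption about how the six columns are arranged, and split_gt_by_image is a hypothetical helper that returns dictionaries rather than InstanceData objects.

```python
def split_gt_by_image(batch_gt_rows, batch_size):
    """Group flat ground-truth rows into one record per image.

    Each row is assumed to be (batch_idx, label, x1, y1, x2, y2).
    Images without any ground truth get empty lists.
    """
    per_image = [{'labels': [], 'bboxes': []} for _ in range(batch_size)]
    for row in batch_gt_rows:
        batch_idx = int(row[0])
        per_image[batch_idx]['labels'].append(int(row[1]))
        per_image[batch_idx]['bboxes'].append(tuple(row[2:6]))
    return per_image


rows = [
    (0, 3, 10.0, 10.0, 50.0, 50.0),
    (2, 1, 5.0, 5.0, 20.0, 20.0),
    (0, 7, 0.0, 0.0, 8.0, 8.0),
]
for idx, item in enumerate(split_gt_by_image(rows, batch_size=3)):
    print(idx, item['labels'], item['bboxes'])
```

Here image 1 has no annotations, so its record stays empty while the other two images keep their own boxes and labels.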

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]¶

YOLO series algorithms inherit from YOLOv5Head, but different algorithms have their own special initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOXHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], use_depthwise: bool = False, dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOXHead head module used in YOLOX (https://arxiv.org/abs/2107.08430).

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
feat_channels (int) – Number of hidden channels. Defaults to 256.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
dcn_on_last_conv (bool) – If True, use dcn in the last layer of towers. Defaults to False.
conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Defaults to "auto".
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
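The conv_bias rule described above can be written as a few lines of Python. resolve_conv_bias is a hypothetical helper for illustration and is not part of the module's API.

```python
def resolve_conv_bias(conv_bias='auto', norm_cfg=None):
    """Return the effective bias flag for a conv layer.

    With 'auto', a bias is used only when no normalization layer
    follows the conv, since the norm layer has its own shift term.
    """
    if conv_bias == 'auto':
        return norm_cfg is None
    if not isinstance(conv_bias, bool):
        raise TypeError("conv_bias must be a bool or 'auto'")
    return conv_bias


bn_cfg = dict(type='BN', momentum=0.03, eps=0.001)
print(resolve_conv_bias('auto', bn_cfg))  # False
print(resolve_conv_bias('auto', None))    # True
print(resolve_conv_bias(True, bn_cfg))    # True
```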

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, and objectnesses.
Return type: Tuple[List]

forward_single(x: torch.Tensor, cls_convs: torch.nn.modules.module.Module, reg_convs: torch.nn.modules.module.Module, conv_cls: torch.nn.modules.module.Module, conv_reg: torch.nn.modules.module.Module, conv_obj: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Forward feature of a single scale level.

init_weights()[source]¶

Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOXPoseHead(loss_pose: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, *args, **kwargs)[source]¶

YOLOXPoseHead head used in YOLO-Pose (https://arxiv.org/abs/2204.06806).

Parameters

loss_pose (ConfigDict, optional) – Config of keypoint OKS loss.

decode_pose(grids: torch.Tensor, offsets: torch.Tensor, strides: Union[torch.Tensor, int]) → torch.Tensor[source]¶

Decode regression offsets to keypoints.

Parameters

grids (torch.Tensor) – The coordinates of the feature map grids.
offsets (torch.Tensor) – The predicted offset of each keypoint relative to its corresponding grid.
strides (torch.Tensor | int) – The stride of the feature map for each instance.

Returns

The decoded keypoints coordinates.

Return type

torch.Tensor
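A minimal sketch of the decoding step, for one grid point and plain Python numbers: each predicted offset is scaled by the stride and added to the grid coordinate. This formula is an assumption about the decoding rule, and decode_keypoints is a hypothetical helper that ignores batching.

```python
def decode_keypoints(grid, offsets, stride):
    """Map per-keypoint (dx, dy) offsets to image coordinates.

    Assumes offsets are expressed in units of the feature map stride
    and are relative to the given grid point.
    """
    gx, gy = grid
    return [(gx + dx * stride, gy + dy * stride) for dx, dy in offsets]


grid = (64.0, 32.0)                   # grid point in image coordinates
offsets = [(0.5, -1.0), (2.0, 0.25)]  # two keypoints for this instance
print(decode_keypoints(grid, offsets, stride=8))
# [(68.0, 24.0), (80.0, 34.0)]
```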

static gt_instances_preprocess(batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], *args, **kwargs) → List[mmengine.structures.instance_data.InstanceData][source]¶

Split batch_gt_instances with batch size.

Parameters

batch_gt_instances (Tensor) – Ground truth instances as a 2D tensor for the whole batch, with shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.

Returns

Batch gt instances data, a list of length batch_size whose items are InstanceData.

Return type

List

static gt_kps_instances_preprocess(batch_gt_instances: torch.Tensor, batch_gt_keypoints, batch_gt_keypoints_visible, batch_size: int) → List[mmengine.structures.instance_data.InstanceData][source]¶

Split batch_gt_instances with batch size.

Parameters

batch_gt_instances (Tensor) – Ground truth as a 2D-Tensor for the whole batch, shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.

Returns

batch gt instances data, shape [batch_size, InstanceData]

Return type

List

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], kpt_preds: Sequence[torch.Tensor], vis_preds: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_gt_keypoints: torch.Tensor, batch_gt_keypoints_visible: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

In addition to the base class method, keypoint losses are also calculated in this method.

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, kpt_preds: Optional[List[torch.Tensor]] = None, vis_preds: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶

Transform a batch of output features extracted by the head into bbox and keypoint results.

In addition to the base class method, keypoint predictions are also calculated in this method.

class mmyolo.models.dense_heads.YOLOXPoseHeadModule(num_keypoints: int, *args, **kwargs)[source]¶

YOLOXPoseHeadModule serves as a head module for YOLOX-Pose.

In comparison to YOLOXHeadModule, this module introduces branches for keypoint prediction.

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶: Forward features from the upstream network.

init_weights()[source]¶: Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOv5Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'base_sizes': [[(10, 13), (16, 30), (33, 23)], [(30, 61), (62, 45), (59, 119)], [(116, 90), (156, 198), (373, 326)]], 'strides': [8, 16, 32], 'type': 'mmdet.YOLOAnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOv5BBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xywh', 'eps': 1e-07, 'iou_mode': 'ciou', 'loss_weight': 0.05, 'reduction': 'mean', 'return_iou': True, 'type': 'IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, prior_match_thr: float = 4.0, near_neighbor_thr: float = 0.5, ignore_iof_thr: float = - 1.0, obj_level_weights: List[float] = [4.0, 1.0, 0.4], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv5Head head used in YOLOv5.

Parameters

head_module (ConfigType) – Base module used for YOLOv5Head
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_obj (ConfigDict or dict) – Config of objectness loss.
prior_match_thr (float) – Defaults to 4.0.
ignore_iof_thr (float) – Defaults to -1.0.
obj_level_weights (List[float]) – Defaults to [4.0, 1.0, 0.4].
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
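For orientation, a config fragment for this head might look like the sketch below. The loss, coder, and prior generator values are copied from the default arguments in the signature above; num_classes and in_channels are hypothetical values chosen only for illustration.

```python
# Sketch of a YOLOv5Head config dict; defaults mirror the class signature.
bbox_head = dict(
    type='YOLOv5Head',
    head_module=dict(
        type='YOLOv5HeadModule',
        num_classes=80,                # hypothetical: COCO-style class count
        in_channels=[256, 512, 1024],  # hypothetical: neck output channels
        widen_factor=1.0,
        num_base_priors=3,
        featmap_strides=(8, 16, 32)),
    prior_generator=dict(
        type='mmdet.YOLOAnchorGenerator',
        base_sizes=[[(10, 13), (16, 30), (33, 23)],
                    [(30, 61), (62, 45), (59, 119)],
                    [(116, 90), (156, 198), (373, 326)]],
        strides=[8, 16, 32]),
    bbox_coder=dict(type='YOLOv5BBoxCoder'),
    loss_cls=dict(
        type='mmdet.CrossEntropyLoss',
        use_sigmoid=True,
        reduction='mean',
        loss_weight=0.5),
    loss_bbox=dict(
        type='IoULoss',
        iou_mode='ciou',
        bbox_format='xywh',
        eps=1e-7,
        reduction='mean',
        loss_weight=0.05,
        return_iou=True),
    loss_obj=dict(
        type='mmdet.CrossEntropyLoss',
        use_sigmoid=True,
        reduction='mean',
        loss_weight=1.0),
    prior_match_thr=4.0,
    near_neighbor_thr=0.5,
    ignore_iof_thr=-1.0,
    obj_level_weights=[4.0, 1.0, 0.4])
```

Note that base_sizes, strides, and obj_level_weights each carry one entry per feature level, so they should stay the same length when levels are added or removed.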

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, and objectnesses.
Return type: Tuple[List]

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]
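The channel counts quoted in the parameter list can be checked with a little arithmetic. The sketch below assumes a square 640-pixel input, 80 classes, 3 priors per location, and strides (8, 16, 32); all of these values, and the helper name head_output_shapes, are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def head_output_shapes(batch_size: int, img_size: int, num_classes: int,
                       num_priors: int,
                       strides: Sequence[int]) -> List[Dict[str, Tuple[int, ...]]]:
    """Return the expected (N, C, H, W) shape of each head output per level."""
    shapes = []
    for stride in strides:
        h = w = img_size // stride  # feature map side length at this level
        shapes.append({
            'cls_scores': (batch_size, num_priors * num_classes, h, w),
            'bbox_preds': (batch_size, num_priors * 4, h, w),
        })
    return shapes


shapes = head_output_shapes(
    batch_size=2, img_size=640, num_classes=80, num_priors=3,
    strides=(8, 16, 32))
for level in shapes:
    print(level)
```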

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶

Transform a batch of output features extracted by the head into bbox results.

Parameters

cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection results of each image after post-processing. Each item usually contains the following keys.

scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

Return type

list[InstanceData]

special_init()[source]¶

YOLO series algorithms inherit from YOLOv5Head, but each algorithm may need its own initialization steps.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOv5HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv5Head head module used in YOLOv5.

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, and objectnesses.
Return type: Tuple[List]

forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶: Forward feature of a single scale level.

init_weights()[source]¶: Initialize the bias of YOLOv5 head.

class mmyolo.models.dense_heads.YOLOv5InsHead(*args, mask_overlap: bool = True, loss_mask: Union[mmengine.config.config.ConfigDict, dict] = {'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_mask_weight=0.05, **kwargs)[source]¶

YOLOv5 Instance Segmentation and Detection head.

Parameters

mask_overlap (bool) – Defaults to True.
loss_mask (ConfigDict or dict) – Config of mask loss.
loss_mask_weight (float) – The weight of mask loss.

crop_mask(masks: torch.Tensor, boxes: torch.Tensor) → torch.Tensor[source]¶

Crop mask by the bounding box.

Parameters

masks (Tensor) – Predicted mask results. Has shape (1, num_instance, H, W).
boxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).

Returns

The masks cropped to their bounding boxes.

Return type

torch.Tensor
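A minimal sketch of the cropping idea follows. It assumes boxes are given as (x1, y1, x2, y2) in mask pixel coordinates and that pixels outside the half-open box range are set to zero; both the convention and the helper name crop_mask_sketch are assumptions, and a nested list stands in for a single-instance mask tensor.

```python
from typing import List, Tuple


def crop_mask_sketch(mask: List[List[float]],
                     box: Tuple[int, int, int, int]) -> List[List[float]]:
    """Zero every mask pixel that lies outside the (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return [[
        value if (x1 <= col < x2 and y1 <= row < y2) else 0.0
        for col, value in enumerate(line)
    ] for row, line in enumerate(mask)]


full_mask = [[1.0] * 4 for _ in range(4)]  # 4x4 mask, all ones
cropped = crop_mask_sketch(full_mask, (1, 1, 3, 3))
for line in cropped:
    print(line)
```

Only the 2x2 block inside the box keeps its values; the mask keeps its original height and width.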

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters

x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], coeff_preds: Sequence[torch.Tensor], proto_preds: torch.Tensor, batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_gt_masks: Sequence[torch.Tensor], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
coeff_preds (Sequence[Tensor]) – Mask coefficient for each scale level, each is a 4D-tensor, the channel number is num_priors * mask_channels.
proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).
batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_gt_masks (Sequence[Tensor]) – Batch of gt_mask.
batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, coeff_preds: Optional[List[torch.Tensor]] = None, proto_preds: Optional[torch.Tensor] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS.

Parameters

cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
coeff_preds (list[Tensor]) – Mask coefficients predictions for all scale levels, each is a 4D-tensor, has shape (batch_size, mask_channels, H, W).
proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection and instance segmentation results of each image after post-processing. Each item usually contains the following keys.

scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).
masks (Tensor): Has a shape (num_instances, h, w).

Return type

list[InstanceData]

process_mask(mask_proto: torch.Tensor, mask_coeff_pred: torch.Tensor, bboxes: torch.Tensor, shape: Tuple[int, int], upsample: bool = False) → torch.Tensor[source]¶

Generate mask logits results.

Parameters

mask_proto (Tensor) – Mask prototype features. Has shape (mask_channels, H, W).
mask_coeff_pred (Tensor) – Mask coefficients prediction for a single image. Has shape (num_instance, mask_channels).
bboxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).
shape (Tuple) – Batch input shape of image.
upsample (bool) – Whether upsample masks results to batch input shape. Default to False.

Returns

Instance segmentation masks for each instance. Has shape (num_instance, H, W).

Return type

Tensor
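The sketch below shows one common way such masks are assembled: each instance mask is a sigmoid over a linear combination of the prototypes, weighted by that instance's coefficients. It assumes prototypes laid out as (mask_channels, H, W) and one coefficient vector of length mask_channels per instance; that layout, the formula, and the helper names are assumptions for illustration, and cropping and upsampling are left out.

```python
import math
from typing import List


def sigmoid(value: float) -> float:
    return 1.0 / (1.0 + math.exp(-value))


def assemble_mask_sketch(coeffs: List[float],
                         protos: List[List[List[float]]]) -> List[List[float]]:
    """Combine prototypes [mask_channels][H][W] into one (H, W) mask."""
    height, width = len(protos[0]), len(protos[0][0])
    return [[
        sigmoid(sum(c * proto[r][col] for c, proto in zip(coeffs, protos)))
        for col in range(width)
    ] for r in range(height)]


# Two prototype channels on a 1x2 grid, and one instance's coefficients.
protos = [[[1.0, 0.0]], [[0.0, 1.0]]]
mask = assemble_mask_sketch([0.0, 2.0], protos)
print(mask)
```

With tensors, the same computation is a single matrix product between the coefficient matrix and the flattened prototypes, followed by a sigmoid and a reshape.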

class mmyolo.models.dense_heads.YOLOv5InsHeadModule(*args, num_classes: int, mask_channels: int = 32, proto_channels: int = 256, widen_factor: float = 1.0, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]¶

Detection and Instance Segmentation Head of YOLOv5.

Parameters

num_classes (int) – Number of categories excluding the background category.
mask_channels (int) – Number of channels in the mask feature map. This is the channel count of the mask.
proto_channels (int) – Number of channels in the proto feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, objectnesses, and mask predictions.
Return type: Tuple[List]

forward_single(x: torch.Tensor, convs_pred: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶: Forward feature of a single scale level.

class mmyolo.models.dense_heads.YOLOv6Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv6Head head used in YOLOv6.

Parameters

head_module (ConfigType) – Base module used for YOLOv6Head
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]¶

YOLO series algorithms inherit from YOLOv5Head, but each algorithm may need its own initialization steps.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOv6HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, reg_max=0, featmap_strides: Sequence[int] = (8, 16, 32), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv6Head head module used in YOLOv6 <https://arxiv.org/pdf/2209.02976>.

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions.
Return type: Tuple[List]

forward_single(x: torch.Tensor, stem: torch.nn.modules.module.Module, cls_conv: torch.nn.modules.module.Module, cls_pred: torch.nn.modules.module.Module, reg_conv: torch.nn.modules.module.Module, reg_pred: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor][source]¶: Forward feature of a single scale level.

init_weights()[source]¶: Initialize the weights.

class mmyolo.models.dense_heads.YOLOv7Head(*args, simota_candidate_topk: int = 20, simota_iou_weight: float = 3.0, simota_cls_weight: float = 1.0, aux_loss_weights: float = 0.25, **kwargs)[source]¶

YOLOv7Head head used in YOLOv7.

Parameters

simota_candidate_topk (int) – The number of top-k candidate IoUs used to calculate dynamic-k in BatchYOLOv7Assigner. Defaults to 20.
simota_iou_weight (float) – The scale factor for regression iou cost in BatchYOLOv7Assigner. Defaults to 3.0.
simota_cls_weight (float) – The scale factor for classification cost in BatchYOLOv7Assigner. Defaults to 1.0.
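To make the role of simota_candidate_topk more concrete, the sketch below follows the usual SimOTA recipe: for one ground-truth box, sum its top-k candidate IoUs and truncate the sum to an integer, with a floor of 1. Whether BatchYOLOv7Assigner does exactly this is an assumption, and the helper name dynamic_k_sketch is hypothetical.

```python
from typing import Sequence


def dynamic_k_sketch(ious: Sequence[float], candidate_topk: int) -> int:
    """Estimate how many priors to assign to one ground-truth box."""
    topk = sorted(ious, reverse=True)[:candidate_topk]  # largest IoUs first
    return max(int(sum(topk)), 1)  # at least one prior per ground truth


# IoUs between one ground-truth box and five candidate priors.
candidate_ious = [0.9, 0.8, 0.7, 0.1, 0.05]
print(dynamic_k_sketch(candidate_ious, candidate_topk=3))
print(dynamic_k_sketch([0.2, 0.1], candidate_topk=20))
```

A larger candidate_topk lets well-matched ground truths collect more positive priors, while poorly matched ones still receive one.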

loss_by_feat(cls_scores: Sequence[Union[torch.Tensor, List]], bbox_preds: Sequence[Union[torch.Tensor, List]], objectnesses: Sequence[Union[torch.Tensor, List]], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

class mmyolo.models.dense_heads.YOLOv7HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv7Head head module used in YOLOv7.

init_weights()[source]¶: Initialize the bias of YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv7p6HeadModule(*args, main_out_channels: Sequence[int] = [256, 512, 768, 1024], aux_out_channels: Sequence[int] = [320, 640, 960, 1280], use_aux: bool = True, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]¶

YOLOv7Head head module used in YOLOv7.

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions, and objectnesses.
Return type: Tuple[List]

forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module, aux_convs: Optional[torch.nn.modules.module.Module]) → Tuple[Union[torch.Tensor, List], Union[torch.Tensor, List], Union[torch.Tensor, List]][source]¶: Forward feature of a single scale level.

init_weights()[source]¶: Initialize the bias of YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv8Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'ciou', 'loss_weight': 7.5, 'reduction': 'sum', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl={'loss_weight': 0.375, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv8Head head used in YOLOv8.

Parameters

head_module (ConfigDict or dict) – Base module used for YOLOv8Head
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_dfl (ConfigDict or dict) – Config of Distribution Focal Loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶

Calculate the loss based on the features extracted by the detection head.

Parameters

cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]¶

YOLO series algorithms inherit from YOLOv5Head, but each algorithm may need its own initialization steps.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOv8HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

YOLOv8HeadModule head module used in YOLOv8.

Parameters

num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].
reg_max (int) – Max value of the integral set {0, ..., reg_max-1} in the QFL setting. Defaults to 16.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
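The integral set mentioned for reg_max can be illustrated as follows. The sketch assumes the usual distribution-based box regression, where each box side is predicted as logits over the bins {0, ..., reg_max-1} and decoded as the softmax-weighted mean of the bin indices. That decoding rule and the helper name expected_distance_sketch are assumptions, not a description of the MMYOLO implementation.

```python
import math
from typing import Sequence


def expected_distance_sketch(logits: Sequence[float]) -> float:
    """Decode one box side as the expectation over bins 0..len(logits)-1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return sum(index * weight / total for index, weight in enumerate(exps))


reg_max = 16
uniform = [0.0] * reg_max  # no preference: the mean of 0..15
peaked = [0.0] * reg_max
peaked[3] = 50.0           # nearly all probability mass on bin 3
print(expected_distance_sketch(uniform))
print(expected_distance_sketch(peaked))
```

The decoded value is in units of the feature map stride, so it would still need to be scaled by the stride to obtain pixel distances.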

forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶

Forward features from the upstream network.

Parameters: x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
Returns: A tuple of multi-level classification scores, bbox predictions
Return type: Tuple[List]

forward_single(x: torch.Tensor, cls_pred: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList) → Tuple[source]¶: Forward feature of a single scale level.

init_weights(prior_prob=0.01)[source]¶: Initialize the weight and bias of YOLOv8 head.

detectors¶

class mmyolo.models.detectors.YOLODetector(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Union[mmengine.config.config.ConfigDict, dict], bbox_head: Union[mmengine.config.config.ConfigDict, dict], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_syncbn: bool = True)[source]¶

Implementation of YOLO Series

Parameters

backbone (ConfigDict or dict) – The backbone config.
neck (ConfigDict or dict) – The neck config.
bbox_head (ConfigDict or dict) – The bbox head config.
train_cfg (ConfigDict or dict, optional) – The training config of YOLO. Defaults to None.
test_cfg (ConfigDict or dict, optional) – The testing config of YOLO. Defaults to None.
data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.

init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
use_syncbn (bool) – Whether to use SyncBatchNorm. Defaults to True.

layers¶

class mmyolo.models.layers.BepC3StageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, hidden_ratio: float = 0.5, concat_all_layer: bool = True, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]¶

Beer-mug RepC3 Block.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
num_blocks (int) – Number of blocks. Defaults to 1
hidden_ratio (float) – Hidden channel expansion ratio. Defaults to 0.5.
concat_all_layer (bool) – Whether to concatenate all layers in the forward computation. Defaults to True.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
norm_cfg (ConfigType) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (ConfigType) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.BiFusion(in_channels0: int, in_channels1: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]¶

BiFusion Block in YOLOv6.

BiFusion fuses current-, high- and low-level features. Compared with concatenation in PAN, it fuses an extra low-level feature.

Parameters

in_channels0 (int) – The channels of current-level feature.
in_channels1 (int) – The input channels of lower-level feature.
out_channels (int) – The out channels of the BiFusion module.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: List[torch.Tensor]) → torch.Tensor[source]¶

Forward process.

Parameters: x (List[Tensor]) – The tensor list of length 3. x[0]: The high-level feature. x[1]: The current-level feature. x[2]: The low-level feature.

class mmyolo.models.layers.CSPLayerWithTwoConv(in_channels: int, out_channels: int, expand_ratio: float = 0.5, num_blocks: int = 1, add_identity: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Cross Stage Partial Layer with 2 convolutions.

Parameters

in_channels (int) – The input channels of the CSP layer.
out_channels (int) – The output channels of the CSP layer.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
num_blocks (int) – Number of blocks. Defaults to 1
add_identity (bool) – Whether to add identity in blocks. Defaults to True.
conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶: Forward process.

class mmyolo.models.layers.DarknetBottleneck(in_channels: int, out_channels: int, expansion: float = 0.5, kernel_size: Sequence[int] = (1, 3), padding: Sequence[int] = (0, 1), add_identity: bool = True, use_depthwise: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

The basic bottleneck block used in Darknet.

Each ResBlock consists of two ConvModules, and the input is added to the final output. Each ConvModule is composed of Conv, BN, and LeakyReLU. The first conv layer has a filter size of k1 x k1 and the second one has a filter size of k2 x k2.

Note: This DarknetBottleneck differs slightly from MMDet’s: the kernel size and padding of each conv can be changed.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
expansion (float) – The ratio used to scale the number of hidden channels. Defaults to 0.5.
kernel_size (Sequence[int]) – The kernel size of the convolution. Defaults to (1, 3).
padding (Sequence[int]) – The padding size of the convolution. Defaults to (0, 1).
add_identity (bool) – Whether to add identity to the out. Defaults to True
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

class mmyolo.models.layers.EELANBlock(num_elan_block: int, **kwargs)[source]¶

Expand efficient layer aggregation networks for YOLOv7.

Parameters: num_elan_block (int) – The number of ELANBlock.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.ELANBlock(in_channels: int, out_channels: int, middle_ratio: float, block_ratio: float, num_blocks: int = 2, num_convs_in_block: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Efficient layer aggregation networks for YOLOv7.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels.
block_ratio (float) – The scaling ratio of the block layer based on the in_channels.
num_blocks (int) – The number of blocks in the main branch. Defaults to 2.
num_convs_in_block (int) – The number of convs per block. Defaults to 1.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.EffectiveSELayer(channels: int, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'HSigmoid'})[source]¶

Effective Squeeze-Excitation.

From CenterMask: Real-Time Anchor-Free Instance Segmentation (https://arxiv.org/abs/1911.06667). The code is based on https://github.com/youngwanLEE/CenterMask/blob/72147e8aae673fcaf4103ee90a6a6b73863e7fa1/maskrcnn_benchmark/modeling/backbone/vovnet.py#L108-L121

Parameters

channels (int) – The input and output channels of this Module.
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’HSigmoid’).

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[source]¶

Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLO.

Parameters

model (nn.Module) – The model to be averaged.
momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula averaged_param = (1-momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.
gamma (int) – Use a larger momentum early in training and gradually anneal it to a smaller value so that the EMA model is updated smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.
interval (int) – Interval between two updates. Defaults to 1.
device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.
update_buffers (bool) – If True, running averages are computed for both the parameters and the buffers of the model. Defaults to False.
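
The two formulas above can be combined into a small numeric sketch. This is only an illustration on plain floats, not the class's code: the helper names ema_momentum and ema_update are hypothetical, and the defaults are copied from the signature above.

```python
import math


def ema_momentum(steps: int, momentum: float = 0.0002, gamma: int = 2000) -> float:
    """Effective momentum after `steps` updates, following the gamma formula above."""
    return (1 - momentum) * math.exp(-(1 + steps) / gamma) + momentum


def ema_update(averaged: float, source: float, steps: int,
               momentum: float = 0.0002, gamma: int = 2000) -> float:
    """One EMA step on a scalar, using the annealed momentum (hypothetical helper)."""
    m = ema_momentum(steps, momentum, gamma)
    return (1 - m) * averaged + m * source


if __name__ == "__main__":
    # The effective momentum starts close to 1 and decays towards `momentum`.
    for steps in (0, 2000, 20000):
        print(steps, ema_momentum(steps))
```

Early in training the averaged value therefore follows the source parameters closely, and later it changes only slowly.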

avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int)[source]¶

Compute the moving average of the parameters using the exponential momentum strategy.

Parameters

averaged_param (Tensor) – The averaged parameters.
source_param (Tensor) – The source parameters.
steps (int) – The number of times the parameters have been updated.

update_parameters(model: torch.nn.modules.module.Module)[source]¶

Update the parameters after each training step.

Parameters: model (nn.Module) – The model whose parameters need to be updated.

class mmyolo.models.layers.ImplicitA(in_channels: int, mean: float = 0.0, std: float = 0.02)[source]¶

Implicit add layer in YOLOv7.

Parameters

in_channels (int) – The input channels of this Module.
mean (float) – Mean value of implicit module. Defaults to 0.
std (float) – Std value of implicit module. Defaults to 0.02

forward(x)[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.ImplicitM(in_channels: int, mean: float = 1.0, std: float = 0.02)[source]¶

Implicit multiplier layer in YOLOv7.

Parameters

in_channels (int) – The input channels of this Module.
mean (float) – Mean value of implicit module. Defaults to 1.
std (float) – Std value of implicit module. Defaults to 0.02.

forward(x)[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.
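
As a rough illustration of the two implicit layers, the sketch below assumes that each one holds a single learned value per channel, drawn from a normal distribution with the given mean and std, which is then added to (ImplicitA) or multiplied with (ImplicitM) the input. It works on plain Python lists rather than tensors, and the class name ImplicitSketch is hypothetical.

```python
import random
from typing import List


class ImplicitSketch:
    """Per-channel implicit value that is either added to or multiplied with the input."""

    def __init__(self, in_channels: int, mean: float, std: float,
                 op: str = "add", seed: int = 0) -> None:
        assert op in ("add", "mul")
        rng = random.Random(seed)
        # One value per channel, initialised from N(mean, std).
        self.implicit = [rng.gauss(mean, std) for _ in range(in_channels)]
        self.op = op

    def __call__(self, x: List[float]) -> List[float]:
        # `x` stands in for one value per channel of a feature map.
        if self.op == "add":
            return [p + v for p, v in zip(self.implicit, x)]
        return [p * v for p, v in zip(self.implicit, x)]


if __name__ == "__main__":
    add_layer = ImplicitSketch(3, mean=0.0, std=0.02, op="add")
    mul_layer = ImplicitSketch(3, mean=1.0, std=0.02, op="mul")
    print(add_layer([1.0, 2.0, 3.0]))
    print(mul_layer([1.0, 2.0, 3.0]))
```

With the default means (0 for the add layer, 1 for the multiplier layer) both start out close to the identity.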

class mmyolo.models.layers.MaxPoolAndStrideConvBlock(in_channels: int, out_channels: int, maxpool_kernel_sizes: int = 2, use_in_channels_of_middle: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Max pooling and stride conv layer for YOLOv7.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
maxpool_kernel_sizes (int) – kernel sizes of pooling layers. Defaults to 2.
use_in_channels_of_middle (bool) – Whether to calculate middle channels based on in_channels. Defaults to False.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.PPYOLOEBasicBlock(in_channels: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, shortcut: bool = True, use_alpha: bool = False)[source]¶

PPYOLOE Backbone BasicBlock.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
shortcut (bool) – Whether to add inputs and outputs together at the end of this layer. Defaults to True.
use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. Defaults to False.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

Returns: The output tensor.
Return type: Tensor

class mmyolo.models.layers.RepStageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, bottle_block: torch.nn.modules.module.Module = <class 'mmyolo.models.layers.yolo_bricks.RepVGGBlock'>, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'})[source]¶

RepStageBlock is a stage block with rep-style basic block.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
num_blocks (int, tuple[int]) – Number of blocks. Defaults to 1.
bottle_block (nn.Module) – Basic unit of RepStage. Defaults to RepVGGBlock.
block_cfg (ConfigType) – Config of RepStage. Defaults to ‘RepVGGBlock’.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.
Returns: The output tensor.
Return type: Tensor

class mmyolo.models.layers.RepVGGBlock(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]] = 3, stride: Union[int, Tuple[int]] = 1, padding: Union[int, Tuple[int]] = 1, dilation: Union[int, Tuple[int]] = 1, groups: Optional[int] = 1, padding_mode: Optional[str] = 'zeros', norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, use_se: bool = False, use_alpha: bool = False, use_bn_first=True, deploy: bool = False)[source]¶

RepVGGBlock is a basic rep-style block that supports both training and deploy modes. This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py.

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple) – Stride of the convolution. Default: 1
padding (int, tuple) – Padding added to all four sides of the input. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
padding_mode (string, optional) – Default: ‘zeros’
use_se (bool) – Whether to use se. Default: False
use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. In PPYOLOE+ model backbone, use_alpha will be set to True. Default: False.
use_bn_first (bool) – Whether to use bn layer before conv. In YOLOv6 and YOLOv7, this will be set to True. In PPYOLOE, this will be set to False. Default: True.
deploy (bool) – Whether in deploy mode. Default: False

forward(inputs: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: inputs (Tensor) – The input tensor.

Returns: The output tensor.
Return type: Tensor

get_equivalent_kernel_bias()[source]¶

Derives the equivalent kernel and bias in a differentiable way.

Returns: Equivalent kernel and bias
Return type: tuple

switch_to_deploy()[source]¶: Switch to deploy mode.
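
Re-parameterization in the RepVGG style relies on folding a BatchNorm layer into the convolution before it, so that a conv + BN branch becomes a single kernel and bias. The sketch below shows that standard folding step for one branch on plain Python lists. It is not the code of get_equivalent_kernel_bias: the function names are hypothetical, and each output channel's kernel is flattened to a list of weights.

```python
import math
from typing import List, Tuple


def fuse_conv_bn(kernel: List[List[float]], gamma: List[float], beta: List[float],
                 running_mean: List[float], running_var: List[float],
                 eps: float = 1e-3) -> Tuple[List[List[float]], List[float]]:
    """Fold BN statistics into a bias-free conv, one output channel at a time."""
    fused_kernel, fused_bias = [], []
    for w, g, b, mu, var in zip(kernel, gamma, beta, running_mean, running_var):
        scale = g / math.sqrt(var + eps)
        fused_kernel.append([wi * scale for wi in w])  # W' = W * gamma / sqrt(var + eps)
        fused_bias.append(b - mu * scale)              # b' = beta - mean * gamma / sqrt(var + eps)
    return fused_kernel, fused_bias


def conv_then_bn(x: List[float], w: List[float], g: float, b: float,
                 mu: float, var: float, eps: float = 1e-3) -> float:
    """Reference path: dot product (one conv output) followed by BN in eval mode."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return g * (y - mu) / math.sqrt(var + eps) + b


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    kernel = [[0.1, 0.2, 0.3]]
    fk, fb = fuse_conv_bn(kernel, [1.5], [0.2], [0.4], [0.9])
    fused_out = sum(wi * xi for wi, xi in zip(fk[0], x)) + fb[0]
    print(fused_out, conv_then_bn(x, kernel[0], 1.5, 0.2, 0.4, 0.9))
```

Both paths print the same value up to floating-point error, which is why the fused block can replace the multi-branch one at inference time.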

class mmyolo.models.layers.SPPFBottleneck(in_channels: int, out_channels: int, kernel_sizes: Union[int, Sequence[int]] = 5, use_conv_first: bool = True, mid_channels_scale: float = 0.5, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Spatial pyramid pooling - Fast (SPPF) layer for YOLOv5, YOLOX and PPYOLOE by Glenn Jocher

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 5.
use_conv_first (bool) – Whether to use conv before pooling layer. In YOLOv5 and YOLOX, this parameter is set to True. In PPYOLOE, it is set to False. Defaults to True.
mid_channels_scale (float) – Channel multiplier, multiply in_channels by this amount to get mid_channels. This parameter is valid only when use_conv_first=True. Defaults to 0.5.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.SPPFCSPBlock(in_channels: int, out_channels: int, expand_ratio: float = 0.5, kernel_sizes: Union[int, Sequence[int]] = 5, is_tiny_version: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Spatial pyramid pooling - Fast (SPPF) layer with CSP for YOLOv7

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
expand_ratio (float) – Expand ratio of SPPFCSPBlock. Defaults to 0.5.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 5.
is_tiny_version (bool) – Is tiny version of SPPFCSPBlock. If True, it means it is a yolov7 tiny model. Defaults to False.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x) → torch.Tensor[source]¶

Forward process.

Parameters: x (Tensor) – The input tensor.

class mmyolo.models.layers.TinyDownSampleBlock(in_channels: int, out_channels: int, middle_ratio: float = 1.0, kernel_sizes: Union[int, Sequence[int]] = 3, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'negative_slope': 0.1, 'type': 'LeakyReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Down sample layer for YOLOv7-tiny.

Parameters

in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels. Defaults to 1.0.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 3.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’LeakyReLU’, negative_slope=0.1).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

losses¶

class mmyolo.models.losses.IoULoss(iou_mode: str = 'ciou', bbox_format: str = 'xywh', eps: float = 1e-07, reduction: str = 'mean', loss_weight: float = 1.0, return_iou: bool = True)[source]¶

IoULoss.

Computing the IoU loss between a set of predicted bboxes and target bboxes.

Parameters

iou_mode (str) – Options are “ciou”. Defaults to “ciou”.
bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.
eps (float) – Eps to avoid log(0).
reduction (str) – Options are “none”, “mean” and “sum”.
loss_weight (float) – Weight of loss.
return_iou (bool) – If True, return loss and iou.

forward(pred: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, avg_factor: Optional[float] = None, reduction_override: Optional[Union[str, bool]] = None) → Tuple[torch.Tensor, torch.Tensor][source]¶

Forward function.

Parameters

pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).
target (Tensor) – Corresponding gt bboxes, shape (n, 4).
weight (Tensor, optional) – Element-wise weights.
avg_factor (float, optional) – Average factor when computing the mean of losses.
reduction_override (str, bool, optional) – Same as built-in losses of PyTorch. Defaults to None.

Returns: loss or tuple(loss, iou)

class mmyolo.models.losses.OksLoss(metainfo: Optional[str] = None, loss_weight: float = 1.0)[source]¶

A PyTorch implementation of the Object Keypoint Similarity (OKS) loss as described in the paper “YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss” by Debapriya et al. (2022). The OKS loss is used for keypoint-based object recognition and consists of a measure of the similarity between predicted and ground truth keypoint locations, adjusted by the size of the object in the image. The loss function takes as input the predicted keypoint locations, the ground truth keypoint locations, a mask indicating which keypoints are valid, and bounding boxes for the objects.

Parameters

metainfo (str, optional) – Path to a JSON file containing information about the dataset’s annotations.
loss_weight (float) – Weight for the loss.

compute_oks(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None) → torch.Tensor[source]¶

Calculates the OKS loss.

Parameters

output (Tensor) – Predicted keypoints in shape N x k x 2, where N is batch size, k is the number of keypoints, and 2 are the xy coordinates.
target (Tensor) – Ground truth keypoints in the same shape as output.
target_weights (Tensor) – Mask of valid keypoints in shape N x k, with 1 for valid and 0 for invalid.
bboxes (Optional[Tensor]) – Bounding boxes in shape N x 4, where 4 are the xyxy coordinates.

Returns: The calculated OKS loss.
Return type: Tensor
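
For orientation, the sketch below computes a COCO-style OKS for a single object on plain Python values and turns it into a loss as 1 - OKS. It is an assumption-laden illustration, not this class's implementation: the per-keypoint sigmas, the use of the box area as the scale term, and the function names are all hypothetical, and the class may normalise distances differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def oks(pred: Sequence[Point], target: Sequence[Point], valid: Sequence[float],
        bbox_xyxy: Tuple[float, float, float, float],
        sigmas: Sequence[float]) -> float:
    """COCO-style object keypoint similarity for one object (illustrative only)."""
    x1, y1, x2, y2 = bbox_xyxy
    area = max((x2 - x1) * (y2 - y1), 1e-8)  # object scale; guards against zero area
    num, den = 0.0, 0.0
    for (px, py), (tx, ty), v, sigma in zip(pred, target, valid, sigmas):
        d2 = (px - tx) ** 2 + (py - ty) ** 2
        k = 2.0 * sigma
        num += v * math.exp(-d2 / (2.0 * area * k * k))
        den += v
    return num / den if den > 0 else 0.0


def oks_loss(pred: Sequence[Point], target: Sequence[Point], valid: Sequence[float],
             bbox_xyxy: Tuple[float, float, float, float],
             sigmas: Sequence[float]) -> float:
    """Loss that shrinks as predicted keypoints approach the ground truth."""
    return 1.0 - oks(pred, target, valid, bbox_xyxy, sigmas)


if __name__ == "__main__":
    target: List[Point] = [(10.0, 10.0), (20.0, 30.0)]
    near: List[Point] = [(11.0, 10.0), (20.0, 31.0)]
    print(oks_loss(near, target, [1.0, 1.0], (0.0, 0.0, 40.0, 40.0), [0.05, 0.05]))
```

Invalid keypoints (mask value 0) contribute to neither the numerator nor the denominator, which matches the role of target_weights described above.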

forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmyolo.models.losses.bbox_overlaps(pred: torch.Tensor, target: torch.Tensor, iou_mode: str = 'ciou', bbox_format: str = 'xywh', siou_theta: float = 4.0, eps: float = 1e-07) → torch.Tensor[source]¶

Calculate overlap between two sets of bboxes. Implementation of the paper “Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation”.

In the CIoU implementation of YOLOv5 and MMDetection, there is a slight difference in the way the alpha parameter is computed.

mmdet version: alpha = (ious > 0.5).float() * v / (1 - ious + v)
YOLOv5 version: alpha = v / (v - ious + (1 + eps))

Parameters

pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).
target (Tensor) – Corresponding gt bboxes, shape (n, 4).
iou_mode (str) – Options are (‘iou’, ‘ciou’, ‘giou’, ‘siou’). Defaults to “ciou”.
bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.
siou_theta (float) – siou_theta for SIoU when calculate shape cost. Defaults to 4.0.
eps (float) – Eps to avoid log(0).

Returns: The computed overlaps, shape (n, ).
Return type: Tensor
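
The difference between the two alpha variants mentioned in the description can be seen on scalars. This sketch only transcribes the two expressions onto plain floats; the function names are hypothetical, and v stands for the aspect-ratio consistency term of CIoU, which is taken as a given number here.

```python
def alpha_mmdet(iou: float, v: float) -> float:
    """mmdet variant: alpha is zeroed whenever IoU is not above 0.5."""
    return float(iou > 0.5) * v / (1 - iou + v)


def alpha_yolov5(iou: float, v: float, eps: float = 1e-7) -> float:
    """YOLOv5 variant: no IoU gate, with eps added to the denominator."""
    return v / (v - iou + (1 + eps))


if __name__ == "__main__":
    for iou in (0.3, 0.8):
        print(iou, alpha_mmdet(iou, 0.2), alpha_yolov5(iou, 0.2))
```

For IoU above 0.5 the two values differ only through eps. For IoU at or below 0.5 the mmdet variant returns 0, so the aspect-ratio penalty is switched off, while the YOLOv5 variant keeps it.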

necks¶

class mmyolo.models.necks.BaseYOLONeck(in_channels: List[int], out_channels: Union[int, List[int]], deepen_factor: float = 1.0, widen_factor: float = 1.0, upsample_feats_cat_first: bool = True, freeze_all: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, **kwargs)[source]¶

Base neck used in YOLO series.

P5 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
stride=32          +--------+    +-----------+
idx=2  +------+         ^              v
-----> |reduce|         |        +-----------+
       |layer2|---------+------->|    cat    |
       +------+                  +-----------+
                                       v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output2
                                 |  layer1   |    | layer2|
                                 +-----------+    +-------+

P6 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer3 |--->|    cat    |
                   +--------+    +-----------+
stride=32               ^              v
idx=2  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output2
       |layer2|    +--------+    |   layer1  |    | layer2|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer3 |    |  layer2   |
                   +--------+    +-----------+
stride=64               ^              v
idx=3  +------+         |        +-----------+
-----> |reduce|---------+------->|    cat    |
       |layer3|                  +-----------+
       +------+                        v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output3
                                 |  layer2   |    | layer3|
                                 +-----------+    +-------+

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
upsample_feats_cat_first (bool) – Whether the upsampled features are placed first when concatenating in the top-down module. Defaults to True. Currently only YOLOv7 sets this to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to None.
act_cfg (dict) – Config dict for activation layer. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
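
As a rough illustration of how deepen_factor and widen_factor act as multipliers, the following sketch scales a block count and a channel count. The helper names scale_channels and scale_depth are hypothetical, and rounding channels up to a multiple of 8 is an assumption based on common YOLO-style conventions, not something stated in this reference.

```python
import math


def scale_channels(channels: int, widen_factor: float = 1.0,
                   divisor: int = 8) -> int:
    """Scale a channel count and round it up to a multiple of ``divisor``.

    Assumption: rounding up to a multiple of 8; hypothetical helper.
    """
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_depth(num_blocks: int, deepen_factor: float = 1.0) -> int:
    """Scale a block count, never going below one block."""
    return max(round(num_blocks * deepen_factor), 1)


in_channels = [256, 512, 1024]
print([scale_channels(c, widen_factor=0.5) for c in in_channels])  # [128, 256, 512]
print(scale_depth(3, deepen_factor=0.33))  # 1
```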

abstract build_bottom_up_layer(idx: int)[source]¶: build bottom up layer.

abstract build_downsample_layer(idx: int)[source]¶: build downsample layer.

abstract build_out_layer(idx: int)[source]¶: build out layer.

abstract build_reduce_layer(idx: int)[source]¶: build reduce layer.

abstract build_top_down_layer(idx: int)[source]¶: build top down layer.

abstract build_upsample_layer(idx: int)[source]¶: build upsample layer.

forward(inputs: List[torch.Tensor]) → tuple[source]¶: Forward function.
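
The data flow in the diagram (reduce, top-down with upsampling, bottom-up with downsampling, out) can be traced without any tensors. The sketch below is a toy trace, not the library's forward implementation: each feature is a hypothetical (channels, stride) pair, reduce and out layers are treated as identity, and every top-down or bottom-up layer is assumed to map the concatenated channels back to the lateral feature's channel count.

```python
from typing import List, Tuple

Feature = Tuple[int, int]  # (channels, stride): a stand-in for a real tensor


def trace_neck(inputs: List[Feature]) -> Tuple[List[Feature], List[int]]:
    """Trace channels and strides through a PAFPN-style neck.

    Returns the per-scale outputs and the channel count seen at each
    concatenation, in the order the concatenations happen.
    """
    reduced = list(inputs)  # reduce layers: identity in this toy trace
    cat_channels: List[int] = []

    # Top-down path: start at the coarsest scale and move to finer ones.
    inner = [reduced[-1]]
    for idx in range(len(reduced) - 1, 0, -1):
        high_c, high_s = inner[0]
        low_c, low_s = reduced[idx - 1]
        up_s = high_s // 2                    # upsample layer halves the stride
        assert up_s == low_s, "strides must match before concatenation"
        cat_channels.append(high_c + low_c)   # cat along the channel axis
        inner.insert(0, (low_c, low_s))       # top_down layer (assumed output)

    # Bottom-up path: start at the finest scale and move to coarser ones.
    outs = [inner[0]]
    for idx in range(len(reduced) - 1):
        low_c, low_s = outs[-1]
        high_c, high_s = inner[idx + 1]
        down_s = low_s * 2                    # downsample layer doubles the stride
        assert down_s == high_s, "strides must match before concatenation"
        cat_channels.append(low_c + high_c)   # cat along the channel axis
        outs.append((high_c, high_s))         # bottom_up layer (assumed output)

    return outs, cat_channels                 # out layers: identity here


outs, cat_channels = trace_neck([(256, 8), (512, 16), (1024, 32)])
print(outs)          # [(256, 8), (512, 16), (1024, 32)]
print(cat_channels)  # [1536, 768, 768, 1536]
```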

train(mode=True)[source]¶: Convert the model into training mode while keeping the normalization layers frozen.

class mmyolo.models.necks.CSPNeXtPAFPN(in_channels: Sequence[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, use_depthwise: bool = False, expand_ratio: float = 0.5, upsample_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'mode': 'nearest', 'scale_factor': 2}, conv_cfg: Optional[bool] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶

Path Aggregation Network with CSPNeXt blocks.

Parameters

in_channels (Sequence[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
upsample_cfg (dict) – Config dict for interpolate layer. Defaults to dict(scale_factor=2, mode='nearest').
conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to dict(type='Kaiming', layer='Conv2d', a=math.sqrt(5), distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').
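
A neck of this kind is typically selected through a config dict. The fragment below is a minimal sketch that uses only the argument names from the signature above; the channel values and the variable name cspnext_neck are illustrative assumptions, not values taken from a shipped config.

```python
# Hypothetical neck config; channel values are illustrative only.
cspnext_neck = dict(
    type='CSPNeXtPAFPN',
    in_channels=[256, 512, 1024],  # one entry per input scale
    out_channels=256,              # used at each scale
    deepen_factor=1.0,
    widen_factor=1.0,
    num_csp_blocks=3,
    expand_ratio=0.5,
    upsample_cfg=dict(scale_factor=2, mode='nearest'),
    norm_cfg=dict(type='BN'),
    act_cfg=dict(type='SiLU', inplace=True),
)
```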

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build out layer.

Parameters: idx (int) – layer idx.
Returns: The out layer.
Return type: nn.Module

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build upsample layer.

class mmyolo.models.necks.PPYOLOECSPPAFPN(in_channels: List[int] = [256, 512, 1024], out_channels: List[int] = [256, 512, 1024], deepen_factor: float = 1.0, widen_factor: float = 1.0, freeze_all: bool = False, num_csplayer: int = 1, num_blocks_per_layer: int = 3, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': False, 'type': 'PPYOLOEBasicBlock', 'use_alpha': False}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, drop_block_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_spp: bool = False)[source]¶

CSPPAN in PPYOLOE.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int]) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
num_csplayer (int) – Number of CSPResLayer per layer. Defaults to 1.
num_blocks_per_layer (int) – Number of blocks per CSPResLayer. Defaults to 3.
block_cfg (dict) – Config dict for block. Defaults to dict(type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False).
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
drop_block_cfg (dict, optional) – Drop block config. Defaults to None. To use a drop block after CSPResLayer, set this parameter to dict(type='mmdet.DropBlock', drop_prob=0.1, block_size=3, warm_iters=0).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
use_spp (bool) – Whether to use SPP in reduce layer. Defaults to False.
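
The fragment below sketches a config that enables the drop block described for drop_block_cfg. It uses only argument names from the signature above; the channel values, the choice of use_spp=True, and the variable name ppyoloe_neck are illustrative assumptions.

```python
# Hypothetical neck config; values are illustrative only.
ppyoloe_neck = dict(
    type='PPYOLOECSPPAFPN',
    in_channels=[256, 512, 1024],
    out_channels=[256, 512, 1024],  # one entry per output scale
    num_csplayer=1,
    num_blocks_per_layer=3,
    block_cfg=dict(
        type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False),
    # Drop block applied after CSPResLayer, as described for drop_block_cfg.
    drop_block_cfg=dict(
        type='mmdet.DropBlock', drop_prob=0.1, block_size=3, warm_iters=0),
    use_spp=True,
)
```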

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build out layer.

build_reduce_layer(idx: int)[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶: build upsample layer.

class mmyolo.models.necks.YOLOXPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, use_depthwise: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOX.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build out layer.

Parameters: idx (int) – layer idx.
Returns: The out layer.
Return type: nn.Module

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build upsample layer.

class mmyolo.models.necks.YOLOv5PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 1, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv5.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int] or int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 1.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build out layer.

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int)[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build upsample layer.

init_weights()[source]¶: Initialize the weights.

class mmyolo.models.necks.YOLOv6CSPRepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv6 3.0.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

class mmyolo.models.necks.YOLOv6CSPRepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv6.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

class mmyolo.models.necks.YOLOv6RepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv6 3.0.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build upsample layer.

Parameters: idx (int) – layer idx.
Returns: The upsample layer.
Return type: nn.Module

forward(inputs: List[torch.Tensor]) → tuple[source]¶: Forward function.

class mmyolo.models.necks.YOLOv6RepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv6.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module[source]¶: build out layer.

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build upsample layer.

Parameters: idx (int) – layer idx.
Returns: The upsample layer.
Return type: nn.Module

init_weights()[source]¶: Initialize the weights.

class mmyolo.models.necks.YOLOv7PAFPN(in_channels: List[int], out_channels: List[int], block_cfg: dict = {'block_ratio': 0.25, 'middle_ratio': 0.5, 'num_blocks': 4, 'num_convs_in_block': 1, 'type': 'ELANBlock'}, deepen_factor: float = 1.0, widen_factor: float = 1.0, spp_expand_ratio: float = 0.5, is_tiny_version: bool = False, use_maxpool_in_downsample: bool = True, use_in_channels_in_downsample: bool = False, use_repconv_outs: bool = True, upsample_feats_cat_first: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv7.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int]) – Number of output channels (used at each scale).
block_cfg (dict) – Config dict for block.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
spp_expand_ratio (float) – Expand ratio of SPPCSPBlock. Defaults to 0.5.
is_tiny_version (bool) – Whether this is the tiny version of the neck, i.e. a YOLOv7-tiny model. Defaults to False.
use_maxpool_in_downsample (bool) – Whether maxpooling is used in downsample layers. Defaults to True.
use_in_channels_in_downsample (bool) – MaxPoolAndStrideConvBlock module input parameters. Defaults to False.
use_repconv_outs (bool) – Whether to use repconv in the output layer. Defaults to True.
upsample_feats_cat_first (bool) – Whether the upsampled features are placed first when concatenating in the top-down module. Defaults to False. Currently only YOLOv7 sets this to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build downsample layer.

Parameters: idx (int) – layer idx.
Returns: The downsample layer.
Return type: nn.Module

build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build out layer.

Parameters: idx (int) – layer idx.
Returns: The out layer.
Return type: nn.Module

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

build_upsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶: build upsample layer.

class mmyolo.models.necks.YOLOv8PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶

Path Aggregation Network used in YOLOv8.

Parameters

in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int] or int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build bottom up layer.

Parameters: idx (int) – layer idx.
Returns: The bottom up layer.
Return type: nn.Module

build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build reduce layer.

Parameters: idx (int) – layer idx.
Returns: The reduce layer.
Return type: nn.Module

build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶

build top down layer.

Parameters: idx (int) – layer idx.
Returns: The top down layer.
Return type: nn.Module

task_modules¶

class mmyolo.models.task_modules.BatchATSSAssigner(num_classes: int, iou_calculator: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmdet.BboxOverlaps2D'}, topk: int = 9)[source]¶

Assign a batch of corresponding gt bboxes or background to each prior.

This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/atss_assigner.py

Each proposal will be assigned with 0 or a positive integer indicating the ground truth index.

0: negative sample, no assigned gt
positive integer: positive sample, index (1-based) of assigned gt

Parameters

num_classes (int) – Number of classes.
iou_calculator (ConfigDict or dict) – Config dict for iou calculator. Defaults to dict(type='mmdet.BboxOverlaps2D').
topk (int) – Number of priors selected in each level. Defaults to 9.

forward(pred_bboxes: torch.Tensor, priors: torch.Tensor, num_level_priors: List, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict[source]¶

Assign gt to priors.

The assignment is done in the following steps:

1. Compute the IoU between all priors (from all pyramid levels) and each gt.
2. Compute the center distance between all priors and each gt.
3. On each pyramid level, for each gt, select the k priors whose centers are closest to the gt center, giving k*l candidates per gt in total (l is the number of pyramid levels).
4. Take the IoUs of these candidates, compute their mean and std, and set mean + std as the IoU threshold.
5. Select the candidates whose IoU is greater than or equal to the threshold as positives.
6. Restrict positive samples to those whose centers lie inside the gt.
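
The steps above can be sketched for a single gt box in plain Python. This is an illustrative, unbatched sketch and not the library's tensor implementation: the function names and the toy boxes are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def center(b: Box) -> Tuple[float, float]:
    return (b[0] + b[2]) / 2, (b[1] + b[3]) / 2


def atss_assign_one_gt(priors: Sequence[Box],
                       num_level_priors: Sequence[int],
                       gt: Box, topk: int) -> List[int]:
    """Return the indices of the priors assigned to ``gt`` as positives."""
    # Steps 1 and 2: IoU and center distance between every prior and the gt.
    ious = [iou(p, gt) for p in priors]
    gcx, gcy = center(gt)
    dists = [math.hypot(center(p)[0] - gcx, center(p)[1] - gcy)
             for p in priors]

    # Step 3: on each level, keep the topk priors closest to the gt center.
    candidates: List[int] = []
    start = 0
    for n in num_level_priors:
        level = sorted(range(start, start + n), key=dists.__getitem__)
        candidates.extend(level[:topk])
        start += n

    # Step 4: threshold = mean + std of the candidate IoUs.
    # Assumption: sample standard deviation (n - 1 in the denominator).
    cand_ious = [ious[i] for i in candidates]
    std = statistics.stdev(cand_ious) if len(cand_ious) > 1 else 0.0
    threshold = statistics.fmean(cand_ious) + std

    # Steps 5 and 6: IoU at or above the threshold, center inside the gt.
    positives = []
    for i in candidates:
        cx, cy = center(priors[i])
        inside = gt[0] < cx < gt[2] and gt[1] < cy < gt[3]
        if ious[i] >= threshold and inside:
            positives.append(i)
    return positives


gt_box = (0.0, 0.0, 10.0, 10.0)
toy_priors = [
    (1.0, 1.0, 11.0, 11.0),    # level 0: large overlap, center inside gt
    (8.0, 8.0, 18.0, 18.0),    # level 0: small overlap
    (-8.0, -8.0, 2.0, 2.0),    # level 0: small overlap
    (8.0, -8.0, 18.0, 2.0),    # level 0: small overlap
    (-8.0, 8.0, 2.0, 18.0),    # level 1: small overlap
    (20.0, 20.0, 30.0, 30.0),  # level 1: no overlap
]
print(atss_assign_one_gt(toy_priors, [4, 2], gt_box, topk=4))  # [0]
```

Because the threshold adapts to the IoU statistics of each gt's own candidates, no fixed IoU cutoff has to be tuned per dataset.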

Parameters

pred_bboxes (Tensor) – Predicted bounding boxes, shape(batch_size, num_priors, 4)
priors (Tensor) – Model priors with stride, shape(num_priors, 4)
num_level_priors (List) – Number of bboxes in each level, len(3)
gt_labels (Tensor) – Ground truth label, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bbox, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)
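
To make the role of pad_bbox_flag concrete, the sketch below pads per-image ground truths to a common num_gt and records which entries are real. It uses nested lists instead of tensors; the helper name pad_gt_batch and the choice of zeros as padding values are assumptions for illustration only.

```python
from typing import List, Tuple

Box = List[float]  # [x1, y1, x2, y2]


def pad_gt_batch(
    batch_boxes: List[List[Box]], batch_labels: List[List[int]]
) -> Tuple[List[List[Box]], List[List[List[int]]], List[List[List[int]]]]:
    """Pad per-image ground truths to a common ``num_gt``.

    Returns nested lists shaped like (batch_size, num_gt, 4),
    (batch_size, num_gt, 1) and (batch_size, num_gt, 1).
    """
    num_gt = max((len(boxes) for boxes in batch_boxes), default=0)
    gt_bboxes, gt_labels, pad_bbox_flag = [], [], []
    for boxes, labels in zip(batch_boxes, batch_labels):
        pad = num_gt - len(boxes)
        # Assumption: padded entries are filled with zeros.
        gt_bboxes.append(list(boxes) + [[0.0, 0.0, 0.0, 0.0]] * pad)
        gt_labels.append([[label] for label in labels] + [[0]] * pad)
        # 1 marks a real box, 0 marks padding.
        pad_bbox_flag.append([[1]] * len(boxes) + [[0]] * pad)
    return gt_bboxes, gt_labels, pad_bbox_flag


boxes = [[[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 8.0, 8.0]], [[1.0, 1.0, 2.0, 2.0]]]
labels = [[3, 7], [2]]
gt_bboxes, gt_labels, pad_bbox_flag = pad_gt_batch(boxes, labels)
print(pad_bbox_flag)  # [[[1], [1]], [[1], [0]]]
```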

Returns

Assigned result, a dict with the following keys:

'assigned_labels' (Tensor): shape(batch_size, num_priors)
'assigned_bboxes' (Tensor): shape(batch_size, num_priors, 4)
'assigned_scores' (Tensor): shape(batch_size, num_priors, num_classes)
'fg_mask_pre_prior' (Tensor): shape(batch_size, num_priors)

Return type

dict

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_inds: torch.Tensor, fg_mask_pre_prior: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Get target info.

Parameters

gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
assigned_gt_inds (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)
fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)
num_priors (int) – Number of priors.
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.

Returns

assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors, 4)
assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors, num_classes)

Return type

Tuple[Tensor, Tensor, Tensor]

select_topk_candidates(distances: torch.Tensor, num_level_priors: List[int], pad_bbox_flag: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Selecting candidates based on the center distance.

Parameters

distances (Tensor) – Distance between all bbox and gt, shape(batch_size, num_gt, num_priors)
num_level_priors (List[int]) – Number of bboxes in each level, len(3)
pad_bbox_flag (Tensor) – Ground truth bbox mask, shape(batch_size, num_gt, 1)

Returns

is_in_candidate_list (Tensor): Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors)
candidate_idxs (Tensor): Candidate indices, shape(batch_size, num_gt, num_gt)

Return type

Tuple[Tensor, Tensor]

static threshold_calculator(is_in_candidate: List, candidate_idxs: torch.Tensor, overlaps: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor][source]¶

Get the corresponding IoU for these candidates, compute the mean and std, and set mean + std as the IoU threshold.

Parameters

is_in_candidate (Tensor) – Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors).
candidate_idxs (Tensor) – Candidate indices, shape(batch_size, num_gt, num_gt).
overlaps (Tensor) – Overlaps area, shape(batch_size, num_gt, num_priors).
num_priors (int) – Number of priors.
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.

Returns

overlaps_thr_per_gt (Tensor): Overlap threshold per ground truth, shape(batch_size, num_gt, 1).
candidate_overlaps (Tensor): Candidate overlaps, shape(batch_size, num_gt, num_priors).

Return type

Tuple[Tensor, Tensor]

class mmyolo.models.task_modules.BatchTaskAlignedAssigner(num_classes: int, topk: int = 13, alpha: float = 1.0, beta: float = 6.0, eps: float = 1e-07, use_ciou: bool = False)[source]¶

Batch task-aligned assigner based on the paper TOOD: Task-aligned One-stage Object Detection.

This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/tal_assigner.py

Assign a corresponding gt bbox or background to each bbox in a batch of predicted bboxes. Each bbox will be assigned 0 or a positive integer indicating the ground truth index.

0: negative sample, no assigned gt
positive integer: positive sample, index (1-based) of assigned gt

Parameters

num_classes (int) – Number of classes.
topk (int) – Number of bboxes selected in each level. Defaults to 13.
alpha (float) – Hyper-parameter related to alignment_metrics. Defaults to 1.0.
beta (float) – Hyper-parameter related to alignment_metrics. Defaults to 6.0.
eps (float) – Eps to avoid log(0). Defaults to 1e-7.
use_ciou (bool) – Whether to use CIoU when calculating IoU. Defaults to False.

forward(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict[source]¶

Assign gt to bboxes.

The assignment is done in the following steps:

1. compute the alignment metric between all bboxes (bboxes of all pyramid levels) and gt
2. select the top-k bboxes as candidates for each gt
3. limit the positive sample’s center to lie inside the gt (because the anchor-free detector can only predict positive distances)

Parameters

pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
priors (Tensor) – Model priors, shape (num_priors, 4)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)

Returns

assigned_result (dict): Assigned result, containing:

assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
assigned_bboxes (Tensor): Assigned boxes, shape(batch_size, num_priors, 4)
assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors, num_classes)
fg_mask_pre_prior (Tensor): Force ground truth matching mask, shape(batch_size, num_priors)

Return type

dict
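The third step, keeping only candidates whose center falls inside the gt box, can be illustrated with a minimal sketch. The helper `centers_in_gt` and its list-based inputs are hypothetical and stand in for the batched tensor logic; the check requires the distance from the center to every side of the box to be positive.

```python
def centers_in_gt(prior_centers, gt_bbox, eps=1e-9):
    """Flag priors whose center lies strictly inside one gt box.

    prior_centers: list of (cx, cy); gt_bbox: (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = gt_bbox
    flags = []
    for cx, cy in prior_centers:
        # Distances from the center to the four sides; all must be positive.
        min_dist = min(cx - x1, cy - y1, x2 - cx, y2 - cy)
        flags.append(min_dist > eps)
    return flags


flags = centers_in_gt([(5, 5), (0, 5), (12, 3)], (0, 0, 10, 10))
```

Here only the first center is inside the box; the second sits on the left edge and the third is outside on the right, so both are rejected.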

get_box_metrics(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor][source]¶

Compute alignment metric between all bbox and gt.

Parameters

pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.

Returns

alignment_metrics (Tensor): Alignment metrics, shape(batch_size, num_gt, num_priors)
overlaps (Tensor): Overlaps, shape(batch_size, num_gt, num_priors)

Return type

Tuple[torch.Tensor, torch.Tensor]
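As a rough illustration of how alpha and beta enter the alignment metric, the sketch below uses the formulation from the TOOD paper, score^alpha * IoU^beta, on scalar inputs. The function name `alignment_metric` is hypothetical, and treating the library's computation as exactly this formula is an assumption.

```python
def alignment_metric(score, iou, alpha=1.0, beta=6.0):
    """TOOD-style alignment metric: score**alpha * iou**beta (assumed form)."""
    return (score ** alpha) * (iou ** beta)


# Same classification score, different localization quality.
m_good = alignment_metric(score=0.9, iou=0.9)
m_loose = alignment_metric(score=0.9, iou=0.5)
```

With the default beta of 6.0 the IoU term dominates, so a well-localized prediction scores far higher than a loosely localized one with the same classification score.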

get_pos_mask(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Get possible mask.

Parameters

pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
priors (Tensor) – Model priors, shape (num_priors, 2)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.

Returns

pos_mask (Tensor): Possible mask, shape(batch_size, num_gt, num_priors)
alignment_metrics (Tensor): Alignment metrics, shape(batch_size, num_gt, num_priors)
overlaps (Tensor): Overlaps of gt_bboxes and pred_bboxes, shape(batch_size, num_gt, num_priors)

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_idxs: torch.Tensor, fg_mask_pre_prior: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Get assigner info.

Parameters

gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
assigned_gt_idxs (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)
fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.

Returns

assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors, 4)
assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors, num_classes)

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

select_topk_candidates(alignment_gt_metrics: torch.Tensor, using_largest_topk: bool = True, topk_mask: Optional[torch.Tensor] = None) → torch.Tensor[source]¶

Select the top-k candidates for each gt based on the alignment metric.

Parameters

alignment_gt_metrics (Tensor) – Alignment metric of gt candidates, shape(batch_size, num_gt, num_priors)
using_largest_topk (bool) – Controls whether to use the largest or the smallest elements.
topk_mask (Tensor) – Topk mask, shape(batch_size, num_gt, self.topk)

Returns

Topk candidates mask,: shape(batch_size, num_gt, num_priors)

Return type

Tensor
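A minimal sketch of top-k candidate selection on nested lists follows. The helper `select_topk_mask` is hypothetical and ignores the optional topk_mask argument; it only shows how a 0/1 mask of shape (num_gt, num_priors) can be derived from per-gt metrics.

```python
def select_topk_mask(metrics, topk, largest=True):
    """Return a 0/1 mask marking the topk entries of each row.

    metrics: list of rows, one row of per-prior metrics per gt.
    """
    mask = []
    for row in metrics:
        # Indices of the row sorted by metric, best first when largest=True.
        order = sorted(range(len(row)), key=lambda i: row[i], reverse=largest)
        chosen = set(order[:topk])
        mask.append([1 if i in chosen else 0 for i in range(len(row))])
    return mask


mask = select_topk_mask([[0.1, 0.7, 0.3, 0.9], [0.5, 0.2, 0.8, 0.4]], topk=2)
```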

class mmyolo.models.task_modules.YOLOXBBoxCoder(use_box_type: bool = False, **kwargs)[source]¶

YOLOX BBox coder.

This decoder decodes predicted bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor[source]¶

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

Parameters

priors (torch.Tensor) – Basic boxes or points, e.g. anchors.
pred_bboxes (torch.Tensor) – Encoded boxes with shape
stride (torch.Tensor | int) – Strides of bboxes.

Returns

Decoded boxes.

Return type

torch.Tensor
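The decoding described above can be sketched for a single prediction, assuming the usual YOLOX parameterization: the center offset is scaled by the stride and added to the prior point, and the width and height are exponentiated and scaled by the stride. The function `decode_yolox` is a hypothetical scalar stand-in, not the coder's batched implementation.

```python
import math


def decode_yolox(prior, pred, stride):
    """Decode one (delta_x, delta_y, w, h) prediction to (tl_x, tl_y, br_x, br_y).

    prior: (x, y) point of the grid cell; stride: stride of its feature level.
    """
    cx = pred[0] * stride + prior[0]
    cy = pred[1] * stride + prior[1]
    w = math.exp(pred[2]) * stride  # assumed exponential size encoding
    h = math.exp(pred[3]) * stride
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


box = decode_yolox(prior=(16.0, 32.0), pred=(0.5, -0.5, 0.0, math.log(2.0)), stride=8)
```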

encode(**kwargs)[source]¶: Encode deltas between bboxes and ground truth boxes.

class mmyolo.models.task_modules.YOLOv5BBoxCoder(use_box_type: bool = False, **kwargs)[source]¶

YOLOv5 BBox coder.

This decoder decodes predicted bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor[source]¶

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

Parameters

priors (torch.Tensor) – Basic boxes or points, e.g. anchors.
pred_bboxes (torch.Tensor) – Encoded boxes with shape
stride (torch.Tensor | int) – Strides of bboxes.

Returns

Decoded boxes.

Return type

torch.Tensor
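For comparison, here is a scalar sketch of anchor-based decoding, assuming the parameterization commonly used by YOLOv5: predictions pass through a sigmoid, the center shifts by up to one stride around the anchor center, and the size is the anchor size scaled by a factor between 0 and 4. The function `decode_yolov5` is hypothetical and whether the coder matches it in every detail is an assumption.

```python
import math


def decode_yolov5(prior, pred, stride):
    """Decode one raw prediction against an anchor given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = prior
    sx, sy, sw, sh = (1.0 / (1.0 + math.exp(-v)) for v in pred)  # sigmoid
    anchor_cx, anchor_cy = (x1 + x2) / 2, (y1 + y2) / 2
    anchor_w, anchor_h = x2 - x1, y2 - y1
    cx = (sx - 0.5) * 2 * stride + anchor_cx
    cy = (sy - 0.5) * 2 * stride + anchor_cy
    w = (sw * 2) ** 2 * anchor_w
    h = (sh * 2) ** 2 * anchor_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# A zero prediction (sigmoid = 0.5) reproduces the anchor itself.
box = decode_yolov5(prior=(0.0, 0.0, 10.0, 20.0), pred=(0.0, 0.0, 0.0, 0.0), stride=8)
```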

encode(**kwargs)[source]¶: Encode deltas between bboxes and ground truth boxes.

utils¶

class mmyolo.models.utils.OutputSaveFunctionWrapper(func: Callable, spec: Optional[Dict])[source]¶

A class that wraps a function and saves its outputs.

This class can be used to decorate a function to save its outputs. It wraps the function with a __call__ method that calls the original function and saves the results in a log attribute.

Parameters

func (Callable) – A function to wrap.
spec (Optional[Dict]) – A dictionary of global variables to use as the namespace for the wrapper. If None, the global namespace of the original function is used.
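The wrapping pattern can be sketched with a small stand-in class. `SaveOutputs` below is hypothetical: it mirrors only the described behavior (a __call__ method that forwards to the original function and appends the result to a log attribute) and does not model the spec argument.

```python
class SaveOutputs:
    """Hypothetical stand-in: wraps a callable and records every return value."""

    def __init__(self, func):
        self.func = func
        self.log = []  # outputs of all calls, in call order

    def __call__(self, *args, **kwargs):
        result = self.func(*args, **kwargs)
        self.log.append(result)
        return result


def add(a, b):
    return a + b


wrapped = SaveOutputs(add)
first = wrapped(1, 2)
second = wrapped(3, 4)
```

After the two calls, the wrapper still returns each result to the caller while `wrapped.log` holds both outputs for later inspection.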

class mmyolo.models.utils.OutputSaveObjectWrapper(obj: Any)[source]¶

A wrapper class that saves the output of function calls on an object.

clear()[source]¶: Clears the log of function call outputs.

mmyolo.models.utils.gt_instances_preprocess(batch_gt_instances: Union[torch.Tensor, Sequence], batch_size: int) → torch.Tensor[source]¶

Split batch_gt_instances with batch size.

From [all_gt_bboxes, box_dim+2] to [batch_size, number_gt, box_dim+1]. For horizontal box, box_dim=4, for rotated box, box_dim=5

If an image has fewer gt bboxes than the largest count in the batch, the remaining entries are filled with zeros.

Parameters

batch_gt_instances (Sequence[Tensor]) – Ground truth instances for whole batch, shape [all_gt_bboxes, box_dim+2]
batch_size (int) – Batch size.

Returns

batch gt instances data, shape: [batch_size, number_gt, box_dim+1]

Return type

Tensor
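The reshaping can be illustrated on plain lists. The sketch assumes each flat row is laid out as [img_idx, label, x1, y1, x2, y2], i.e. that the first of the box_dim+2 columns is the image index; that layout and the helper name `split_gt_by_image` are assumptions made for illustration.

```python
def split_gt_by_image(rows, batch_size):
    """Group flat gt rows per image and zero-pad to a common length.

    rows: list of [img_idx, label, x1, y1, x2, y2] (box_dim + 2 values each).
    Returns a nested list of shape [batch_size, max_gt, box_dim + 1].
    """
    per_image = [[] for _ in range(batch_size)]
    for row in rows:
        per_image[int(row[0])].append(list(row[1:]))  # drop the image index
    max_gt = max((len(gts) for gts in per_image), default=0)
    width = len(rows[0]) - 1 if rows else 0
    for gts in per_image:
        # Pad images with fewer boxes using all-zero rows.
        gts.extend([[0.0] * width for _ in range(max_gt - len(gts))])
    return per_image


rows = [
    [0, 1, 10, 10, 50, 50],
    [0, 2, 20, 20, 60, 60],
    [1, 3, 5, 5, 30, 30],
]
batch = split_gt_by_image(rows, batch_size=2)
```

Image 0 keeps its two boxes, while image 1 has one real box followed by one zero row, so both images end up with the same number of entries.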

mmyolo.models.utils.make_divisible(x: float, widen_factor: float = 1.0, divisor: int = 8) → int[source]¶: Make sure that x*widen_factor is divisible by divisor.

mmyolo.models.utils.make_round(x: float, deepen_factor: float = 1.0) → int[source]¶: Make sure that x*deepen_factor becomes an integer not less than 1.
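The two scaling helpers above can be approximated as follows. Both functions are sketches with hypothetical names: rounding up to the next multiple of divisor and using Python's round before clamping to 1 are assumptions consistent with the one-line descriptions, not a copy of the library code.

```python
import math


def make_divisible_sketch(x, widen_factor=1.0, divisor=8):
    """Scale x by widen_factor and round up to a multiple of divisor (assumed)."""
    return math.ceil(x * widen_factor / divisor) * divisor


def make_round_sketch(x, deepen_factor=1.0):
    """Scale x by deepen_factor and round, never going below 1 (assumed)."""
    return max(round(x * deepen_factor), 1)


channels = make_divisible_sketch(256, widen_factor=0.5)
odd_channels = make_divisible_sketch(100, widen_factor=0.33)
blocks = make_round_sketch(3, deepen_factor=0.33)
min_blocks = make_round_sketch(1, deepen_factor=0.1)
```

Helpers of this kind are typically used to derive channel counts and block repeats for scaled model variants, keeping channel widths hardware-friendly and every stage at least one block deep.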
