


class mmyolo.datasets.BatchShapePolicy(batch_size: int = 32, img_size: int = 640, size_divisor: int = 32, extra_pad_ratio: float = 0.5) [source]

BatchShapePolicy is used only in the testing phase, where it reduces the number of padded pixels during batch inference.

Parameters:

  • batch_size (int) – Single GPU batch size during batch inference. Defaults to 32.

  • img_size (int) – Expected output image size. Defaults to 640.

  • size_divisor (int) – The batch shape is rounded up to the minimum size that is divisible by size_divisor. Defaults to 32.

  • extra_pad_ratio (float) – Extra pad ratio. Defaults to 0.5.
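As a rough illustration of how these four parameters could interact, the sketch below computes one padded shape per batch in the style of YOLOv5 rectangular inference. It is an assumption about the general approach, not the library's code; the function name `batch_shape` and its `aspect_ratios` argument are hypothetical.

```python
import math


def batch_shape(aspect_ratios, img_size=640, size_divisor=32,
                extra_pad_ratio=0.5):
    """Return the padded (height, width) for one batch of images.

    aspect_ratios holds height / width for every image in the batch
    (hypothetical input, chosen for this sketch).
    """
    lo, hi = min(aspect_ratios), max(aspect_ratios)
    if hi < 1:        # every image is wider than it is tall
        shape = (hi, 1.0)
    elif lo > 1:      # every image is taller than it is wide
        shape = (1.0, 1.0 / lo)
    else:             # mixed orientations: fall back to a square
        shape = (1.0, 1.0)
    # Round each side up to a multiple of size_divisor, with extra padding.
    return tuple(
        math.ceil(s * img_size / size_divisor + extra_pad_ratio) * size_divisor
        for s in shape)
```

Under these assumptions a batch of wide images gets a short, wide canvas instead of a full square, which is where the saving in padded pixels comes from.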

class mmyolo.datasets.YOLOv5CocoDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs) [source]

Dataset for YOLOv5 COCO Dataset.

Compared with CocoDataset, this class only adds the BatchShapePolicy function. See mmyolo/datasets/ for details.
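A minimal config sketch of how batch_shapes_cfg might be passed to the dataset. The paths and the batch size are placeholders chosen for illustration, and the surrounding dataloader config is omitted.

```python
# Placeholder values; adjust them to your own setup.
batch_shapes_cfg = dict(
    type='BatchShapePolicy',
    batch_size=1,
    img_size=640,
    size_divisor=32,
    extra_pad_ratio=0.5)

val_dataset = dict(
    type='YOLOv5CocoDataset',
    data_root='data/coco/',                         # hypothetical path
    ann_file='annotations/instances_val2017.json',  # hypothetical path
    data_prefix=dict(img='val2017/'),               # hypothetical path
    test_mode=True,
    batch_shapes_cfg=batch_shapes_cfg)
```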

class mmyolo.datasets.YOLOv5CrowdHumanDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs) [source]

Dataset for YOLOv5 CrowdHuman Dataset.

Compared with CrowdHumanDataset, this class only adds the BatchShapePolicy function. See mmyolo/datasets/ for details.

class mmyolo.datasets.YOLOv5DOTADataset(*args, **kwargs) [source]

Dataset for YOLOv5 DOTA Dataset.

Compared with DOTADataset, this class only adds the BatchShapePolicy function. See mmyolo/datasets/ for details.

class mmyolo.datasets.YOLOv5VOCDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs) [source]

Dataset for YOLOv5 VOC Dataset.

Compared with VOCDataset, this class only adds the BatchShapePolicy function. See mmyolo/datasets/ for details.

mmyolo.datasets.yolov5_collate(data_batch: Sequence, use_ms_training: bool = False) → dict [source]

Rewrite collate_fn to get faster training speed.

Parameters:

  • data_batch (Sequence) – Batch of data.

  • use_ms_training (bool) – Whether to use multi-scale training. Defaults to False.
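One common way for a detection collate function to gain speed is to flatten all boxes of a batch into a single table whose first column is the image index. The sketch below shows that idea with plain lists. It is an assumption about the technique, not the library's implementation, and the keys 'img', 'labels', 'bboxes', and 'bboxes_labels' are hypothetical.

```python
def collate_sketch(data_batch):
    """Flatten boxes into rows of [batch_idx, label, x1, y1, x2, y2]."""
    inputs, rows = [], []
    for batch_idx, sample in enumerate(data_batch):
        inputs.append(sample['img'])
        for label, box in zip(sample['labels'], sample['bboxes']):
            # The leading batch index records which image a box belongs to.
            rows.append([batch_idx, label, *box])
    return {'inputs': inputs, 'bboxes_labels': rows}


batch = [
    {'img': 'img0', 'labels': [3], 'bboxes': [(0, 0, 10, 10)]},
    {'img': 'img1', 'labels': [1, 2],
     'bboxes': [(5, 5, 20, 20), (1, 2, 3, 4)]},
]
collated = collate_sketch(batch)
```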


class mmyolo.datasets.transforms.FilterAnnotations(by_keypoints: bool = False, **kwargs) [source]

Filter invalid annotations.

In addition to the conditions checked by FilterDetAnnotations, this filter adds a new condition requiring instances to have at least one visible keypoint.

class mmyolo.datasets.transforms.LetterResize(scale: Union[int, Tuple[int, int]], pad_val: dict = {'img': 0, 'mask': 0, 'seg': 255}, use_mini_pad: bool = False, stretch_only: bool = False, allow_scale_up: bool = True, half_pad_param: bool = False, **kwargs) [source]

Resize and pad image while meeting stride-multiple constraints.

Required Keys:

  • img (np.uint8)

  • batch_shape (np.int64) (optional)

Modified Keys:

  • img (np.uint8)

  • img_shape (tuple)

  • gt_bboxes (optional)

Added Keys:

  • pad_param (np.float32)

Parameters:

  • scale (Union[int, Tuple[int, int]]) – Image scales for resizing.

  • pad_val (dict) – Padding value. Defaults to dict(img=0, mask=0, seg=255).

  • use_mini_pad (bool) – Whether to use minimum rectangle padding. Defaults to False.

  • stretch_only (bool) – Whether to stretch to the specified size directly. Defaults to False.

  • allow_scale_up (bool) – Allow scale up when ratio > 1. Defaults to True.

  • half_pad_param (bool) – If set to True, left and right pad_param will be given by dividing padding_h by 2. If set to False, pad_param is in int format. We recommend setting this to False for object detection tasks, and True for instance segmentation tasks. Defaults to False.
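The flags above can be read as switches on a standard letterbox computation. The sketch below is a simplified model of that computation, assuming a `stride` of 32 for the minimum-rectangle case; `letterbox_params` and `stride` are hypothetical names, and the rounding may differ from the library's.

```python
def letterbox_params(img_hw, scale_hw, allow_scale_up=True,
                     use_mini_pad=False, stretch_only=False, stride=32):
    """Return (resized_hw, (top, bottom, left, right)) for a letterbox."""
    h, w = img_hw
    target_h, target_w = scale_hw
    if stretch_only:
        # Stretch straight to the target size; no padding is needed.
        return (target_h, target_w), (0, 0, 0, 0)
    ratio = min(target_h / h, target_w / w)
    if not allow_scale_up:
        ratio = min(ratio, 1.0)  # only ever shrink the image
    new_h, new_w = int(round(h * ratio)), int(round(w * ratio))
    pad_h, pad_w = target_h - new_h, target_w - new_w
    if use_mini_pad:
        # Keep only the padding needed to reach a multiple of the stride.
        pad_h, pad_w = pad_h % stride, pad_w % stride
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)
```

For example, a 480×640 (height×width) image letterboxed to 640×640 keeps its size and receives 80 rows of padding above and below; with use_mini_pad=True the padding drops to zero because 480 is already a multiple of 32.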

transform(results: dict) → dict [source]

Transform function to resize images, bounding boxes, semantic segmentation map and keypoints.

Parameters:

results (dict) – Result dict from loading pipeline.

Returns:

Resized results. The ‘img’, ‘gt_bboxes’, ‘gt_seg_map’, ‘gt_keypoints’, ‘scale’, ‘scale_factor’, ‘img_shape’, and ‘keep_ratio’ keys are updated in the result dict.



class mmyolo.datasets.transforms.LoadAnnotations(mask2bbox: bool = False, poly2mask: bool = False, merge_polygons: bool = True, **kwargs) [source]

Because the YOLO series does not need to consider ignored bboxes for the time being, they can be excluded in advance to speed up the pipeline.

Parameters:

  • mask2bbox (bool) – Whether to use mask annotation to get bbox. Defaults to False.

  • poly2mask (bool) – Whether to transform the polygons to bitmaps. Defaults to False.

  • merge_polygons (bool) – Whether to merge polygons into one polygon. If merged, the storage structure is simpler and training is more efficient, especially if the mask inside a bbox is divided into multiple polygons. Defaults to True.

merge_multi_segment(gt_masks: List[numpy.ndarray]) → List[numpy.ndarray] [source]

Merge multi segments to one list.

Find the coordinates with min distance between each segment, then connect these coordinates with one thin line to merge all segments into one.

Parameters:

gt_masks (List[np.ndarray]) – Original segmentations in COCO’s json file, like [segmentation1, segmentation2, …]; each segmentation is a list of coordinates.

Returns:

merged gt_masks



min_index(arr1: numpy.ndarray, arr2: numpy.ndarray) → Tuple[int, int] [source]

Find a pair of indexes with the shortest distance.

Parameters:

  • arr1 – (N, 2).

  • arr2 – (M, 2).

Returns:

a pair of indexes.
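A pure-Python sketch of the closest-pair search described above, written with plain point lists rather than arrays. `min_index_sketch` is a hypothetical name; a vectorized version would typically compute all pairwise squared distances at once.

```python
def min_index_sketch(arr1, arr2):
    """Return (i, j) such that arr1[i] and arr2[j] are the closest pair."""
    best = None
    for i, (x1, y1) in enumerate(arr1):
        for j, (x2, y2) in enumerate(arr2):
            # Squared distance is enough for finding the minimum.
            dist = (x1 - x2) ** 2 + (y1 - y2) ** 2
            if best is None or dist < best[0]:
                best = (dist, i, j)
    return best[1], best[2]
```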



transform(results: dict) → dict [source]

Function to load multiple types of annotations.

Parameters:

results (dict) – Result dict from mmengine.BaseDataset.

Returns:

The dict contains loaded bounding box, label and semantic segmentation.



class mmyolo.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 40, random_pop: bool = True, max_refetch: int = 15) [source]

Mosaic augmentation.

Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
           +------------------------------+
           |       pad        |           |
           |      +-----------+    pad    |
           |      |           |           |
           |      |  image1   +-----------+
           |      |           |           |
           |      |           |   image2  |
center_y   |----+-+-----------+-----------+
           |    |   cropped   |           |
           |pad |   image3    |   image4  |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the top-left image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. A sub-image is cropped if it is larger than its mosaic patch.
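The steps above can be sketched for the top-left image as follows. This assumes the mosaic canvas is twice img_scale in each dimension and that the center is drawn from center_ratio_range; both function names are hypothetical and the other three quadrants are omitted.

```python
import random


def sample_center(img_scale=(640, 640), center_ratio_range=(0.5, 1.5),
                  rng=random):
    """Sample the mosaic center on a canvas of twice img_scale."""
    width, height = img_scale
    center_x = int(rng.uniform(*center_ratio_range) * width)
    center_y = int(rng.uniform(*center_ratio_range) * height)
    return center_x, center_y


def top_left_placement(center, img_wh):
    """Return (paste_box, crop_box) for the top-left sub-image.

    paste_box is (x1, y1, x2, y2) on the canvas; crop_box is the region of
    the sub-image that survives when it is larger than its patch.
    """
    center_x, center_y = center
    img_w, img_h = img_wh
    x1, y1 = max(center_x - img_w, 0), max(center_y - img_h, 0)
    paste = (x1, y1, center_x, center_y)
    crop = (img_w - (center_x - x1), img_h - (center_y - y1), img_w, img_h)
    return paste, crop
```

The bottom-right corner of the top-left image is always pinned to the mosaic center, so any excess is cut from its top and left edges.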

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters:

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (float) – Pad value. Defaults to 114.0.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
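A minimal pipeline sketch showing how pre_transform might be wired into a Mosaic entry. The two loading transforms and the with_bbox flag are assumptions borrowed from typical upstream configs, not something this page specifies.

```python
# Transforms applied to every image that Mosaic samples (assumed entries).
pre_transform = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
]

train_pipeline = [
    *pre_transform,
    dict(
        type='Mosaic',
        img_scale=(640, 640),
        pad_val=114.0,
        pre_transform=pre_transform),
]
```

Passing the same pre_transform list to Mosaic means the extra sampled images go through the same loading steps as the primary image.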

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list [source]

Call function to collect indexes.

Parameters:

dataset (Dataset or list) – The dataset or cached list.





mix_img_transform(results: dict) → dict [source]

Mixed image data transformation.

Parameters:

results (dict) – Result dict.

Returns:

Updated result dict.

Return type:

dict

class mmyolo.datasets.transforms.Mosaic9(img_scale: Tuple[int, int] = (640, 640), bbox_clip_border: bool = True, pad_val: Union[float, int] = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 50, random_pop: bool = True, max_refetch: int = 15) [source]

Mosaic9 augmentation.

Given 9 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

           +-------------------------------+------------+
           | pad           |      pad      |            |
           |    +----------+               |            |
           |    |          +---------------+  top_right |
           |    |          |      top      |   image2   |
           |    | top_left |     image1    |            |
           |    |  image8  o--------+------+--------+---+
           |    |          |        |               |   |
           +----+----------+        |     right     |pad|
           |               | center |     image3    |   |
           |     left      | image0 +---------------+---|
           |    image7     |        |               |   |
       +---+-----------+---+--------+               |   |
       |   |  cropped  |            |  bottom_right |pad|
       |   |bottom_left|            |    image4     |   |
       |   |  image6   |   bottom   |               |   |
       +---|-----------+   image5   +---------------+---|
           |    pad    |            |        pad        |
           +-----------+------------+-------------------+

The mosaic transform steps are as follows:

    1. Get the center image according to the index, and randomly
       sample another 8 images from the custom dataset.
    2. Randomly offset the image after Mosaic

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters:

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (float or int) – Pad value. Defaults to 114.0.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 5 caches for each image suffices for randomness. Defaults to 50.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
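The interaction between use_cached, max_cached_images, and random_pop can be pictured as a bounded list. The sketch below models only that eviction rule, assuming a plain Python list as the cache; `update_cache` is a hypothetical helper.

```python
import random


def update_cache(cache, result, max_cached_images=50, random_pop=True,
                 rng=random):
    """Append a result and evict one entry once the cache is over capacity."""
    cache.append(result)
    if len(cache) > max_cached_images:
        # random_pop evicts a random entry; otherwise the oldest one (FIFO).
        index = rng.randint(0, len(cache) - 1) if random_pop else 0
        cache.pop(index)
    return cache


fifo_cache = []
for item in range(5):
    update_cache(fifo_cache, item, max_cached_images=3, random_pop=False)
```

With random_pop=False the cache behaves as a sliding window over the most recent results; with random_pop=True older results can survive longer, which increases the variety of images mixed together.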

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list [source]

Call function to collect indexes.

Parameters:

dataset (Dataset or list) – The dataset or cached list.





mix_img_transform(results: dict) → dict [source]

Mixed image data transformation.

Parameters:

results (dict) – Result dict.

Returns:

Updated result dict.

Return type:

dict

class mmyolo.datasets.transforms.PPYOLOERandomCrop(aspect_ratio: List[float] = [0.5, 2.0], thresholds: List[float] = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9], scaling: List[float] = [0.3, 1.0], num_attempts: int = 50, allow_no_crop: bool = True, cover_all_box: bool = False) [source]

Randomly crop the image and bboxes. Different thresholds are used in PPYOLOE to judge whether the clipped image meets the requirements. This implementation is different from the implementation of RandomCrop in mmdet.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Added Keys:

  • pad_param (np.float32)

Parameters:

  • aspect_ratio (List[float]) – Aspect ratio of cropped region. Defaults to [.5, 2].

  • thresholds (List[float]) – IoU thresholds for deciding a valid bbox crop in [min, max] format. Defaults to [.0, .1, .3, .5, .7, .9].

  • scaling (List[float]) – Ratio between a cropped region and the original image in [min, max] format. Defaults to [.3, 1.].

  • num_attempts (int) – Number of tries for each threshold before giving up. Defaults to 50.

  • allow_no_crop (bool) – Allow returning the input without actually cropping it. Defaults to True.

  • cover_all_box (bool) – Ensure all bboxes are covered in the final crop. Defaults to False.

class mmyolo.datasets.transforms.PPYOLOERandomDistort(hue_cfg: dict = {'max': 18, 'min': -18, 'prob': 0.5}, saturation_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, contrast_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, brightness_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, num_distort_func: int = 4) [source]

Random hue, saturation, contrast and brightness distortion.

Required Keys:

  • img

Modified Keys:

  • img (np.float32)

Parameters:

  • hue_cfg (dict) – Hue settings. Defaults to dict(min=-18, max=18, prob=0.5).

  • saturation_cfg (dict) – Saturation settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • contrast_cfg (dict) – Contrast settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • brightness_cfg (dict) – Brightness settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • num_distort_func (int) – The number of distortion functions. Defaults to 4.

transform(results: dict) → dict [source]

The hue, saturation, contrast and brightness distortion function.

Parameters:

results (dict) – The result dict.

Returns:

The result dict.




Transform brightness randomly.


Transform contrast randomly.


Transform hue randomly.


Transform saturation randomly.

class mmyolo.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction')) [source]

Pack the input data for detection, semantic segmentation, and panoptic segmentation.

Compared to mmdet, we just add the gt_panoptic_seg field and logic.

transform(results: dict) → dict [source]

Method to pack the input data.

Parameters:

results (dict) – Result dict from the data pipeline.

Returns:

  • ‘inputs’ (torch.Tensor): The forward data of models.

  • ‘data_sample’ (DetDataSample): The annotation info of the




class mmyolo.datasets.transforms.Polygon2Mask(downsample_ratio: int = 4, mask_overlap: bool = True, coco_style: bool = False) [source]

Polygons to bitmaps in YOLOv5.

Parameters:

  • downsample_ratio (int) – Downsample ratio of mask. Defaults to 4.

  • mask_overlap (bool) – Whether to use mask overlap in mask processing. When set to True, the implementation is the same as the official one, with higher training speed: all gt masks are compressed into one overlap mask whose values indicate the index of the gt mask. If set to False, each mask is a binary mask. Defaults to True.

  • coco_style (bool) – Whether to use coco_style to convert the polygons to bitmaps. Note that this option is only used to test if there is an improvement in training speed and we recommend setting it to False. Defaults to False.
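To make the mask_overlap=True behavior concrete, the sketch below compresses several binary masks into one index mask using nested lists. It assumes masks are painted from largest to smallest so that small objects stay visible; `overlap_mask` is a hypothetical helper, not the library function.

```python
def overlap_mask(binary_masks):
    """Merge binary masks into one mask whose values are 1-based ranks.

    Returns the merged mask and the mask order sorted by descending area.
    """
    areas = [sum(map(sum, mask)) for mask in binary_masks]
    order = sorted(range(len(binary_masks)), key=lambda i: -areas[i])
    height, width = len(binary_masks[0]), len(binary_masks[0][0])
    merged = [[0] * width for _ in range(height)]
    for rank, idx in enumerate(order, start=1):
        for y in range(height):
            for x in range(width):
                if binary_masks[idx][y][x]:
                    # Later (smaller) masks overwrite earlier (larger) ones.
                    merged[y][x] = rank
    return merged, order


small = [[0, 1, 0], [0, 0, 0]]
big = [[1, 1, 1], [1, 1, 0]]
merged, order = overlap_mask([small, big])
```

A single integer mask is cheaper to store and transform than one binary mask per object, which is consistent with the higher training speed mentioned above.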

polygon2mask(img_shape: Tuple[int, int], polygons: numpy.ndarray, color: int = 1) → numpy.ndarray [source]

Parameters:

  • img_shape (tuple) – The image size.

  • polygons (np.ndarray) – [N, M], where N is the number of polygons and M is the number of coordinate values per polygon (must be divisible by 2).

  • color (int) – Color in fillPoly.

Returns:

the overlap mask.



polygons2masks(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks, color: int = 1) → numpy.ndarray [source]

Return a list of bitmap masks.

Parameters:

  • img_shape (tuple) – The image size.

  • polygons (PolygonMasks) – The mask annotations.

  • color (int) – Color in fillPoly.

Returns:

the list of masks in bitmaps.



polygons2masks_overlap(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks) → Tuple[numpy.ndarray, numpy.ndarray] [source]

Return an overlap mask and the sorted idx of area.

Parameters:

  • img_shape (tuple) – The image size.

  • polygons (PolygonMasks) – The mask annotations.

Returns:

the overlap mask and the sorted idx of area.

Return type:

Tuple[np.ndarray, np.ndarray]

transform(results: dict) → dict [source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters:

results (dict) – The result dict.

Returns:

The result dict.



class mmyolo.datasets.transforms.RandomAffine(**kwargs) [source]
class mmyolo.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None) [source]
class mmyolo.datasets.transforms.RegularizeRotatedBox(angle_version='le90') [source]

Regularize rotated boxes.

Due to the angle periodicity, one rotated box can be represented in many different (x, y, w, h, t). To make each rotated box unique, regularize_boxes will take the remainder of the angle divided by 180 degrees.

For convenience, three angle_version values can be used here:

  • ‘oc’: OpenCV Definition. Has the same box representation as cv2.minAreaRect. The angle ranges in [-90, 0).

  • ‘le90’: Long Edge Definition (90). The angle ranges in [-90, 90). The width is always longer than the height.

  • ‘le135’: Long Edge Definition (135). The angle ranges in [-45, 135). The width is always longer than the height.
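As an illustration of the ‘le90’ convention, the sketch below normalizes a single box given in degrees. It is a simplified model written for this page (real implementations usually work on tensors and in radians), and `regularize_le90` is a hypothetical name.

```python
def regularize_le90(w, h, angle_deg):
    """Return (w, h, angle) with w >= h and the angle in [-90, 90)."""
    if w < h:
        # Swapping the sides is equivalent to rotating the box by 90 degrees.
        w, h = h, w
        angle_deg += 90
    # Boxes are periodic in 180 degrees, so wrap the angle into [-90, 90).
    angle_deg = (angle_deg + 90) % 180 - 90
    return w, h, angle_deg
```

For instance, a box with width 2, height 4 and angle 30 describes the same rectangle as width 4, height 2 and angle -60, and the sketch returns the latter.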

Required Keys:

  • gt_bboxes (RotatedBoxes[torch.float32])

Modified Keys:

  • gt_bboxes


Parameters:

angle_version (str) – Angle version. Can only be ‘oc’, ‘le90’, or ‘le135’. Defaults to ‘le90’.

transform(results: dict) → dict [source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters:

results (dict) – The result dict.

Returns:

The result dict.



class mmyolo.datasets.transforms.RemoveDataElement(keys: Union[str, Sequence[str]]) [source]

Remove unnecessary data elements in results.

Parameters:

keys (Union[str, Sequence[str]]) – Keys to be removed.

transform(results: dict) → dict [source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters:

results (dict) – The result dict.

Returns:

The result dict.



class mmyolo.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear') [source]
class mmyolo.datasets.transforms.YOLOXMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, bbox_clip_border: bool = True, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15) [source]

MixUp data augmentation for YOLOX.

         mixup transform
+---------------+--------------+
| mixup image   |              |
|      +--------|--------+     |
|      |        |        |     |
+---------------+        |     |
|      |                 |     |
|      |      image      |     |
|      |                 |     |
|      |                 |     |
|      +-----------------+     |
|             pad              |
+------------------------------+

The mixup transform steps are as follows:

  1. Another random image is picked from the dataset and embedded in the top-left patch (after padding and resizing).

  2. The target of mixup transform is the weighted average of mixup image and origin image.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters:

  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (float) – Pad value. Defaults to 114.0.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int [source]

Call function to collect indexes.

Parameters:

dataset (Dataset or list) – The dataset or cached list.





mix_img_transform(results: dict) → dict [source]

YOLOX MixUp transform function.

Parameters:

results (dict) – Result dict.

Returns:

Updated result dict.

Return type:

dict

class mmyolo.datasets.transforms.YOLOv5CopyPaste(ioa_thresh: float = 0.3, prob: float = 0.5) [source]

Copy-Paste used in YOLOv5 and YOLOv8.

This transform randomly copies some objects in the image to the mirror position of the image. It is different from the CopyPaste in mmdet.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (PolygonMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

Parameters:

  • ioa_thresh (float) – IoA threshold for deciding valid bbox. Defaults to 0.3.

  • prob (float) – Probability of choosing objects. Defaults to 0.5.

static bbox_ioa(gt_bboxes_flip: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, gt_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, eps: float = 1e-07) → numpy.ndarray [source]

Calculate ioa between gt_bboxes_flip and gt_bboxes.

Parameters:

  • gt_bboxes_flip (HorizontalBoxes) – Flipped ground truth bounding boxes.

  • gt_bboxes (HorizontalBoxes) – Ground truth bounding boxes.

  • eps (float) – Defaults to 1e-07.
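A small sketch of the two ingredients involved: mirroring a box horizontally and measuring its intersection over area (IoA) against an existing box. It assumes IoA is the intersection divided by the area of the second box, and both helper names are hypothetical.

```python
def flip_box(box, img_w):
    """Mirror an (x1, y1, x2, y2) box horizontally inside an image."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)


def bbox_ioa_sketch(box1, box2, eps=1e-7):
    """Intersection of box1 and box2 divided by the area of box2."""
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1]) + eps
    return inter_w * inter_h / area2
```

In a copy-paste setting, a mirrored object would typically be pasted only if its IoA with every existing box stays below ioa_thresh, so that it does not hide objects already in the image.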





class mmyolo.datasets.transforms.YOLOv5HSVRandomAug(hue_delta: Union[int, float] = 0.015, saturation_delta: Union[int, float] = 0.7, value_delta: Union[int, float] = 0.4) [source]

Apply HSV augmentation to image sequentially.

Required Keys:

  • img

Modified Keys:

  • img

Parameters:

  • hue_delta ([int, float]) – Delta of hue. Defaults to 0.015.

  • saturation_delta ([int, float]) – Delta of saturation. Defaults to 0.7.

  • value_delta ([int, float]) – Delta of value. Defaults to 0.4.

transform(results: dict) → dict [source]

The HSV augmentation transform function.

Parameters:

results (dict) – The result dict.

Returns:

The result dict.



class mmyolo.datasets.transforms.YOLOv5KeepRatioResize(scale: Union[int, Tuple[int, int]], keep_ratio: bool = True, **kwargs) [source]

Resize images and bboxes (if present).

This transform resizes the input image according to scale. Bboxes (if present) are then resized with the same scale factor.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

Modified Keys:

  • img (np.uint8)

  • img_shape (tuple)

  • gt_bboxes (optional)

  • scale (float)

Added Keys:

  • scale_factor (np.float32)


Parameters:

scale (Union[int, Tuple[int, int]]) – Image scales for resizing.

class mmyolo.datasets.transforms.YOLOv5MixUp(alpha: float = 32.0, beta: float = 32.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15) [source]

MixUp data augmentation for YOLOv5.

The mixup transform steps are as follows:

  1. Another random image is picked from the dataset.

  2. Randomly obtain the fusion ratio from the beta distribution, then fuse the target of the original image and mixup image through this ratio.
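The two steps above can be sketched with the standard library alone. The pixel lists stand in for images, `mix_pixels` is a hypothetical helper, and the weighting convention (ratio applied to the original image) is an assumption.

```python
import random


def mix_pixels(origin, other, ratio):
    """Blend two equally sized pixel sequences with the given ratio."""
    return [a * ratio + b * (1 - ratio) for a, b in zip(origin, other)]


# With alpha = beta = 32 the sampled ratio concentrates around 0.5.
ratio = random.betavariate(32.0, 32.0)
mixed = mix_pixels([0.0, 100.0], [100.0, 0.0], ratio)
```

Large, equal alpha and beta values keep the blend close to an even mix of both images.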

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters:

  • alpha (float) – Parameter of beta distribution to get mixup ratio. Defaults to 32.

  • beta (float) – Parameter of beta distribution to get mixup ratio. Defaults to 32.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int [source]

Call function to collect indexes.

Parameters:

dataset (Dataset or list) – The dataset or cached list.





mix_img_transform(results: dict) → dict [source]

YOLOv5 MixUp transform function.

Parameters:

results (dict) – Result dict.

Returns:

Updated result dict.

Return type:

dict

class mmyolo.datasets.transforms.YOLOv5RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True, min_bbox_size: int = 2, min_area_ratio: float = 0.1, use_mask_refine: bool = False, max_aspect_ratio: float = 20.0, resample_num: int = 1000) [source]

Random affine transform data augmentation in YOLOv5 and YOLOv8. It is different from the implementation in YOLOX.

This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms. If you set use_mask_refine == True, the code will use the mask annotations to refine the bbox. Our implementation is slightly different from the official one. In the COCO dataset, a gt may have multiple mask tags. The official YOLOv5 annotation file already combines the masks that an object has, but our code takes into account the fact that an object has multiple masks.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (PolygonMasks) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (PolygonMasks) (optional)

Parameters:

  • max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.

  • max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.

  • border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • min_bbox_size (int) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Defaults to 2.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Defaults to 0.1.

  • use_mask_refine (bool) – Whether to refine bbox by mask. Deprecated. Defaults to False.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Defaults to 20.

  • resample_num (int) – Number of points to resample each polygon to. Defaults to 1000.
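The three filtering thresholds (min_bbox_size, min_area_ratio, max_aspect_ratio) can be read as a single keep-or-drop test per box. The sketch below applies them to one pair of (x1, y1, x2, y2) boxes; `keep_box` is a hypothetical helper and the small epsilon guarding the divisions is an assumption.

```python
def keep_box(origin, warped, min_bbox_size=2, min_area_ratio=0.1,
             max_aspect_ratio=20.0, eps=1e-16):
    """Return True if the warped box passes all three filters."""
    origin_w, origin_h = origin[2] - origin[0], origin[3] - origin[1]
    warped_w, warped_h = warped[2] - warped[0], warped[3] - warped[1]
    # 1. The warped box must not be too small.
    size_ok = warped_w > min_bbox_size and warped_h > min_bbox_size
    # 2. It must keep enough of the original area.
    area_ok = (warped_w * warped_h) / (origin_w * origin_h + eps) > min_area_ratio
    # 3. It must not be too elongated.
    aspect = max(warped_w / (warped_h + eps), warped_h / (warped_w + eps))
    return size_ok and area_ok and aspect < max_aspect_ratio
```

Boxes that collapse to slivers or lose most of their area after warping tend to be unreliable training targets, which is why they are removed.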

clip_polygons(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.mask.structures.PolygonMasks [source]

Function to clip points of polygons with height and width.

Parameters:

  • gt_masks (PolygonMasks) – Annotations of instance segmentation.

  • height (int) – Height of clip border.

  • width (int) – Width of clip border.

Returns:

Clipped annotations of instance segmentation.

Return type:

PolygonMasks

filter_gt_bboxes(origin_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, wrapped_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes) → torch.Tensor [source]

Filter gt bboxes.

Parameters:

  • origin_bboxes (HorizontalBoxes) – Origin bboxes.

  • wrapped_bboxes (HorizontalBoxes) – Warped bboxes.

Returns:

A tensor indicating which bboxes are kept after filtering.



resample_masks(gt_masks: mmdet.structures.mask.structures.PolygonMasks)mmdet.structures.mask.structures.PolygonMasks[源代码]

Function to resample each mask annotation with shape (2 * n, ) to shape (resample_num * 2, ).


gt_masks (PolygonMasks) – Annotations of semantic segmentation.

segment2box(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int)mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes[源代码]

Convert 1 segment label to 1 box label, applying the inside-image constraint, i.e. (xy1, xy2, …) to (xyxy).

  • gt_masks (PolygonMasks) – The segment label.

  • height (int) – The height of the image.

  • width (int) – The width of the image.

Returns: The clipped bboxes from gt_masks.
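The conversion from (xy1, xy2, …) to (xyxy) with the inside-image constraint can be sketched for a single polygon as follows. This is an illustrative plain-Python version; the name `segment_to_box` and the choice to return None when no point lies inside the image are assumptions.

```python
from typing import Optional, Sequence, Tuple


def segment_to_box(poly: Sequence[float], height: int,
                   width: int) -> Optional[Tuple[float, float, float, float]]:
    """Build an (x1, y1, x2, y2) box from the polygon points inside the image."""
    xs, ys = [], []
    for i in range(0, len(poly), 2):
        x, y = float(poly[i]), float(poly[i + 1])
        # Inside-image constraint: ignore points beyond the border.
        if 0 <= x <= width and 0 <= y <= height:
            xs.append(x)
            ys.append(y)
    if not xs:
        return None  # assumed behaviour when nothing is left
    return (min(xs), min(ys), max(xs), max(ys))


print(segment_to_box([-10, 5, 20, 30, 50, 700, 40, 10], height=480, width=640))
```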



warp_mask(gt_masks: mmdet.structures.mask.structures.PolygonMasks, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → mmdet.structures.mask.structures.PolygonMasks[source]

Warp masks by warp_matrix and retain masks inside image after warping.

  • gt_masks (PolygonMasks) – Annotations of semantic segmentation.

  • warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).

  • img_w (int) – Width of output image.

  • img_h (int) – Height of output image.


Returns: PolygonMasks – Masks after warping.



static warp_poly(poly: numpy.ndarray, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → numpy.ndarray[source]

Function to warp one mask and filter points outside image.

  • poly (np.ndarray) – Segmentation annotation with shape (n, ) and with format (x1, y1, x2, y2, …).

  • warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).

  • img_w (int) – Width of output image.

  • img_h (int) – Height of output image.
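As a rough illustration of warping one polygon with a (3, 3) matrix and dropping points that land outside the image, consider the sketch below. It uses nested lists instead of np.ndarray so that it has no dependencies; the helper name `warp_polygon` is hypothetical.

```python
from typing import List, Sequence


def warp_polygon(poly: Sequence[float], warp_matrix: Sequence[Sequence[float]],
                 img_w: int, img_h: int) -> List[float]:
    """Apply a 3x3 transform to a flat polygon and keep in-image points."""
    m = warp_matrix
    out: List[float] = []
    for i in range(0, len(poly), 2):
        x, y = poly[i], poly[i + 1]
        # Multiply the homogeneous point (x, y, 1) by the matrix.
        wx = m[0][0] * x + m[0][1] * y + m[0][2]
        wy = m[1][0] * x + m[1][1] * y + m[1][2]
        w = m[2][0] * x + m[2][1] * y + m[2][2]
        wx, wy = wx / w, wy / w
        # Filter points that fall outside the output image.
        if 0 <= wx <= img_w and 0 <= wy <= img_h:
            out.extend([wx, wy])
    return out


shift = [[1, 0, 10], [0, 1, 20], [0, 0, 1]]  # translate by (10, 20)
print(warp_polygon([0, 0, 5, 5, 700, 0], shift, img_w=640, img_h=480))
```

The third point is shifted to x = 710, beyond the 640-pixel width, so it is removed from the result.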






class mmyolo.models.backbones.BaseBackbone(arch_setting: list, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

BaseBackbone backbone used in YOLO series.

Backbone model structure diagram:

input -> stem layer -> stage layer 1 -> stage layer 2 -> ... -> stage layer n

In the P5 model, n=4. In the P6 model, n=5.

  • arch_setting (list) – Architecture of BaseBackbone.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to None.

  • act_cfg (dict) – Config dict for activation layer. Defaults to None.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
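The effect of deepen_factor and widen_factor can be sketched with two small helpers. The rounding rules used here (channels rounded up to a multiple of 8, block counts rounded to the nearest integer with a minimum of 1) are assumptions made for illustration and are not stated in the parameter descriptions; the function names are hypothetical as well.

```python
import math


def scale_channels(channels: int, widen_factor: float, divisor: int = 8) -> int:
    """Multiply the channel count and round up to a multiple of divisor."""
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_blocks(num_blocks: int, deepen_factor: float) -> int:
    """Multiply the block count, keeping at least one block."""
    if num_blocks <= 1:
        return num_blocks
    return max(round(num_blocks * deepen_factor), 1)


print(scale_channels(1024, 0.5), scale_blocks(9, 0.33))
```

Under these assumptions a stage with 1024 channels and 9 blocks shrinks to 512 channels and 3 blocks for widen_factor=0.5 and deepen_factor=0.33.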

abstract build_stage_layer(stage_idx: int, setting: list)[源代码]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

abstract build_stem_layer()[源代码]

Build a stem layer.

forward(x: torch.Tensor) → tuple[source]

Forward batch_inputs from the data_preprocessor.

make_stage_plugins(plugins, stage_idx, setting)[源代码]

Make plugins for the stage_idx-th stage of the backbone.

Currently, context_block, empirical_attention_block, nonlocal_block, and dropout_block can be inserted into the backbone.

An example of plugins format could be:


>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True)),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True)),
... ]
>>> model = YOLOv5CSPDarknet()
>>> stage_plugins = model.make_stage_plugins(plugins, 0, setting)
>>> assert len(stage_plugins) == 1

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> xxx -> yyy
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of the stage to build. If stages is missing, the plugin will be applied to all stages.

  • setting (list) – The architecture setting of a stage layer.


Returns: Plugins for the current stage.
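The selection rule described above (a plugin applies to a stage when its stages flag is True, or to every stage when stages is missing) can be sketched without building any layers. The helper name `select_stage_plugins` is hypothetical; it only returns the matching cfg dicts.

```python
from typing import Dict, List


def select_stage_plugins(plugins: List[Dict], stage_idx: int) -> List[Dict]:
    """Return the cfg dicts of the plugins enabled for stage_idx."""
    selected = []
    for plugin in plugins:
        stages = plugin.get('stages')
        # A missing 'stages' entry means the plugin applies to all stages.
        if stages is None or stages[stage_idx]:
            selected.append(plugin['cfg'])
    return selected


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'), stages=(False, True, True, True)),
    dict(cfg=dict(type='yyy'), stages=(True, True, True, True)),
]
print([cfg['type'] for cfg in select_stage_plugins(plugins, 0)])
print([cfg['type'] for cfg in select_stage_plugins(plugins, 1)])
```

This matches the example in the docstring: stage 0 receives only yyy, while stage 1 receives xxx followed by yyy.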



train(mode: bool = True)[源代码]

Convert the model into training mode while keeping normalization layers frozen.

class mmyolo.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]

CSPNeXt backbone used in RTMDet.

  • arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN').

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

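  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').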
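As a rough illustration of how these arguments are usually passed, the fragment below sketches a backbone config dict. The registry name 'CSPNeXt' and the two factor values are assumptions chosen for illustration; the remaining entries simply repeat the defaults shown in the signature.

```python
# Hypothetical config fragment: only the argument names and defaults come
# from the signature, the factor values are illustrative.
backbone = dict(
    type='CSPNeXt',        # assumed registry name, taken from the class name
    arch='P5',
    deepen_factor=0.33,    # illustrative value
    widen_factor=0.5,      # illustrative value
    out_indices=(2, 3, 4),
    frozen_stages=-1,
    use_depthwise=False,
    expand_ratio=0.5,
    channel_attention=True,
    norm_cfg=dict(type='BN'),
    act_cfg=dict(type='SiLU', inplace=True),
    norm_eval=False,
)
print(sorted(backbone))
```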

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.

class mmyolo.models.backbones.PPYOLOECSPResNet(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, arch_ovewrite: Optional[dict] = None, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': True, 'type': 'PPYOLOEBasicBlock', 'use_alpha': True}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, attention_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_cfg': {'type': 'HSigmoid'}, 'type': 'EffectiveSELayer'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_large_stem: bool = False)[源代码]

CSP-ResNet backbone used in PPYOLOE.

  • arch (str) – Architecture of CSP-ResNet, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • block_cfg (dict) – Config dict for block. Defaults to dict(type=’PPYOLOEBasicBlock’, shortcut=True, use_alpha=True)

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • attention_cfg (dict) – Config dict for EffectiveSELayer. Defaults to dict(type=’EffectiveSELayer’, act_cfg=dict(type=’HSigmoid’)).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to None.

  • use_large_stem (bool) – Whether to use large stem layer. Defaults to False.

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.

class mmyolo.models.backbones.YOLOXCSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, spp_kernal_sizes: Tuple[int] = (5, 9, 13), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

CSP-Darknet backbone used in YOLOX.

  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.


>>> from mmyolo.models import YOLOXCSPDarknet
>>> import torch
>>> model = YOLOXCSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
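The spatial sizes printed in this example follow from simple stride arithmetic. The sketch below assumes that the three output levels have strides 8, 16 and 32, which is consistent with the printed shapes; the helper name `feature_map_sizes` is hypothetical.

```python
from typing import List, Sequence


def feature_map_sizes(input_size: int,
                      strides: Sequence[int] = (8, 16, 32)) -> List[int]:
    """Side length of each output feature map for a square input."""
    return [input_size // stride for stride in strides]


print(feature_map_sizes(416))
```

For a 416 x 416 input this gives 52, 26 and 13, matching the last two dimensions of each printed shape.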

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.

class mmyolo.models.backbones.YOLOv5CSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

CSP-Darknet backbone used in YOLOv5.

  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.


>>> from mmyolo.models import YOLOv5CSPDarknet
>>> import torch
>>> model = YOLOv5CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.


init_weights()

Initialize the parameters.

class mmyolo.models.backbones.YOLOv6CSPBep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, hidden_ratio: float = 0.5, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'ConvWrapper'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

CSPBep backbone used in YOLOv6.

  • arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='ConvWrapper').

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.


>>> from mmyolo.models import YOLOv6CSPBep
>>> import torch
>>> model = YOLOv6CSPBep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

class mmyolo.models.backbones.YOLOv6EfficientRep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

EfficientRep backbone used in YOLOv6.

  • arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.


>>> from mmyolo.models import YOLOv6EfficientRep
>>> import torch
>>> model = YOLOv6EfficientRep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.


init_weights()

Initialize the weights.

class mmyolo.models.backbones.YOLOv7Backbone(arch: str = 'L', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Backbone used in YOLOv7.

  • arch (str) – Architecture of YOLOv7. Defaults to L.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict. Defaults to None.

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.

class mmyolo.models.backbones.YOLOv8CSPDarknet(arch: str = 'P5', last_stage_out_channels: int = 1024, plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

CSP-Darknet backbone used in YOLOv8.

  • arch (str) – Architecture of CSP-Darknet, from {P5}. Defaults to P5.

  • last_stage_out_channels (int) – Final layer output channel. Defaults to 1024.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.


>>> from mmyolo.models import YOLOv8CSPDarknet
>>> import torch
>>> model = YOLOv8CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.


build_stem_layer()

Build a stem layer.


init_weights()

Initialize the parameters.



class mmyolo.models.dense_heads.PPYOLOEHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.125, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

PPYOLOEHead head used in PPYOLOE. The YOLOv6 head and the PPYOLOE head differ only slightly: PPYOLOE additionally uses distribution focal loss, while YOLOv6 does not.

  • head_module (ConfigType) – Base module used for PPYOLOEHead.

  • prior_generator (dict) – Config of the point generator for feature maps in 2D point-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_dfl (ConfigDict or dict) – Config of distribution focal loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
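The default prior_generator in the signature uses strides [8, 16, 32] and offset 0.5. A minimal sketch of how such a point generator could place one prior per feature-map cell is shown below; the helper name `grid_points` and the row-major ordering are assumptions, and the real generator returns tensors rather than tuples.

```python
from typing import List, Tuple


def grid_points(feat_h: int, feat_w: int, stride: int,
                offset: float = 0.5) -> List[Tuple[float, float]]:
    """Image-space (x, y) location of the prior in each feature-map cell."""
    return [((x + offset) * stride, (y + offset) * stride)
            for y in range(feat_h)
            for x in range(feat_w)]


print(grid_points(2, 2, stride=8, offset=0.5))
```

With offset 0.5 the priors sit at cell centres; with offset 0 they would sit at the top-left corner of each cell.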

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


Returns: dict[str, Tensor] – A dictionary of losses.

class mmyolo.models.dense_heads.PPYOLOEHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

PPYOLOEHead head module used in PPYOLOE.


  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int or Sequence) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • reg_max (int) – Max value of the integral set {0, ..., reg_max} in the QFL setting. Defaults to 16.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.1, eps=1e-5).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
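The reg_max parameter defines the integral set {0, ..., reg_max} over which a box distance is represented as a discrete distribution. One common way to turn reg_max + 1 logits into a single distance is the softmax-weighted expectation sketched below; that this head decodes distances exactly this way is an assumption, and the helper name `expected_distance` is hypothetical.

```python
import math
from typing import Sequence


def expected_distance(logits: Sequence[float]) -> float:
    """Softmax over the logits, then the expectation over 0..reg_max."""
    peak = max(logits)
    # Subtract the maximum for numerical stability before exponentiating.
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return sum(index * e / total for index, e in enumerate(exps))


reg_max = 16
print(expected_distance([0.0] * (reg_max + 1)))
```

A uniform distribution over 0..16 yields the midpoint of the range, while a sharply peaked distribution yields a value close to the index of the peak.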

forward(x: Tuple[torch.Tensor]) → torch.Tensor[source]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


Returns: A tuple of multi-level classification scores and bbox predictions.



forward_single(x: torch.Tensor, cls_stem: torch.nn.modules.container.ModuleList, cls_pred: torch.nn.modules.container.ModuleList, reg_stem: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList) → torch.Tensor[source]

Forward feature of a single scale level.


init_weights()

Initialize the weight and bias of PPYOLOE head.

class mmyolo.models.dense_heads.RTMDetHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

RTMDet head.

  • head_module (ConfigType) – Base module used for RTMDetHead.

  • prior_generator – Point generator for feature maps in 2D point-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
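
As an illustration of how these arguments fit together, the following is a sketch of a head config assembled from the default values in the signature above. The head_module entries (num_classes=80, in_channels=256) are illustrative assumptions and are not taken from this reference.

```python
# Head config assembled from the documented defaults of RTMDetHead.
# The head_module values (num_classes=80, in_channels=256) are
# illustrative assumptions, not values taken from this reference.
bbox_head = dict(
    type='RTMDetHead',
    head_module=dict(
        type='RTMDetSepBNHeadModule',
        num_classes=80,
        in_channels=256,
        featmap_strides=[8, 16, 32]),
    prior_generator=dict(
        type='mmdet.MlvlPointGenerator', offset=0, strides=[8, 16, 32]),
    bbox_coder=dict(type='DistancePointBBoxCoder'),
    loss_cls=dict(
        type='mmdet.QualityFocalLoss',
        use_sigmoid=True,
        beta=2.0,
        loss_weight=1.0),
    loss_bbox=dict(type='mmdet.GIoULoss', loss_weight=2.0))

# The strides of the prior generator should line up with the feature maps
# produced by the head module.
assert (bbox_head['prior_generator']['strides'] ==
        bbox_head['head_module']['featmap_strides'])
```

Keeping the strides in one shared variable is a common way to avoid the two lists drifting apart when a config is edited.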

forward(x: Tuple[torch.Tensor]) → Tuple[List] [source]

Forward features from the upstream network.


  • x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns a tuple of multi-level classification scores and bbox predictions.



loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict [source]

Compute losses of the head.

  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


Returns a dictionary of loss components (dict[str, Tensor]).
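
To make the documented tensor shapes concrete, the sketch below computes the expected (N, C, H, W) shapes of cls_scores and bbox_preds per scale level. It assumes the input size is divisible by every stride; head_output_shapes is a hypothetical helper, not an mmyolo function.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int, int]


def head_output_shapes(
    batch_size: int,
    img_hw: Tuple[int, int],
    num_classes: int,
    num_anchors: int = 1,
    strides: Sequence[int] = (8, 16, 32),
) -> List[Tuple[Shape, Shape]]:
    """Return (cls_score_shape, bbox_pred_shape) for each scale level."""
    shapes = []
    for stride in strides:
        # Each level downsamples the input by its stride.
        h, w = img_hw[0] // stride, img_hw[1] // stride
        cls_shape = (batch_size, num_anchors * num_classes, h, w)
        bbox_shape = (batch_size, num_anchors * 4, h, w)
        shapes.append((cls_shape, bbox_shape))
    return shapes


for cls_shape, bbox_shape in head_output_shapes(2, (640, 640), num_classes=80):
    print(cls_shape, bbox_shape)
```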


YOLO-series algorithms inherit from YOLOv5Head, but each algorithm may need its own initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.RTMDetInsSepBNHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, loss_mask={'eps': 5e-06, 'loss_weight': 2.0, 'reduction': 'mean', 'type': 'mmdet.DiceLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

RTMDet Instance Segmentation head.

  • head_module (ConfigType) – Base module used for RTMDetInsSepBNHead.

  • prior_generator – Point generator for feature maps in 2D point-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_mask (ConfigDict or dict) – Config of mask loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict [source]

Compute losses of the head.

  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


Returns a dictionary of loss components (dict[str, Tensor]).

parse_dynamic_params(flatten_kernels: torch.Tensor) → tuple [source]

Split the kernel head prediction into conv weights and biases.
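
The splitting step can be sketched in plain Python. This sketch rests on several assumptions that the reference does not state: the first dynamic conv receives the mask prototypes plus two extra coordinate channels, the last layer outputs a single mask channel, and the flattened vector stores all weights first and all biases after them. Both helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dynamic_param_sizes(num_prototypes: int = 8,
                        dyconv_channels: int = 8,
                        num_dyconvs: int = 3,
                        extra_in_channels: int = 2
                        ) -> Tuple[List[int], List[int]]:
    """Per-layer weight and bias counts of the dynamic conv stack.

    `extra_in_channels` models coordinate channels appended to the mask
    prototypes; the value 2 is an assumption made for this sketch.
    """
    weight_nums, bias_nums = [], []
    for i in range(num_dyconvs):
        in_ch = (num_prototypes + extra_in_channels
                 if i == 0 else dyconv_channels)
        out_ch = 1 if i == num_dyconvs - 1 else dyconv_channels
        weight_nums.append(in_ch * out_ch)
        bias_nums.append(out_ch)
    return weight_nums, bias_nums


def split_flat_kernel(flat: Sequence[float], weight_nums: Sequence[int],
                      bias_nums: Sequence[int]
                      ) -> Tuple[List[list], List[list]]:
    """Split one flattened kernel vector into per-layer weights and biases."""
    expected = sum(weight_nums) + sum(bias_nums)
    if len(flat) != expected:
        raise ValueError(f'expected {expected} values, got {len(flat)}')
    weights, biases, start = [], [], 0
    for n in weight_nums:  # assumed layout: all weights come first
        weights.append(list(flat[start:start + n]))
        start += n
    for n in bias_nums:  # followed by all biases
        biases.append(list(flat[start:start + n]))
        start += n
    return weights, biases


weight_nums, bias_nums = dynamic_param_sizes()
num_gen_params = sum(weight_nums) + sum(bias_nums)
weights, biases = split_flat_kernel(
    list(range(num_gen_params)), weight_nums, bias_nums)
print(weight_nums, bias_nums, num_gen_params)
```

Under these assumptions, the total number of generated parameters is the channel count of each kernel_preds tensor.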

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], kernel_preds: List[torch.Tensor], mask_feats: torch.Tensor, score_factors: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData] [source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, cls_scores are usually multiplied by it to obtain the final score used in NMS.

  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • kernel_preds (list[Tensor]) – Kernel predictions of dynamic convs for all scale levels, each is a 4D-tensor, has shape (batch_size, num_params, H, W).

  • mask_feats (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, num_prototypes, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.


Returns the object detection and instance segmentation results of each image after post-processing. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

  • masks (Tensor): Has a shape (num_instances, h, w).



class mmyolo.models.dense_heads.RTMDetInsSepBNHeadModule(num_classes: int, *args, num_prototypes: int = 8, dyconv_channels: int = 8, num_dyconvs: int = 3, use_sigmoid_cls: bool = True, **kwargs) [source]

Detection and Instance Segmentation Head of RTMDet.

  • num_classes (int) – Number of categories excluding the background category.

  • num_prototypes (int) – Number of mask prototype features extracted from the mask head. Defaults to 8.

  • dyconv_channels (int) – Channel of the dynamic conv layers. Defaults to 8.

  • num_dyconvs (int) – Number of the dynamic convolution layers. Defaults to 3.

  • use_sigmoid_cls (bool) – Use sigmoid for class prediction. Defaults to True.

forward(feats: Tuple[torch.Tensor, ...]) → tuple [source]

Forward features from the upstream network.

  • feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns, usually, a tuple of classification scores, bbox predictions, kernel predictions and mask features:

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • kernel_preds (list[Tensor]): Dynamic conv kernels for all scale levels, each is a 4D-tensor, the channels number is num_gen_params.

  • mask_feat (Tensor): Mask prototype features. Has shape (batch_size, num_prototypes, H, W).




Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetRotatedHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistanceAnglePointCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'mode': 'linear', 'type': 'mmrotate.RotatedIoULoss'}, angle_version: str = 'le90', use_hbbox_loss: bool = False, angle_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmrotate.PseudoAngleCoder'}, loss_angle: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

RTMDet-R head.

Compared with RTMDetHead, RTMDetRotatedHead adds several arguments to support rotated object detection.

  • angle_version is used to limit angle_range during training.

  • angle_coder is used to encode and decode the angle, similar to bbox_coder.

  • use_hbbox_loss and loss_angle allow custom regression loss calculation for rotated boxes.

    There are three combination options for regression:

    1. use_hbbox_loss=False and loss_angle is None.

    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
    2. use_hbbox_loss=False and loss_angle is specified. An angle loss is added on angle_pred.

    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
    3. use_hbbox_loss=True and loss_angle is specified. In this case loss_angle must be set.

  • test_cfg has a decoded_with_angle flag, which plays a similar role to the options used in the training process.

    When decoded_with_angle=True:

    angle_pred          decode──(xywha)──►rbox_pred
        │                 ▲

    When decoded_with_angle=False:

                          │ (xyxy)
  • head_module (ConfigType) – Base module used for RTMDetRotatedHead.

  • prior_generator – Point generator for feature maps in 2D point-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • angle_version (str) – Angle representations. Defaults to 'le90'.

  • use_hbbox_loss (bool) – If True, use horizontal bbox loss; loss_angle should not be None. Defaults to False.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • loss_angle (ConfigDict or dict, optional) – Config of angle loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
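
The three regression combinations listed for use_hbbox_loss and loss_angle can be summarized with a small check. This is a sketch only: regression_mode is a hypothetical helper, and the loss type 'SomeAngleLoss' is a placeholder name rather than a real registered loss.

```python
from typing import Optional


def regression_mode(use_hbbox_loss: bool, loss_angle: Optional[dict]) -> str:
    """Name the regression setup selected by the two arguments.

    Hypothetical helper that mirrors the three documented combinations.
    """
    if use_hbbox_loss:
        if loss_angle is None:
            # The documentation requires loss_angle in this case.
            raise ValueError('loss_angle must be set when use_hbbox_loss=True')
        return 'horizontal bbox loss + angle loss'
    if loss_angle is None:
        return 'rotated bbox loss only'
    return 'rotated bbox loss + angle loss'


# 'SomeAngleLoss' is a placeholder, not a real loss type.
angle_loss_cfg = dict(type='SomeAngleLoss', loss_weight=1.0)

print(regression_mode(False, None))
print(regression_mode(False, angle_loss_cfg))
print(regression_mode(True, angle_loss_cfg))
```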

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict [source]

Compute losses of the head.

  • cls_scores (list[Tensor]) – Box scores for each scale level, with shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • angle_preds (list[Tensor]) – Angle prediction for each scale level with shape (N, num_anchors * angle_out_dim, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


Returns a dictionary of loss components (dict[str, Tensor]).

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData] [source]

Transform a batch of output features extracted by the head into bbox results.

  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W)

  • objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.


Returns the object detection results of each image after post-processing. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 is arranged as (x, y, w, h, angle).
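
For readers who need polygon corners from the (x, y, w, h, angle) layout, the following sketch converts one box. It assumes that (x, y) is the box centre and that the angle is in radians and applied as a standard 2D rotation about the centre; the exact angle convention depends on angle_version and is not specified here. rbox_to_corners is a hypothetical helper.

```python
import math
from typing import List, Tuple


def rbox_to_corners(x: float, y: float, w: float, h: float,
                    angle: float) -> List[Tuple[float, float]]:
    """Convert (x, y, w, h, angle) to four corner points.

    Assumes (x, y) is the box centre and `angle` is in radians, applied as a
    standard 2D rotation of the axis-aligned box about its centre.
    """
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    corners = []
    # Corner offsets of the unrotated box, relative to the centre.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((x + dx * cos_a - dy * sin_a,
                        y + dx * sin_a + dy * cos_a))
    return corners


print(rbox_to_corners(10.0, 10.0, 4.0, 2.0, 0.0))
print(rbox_to_corners(10.0, 10.0, 4.0, 2.0, math.pi / 2))
```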



class mmyolo.models.dense_heads.RTMDetRotatedSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, angle_out_dim: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Detection Head Module of RTMDet-R.

Compared with RTMDet Detection Head Module, RTMDet-R adds a conv for angle prediction. An angle_out_dim arg is added, which is generated by the angle_coder module and controls the angle pred dim.

  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.

  • feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • share_conv (bool) – Whether to share conv layers between stages. Defaults to True.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.

  • angle_out_dim (int) – Encoded length of the angle, passed in by the head. Defaults to 1.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...]) → tuple [source]

Forward features from the upstream network.

  • feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns, usually, a tuple of classification scores, bbox predictions and angle predictions:

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_out_dim.
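
The channel counts stated for the three outputs can be written down directly. The sketch below uses a hypothetical helper, pred_channels, and an illustrative num_classes of 15.

```python
from typing import Dict


def pred_channels(num_classes: int, num_base_priors: int = 1,
                  angle_out_dim: int = 1) -> Dict[str, int]:
    """Channel count of each prediction map, following the documented rules."""
    return {
        'cls_scores': num_base_priors * num_classes,
        'bbox_preds': num_base_priors * 4,
        'angle_preds': num_base_priors * angle_out_dim,
    }


# num_classes=15 is an illustrative value, not a documented default.
print(pred_channels(num_classes=15))
```

An angle coder that encodes the angle into several values would raise angle_out_dim, and with it the channel count of angle_preds.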




Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Detection Head of RTMDet.

  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.

  • feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • share_conv (bool) – Whether to share conv layers between stages. Defaults to True.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...]) → tuple [source]

Forward features from the upstream network.

  • feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns, usually, a tuple of classification scores and bbox predictions:

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.




Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOXHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOXBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-16, 'loss_weight': 5.0, 'mode': 'square', 'reduction': 'sum', 'type': 'mmdet.IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox_aux: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

YOLOXHead head used in YOLOX.

  • head_module (ConfigType) – Base module used for YOLOXHead.

  • prior_generator – Point generator for feature maps in 2D point-based detectors.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_obj (ConfigDict or dict) – Config of objectness loss.

  • loss_bbox_aux (ConfigDict or dict) – Config of bbox aux loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor]) → Tuple[List] [source]

Forward features from the upstream network.

  • x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns a tuple of multi-level classification scores, bbox predictions, and objectnesses.



static gt_instances_preprocess(batch_gt_instances: torch.Tensor, batch_size: int) → List[mmengine.structures.instance_data.InstanceData] [source]

Split batch_gt_instances with batch size.

  • batch_gt_instances (Tensor) – Ground truth as a 2D-Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns the batch gt instances data, shape [batch_size, InstanceData].
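
The splitting can be illustrated with plain lists standing in for the tensor. The row layout (img_idx, label, x1, y1, x2, y2) is an assumption made for this sketch, since the reference gives only the shape [all_gt_bboxes, 6]; split_gt_by_image is a hypothetical helper, and dictionaries stand in for InstanceData.

```python
from typing import Dict, List, Sequence


def split_gt_by_image(batch_gt: Sequence[Sequence[float]],
                      batch_size: int) -> List[Dict[str, list]]:
    """Group rows of an [all_gt_bboxes, 6] array by image index.

    Assumed row layout: (img_idx, label, x1, y1, x2, y2).
    """
    per_image = [{'labels': [], 'bboxes': []} for _ in range(batch_size)]
    for row in batch_gt:
        img_idx = int(row[0])
        per_image[img_idx]['labels'].append(int(row[1]))
        per_image[img_idx]['bboxes'].append(list(row[2:6]))
    return per_image


batch_gt = [
    [0, 3, 10.0, 20.0, 50.0, 60.0],
    [1, 7, 5.0, 5.0, 15.0, 25.0],
    [0, 1, 30.0, 40.0, 80.0, 90.0],
]
instances = split_gt_by_image(batch_gt, batch_size=3)
print(instances)
```

Images without any ground truth, like the third image in the usage above, keep an empty entry so that the result always has batch_size items.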



loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict [source]

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


Returns a dictionary of losses (dict[str, Tensor]).


YOLO-series algorithms inherit from YOLOv5Head, but each algorithm may need its own initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOXHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], use_depthwise: bool = False, dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

YOLOXHead head module used in YOLOX.

  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].

  • use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.

  • dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.

  • conv_bias (bool or str) – If specified as 'auto', it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Defaults to 'auto'.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
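
The rule documented for conv_bias can be expressed as a small function. This is a sketch of the stated behaviour only; resolve_conv_bias is a hypothetical helper and not part of mmyolo.

```python
from typing import Optional, Union


def resolve_conv_bias(conv_bias: Union[bool, str],
                      norm_cfg: Optional[dict]) -> bool:
    """Resolve the effective bias flag of a conv layer.

    With 'auto', bias is enabled only when no normalization layer follows.
    """
    if isinstance(conv_bias, bool):
        return conv_bias
    if conv_bias == 'auto':
        return norm_cfg is None
    raise ValueError(f"conv_bias must be a bool or 'auto', got {conv_bias!r}")


print(resolve_conv_bias('auto', dict(type='BN', momentum=0.03, eps=0.001)))
print(resolve_conv_bias('auto', None))
print(resolve_conv_bias(True, dict(type='BN')))
```

Disabling the bias in front of a normalization layer is the usual choice, because the normalization's own shift parameter makes the conv bias redundant.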

forward(x: Tuple[torch.Tensor]) → Tuple[List] [source]

Forward features from the upstream network.

  • x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns a tuple of multi-level classification scores, bbox predictions, and objectnesses.



forward_single(x: torch.Tensor, cls_convs: torch.nn.modules.module.Module, reg_convs: torch.nn.modules.module.Module, conv_cls: torch.nn.modules.module.Module, conv_reg: torch.nn.modules.module.Module, conv_obj: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor] [source]

Forward feature of a single scale level.


Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOXPoseHead(loss_pose: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, *args, **kwargs) [source]

YOLOXPoseHead head used in YOLO-Pose.

  • loss_pose (ConfigDict, optional) – Config of keypoint OKS loss. Defaults to None.

decode_pose(grids: torch.Tensor, offsets: torch.Tensor, strides: Union[torch.Tensor, int]) → torch.Tensor [source]

Decode regression offsets to keypoints.

  • grids (torch.Tensor) – The coordinates of the feature map grids.

  • offsets (torch.Tensor) – The predicted offset of each keypoint relative to its corresponding grid.

  • strides (torch.Tensor | int) – The stride of the feature map for each instance.


Returns the decoded keypoint coordinates.
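
A minimal sketch of the decoding, assuming the common grid-based rule in which each keypoint equals its grid coordinate plus the predicted offset scaled by the stride. The reference does not spell out the formula, so treat this rule as an assumption; decode_keypoints is a hypothetical helper working on plain tuples rather than tensors.

```python
from typing import List, Sequence, Tuple


def decode_keypoints(grid_xy: Tuple[float, float],
                     offsets: Sequence[Tuple[float, float]],
                     stride: float) -> List[Tuple[float, float]]:
    """Decode keypoints for one grid cell.

    Assumed rule: keypoint = grid + offset * stride, applied to each
    (dx, dy) offset pair.
    """
    gx, gy = grid_xy
    return [(gx + dx * stride, gy + dy * stride) for dx, dy in offsets]


# One grid cell at (16, 32) on the stride-8 level with two keypoint offsets.
print(decode_keypoints((16.0, 32.0), [(0.5, -1.0), (2.0, 0.0)], stride=8))
```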



static gt_instances_preprocess(batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], *args, **kwargs) → List[mmengine.structures.instance_data.InstanceData] [source]

Split batch_gt_instances with batch size.

  • batch_gt_instances (Tensor) – Ground truth as a 2D-Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns the batch gt instances data, shape [batch_size, InstanceData].



static gt_kps_instances_preprocess(batch_gt_instances: torch.Tensor, batch_gt_keypoints, batch_gt_keypoints_visible, batch_size: int) → List[mmengine.structures.instance_data.InstanceData] [source]

Split batch_gt_instances with batch size.

  • batch_gt_instances (Tensor) – Ground truth as a 2D-Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns the batch gt instances data, shape [batch_size, InstanceData].



loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict [source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.


Returns a dictionary of loss components.



loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], kpt_preds: Sequence[torch.Tensor], vis_preds: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_gt_keypoints: torch.Tensor, batch_gt_keypoints_visible: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict [source]

Calculate the loss based on the features extracted by the detection head.

In addition to the base class method, keypoint losses are also calculated in this method.

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, kpt_preds: Optional[List[torch.Tensor]] = None, vis_preds: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData] [source]

Transform a batch of output features extracted by the head into bbox and keypoint results.

In addition to the base class method, keypoint predictions are also calculated in this method.

class mmyolo.models.dense_heads.YOLOXPoseHeadModule(num_keypoints: int, *args, **kwargs) [source]

YOLOXPoseHeadModule serves as a head module for YOLOX-Pose.

In comparison to YOLOXHeadModule, this module introduces branches for keypoint prediction.

forward(x: Tuple[torch.Tensor]) → Tuple[List] [source]

Forward features from the upstream network.


Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOv5Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'base_sizes': [[(10, 13), (16, 30), (33, 23)], [(30, 61), (62, 45), (59, 119)], [(116, 90), (156, 198), (373, 326)]], 'strides': [8, 16, 32], 'type': 'mmdet.YOLOAnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOv5BBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xywh', 'eps': 1e-07, 'iou_mode': 'ciou', 'loss_weight': 0.05, 'reduction': 'mean', 'return_iou': True, 'type': 'IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, prior_match_thr: float = 4.0, near_neighbor_thr: float = 0.5, ignore_iof_thr: float = -1.0, obj_level_weights: List[float] = [4.0, 1.0, 0.4], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

YOLOv5Head head used in YOLOv5.

  • head_module (ConfigType) – Base module used for YOLOv5Head

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_obj (ConfigDict or dict) – Config of objectness loss.

  • prior_match_thr (float) – Defaults to 4.0.

  • near_neighbor_thr (float) – Defaults to 0.5.

  • ignore_iof_thr (float) – Defaults to -1.0.

  • obj_level_weights (List[float]) – Defaults to [4.0, 1.0, 0.4].

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
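As a rough sketch, the signature defaults can be written out as an MMEngine-style config dict. The `head_module` values (`num_classes=80`, `in_channels=[256, 512, 1024]`) are illustrative assumptions rather than defaults documented here; the remaining values mirror the signature above.

```python
# Illustrative bbox_head config for YOLOv5Head.
# Everything except the head_module values mirrors the signature defaults.
bbox_head = dict(
    type='YOLOv5Head',
    head_module=dict(
        type='YOLOv5HeadModule',
        num_classes=80,                # assumption: COCO-style class count
        in_channels=[256, 512, 1024],  # assumption: channels coming from the neck
        widen_factor=1.0,
        num_base_priors=3,
        featmap_strides=[8, 16, 32]),
    prior_generator=dict(
        type='mmdet.YOLOAnchorGenerator',
        base_sizes=[[(10, 13), (16, 30), (33, 23)],
                    [(30, 61), (62, 45), (59, 119)],
                    [(116, 90), (156, 198), (373, 326)]],
        strides=[8, 16, 32]),
    bbox_coder=dict(type='YOLOv5BBoxCoder'),
    loss_cls=dict(
        type='mmdet.CrossEntropyLoss',
        use_sigmoid=True,
        reduction='mean',
        loss_weight=0.5),
    loss_bbox=dict(
        type='IoULoss',
        iou_mode='ciou',
        bbox_format='xywh',
        eps=1e-7,
        reduction='mean',
        loss_weight=0.05,
        return_iou=True),
    loss_obj=dict(
        type='mmdet.CrossEntropyLoss',
        use_sigmoid=True,
        reduction='mean',
        loss_weight=1.0),
    prior_match_thr=4.0,
    near_neighbor_thr=0.5,
    ignore_iof_thr=-1.0,
    obj_level_weights=[4.0, 1.0, 0.4])

# There is one set of base anchor sizes and one objectness weight per stride.
num_levels = len(bbox_head['prior_generator']['strides'])
```

Note that `base_sizes`, `strides` and `obj_level_weights` all have one entry per feature level, so they need to be changed together if the number of levels changes.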

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores, bbox predictions, and objectnesses.



loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.


A dictionary of loss components.



loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


A dictionary of losses, returned as dict[str, Tensor].
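The channel counts described above can be worked out with a small helper. This is only a sketch: `head_output_shapes` is a hypothetical function, and the square 640-pixel input, 80 classes and 3 priors per location are assumed example values.

```python
def head_output_shapes(batch_size, img_size, num_classes,
                       num_priors=3, strides=(8, 16, 32)):
    """Return the expected cls_scores / bbox_preds shapes per scale level.

    Assumes a square input whose side is divisible by every stride.
    """
    shapes = []
    for stride in strides:
        h = w = img_size // stride
        shapes.append(dict(
            # channel number is num_priors * num_classes
            cls_scores=(batch_size, num_priors * num_classes, h, w),
            # channel number is num_priors * 4
            bbox_preds=(batch_size, num_priors * 4, h, w)))
    return shapes


shapes = head_output_shapes(batch_size=2, img_size=640, num_classes=80)
for level, shape in enumerate(shapes):
    print(level, shape['cls_scores'], shape['bbox_preds'])
```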

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData]

Transform a batch of output features extracted by the head into bbox results.

  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration. If None, test_cfg is used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.


Object detection results of each image after the post process. Each item usually contains following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).




special_init()

Because the YOLO series algorithms inherit from YOLOv5Head but each algorithm has its own initialization process, the special_init function is provided to handle the algorithm-specific initialization.

class mmyolo.models.dense_heads.YOLOv5HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv5Head head module used in YOLOv5.

  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
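To illustrate how widen_factor and num_base_priors affect channel counts, here is a minimal sketch. The helper names are hypothetical, and two details are assumptions not stated on this page: scaled channels are rounded up to a multiple of 8, and each prior predicts 4 box values, 1 objectness score and num_classes class scores.

```python
import math


def make_divisible(channels, widen_factor=1.0, divisor=8):
    """Scale a channel count and round it up to a multiple of `divisor`.

    The rounding rule is an assumption made for this sketch.
    """
    return math.ceil(channels * widen_factor / divisor) * divisor


def scaled_in_channels(in_channels, widen_factor):
    """Apply the width multiplier to every input feature map."""
    return [make_divisible(c, widen_factor) for c in in_channels]


def pred_out_channels(num_classes, num_base_priors=3):
    """Output channels of one prediction conv (assumed layout: 4 + 1 + classes)."""
    return num_base_priors * (5 + num_classes)


half_width = scaled_in_channels([256, 512, 1024], widen_factor=0.5)
out_channels = pred_out_channels(num_classes=80)
print(half_width, out_channels)
```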

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores, bbox predictions, and objectnesses.



forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

Forward feature of a single scale level.


init_weights()

Initialize the bias of YOLOv5 head.

class mmyolo.models.dense_heads.YOLOv5InsHead(*args, mask_overlap: bool = True, loss_mask: Union[mmengine.config.config.ConfigDict, dict] = {'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_mask_weight=0.05, **kwargs)

YOLOv5 Instance Segmentation and Detection head.

  • mask_overlap (bool) – Defaults to True.

  • loss_mask (ConfigDict or dict) – Config of mask loss.

  • loss_mask_weight (float) – The weight of mask loss.

crop_mask(masks: torch.Tensor, boxes: torch.Tensor) → torch.Tensor

Crop mask by the bounding box.

  • masks (Tensor) – Predicted mask results. Has shape (1, num_instance, H, W).

  • boxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).


The masks cropped to their bounding boxes.
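The cropping idea can be sketched in plain Python: every mask value outside the instance's box is set to zero. This is a simplified illustration, not the method's actual tensor code. It drops the leading batch dimension, uses nested lists instead of tensors, and assumes boxes are (x1, y1, x2, y2) in mask pixel coordinates with exclusive upper bounds.

```python
def crop_mask(masks, boxes):
    """Zero out mask values that fall outside each instance's box.

    masks: list of num_instance masks, each an H x W nested list.
    boxes: list of (x1, y1, x2, y2), one box per mask.
    """
    cropped = []
    for mask, (x1, y1, x2, y2) in zip(masks, boxes):
        cropped.append([
            [value if (x1 <= col < x2 and y1 <= row < y2) else 0.0
             for col, value in enumerate(mask_row)]
            for row, mask_row in enumerate(mask)
        ])
    return cropped


masks = [[[1.0] * 4 for _ in range(3)]]  # one 3 x 4 mask filled with ones
boxes = [(1, 0, 3, 2)]                   # keep columns 1-2 of rows 0-1
result = crop_mask(masks, boxes)
for row in result[0]:
    print(row)
```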



loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.


A dictionary of loss components.



loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], coeff_preds: Sequence[torch.Tensor], proto_preds: torch.Tensor, batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_gt_masks: Sequence[torch.Tensor], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • coeff_preds (Sequence[Tensor]) – Mask coefficient for each scale level, each is a 4D-tensor, the channel number is num_priors * mask_channels.

  • proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).

  • batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_gt_masks (Sequence[Tensor]) – Batch of gt_mask.

  • batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


A dictionary of losses, returned as dict[str, Tensor].

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, coeff_preds: Optional[List[torch.Tensor]] = None, proto_preds: Optional[torch.Tensor] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData]

Transform a batch of output features extracted from the head into bbox results.

Note: When score factors (objectnesses) are provided, the cls_scores are usually multiplied by them to obtain the final scores used in NMS.

  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • coeff_preds (list[Tensor]) – Mask coefficients predictions for all scale levels, each is a 4D-tensor, has shape (batch_size, mask_channels, H, W).

  • proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration. If None, test_cfg is used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.


Object detection and instance segmentation results of each image after the post process. Each item usually contains following keys.

  • scores (Tensor): Classification scores, has a shape (num_instances, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

  • masks (Tensor): Has a shape (num_instances, h, w).
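The note about score factors can be illustrated with a small sketch: each class score is multiplied by the prior's objectness, and only candidates above a threshold are passed on to NMS. The function `fuse_scores` and its `score_thr` parameter are hypothetical names for this example, and plain lists of probabilities stand in for tensors.

```python
def fuse_scores(cls_scores, objectness, score_thr=0.001):
    """Multiply class scores by objectness and keep confident candidates.

    cls_scores: one list of class probabilities per prior.
    objectness: one objectness probability per prior.
    Returns a list of (prior_index, label, score) tuples.
    """
    kept = []
    for idx, (scores, obj) in enumerate(zip(cls_scores, objectness)):
        for label, cls_score in enumerate(scores):
            score = cls_score * obj  # final score used for ranking / NMS
            if score > score_thr:
                kept.append((idx, label, score))
    return kept


cls_scores = [[0.9, 0.1], [0.5, 0.5]]
objectness = [0.8, 0.0]
kept = fuse_scores(cls_scores, objectness, score_thr=0.1)
print(kept)
```

In this example only class 0 of the first prior survives: the second class falls below the threshold after multiplication, and the second prior has zero objectness.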



process_mask(mask_proto: torch.Tensor, mask_coeff_pred: torch.Tensor, bboxes: torch.Tensor, shape: Tuple[int, int], upsample: bool = False) → torch.Tensor

Generate mask logits results.

  • mask_proto (Tensor) – Mask prototype features. Has shape (mask_channels, H, W).

  • mask_coeff_pred (Tensor) – Mask coefficients prediction for single image. Has shape (num_instance, mask_channels).

  • bboxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).

  • shape (Tuple) – Batch input shape of image.

  • upsample (bool) – Whether to upsample the mask results to the batch input shape. Defaults to False.


Instance segmentation masks for each instance. Has shape (num_instance, H, W).
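A prototype-based mask head typically produces each instance's mask logits as a linear combination of the prototype maps, weighted by that instance's coefficients. The sketch below shows only that step, with nested lists instead of tensors. It assumes prototypes of shape (mask_channels, H, W) and coefficients of shape (num_instance, mask_channels); `mask_logits` is a hypothetical helper, and box cropping and upsampling are left out.

```python
def mask_logits(mask_proto, mask_coeffs):
    """Combine prototype maps into per-instance mask logits.

    mask_proto: C prototype maps, each an H x W nested list.
    mask_coeffs: N rows of C coefficients, one row per instance.
    Returns N masks, each an H x W nested list of logits.
    """
    height = len(mask_proto[0])
    width = len(mask_proto[0][0])
    return [
        [[sum(coeff * proto[r][c] for coeff, proto in zip(coeffs, mask_proto))
          for c in range(width)]
         for r in range(height)]
        for coeffs in mask_coeffs
    ]


# Two 2 x 2 prototypes and two instances.
protos = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 2.0], [2.0, 0.0]]]
coeffs = [[1.0, 0.5],
          [-1.0, 0.0]]
logits = mask_logits(protos, coeffs)
print(logits)
```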



class mmyolo.models.dense_heads.YOLOv5InsHeadModule(*args, num_classes: int, mask_channels: int = 32, proto_channels: int = 256, widen_factor: float = 1.0, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)

Detection and Instance Segmentation Head of YOLOv5.

  • num_classes (int) – Number of categories excluding the background category.

  • mask_channels (int) – Number of channels in the mask feature map. This is the channel count of the mask.

  • proto_channels (int) – Number of channels in the proto feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores, bbox predictions, objectnesses, and mask predictions.



forward_single(x: torch.Tensor, convs_pred: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]

Forward feature of a single scale level.

class mmyolo.models.dense_heads.YOLOv6Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv6Head head used in YOLOv6.

  • head_module (ConfigType) – Base module used for YOLOv6Head

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


A dictionary of losses, returned as dict[str, Tensor].


special_init()

Because the YOLO series algorithms inherit from YOLOv5Head but each algorithm has its own initialization process, the special_init function is provided to handle the algorithm-specific initialization.

class mmyolo.models.dense_heads.YOLOv6HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, reg_max=0, featmap_strides: Sequence[int] = (8, 16, 32), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv6Head head module used in YOLOv6.


  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores and bbox predictions.



forward_single(x: torch.Tensor, stem: torch.nn.modules.module.Module, cls_conv: torch.nn.modules.module.Module, cls_pred: torch.nn.modules.module.Module, reg_conv: torch.nn.modules.module.Module, reg_pred: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor]

Forward feature of a single scale level.


init_weights()

Initialize the weights.

class mmyolo.models.dense_heads.YOLOv7Head(*args, simota_candidate_topk: int = 20, simota_iou_weight: float = 3.0, simota_cls_weight: float = 1.0, aux_loss_weights: float = 0.25, **kwargs)

YOLOv7Head head used in YOLOv7.

  • simota_candidate_topk (int) – The candidate top-k used to select the top-k IoUs for calculating dynamic-k in BatchYOLOv7Assigner. Defaults to 20.

  • simota_iou_weight (float) – The scale factor for regression iou cost in BatchYOLOv7Assigner. Defaults to 3.0.

  • simota_cls_weight (float) – The scale factor for classification cost in BatchYOLOv7Assigner. Defaults to 1.0.

  • aux_loss_weights (float) – Defaults to 0.25.
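The three SimOTA parameters can be illustrated with a sketch of the usual SimOTA recipe: the matching cost is a weighted sum of classification and IoU costs, and dynamic-k is the sum of the top-k IoUs for a ground-truth box, truncated to an integer and kept at 1 or more. This is a generic illustration with hypothetical helper names, not the code of BatchYOLOv7Assigner.

```python
def dynamic_k(ious, candidate_topk=20):
    """Number of priors to assign to one ground-truth box.

    ious: IoUs between one ground-truth box and its candidate priors.
    """
    topk = sorted(ious, reverse=True)[:candidate_topk]
    return max(int(sum(topk)), 1)  # at least one positive per ground truth


def matching_cost(cls_cost, iou_cost, cls_weight=1.0, iou_weight=3.0):
    """Weighted sum of classification and regression (IoU) costs."""
    return cls_weight * cls_cost + iou_weight * iou_cost


k_good = dynamic_k([0.9, 0.8, 0.7, 0.1])  # well-covered box gets several priors
k_poor = dynamic_k([0.2, 0.1])            # poorly covered box still gets one
cost = matching_cost(cls_cost=0.5, iou_cost=0.2)
print(k_good, k_poor, cost)
```

A larger simota_candidate_topk lets more candidate IoUs contribute to the sum, so well-covered objects can receive more positive priors.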

loss_by_feat(cls_scores: Sequence[Union[torch.Tensor, List]], bbox_preds: Sequence[Union[torch.Tensor, List]], objectnesses: Sequence[Union[torch.Tensor, List]], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


A dictionary of losses, returned as dict[str, Tensor].

class mmyolo.models.dense_heads.YOLOv7HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv7Head head module used in YOLOv7.


init_weights()

Initialize the bias of YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv7p6HeadModule(*args, main_out_channels: Sequence[int] = [256, 512, 768, 1024], aux_out_channels: Sequence[int] = [320, 640, 960, 1280], use_aux: bool = True, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)

YOLOv7Head head module used in YOLOv7.

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores, bbox predictions, and objectnesses.



forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module, aux_convs: Optional[torch.nn.modules.module.Module]) → Tuple[Union[torch.Tensor, List], Union[torch.Tensor, List], Union[torch.Tensor, List]]

Forward feature of a single scale level.


init_weights()

Initialize the bias of the YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv8Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'ciou', 'loss_weight': 7.5, 'reduction': 'sum', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl={'loss_weight': 0.375, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv8Head head used in YOLOv8.

  • head_module (ConfigDict or dict) – Base module used for YOLOv8Head

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_dfl (ConfigDict or dict) – Config of Distribution Focal Loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict

Calculate the loss based on the features extracted by the detection head.

  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.


A dictionary of losses, returned as dict[str, Tensor].


special_init()

Because the YOLO series algorithms inherit from YOLOv5Head but each algorithm has its own initialization process, the special_init function is provided to handle the algorithm-specific initialization.

class mmyolo.models.dense_heads.YOLOv8HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

YOLOv8HeadModule head module used in YOLOv8.

  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].

  • reg_max (int) – Max value of the integral set {0, ..., reg_max-1} in QFL setting. Defaults to 16.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
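The role of reg_max can be sketched as follows: with a distribution-based box regression, each box side is predicted as logits over the integral set {0, ..., reg_max-1}, and the decoded distance is the expectation of the softmax distribution over those bins. The helper `expected_distance` is a hypothetical name, and the result is in bin units; any scaling by the feature map stride is assumed to happen elsewhere.

```python
import math


def expected_distance(logits):
    """Decode one box side from logits over the bins {0, ..., len(logits) - 1}.

    Applies a numerically stable softmax, then takes the expected bin index.
    """
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return sum(index * e / total for index, e in enumerate(exps))


reg_max = 16
uniform = expected_distance([0.0] * reg_max)  # no preference: middle of the range
peaked_logits = [0.0] * reg_max
peaked_logits[3] = 100.0                      # nearly all mass on bin 3
peaked = expected_distance(peaked_logits)
print(uniform, peaked)
```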

forward(x: Tuple[torch.Tensor]) → Tuple[List]

Forward features from the upstream network.


x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.


A tuple of multi-level classification scores and bbox predictions.



forward_single(x: torch.Tensor, cls_pred: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList) → Tuple

Forward feature of a single scale level.


init_weights()

Initialize the weights and biases of the YOLOv8 head.


class mmyolo.models.detectors.YOLODetector(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Union[mmengine.config.config.ConfigDict, dict], bbox_head: Union[mmengine.config.config.ConfigDict, dict], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_syncbn: bool = True)

Implementation of YOLO Series

  • backbone (ConfigDict or dict) – The backbone config.

  • neck (ConfigDict or dict) – The neck config.

  • bbox_head (ConfigDict or dict) – The bbox head config.

  • train_cfg (ConfigDict or dict, optional) – The training config of YOLO. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – The testing config of YOLO. Defaults to None.

  • data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

  • use_syncbn (bool) – Whether to use SyncBatchNorm. Defaults to True.


class mmyolo.models.layers.BepC3StageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, hidden_ratio: float = 0.5, concat_all_layer: bool = True, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})

Beer-mug RepC3 Block.

  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • num_blocks (int) – Number of blocks. Defaults to 1

  • hidden_ratio (float) – Hidden channel expansion. Defaults to 0.5.

  • concat_all_layer (bool) – Whether to concatenate all layers in the forward computation. Defaults to True.

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').

  • norm_cfg (ConfigType) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigType) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).


Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.BiFusion(in_channels0: int, in_channels1: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})

BiFusion Block in YOLOv6.

BiFusion fuses current-, high- and low-level features. Compared with concatenation in PAN, it fuses an extra low-level feature.

  • in_channels0 (int) – The channels of current-level feature.

  • in_channels1 (int) – The input channels of lower-level feature.

  • out_channels (int) – The out channels of the BiFusion module.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: List[torch.Tensor]) → torch.Tensor

Forward process.

  • x (List[torch.Tensor]) – The tensor list of length 3, where x[0] is the high-level feature, x[1] is the current-level feature, and x[2] is the low-level feature.

class mmyolo.models.layers.CSPLayerWithTwoConv(in_channels: int, out_channels: int, expand_ratio: float = 0.5, num_blocks: int = 1, add_identity: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)

Cross Stage Partial Layer with 2 convolutions.

  • in_channels (int) – The input channels of the CSP layer.

  • out_channels (int) – The output channels of the CSP layer.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • num_blocks (int) – Number of blocks. Defaults to 1.

  • add_identity (bool) – Whether to add identity in blocks. Defaults to True.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[源代码]

Forward process.

class mmyolo.models.layers.DarknetBottleneck(in_channels: int, out_channels: int, expansion: float = 0.5, kernel_size: Sequence[int] = (1, 3), padding: Sequence[int] = (0, 1), add_identity: bool = True, use_depthwise: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

The basic bottleneck block used in Darknet.

Each ResBlock consists of two ConvModules, and the input is added to the final output. Each ConvModule is composed of Conv, BN, and an activation layer (SiLU by default). The first conv layer has a filter size of k1×k1 and the second has a filter size of k2×k2.

Note: This DarknetBottleneck differs slightly from MMDet's: the kernel size and padding can be changed for each conv.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • expansion (float) – The channel expansion ratio used to compute the hidden channels. Defaults to 0.5.

  • kernel_size (Sequence[int]) – The kernel size of the convolution. Defaults to (1, 3).

  • padding (Sequence[int]) – The padding size of the convolution. Defaults to (0, 1).

  • add_identity (bool) – Whether to add identity to the output. Defaults to True.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

class mmyolo.models.layers.EELANBlock(num_elan_block: int, **kwargs)[源代码]

Extended efficient layer aggregation networks (E-ELAN) for YOLOv7.

  • num_elan_block (int) – The number of ELANBlock.

forward(x: torch.Tensor) → torch.Tensor [source]

Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.ELANBlock(in_channels: int, out_channels: int, middle_ratio: float, block_ratio: float, num_blocks: int = 2, num_convs_in_block: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Efficient layer aggregation networks for YOLOv7.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels.

  • block_ratio (float) – The scaling ratio of the block layer based on the in_channels.

  • num_blocks (int) – The number of blocks in the main branch. Defaults to 2.

  • num_convs_in_block (int) – The number of convs per block. Defaults to 1.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.EffectiveSELayer(channels: int, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'HSigmoid'})[源代码]

Effective Squeeze-Excitation.

From “CenterMask: Real-Time Anchor-Free Instance Segmentation” (arXiv).

  • channels (int) – The input and output channels of this Module.

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’HSigmoid’).

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[源代码]

Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLO.

  • model (nn.Module) – The model to be averaged.

  • momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula averaged_param = (1 - momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.

  • gamma (int) – Controls how quickly the momentum decays: a larger momentum is used early in training and is gradually annealed to a smaller value so that the EMA model is updated smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.

  • interval (int) – Interval between two updates. Defaults to 1.

  • device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.

  • update_buffers (bool) – If True, running averages are computed for both the parameters and the buffers of the model. Defaults to False.

avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int)[源代码]

Compute the moving average of the parameters using the exponential momentum strategy.

  • averaged_param (Tensor) – The averaged parameters.

  • source_param (Tensor) – The source parameters.

  • steps (int) – The number of times the parameters have been updated.
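The momentum schedule and the update formula documented above can be combined into a small, framework-free sketch. The helper names `ema_momentum` and `ema_update` are hypothetical, and plain Python lists stand in for tensors; this only illustrates the documented arithmetic and is not the class's actual implementation.

```python
import math


def ema_momentum(steps, momentum=0.0002, gamma=2000):
    """Momentum after `steps` updates, following the documented schedule."""
    return (1 - momentum) * math.exp(-(1 + steps) / gamma) + momentum


def ema_update(averaged, source, steps, momentum=0.0002, gamma=2000):
    """Blend source values into the running average (lists stand in for tensors)."""
    m = ema_momentum(steps, momentum, gamma)
    return [(1 - m) * a + m * s for a, s in zip(averaged, source)]


# Early in training the momentum is close to 1, so the average tracks the source.
early = ema_update([0.0, 0.0], [1.0, 2.0], steps=0)
# Much later the momentum has decayed towards its floor of 0.0002.
late = ema_update([0.0, 0.0], [1.0, 2.0], steps=100_000)
```

With the default values, the averaged model follows the source model closely during the first few thousand steps and then changes only slowly.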

update_parameters(model: torch.nn.modules.module.Module) [source]

Update the parameters after each training step.

  • model (nn.Module) – The source model whose parameters are averaged into the EMA model.

class mmyolo.models.layers.ImplicitA(in_channels: int, mean: float = 0.0, std: float = 0.02)[源代码]

Implicit add layer in YOLOv7.

  • in_channels (int) – The input channels of this Module.

  • mean (float) – Mean value of implicit module. Defaults to 0.

  • std (float) – Std value of implicit module. Defaults to 0.02.

Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.ImplicitM(in_channels: int, mean: float = 1.0, std: float = 0.02)[源代码]

Implicit multiplier layer in YOLOv7.

  • in_channels (int) – The input channels of this Module.

  • mean (float) – Mean value of implicit module. Defaults to 1.

  • std (float) – Std value of implicit module. Defaults to 0.02.


Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.MaxPoolAndStrideConvBlock(in_channels: int, out_channels: int, maxpool_kernel_sizes: int = 2, use_in_channels_of_middle: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Max pooling and stride conv layer for YOLOv7.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • maxpool_kernel_sizes (int) – kernel sizes of pooling layers. Defaults to 2.

  • use_in_channels_of_middle (bool) – Whether to calculate middle channels based on in_channels. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.PPYOLOEBasicBlock(in_channels: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, shortcut: bool = True, use_alpha: bool = False)[源代码]

PPYOLOE Backbone BasicBlock.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.1, eps=1e-5).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • shortcut (bool) – Whether to add inputs and outputs together at the end of this layer. Defaults to True.

  • use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. Defaults to False.

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

Returns the output tensor.

class mmyolo.models.layers.RepStageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, bottle_block: torch.nn.modules.module.Module = <class 'mmyolo.models.layers.yolo_bricks.RepVGGBlock'>, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'})[源代码]

RepStageBlock is a stage block with rep-style basic block.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • num_blocks (int, tuple[int]) – Number of blocks. Defaults to 1.

  • bottle_block (nn.Module) – Basic unit of RepStage. Defaults to RepVGGBlock.

  • block_cfg (ConfigType) – Config of RepStage. Defaults to dict(type='RepVGGBlock').

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

Returns the output tensor.

class mmyolo.models.layers.RepVGGBlock(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]] = 3, stride: Union[int, Tuple[int]] = 1, padding: Union[int, Tuple[int]] = 1, dilation: Union[int, Tuple[int]] = 1, groups: Optional[int] = 1, padding_mode: Optional[str] = 'zeros', norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, use_se: bool = False, use_alpha: bool = False, use_bn_first=True, deploy: bool = False)[源代码]

RepVGGBlock is a basic rep-style block with both training and deploy states.

  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple) – Stride of the convolution. Default: 1

  • padding (int, tuple) – Padding added to all four sides of the input. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • padding_mode (string, optional) – Default: ‘zeros’

  • use_se (bool) – Whether to use se. Default: False

  • use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. In PPYOLOE+ model backbone, use_alpha will be set to True. Default: False.

  • use_bn_first (bool) – Whether to use bn layer before conv. In YOLOv6 and YOLOv7, this will be set to True. In PPYOLOE, this will be set to False. Default: True.

  • deploy (bool) – Whether in deploy mode. Default: False

forward(inputs: torch.Tensor) → torch.Tensor [source]

Forward process.

  • inputs (Tensor) – The input tensor.

Returns the output tensor.

Derives the equivalent kernel and bias in a differentiable way. Returns the equivalent kernel and bias.

Switch to deploy mode.
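The idea of an equivalent kernel and bias can be illustrated with a single-channel toy example. It assumes the usual RepVGG layout (a 3x3 branch, a 1x1 branch and an identity branch, each followed by inference-mode BN); all helper names and numeric values are hypothetical, and nested Python lists stand in for tensors. This is a sketch of the re-parameterization idea, not the block's actual code.

```python
import math


def bn(value, gamma, beta, mean, var, eps=1e-3):
    """Inference-mode batch norm applied to one scalar activation."""
    return (value - mean) / math.sqrt(var + eps) * gamma + beta


def fuse_conv_bn(kernel, gamma, beta, mean, var, eps=1e-3):
    """Fold BN statistics into a 3x3 kernel; returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [[w * scale for w in row] for row in kernel], beta - mean * scale


def conv3x3(image, kernel, bias=0.0):
    """3x3 cross-correlation, stride 1, zero padding 1, on a 2-D list."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(3):
                for dj in range(3):
                    y, x = i + di - 1, j + dj - 1
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[di][dj] * image[y][x]
    return out


def center_kernel(weight):
    """Embed a 1x1 weight (or the identity, weight=1) at the center of a 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, weight, 0.0], [0.0, 0.0, 0.0]]


image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.5, -0.5]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
w1 = 0.7
# Hypothetical BN statistics (gamma, beta, mean, var) for each branch.
bn3, bn1, bn_id = (1.2, 0.1, 0.3, 0.9), (0.8, -0.2, 0.1, 1.1), (1.0, 0.05, 0.2, 0.7)

# Training-style forward: three branches, each followed by its own BN, then summed.
conv_out = conv3x3(image, k3)
multi_branch = [
    [bn(conv_out[i][j], *bn3) + bn(w1 * image[i][j], *bn1) + bn(image[i][j], *bn_id)
     for j in range(3)]
    for i in range(3)
]

# Deploy-style forward: one 3x3 conv whose kernel and bias sum the fused branches.
fused = [fuse_conv_bn(k3, *bn3), fuse_conv_bn(center_kernel(w1), *bn1),
         fuse_conv_bn(center_kernel(1.0), *bn_id)]
kernel = [[sum(f[0][i][j] for f in fused) for j in range(3)] for i in range(3)]
bias = sum(f[1] for f in fused)
single_branch = conv3x3(image, kernel, bias)

max_diff = max(abs(a - b)
               for row_a, row_b in zip(multi_branch, single_branch)
               for a, b in zip(row_a, row_b))
```

Both forward paths produce the same output up to floating-point error, which is why a block of this kind can be collapsed into a single convolution for deployment.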

class mmyolo.models.layers.SPPFBottleneck(in_channels: int, out_channels: int, kernel_sizes: Union[int, Sequence[int]] = 5, use_conv_first: bool = True, mid_channels_scale: float = 0.5, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Spatial pyramid pooling - Fast (SPPF) layer for YOLOv5, YOLOX and PPYOLOE by Glenn Jocher

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • kernel_sizes (int, tuple[int]) – A single kernel size or a sequence of kernel sizes for the pooling layers. Defaults to 5.

  • use_conv_first (bool) – Whether to use conv before the pooling layer. This parameter is set to True in YOLOv5 and YOLOX, and to False in PPYOLOE. Defaults to True.

  • mid_channels_scale (float) – Channel multiplier; in_channels is multiplied by this amount to get mid_channels. This parameter is valid only when use_conv_first=True. Defaults to 0.5.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
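As background for the single-integer form of kernel_sizes, the following 1-D toy shows why chaining one small max-pooling layer can stand in for several larger parallel ones, which is the common reading of the "Fast" in SPPF. It assumes stride 1 and "same" padding; the helper `max_pool_1d` is hypothetical and this is not the layer's actual code.

```python
def max_pool_1d(values, kernel_size):
    """Stride-1 max pooling with 'same' padding; positions outside the input are ignored."""
    radius = kernel_size // 2
    return [max(values[max(0, i - radius):i + radius + 1]) for i in range(len(values))]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

# Apply the same kernel-5 pooling three times in a row.
once = max_pool_1d(signal, 5)
twice = max_pool_1d(once, 5)
thrice = max_pool_1d(twice, 5)

# Each extra pass widens the effective window by 4: 5 -> 9 -> 13.
same_as_9 = twice == max_pool_1d(signal, 9)
same_as_13 = thrice == max_pool_1d(signal, 13)
```

Under these assumptions both comparisons hold, so three chained kernel-5 poolings cover the same windows as parallel poolings with kernel sizes 5, 9 and 13.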

forward(x: torch.Tensor) → torch.Tensor [source]

Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.SPPFCSPBlock(in_channels: int, out_channels: int, expand_ratio: float = 0.5, kernel_sizes: Union[int, Sequence[int]] = 5, is_tiny_version: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Spatial pyramid pooling - Fast (SPPF) layer with CSP for YOLOv7

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • expand_ratio (float) – Expand ratio of SPPFCSPBlock. Defaults to 0.5.

  • kernel_sizes (int, tuple[int]) – A single kernel size or a sequence of kernel sizes for the pooling layers. Defaults to 5.

  • is_tiny_version (bool) – Whether this is the tiny version of SPPFCSPBlock, as used in the YOLOv7-tiny model. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.


Forward process.

  • x (Tensor) – The input tensor.

class mmyolo.models.layers.TinyDownSampleBlock(in_channels: int, out_channels: int, middle_ratio: float = 1.0, kernel_sizes: Union[int, Sequence[int]] = 3, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'negative_slope': 0.1, 'type': 'LeakyReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[源代码]

Down sample layer for YOLOv7-tiny.

  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels. Defaults to 1.0.

  • kernel_sizes (int, tuple[int]) – A single kernel size or a sequence of kernel sizes for the pooling layers. Defaults to 3.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='LeakyReLU', negative_slope=0.1).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.


Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.


class mmyolo.models.losses.IoULoss(iou_mode: str = 'ciou', bbox_format: str = 'xywh', eps: float = 1e-07, reduction: str = 'mean', loss_weight: float = 1.0, return_iou: bool = True)[源代码]


Computes the IoU loss between a set of predicted bboxes and target bboxes.

  • iou_mode (str) – Options are “ciou”. Defaults to “ciou”.

  • bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.

  • eps (float) – Eps to avoid log(0). Defaults to 1e-07.

  • reduction (str) – Options are “none”, “mean” and “sum”. Defaults to “mean”.

  • loss_weight (float) – Weight of loss. Defaults to 1.0.

  • return_iou (bool) – If True, return loss and iou. Defaults to True.

forward(pred: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, avg_factor: Optional[float] = None, reduction_override: Optional[Union[str, bool]] = None) → Tuple[torch.Tensor, torch.Tensor] [source]

Forward function.

  • pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).

  • target (Tensor) – Corresponding gt bboxes, shape (n, 4).

  • weight (Tensor, optional) – Element-wise weights.

  • avg_factor (float, optional) – Average factor when computing the mean of losses.

  • reduction_override (str, bool, optional) – Same as built-in losses of PyTorch. Defaults to None.

Returns the loss, or a tuple (loss, iou) if return_iou is True.

class mmyolo.models.losses.OksLoss(metainfo: Optional[str] = None, loss_weight: float = 1.0)[源代码]

A PyTorch implementation of the Object Keypoint Similarity (OKS) loss as described in the paper “YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss” by Debapriya et al. (2022).

The OKS loss is used for keypoint-based object recognition and consists of a measure of the similarity between predicted and ground truth keypoint locations, adjusted by the size of the object in the image. The loss function takes as input the predicted keypoint locations, the ground truth keypoint locations, a mask indicating which keypoints are valid, and bounding boxes for the objects.

  • metainfo (str, optional) – Path to a JSON file containing information about the dataset’s annotations. Defaults to None.

  • loss_weight (float) – Weight for the loss. Defaults to 1.0.

compute_oks(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None) → torch.Tensor [source]

Calculates the OKS loss.

  • output (Tensor) – Predicted keypoints in shape N x k x 2, where N is batch size, k is the number of keypoints, and 2 are the xy coordinates.

  • target (Tensor) – Ground truth keypoints in the same shape as output.

  • target_weights (Tensor) – Mask of valid keypoints in shape N x k, with 1 for valid and 0 for invalid.

  • bboxes (Optional[Tensor]) – Bounding boxes in shape N x 4, where 4 are the xyxy coordinates.

Returns the calculated OKS loss.
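For intuition, a COCO-style OKS loss for a single instance could be sketched as follows. The function name `oks_loss`, the per-keypoint `sigmas` argument and the exact normalization are assumptions made for illustration; the class's actual computation may differ in detail.

```python
import math


def oks_loss(pred, target, visible, bbox, sigmas):
    """1 - mean OKS over valid keypoints for one instance.

    pred, target: lists of (x, y); visible: list of 0/1; bbox: (x1, y1, x2, y2).
    sigmas: hypothetical per-keypoint falloff constants (COCO-style).
    """
    x1, y1, x2, y2 = bbox
    area = (x2 - x1) * (y2 - y1)  # object size used to normalize distances
    total, count = 0.0, 0
    for (px, py), (tx, ty), vis, sigma in zip(pred, target, visible, sigmas):
        if not vis:
            continue  # masked-out keypoints do not contribute
        dist_sq = (px - tx) ** 2 + (py - ty) ** 2
        k = 2 * sigma
        total += math.exp(-dist_sq / (2 * area * k ** 2))
        count += 1
    return 0.0 if count == 0 else 1.0 - total / count


target = [(10.0, 10.0), (20.0, 20.0)]
bbox = (0.0, 0.0, 40.0, 40.0)
sigmas = [0.05, 0.05]

perfect = oks_loss(target, target, [1, 1], bbox, sigmas)
shifted = oks_loss([(12.0, 10.0), (20.0, 23.0)], target, [1, 1], bbox, sigmas)
masked = oks_loss([(12.0, 10.0), (99.0, 99.0)], target, [1, 0], bbox, sigmas)
```

Because distances are divided by the object area, the same pixel error costs more on a small object than on a large one.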



forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None)torch.Tensor[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.


Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmyolo.models.losses.bbox_overlaps(pred: torch.Tensor, target: torch.Tensor, iou_mode: str = 'ciou', bbox_format: str = 'xywh', siou_theta: float = 4.0, eps: float = 1e-07)torch.Tensor[源代码]

Calculate the overlap between two sets of bboxes. Implementation of the paper “Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation”.

In the CIoU implementation of YOLOv5 and MMDetection, there is a slight difference in the way the alpha parameter is computed.

mmdet version:

alpha = (ious > 0.5).float() * v / (1 - ious + v)

YOLOv5 version:

alpha = v / (v - ious + (1 + eps))

  • pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).

  • target (Tensor) – Corresponding gt bboxes, shape (n, 4).

  • iou_mode (str) – Options are (‘iou’, ‘ciou’, ‘giou’, ‘siou’). Defaults to “ciou”.

  • bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.

  • siou_theta (float) – siou_theta for SIoU when calculate shape cost. Defaults to 4.0.

  • eps (float) – Eps to avoid log(0).


Returns the overlaps as a Tensor of shape (n, ).
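The two alpha variants quoted above can be compared on plain numbers. In this sketch v is taken to be the usual CIoU aspect-ratio consistency term, boxes are in (x1, y1, x2, y2) format, and the helper names are hypothetical; it is meant only to show where the two formulas diverge.

```python
import math


def iou_and_v(box1, box2, eps=1e-7):
    """Return (IoU, aspect-ratio term v) for two boxes given as (x1, y1, x2, y2)."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    inter_w = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    v = (4 / math.pi ** 2) * (math.atan(w2 / (h2 + eps)) - math.atan(w1 / (h1 + eps))) ** 2
    return iou, v


def alpha_mmdet(iou, v):
    """mmdet variant: alpha is forced to zero unless IoU exceeds 0.5."""
    return float(iou > 0.5) * v / (1 - iou + v)


def alpha_yolov5(iou, v, eps=1e-7):
    """YOLOv5 variant: no IoU gate, with eps in the denominator."""
    return v / (v - iou + (1 + eps))


target = (0.0, 0.0, 4.0, 4.0)
high_iou, high_v = iou_and_v((0.0, 0.0, 4.0, 3.0), target)  # IoU about 0.75
low_iou, low_v = iou_and_v((0.0, 0.0, 4.0, 1.0), target)    # IoU about 0.25
```

For the high-IoU pair the two variants agree almost exactly, while for the low-IoU pair the mmdet variant returns zero and the YOLOv5 variant still applies an aspect-ratio penalty.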




class mmyolo.models.necks.BaseYOLONeck(in_channels: List[int], out_channels: Union[int, List[int]], deepen_factor: float = 1.0, widen_factor: float = 1.0, upsample_feats_cat_first: bool = True, freeze_all: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, **kwargs)[源代码]

Base neck used in YOLO series.

P5 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
stride=32          +--------+    +-----------+
idx=2  +------+         ^              v
-----> |reduce|         |        +-----------+
       |layer2|---------+------->|    cat    |
       +------+                  +-----------+
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output2
                                 |  layer1   |    | layer2|
                                 +-----------+    +-------+
P6 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer3 |--->|    cat    |
                   +--------+    +-----------+
stride=32               ^              v
idx=2  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output2
       |layer2|    +--------+    |   layer1  |    | layer2|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer3 |    |  layer2   |
                   +--------+    +-----------+
stride=64               ^              v
idx=3  +------+         |        +-----------+
-----> |reduce|---------+------->|    cat    |
       |layer3|                  +-----------+
       +------+                        v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output3
                                 |  layer2   |    | layer3|
                                 +-----------+    +-------+
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int or List[int]) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • upsample_feats_cat_first (bool) – Whether the upsampled features come first when concatenating in the top-down module. Defaults to True. Currently only YOLOv7 sets this to False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to None.

  • act_cfg (dict) – Config dict for activation layer. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
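The data flow drawn in the P5 and P6 diagrams can be traced symbolically. The sketch below records (source, destination) edges between the named layers, folding each "cat" box into the layer that consumes it; `trace_neck` is a hypothetical helper derived from the diagrams, not part of the library.

```python
def trace_neck(num_levels):
    """Return the (source, destination) edges of the neck for 3 (P5) or 4 (P6) levels."""
    edges = []
    reduced = [f"reduce{i}" for i in range(num_levels)]

    # Top-down path: start from the deepest level and fuse towards stride 8.
    inner = [reduced[-1]]
    for idx in range(num_levels - 1, 0, -1):
        edges += [(inner[0], f"upsample{idx}"),
                  (f"upsample{idx}", f"top_down{idx}"),
                  (reduced[idx - 1], f"top_down{idx}")]
        inner.insert(0, f"top_down{idx}")

    # Bottom-up path: start from the shallowest fused feature and go deeper.
    outs = [inner[0]]
    for idx in range(num_levels - 1):
        edges += [(outs[-1], f"downsample{idx}"),
                  (f"downsample{idx}", f"bottom_up{idx}"),
                  (inner[idx + 1], f"bottom_up{idx}")]
        outs.append(f"bottom_up{idx}")

    # One out layer per scale.
    edges += [(feat, f"out{i}") for i, feat in enumerate(outs)]
    return edges


p5_edges = trace_neck(3)
p6_edges = trace_neck(4)
```

Each name corresponds to one of the abstract build_* methods listed below, which is why a concrete neck only has to supply those layers.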

abstract build_bottom_up_layer(idx: int) [source]

Build the bottom-up layer.

abstract build_downsample_layer(idx: int) [source]

Build the downsample layer.

abstract build_out_layer(idx: int) [source]

Build the out layer.

abstract build_reduce_layer(idx: int) [source]

Build the reduce layer.

abstract build_top_down_layer(idx: int) [source]

Build the top-down layer.

abstract build_upsample_layer(idx: int) [source]

Build the upsample layer.

forward(inputs: List[torch.Tensor])tuple[源代码]

Forward function.


Convert the model into training mode while keeping the normalization layers frozen.

class mmyolo.models.necks.CSPNeXtPAFPN(in_channels: Sequence[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, use_depthwise: bool = False, expand_ratio: float = 0.5, upsample_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'mode': 'nearest', 'scale_factor': 2}, conv_cfg: Optional[bool] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[源代码]

Path Aggregation Network with CSPNeXt blocks.

  • in_channels (Sequence[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • upsample_cfg (dict) – Config dict for interpolate layer. Defaults to dict(scale_factor=2, mode='nearest').

  • conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').
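The effect of deepen_factor and widen_factor can be illustrated with simple arithmetic. Rounding channels up to a multiple of 8 and keeping at least one block are assumptions of this sketch, and the helper names `scale_channels` and `scale_blocks` are hypothetical rather than library functions.

```python
import math


def scale_channels(channels, widen_factor, divisor=8):
    """Scale a channel count and round up to a multiple of `divisor` (assumed)."""
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_blocks(num_blocks, deepen_factor):
    """Scale a block count, keeping at least one block (assumed)."""
    return max(round(num_blocks * deepen_factor), 1)


# A hypothetical small variant: half the width, a third of the depth.
small_channels = [scale_channels(c, 0.5) for c in (256, 512, 1024)]
small_blocks = scale_blocks(3, 0.33)
```

Under these assumptions, one base configuration can describe a whole family of model sizes by changing only the two factors.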

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

Build the bottom-up layer.

  • idx (int) – Layer index.

Returns the bottom-up layer.

build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

Build the downsample layer.

  • idx (int) – Layer index.

Returns the downsample layer.

build_out_layer(idx: int) → torch.nn.modules.module.Module [source]

Build the out layer.

  • idx (int) – Layer index.

Returns the out layer.

build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

Build the reduce layer.

  • idx (int) – Layer index.

Returns the reduce layer.

build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

Build the top-down layer.

  • idx (int) – Layer index.

Returns the top-down layer.

build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

Build the upsample layer.

class mmyolo.models.necks.PPYOLOECSPPAFPN(in_channels: List[int] = [256, 512, 1024], out_channels: List[int] = [256, 512, 1024], deepen_factor: float = 1.0, widen_factor: float = 1.0, freeze_all: bool = False, num_csplayer: int = 1, num_blocks_per_layer: int = 3, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': False, 'type': 'PPYOLOEBasicBlock', 'use_alpha': False}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, drop_block_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_spp: bool = False)[源代码]


  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int]) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • num_csplayer (int) – Number of CSPResLayers per layer. Defaults to 1.

  • num_blocks_per_layer (int) – Number of blocks per CSPResLayer. Defaults to 3.

  • block_cfg (dict) – Config dict for block. Defaults to dict(type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False).

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.1, eps=1e-5).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • drop_block_cfg (dict, optional) – Drop block config. Defaults to None. To apply DropBlock after each CSPResLayer, set this parameter to dict(type='mmdet.DropBlock', drop_prob=0.1, block_size=3, warm_iters=0).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

  • use_spp (bool) – Whether to use SPP in reduce layer. Defaults to False.
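As a rough illustration of how these arguments fit together, the fragment below writes a neck config as a plain Python dict in the MMEngine config style. The channel lists and the two scaling factors are placeholder values chosen for illustration and are not taken from a shipped config; the drop_block_cfg value is the one suggested in the parameter description above.

```python
# Illustrative neck config. Channel lists and scaling factors are
# placeholders (assumptions), not values from an official config file.
neck = dict(
    type='PPYOLOECSPPAFPN',
    in_channels=[256, 512, 1024],
    out_channels=[192, 384, 768],
    deepen_factor=0.33,
    widen_factor=0.5,
    num_csplayer=1,
    num_blocks_per_layer=3,
    block_cfg=dict(
        type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False),
    norm_cfg=dict(type='BN', momentum=0.1, eps=1e-5),
    act_cfg=dict(type='SiLU', inplace=True),
    # Optional: apply DropBlock after each CSPResLayer.
    drop_block_cfg=dict(
        type='mmdet.DropBlock', drop_prob=0.1, block_size=3, warm_iters=0),
    use_spp=False)
```

Leaving drop_block_cfg as None keeps the documented default behavior.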

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build downsample layer.


idx (int) – layer idx.


The downsample layer.



build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

build out layer.

build_reduce_layer(idx: int) [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build upsample layer.

class mmyolo.models.necks.YOLOXPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, use_depthwise: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOX.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build downsample layer.


idx (int) – layer idx.


The downsample layer.



build_out_layer(idx: int) → torch.nn.modules.module.Module [source]

build out layer.


idx (int) – layer idx.


The out layer.



build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

build upsample layer.

class mmyolo.models.necks.YOLOv5PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 1, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv5.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int] or int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 1.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build downsample layer.


idx (int) – layer idx.


The downsample layer.



build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

build out layer.

build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

build upsample layer.


Initialize the weights.

class mmyolo.models.necks.YOLOv6CSPRepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv6 3.0.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



class mmyolo.models.necks.YOLOv6CSPRepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv6.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



class mmyolo.models.necks.YOLOv6RepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv6 3.0.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build upsample layer.


idx (int) – layer idx.


The upsample layer.



forward(inputs: List[torch.Tensor]) → tuple [source]

Forward function.

class mmyolo.models.necks.YOLOv6RepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv6.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale)

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build downsample layer.


idx (int) – layer idx.


The downsample layer.



build_out_layer(*args, **kwargs) → torch.nn.modules.module.Module [source]

build out layer.

build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build upsample layer.


idx (int) – layer idx.


The upsample layer.




Initialize the weights.

class mmyolo.models.necks.YOLOv7PAFPN(in_channels: List[int], out_channels: List[int], block_cfg: dict = {'block_ratio': 0.25, 'middle_ratio': 0.5, 'num_blocks': 4, 'num_convs_in_block': 1, 'type': 'ELANBlock'}, deepen_factor: float = 1.0, widen_factor: float = 1.0, spp_expand_ratio: float = 0.5, is_tiny_version: bool = False, use_maxpool_in_downsample: bool = True, use_in_channels_in_downsample: bool = False, use_repconv_outs: bool = True, upsample_feats_cat_first: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv7.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int]) – Number of output channels (used at each scale).

  • block_cfg (dict) – Config dict for block.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • spp_expand_ratio (float) – Expand ratio of SPPCSPBlock. Defaults to 0.5.

  • is_tiny_version (bool) – Is tiny version of neck. If True, it means it is a yolov7 tiny model. Defaults to False.

  • use_maxpool_in_downsample (bool) – Whether maxpooling is used in downsample layers. Defaults to True.

  • use_in_channels_in_downsample (bool) – MaxPoolAndStrideConvBlock module input parameters. Defaults to False.

  • use_repconv_outs (bool) – Whether to use repconv in the output layer. Defaults to True.

  • upsample_feats_cat_first (bool) – Whether the output features are concatenated first after upsampling in the top-down module. Defaults to False. Currently only YOLOv7 sets this to False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_downsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build downsample layer.


idx (int) – layer idx.


The downsample layer.



build_out_layer(idx: int) → torch.nn.modules.module.Module [source]

build out layer.


idx (int) – layer idx.


The out layer.



build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.



build_upsample_layer(idx: int) → torch.nn.modules.module.Module [source]

build upsample layer.

class mmyolo.models.necks.YOLOv8PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None) [source]

Path Aggregation Network used in YOLOv8.

  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int] or int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module [source]

build bottom up layer.


idx (int) – layer idx.


The bottom up layer.



build_reduce_layer(idx: int) → torch.nn.modules.module.Module [source]

build reduce layer.


idx (int) – layer idx.


The reduce layer.



build_top_down_layer(idx: int) → torch.nn.modules.module.Module [source]

build top down layer.


idx (int) – layer idx.


The top down layer.




class mmyolo.models.task_modules.BatchATSSAssigner(num_classes: int, iou_calculator: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmdet.BboxOverlaps2D'}, topk: int = 9) [source]

Assign a batch of corresponding gt bboxes or background to each prior.

This code is based on

Each proposal will be assigned with 0 or a positive integer indicating the ground truth index.

  • 0: negative sample, no assigned gt

  • positive integer: positive sample, index (1-based) of assigned gt

  • num_classes (int) – Number of classes.

  • iou_calculator (ConfigDict or dict) – Config dict for iou calculator. Defaults to dict(type='mmdet.BboxOverlaps2D').

  • topk (int) – Number of priors selected in each level. Defaults to 9.

forward(pred_bboxes: torch.Tensor, priors: torch.Tensor, num_level_priors: List, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict [source]

Assign gt to priors.

The assignment is done in the following steps:

  1. compute the iou between all priors (priors of all pyramid levels) and gt

  2. compute the center distance between all priors and gt

  3. on each pyramid level, for each gt, select the k priors whose centers are closest to the gt center, so in total k*l priors are selected as candidates for each gt (l being the number of pyramid levels)

  4. get the corresponding iou for these candidates, compute the mean and std, and set mean + std as the iou threshold

  5. select the candidates whose iou is greater than or equal to the threshold as positive

  6. limit the positive sample’s center to lie inside the gt

  • pred_bboxes (Tensor) – Predicted bounding boxes, shape(batch_size, num_priors, 4)

  • priors (Tensor) – Model priors with stride, shape(num_priors, 4)

  • num_level_priors (List) – Number of bboxes in each level, len(3)

  • gt_labels (Tensor) – Ground truth label, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bbox, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)


Returns: assigned_result (dict), the assigned result, with the following entries:

  • 'assigned_labels' (Tensor) – shape(batch_size, num_priors)

  • 'assigned_bboxes' (Tensor) – shape(batch_size, num_priors, 4)

  • 'assigned_scores' (Tensor) – shape(batch_size, num_priors, num_classes)

  • 'fg_mask_pre_prior' (Tensor) – shape(batch_size, num_priors)

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_inds: torch.Tensor, fg_mask_pre_prior: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor] [source]

Get target info.

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • assigned_gt_inds (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)

  • fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)

  • num_priors (int) – Number of priors.

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.


Returns: A tuple of three tensors:

  • assigned_labels (Tensor) – Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor) – Assigned bboxes, shape(batch_size, num_priors, 4)

  • assigned_scores (Tensor) – Assigned scores, shape(batch_size, num_priors, num_classes)

select_topk_candidates(distances: torch.Tensor, num_level_priors: List[int], pad_bbox_flag: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor] [source]

Selecting candidates based on the center distance.

  • distances (Tensor) – Distance between all bbox and gt, shape(batch_size, num_gt, num_priors)

  • num_level_priors (List[int]) – Number of bboxes in each level, len(3)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, shape(batch_size, num_gt, 1)


Returns: A tuple of two tensors:

  • is_in_candidate_list (Tensor) – Flag showing whether each level has topk candidates, shape(batch_size, num_gt, num_priors)

  • candidate_idxs (Tensor) – Candidate indices, shape(batch_size, num_gt, num_gt)
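To make the per-level selection concrete, here is a minimal pure-Python sketch for a single ground truth box. It assumes priors are given as (x, y) centers stored level by level; the function name closest_k_per_level and the toy coordinates are hypothetical and only illustrate the idea, not the batched tensor implementation.

```python
import math


def closest_k_per_level(prior_centers, num_level_priors, gt_center, k):
    """Pick, on every pyramid level, the k priors closest to gt_center.

    prior_centers is a flat list of (x, y) tuples ordered level by level,
    and num_level_priors gives how many priors each level contributes.
    Returns the selected indices into prior_centers.
    """
    selected = []
    start = 0
    for n in num_level_priors:
        level_idxs = range(start, start + n)
        by_distance = sorted(
            level_idxs,
            key=lambda i: math.dist(prior_centers[i], gt_center))
        # A level with fewer than k priors contributes all of them.
        selected.extend(by_distance[:min(k, n)])
        start += n
    return selected


# Two levels: three priors on the first, two on the second.
centers = [(0, 0), (4, 0), (8, 0), (2, 2), (6, 6)]
candidates = closest_k_per_level(centers, [3, 2], gt_center=(5, 1), k=1)
```

With k candidates per level and l levels, each ground truth box ends up with at most k*l candidates.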

static threshold_calculator(is_in_candidate: List, candidate_idxs: torch.Tensor, overlaps: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor] [source]

Get the corresponding iou for these candidates, compute the mean and std, and set mean + std as the iou threshold.

  • is_in_candidate (Tensor) – Flag showing whether each level has topk candidates, shape(batch_size, num_gt, num_priors).

  • candidate_idxs (Tensor) – Candidate indices, shape(batch_size, num_gt, num_gt)

  • overlaps (Tensor) – Overlaps, shape(batch_size, num_gt, num_priors).

  • num_priors (int) – Number of priors.

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.


Returns: A tuple of two tensors:

  • overlaps_thr_per_gt (Tensor) – Overlap threshold per ground truth, shape(batch_size, num_gt, 1).

  • candidate_overlaps (Tensor) – Candidate overlaps, shape(batch_size, num_gt, num_priors).
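The mean + std rule can be sketched for one ground truth box with the standard library. This is an illustration only: the helper names are hypothetical, and the population standard deviation is used here as an assumption, whereas a tensor library may default to the sample standard deviation.

```python
from statistics import mean, pstdev


def atss_threshold(candidate_ious):
    """Return mean + std of the candidate IoUs of one ground truth box."""
    # Assumption: population std; the library may use the sample std.
    return mean(candidate_ious) + pstdev(candidate_ious)


def select_positives(candidate_ious):
    """Indices of candidates whose IoU reaches the adaptive threshold."""
    thr = atss_threshold(candidate_ious)
    return [i for i, iou in enumerate(candidate_ious) if iou >= thr]


ious = [0.10, 0.20, 0.60, 0.70]
positives = select_positives(ious)
```

Because the threshold adapts to each ground truth box's own IoU statistics, a box with uniformly low IoUs still keeps its best candidates.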

class mmyolo.models.task_modules.BatchTaskAlignedAssigner(num_classes: int, topk: int = 13, alpha: float = 1.0, beta: float = 6.0, eps: float = 1e-07, use_ciou: bool = False) [source]

This code is adapted from assigners/. Batch task-aligned assigner based on the paper TOOD: Task-aligned One-stage Object Detection.

Assign a corresponding gt bbox or background to each bbox in a batch of predicted bboxes. Each bbox will be assigned 0 or a positive integer indicating the ground truth index.

  • 0: negative sample, no assigned gt

  • positive integer: positive sample, index (1-based) of assigned gt

  • num_classes (int) – Number of classes.

  • topk (int) – Number of bboxes selected in each level. Defaults to 13.

  • alpha (float) – Hyper-parameter related to alignment_metrics. Defaults to 1.0.

  • beta (float) – Hyper-parameter related to alignment_metrics. Defaults to 6.0.

  • eps (float) – Eps to avoid log(0). Defaults to 1e-7.

  • use_ciou (bool) – Whether to use CIoU while calculating IoU. Defaults to False.

forward(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict [source]

Assign gt to bboxes.

The assignment is done in the following steps:

  1. compute the alignment metric between all bboxes (bboxes of all pyramid levels) and gt

  2. select the top-k bboxes as candidates for each gt

  3. limit the positive sample’s center to lie inside the gt (because an anchor-free detector can only predict positive distances)

  • pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)

  • priors (Tensor) – Model priors, shape (num_priors, 4)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)


Returns: assigned_result (dict), the assigned result, with the following entries:

  • assigned_labels (Tensor) – Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor) – Assigned boxes, shape(batch_size, num_priors, 4)

  • assigned_scores (Tensor) – Assigned scores, shape(batch_size, num_priors, num_classes)

  • fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)

get_box_metrics(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor] [source]

Compute alignment metric between all bbox and gt.

  • pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.


Returns: A tuple of two tensors:

  • alignment_metrics (Tensor) – Alignment metrics, shape(batch_size, num_gt, num_priors)

  • overlaps (Tensor) – Overlaps, shape(batch_size, num_gt, num_priors)
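In the TOOD paper the alignment metric combines the classification score s and the IoU u as t = s^alpha * u^beta. The scalar sketch below illustrates that formula with the default alpha and beta of this class; the function name is hypothetical, and the batched, per-class gathering done by the real method is omitted.

```python
def alignment_metric(score, iou, alpha=1.0, beta=6.0):
    """Scalar form of the TOOD alignment metric t = s**alpha * u**beta."""
    return (score ** alpha) * (iou ** beta)


# A confident but poorly localized prediction ...
confident_loose = alignment_metric(score=0.9, iou=0.5)
# ... scores far lower than a less confident, well localized one.
modest_tight = alignment_metric(score=0.5, iou=0.9)
```

With beta much larger than alpha, localization quality dominates the ranking of candidates.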

get_pos_mask(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor] [source]

Get possible mask.

  • pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)

  • priors (Tensor) – Model priors, shape (num_priors, 2)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.


Returns: A tuple of three tensors:

  • pos_mask (Tensor) – Possible mask, shape(batch_size, num_gt, num_priors)

  • alignment_metrics (Tensor) – Alignment metrics, shape(batch_size, num_gt, num_priors)

  • overlaps (Tensor) – Overlaps of gt_bboxes and pred_bboxes, shape(batch_size, num_gt, num_priors)

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_idxs: torch.Tensor, fg_mask_pre_prior: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor] [source]

Get assigner info.

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • assigned_gt_idxs (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)

  • fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.


Returns: A tuple of three tensors:

  • assigned_labels (Tensor) – Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor) – Assigned bboxes, shape(batch_size, num_priors, 4)

  • assigned_scores (Tensor) – Assigned scores, shape(batch_size, num_priors, num_classes)

select_topk_candidates(alignment_gt_metrics: torch.Tensor, using_largest_topk: bool = True, topk_mask: Optional[torch.Tensor] = None) → torch.Tensor [source]

Select the top-k candidates for each gt based on the alignment metric.

  • alignment_gt_metrics (Tensor) – Alignment metric of gt candidates, shape(batch_size, num_gt, num_priors)

  • using_largest_topk (bool) – Controls whether to use the largest or smallest elements.

  • topk_mask (Tensor) – Topk mask, shape(batch_size, num_gt, self.topk)


Returns: Topk candidates mask, shape(batch_size, num_gt, num_priors)
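A minimal sketch of turning per-prior metrics into a top-k mask for one ground truth box follows. It uses plain lists rather than tensors, and the function name topk_mask_sketch is hypothetical; the largest flag mirrors the role of using_largest_topk.

```python
def topk_mask_sketch(metrics, k, largest=True):
    """Return a 0/1 mask marking the k largest (or smallest) metrics."""
    order = sorted(
        range(len(metrics)), key=lambda i: metrics[i], reverse=largest)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(metrics))]


metrics = [0.1, 0.7, 0.3, 0.9]
largest_mask = topk_mask_sketch(metrics, k=2)
smallest_mask = topk_mask_sketch(metrics, k=2, largest=False)
```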



class mmyolo.models.task_modules.YOLOXBBoxCoder(use_box_type: bool = False, **kwargs) [source]

YOLOX BBox coder.

This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor [source]

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

  • priors (torch.Tensor) – Basic boxes or points, e.g. anchors.

  • pred_bboxes (torch.Tensor) – Encoded boxes with shape

  • stride (torch.Tensor | int) – Strides of bboxes.


Decoded boxes.
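For intuition, here is a scalar sketch of one common YOLOX-style decoding. It assumes the prior is a grid point (x, y), that the offsets are expressed in units of the stride, and that width and height are predicted in log space; these are assumptions of the sketch, and the function name is hypothetical.

```python
import math


def decode_yolox_sketch(prior_xy, pred, stride):
    """Decode (delta_x, delta_y, w, h) into (tl_x, tl_y, br_x, br_y).

    Assumptions: offsets are scaled by the stride and added to the prior
    point; w and h are log-scale values multiplied by the stride.
    """
    delta_x, delta_y, w, h = pred
    center_x = delta_x * stride + prior_xy[0]
    center_y = delta_y * stride + prior_xy[1]
    box_w = math.exp(w) * stride
    box_h = math.exp(h) * stride
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


box = decode_yolox_sketch(prior_xy=(16, 16), pred=(0.5, -0.5, 0.0, 0.0), stride=8)
```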




Encode deltas between bboxes and ground truth boxes.

class mmyolo.models.task_modules.YOLOv5BBoxCoder(use_box_type: bool = False, **kwargs) [source]

YOLOv5 BBox coder.

This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor [source]

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

  • priors (torch.Tensor) – Basic boxes or points, e.g. anchors.

  • pred_bboxes (torch.Tensor) – Encoded boxes with shape

  • stride (torch.Tensor | int) – Strides of bboxes.


Decoded boxes.




Encode deltas between bboxes and ground truth boxes.


class mmyolo.models.utils.OutputSaveFunctionWrapper(func: Callable, spec: Optional[Dict]) [source]

A class that wraps a function and saves its outputs.

This class can be used to decorate a function to save its outputs. It wraps the function with a __call__ method that calls the original function and saves the results in a log attribute.

  • func (Callable) – A function to wrap.

  • spec (Optional[Dict]) – A dictionary of global variables to use as the namespace for the wrapper. If None, the global namespace of the original function is used.
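The idea of wrapping a function and recording its outputs can be sketched in a few lines. The class below is a hypothetical stand-in written for illustration; it ignores the spec namespace argument and is not the library class.

```python
class OutputLogger:
    """Hypothetical stand-in: call func and append each result to log."""

    def __init__(self, func):
        self.func = func
        self.log = []

    def __call__(self, *args, **kwargs):
        result = self.func(*args, **kwargs)
        self.log.append(result)  # keep every output for later inspection
        return result


def square(x):
    return x * x


logged_square = OutputLogger(square)
logged_square(3)
logged_square(4)
```

After the two calls, logged_square.log holds both results in call order.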

class mmyolo.models.utils.OutputSaveObjectWrapper(obj: Any) [source]

A wrapper class that saves the output of function calls on an object.


Clears the log of function call outputs.

mmyolo.models.utils.gt_instances_preprocess(batch_gt_instances: Union[torch.Tensor, Sequence], batch_size: int) → torch.Tensor [source]

Split batch_gt_instances with batch size.

From [all_gt_bboxes, box_dim+2] to [batch_size, number_gt, box_dim+1]. For horizontal boxes, box_dim=4; for rotated boxes, box_dim=5.

If an image has fewer gt bboxes than the largest count in the batch, its entries are padded with zeros.

  • batch_gt_instances (Sequence[Tensor]) – Ground truth instances for whole batch, shape [all_gt_bboxes, box_dim+2]

  • batch_size (int) – Batch size.


Returns: Batch gt instances data, shape [batch_size, number_gt, box_dim+1]
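The reshaping can be sketched with plain lists. The sketch assumes that the first column of each row is the image index within the batch and that the remaining box_dim+1 columns are the class label followed by the box; the function name split_by_image is hypothetical.

```python
def split_by_image(rows, batch_size):
    """Group flat gt rows per image and zero-pad to a common length.

    Each input row is [image_idx, label, *box] (an assumption of this
    sketch); each output row drops the leading image index.
    """
    per_image = [[] for _ in range(batch_size)]
    for row in rows:
        per_image[int(row[0])].append(list(row[1:]))
    max_gt = max((len(gts) for gts in per_image), default=0)
    width = len(rows[0]) - 1 if rows else 0
    return [
        gts + [[0.0] * width for _ in range(max_gt - len(gts))]
        for gts in per_image
    ]


rows = [
    [0, 1, 10, 10, 20, 20],
    [0, 2, 30, 30, 40, 40],
    [1, 0, 5, 5, 15, 15],
]
batched = split_by_image(rows, batch_size=2)
```

Here the second image has one box, so its second slot is filled with zeros to match the first image's two boxes.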



mmyolo.models.utils.make_divisible(x: float, widen_factor: float = 1.0, divisor: int = 8) → int [source]

Make sure that x*widen_factor is divisible by divisor.
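One common way to meet this contract is to round the scaled value up to the next multiple of divisor. The sketch below assumes that round-up behavior; make_divisible_sketch is a hypothetical name and not the library function.

```python
import math


def make_divisible_sketch(x, widen_factor=1.0, divisor=8):
    """Scale x by widen_factor, then round up to a multiple of divisor."""
    # Assumption: round up (ceil) rather than to the nearest multiple.
    return math.ceil(x * widen_factor / divisor) * divisor


half_width = make_divisible_sketch(256, widen_factor=0.5)
third_width = make_divisible_sketch(100, widen_factor=0.33)
```

Rounding up keeps channel counts from collapsing to zero when widen_factor is small.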

mmyolo.models.utils.make_round(x: float, deepen_factor: float = 1.0) → int [source]

Make sure that x*deepen_factor becomes an integer not less than 1.
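A minimal sketch of this contract is to round the scaled value and clamp it at 1. The name make_round_sketch is hypothetical, and Python's built-in round (which rounds halves to the nearest even integer) is an assumption about the rounding mode.

```python
def make_round_sketch(x, deepen_factor=1.0):
    """Scale x by deepen_factor, round, and never return less than 1."""
    return max(round(x * deepen_factor), 1)


shallow = make_round_sketch(3, deepen_factor=0.33)
deeper = make_round_sketch(9, deepen_factor=0.33)
```

The clamp guarantees that every stage keeps at least one block, however small deepen_factor is.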
