mmyolo.datasets

datasets

class mmyolo.datasets.BatchShapePolicy(batch_size: int = 32, img_size: int = 640, size_divisor: int = 32, extra_pad_ratio: float = 0.5)[source]

BatchShapePolicy is only used in the testing phase; it reduces the number of padded pixels during batch inference.

Parameters
  • batch_size (int) – Single GPU batch size during batch inference. Defaults to 32.

  • img_size (int) – Expected output image size. Defaults to 640.

  • size_divisor (int) – The padded shape is rounded up to a multiple of size_divisor. Defaults to 32.

  • extra_pad_ratio (float) – Extra pad ratio. Defaults to 0.5.
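
In a config, this policy is typically passed to a YOLOv5 dataset through its batch_shapes_cfg argument; a minimal sketch (the surrounding dataset fields are omitted):

>>> batch_shapes_cfg = dict(
...     type='BatchShapePolicy',
...     batch_size=32,
...     img_size=640,
...     size_divisor=32,
...     extra_pad_ratio=0.5)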

class mmyolo.datasets.YOLOv5CocoDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]

Dataset for YOLOv5 COCO Dataset.

Compared with CocoDataset, this dataset only adds the BatchShapePolicy logic. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

class mmyolo.datasets.YOLOv5CrowdHumanDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]

Dataset for YOLOv5 CrowdHuman Dataset.

Compared with CrowdHumanDataset, this dataset only adds the BatchShapePolicy logic. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

class mmyolo.datasets.YOLOv5DOTADataset(*args, **kwargs)[source]

Dataset for YOLOv5 DOTA Dataset.

Compared with DOTADataset, this dataset only adds the BatchShapePolicy logic. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

class mmyolo.datasets.YOLOv5VOCDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]

Dataset for YOLOv5 VOC Dataset.

Compared with VOCDataset, this dataset only adds the BatchShapePolicy logic. See mmyolo/datasets/utils.py#BatchShapePolicy for details.

mmyolo.datasets.yolov5_collate(data_batch: Sequence, use_ms_training: bool = False) → dict[source]

Rewrite collate_fn to get faster training speed.

Parameters
  • data_batch (Sequence) – Batch of data.

  • use_ms_training (bool) – Whether to use multi-scale training.
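
In MMYOLO configs this function is registered as the dataloader's collate_fn; a hedged sketch (the remaining dataloader fields are illustrative):

>>> train_dataloader = dict(
...     batch_size=16,
...     collate_fn=dict(type='yolov5_collate'),
...     dataset=dict(type='YOLOv5CocoDataset'))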

transforms

class mmyolo.datasets.transforms.FilterAnnotations(by_keypoints: bool = False, **kwargs)[source]

Filter invalid annotations.

In addition to the conditions checked by FilterDetAnnotations, this filter adds a new condition requiring instances to have at least one visible keypoint.

class mmyolo.datasets.transforms.LetterResize(scale: Union[int, Tuple[int, int]], pad_val: dict = {'img': 0, 'mask': 0, 'seg': 255}, use_mini_pad: bool = False, stretch_only: bool = False, allow_scale_up: bool = True, half_pad_param: bool = False, **kwargs)[source]

Resize and pad image while meeting stride-multiple constraints.

Required Keys:

  • img (np.uint8)

  • batch_shape (np.int64) (optional)

Modified Keys:

  • img (np.uint8)

  • img_shape (tuple)

  • gt_bboxes (optional)

Added Keys:

  • pad_param (np.float32)

Parameters
  • scale (Union[int, Tuple[int, int]]) – Image scales for resizing.

  • pad_val (dict) – Padding value. Defaults to dict(img=0, mask=0, seg=255).

  • use_mini_pad (bool) – Whether to use minimum rectangle padding. Defaults to False.

  • stretch_only (bool) – Whether to stretch to the specified size directly. Defaults to False.

  • allow_scale_up (bool) – Allow scale up when ratio > 1. Defaults to True.

  • half_pad_param (bool) – If set to True, left and right pad_param will be given by dividing padding_h by 2. If set to False, pad_param is in int format. We recommend setting this to False for object detection tasks, and True for instance segmentation tasks. Defaults to False.
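
A sketch of the test-time resizing pair used in the YOLOv5-style configs, where YOLOv5KeepRatioResize is followed by LetterResize (the pad value 114 follows the YOLOv5 convention):

>>> test_pipeline = [
...     dict(type='LoadImageFromFile'),
...     dict(type='YOLOv5KeepRatioResize', scale=(640, 640)),
...     dict(type='LetterResize',
...          scale=(640, 640),
...          allow_scale_up=False,
...          pad_val=dict(img=114))
... ]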

transform(results: dict) → dict[source]

Transform function to resize images, bounding boxes, semantic segmentation map and keypoints.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Resized results, ‘img’, ‘gt_bboxes’, ‘gt_seg_map’, ‘gt_keypoints’, ‘scale’, ‘scale_factor’, ‘img_shape’, and ‘keep_ratio’ keys are updated in result dict.

Return type

dict

class mmyolo.datasets.transforms.LoadAnnotations(mask2bbox: bool = False, poly2mask: bool = False, merge_polygons: bool = True, **kwargs)[source]

Since the YOLO series does not currently need to consider ignore bboxes, they can be excluded in advance to speed up the pipeline.

Parameters
  • mask2bbox (bool) – Whether to use mask annotation to get bbox. Defaults to False.

  • poly2mask (bool) – Whether to transform the polygons to bitmaps. Defaults to False.

  • merge_polygons (bool) – Whether to merge polygons into one polygon. If merged, the storage structure is simpler and training is more efficient, especially if the mask inside a bbox is divided into multiple polygons. Defaults to True.
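
A pipeline sketch for instance segmentation training, assuming the with_bbox/with_mask flags inherited from the mmdet LoadAnnotations:

>>> train_pipeline = [
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations',
...          with_bbox=True,
...          with_mask=True,
...          mask2bbox=True)
... ]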

merge_multi_segment(gt_masks: List[numpy.ndarray]) → List[numpy.ndarray][source]

Merge multiple segments into one list.

Find the coordinates with the minimum distance between each segment, then connect these coordinates with one thin line to merge all segments into one.

Parameters

gt_masks (List[np.ndarray]) – Original segmentations in COCO’s json file, like [segmentation1, segmentation2, …]; each segmentation is a list of coordinates.

Returns

merged gt_masks

Return type

List[np.ndarray]

min_index(arr1: numpy.ndarray, arr2: numpy.ndarray) → Tuple[int, int][source]

Find a pair of indexes with the shortest distance.

Parameters
  • arr1 (np.ndarray) – Points with shape (N, 2).

  • arr2 (np.ndarray) – Points with shape (M, 2).

Returns

a pair of indexes.

Return type

tuple
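
The pairing can be computed with a broadcast squared-distance matrix; a minimal numpy sketch (not the exact implementation):

>>> import numpy as np
>>> arr1 = np.array([[0, 0], [10, 0]], dtype=np.float32)
>>> arr2 = np.array([[11, 1], [50, 50]], dtype=np.float32)
>>> dis = ((arr1[:, None, :] - arr2[None, :, :]) ** 2).sum(-1)
>>> i, j = np.unravel_index(np.argmin(dis), dis.shape)
>>> int(i), int(j)  # arr1[1] and arr2[0] are the closest pair
(1, 0)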

transform(results: dict) → dict[source]

Function to load multiple types of annotations.

Parameters

results (dict) – Result dict from mmengine.BaseDataset.

Returns

The dict contains loaded bounding box, label and semantic segmentation.

Return type

dict

class mmyolo.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 40, random_pop: bool = True, max_refetch: int = 15)[source]

Mosaic augmentation.

Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |           |
           |      +-----------+    pad    |
           |      |           |           |
           |      |  image1   +-----------+
           |      |           |           |
           |      |           |   image2  |
center_y   |----+-+-----------+-----------+
           |    |   cropped   |           |
           |pad |   image3    |   image4  |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:

    1. Choose the mosaic center as the intersection of the 4 images.
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. The sub-image will be cropped if it is larger than the mosaic patch.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
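
A pipeline sketch mirroring the MMYOLO configs, where pre_transform loads the extra images that Mosaic mixes in:

>>> pre_transform = [
...     dict(type='LoadImageFromFile'),
...     dict(type='LoadAnnotations', with_bbox=True)
... ]
>>> train_pipeline = [
...     *pre_transform,
...     dict(type='Mosaic',
...          img_scale=(640, 640),
...          pad_val=114.0,
...          pre_transform=pre_transform)
... ]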

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list[source]

Call function to collect indexes.

Parameters

dataset (Dataset or list) – The dataset or cached list.

Returns

indexes.

Return type

list

mix_img_transform(results: dict) → dict[source]

Mixed image data transformation.

Parameters

results (dict) – Result dict.

Returns

Updated result dict.

Return type

results (dict)

class mmyolo.datasets.transforms.Mosaic9(img_scale: Tuple[int, int] = (640, 640), bbox_clip_border: bool = True, pad_val: Union[float, int] = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 50, random_pop: bool = True, max_refetch: int = 15)[source]

Mosaic9 augmentation.

Given 9 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

           +-------------------------------+------------+
           | pad           |      pad      |            |
           |    +----------+               |            |
           |    |          +---------------+  top_right |
           |    |          |      top      |   image2   |
           |    | top_left |     image1    |            |
           |    |  image8  o--------+------+--------+---+
           |    |          |        |               |   |
           +----+----------+        |     right     |pad|
           |               | center |     image3    |   |
           |     left      | image0 +---------------+---|
           |    image7     |        |               |   |
       +---+-----------+---+--------+               |   |
       |   |  cropped  |            |  bottom_right |pad|
       |   |bottom_left|            |    image4     |   |
       |   |  image6   |   bottom   |               |   |
       +---|-----------+   image5   +---------------+---|
           |    pad    |            |        pad        |
           +-----------+------------+-------------------+

The mosaic transform steps are as follows:

    1. Get the center image according to the index, and randomly
       sample another 8 images from the custom dataset.
    2. Randomly offset the image after Mosaic

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pad_val (int) – Pad value. Defaults to 114.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 5 caches for each image suffices for randomness. Defaults to 50.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations is greater than max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → list[source]

Call function to collect indexes.

Parameters

dataset (Dataset or list) – The dataset or cached list.

Returns

indexes.

Return type

list

mix_img_transform(results: dict) → dict[source]

Mixed image data transformation.

Parameters

results (dict) – Result dict.

Returns

Updated result dict.

Return type

results (dict)

class mmyolo.datasets.transforms.PPYOLOERandomCrop(aspect_ratio: List[float] = [0.5, 2.0], thresholds: List[float] = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9], scaling: List[float] = [0.3, 1.0], num_attempts: int = 50, allow_no_crop: bool = True, cover_all_box: bool = False)[source]

Random crop the img and bboxes. Different thresholds are used in PPYOLOE to judge whether the cropped image meets the requirements. This implementation is different from the implementation of RandomCrop in mmdet.

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Added Keys:

  • pad_param (np.float32)

Parameters
  • aspect_ratio (List[float]) – Aspect ratio of the cropped region. Defaults to [0.5, 2.0].

  • thresholds (List[float]) – IoU thresholds for deciding a valid bbox crop, in [min, max] format. Defaults to [0.0, 0.1, 0.3, 0.5, 0.7, 0.9].

  • scaling (List[float]) – Ratio between the cropped region and the original image, in [min, max] format. Defaults to [0.3, 1.0].

  • num_attempts (int) – Number of tries for each threshold before giving up. Defaults to 50.

  • allow_no_crop (bool) – Allow returning without actually cropping. Defaults to True.

  • cover_all_box (bool) – Ensure all bboxes are covered in the final crop. Defaults to False.

class mmyolo.datasets.transforms.PPYOLOERandomDistort(hue_cfg: dict = {'max': 18, 'min': - 18, 'prob': 0.5}, saturation_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, contrast_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, brightness_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, num_distort_func: int = 4)[source]

Random hue, saturation, contrast and brightness distortion.

Required Keys:

  • img

Modified Keys:

  • img (np.float32)

Parameters
  • hue_cfg (dict) – Hue settings. Defaults to dict(min=-18, max=18, prob=0.5).

  • saturation_cfg (dict) – Saturation settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • contrast_cfg (dict) – Contrast settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • brightness_cfg (dict) – Brightness settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).

  • num_distort_func (int) – The number of distort functions to apply. Defaults to 4.
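
Each *_cfg dict is consumed in roughly the following way (a sketch, not the exact implementation):

>>> import random
>>> contrast_cfg = dict(min=0.5, max=1.5, prob=0.5)
>>> if random.random() < contrast_cfg['prob']:
...     gain = random.uniform(contrast_cfg['min'], contrast_cfg['max'])
...     # img = img * gain  # applied to the float32 image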

transform(results: dict) → dict[source]

The hue, saturation, contrast and brightness distortion function.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

transform_brightness(results)[source]

Transform brightness randomly.

transform_contrast(results)[source]

Transform contrast randomly.

transform_hue(results)[source]

Transform hue randomly.

transform_saturation(results)[source]

Transform saturation randomly.

class mmyolo.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]

Pack the inputs data for the detection / semantic segmentation / panoptic segmentation.

Compared to mmdet, we just add the gt_panoptic_seg field and logic.

transform(results: dict) → dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (torch.Tensor): The forward data of models.

  • ‘data_sample’ (DetDataSample): The annotation info of the sample.

Return type

dict

class mmyolo.datasets.transforms.Polygon2Mask(downsample_ratio: int = 4, mask_overlap: bool = True, coco_style: bool = False)[source]

Polygons to bitmaps in YOLOv5.

Parameters
  • downsample_ratio (int) – Downsample ratio of mask.

  • mask_overlap (bool) – Whether to use mask overlap in mask processing. When set to True, the implementation here is the same as the official one, with higher training speed: all gt masks are compressed into one overlap mask in which the pixel value indicates the index of the gt mask. If set to False, each mask is a separate binary mask. Defaults to True.

  • coco_style (bool) – Whether to use coco_style to convert the polygons to bitmaps. Note that this option is only used to test if there is an improvement in training speed and we recommend setting it to False.
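
Rasterizing one polygon is roughly the following (a sketch of what polygon2mask does, assuming OpenCV is available):

>>> import numpy as np
>>> import cv2
>>> polygon = np.array([10, 10, 150, 10, 80, 120], dtype=np.int32)  # x1, y1, x2, y2, ...
>>> mask = np.zeros((160, 160), dtype=np.uint8)
>>> _ = cv2.fillPoly(mask, [polygon.reshape(-1, 2)], color=1)
>>> int(mask.max())
1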

polygon2mask(img_shape: Tuple[int, int], polygons: numpy.ndarray, color: int = 1) → numpy.ndarray[source]

Parameters
  • img_shape (tuple) – The image size.

  • polygons (np.ndarray) – [N, M], where N is the number of polygons and M is the number of coordinates (must be divisible by 2, as (x, y) pairs).

  • color (int) – color in fillPoly.

Returns

the rasterized mask.

Return type

np.ndarray

polygons2masks(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks, color: int = 1) → numpy.ndarray[source]

Return a list of bitmap masks.

Parameters
  • img_shape (tuple) – The image size.

  • polygons (PolygonMasks) – The mask annotations.

  • color (int) – color in fillPoly.

Returns

the list of masks in bitmaps.

Return type

List[np.ndarray]

polygons2masks_overlap(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks) → Tuple[numpy.ndarray, numpy.ndarray][source]

Return an overlap mask and the sorted idx of area.

Parameters
  • img_shape (tuple) – The image size.

  • polygons (PolygonMasks) – The mask annotations.

Returns

the overlap mask and the sorted idx of area.

Return type

Tuple[np.ndarray, np.ndarray]

transform(results: dict) → dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmyolo.datasets.transforms.RandomAffine(**kwargs)[source]

class mmyolo.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]

class mmyolo.datasets.transforms.RegularizeRotatedBox(angle_version='le90')[source]

Regularize rotated boxes.

Due to the angle periodicity, one rotated box can be represented by many different (x, y, w, h, t) tuples. To make each rotated box unique, regularize_boxes takes the remainder of the angle divided by 180 degrees.

For convenience, three angle_version can be used here:

  • ‘oc’: OpenCV Definition. Has the same box representation as cv2.minAreaRect; the angle ranges in [-90, 0).

  • ‘le90’: Long Edge Definition (90). The angle ranges in [-90, 90); the width is always longer than the height.

  • ‘le135’: Long Edge Definition (135). The angle ranges in [-45, 135); the width is always longer than the height.

Required Keys:

  • gt_bboxes (RotatedBoxes[torch.float32])

Modified Keys:

  • gt_bboxes

Parameters

angle_version (str) – Angle version. Can only be ‘oc’, ‘le90’, or ‘le135’. Defaults to ‘le90’.
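
For ‘le90’, the angle wrap alone looks like this (illustrative, in degrees; the real transform also swaps w and h where needed so the width stays the longer edge):

>>> import torch
>>> theta = torch.tensor([100.0])
>>> (theta + 90) % 180 - 90  # wrap into [-90, 90)
tensor([-80.])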

transform(results: dict) → dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmyolo.datasets.transforms.RemoveDataElement(keys: Union[str, Sequence[str]])[source]

Remove unnecessary data element in results.

Parameters

keys (Union[str, Sequence[str]]) – Keys to be removed.

transform(results: dict) → dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmyolo.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]

class mmyolo.datasets.transforms.YOLOXMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, bbox_clip_border: bool = True, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]

MixUp data augmentation for YOLOX.

         mixup transform
+---------------+--------------+
| mixup image   |              |
|      +--------|--------+     |
|      |        |        |     |
+---------------+        |     |
|      |                 |     |
|      |      image      |     |
|      |                 |     |
|      |                 |     |
|      +-----------------+     |
|             pad              |
+------------------------------+

The mixup transform steps are as follows:

  1. Another random image is picked from the dataset and embedded in the top left patch (after padding and resizing).

  2. The target of the mixup transform is the weighted average of the mixup image and the origin image.
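
Step 2 with the fixed 0.5/0.5 weights used by the reference YOLOX implementation (a minimal sketch):

>>> import numpy as np
>>> img = np.full((2, 2, 3), 200, dtype=np.float32)
>>> mixup_img = np.full((2, 2, 3), 100, dtype=np.float32)
>>> out = 0.5 * img + 0.5 * mixup_img
>>> float(out[0, 0, 0])
150.0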

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).

  • ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).

  • flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.

  • pad_val (int) – Pad value. Defaults to 114.

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int[source]

Call function to collect indexes.

Parameters

dataset (Dataset or list) – The dataset or cached list.

Returns

index.

Return type

int

mix_img_transform(results: dict) → dict[source]

YOLOX MixUp transform function.

Parameters

results (dict) – Result dict.

Returns

Updated result dict.

Return type

results (dict)

class mmyolo.datasets.transforms.YOLOv5CopyPaste(ioa_thresh: float = 0.3, prob: float = 0.5)[source]

Copy-Paste used in YOLOv5 and YOLOv8.

This transform randomly copies some objects in the image to mirrored positions in the image. It is different from the CopyPaste in mmdet.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (BaseBoxes[torch.float32])

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (PolygonMasks) (optional)

Modified Keys:

  • img

  • gt_bboxes

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (optional)

  • gt_masks (optional)

Parameters
  • ioa_thresh (float) – IoA threshold for deciding a valid bbox. Defaults to 0.3.

  • prob (float) – Probability of choosing objects. Defaults to 0.5.

static bbox_ioa(gt_bboxes_flip: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, gt_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, eps: float = 1e-07) → numpy.ndarray[source]

Calculate the IoA (intersection over area) between gt_bboxes_flip and gt_bboxes.

Parameters
  • gt_bboxes_flip (HorizontalBoxes) – Flipped ground truth bounding boxes.

  • gt_bboxes (HorizontalBoxes) – Ground truth bounding boxes.

  • eps (float) – A small epsilon to avoid division by zero. Defaults to 1e-7.

Returns

IoA.

Return type

np.ndarray
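
IoA divides the intersection area by the area of the second box set (not the union); a minimal numpy sketch:

>>> import numpy as np
>>> flip = np.array([[0, 0, 10, 10]], dtype=np.float32)
>>> gt = np.array([[5, 5, 15, 15]], dtype=np.float32)
>>> iw = (np.minimum(flip[:, None, 2], gt[None, :, 2])
...       - np.maximum(flip[:, None, 0], gt[None, :, 0])).clip(0)
>>> ih = (np.minimum(flip[:, None, 3], gt[None, :, 3])
...       - np.maximum(flip[:, None, 1], gt[None, :, 1])).clip(0)
>>> area = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
>>> round(float((iw * ih / (area[None, :] + 1e-7))[0, 0]), 4)
0.25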

class mmyolo.datasets.transforms.YOLOv5HSVRandomAug(hue_delta: Union[int, float] = 0.015, saturation_delta: Union[int, float] = 0.7, value_delta: Union[int, float] = 0.4)[source]

Apply HSV augmentation to image sequentially.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • hue_delta ([int, float]) – delta of hue. Defaults to 0.015.

  • saturation_delta ([int, float]) – delta of saturation. Defaults to 0.7.

  • value_delta ([int, float]) – delta of value. Defaults to 0.4.

transform(results: dict) → dict[source]

The HSV augmentation transform function.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmyolo.datasets.transforms.YOLOv5KeepRatioResize(scale: Union[int, Tuple[int, int]], keep_ratio: bool = True, **kwargs)[source]

Resize images & bboxes (if they exist).

This transform resizes the input image according to scale. Bboxes (if present) are then resized with the same scale factor.

Required Keys:

  • img (np.uint8)

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

Modified Keys:

  • img (np.uint8)

  • img_shape (tuple)

  • gt_bboxes (optional)

  • scale (float)

Added Keys:

  • scale_factor (np.float32)

Parameters

scale (Union[int, Tuple[int, int]]) – Image scales for resizing.

class mmyolo.datasets.transforms.YOLOv5MixUp(alpha: float = 32.0, beta: float = 32.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]

MixUp data augmentation for YOLOv5.

The mixup transform steps are as follows:

  1. Another random image is picked from the dataset.

  2. Randomly obtain the fusion ratio from the beta distribution, then fuse the targets of the original image and the mixup image through this ratio.
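
The fusion ratio in step 2 comes from a symmetric beta distribution; with alpha = beta = 32 it concentrates around 0.5:

>>> import numpy as np
>>> ratio = np.random.beta(32.0, 32.0)
>>> 0.0 < ratio < 1.0
True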

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • mix_results (List[dict])

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

Parameters
  • alpha (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.

  • beta (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.

  • pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • prob (float) – Probability of applying this transformation. Defaults to 1.0.

  • use_cached (bool) – Whether to use cache. Defaults to False.

  • max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.

  • random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.

  • max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.

get_indexes(dataset: Union[mmengine.dataset.base_dataset.BaseDataset, list]) → int[source]

Call function to collect indexes.

Parameters

dataset (Dataset or list) – The dataset or cached list.

Returns

index.

Return type

int

mix_img_transform(results: dict) → dict[source]

YOLOv5 MixUp transform function.

Parameters

results (dict) – Result dict

Returns

Updated result dict.

Return type

results (dict)

class mmyolo.datasets.transforms.YOLOv5RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True, min_bbox_size: int = 2, min_area_ratio: float = 0.1, use_mask_refine: bool = False, max_aspect_ratio: float = 20.0, resample_num: int = 1000)[source]

Random affine transform data augmentation in YOLOv5 and YOLOv8. It is different from the implementation in YOLOX.

This operation randomly generates an affine transform matrix which includes rotation, translation, shear and scaling transforms. If you set use_mask_refine == True, the code will use the mask annotations to refine the bboxes. Our implementation is slightly different from the official one. In the COCO dataset, a gt may have multiple mask tags. The official YOLOv5 annotation file already combines the masks that an object has, but our code takes into account the fact that an object may have multiple masks.
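
The affine matrix is a composition of elementary transforms in homogeneous coordinates; a numpy sketch of the rotation/translation part (shear and scaling compose the same way):

>>> import numpy as np
>>> theta = np.deg2rad(10.0)  # sampled from [-max_rotate_degree, max_rotate_degree]
>>> rotation = np.array([[np.cos(theta), -np.sin(theta), 0.],
...                      [np.sin(theta), np.cos(theta), 0.],
...                      [0., 0., 1.]])
>>> translation = np.array([[1., 0., 5.],
...                         [0., 1., 5.],
...                         [0., 0., 1.]])
>>> warp_matrix = translation @ rotation  # applied to (x, y, 1) pixel coords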

Required Keys:

  • img

  • gt_bboxes (BaseBoxes[torch.float32]) (optional)

  • gt_bboxes_labels (np.int64) (optional)

  • gt_ignore_flags (bool) (optional)

  • gt_masks (PolygonMasks) (optional)

Modified Keys:

  • img

  • img_shape

  • gt_bboxes (optional)

  • gt_bboxes_labels (optional)

  • gt_ignore_flags (optional)

  • gt_masks (PolygonMasks) (optional)

Parameters
  • max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.

  • max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.

  • scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).

  • max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.

  • border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).

  • border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).

  • bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some datasets like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Defaults to 2.

  • min_area_ratio (float) – Threshold of area ratio between original bboxes and wrapped bboxes. If smaller than this value, the box will be removed. Defaults to 0.1.

  • use_mask_refine (bool) – Whether to refine bbox by mask. Deprecated.

  • max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) is larger than this value, the box will be removed. Defaults to 20.

  • resample_num (int) – Number of points to resample each polygon to. Defaults to 1000.

clip_polygons(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.mask.structures.PolygonMasks[source]

Function to clip points of polygons with height and width.

Parameters
  • gt_masks (PolygonMasks) – Annotations of instance segmentation.

  • height (int) – height of clip border.

  • width (int) – width of clip border.

Returns

Clip annotations of instance segmentation.

Return type

clipped_masks (PolygonMasks)

filter_gt_bboxes(origin_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, wrapped_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes) → torch.Tensor[source]

Filter gt bboxes.

Parameters
  • origin_bboxes (HorizontalBoxes) – Origin bboxes.

  • wrapped_bboxes (HorizontalBoxes) – Wrapped bboxes.

Returns

The mask of valid bboxes.

Return type

torch.Tensor

resample_masks(gt_masks: mmdet.structures.mask.structures.PolygonMasks) → mmdet.structures.mask.structures.PolygonMasks[source]

Function to resample each mask annotation with shape (2 * n, ) to shape (resample_num * 2, ).

Parameters

gt_masks (PolygonMasks) – Annotations of semantic segmentation.

segment2box(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes[source]

Convert 1 segment label to 1 box label, applying the inside-image constraint, i.e. (xy1, xy2, …) to (xyxy).

Parameters
  • gt_masks (torch.Tensor) – The segment label.

  • width (int) – The width of the image. Defaults to 640.

  • height (int) – The height of the image. Defaults to 640.

Returns

the clip bboxes from gt_masks.

Return type

HorizontalBoxes

warp_mask(gt_masks: mmdet.structures.mask.structures.PolygonMasks, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → mmdet.structures.mask.structures.PolygonMasks[source]

Warp masks by warp_matrix and retain masks inside image after warping.

Parameters
  • gt_masks (PolygonMasks) – Annotations of semantic segmentation.

  • warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).

  • img_w (int) – Width of output image.

  • img_h (int) – Height of output image.

Returns

Masks after warping.

Return type

PolygonMasks

static warp_poly(poly: numpy.ndarray, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → numpy.ndarray[source]

Function to warp one mask and filter points outside image.

Parameters
  • poly (np.ndarray) – Segmentation annotation with shape (n, ) and with format (x1, y1, x2, y2, …).

  • warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).

  • img_w (int) – Width of output image.

  • img_h (int) – Height of output image.

mmyolo.engine

hooks

optimizers

mmyolo.models

backbones

class mmyolo.models.backbones.BaseBackbone(arch_setting: list, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

BaseBackbone backbone used in YOLO series.

Backbone model structure diagram
+-----------+
|   input   |
+-----------+
      v
+-----------+
|   stem    |
|   layer   |
+-----------+
      v
+-----------+
|   stage   |
|  layer 1  |
+-----------+
      v
+-----------+
|   stage   |
|  layer 2  |
+-----------+
      v
    ......
      v
+-----------+
|   stage   |
|  layer n  |
+-----------+
In P5 model, n=4
In P6 model, n=5
Parameters
  • arch_setting (list) – Architecture of BaseBackbone.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to None.

  • act_cfg (dict) – Config dict for activation layer. Defaults to None.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

abstract build_stage_layer(stage_idx: int, setting: list)[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

abstract build_stem_layer()[source]

Build a stem layer.

forward(x: torch.Tensor) → tuple[source]

Forward batch_inputs from the data_preprocessor.

make_stage_plugins(plugins, stage_idx, setting)[source]

Make plugins for the backbone’s stage_idx-th stage.

Currently we support inserting context_block, empirical_attention_block, nonlocal_block and dropout_block into the backbone.

An example of plugins format could be:

Examples

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True)),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True)),
... ]
>>> model = YOLOv5CSPDarknet()
>>> stage_plugins = model.make_stage_plugins(plugins, 0, setting)
>>> assert len(stage_plugins) == 1

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> xxx -> yyy
Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build. If stages is missing, the plugin would be applied to all stages.

  • setting (list) – The architecture setting of a stage layer.

Returns

Plugins for current stage

Return type

list[nn.Module]

train(mode: bool = True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmyolo.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]

CSPNeXt backbone used in RTMDet.

Parameters
  • arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict.
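
Example

A usage sketch mirroring the other backbones; the output shapes below assume the default P5 architecture with deepen_factor and widen_factor of 1.0:

>>> from mmyolo.models import CSPNeXt
>>> import torch
>>> model = CSPNeXt()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)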

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]

Build a stem layer.

class mmyolo.models.backbones.PPYOLOECSPResNet(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, arch_ovewrite: Optional[dict] = None, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': True, 'type': 'PPYOLOEBasicBlock', 'use_alpha': True}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, attention_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_cfg': {'type': 'HSigmoid'}, 'type': 'EffectiveSELayer'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_large_stem: bool = False)[source]

CSP-ResNet backbone used in PPYOLOE.

Parameters
  • arch (str) – Architecture of CSP-ResNet, from {P5, P6}. Defaults to P5.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.

  • block_cfg (dict) – Config dict for block. Defaults to dict(type=’PPYOLOEBasicBlock’, shortcut=True, use_alpha=True)

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • attention_cfg (dict) – Config dict for EffectiveSELayer. Defaults to dict(type=’EffectiveSELayer’, act_cfg=dict(type=’HSigmoid’)).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (ConfigDict or dict or list[ConfigDict or dict], optional) – Initialization config dict.

  • use_large_stem (bool) – Whether to use large stem layer. Defaults to False.

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]

Build a stem layer.

class mmyolo.models.backbones.YOLOXCSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, spp_kernal_sizes: Tuple[int] = (5, 9, 13), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

CSP-Darknet backbone used in YOLOX.

Parameters
  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOXCSPDarknet
>>> import torch
>>> model = YOLOXCSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]

Build a stem layer.

class mmyolo.models.backbones.YOLOv5CSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

CSP-Darknet backbone used in YOLOv5.

Parameters
  • arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv5CSPDarknet
>>> import torch
>>> model = YOLOv5CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]

Build a stem layer.

init_weights()[source]

Initialize the parameters.

class mmyolo.models.backbones.YOLOv6CSPBep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, hidden_ratio: float = 0.5, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'ConvWrapper'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

CSPBep backbone used in YOLOv6.

Parameters
  • arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv6CSPBep
>>> import torch
>>> model = YOLOv6CSPBep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

class mmyolo.models.backbones.YOLOv6EfficientRep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

EfficientRep backbone used in YOLOv6.

Parameters
  • arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv6EfficientRep
>>> import torch
>>> model = YOLOv6EfficientRep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)

build_stage_layer(stage_idx: int, setting: list) → list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer() → torch.nn.modules.module.Module[source]

Build a stem layer.

init_weights()[source]

Initialize the weights.

class mmyolo.models.backbones.YOLOv7Backbone(arch: str = 'L', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Backbone used in YOLOv7.

Parameters
  • arch (str) – Architecture of YOLOv7. Defaults to L.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.
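
A minimal usage sketch, mirroring the examples for the other backbones (the per-level channel counts depend on the chosen arch, so they are not hardcoded here):

>>> from mmyolo.models import YOLOv7Backbone
>>> import torch
>>> model = YOLOv7Backbone()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))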

build_stage_layer(stage_idx: int, setting: list)list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer()torch.nn.modules.module.Module[source]

Build a stem layer.

class mmyolo.models.backbones.YOLOv8CSPDarknet(arch: str = 'P5', last_stage_out_channels: int = 1024, plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

CSP-Darknet backbone used in YOLOv8.

Parameters
  • arch (str) – Architecture of CSP-Darknet, from {P5}. Defaults to P5.

  • last_stage_out_channels (int) – Final layer output channel. Defaults to 1024.

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • input_channels (int) – Number of input image channels. Defaults to 3.

  • out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.

  • init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.

Example

>>> from mmyolo.models import YOLOv8CSPDarknet
>>> import torch
>>> model = YOLOv8CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
build_stage_layer(stage_idx: int, setting: list)list[source]

Build a stage layer.

Parameters
  • stage_idx (int) – The index of a stage layer.

  • setting (list) – The architecture setting of a stage layer.

build_stem_layer()torch.nn.modules.module.Module[source]

Build a stem layer.

init_weights()[source]

Initialize the parameters.

data_preprocessor

dense_heads

class mmyolo.models.dense_heads.PPYOLOEHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.125, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

PPYOLOEHead head used in PPYOLOE. The YOLOv6 head and the PPYOLOE head are only slightly different: distribution focal loss is additionally used in PPYOLOE, but not in YOLOv6.

Parameters
  • head_module (ConfigType) – Base module used for PPYOLOEHead

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_dfl (ConfigDict or dict) – Config of distribution focal loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.
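
A minimal, illustrative config sketch for wiring this head into a detector config (the channel list and class count are placeholders, not a tuned recipe):

>>> bbox_head = dict(
...     type='PPYOLOEHead',
...     head_module=dict(
...         type='PPYOLOEHeadModule',
...         num_classes=80,
...         in_channels=[192, 384, 768],
...         featmap_strides=(8, 16, 32),
...         reg_max=16))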

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

class mmyolo.models.dense_heads.PPYOLOEHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

PPYOLOEHead head module used in PPYOLOE (https://arxiv.org/abs/2203.16250).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • reg_max (int) – Max value of integral set {0, ..., reg_max} in QFL setting. Defaults to 16.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-05).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
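
The role of reg_max can be shown with a small sketch of integral (DFL-style) regression: the head predicts a distribution over {0, ..., reg_max} for each box side, and the expected value of that distribution gives the regressed distance (an illustration of the idea, not the library code):

>>> import torch
>>> reg_max = 16
>>> logits = torch.rand(4, reg_max + 1)        # one distribution per box side
>>> proj = torch.arange(reg_max + 1, dtype=torch.float32)
>>> distances = logits.softmax(dim=-1) @ proj  # expected offset per side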

forward(x: Tuple[torch.Tensor])torch.Tensor[source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions.

Return type

Tuple[List]

forward_single(x: torch.Tensor, cls_stem: torch.nn.modules.container.ModuleList, cls_pred: torch.nn.modules.container.ModuleList, reg_stem: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList)torch.Tensor[source]

Forward feature of a single scale level.

init_weights(prior_prob=0.01)[source]

Initialize the weight and bias of PPYOLOE head.

class mmyolo.models.dense_heads.RTMDetHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

RTMDet head.

Parameters
  • head_module (ConfigType) – Base module used for RTMDetHead

  • prior_generator – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Compute losses of the head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

special_init()[source]

YOLO series algorithms inherit from YOLOv5Head, but different algorithms require their own special initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.RTMDetInsSepBNHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, loss_mask={'eps': 5e-06, 'loss_weight': 2.0, 'reduction': 'mean', 'type': 'mmdet.DiceLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

RTMDet Instance Segmentation head.

Parameters
  • head_module (ConfigType) – Base module used for RTMDetInsSepBNHead

  • prior_generator – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_mask (ConfigDict or dict) – Config of mask loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Compute losses of the head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

parse_dynamic_params(flatten_kernels: torch.Tensor)tuple[source]

Split the kernel head predictions into conv weights and biases.
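
A sketch of the idea, in the CondInst style of dynamic convolutions (the layer sizes below are illustrative and assume dyconv_channels=8 with three 1x1 dynamic convs; in the real head the first conv’s in-channels also include extra coordinate channels):

>>> import torch
>>> weight_nums = [8 * 8, 8 * 8, 8 * 1]  # flattened 1x1 conv weights per layer
>>> bias_nums = [8, 8, 1]
>>> flat = torch.rand(10, sum(weight_nums) + sum(bias_nums))  # 10 instances
>>> splits = torch.split(flat, weight_nums + bias_nums, dim=1)
>>> weights, biases = splits[:3], splits[3:]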

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], kernel_preds: List[torch.Tensor], mask_feats: torch.Tensor, score_factors: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True)List[mmengine.structures.instance_data.InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it then obtain the real score used in NMS.

Parameters
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • kernel_preds (list[Tensor]) – Kernel predictions of dynamic convs for all scale levels, each is a 4D-tensor, has shape (batch_size, num_params, H, W).

  • mask_feats (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, num_prototypes, H, W).

  • score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection and instance segmentation results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instance, )

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

  • masks (Tensor): Has a shape (num_instances, h, w).

Return type

list[InstanceData]

class mmyolo.models.dense_heads.RTMDetInsSepBNHeadModule(num_classes: int, *args, num_prototypes: int = 8, dyconv_channels: int = 8, num_dyconvs: int = 3, use_sigmoid_cls: bool = True, **kwargs)[source]

Detection and Instance Segmentation Head of RTMDet.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • num_prototypes (int) – Number of mask prototype features extracted from the mask head. Defaults to 8.

  • dyconv_channels (int) – Channel of the dynamic conv layers. Defaults to 8.

  • num_dyconvs (int) – Number of the dynamic convolution layers. Defaults to 3.

  • use_sigmoid_cls (bool) – Use sigmoid for class prediction. Defaults to True.

forward(feats: Tuple[torch.Tensor, ...])tuple[source]

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox prediction.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • kernel_preds (list[Tensor]): Dynamic conv kernels for all scale levels, each is a 4D-tensor, the channels number is num_gen_params.

  • mask_feat (Tensor): Mask prototype features.

    Has shape (batch_size, num_prototypes, H, W).

Return type

tuple

init_weights()None[source]

Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetRotatedHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistanceAnglePointCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'mode': 'linear', 'type': 'mmrotate.RotatedIoULoss'}, angle_version: str = 'le90', use_hbbox_loss: bool = False, angle_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmrotate.PseudoAngleCoder'}, loss_angle: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

RTMDet-R head.

Compared with RTMDetHead, RTMDetRotatedHead add some args to support rotated object detection.

  • angle_version used to limit angle_range during training.

  • angle_coder used to encode and decode angle, which is similar to bbox_coder.

  • use_hbbox_loss and loss_angle allow custom regression loss calculation for rotated box.

    There are three combination options for regression:

    1. use_hbbox_loss=False and loss_angle is None.

    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
        └────►decode──(a)─┘
    
    2. use_hbbox_loss=False and loss_angle is specified. An angle loss is added on angle_pred.

    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
        ├────►decode──(a)─┘
        │
        └───────────────────────────────────────────►loss_angle
    
    3. use_hbbox_loss=True and loss_angle is specified. In this case the loss_angle must be set.

    bbox_pred──(tblr)──►decode──►hbox_pred──(xyxy)──►loss_bbox
    
    angle_pred──────────────────────────────────────►loss_angle
    
  • There’s a decoded_with_angle flag in test_cfg, which controls how boxes are decoded at test time, similar to the decoding used during training (a small sketch of the decoded_with_angle=False path follows the diagrams below).

    When decoded_with_angle=True:

    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──(xywha)──►rbox_pred
        │                 ▲
        └────►decode──(a)─┘
    

    When decoded_with_angle=False:

    bbox_pred──(tblr)─►decode
                          │ (xyxy)
                          ▼
                       format───(xywh)──►concat──(xywha)──►rbox_pred
                                           ▲
    angle_pred────────►decode────(a)───────┘
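
As a concrete sketch of the decoded_with_angle=False path above (the helper name is hypothetical; an illustration of the format-and-concat step, not the library code):

>>> import torch
>>> def rbox_from_hbox_and_angle(hbox_xyxy, angle):
...     # hbox_xyxy: (N, 4) decoded horizontal boxes; angle: (N, 1) decoded angles
...     cx = (hbox_xyxy[:, 0] + hbox_xyxy[:, 2]) / 2
...     cy = (hbox_xyxy[:, 1] + hbox_xyxy[:, 3]) / 2
...     w = hbox_xyxy[:, 2] - hbox_xyxy[:, 0]
...     h = hbox_xyxy[:, 3] - hbox_xyxy[:, 1]
...     return torch.cat([torch.stack([cx, cy, w, h], dim=1), angle], dim=1)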
    
Parameters
  • head_module (ConfigType) – Base module used for RTMDetRotatedHead.

  • prior_generator – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • angle_version (str) – Angle representations. Defaults to ‘le90’.

  • use_hbbox_loss (bool) – If true, use horizontal bbox loss and loss_angle should not be None. Defaults to False.

  • angle_coder (ConfigDict or dict) – Config of angle coder.

  • loss_angle (ConfigDict or dict, optional) – Config of angle loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Compute losses of the head.

Parameters
  • cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.

  • angle_preds (list[Tensor]) – Angle prediction for each scale level with shape (N, num_anchors * angle_out_dim, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True)List[mmengine.structures.instance_data.InstanceData][source]

Transform a batch of output features extracted by the head into bbox results.

Parameters
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).

  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W)

  • objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instance, ).

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arranged as (x, y, w, h, angle).

Return type

list[InstanceData]

class mmyolo.models.dense_heads.RTMDetRotatedSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, angle_out_dim: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Detection Head Module of RTMDet-R.

Compared with RTMDet Detection Head Module, RTMDet-R adds a conv for angle prediction. An angle_out_dim arg is added, which is generated by the angle_coder module and controls the angle pred dim.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.

  • feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • share_conv (bool) – Whether to share conv layers between stages. Defaults to True.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.

  • angle_out_dim (int) – Encoded length of angle, which will be passed by the head. Defaults to 1.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...])tuple[source]

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox prediction.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

  • angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_out_dim.

Return type

tuple

init_weights()None[source]

Initialize weights of the head.

class mmyolo.models.dense_heads.RTMDetSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Detection Head of RTMDet.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (int) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.

  • feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • share_conv (bool) – Whether to share conv layers between stages. Defaults to True.

  • pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(feats: Tuple[torch.Tensor, ...])tuple[source]

Forward features from the upstream network.

Parameters

feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

Usually a tuple of classification scores and bbox prediction.

  • cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.

  • bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.

Return type

tuple

init_weights()None[source]

Initialize weights of the head.
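
A minimal forward sketch, assuming the module builds its prediction layers at construction time (as MMYOLO head modules typically do) and that three stride-{8, 16, 32} feature levels of a 640x640 input are fed in:

>>> from mmyolo.models.dense_heads import RTMDetSepBNHeadModule
>>> import torch
>>> head = RTMDetSepBNHeadModule(num_classes=80, in_channels=256)
>>> head.eval()
>>> feats = tuple(torch.rand(1, 256, s, s) for s in (80, 40, 20))
>>> cls_scores, bbox_preds = head(feats)
>>> len(cls_scores), len(bbox_preds)
(3, 3)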

class mmyolo.models.dense_heads.YOLOXHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOXBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-16, 'loss_weight': 5.0, 'mode': 'square', 'reduction': 'sum', 'type': 'mmdet.IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox_aux: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOXHead head used in YOLOX.

Parameters
  • head_module (ConfigType) – Base module used for YOLOXHead

  • prior_generator – Points generator for feature maps in 2D points-based detectors.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_obj (ConfigDict or dict) – Config of objectness loss.

  • loss_bbox_aux (ConfigDict or dict) – Config of bbox aux loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]

static gt_instances_preprocess(batch_gt_instances: torch.Tensor, batch_size: int)List[mmengine.structures.instance_data.InstanceData][source]

Split batch_gt_instances with batch size.

Parameters
  • batch_gt_instances (Tensor) – Ground truth as a 2D Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns

Batch gt instances data, a list of InstanceData with length batch_size.

Return type

List
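
A sketch of the splitting logic (assuming the collated layout where column 0 holds the image index within the batch, as produced by yolov5_collate):

>>> import torch
>>> # two images; three gt boxes total, each row [batch_idx, label, x1, y1, x2, y2]
>>> batch_gt = torch.tensor([[0., 1, 10, 10, 50, 50],
...                          [0., 2, 20, 20, 60, 60],
...                          [1., 0, 5, 5, 30, 30]])
>>> per_image = [batch_gt[batch_gt[:, 0] == i, 1:] for i in range(2)]
>>> [p.shape[0] for p in per_image]
[2, 1]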

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]

YOLO series algorithms inherit from YOLOv5Head, but different algorithms require their own special initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOXHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], use_depthwise: bool = False, dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOXHead head module used in YOLOX (https://arxiv.org/abs/2107.08430).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid

  • stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].

  • use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.

  • dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.

  • conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Defaults to “auto”.

  • conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]

forward_single(x: torch.Tensor, cls_convs: torch.nn.modules.module.Module, reg_convs: torch.nn.modules.module.Module, conv_cls: torch.nn.modules.module.Module, conv_reg: torch.nn.modules.module.Module, conv_obj: torch.nn.modules.module.Module)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward feature of a single scale level.

init_weights()[source]

Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOXPoseHead(loss_pose: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, *args, **kwargs)[source]

YOLOXPoseHead head used in YOLO-Pose (https://arxiv.org/abs/2204.06806).

Parameters
  • loss_pose (ConfigDict, optional) – Config of keypoint OKS loss.

decode_pose(grids: torch.Tensor, offsets: torch.Tensor, strides: Union[torch.Tensor, int])torch.Tensor[source]

Decode regression offsets to keypoints.

Parameters
  • grids (torch.Tensor) – The coordinates of the feature map grids.

  • offsets (torch.Tensor) – The predicted offset of each keypoint relative to its corresponding grid.

  • strides (torch.Tensor | int) – The stride of the feature map for each instance.

Returns

The decoded keypoints coordinates.

Return type

torch.Tensor
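
One plausible decoding sketch consistent with the parameter descriptions above (grids as prior centers in image coordinates, offsets in stride units; decode_pose_sketch is a hypothetical helper, not the exact implementation):

>>> import torch
>>> def decode_pose_sketch(grids, offsets, strides):
...     # grids: (N, 2); offsets: (N, K, 2); strides: (N,) tensor or int
...     if isinstance(strides, int):
...         strides = torch.full((grids.shape[0],), float(strides))
...     return grids[:, None, :] + offsets * strides[:, None, None]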

static gt_instances_preprocess(batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], *args, **kwargs)List[mmengine.structures.instance_data.InstanceData][source]

Split batch_gt_instances with batch size.

Parameters
  • batch_gt_instances (Tensor) – Ground truth as a 2D Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns

Batch gt instances data, a list of InstanceData with length batch_size.

Return type

List

static gt_kps_instances_preprocess(batch_gt_instances: torch.Tensor, batch_gt_keypoints, batch_gt_keypoints_visible, batch_size: int)List[mmengine.structures.instance_data.InstanceData][source]

Split batch_gt_instances with batch size.

Parameters
  • batch_gt_instances (Tensor) – Ground truth as a 2D Tensor for the whole batch, with shape [all_gt_bboxes, 6].

  • batch_size (int) – Batch size.

Returns

Batch gt instances data, a list of InstanceData with length batch_size.

Return type

List

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict])dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], kpt_preds: Sequence[torch.Tensor], vis_preds: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_gt_keypoints: torch.Tensor, batch_gt_keypoints_visible: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

In addition to the base class method, keypoint losses are also calculated in this method.

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, kpt_preds: Optional[List[torch.Tensor]] = None, vis_preds: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True)List[mmengine.structures.instance_data.InstanceData][source]

Transform a batch of output features extracted by the head into bbox and keypoint results.

In addition to the base class method, keypoint predictions are also calculated in this method.

class mmyolo.models.dense_heads.YOLOXPoseHeadModule(num_keypoints: int, *args, **kwargs)[source]

YOLOXPoseHeadModule serves as a head module for YOLOX-Pose.

In comparison to YOLOXHeadModule, this module introduces branches for keypoint prediction.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

init_weights()[source]

Initialize weights of the head.

class mmyolo.models.dense_heads.YOLOv5Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'base_sizes': [[(10, 13), (16, 30), (33, 23)], [(30, 61), (62, 45), (59, 119)], [(116, 90), (156, 198), (373, 326)]], 'strides': [8, 16, 32], 'type': 'mmdet.YOLOAnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOv5BBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xywh', 'eps': 1e-07, 'iou_mode': 'ciou', 'loss_weight': 0.05, 'reduction': 'mean', 'return_iou': True, 'type': 'IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, prior_match_thr: float = 4.0, near_neighbor_thr: float = 0.5, ignore_iof_thr: float = - 1.0, obj_level_weights: List[float] = [4.0, 1.0, 0.4], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv5Head head used in YOLOv5.

Parameters
  • head_module (ConfigType) – Base module used for YOLOv5Head

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_obj (ConfigDict or dict) – Config of objectness loss.

  • prior_match_thr (float) – Defaults to 4.0.

  • near_neighbor_thr (float) – Defaults to 0.5.

  • ignore_iof_thr (float) – Defaults to -1.0.

  • obj_level_weights (List[float]) – Defaults to [4.0, 1.0, 0.4].

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.
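
A minimal, illustrative config sketch under the default priors and strides (the channel list and class count are placeholders, not a tuned recipe):

>>> bbox_head = dict(
...     type='YOLOv5Head',
...     head_module=dict(
...         type='YOLOv5HeadModule',
...         num_classes=80,
...         in_channels=[256, 512, 1024],
...         widen_factor=1.0,
...         featmap_strides=(8, 16, 32)))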

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict])dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True)List[mmengine.structures.instance_data.InstanceData][source]

Transform a batch of output features extracted by the head into bbox results.

Parameters
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instance, )

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

Return type

list[InstanceData]

special_init()[source]

YOLO series algorithms inherit from YOLOv5Head, but different algorithms require their own special initialization process.

The special_init function is designed to handle this situation.

class mmyolo.models.dense_heads.YOLOv5HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv5Head head module used in YOLOv5.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).

  • init_cfg (ConfigDict or dict or list[ConfigDict] or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]
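
A minimal forward sketch, assuming three stride-{8, 16, 32} feature levels of a 640x640 input and the default three priors per location (the module builds its prediction convs at construction time, as MMYOLO head modules typically do):

>>> from mmyolo.models.dense_heads import YOLOv5HeadModule
>>> import torch
>>> head = YOLOv5HeadModule(num_classes=80, in_channels=[256, 512, 1024])
>>> feats = (torch.rand(1, 256, 80, 80),
...          torch.rand(1, 512, 40, 40),
...          torch.rand(1, 1024, 20, 20))
>>> cls_scores, bbox_preds, objectnesses = head(feats)
>>> len(cls_scores)
3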

forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward feature of a single scale level.

init_weights()[source]

Initialize the bias of YOLOv5 head.

class mmyolo.models.dense_heads.YOLOv5InsHead(*args, mask_overlap: bool = True, loss_mask: Union[mmengine.config.config.ConfigDict, dict] = {'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_mask_weight=0.05, **kwargs)[source]

YOLOv5 Instance Segmentation and Detection head.

Parameters
  • mask_overlap (bool) – Defaults to True.

  • loss_mask (ConfigDict or dict) – Config of mask loss.

  • loss_mask_weight (float) – The weight of mask loss.

crop_mask(masks: torch.Tensor, boxes: torch.Tensor)torch.Tensor[source]

Crop mask by the bounding box.

Parameters
  • masks (Tensor) – Predicted mask results. Has shape (1, num_instance, H, W).

  • boxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).

Returns

The masks cropped to their bounding boxes.

Return type

(torch.Tensor)
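
A sketch of the cropping idea (zero out mask pixels outside each instance’s box; crop_mask_sketch is a hypothetical stand-in and assumes boxes are given in mask-pixel coordinates):

>>> import torch
>>> def crop_mask_sketch(masks, boxes):
...     # masks: (1, N, H, W); boxes: (N, 4) as (x1, y1, x2, y2)
...     _, n, h, w = masks.shape
...     x1, y1, x2, y2 = (boxes[:, i].view(1, n, 1, 1) for i in range(4))
...     cols = torch.arange(w).view(1, 1, 1, w)
...     rows = torch.arange(h).view(1, 1, h, 1)
...     keep = (cols >= x1) & (cols < x2) & (rows >= y1) & (rows < y2)
...     return masks * keep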

loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict])dict[source]

Perform forward propagation and loss calculation of the detection head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.

Returns

A dictionary of loss components.

Return type

dict

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], coeff_preds: Sequence[torch.Tensor], proto_preds: torch.Tensor, batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_gt_masks: Sequence[torch.Tensor], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • coeff_preds (Sequence[Tensor]) – Mask coefficient for each scale level, each is a 4D-tensor, the channel number is num_priors * mask_channels.

  • proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).

  • batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_gt_masks (Sequence[Tensor]) – Batch of gt_mask.

  • batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, coeff_preds: Optional[List[torch.Tensor]] = None, proto_preds: Optional[torch.Tensor] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True)List[mmengine.structures.instance_data.InstanceData][source]

Transform a batch of output features extracted from the head into bbox results.

Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real scores used in NMS.

Parameters
  • cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
  • bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).

  • objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • coeff_preds (list[Tensor]) – Mask coefficients predictions for all scale levels, each is a 4D-tensor, has shape (batch_size, mask_channels, H, W).

  • proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).

  • batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.

  • cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.

  • rescale (bool) – If True, return boxes in original image space. Defaults to True.

  • with_nms (bool) – If True, do nms before return boxes. Defaults to True.

Returns

Object detection and instance segmentation results of each image after the post process. Each item usually contains the following keys.

  • scores (Tensor): Classification scores, has a shape (num_instance, )

  • labels (Tensor): Labels of bboxes, has a shape (num_instances, ).

  • bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arrange as (x1, y1, x2, y2).

  • masks (Tensor): Has a shape (num_instances, h, w).

Return type

list[InstanceData]

process_mask(mask_proto: torch.Tensor, mask_coeff_pred: torch.Tensor, bboxes: torch.Tensor, shape: Tuple[int, int], upsample: bool = False)torch.Tensor[source]

Generate mask logits results.

Parameters
  • mask_proto (Tensor) – Mask prototype features. Has shape (mask_channels, H, W).

  • mask_coeff_pred (Tensor) – Mask coefficients prediction for a single image. Has shape (num_instance, mask_channels).

  • bboxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).

  • shape (Tuple) – Batch input shape of the image.

  • upsample (bool) – Whether to upsample the mask results to the batch input shape. Defaults to False.

Returns

Instance segmentation masks for each instance.

Has shape (num_instance, H, W).

Return type

Tensor

class mmyolo.models.dense_heads.YOLOv5InsHeadModule(*args, num_classes: int, mask_channels: int = 32, proto_channels: int = 256, widen_factor: float = 1.0, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]

Detection and Instance Segmentation Head of YOLOv5.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • mask_channels (int) – Number of channels in the mask feature map. Defaults to 32.

  • proto_channels (int) – Number of channels in the proto feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Default: dict(type=’SiLU’, inplace=True).

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, objectnesses, and mask predictions.

Return type

Tuple[List]

forward_single(x: torch.Tensor, convs_pred: torch.nn.modules.module.Module)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Forward feature of a single scale level.

class mmyolo.models.dense_heads.YOLOv6Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv6Head head used in YOLOv6.

Parameters
  • head_module (ConfigType) – Base module used for YOLOv6Head.

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level, with shape (bs, reg_max + 1, H*W, 4).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]

YOLO series algorithms inherit from YOLOv5Head, but different algorithms have their own special initialization processes.

The special_init function is designed to deal with this situation.

class mmyolo.models.dense_heads.YOLOv6HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, reg_max=0, featmap_strides: Sequence[int] = (8, 16, 32), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv6Head head module used in YOLOv6 (https://arxiv.org/pdf/2209.02976).

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • reg_max (int) – Max value of the integral set {0, ..., reg_max-1} in QFL setting. Defaults to 0.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions.

Return type

Tuple[List]

forward_single(x: torch.Tensor, stem: torch.nn.modules.module.Module, cls_conv: torch.nn.modules.module.Module, cls_pred: torch.nn.modules.module.Module, reg_conv: torch.nn.modules.module.Module, reg_pred: torch.nn.modules.module.Module)Tuple[torch.Tensor, torch.Tensor][source]

Forward feature of a single scale level.

init_weights()[source]

Initialize the weights.

class mmyolo.models.dense_heads.YOLOv7Head(*args, simota_candidate_topk: int = 20, simota_iou_weight: float = 3.0, simota_cls_weight: float = 1.0, aux_loss_weights: float = 0.25, **kwargs)[source]

YOLOv7Head head used in YOLOv7.

Parameters
  • simota_candidate_topk (int) – The candidate top-k which used to get top-k ious to calculate dynamic-k in BatchYOLOv7Assigner. Defaults to 20.

  • simota_iou_weight (float) – The scale factor for regression iou cost in BatchYOLOv7Assigner. Defaults to 3.0.

  • simota_cls_weight (float) – The scale factor for classification cost in BatchYOLOv7Assigner. Defaults to 1.0.

  • aux_loss_weights (float) – The weight of the auxiliary head loss. Defaults to 0.25.

loss_by_feat(cls_scores: Sequence[Union[torch.Tensor, List]], bbox_preds: Sequence[Union[torch.Tensor, List]], objectnesses: Sequence[Union[torch.Tensor, List]], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

class mmyolo.models.dense_heads.YOLOv7HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv7Head head module used in YOLOv7.

init_weights()[source]

Initialize the bias of YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv7p6HeadModule(*args, main_out_channels: Sequence[int] = [256, 512, 768, 1024], aux_out_channels: Sequence[int] = [320, 640, 960, 1280], use_aux: bool = True, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]

YOLOv7Head head module used in YOLOv7.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores, bbox predictions, and objectnesses.

Return type

Tuple[List]

forward_single(x: torch.Tensor, convs: torch.nn.modules.module.Module, aux_convs: Optional[torch.nn.modules.module.Module])Tuple[Union[torch.Tensor, List], Union[torch.Tensor, List], Union[torch.Tensor, List]][source]

Forward feature of a single scale level.

init_weights()[source]

Initialize the bias of YOLOv7 head.

class mmyolo.models.dense_heads.YOLOv8Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'ciou', 'loss_weight': 7.5, 'reduction': 'sum', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl={'loss_weight': 0.375, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv8Head head used in YOLOv8.

Parameters
  • head_module (ConfigDict or dict) – Base module used for YOLOv8Head.

  • prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.

  • bbox_coder (ConfigDict or dict) – Config of bbox coder.

  • loss_cls (ConfigDict or dict) – Config of classification loss.

  • loss_bbox (ConfigDict or dict) – Config of localization loss.

  • loss_dfl (ConfigDict or dict) – Config of Distribution Focal Loss.

  • train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None)dict[source]

Calculate the loss based on the features extracted by the detection head.

Parameters
  • cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.

  • bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.

  • bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).

  • batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.

  • batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.

  • batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.

Returns

A dictionary of losses.

Return type

dict[str, Tensor]

special_init()[source]

YOLO series algorithms inherit from YOLOv5Head, but different algorithms have their own special initialization processes.

The special_init function is designed to deal with this situation.

class mmyolo.models.dense_heads.YOLOv8HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

YOLOv8HeadModule head module used in YOLOv8.

Parameters
  • num_classes (int) – Number of categories excluding the background category.

  • in_channels (Union[int, Sequence]) – Number of channels in the input feature map.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_base_priors (int) – The number of priors (points) at a point on the feature grid.

  • featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].

  • reg_max (int) – Max value of the integral set {0, ..., reg_max-1} in QFL setting. Defaults to 16.

  • norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: Tuple[torch.Tensor])Tuple[List][source]

Forward features from the upstream network.

Parameters

x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.

Returns

A tuple of multi-level classification scores and bbox predictions.

Return type

Tuple[List]

forward_single(x: torch.Tensor, cls_pred: torch.nn.modules.container.ModuleList, reg_pred: torch.nn.modules.container.ModuleList)Tuple[source]

Forward feature of a single scale level.

init_weights(prior_prob=0.01)[source]

Initialize the weight and bias of the YOLOv8 head.

detectors

class mmyolo.models.detectors.YOLODetector(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Union[mmengine.config.config.ConfigDict, dict], bbox_head: Union[mmengine.config.config.ConfigDict, dict], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_syncbn: bool = True)[source]

Implementation of the YOLO series.

Parameters
  • backbone (ConfigDict or dict) – The backbone config.

  • neck (ConfigDict or dict) – The neck config.

  • bbox_head (ConfigDict or dict) – The bbox head config.

  • train_cfg (ConfigDict or dict, optional) – The training config of YOLO. Defaults to None.

  • test_cfg (ConfigDict or dict, optional) – The testing config of YOLO. Defaults to None.

  • data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.

  • init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.

  • use_syncbn (bool) – Whether to use SyncBatchNorm. Defaults to True.

layers

class mmyolo.models.layers.BepC3StageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, hidden_ratio: float = 0.5, concat_all_layer: bool = True, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]

Beer-mug RepC3 Block.

Parameters
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • num_blocks (int) – Number of blocks. Defaults to 1

  • hidden_ratio (float) – Hidden channel expansion. Default: 0.5

  • concat_all_layer (bool) – Whether to concatenate all layers in the forward computation. Default: True

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • norm_cfg (ConfigType) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (ConfigType) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.BiFusion(in_channels0: int, in_channels1: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]

BiFusion Block in YOLOv6.

BiFusion fuses current-, high- and low-level features. Compared with concatenation in PAN, it fuses an extra low-level feature.

Parameters
  • in_channels0 (int) – The channels of current-level feature.

  • in_channels1 (int) – The input channels of lower-level feature.

  • out_channels (int) – The out channels of the BiFusion module.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: List[torch.Tensor])torch.Tensor[source]

Forward process.

Parameters

x (List[Tensor]) – The tensor list of length 3. x[0] is the high-level feature, x[1] is the current-level feature, and x[2] is the low-level feature.

class mmyolo.models.layers.CSPLayerWithTwoConv(in_channels: int, out_channels: int, expand_ratio: float = 0.5, num_blocks: int = 1, add_identity: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Cross Stage Partial Layer with 2 convolutions.

Parameters
  • in_channels (int) – The input channels of the CSP layer.

  • out_channels (int) – The output channels of the CSP layer.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • num_blocks (int) – Number of blocks. Defaults to 1

  • add_identity (bool) – Whether to add identity in blocks. Defaults to True.

  • conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

class mmyolo.models.layers.DarknetBottleneck(in_channels: int, out_channels: int, expansion: float = 0.5, kernel_size: Sequence[int] = (1, 3), padding: Sequence[int] = (0, 1), add_identity: bool = True, use_depthwise: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

The basic bottleneck block used in Darknet.

Each ResBlock consists of two ConvModules, and the input is added to the final output. Each ConvModule is composed of Conv, BN, and LeakyReLU. The first conv layer has a k1 x k1 filter and the second a k2 x k2 filter, where (k1, k2) is given by kernel_size.

Note: this DarknetBottleneck differs slightly from MMDet's in that the kernel size and padding of each conv can be changed.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • expansion (float) – The ratio used to compute the hidden channels (hidden_channels = out_channels * expansion). Defaults to 0.5.

  • kernel_size (Sequence[int]) – The kernel size of the convolution. Defaults to (1, 3).

  • padding (Sequence[int]) – The padding size of the convolution. Defaults to (0, 1).

  • add_identity (bool) – Whether to add identity to the output. Defaults to True.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).

class mmyolo.models.layers.EELANBlock(num_elan_block: int, **kwargs)[source]

Expand efficient layer aggregation networks for YOLOv7.

Parameters

num_elan_block (int) – The number of ELANBlock.

forward(x: torch.Tensor)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmyolo.models.layers.ELANBlock(in_channels: int, out_channels: int, middle_ratio: float, block_ratio: float, num_blocks: int = 2, num_convs_in_block: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Efficient layer aggregation networks for YOLOv7.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels.

  • block_ratio (float) – The scaling ratio of the block layer based on the in_channels.

  • num_blocks (int) – The number of blocks in the main branch. Defaults to 2.

  • num_convs_in_block (int) – The number of convs per block. Defaults to 1.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.EffectiveSELayer(channels: int, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'HSigmoid'})[source]

Effective Squeeze-Excitation.

From CenterMask: Real-Time Anchor-Free Instance Segmentation (https://arxiv.org/abs/1911.06667). This code references https://github.com/youngwanLEE/CenterMask/blob/72147e8aae673fcaf4103ee90a6a6b73863e7fa1/maskrcnn_benchmark/modeling/backbone/vovnet.py#L108-L121

Parameters
  • channels (int) – The input and output channels of this Module.

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’HSigmoid’).

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[source]

Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLO.

Parameters
  • model (nn.Module) – The model to be averaged.

  • momentum (float) – The momentum used for updating the ema parameter. The ema's parameters are updated with the formula: averaged_param = (1 - momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.

  • gamma (int) – Use a larger momentum early in training and gradually annealing to a smaller value to update the ema model smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.

  • interval (int) – Interval between two updates. Defaults to 1.

  • device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.

  • update_buffers (bool) – if True, it will compute running averages for both the parameters and the buffers of the model. Defaults to False.

avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int)[source]

Compute the moving average of the parameters using the exponential momentum strategy.

Parameters
  • averaged_param (Tensor) – The averaged parameters.

  • source_param (Tensor) – The source parameters.

  • steps (int) – The number of times the parameters have been updated.

update_parameters(model: torch.nn.modules.module.Module)[source]

Update the parameters after each training step.

Parameters

model (nn.Module) – The model whose parameters need to be updated.

class mmyolo.models.layers.ImplicitA(in_channels: int, mean: float = 0.0, std: float = 0.02)[source]

Implicit add layer in YOLOv7.

Parameters
  • in_channels (int) – The input channels of this Module.

  • mean (float) – Mean value of implicit module. Defaults to 0.

  • std (float) – Std value of implicit module. Defaults to 0.02

forward(x)[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.ImplicitM(in_channels: int, mean: float = 1.0, std: float = 0.02)[source]

Implicit multiplier layer in YOLOv7.

Parameters
  • in_channels (int) – The input channels of this Module.

  • mean (float) – Mean value of implicit module. Defaults to 1.

  • std (float) – Std value of implicit module. Defaults to 0.02.

forward(x)[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.MaxPoolAndStrideConvBlock(in_channels: int, out_channels: int, maxpool_kernel_sizes: int = 2, use_in_channels_of_middle: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Max pooling and stride conv layer for YOLOv7.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • maxpool_kernel_sizes (int) – kernel sizes of pooling layers. Defaults to 2.

  • use_in_channels_of_middle (bool) – Whether to calculate middle channels based on in_channels. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.PPYOLOEBasicBlock(in_channels: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, shortcut: bool = True, use_alpha: bool = False)[source]

PPYOLOE Backbone BasicBlock.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • shortcut (bool) – Whether to add inputs and outputs together at the end of this layer. Defaults to True.

  • use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. Defaults to False.

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

inputs (Tensor) – The input tensor.

Returns

The output tensor.

Return type

Tensor

class mmyolo.models.layers.RepStageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, bottle_block: torch.nn.modules.module.Module = <class 'mmyolo.models.layers.yolo_bricks.RepVGGBlock'>, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'})[source]

RepStageBlock is a stage block with rep-style basic block.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • num_blocks (int, tuple[int]) – Number of blocks. Defaults to 1.

  • bottle_block (nn.Module) – Basic unit of RepStage. Defaults to RepVGGBlock.

  • block_cfg (ConfigType) – Config of RepStage. Defaults to dict(type='RepVGGBlock').

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

Returns

The output tensor.

Return type

Tensor

class mmyolo.models.layers.RepVGGBlock(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]] = 3, stride: Union[int, Tuple[int]] = 1, padding: Union[int, Tuple[int]] = 1, dilation: Union[int, Tuple[int]] = 1, groups: Optional[int] = 1, padding_mode: Optional[str] = 'zeros', norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, use_se: bool = False, use_alpha: bool = False, use_bn_first=True, deploy: bool = False)[source]

RepVGGBlock is a basic rep-style block, including training and deploy status. This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py.

Parameters
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple) – Stride of the convolution. Default: 1

  • padding (int, tuple) – Padding added to all four sides of the input. Default: 1

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • padding_mode (string, optional) – Default: ‘zeros’

  • use_se (bool) – Whether to use an SE (squeeze-and-excitation) layer. Default: False

  • use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. In PPYOLOE+ model backbone, use_alpha will be set to True. Default: False.

  • use_bn_first (bool) – Whether to use bn layer before conv. In YOLOv6 and YOLOv7, this will be set to True. In PPYOLOE, this will be set to False. Default: True.

  • deploy (bool) – Whether in deploy mode. Default: False

forward(inputs: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

inputs (Tensor) – The input tensor.

Returns

The output tensor.

Return type

Tensor

get_equivalent_kernel_bias()[source]

Derives the equivalent kernel and bias in a differentiable way.

Returns

Equivalent kernel and bias

Return type

tuple

switch_to_deploy()[source]

Switch to deploy mode.

class mmyolo.models.layers.SPPFBottleneck(in_channels: int, out_channels: int, kernel_sizes: Union[int, Sequence[int]] = 5, use_conv_first: bool = True, mid_channels_scale: float = 0.5, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Spatial pyramid pooling - Fast (SPPF) layer for YOLOv5, YOLOX and PPYOLOE by Glenn Jocher.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • kernel_sizes (int, tuple[int]) – A sequence of kernel sizes, or a single kernel size, for the pooling layers. Defaults to 5.

  • use_conv_first (bool) – Whether to use a conv before the pooling layer. In YOLOv5 and YOLOX, this parameter is set to True; in PPYOLOE, it is set to False. Defaults to True.

  • mid_channels_scale (float) – Channel multiplier; multiply in_channels by this amount to get mid_channels. This parameter is valid only when use_conv_first=True. Defaults to 0.5.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x: torch.Tensor)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.SPPFCSPBlock(in_channels: int, out_channels: int, expand_ratio: float = 0.5, kernel_sizes: Union[int, Sequence[int]] = 5, is_tiny_version: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Spatial pyramid pooling - Fast (SPPF) layer with CSP for YOLOv7

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • expand_ratio (float) – Expand ratio of SPPFCSPBlock. Defaults to 0.5.

  • kernel_sizes (int, tuple[int]) – A sequence of kernel sizes, or a single kernel size, for the pooling layers. Defaults to 5.

  • is_tiny_version (bool) – Is tiny version of SPPFCSPBlock. If True, it means it is a yolov7 tiny model. Defaults to False.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)torch.Tensor[source]

Forward process.

Parameters

x (Tensor) – The input tensor.

class mmyolo.models.layers.TinyDownSampleBlock(in_channels: int, out_channels: int, middle_ratio: float = 1.0, kernel_sizes: Union[int, Sequence[int]] = 3, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'negative_slope': 0.1, 'type': 'LeakyReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Down sample layer for YOLOv7-tiny.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The out channels of this Module.

  • middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels. Defaults to 1.0.

  • kernel_sizes (int, tuple[int]) – A sequence of kernel sizes, or a single kernel size, for the pooling layers. Defaults to 3.

  • conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’LeakyReLU’, negative_slope=0.1).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

forward(x)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

losses

class mmyolo.models.losses.IoULoss(iou_mode: str = 'ciou', bbox_format: str = 'xywh', eps: float = 1e-07, reduction: str = 'mean', loss_weight: float = 1.0, return_iou: bool = True)[source]

IoULoss.

Computes the IoU loss between a set of predicted bboxes and target bboxes.

Parameters
  • iou_mode (str) – Options are "ciou". Defaults to "ciou".

  • bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.

  • eps (float) – Eps to avoid log(0).

  • reduction (str) – Options are “none”, “mean” and “sum”.

  • loss_weight (float) – Weight of loss.

  • return_iou (bool) – If True, return loss and iou.

forward(pred: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, avg_factor: Optional[float] = None, reduction_override: Optional[Union[str, bool]] = None)Tuple[torch.Tensor, torch.Tensor][source]

Forward function.

Parameters
  • pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).

  • target (Tensor) – Corresponding gt bboxes, shape (n, 4).

  • weight (Tensor, optional) – Element-wise weights.

  • avg_factor (float, optional) – Average factor when computing the mean of losses.

  • reduction_override (str, bool, optional) – Same as built-in losses of PyTorch. Defaults to None.

Returns

The calculated loss, or a tuple of (loss, iou) when return_iou is True.

Return type

loss or tuple(loss, iou)

class mmyolo.models.losses.OksLoss(metainfo: Optional[str] = None, loss_weight: float = 1.0)[source]

A PyTorch implementation of the Object Keypoint Similarity (OKS) loss as described in the paper "YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss" by Debapriya et al. (2022).

The OKS loss is used for keypoint-based object recognition and consists of a measure of the similarity between predicted and ground truth keypoint locations, adjusted by the size of the object in the image. The loss function takes as input the predicted keypoint locations, the ground truth keypoint locations, a mask indicating which keypoints are valid, and bounding boxes for the objects.

Parameters
  • metainfo (Optional[str]) – Path to a JSON file containing information about the dataset's annotations.

  • loss_weight (float) – Weight for the loss.

compute_oks(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None)torch.Tensor[source]

Calculates the OKS loss.

Parameters
  • output (Tensor) – Predicted keypoints in shape N x k x 2, where N is batch size, k is the number of keypoints, and 2 are the xy coordinates.

  • target (Tensor) – Ground truth keypoints in the same shape as output.

  • target_weights (Tensor) – Mask of valid keypoints in shape N x k, with 1 for valid and 0 for invalid.

  • bboxes (Optional[Tensor]) – Bounding boxes in shape N x 4, where 4 are the xyxy coordinates.

Returns

The calculated OKS loss.

Return type

Tensor

forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmyolo.models.losses.bbox_overlaps(pred: torch.Tensor, target: torch.Tensor, iou_mode: str = 'ciou', bbox_format: str = 'xywh', siou_theta: float = 4.0, eps: float = 1e-07)torch.Tensor[source]

Calculate overlap between two sets of bboxes. Implementation of the paper "Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation".

In the CIoU implementation of YOLOv5 and MMDetection, there is a slight difference in the way the alpha parameter is computed.

mmdet version:

alpha = (ious > 0.5).float() * v / (1 - ious + v)

YOLOv5 version:

alpha = v / (v - ious + (1 + eps))

Parameters
  • pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h), shape (n, 4).

  • target (Tensor) – Corresponding gt bboxes, shape (n, 4).

  • iou_mode (str) – Options are (‘iou’, ‘ciou’, ‘giou’, ‘siou’). Defaults to “ciou”.

  • bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.

  • siou_theta (float) – siou_theta for SIoU when calculate shape cost. Defaults to 4.0.

  • eps (float) – Eps to avoid log(0).

Returns

The computed IoU/GIoU/CIoU/SIoU values, with shape (n, ).

Return type

Tensor

necks

class mmyolo.models.necks.BaseYOLONeck(in_channels: List[int], out_channels: Union[int, List[int]], deepen_factor: float = 1.0, widen_factor: float = 1.0, upsample_feats_cat_first: bool = True, freeze_all: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, **kwargs)[source]

Base neck used in YOLO series.

P5 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
stride=32          +--------+    +-----------+
idx=2  +------+         ^              v
-----> |reduce|         |        +-----------+
       |layer2|---------+------->|    cat    |
       +------+                  +-----------+
                                       v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output2
                                 |  layer1   |    | layer2|
                                 +-----------+    +-------+
P6 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer3 |--->|    cat    |
                   +--------+    +-----------+
stride=32               ^              v
idx=2  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output2
       |layer2|    +--------+    |   layer1  |    | layer2|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer3 |    |  layer2   |
                   +--------+    +-----------+
stride=64               ^              v
idx=3  +------+         |        +-----------+
-----> |reduce|---------+------->|    cat    |
       |layer3|                  +-----------+
       +------+                        v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output3
                                 |  layer2   |    | layer3|
                                 +-----------+    +-------+
Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • upsample_feats_cat_first (bool) – Whether the output features are concatenated first after upsampling in the top-down module. Defaults to True. Currently only YOLOv7 sets this to False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to None.

  • act_cfg (dict) – Config dict for activation layer. Defaults to None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

abstract build_bottom_up_layer(idx: int)[source]

build bottom up layer.

abstract build_downsample_layer(idx: int)[source]

build downsample layer.

abstract build_out_layer(idx: int)[source]

build out layer.

abstract build_reduce_layer(idx: int)[source]

build reduce layer.

abstract build_top_down_layer(idx: int)[source]

build top down layer.

abstract build_upsample_layer(idx: int)[source]

build upsample layer.

forward(inputs: List[torch.Tensor])tuple[source]

Forward function.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmyolo.models.necks.CSPNeXtPAFPN(in_channels: Sequence[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, use_depthwise: bool = False, expand_ratio: float = 0.5, upsample_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'mode': 'nearest', 'scale_factor': 2}, conv_cfg: Optional[bool] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]

Path Aggregation Network with CSPNeXt blocks.

Parameters
  • in_channels (Sequence[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.

  • expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.

  • upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(scale_factor=2, mode=’nearest’)

  • conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’)

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’SiLU’, inplace=True)

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type='Kaiming', layer='Conv2d', a=2.23606797749979, distribution='uniform', mode='fan_in', nonlinearity='leaky_relu').

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(idx: int)torch.nn.modules.module.Module[source]

build out layer.

Parameters

idx (int) – layer idx.

Returns

The out layer.

Return type

nn.Module

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build upsample layer.

class mmyolo.models.necks.PPYOLOECSPPAFPN(in_channels: List[int] = [256, 512, 1024], out_channels: List[int] = [256, 512, 1024], deepen_factor: float = 1.0, widen_factor: float = 1.0, freeze_all: bool = False, num_csplayer: int = 1, num_blocks_per_layer: int = 3, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': False, 'type': 'PPYOLOEBasicBlock', 'use_alpha': False}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, drop_block_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_spp: bool = False)[source]

CSPPAN in PPYOLOE.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int]) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • num_csplayer (int) – Number of CSPResLayer in each layer. Defaults to 1.

  • num_blocks_per_layer (int) – Number of blocks per CSPResLayer. Defaults to 3.

  • block_cfg (dict) – Config dict for block. Defaults to dict(type='PPYOLOEBasicBlock', shortcut=False, use_alpha=False)

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • drop_block_cfg (dict, optional) – Drop block config. Defaults to None. If you want to use Drop block after CSPResLayer, you can set this para as dict(type=’mmdet.DropBlock’, drop_prob=0.1, block_size=3, warm_iters=0).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

  • use_spp (bool) – Whether to use SPP in reduce layer. Defaults to False.

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build out layer.

build_reduce_layer(idx: int)[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(idx: int)torch.nn.modules.module.Module[source]

build upsample layer.

class mmyolo.models.necks.YOLOXPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, use_depthwise: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOX.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
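
For reference, a hypothetical config-file fragment showing how this neck is typically specified inside an MMYOLO model config; the values simply mirror the defaults documented above.

```python
# Hypothetical MMYOLO config fragment; values mirror the documented defaults.
neck = dict(
    type='YOLOXPAFPN',
    in_channels=[256, 512, 1024],
    out_channels=256,
    num_csp_blocks=3,
    use_depthwise=False,
    norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
    act_cfg=dict(type='SiLU', inplace=True))
```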

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(idx: int)torch.nn.modules.module.Module[source]

build out layer.

Parameters

idx (int) – layer idx.

Returns

The out layer.

Return type

nn.Module

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build upsample layer.

class mmyolo.models.necks.YOLOv5PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 1, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv5.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (Union[List[int], int]) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 1.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
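
The two multipliers are how the YOLOv5 model family is expressed: the channel and block counts above describe the large scale, and smaller variants shrink them. A sketch assuming the widely used YOLOv5-s factors (the neck applies make_divisible/make_round internally):

```python
from mmyolo.models.necks import YOLOv5PAFPN

# YOLOv5-s scale: channels roughly halved, block counts shrunk internally.
neck_s = YOLOv5PAFPN(
    in_channels=[256, 512, 1024],
    out_channels=[256, 512, 1024],
    deepen_factor=0.33,
    widen_factor=0.5)
```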

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build out layer.

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build upsample layer.

init_weights()[source]

Initialize the weights.

class mmyolo.models.necks.YOLOv6CSPRepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv6 3.0.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

class mmyolo.models.necks.YOLOv6CSPRepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv6.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

class mmyolo.models.necks.YOLOv6RepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv6 3.0.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(idx: int)torch.nn.modules.module.Module[source]

build upsample layer.

Parameters

idx (int) – layer idx.

Returns

The upsample layer.

Return type

nn.Module

forward(inputs: List[torch.Tensor])tuple[source]

Forward function.

class mmyolo.models.necks.YOLOv6RepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv6.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.

  • freeze_all (bool) – Whether to freeze the model.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).

  • block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(*args, **kwargs)torch.nn.modules.module.Module[source]

build out layer.

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(idx: int)torch.nn.modules.module.Module[source]

build upsample layer.

Parameters

idx (int) – layer idx.

Returns

The upsample layer.

Return type

nn.Module

init_weights()[source]

Initialize the weights.

class mmyolo.models.necks.YOLOv7PAFPN(in_channels: List[int], out_channels: List[int], block_cfg: dict = {'block_ratio': 0.25, 'middle_ratio': 0.5, 'num_blocks': 4, 'num_convs_in_block': 1, 'type': 'ELANBlock'}, deepen_factor: float = 1.0, widen_factor: float = 1.0, spp_expand_ratio: float = 0.5, is_tiny_version: bool = False, use_maxpool_in_downsample: bool = True, use_in_channels_in_downsample: bool = False, use_repconv_outs: bool = True, upsample_feats_cat_first: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv7.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (List[int]) – Number of output channels (used at each scale).

  • block_cfg (dict) – Config dict for block.

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • spp_expand_ratio (float) – Expand ratio of SPPCSPBlock. Defaults to 0.5.

  • is_tiny_version (bool) – Is tiny version of neck. If True, it means it is a yolov7 tiny model. Defaults to False.

  • use_maxpool_in_downsample (bool) – Whether maxpooling is used in downsample layers. Defaults to True.

  • use_in_channels_in_downsample (bool) – MaxPoolAndStrideConvBlock module input parameters. Defaults to False.

  • use_repconv_outs (bool) – Whether to use repconv in the output layer. Defaults to True.

  • upsample_feats_cat_first (bool) – Whether the output features are concatenated first after upsampling in the top-down module. Defaults to False. Currently only YOLOv7 uses False.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
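
Because the ELAN topology is injected through block_cfg, variants can be expressed by overriding it. A hypothetical config fragment follows; the channel lists are illustrative, not an official YOLOv7 config.

```python
# Hypothetical override of the default ELANBlock settings via block_cfg.
neck = dict(
    type='YOLOv7PAFPN',
    in_channels=[512, 1024, 1024],
    out_channels=[128, 256, 512],
    block_cfg=dict(
        type='ELANBlock',
        middle_ratio=0.5,
        block_ratio=0.25,
        num_blocks=4,
        num_convs_in_block=1))
```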

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_downsample_layer(idx: int)torch.nn.modules.module.Module[source]

build downsample layer.

Parameters

idx (int) – layer idx.

Returns

The downsample layer.

Return type

nn.Module

build_out_layer(idx: int)torch.nn.modules.module.Module[source]

build out layer.

Parameters

idx (int) – layer idx.

Returns

The out layer.

Return type

nn.Module

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

build_upsample_layer(idx: int)torch.nn.modules.module.Module[source]

build upsample layer.

class mmyolo.models.necks.YOLOv8PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]

Path Aggregation Network used in YOLOv8.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (Union[List[int], int]) – Number of output channels (used at each scale).

  • deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.

  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.

  • num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.

  • freeze_all (bool) – Whether to freeze the model. Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).

  • act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.

build_bottom_up_layer(idx: int)torch.nn.modules.module.Module[source]

build bottom up layer.

Parameters

idx (int) – layer idx.

Returns

The bottom up layer.

Return type

nn.Module

build_reduce_layer(idx: int)torch.nn.modules.module.Module[source]

build reduce layer.

Parameters

idx (int) – layer idx.

Returns

The reduce layer.

Return type

nn.Module

build_top_down_layer(idx: int)torch.nn.modules.module.Module[source]

build top down layer.

Parameters

idx (int) – layer idx.

Returns

The top down layer.

Return type

nn.Module

task_modules

class mmyolo.models.task_modules.BatchATSSAssigner(num_classes: int, iou_calculator: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmdet.BboxOverlaps2D'}, topk: int = 9)[source]

Assign a batch of corresponding gt bboxes or background to each prior.

This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/atss_assigner.py

Each proposal will be assigned with 0 or a positive integer indicating the ground truth index.

  • 0: negative sample, no assigned gt

  • positive integer: positive sample, index (1-based) of assigned gt

Parameters
  • num_classes (int) – Number of classes.

  • iou_calculator (ConfigDict or dict) – Config dict for iou calculator. Defaults to dict(type=’mmdet.BboxOverlaps2D’).

  • topk (int) – Number of priors selected in each level. Defaults to 9.

forward(pred_bboxes: torch.Tensor, priors: torch.Tensor, num_level_priors: List, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor)dict[source]

Assign gt to priors.

The assignment is done in the following steps:

  1. compute the iou between all priors (priors of all pyramid levels) and gts

  2. compute the center distance between all priors and gts

  3. on each pyramid level, for each gt, select k priors whose centers are closest to the gt center, so that k*l priors in total are selected as candidates for each gt

  4. get the corresponding iou for these candidates, compute the mean and std, and set mean + std as the iou threshold

  5. select the candidates whose iou is greater than or equal to the threshold as positives

  6. limit the centers of the positive samples to lie inside the gt

Parameters
  • pred_bboxes (Tensor) – Predicted bounding boxes, shape(batch_size, num_priors, 4)

  • priors (Tensor) – Model priors with stride, shape(num_priors, 4)

  • num_level_priors (List) – Number of bboxes in each level, len(3)

  • gt_labels (Tensor) – Ground truth label, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bbox, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)

Returns

Assigned result:

  • ’assigned_labels’ (Tensor): shape(batch_size, num_gt)

  • ’assigned_bboxes’ (Tensor): shape(batch_size, num_gt, 4)

  • ’assigned_scores’ (Tensor): shape(batch_size, num_gt, num_classes)

  • ’fg_mask_pre_prior’ (Tensor): shape(batch_size, num_gt)

Return type

assigned_result (dict)

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_inds: torch.Tensor, fg_mask_pre_prior: torch.Tensor, num_priors: int, batch_size: int, num_gt: int)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Get target info.

Parameters
  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • assigned_gt_inds (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)

  • fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)

  • num_priors (int) – Number of priors.

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.

Returns

  • assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors)

  • assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors)

Return type

Tuple[Tensor, Tensor, Tensor]

select_topk_candidates(distances: torch.Tensor, num_level_priors: List[int], pad_bbox_flag: torch.Tensor)Tuple[torch.Tensor, torch.Tensor][source]

Select candidates based on the center distance.

Parameters
  • distances (Tensor) – Distance between all bbox and gt, shape(batch_size, num_gt, num_priors)

  • num_level_priors (List[int]) – Number of bboxes in each level, len(3)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, shape(batch_size, num_gt, 1)

Returns

  • is_in_candidate_list (Tensor): Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors)

  • candidate_idxs (Tensor): Candidate indexes, shape(batch_size, num_gt, num_gt)

Return type

Tuple[Tensor, Tensor]

static threshold_calculator(is_in_candidate: List, candidate_idxs: torch.Tensor, overlaps: torch.Tensor, num_priors: int, batch_size: int, num_gt: int)Tuple[torch.Tensor, torch.Tensor][source]

Get the corresponding iou for these candidates, compute the mean and std, and set mean + std as the iou threshold.

Parameters
  • is_in_candidate (Tensor) – Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors).

  • candidate_idxs (Tensor) – Candidate indexes, shape(batch_size, num_gt, num_gt).

  • overlaps (Tensor) – Overlaps area, shape(batch_size, num_gt, num_priors).

  • num_priors (int) – Number of priors.

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.

Returns

  • overlaps_thr_per_gt (Tensor): Overlap threshold per ground truth, shape(batch_size, num_gt, 1).

  • candidate_overlaps (Tensor): Candidate overlaps, shape(batch_size, num_gt, num_priors).

Return type

Tuple[Tensor, Tensor]
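
A toy sketch of the mean + std rule for a single image; the IoU values and variable names are made up for illustration, not taken from the implementation.

```python
import torch

# (num_gt, num_candidates): IoU of each gt with its k*l candidate priors
candidate_overlaps = torch.tensor([[0.10, 0.40, 0.50, 0.60],
                                   [0.20, 0.30, 0.70, 0.80]])
overlaps_mean = candidate_overlaps.mean(dim=1, keepdim=True)
overlaps_std = candidate_overlaps.std(dim=1, keepdim=True)
overlaps_thr_per_gt = overlaps_mean + overlaps_std  # dynamic IoU threshold
is_pos = candidate_overlaps >= overlaps_thr_per_gt  # step 5 of forward()
```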

class mmyolo.models.task_modules.BatchTaskAlignedAssigner(num_classes: int, topk: int = 13, alpha: float = 1.0, beta: float = 6.0, eps: float = 1e-07, use_ciou: bool = False)[source]

This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/tal_assigner.py

Batch task-aligned assigner based on the paper: TOOD: Task-aligned One-stage Object Detection.

Assign a corresponding gt bbox or background to a batch of predicted bboxes. Each bbox will be assigned with 0 or a positive integer indicating the ground truth index.

  • 0: negative sample, no assigned gt

  • positive integer: positive sample, index (1-based) of assigned gt

Parameters
  • num_classes (int) – Number of classes.

  • topk (int) – Number of bboxes selected in each level. Defaults to 13.

  • alpha (float) – Hyper-parameter related to alignment_metrics. Defaults to 1.0.

  • beta (float) – Hyper-parameter related to alignment_metrics. Defaults to 6.0.

  • eps (float) – Eps to avoid log(0). Defaults to 1e-7.

  • use_ciou (bool) – Whether to use ciou while calculating iou. Defaults to False.
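
The core of this assigner is the task-aligned metric from TOOD, t = s^alpha * u^beta, where s is the predicted score of the gt class and u is the IoU between the predicted and gt boxes. A toy sketch with the default hyper-parameters; the values are made up:

```python
import torch

alpha, beta = 1.0, 6.0
cls_score = torch.tensor([0.9, 0.3, 0.7])  # s: scores of the gt class
iou = torch.tensor([0.6, 0.8, 0.5])        # u: IoU with the gt box
alignment_metric = cls_score.pow(alpha) * iou.pow(beta)
# the top-k priors by this metric become positive candidates for the gt
```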

forward(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor)dict[source]

Assign gt to bboxes.

The assignment is done in the following steps:

  1. compute the alignment metric between all bboxes (bboxes of all pyramid levels) and gts

  2. select the top-k bboxes as candidates for each gt

  3. limit the centers of the positive samples to lie inside the gt (because the anchor-free detector can only predict positive distances)

Parameters
  • pred_bboxes (Tensor) – Predict bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predict bboxes, shape(batch_size, num_priors, num_classes)

  • priors (Tensor) – Model priors, shape (num_priors, 4)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)

Returns

Assigned result:

  • assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor): Assigned boxes, shape(batch_size, num_priors, 4)

  • assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors, num_classes)

  • fg_mask_pre_prior (Tensor): Force ground truth matching mask, shape(batch_size, num_priors)

Return type

assigned_result (dict)

get_box_metrics(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, batch_size: int, num_gt: int)Tuple[torch.Tensor, torch.Tensor][source]

Compute alignment metric between all bbox and gt.

Parameters
  • pred_bboxes (Tensor) – Predict bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predict bbox, shape(batch_size, num_priors, num_classes)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.

Returns

  • alignment_metrics (Tensor): Align metric, shape(batch_size, num_gt, num_priors)

  • overlaps (Tensor): Overlaps, shape(batch_size, num_gt, num_priors)

Return type

Tuple[Tensor, Tensor]

get_pos_mask(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor, batch_size: int, num_gt: int)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Get possible mask.

Parameters
  • pred_bboxes (Tensor) – Predict bboxes, shape(batch_size, num_priors, 4)

  • pred_scores (Tensor) – Scores of predict bbox, shape(batch_size, num_priors, num_classes)

  • priors (Tensor) – Model priors, shape (num_priors, 2)

  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.

Returns

  • pos_mask (Tensor): Possible mask, shape(batch_size, num_gt, num_priors)

  • alignment_metrics (Tensor): Alignment metrics, shape(batch_size, num_gt, num_priors)

  • overlaps (Tensor): Overlaps of gt_bboxes and pred_bboxes, shape(batch_size, num_gt, num_priors)

Return type

Tuple[Tensor, Tensor, Tensor]

get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_idxs: torch.Tensor, fg_mask_pre_prior: torch.Tensor, batch_size: int, num_gt: int)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Get assigner info.

Parameters
  • gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)

  • gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)

  • assigned_gt_idxs (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)

  • fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)

  • batch_size (int) – Batch size.

  • num_gt (int) – Number of ground truth.

Returns

  • assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)

  • assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors)

  • assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors)

Return type

Tuple[Tensor, Tensor, Tensor]

select_topk_candidates(alignment_gt_metrics: torch.Tensor, using_largest_topk: bool = True, topk_mask: Optional[torch.Tensor] = None)torch.Tensor[source]

Select the topk candidates based on the alignment metric between bboxes and gts.

Parameters
  • alignment_gt_metrics (Tensor) – Alignment metric of gt candidates, shape(batch_size, num_gt, num_priors)

  • using_largest_topk (bool) – Controls whether to use the largest or smallest elements. Defaults to True.

  • topk_mask (Tensor) – Topk mask, shape(batch_size, num_gt, self.topk)

Returns

Topk candidates mask, shape(batch_size, num_gt, num_priors)

Return type

Tensor

class mmyolo.models.task_modules.YOLOXBBoxCoder(use_box_type: bool = False, **kwargs)[source]

YOLOX BBox coder.

This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int])torch.Tensor[source]

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

Parameters
  • priors (torch.Tensor) – Basic boxes or points, e.g. anchors.

  • pred_bboxes (torch.Tensor) – Encoded boxes with shape

  • stride (torch.Tensor | int) – Strides of bboxes.

Returns

Decoded boxes.

Return type

torch.Tensor

encode(**kwargs)[source]

Encode deltas between bboxes and ground truth boxes.
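
A condensed sketch of the decode rule, assuming priors hold per-point (x, y) centers and pred_bboxes holds (delta_x, delta_y, log_w, log_h); this mirrors the YOLOX convention but is not the verbatim implementation.

```python
import torch


def yolox_style_decode(priors: torch.Tensor,
                       pred_bboxes: torch.Tensor,
                       stride: int) -> torch.Tensor:
    xys = pred_bboxes[..., :2] * stride + priors[..., :2]  # box centers
    whs = pred_bboxes[..., 2:4].exp() * stride             # box sizes
    tl = xys - whs / 2
    br = xys + whs / 2
    return torch.cat([tl, br], dim=-1)  # (tl_x, tl_y, br_x, br_y)
```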

class mmyolo.models.task_modules.YOLOv5BBoxCoder(use_box_type: bool = False, **kwargs)[source]

YOLOv5 BBox coder.

This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int])torch.Tensor[source]

Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).

Parameters
  • priors (torch.Tensor) – Basic boxes or points, e.g. anchors.

  • pred_bboxes (torch.Tensor) – Encoded boxes with shape

  • stride (torch.Tensor | int) – Strides of bboxes.

Returns

Decoded boxes.

Return type

torch.Tensor

encode(**kwargs)[source]

Encode deltas between bboxes and ground truth boxes.
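
For comparison, a sketch of the YOLOv5-style decode with its well-known 2*sigmoid re-parameterization, assuming priors carry (x_center, y_center, w, h) anchors; again a hedged illustration, not the verbatim implementation.

```python
import torch


def yolov5_style_decode(priors: torch.Tensor,
                        pred_bboxes: torch.Tensor,
                        stride: int) -> torch.Tensor:
    pred = pred_bboxes.sigmoid()
    xys = (pred[..., :2] * 2 - 0.5) * stride + priors[..., :2]  # centers
    whs = (pred[..., 2:4] * 2) ** 2 * priors[..., 2:4]          # sizes
    tl = xys - whs / 2
    br = xys + whs / 2
    return torch.cat([tl, br], dim=-1)  # (tl_x, tl_y, br_x, br_y)
```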

utils

class mmyolo.models.utils.OutputSaveFunctionWrapper(func: Callable, spec: Optional[Dict])[source]

A class that wraps a function and saves its outputs.

This class can be used to decorate a function to save its outputs. It wraps the function with a __call__ method that calls the original function and saves the results in a log attribute.

Parameters
  • func (Callable) – A function to wrap.

  • spec (Dict, optional) – A dictionary of global variables to use as the namespace for the wrapper. If None, the global namespace of the original function is used.
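
A generic re-implementation of the idea (not the mmyolo class itself): wrap a callable and collect every return value in a log list.

```python
from typing import Any, Callable, List


class SaveOutputs:
    """Toy stand-in illustrating the wrap-and-log pattern described above."""

    def __init__(self, func: Callable):
        self.func = func
        self.log: List[Any] = []

    def __call__(self, *args, **kwargs):
        out = self.func(*args, **kwargs)
        self.log.append(out)  # keep every output for later inspection
        return out


double = SaveOutputs(lambda x: 2 * x)
double(3)
double(5)
print(double.log)  # [6, 10]
```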

class mmyolo.models.utils.OutputSaveObjectWrapper(obj: Any)[source]

A wrapper class that saves the output of function calls on an object.

clear()[source]

Clears the log of function call outputs.

mmyolo.models.utils.gt_instances_preprocess(batch_gt_instances: Union[torch.Tensor, Sequence], batch_size: int)torch.Tensor[source]

Split batch_gt_instances with batch size.

From [all_gt_bboxes, box_dim+2] to [batch_size, number_gt, box_dim+1]. For horizontal boxes box_dim=4; for rotated boxes box_dim=5.

If an image has fewer gt instances than the largest number in the batch, the remainder is filled with zeros.

Parameters
  • batch_gt_instances (Sequence[Tensor]) – Ground truth instances for whole batch, shape [all_gt_bboxes, box_dim+2]

  • batch_size (int) – Batch size.

Returns

Batch gt instances data, shape [batch_size, number_gt, box_dim+1]

Return type

Tensor
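
A toy illustration of the padding behavior, assuming (per the shape note above) that the first column of the flat tensor is the image index and the second is the class label:

```python
import torch
from mmyolo.models.utils import gt_instances_preprocess

# rows: [img_idx, label, x1, y1, x2, y2]  ->  box_dim + 2 = 6 columns
flat = torch.tensor([[0., 1., 0., 0., 10., 10.],
                     [0., 2., 5., 5., 20., 20.],
                     [1., 0., 1., 1., 8., 8.]])
out = gt_instances_preprocess(flat, batch_size=2)
print(out.shape)  # (2, 2, 5): image 1 is zero-padded to two instances
```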

mmyolo.models.utils.make_divisible(x: float, widen_factor: float = 1.0, divisor: int = 8)int[source]

Make sure that x*widen_factor is divisible by divisor.

mmyolo.models.utils.make_round(x: float, deepen_factor: float = 1.0)int[source]

Make sure that x*deepen_factor becomes an integer not less than 1.
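
A quick sanity check of the two helpers with typical YOLOv5-s scaling factors:

```python
from mmyolo.models.utils import make_divisible, make_round

print(make_divisible(256, widen_factor=0.5))  # 128, kept divisible by 8
print(make_round(3, deepen_factor=0.33))      # 1: 3 * 0.33 = 0.99, rounded
                                              # and clamped to at least 1
```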

mmyolo.utils
