mmyolo.datasets¶
datasets¶
- class mmyolo.datasets.BatchShapePolicy(batch_size: int = 32, img_size: int = 640, size_divisor: int = 32, extra_pad_ratio: float = 0.5)[source]¶
BatchShapePolicy is used only in the testing phase, where it reduces the number of padded pixels during batch inference.
- Parameters
batch_size (int) – Single GPU batch size during batch inference. Defaults to 32.
img_size (int) – Expected output image size. Defaults to 640.
size_divisor (int) – Divisor of the batch shape; each side is the minimum size that is divisible by size_divisor. Defaults to 32.
extra_pad_ratio (float) – Extra pad ratio. Defaults to 0.5.
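To illustrate how these parameters interact, the sketch below derives one shared (height, width) for a batch from the images' height/width ratios. The function name `batch_shape` and the exact rounding rule are assumptions made for this example; check mmyolo/datasets/utils.py for the actual implementation.

```python
import math


def batch_shape(aspect_ratios, img_size=640, size_divisor=32,
                extra_pad_ratio=0.5):
    """Return a (height, width) shared by one batch.

    aspect_ratios holds height / width for every image in the batch
    (an assumption of this sketch).
    """
    lo, hi = min(aspect_ratios), max(aspect_ratios)
    if hi < 1:        # every image is wider than it is tall
        rel = (hi, 1.0)
    elif lo > 1:      # every image is taller than it is wide
        rel = (1.0, 1.0 / lo)
    else:             # mixed orientations: fall back to a square shape
        rel = (1.0, 1.0)
    # Round each side up to a multiple of size_divisor, with extra padding.
    return tuple(
        math.ceil(r * img_size / size_divisor + extra_pad_ratio) * size_divisor
        for r in rel)
```

For a batch of landscape images with ratio 0.5, this yields a 352 x 672 batch instead of padding every image to a full square.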
- class mmyolo.datasets.YOLOv5CocoDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]¶
Dataset for YOLOv5 COCO Dataset.
Compared with CocoDataset, this class only adds BatchShapePolicy support. See mmyolo/datasets/utils.py#BatchShapePolicy for details.
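A config fragment along the following lines shows how batch_shapes_cfg might be passed to the dataset. The data_root and ann_file values are placeholders invented for this example, and the remaining dataset arguments are omitted.

```python
# Hypothetical config fragment; paths are placeholders, not real files.
batch_shapes_cfg = dict(
    type='BatchShapePolicy',
    batch_size=32,
    img_size=640,
    size_divisor=32,
    extra_pad_ratio=0.5)

val_dataset = dict(
    type='YOLOv5CocoDataset',
    data_root='data/coco/',            # placeholder path
    ann_file='annotations/val.json',   # placeholder file name
    test_mode=True,
    batch_shapes_cfg=batch_shapes_cfg)
```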
- class mmyolo.datasets.YOLOv5CrowdHumanDataset(*args, batch_shapes_cfg: Optional[dict] = None, **kwargs)[source]¶
Dataset for YOLOv5 CrowdHuman Dataset.
Compared with CrowdHumanDataset, this class only adds BatchShapePolicy support. See mmyolo/datasets/utils.py#BatchShapePolicy for details.
- class mmyolo.datasets.YOLOv5DOTADataset(*args, **kwargs)[source]¶
Dataset for YOLOv5 DOTA Dataset.
Compared with DOTADataset, this class only adds BatchShapePolicy support. See mmyolo/datasets/utils.py#BatchShapePolicy for details.
transforms¶
- class mmyolo.datasets.transforms.FilterAnnotations(by_keypoints: bool = False, **kwargs)[source]¶
Filter invalid annotations.
In addition to the conditions checked by FilterDetAnnotations, this filter adds a new condition requiring instances to have at least one visible keypoint.
- class mmyolo.datasets.transforms.LetterResize(scale: Union[int, Tuple[int, int]], pad_val: dict = {'img': 0, 'mask': 0, 'seg': 255}, use_mini_pad: bool = False, stretch_only: bool = False, allow_scale_up: bool = True, half_pad_param: bool = False, **kwargs)[source]¶
Resize and pad image while meeting stride-multiple constraints.
Required Keys:
img (np.uint8)
batch_shape (np.int64) (optional)
Modified Keys:
img (np.uint8)
img_shape (tuple)
gt_bboxes (optional)
Added Keys:
pad_param (np.float32)
- Parameters
scale (Union[int, Tuple[int, int]]) – Images scales for resizing.
pad_val (dict) – Padding value. Defaults to dict(img=0, mask=0, seg=255).
use_mini_pad (bool) – Whether to use minimum rectangle padding. Defaults to False.
stretch_only (bool) – Whether to stretch to the specified size directly. Defaults to False.
allow_scale_up (bool) – Allow scaling up when ratio > 1. Defaults to True.
half_pad_param (bool) – If set to True, left and right pad_param will be given by dividing padding_h by 2. If set to False, pad_param is in int format. We recommend setting this to False for object detection tasks, and True for instance segmentation tasks. Defaults to False.
- transform(results: dict) → dict[source]¶
Transform function to resize images, bounding boxes, semantic segmentation map and keypoints.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
Resized results, ‘img’, ‘gt_bboxes’, ‘gt_seg_map’, ‘gt_keypoints’, ‘scale’, ‘scale_factor’, ‘img_shape’, and ‘keep_ratio’ keys are updated in result dict.
- Return type
dict
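The resize-then-pad geometry described above can be sketched as follows. This is a simplified illustration, not the LetterResize code: the function name, the `stride` argument, and the (top, bottom, left, right) padding order are assumptions of this example.

```python
def letterbox_geometry(img_hw, scale_hw, allow_scale_up=True,
                       use_mini_pad=False, stride=32):
    """Return the resized (h, w) and the (top, bottom, left, right) padding."""
    h, w = img_hw
    target_h, target_w = scale_hw
    # Keep the aspect ratio: use the smaller of the two scale ratios.
    ratio = min(target_h / h, target_w / w)
    if not allow_scale_up:
        ratio = min(ratio, 1.0)
    new_h, new_w = round(h * ratio), round(w * ratio)
    pad_h, pad_w = target_h - new_h, target_w - new_w
    if use_mini_pad:
        # Pad only as far as the next multiple of the stride (assumed 32).
        pad_h, pad_w = pad_h % stride, pad_w % stride
    top, left = pad_h // 2, pad_w // 2
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)
```

With use_mini_pad enabled the output is no longer guaranteed to match scale exactly; it only satisfies the stride-multiple constraint.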
- class mmyolo.datasets.transforms.LoadAnnotations(mask2bbox: bool = False, poly2mask: bool = False, merge_polygons: bool = True, **kwargs)[source]¶
Because the YOLO series does not currently need to consider ignored bboxes, they can be excluded in advance to speed up the pipeline.
- Parameters
mask2bbox (bool) – Whether to use mask annotation to get bbox. Defaults to False.
poly2mask (bool) – Whether to transform the polygons to bitmaps. Defaults to False.
merge_polygons (bool) – Whether to merge polygons into one polygon. If merged, the storage structure is simpler and training is more efficient, especially if the mask inside a bbox is divided into multiple polygons. Defaults to True.
- merge_multi_segment(gt_masks: List[numpy.ndarray]) → List[numpy.ndarray][source]¶
Merge multi segments to one list.
Find the coordinates with the minimum distance between each pair of segments, then connect these coordinates with one thin line to merge all segments into one.
- Parameters
gt_masks (List[np.ndarray]) – Original segmentations in COCO’s JSON file, like [segmentation1, segmentation2, …]; each segmentation is a list of coordinates.
- Returns
Merged gt_masks.
- Return type
List[np.ndarray]
- class mmyolo.datasets.transforms.Mosaic(img_scale: Tuple[int, int] = (640, 640), center_ratio_range: Tuple[float, float] = (0.5, 1.5), bbox_clip_border: bool = True, pad_val: float = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 40, random_pop: bool = True, max_refetch: int = 15)[source]¶
Mosaic augmentation.
Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts of each sub-image.
                    mosaic transform
                       center_x
            +------------------------------+
            |       pad        |           |
            |      +-----------+    pad    |
            |      |           |           |
            |      |  image1   +-----------+
            |      |           |           |
            |      |           |   image2  |
 center_y   |----+-+-----------+-----------+
            |    |   cropped   |           |
            |pad |   image3    |   image4  |
            |    |             |           |
            +----|-------------+-----------+
                 |             |
                 +-------------+
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the top-left image according to the index, and randomly sample another 3 images from the custom dataset.
3. A sub-image is cropped if it is larger than its mosaic patch.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Defaults to (0.5, 1.5).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
pad_val (float) – Pad value. Defaults to 114.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 40.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations exceeds max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
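Step 1 of the mosaic procedure, choosing the center, can be sketched as below. The function name and the assumption that the working canvas is twice img_scale are illustrative and not taken from the Mosaic source; `rng` is a hypothetical argument added so the sketch can be seeded.

```python
import random


def sample_mosaic_center(img_scale=(640, 640),
                         center_ratio_range=(0.5, 1.5), rng=random):
    """Pick the point where the four sub-images meet."""
    width, height = img_scale  # (width, height) order, as documented above
    center_x = int(rng.uniform(*center_ratio_range) * width)
    center_y = int(rng.uniform(*center_ratio_range) * height)
    # Assumption of this sketch: the canvas is twice img_scale on each side.
    canvas_wh = (2 * width, 2 * height)
    return (center_x, center_y), canvas_wh
```

With the default range (0.5, 1.5), the center lands in the middle half of the canvas along each axis, so all four quadrants stay non-empty.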
- class mmyolo.datasets.transforms.Mosaic9(img_scale: Tuple[int, int] = (640, 640), bbox_clip_border: bool = True, pad_val: Union[float, int] = 114.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 50, random_pop: bool = True, max_refetch: int = 15)[source]¶
Mosaic9 augmentation.
Given 9 images, the mosaic transform combines them into one output image. The output image is composed of parts of each sub-image.
    +-------------------------------+------------+
    | pad           |      pad      |            |
    |    +----------+               |            |
    |    |          +---------------+  top_right |
    |    |          |      top      |   image2   |
    |    | top_left |     image1    |            |
    |    |  image8  o--------+------+--------+---+
    |    |          |        |               |   |
    +----+----------+        |     right     |pad|
    |               | center |     image3    |   |
    |     left      | image0 +---------------+---|
    |    image7     |        |               |   |
+---+-----------+---+--------+               |   |
|   |  cropped  |            |  bottom_right |pad|
|   |bottom_left|            |    image4     |   |
|   |  image6   |   bottom   |               |   |
+---|-----------+   image5   +---------------+---|
    |    pad    |            |        pad        |
    +-----------+------------+-------------------+
The mosaic transform steps are as follows:
1. Get the center image according to the index, and randomly sample another 8 images from the custom dataset.
2. Randomly offset the image after Mosaic.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image size after mosaic pipeline of single image. The shape order should be (width, height). Defaults to (640, 640).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
pad_val (Union[float, int]) – Pad value. Defaults to 114.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 5 caches for each image suffices for randomness. Defaults to 50.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of retry iterations for getting valid results from the pipeline. If the number of iterations exceeds max_refetch but results is still None, the iteration is terminated and an error is raised. Defaults to 15.
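The interaction of use_cached, max_cached_images and random_pop can be pictured with a small bounded cache. This is a sketch of the documented behaviour only; the helper name `update_cache` and the `rng` argument are hypothetical.

```python
import random


def update_cache(cache, new_result, max_cached_images=50,
                 random_pop=True, rng=random):
    """Append a result and evict one entry once the cache is over capacity."""
    cache.append(new_result)
    if len(cache) > max_cached_images:
        # random_pop=True evicts a random entry; otherwise evict the oldest.
        index = rng.randint(0, len(cache) - 1) if random_pop else 0
        cache.pop(index)
    return cache
```

Random eviction keeps a mix of old and new samples in the cache, while FIFO eviction always keeps the most recent ones.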
- class mmyolo.datasets.transforms.PPYOLOERandomCrop(aspect_ratio: List[float] = [0.5, 2.0], thresholds: List[float] = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9], scaling: List[float] = [0.3, 1.0], num_attempts: int = 50, allow_no_crop: bool = True, cover_all_box: bool = False)[source]¶
Randomly crop the image and bboxes. PPYOLOE uses different thresholds to judge whether the cropped image meets the requirements. This implementation is different from the implementation of RandomCrop in mmdet.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
Added Keys:
pad_param (np.float32)
- Parameters
aspect_ratio (List[float]) – Aspect ratio of cropped region. Defaults to [.5, 2.].
thresholds (List[float]) – IoU thresholds for deciding a valid bbox crop in [min, max] format. Defaults to [.0, .1, .3, .5, .7, .9].
scaling (List[float]) – Ratio between a cropped region and the original image in [min, max] format. Defaults to [.3, 1.].
num_attempts (int) – Number of tries for each threshold before giving up. Defaults to 50.
allow_no_crop (bool) – Allow returning the image without actually cropping it. Defaults to True.
cover_all_box (bool) – Ensure all bboxes are covered in the final crop. Defaults to False.
- class mmyolo.datasets.transforms.PPYOLOERandomDistort(hue_cfg: dict = {'max': 18, 'min': -18, 'prob': 0.5}, saturation_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, contrast_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, brightness_cfg: dict = {'max': 1.5, 'min': 0.5, 'prob': 0.5}, num_distort_func: int = 4)[source]¶
Random hue, saturation, contrast and brightness distortion.
Required Keys:
img
Modified Keys:
img (np.float32)
- Parameters
hue_cfg (dict) – Hue settings. Defaults to dict(min=-18, max=18, prob=0.5).
saturation_cfg (dict) – Saturation settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
contrast_cfg (dict) – Contrast settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
brightness_cfg (dict) – Brightness settings. Defaults to dict(min=0.5, max=1.5, prob=0.5).
num_distort_func (int) – The number of distortion functions. Defaults to 4.
- class mmyolo.datasets.transforms.PackDetInputs(meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor', 'flip', 'flip_direction'))[source]¶
Pack the inputs data for the detection / semantic segmentation / panoptic segmentation.
Compared to mmdet, we just add the gt_panoptic_seg field and logic.
- class mmyolo.datasets.transforms.Polygon2Mask(downsample_ratio: int = 4, mask_overlap: bool = True, coco_style: bool = False)[source]¶
Polygons to bitmaps in YOLOv5.
- Parameters
downsample_ratio (int) – Downsample ratio of mask. Defaults to 4.
mask_overlap (bool) – Whether to use mask overlap when processing masks. If set to True, the implementation is the same as the official one and trains faster: all gt masks are compressed into one overlap mask, and each pixel value indicates the index of a gt mask. If set to False, each mask is a separate binary mask. Defaults to True.
coco_style (bool) – Whether to use coco_style to convert the polygons to bitmaps. Note that this option is only used to test if there is an improvement in training speed and we recommend setting it to False. Defaults to False.
- polygon2mask(img_shape: Tuple[int, int], polygons: numpy.ndarray, color: int = 1) → numpy.ndarray[source]¶
- Parameters
img_shape (tuple) – The image size.
polygons (np.ndarray) – [N, M], where N is the number of polygons and M is the number of coordinate values per polygon (M must be divisible by 2).
color (int) – color in fillPoly.
- Returns
the overlap mask.
- Return type
np.ndarray
- polygons2masks(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks, color: int = 1) → numpy.ndarray[source]¶
Return a list of bitmap masks.
- Parameters
img_shape (tuple) – The image size.
polygons (PolygonMasks) – The mask annotations.
color (int) – color in fillPoly.
- Returns
the list of masks in bitmaps.
- Return type
List[np.ndarray]
- polygons2masks_overlap(img_shape: Tuple[int, int], polygons: mmdet.structures.mask.structures.PolygonMasks) → Tuple[numpy.ndarray, numpy.ndarray][source]¶
Return an overlap mask and the sorted idx of area.
- Parameters
img_shape (tuple) – The image size.
polygons (PolygonMasks) – The mask annotations.
color (int) – color in fillPoly.
- Returns
the overlap mask and the sorted idx of area.
- Return type
Tuple[np.ndarray, np.ndarray]
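The overlap-mask encoding used when mask_overlap is True can be sketched with plain nested lists. This is an illustration under the assumption that masks are painted from largest to smallest area, so smaller objects stay visible on top; the function name and list-based representation are choices of this example, not the Polygon2Mask code.

```python
def overlap_mask(masks):
    """Merge binary masks into one index mask.

    masks: list of equally sized 2D lists containing 0 or 1.
    Returns (overlap, order), where order lists mask indices from the
    largest area to the smallest and overlap stores rank + 1 per pixel.
    """
    areas = [sum(map(sum, m)) for m in masks]
    order = sorted(range(len(masks)), key=lambda i: -areas[i])
    height, width = len(masks[0]), len(masks[0][0])
    overlap = [[0] * width for _ in range(height)]
    for rank, idx in enumerate(order, start=1):
        for y in range(height):
            for x in range(width):
                if masks[idx][y][x]:
                    overlap[y][x] = rank  # later (smaller) masks overwrite
    return overlap, order
```

A value of 0 marks background, and a value of k marks the k-th mask in the sorted order.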
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmyolo.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]¶
- class mmyolo.datasets.transforms.RegularizeRotatedBox(angle_version='le90')[source]¶
Regularize rotated boxes.
Due to the angle periodicity, one rotated box can be represented by many different (x, y, w, h, t). To make each rotated box unique, regularize_boxes will take the remainder of the angle divided by 180 degrees.
For convenience, three angle_version can be used here:
- ‘oc’: OpenCV Definition. Has the same box representation as cv2.minAreaRect; the angle ranges in [-90, 0).
- ‘le90’: Long Edge Definition (90). The angle ranges in [-90, 90). The width is always longer than the height.
- ‘le135’: Long Edge Definition (135). The angle ranges in [-45, 135). The width is always longer than the height.
Required Keys:
gt_bboxes (RotatedBoxes[torch.float32])
Modified Keys:
gt_bboxes
- Parameters
angle_version (str) – Angle version. Can only be ‘oc’, ‘le90’, or ‘le135’. Defaults to ‘le90’.
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
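As an illustration of the ‘le90’ convention described above, the sketch below normalizes a single (w, h, angle) triple so that the width is the long edge and the angle falls in [-90, 90). It works in degrees on plain floats; the function name is hypothetical and this is not the RotatedBoxes implementation.

```python
def regularize_le90(w, h, angle_deg):
    """Return an equivalent (w, h, angle) with w >= h and angle in [-90, 90)."""
    if w < h:
        # Swapping the edges rotates the box description by 90 degrees.
        w, h = h, w
        angle_deg += 90.0
    # Angles are periodic with 180 degrees; wrap into [-90, 90).
    angle_deg = (angle_deg + 90.0) % 180.0 - 90.0
    return w, h, angle_deg
```

For example, a box described as (w=2, h=4, angle=30) becomes (w=4, h=2, angle=-60), which covers the same region.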
- class mmyolo.datasets.transforms.RemoveDataElement(keys: Union[str, Sequence[str]])[source]¶
Remove unnecessary data element in results.
- Parameters
keys (Union[str, Sequence[str]]) – Keys to be removed.
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict is returned at the end, which allows multiple transforms to be concatenated into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmyolo.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]¶
- class mmyolo.datasets.transforms.YOLOXMixUp(img_scale: Tuple[int, int] = (640, 640), ratio_range: Tuple[float, float] = (0.5, 1.5), flip_ratio: float = 0.5, pad_val: float = 114.0, bbox_clip_border: bool = True, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]¶
MixUp data augmentation for YOLOX.
         mixup transform
+---------------+--------------+
| mixup image   |              |
|      +--------|--------+     |
|      |        |        |     |
+---------------+        |     |
|      |                 |     |
|      |      image      |     |
|      |                 |     |
|      |                 |     |
|      +-----------------+     |
|             pad              |
+------------------------------+
The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in the top left patch (after padding and resizing).
2. The target of mixup transform is the weighted average of mixup image and origin image.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
img_scale (Sequence[int]) – Image output size after mixup pipeline. The shape order should be (width, height). Defaults to (640, 640).
ratio_range (Sequence[float]) – Scale ratio of mixup image. Defaults to (0.5, 1.5).
flip_ratio (float) – Horizontal flip ratio of mixup image. Defaults to 0.5.
pad_val (float) – Pad value. Defaults to 114.
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.
- class mmyolo.datasets.transforms.YOLOv5CopyPaste(ioa_thresh: float = 0.3, prob: float = 0.5)[source]¶
Copy-Paste used in YOLOv5 and YOLOv8.
This transform randomly copies some objects in the image to their mirrored positions in the image. It is different from the CopyPaste in mmdet.
Required Keys:
img (np.uint8)
gt_bboxes (BaseBoxes[torch.float32])
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (PolygonMasks) (optional)
Modified Keys:
img
gt_bboxes
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (optional)
gt_masks (optional)
- Parameters
ioa_thresh (float) – IoA threshold for deciding whether a bbox is valid. Defaults to 0.3.
prob (float) – Probability of choosing objects. Defaults to 0.5.
- static bbox_ioa(gt_bboxes_flip: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, gt_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, eps: float = 1e-07) → numpy.ndarray[source]¶
Calculate ioa between gt_bboxes_flip and gt_bboxes.
- Parameters
gt_bboxes_flip (HorizontalBoxes) – Flipped ground truth bounding boxes.
gt_bboxes (HorizontalBoxes) – Ground truth bounding boxes.
eps (float) – Small constant for numerical stability. Defaults to 1e-7.
- Returns
IoA.
- Return type
np.ndarray
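The two ingredients of this transform, mirroring a box and measuring intersection over area (IoA), can be sketched for single boxes in (x1, y1, x2, y2) format. Dividing by the area of the second box is an assumption of this example, and both helper names are illustrative; the real method operates on HorizontalBoxes.

```python
def mirror_box(box, img_width):
    """Flip an (x1, y1, x2, y2) box horizontally inside the image."""
    x1, y1, x2, y2 = box
    return (img_width - x2, y1, img_width - x1, y2)


def box_ioa(box_a, box_b, eps=1e-7):
    """Intersection of box_a and box_b divided by the area of box_b."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) + eps
    return inter_w * inter_h / area_b
```

A mirrored object would then be pasted only if its IoA with every existing box stays below ioa_thresh, so that it does not hide too much of an original object.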
- class mmyolo.datasets.transforms.YOLOv5HSVRandomAug(hue_delta: Union[int, float] = 0.015, saturation_delta: Union[int, float] = 0.7, value_delta: Union[int, float] = 0.4)[source]¶
Apply HSV augmentation to image sequentially.
Required Keys:
img
Modified Keys:
img
- Parameters
hue_delta ([int, float]) – delta of hue. Defaults to 0.015.
saturation_delta ([int, float]) – delta of saturation. Defaults to 0.7.
value_delta ([int, float]) – delta of value. Defaults to 0.4.
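One common way to use such deltas is to draw a random gain around 1 for each HSV channel and scale the channel by it. The sketch below follows that scheme for a single pixel; the function names, the uniform(-1, 1) sampling, and the OpenCV 8-bit value ranges are assumptions of this example rather than a description of the YOLOv5HSVRandomAug source.

```python
import random


def sample_hsv_gains(hue_delta=0.015, saturation_delta=0.7,
                     value_delta=0.4, rng=random):
    """Draw one multiplicative gain per channel from [1 - delta, 1 + delta]."""
    deltas = (hue_delta, saturation_delta, value_delta)
    return tuple(rng.uniform(-1.0, 1.0) * d + 1.0 for d in deltas)


def apply_gains(hsv_pixel, gains):
    """Scale one HSV pixel (H in [0, 180), S and V in [0, 255])."""
    h, s, v = hsv_pixel
    hue_gain, sat_gain, val_gain = gains
    return (int(h * hue_gain) % 180,      # hue wraps around
            min(255, int(s * sat_gain)),  # saturation is clipped
            min(255, int(v * val_gain)))  # value is clipped
```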
- class mmyolo.datasets.transforms.YOLOv5KeepRatioResize(scale: Union[int, Tuple[int, int]], keep_ratio: bool = True, **kwargs)[source]¶
Resize images & bbox (if existed).
This transform resizes the input image according to scale. Bboxes (if existed) are then resized with the same scale factor.
Required Keys:
img (np.uint8)
gt_bboxes (BaseBoxes[torch.float32]) (optional)
Modified Keys:
img (np.uint8)
img_shape (tuple)
gt_bboxes (optional)
scale (float)
Added Keys:
scale_factor (np.float32)
- Parameters
scale (Union[int, Tuple[int, int]]) – Images scales for resizing.
- class mmyolo.datasets.transforms.YOLOv5MixUp(alpha: float = 32.0, beta: float = 32.0, pre_transform: Optional[Sequence[dict]] = None, prob: float = 1.0, use_cached: bool = False, max_cached_images: int = 20, random_pop: bool = True, max_refetch: int = 15)[source]¶
MixUp data augmentation for YOLOv5.
The mixup transform steps are as follows:
1. Another random image is picked by dataset.
2. Randomly obtain the fusion ratio from the beta distribution, then fuse the target of the original image and mixup image through this ratio.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
mix_results (List[dict])
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
- Parameters
alpha (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.
beta (float) – parameter of beta distribution to get mixup ratio. Defaults to 32.
pre_transform (Sequence[dict]) – Sequence of transform object or config dict to be composed.
prob (float) – Probability of applying this transformation. Defaults to 1.0.
use_cached (bool) – Whether to use cache. Defaults to False.
max_cached_images (int) – The maximum length of the cache. The larger the cache, the stronger the randomness of this transform. As a rule of thumb, providing 10 caches for each image suffices for randomness. Defaults to 20.
random_pop (bool) – Whether to randomly pop a result from the cache when the cache is full. If set to False, use FIFO popping method. Defaults to True.
max_refetch (int) – The maximum number of iterations. If the number of iterations is greater than max_refetch, but gt_bbox is still empty, then the iteration is terminated. Defaults to 15.
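The fusion step can be sketched on a single pixel: draw a ratio from a Beta(alpha, beta) distribution and blend the two images with it. The helper name and the `rng` argument are hypothetical; the real transform works on whole image arrays.

```python
import random


def mixup_pixels(pixel_a, pixel_b, alpha=32.0, beta=32.0, rng=random):
    """Blend two pixels with a ratio drawn from Beta(alpha, beta)."""
    ratio = rng.betavariate(alpha, beta)
    mixed = tuple(a * ratio + b * (1.0 - ratio)
                  for a, b in zip(pixel_a, pixel_b))
    return mixed, ratio
```

With alpha = beta = 32 the distribution is concentrated around 0.5, so both images contribute roughly equally to the result.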
- class mmyolo.datasets.transforms.YOLOv5RandomAffine(max_rotate_degree: float = 10.0, max_translate_ratio: float = 0.1, scaling_ratio_range: Tuple[float, float] = (0.5, 1.5), max_shear_degree: float = 2.0, border: Tuple[int, int] = (0, 0), border_val: Tuple[int, int, int] = (114, 114, 114), bbox_clip_border: bool = True, min_bbox_size: int = 2, min_area_ratio: float = 0.1, use_mask_refine: bool = False, max_aspect_ratio: float = 20.0, resample_num: int = 1000)[source]¶
Random affine transform data augmentation in YOLOv5 and YOLOv8. It is different from the implementation in YOLOX.
This operation randomly generates an affine transform matrix that includes rotation, translation, shear and scaling transforms. If you set use_mask_refine == True, the code will use the mask annotations to refine the bboxes. Our implementation is slightly different from the official one. In the COCO dataset, a gt may have multiple mask tags. The official YOLOv5 annotation file already combines the masks that an object has, but our code takes into account the fact that an object has multiple masks.
Required Keys:
img
gt_bboxes (BaseBoxes[torch.float32]) (optional)
gt_bboxes_labels (np.int64) (optional)
gt_ignore_flags (bool) (optional)
gt_masks (PolygonMasks) (optional)
Modified Keys:
img
img_shape
gt_bboxes (optional)
gt_bboxes_labels (optional)
gt_ignore_flags (optional)
gt_masks (PolygonMasks) (optional)
- Parameters
max_rotate_degree (float) – Maximum degrees of rotation transform. Defaults to 10.
max_translate_ratio (float) – Maximum ratio of translation. Defaults to 0.1.
scaling_ratio_range (tuple[float]) – Min and max ratio of scaling transform. Defaults to (0.5, 1.5).
max_shear_degree (float) – Maximum degrees of shear transform. Defaults to 2.
border (tuple[int]) – Distance from width and height sides of input image to adjust output shape. Only used in mosaic dataset. Defaults to (0, 0).
border_val (tuple[int]) – Border padding values of 3 channels. Defaults to (114, 114, 114).
bbox_clip_border (bool, optional) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
min_bbox_size (float) – Width and height threshold to filter bboxes. If the height or width of a box is smaller than this value, it will be removed. Defaults to 2.
min_area_ratio (float) – Threshold of area ratio between original bboxes and warped bboxes. If smaller than this value, the box will be removed. Defaults to 0.1.
use_mask_refine (bool) – Whether to refine bbox by mask. Deprecated.
max_aspect_ratio (float) – Aspect ratio of width and height threshold to filter bboxes. If max(h/w, w/h) larger than this value, the box will be removed. Defaults to 20.
resample_num (int) – Number of points to resample each polygon to. Defaults to 1000.
- clip_polygons(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.mask.structures.PolygonMasks[source]¶
Function to clip points of polygons with height and width.
- Parameters
gt_masks (PolygonMasks) – Annotations of instance segmentation.
height (int) – height of clip border.
width (int) – width of clip border.
- Returns
Clipped annotations of instance segmentation.
- Return type
PolygonMasks
- filter_gt_bboxes(origin_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes, wrapped_bboxes: mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes) → torch.Tensor[source]¶
Filter gt bboxes.
- Parameters
origin_bboxes (HorizontalBoxes) – Origin bboxes.
wrapped_bboxes (HorizontalBoxes) – Warped bboxes.
- Returns
Indices or mask of the bboxes that pass the filter.
- Return type
torch.Tensor
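The three filtering thresholds (min_bbox_size, min_area_ratio, max_aspect_ratio) can be combined as in the sketch below, which checks a single box given its size before and after warping. The function name, the strict comparisons, and the `eps` guard are assumptions of this example.

```python
def keep_box(origin_wh, warped_wh, min_bbox_size=2, min_area_ratio=0.1,
             max_aspect_ratio=20.0, eps=1e-16):
    """Return True if a warped box passes the size, area and aspect checks."""
    origin_w, origin_h = origin_wh
    warped_w, warped_h = warped_wh
    size_ok = warped_w > min_bbox_size and warped_h > min_bbox_size
    area_ratio = (warped_w * warped_h) / (origin_w * origin_h + eps)
    area_ok = area_ratio > min_area_ratio
    aspect = max(warped_w / (warped_h + eps), warped_h / (warped_w + eps))
    aspect_ok = aspect < max_aspect_ratio
    return size_ok and area_ok and aspect_ok
```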
- resample_masks(gt_masks: mmdet.structures.mask.structures.PolygonMasks) → mmdet.structures.mask.structures.PolygonMasks[source]¶
Function to resample each mask annotation with shape (2 * n, ) to shape (resample_num * 2, ).
- Parameters
gt_masks (PolygonMasks) – Annotations of semantic segmentation.
- segment2box(gt_masks: mmdet.structures.mask.structures.PolygonMasks, height: int, width: int) → mmdet.structures.bbox.horizontal_boxes.HorizontalBoxes[source]¶
Convert 1 segment label to 1 box label, applying the inside-image constraint, i.e. (xy1, xy2, …) to (xyxy).
- Parameters
gt_masks (PolygonMasks) – The segment label.
height (int) – The height of the image.
width (int) – The width of the image.
- Returns
the clip bboxes from gt_masks.
- Return type
HorizontalBoxes
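The conversion from a segment to a box with the inside-image constraint can be sketched on a plain list of (x, y) points. Returning an all-zero box when no point lies inside the image is an assumption of this example, as is the function name.

```python
def segment_to_box(points, height, width):
    """Return the (x1, y1, x2, y2) box of the points that lie in the image."""
    inside = [(x, y) for x, y in points
              if 0 <= x <= width and 0 <= y <= height]
    if not inside:
        return (0.0, 0.0, 0.0, 0.0)  # assumption: empty segments give zeros
    xs, ys = zip(*inside)
    return (min(xs), min(ys), max(xs), max(ys))
```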
- warp_mask(gt_masks: mmdet.structures.mask.structures.PolygonMasks, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → mmdet.structures.mask.structures.PolygonMasks[source]¶
Warp masks by warp_matrix and retain masks inside image after warping.
- Parameters
gt_masks (PolygonMasks) – Annotations of semantic segmentation.
warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).
img_w (int) – Width of output image.
img_h (int) – Height of output image.
- Returns
Masks after warping.
- Return type
PolygonMasks
- static warp_poly(poly: numpy.ndarray, warp_matrix: numpy.ndarray, img_w: int, img_h: int) → numpy.ndarray[source]¶
Function to warp one mask and filter points outside image.
- Parameters
poly (np.ndarray) – Segmentation annotation with shape (n, ) and with format (x1, y1, x2, y2, …).
warp_matrix (np.ndarray) – Affine transformation matrix. Shape: (3, 3).
img_w (int) – Width of output image.
img_h (int) – Height of output image.
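Warping a flat (x1, y1, x2, y2, …) polygon with a 3x3 affine matrix can be sketched as follows. This example clamps the warped coordinates to the image border; treating the out-of-image handling as clamping is an assumption, and the function name is hypothetical.

```python
def warp_points(poly, warp_matrix, img_w, img_h):
    """Apply a 3x3 affine matrix to a flat polygon and clamp to the image."""
    warped = []
    for x, y in zip(poly[0::2], poly[1::2]):
        new_x = warp_matrix[0][0] * x + warp_matrix[0][1] * y + warp_matrix[0][2]
        new_y = warp_matrix[1][0] * x + warp_matrix[1][1] * y + warp_matrix[1][2]
        warped.append(max(0, min(img_w, new_x)))
        warped.append(max(0, min(img_h, new_y)))
    return warped
```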
mmyolo.models¶
backbones¶
- class mmyolo.models.backbones.BaseBackbone(arch_setting: list, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = -1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
BaseBackbone backbone used in YOLO series.
Backbone model structure diagram
+-----------+
|   input   |
+-----------+
      v
+-----------+
|   stem    |
|   layer   |
+-----------+
      v
+-----------+
|   stage   |
|  layer 1  |
+-----------+
      v
+-----------+
|   stage   |
|  layer 2  |
+-----------+
      v
    ......
      v
+-----------+
|   stage   |
|  layer n  |
+-----------+
In P5 model, n=4
In P6 model, n=5
- Parameters
arch_setting (list) – Architecture of BaseBackbone.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to None.
act_cfg (dict) – Config dict for activation layer. Defaults to None.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
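The effect of deepen_factor and widen_factor can be sketched with two small helpers: one scales a block count, the other scales a channel count and rounds it up to a multiple of a divisor. The helper names, the divisor of 8, and the rounding rules are assumptions of this example.

```python
import math


def scale_channels(channels, widen_factor=1.0, divisor=8):
    """Scale a channel count and round up to a multiple of divisor."""
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_blocks(num_blocks, deepen_factor=1.0):
    """Scale a block count, keeping at least one block."""
    if num_blocks <= 1:
        return num_blocks
    return max(round(num_blocks * deepen_factor), 1)
```

For example, a stage declared with 64 channels and 3 blocks shrinks to 32 channels and 1 block with widen_factor=0.5 and deepen_factor=0.33.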
- abstract build_stage_layer(stage_idx: int, setting: list)[source]¶
Build a stage layer.
- Parameters
stage_idx (int) – The index of a stage layer.
setting (list) – The architecture setting of a stage layer.
- make_stage_plugins(plugins, stage_idx, setting)[source]¶
Make plugins for the stage_idx-th stage of the backbone.
Currently we support inserting context_block, empirical_attention_block, nonlocal_block, and dropout_block into the backbone.
An example of the plugins format could be:
Examples
>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True)),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True)),
... ]
>>> model = YOLOv5CSPDarknet()
>>> stage_plugins = model.make_stage_plugins(plugins, 0, setting)
>>> assert len(stage_plugins) == 1
Suppose stage_idx=0, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> yyy
Suppose stage_idx=1, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> xxx -> yyy
- Parameters
plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of the stage to build. If stages is missing, the plugin would be applied to all stages.
setting (list) – The architecture setting of a stage layer.
- Returns
Plugins for current stage
- Return type
list[nn.Module]
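The per-stage selection described above can be sketched in plain Python. This is a simplified illustration, not the library implementation: the hypothetical helper select_stage_plugins returns the matching cfg dicts, whereas make_stage_plugins returns built nn.Module objects.

```python
def select_stage_plugins(plugins, stage_idx, num_stages=4):
    """Return the plugin cfgs that apply to stage `stage_idx`.

    Hypothetical helper: a plugin without a `stages` entry is treated
    as applying to every stage.
    """
    selected = []
    for plugin in plugins:
        stages = plugin.get('stages')
        if stages is not None and len(stages) != num_stages:
            raise ValueError('length of `stages` must equal num_stages')
        if stages is None or stages[stage_idx]:
            selected.append(plugin['cfg'])
    return selected


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'), stages=(False, True, True, True)),
    dict(cfg=dict(type='yyy'), stages=(True, True, True, True)),
]

stage0 = select_stage_plugins(plugins, 0)  # only 'yyy' is enabled for stage 0
stage1 = select_stage_plugins(plugins, 1)  # 'xxx' then 'yyy' for stage 1
print([cfg['type'] for cfg in stage0], [cfg['type'] for cfg in stage1])
```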
- class mmyolo.models.backbones.CSPNeXt(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Sequence[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, use_depthwise: bool = False, expand_ratio: float = 0.5, arch_ovewrite: Optional[dict] = None, channel_attention: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶
CSPNeXt backbone used in RTMDet.
- Parameters
arch (str) – Architecture of CSPNeXt, from {P5, P6}. Defaults to P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.
channel_attention (bool) – Whether to add channel attention in each stage. Defaults to True.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.
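The default init_cfg in the CSPNeXt signature contains a = 2.23606797749979, which is numerically the square root of 5. The following config fragment spells out the same default in a more readable form; it is only a restatement of the signature, and whether you need to pass it explicitly depends on your setup.

```python
import math

# Same values as the default shown in the CSPNeXt signature,
# with `a` written as sqrt(5) instead of its decimal expansion.
init_cfg = dict(
    type='Kaiming',
    layer='Conv2d',
    a=math.sqrt(5),
    distribution='uniform',
    mode='fan_in',
    nonlinearity='leaky_relu',
)
```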
- class mmyolo.models.backbones.PPYOLOECSPResNet(arch: str = 'P5', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, arch_ovewrite: Optional[dict] = None, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': True, 'type': 'PPYOLOEBasicBlock', 'use_alpha': True}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, attention_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'act_cfg': {'type': 'HSigmoid'}, 'type': 'EffectiveSELayer'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_large_stem: bool = False)[source]¶
CSP-ResNet backbone used in PPYOLOE.
- Parameters
arch (str) – Architecture of CSP-ResNet, from {P5, P6}. Defaults to P5.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
arch_ovewrite (list) – Overwrite default arch settings. Defaults to None.
block_cfg (dict) – Config dict for block. Defaults to dict(type=’PPYOLOEBasicBlock’, shortcut=True, use_alpha=True)
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
attention_cfg (dict) – Config dict for EffectiveSELayer. Defaults to dict(type=’EffectiveSELayer’, act_cfg=dict(type=’HSigmoid’)).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.
use_large_stem (bool) – Whether to use large stem layer. Defaults to False.
- class mmyolo.models.backbones.YOLOXCSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_depthwise: bool = False, spp_kernal_sizes: Tuple[int] = (5, 9, 13), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
CSP-Darknet backbone used in YOLOX.
- Parameters
arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
spp_kernal_sizes (tuple[int]) – Sequence of kernel sizes of SPP layers. Defaults to (5, 9, 13).
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmyolo.models import YOLOXCSPDarknet
>>> import torch
>>> model = YOLOXCSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
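The shapes printed in the example follow from the downsampling strides of the selected stages. The sketch below reproduces them with plain arithmetic. It assumes that the three default output stages have strides 8, 16 and 32, that the input size is divisible by 32, and that widen_factor is 1.0; the helper name is hypothetical.

```python
def expected_output_shapes(input_size, out_channels=(256, 512, 1024),
                           strides=(8, 16, 32), batch_size=1):
    """Compute (N, C, H, W) per output level for a square input.

    Assumes each output level downsamples the input by its stride.
    """
    return [(batch_size, channels, input_size // stride, input_size // stride)
            for channels, stride in zip(out_channels, strides)]


for shape in expected_output_shapes(416):
    print(shape)
```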
- class mmyolo.models.backbones.YOLOv5CSPDarknet(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
CSP-Darknet backbone used in YOLOv5.
- Parameters
arch (str) – Architecture of CSP-Darknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmyolo.models import YOLOv5CSPDarknet
>>> import torch
>>> model = YOLOv5CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
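How frozen_stages maps to layers can be sketched as follows. The sketch assumes that layers are indexed as in the BaseBackbone structure diagram, with index 0 being the stem and indices 1 to n being the stage layers, and that frozen_stages=k freezes every layer up to and including index k. Both the indexing and the helper name are assumptions of this illustration rather than statements about the source code.

```python
def frozen_layer_names(frozen_stages: int, num_stages: int = 4):
    """List the layers that would be frozen under the assumed indexing.

    Index 0 is the stem; indices 1..num_stages are the stage layers.
    """
    layers = ['stem'] + [f'stage{i + 1}' for i in range(num_stages)]
    if frozen_stages < 0:
        return []  # -1 means nothing is frozen
    return layers[:frozen_stages + 1]


print(frozen_layer_names(-1))  # nothing frozen
print(frozen_layer_names(0))   # stem only
print(frozen_layer_names(2))   # stem plus the first two stages
```

Under the same indexing, the default out_indices of (2, 3, 4) would select the last three stage layers of a P5 model.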
- class mmyolo.models.backbones.YOLOv6CSPBep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, hidden_ratio: float = 0.5, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'ConvWrapper'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
CSPBep backbone used in YOLOv6.
- Parameters
arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’ConvWrapper’).
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmyolo.models import YOLOv6CSPBep
>>> import torch
>>> model = YOLOv6CSPBep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
- class mmyolo.models.backbones.YOLOv6EfficientRep(arch: str = 'P5', plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, use_cspsppf: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, norm_eval: bool = False, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
EfficientRep backbone used in YOLOv6.
- Parameters
arch (str) – Architecture of BaseDarknet, from {P5, P6}. Defaults to P5.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
init_cfg (Union[dict, list[dict]], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmyolo.models import YOLOv6EfficientRep
>>> import torch
>>> model = YOLOv6EfficientRep()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
- class mmyolo.models.backbones.YOLOv7Backbone(arch: str = 'L', deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, plugins: Optional[Union[dict, List[dict]]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Backbone used in YOLOv7.
- Parameters
arch (str) – Architecture of YOLOv7. Defaults to L.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
out_indices (Sequence[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
norm_cfg (ConfigDict or dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict]) – Initialization config dict.
- class mmyolo.models.backbones.YOLOv8CSPDarknet(arch: str = 'P5', last_stage_out_channels: int = 1024, plugins: Optional[Union[dict, List[dict]]] = None, deepen_factor: float = 1.0, widen_factor: float = 1.0, input_channels: int = 3, out_indices: Tuple[int] = (2, 3, 4), frozen_stages: int = - 1, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, norm_eval: bool = False, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
CSP-Darknet backbone used in YOLOv8.
- Parameters
arch (str) – Architecture of CSP-Darknet, from {P5}. Defaults to P5.
last_stage_out_channels (int) – Final layer output channel. Defaults to 1024.
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
input_channels (int) – Number of input image channels. Defaults to 3.
out_indices (Tuple[int]) – Output from which stages. Defaults to (2, 3, 4).
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Defaults to -1.
norm_cfg (dict) – Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Defaults to False.
init_cfg (Union[dict,list[dict]], optional) – Initialization config dict. Defaults to None.
Example
>>> from mmyolo.models import YOLOv8CSPDarknet
>>> import torch
>>> model = YOLOv8CSPDarknet()
>>> model.eval()
>>> inputs = torch.rand(1, 3, 416, 416)
>>> level_outputs = model(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
...
(1, 256, 52, 52)
(1, 512, 26, 26)
(1, 1024, 13, 13)
data_preprocessor¶
dense_heads¶
- class mmyolo.models.dense_heads.PPYOLOEHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.125, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
PPYOLOEHead head used in PPYOLOE. The YOLOv6 head and the PPYOLOE head are only slightly different: distribution focal loss is additionally used in PPYOLOE, but not in YOLOv6.
- Parameters
head_module (ConfigType) – Base module used for PPYOLOEHead
prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_dfl (ConfigDict or dict) – Config of distribution focal loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
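The default prior_generator in the signature uses strides [8, 16, 32] and offset 0.5. A points generator of this kind places one prior per feature-map cell. The sketch below shows the idea for a single level in pure Python. The helper name is hypothetical, and the formula (index + offset) * stride is an assumption about how such a generator maps cells to image coordinates.

```python
def grid_point_priors(feat_h, feat_w, stride, offset=0.5):
    """Return (x, y) image coordinates of one prior per feature-map cell.

    Points are listed row by row. With offset=0.5 each point sits at the
    centre of its cell; with offset=0 it sits at the top-left corner.
    """
    return [((x + offset) * stride, (y + offset) * stride)
            for y in range(feat_h)
            for x in range(feat_w)]


# A 2x2 feature map at stride 8 covers a 16x16 image region.
print(grid_point_priors(2, 2, stride=8, offset=0.5))
print(grid_point_priors(2, 2, stride=8, offset=0))
```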
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- class mmyolo.models.dense_heads.PPYOLOEHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
PPYOLOEHead head module used in PPYOLOE (https://arxiv.org/abs/2203.16250).
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
reg_max (int) – Max value of integral set {0, ..., reg_max} in QFL setting. Defaults to 16.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
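The role of reg_max can be illustrated with the integral decoding commonly paired with distribution focal loss: each box side is predicted as logits over the bins {0, ..., reg_max}, and the distance is the expectation of the softmax distribution over those bins. The sketch below is a pure-Python illustration under that assumption. The function names are hypothetical, and any later scaling by the feature-map stride is left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits):
    """Expectation of the bin index under the softmax distribution."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


reg_max = 16
uniform_logits = [0.0] * (reg_max + 1)   # no preference: mean of 0..16
peaked_logits = [0.0] * (reg_max + 1)
peaked_logits[3] = 50.0                   # almost all mass on bin 3

print(expected_distance(uniform_logits))
print(expected_distance(peaked_logits))
```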
- forward(x: Tuple[torch.Tensor]) → torch.Tensor[source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions.
- Return type
Tuple[List]
- class mmyolo.models.dense_heads.RTMDetHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
RTMDet head.
- Parameters
head_module (ConfigType) – Base module used for RTMDetHead
prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
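The default bbox_coder is a distance-point coder: a box is described by its distances to the left, top, right and bottom of a prior point. A minimal decoding sketch follows. The function name is hypothetical, and scaling the distances by the level stride is an assumption of this illustration.

```python
def distance_to_bbox(point, distances, stride=1.0):
    """Decode (left, top, right, bottom) distances into an xyxy box.

    `point` is the (x, y) prior location in image coordinates.
    Distances are multiplied by `stride` before decoding (assumed).
    """
    px, py = point
    left, top, right, bottom = (d * stride for d in distances)
    return (px - left, py - top, px + right, py + bottom)


# A prior at (100, 100) on the stride-8 level.
print(distance_to_bbox((100, 100), (2, 3, 4, 5), stride=8))
```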
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Compute losses of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
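The default loss_bbox of RTMDetHead is a GIoU loss applied to decoded boxes in [tl_x, tl_y, br_x, br_y] format. The sketch below computes generalized IoU for a single pair of boxes in pure Python to show what the loss measures. It assumes both boxes have positive area and is not the batched tensor implementation used in training.

```python
def giou(box_a, box_b):
    """Generalized IoU of two xyxy boxes with positive area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss is 0 for identical boxes and grows as boxes move apart."""
    return 1.0 - giou(box_a, box_b)


print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))
print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike plain IoU, the enclosing-box term keeps the value informative for non-overlapping boxes.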
- class mmyolo.models.dense_heads.RTMDetInsSepBNHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'type': 'mmdet.GIoULoss'}, loss_mask={'eps': 5e-06, 'loss_weight': 2.0, 'reduction': 'mean', 'type': 'mmdet.DiceLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
RTMDet Instance Segmentation head.
- Parameters
head_module (ConfigType) – Base module used for RTMDetInsSepBNHead
prior_generator (dict) – Points generator for feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_mask (ConfigDict or dict) – Config of mask loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Compute losses of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level. Has shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded box for each scale level with shape (N, num_anchors * 4, H, W) in [tl_x, tl_y, br_x, br_y] format.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- parse_dynamic_params(flatten_kernels: torch.Tensor) → tuple[source]¶
Split the kernel head prediction into conv weights and biases.
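The splitting step can be sketched for a single instance in pure Python. The layout below is an assumption made for illustration and is not confirmed by this page: the first dynamic conv takes num_prototypes plus two coordinate channels as input, the last one outputs a single mask channel, and the flattened vector stores all weights first and all biases after them. The helper names are hypothetical.

```python
def dynamic_conv_sizes(num_prototypes=8, dyconv_channels=8, num_dyconvs=3,
                       extra_in_channels=2):
    """Return per-layer weight and bias counts for 1x1 dynamic convs."""
    weight_nums, bias_nums = [], []
    for i in range(num_dyconvs):
        in_c = num_prototypes + extra_in_channels if i == 0 else dyconv_channels
        out_c = 1 if i == num_dyconvs - 1 else dyconv_channels
        weight_nums.append(in_c * out_c)
        bias_nums.append(out_c)
    return weight_nums, bias_nums


def split_flat_params(flat, weight_nums, bias_nums):
    """Split one flattened kernel vector into per-layer weights and biases."""
    chunks, start = [], 0
    for size in weight_nums + bias_nums:
        chunks.append(flat[start:start + size])
        start += size
    num_layers = len(weight_nums)
    return chunks[:num_layers], chunks[num_layers:]


weight_nums, bias_nums = dynamic_conv_sizes()
flat = list(range(sum(weight_nums) + sum(bias_nums)))
weights, biases = split_flat_params(flat, weight_nums, bias_nums)
print(weight_nums, bias_nums, len(flat))
```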
- predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], kernel_preds: List[torch.Tensor], mask_feats: torch.Tensor, score_factors: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results.
Note: When score_factors is not None, the cls_scores are usually multiplied by it to obtain the real score used in NMS.
- Parameters
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
kernel_preds (list[Tensor]) – Kernel predictions of dynamic convs for all scale levels, each is a 4D-tensor, has shape (batch_size, num_params, H, W).
mask_feats (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, num_prototypes, H, W).
score_factors (list[Tensor], optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, num_priors * 1, H, W). Defaults to None.
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns
Object detection and instance segmentation results of each image after post-processing. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).
masks (Tensor): Has a shape (num_instances, h, w).
- Return type
list[InstanceData]
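The note on score_factors can be illustrated with a small sketch on plain Python floats: each classification score is multiplied by its score factor, and only candidates above a threshold go on to NMS. The function name and the score_thr parameter are hypothetical and only stand in for whatever the test config provides.

```python
from typing import List, Optional, Tuple


def final_scores(cls_scores: List[float],
                 score_factors: Optional[List[float]] = None,
                 score_thr: float = 0.05) -> List[Tuple[int, float]]:
    """Return (index, score) pairs that survive the score threshold."""
    kept = []
    for i, cls_score in enumerate(cls_scores):
        # Without score factors the classification score is used as-is.
        score = cls_score if score_factors is None else cls_score * score_factors[i]
        if score > score_thr:
            kept.append((i, score))
    return kept


candidates = final_scores([0.9, 0.8, 0.1], [0.5, 0.01, 1.0], score_thr=0.05)
```

Here the second candidate has a high classification score but is dropped because its score factor is very small.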
- class mmyolo.models.dense_heads.RTMDetInsSepBNHeadModule(num_classes: int, *args, num_prototypes: int = 8, dyconv_channels: int = 8, num_dyconvs: int = 3, use_sigmoid_cls: bool = True, **kwargs)[source]¶
Detection and Instance Segmentation Head of RTMDet.
- Parameters
num_classes (int) – Number of categories excluding the background category.
num_prototypes (int) – Number of mask prototype features extracted from the mask head. Defaults to 8.
dyconv_channels (int) – Channel of the dynamic conv layers. Defaults to 8.
num_dyconvs (int) – Number of the dynamic convolution layers. Defaults to 3.
use_sigmoid_cls (bool) – Use sigmoid for class prediction. Defaults to True.
- forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of classification scores, bbox predictions, kernel predictions, and mask features.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
kernel_preds (list[Tensor]): Dynamic conv kernels for all scale levels, each is a 4D-tensor, the channels number is num_gen_params.
mask_feat (Tensor): Mask prototype features. Has shape (batch_size, num_prototypes, H, W).
- Return type
tuple
- class mmyolo.models.dense_heads.RTMDetRotatedHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistanceAnglePointCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'beta': 2.0, 'loss_weight': 1.0, 'type': 'mmdet.QualityFocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 2.0, 'mode': 'linear', 'type': 'mmrotate.RotatedIoULoss'}, angle_version: str = 'le90', use_hbbox_loss: bool = False, angle_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmrotate.PseudoAngleCoder'}, loss_angle: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
RTMDet-R head.
Compared with RTMDetHead, RTMDetRotatedHead adds some arguments to support rotated object detection.
angle_version is used to limit angle_range during training.
angle_coder is used to encode and decode angles, similar to bbox_coder.
use_hbbox_loss and loss_angle allow custom regression loss calculation for rotated boxes.
There are three combination options for regression:
use_hbbox_loss=False and loss_angle is None.
    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
        └────►decode──(a)─┘
use_hbbox_loss=False and loss_angle is specified. An angle loss is added on angle_pred.
    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──►rbox_pred──(xywha)─►loss_bbox
        │                 ▲
        ├────►decode──(a)─┘
        │
        └───────────────────────────────────────────►loss_angle
use_hbbox_loss=True and loss_angle is specified. In this case the loss_angle must be set.
    bbox_pred──(tblr)──►decode──►hbox_pred──(xyxy)──►loss_bbox
    angle_pred──────────────────────────────────────►loss_angle
There is a decoded_with_angle flag in test_cfg, which works similarly to the training process.
When decoded_with_angle=True:
    bbox_pred────(tblr)───┐
                          ▼
    angle_pred          decode──(xywha)──►rbox_pred
        │                 ▲
        └────►decode──(a)─┘
When decoded_with_angle=False:
    bbox_pred──(tblr)─►decode
                          │ (xyxy)
                          ▼
                       format───(xywh)──►concat──(xywha)──►rbox_pred
                                           ▲
    angle_pred────────►decode────(a)───────┘
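The three regression options described above can be summarised as a small validity check. This is only a sketch: regression_option is a hypothetical helper, the config fragment uses just the argument names from the class signature, and the loss type string is a placeholder rather than a real registered loss.

```python
from typing import Optional


def regression_option(use_hbbox_loss: bool, loss_angle: Optional[dict]) -> int:
    """Return which of the three documented options a setting selects."""
    if use_hbbox_loss:
        if loss_angle is None:
            # Option 3 requires an angle loss.
            raise ValueError("use_hbbox_loss=True requires loss_angle to be set")
        return 3
    # With rotated-box loss, the angle loss is optional.
    return 1 if loss_angle is None else 2


# Hypothetical config fragment; other required arguments are omitted.
head_cfg = dict(
    type='RTMDetRotatedHead',
    angle_version='le90',
    use_hbbox_loss=False,
    loss_angle=None,
)
option = regression_option(head_cfg['use_hbbox_loss'], head_cfg['loss_angle'])
placeholder_loss = dict(type='PlaceholderAngleLoss')  # not a real loss name
```

Calling regression_option(True, None) raises, which mirrors the statement that loss_angle must be set when use_hbbox_loss=True.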
- Parameters
head_module (ConfigType) – Base module used for RTMDetRotatedHead.
prior_generator – Points generator feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
angle_version (str) – Angle representations. Defaults to 'le90'.
use_hbbox_loss (bool) – If True, use horizontal bbox loss; loss_angle must then not be None. Defaults to False.
angle_coder (ConfigDict or dict) – Config of angle coder.
loss_angle (ConfigDict or dict, optional) – Config of angle loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- loss_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], batch_img_metas: List[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Compute losses of the head.
- Parameters
cls_scores (list[Tensor]) – Box scores for each scale level, each with shape (N, num_anchors * num_classes, H, W).
bbox_preds (list[Tensor]) – Decoded boxes for each scale level, each with shape (N, num_anchors * 4, H, W), in [tl_x, tl_y, br_x, br_y] format.
angle_preds (list[Tensor]) – Angle prediction for each scale level, each with shape (N, num_anchors * angle_out_dim, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], Optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of loss components.
- Return type
dict[str, Tensor]
- predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], angle_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform a batch of output features extracted by the head into bbox results.
- Parameters
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
angle_preds (list[Tensor]) – Box angle for each scale level with shape (N, num_points * angle_dim, H, W)
objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns
Object detection results of each image after post-processing. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 5), the last dimension 5 arranged as (x, y, w, h, angle).
- Return type
list[InstanceData]
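To make the (x, y, w, h, angle) box layout concrete, here is a minimal sketch of the decoded_with_angle=False path for a single prior point: distances are decoded to a horizontal box, converted to center form, and the angle is appended. The function name is hypothetical, and the (top, bottom, left, right) ordering is an assumption read off the "tblr" label in the class description; the actual coder may order the distances differently.

```python
from typing import Tuple


def decode_rbox_without_angle(point: Tuple[float, float],
                              distances: Tuple[float, float, float, float],
                              angle: float) -> Tuple[float, float, float, float, float]:
    """Decode one prediction to (cx, cy, w, h, angle)."""
    px, py = point
    top, bottom, left, right = distances  # assumed (t, b, l, r) order
    # Step 1: distances -> horizontal box in (x1, y1, x2, y2) form.
    x1, y1, x2, y2 = px - left, py - top, px + right, py + bottom
    # Step 2: (x1, y1, x2, y2) -> (cx, cy, w, h).
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    # Step 3: append the separately decoded angle.
    return (cx, cy, w, h, angle)


rbox = decode_rbox_without_angle((100, 50), (10, 30, 20, 40), 0.25)
```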
- class mmyolo.models.dense_heads.RTMDetRotatedSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, angle_out_dim: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Detection Head Module of RTMDet-R.
Compared with RTMDet Detection Head Module, RTMDet-R adds a conv for angle prediction. An angle_out_dim arg is added, which is generated by the angle_coder module and controls the angle pred dim.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
share_conv (bool) – Whether to share conv layers between stages. Defaults to True.
pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.
angle_out_dim (int) – Encoded length of angle, passed in by the head. Defaults to 1.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of classification scores, bbox predictions, and angle predictions.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
angle_preds (list[Tensor]): Angle prediction for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * angle_out_dim.
- Return type
tuple
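The channel counts listed for forward can be turned into expected tensor shapes per scale level. The sketch below is a hypothetical helper, not part of the library, and it assumes the input height and width are divisible by every stride so that each feature map is exactly size // stride.

```python
from typing import Dict, List, Sequence, Tuple


def head_output_shapes(batch_size: int, img_hw: Tuple[int, int], num_classes: int,
                       featmap_strides: Sequence[int] = (8, 16, 32),
                       num_base_priors: int = 1,
                       angle_out_dim: int = 1) -> List[Dict[str, tuple]]:
    """Expected (N, C, H, W) shapes of the three outputs at each scale level."""
    height, width = img_hw
    shapes = []
    for stride in featmap_strides:
        # Assumption: feature map size is the image size divided by the stride.
        fh, fw = height // stride, width // stride
        shapes.append(dict(
            cls_score=(batch_size, num_base_priors * num_classes, fh, fw),
            bbox_pred=(batch_size, num_base_priors * 4, fh, fw),
            angle_pred=(batch_size, num_base_priors * angle_out_dim, fh, fw),
        ))
    return shapes


shapes = head_output_shapes(batch_size=2, img_hw=(640, 640), num_classes=15)
```

A check like this can be handy when wiring a custom angle_coder, since angle_out_dim changes only the channel count of angle_preds.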
- class mmyolo.models.dense_heads.RTMDetSepBNHeadModule(num_classes: int, in_channels: int, widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], share_conv: bool = True, pred_kernel_size: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Detection Head of RTMDet.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (int) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
feat_channels (int) – Number of hidden channels. Used in child classes. Defaults to 256.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
share_conv (bool) – Whether to share conv layers between stages. Defaults to True.
pred_kernel_size (int) – Kernel size of nn.Conv2d. Defaults to 1.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN').
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(feats: Tuple[torch.Tensor, ...]) → tuple[source]¶
Forward features from the upstream network.
- Parameters
feats (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of classification scores and bbox predictions.
cls_scores (list[Tensor]): Classification scores for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * num_classes.
bbox_preds (list[Tensor]): Box energies / deltas for all scale levels, each is a 4D-tensor, the channels number is num_base_priors * 4.
- Return type
tuple
- class mmyolo.models.dense_heads.YOLOXHead(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOXBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-16, 'loss_weight': 5.0, 'mode': 'square', 'reduction': 'sum', 'type': 'mmdet.IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox_aux: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.L1Loss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOXHead head used in YOLOX.
- Parameters
head_module (ConfigType) – Base module used for YOLOXHead
prior_generator – Points generator feature maps in 2D points-based detectors.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_obj (ConfigDict or dict) – Config of objectness loss.
loss_bbox_aux (ConfigDict or dict) – Config of bbox aux loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- static gt_instances_preprocess(batch_gt_instances: torch.Tensor, batch_size: int) → List[mmengine.structures.instance_data.InstanceData][source]¶
Split batch_gt_instances with batch size.
- Parameters
batch_gt_instances (Tensor) – Ground truth for the whole batch as a 2D tensor with shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.
- Returns
batch gt instances data, shape [batch_size, InstanceData]
- Return type
List
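Splitting a whole-batch ground-truth table by image can be sketched with plain lists as below. The function name is hypothetical, plain dicts stand in for InstanceData, and the column layout [batch_idx, label, x1, y1, x2, y2] is an assumption; the documentation above only states that each row has 6 values.

```python
from typing import Dict, List, Sequence


def split_gt_by_image(batch_gt_instances: Sequence[Sequence[float]],
                      batch_size: int) -> List[Dict[str, list]]:
    """Group rows of a [all_gt_bboxes, 6] table into one record per image."""
    per_image = [dict(labels=[], bboxes=[]) for _ in range(batch_size)]
    for row in batch_gt_instances:
        batch_idx = int(row[0])  # assumed: column 0 is the image index
        per_image[batch_idx]['labels'].append(int(row[1]))  # assumed: column 1
        per_image[batch_idx]['bboxes'].append(list(row[2:6]))  # assumed: columns 2-5
    return per_image


rows = [
    [0, 2, 10, 20, 30, 40],
    [2, 0, 5, 5, 15, 25],
    [0, 1, 1, 2, 3, 4],
]
per_image = split_gt_by_image(rows, batch_size=3)
```

Images without any ground truth (index 1 here) still get an entry, so the result always has batch_size items.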
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- class mmyolo.models.dense_heads.YOLOXHeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, feat_channels: int = 256, stacked_convs: int = 2, featmap_strides: Sequence[int] = [8, 16, 32], use_depthwise: bool = False, dcn_on_last_conv: bool = False, conv_bias: Union[bool, str] = 'auto', conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOXHead head module used in YOLOX (https://arxiv.org/abs/2107.08430).
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid. Defaults to 1.
stacked_convs (int) – Number of stacking convs of the head. Defaults to 2.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to [8, 16, 32].
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
dcn_on_last_conv (bool) – If true, use dcn in the last layer of towers. Defaults to False.
conv_bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias of conv will be set as True if norm_cfg is None, otherwise False. Defaults to “auto”.
conv_cfg (ConfigDict or dict, optional) – Config dict for convolution layer. Defaults to None.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- forward_single(x: torch.Tensor, cls_convs: torch.nn.modules.module.Module, reg_convs: torch.nn.modules.module.Module, conv_cls: torch.nn.modules.module.Module, conv_reg: torch.nn.modules.module.Module, conv_obj: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Forward feature of a single scale level.
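The rule given for the conv_bias parameter of YOLOXHeadModule ("auto" means the bias follows norm_cfg) can be written as a small helper. resolve_conv_bias is a hypothetical name used only for this sketch; it restates the documented rule and is not the library's own code.

```python
from typing import Optional, Union


def resolve_conv_bias(conv_bias: Union[bool, str], norm_cfg: Optional[dict]) -> bool:
    """Resolve the effective bias flag of a conv layer."""
    if conv_bias == 'auto':
        # Bias is enabled only when no normalization layer follows the conv.
        return norm_cfg is None
    if not isinstance(conv_bias, bool):
        raise TypeError("conv_bias must be a bool or the string 'auto'")
    return conv_bias


with_norm = resolve_conv_bias('auto', dict(type='BN', momentum=0.03, eps=0.001))
without_norm = resolve_conv_bias('auto', None)
```

With the default norm_cfg, "auto" therefore resolves to no bias.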
- class mmyolo.models.dense_heads.YOLOXPoseHead(loss_pose: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, *args, **kwargs)[source]¶
YOLOXPoseHead head used in YOLO-Pose (https://arxiv.org/abs/2204.06806).
- Parameters
loss_pose (ConfigDict, optional) – Config of keypoint OKS loss.
- decode_pose(grids: torch.Tensor, offsets: torch.Tensor, strides: Union[torch.Tensor, int]) → torch.Tensor[source]¶
Decode regression offsets to keypoints.
- Parameters
grids (torch.Tensor) – The coordinates of the feature map grids.
offsets (torch.Tensor) – The predicted offset of each keypoint relative to its corresponding grid.
strides (torch.Tensor | int) – The stride of the feature map for each instance.
- Returns
The decoded keypoints coordinates.
- Return type
torch.Tensor
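A minimal sketch of decoding keypoint offsets for one grid cell follows. It assumes the offsets are expressed in units of the feature-map stride and are added to the grid coordinates; the documentation above only says the offsets are relative to the grid, so the exact scaling is an assumption, and decode_keypoints is a hypothetical name.

```python
from typing import List, Tuple


def decode_keypoints(grid: Tuple[float, float],
                     offsets: List[Tuple[float, float]],
                     stride: int) -> List[Tuple[float, float]]:
    """Map per-keypoint (dx, dy) offsets of one grid cell to image coordinates."""
    gx, gy = grid
    # Assumed rule: keypoint = grid coordinate + offset * stride.
    return [(gx + dx * stride, gy + dy * stride) for dx, dy in offsets]


keypoints = decode_keypoints(grid=(64, 32), offsets=[(0.5, -1.0), (2.0, 0.0)], stride=8)
```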
- static gt_instances_preprocess(batch_gt_instances: List[mmengine.structures.instance_data.InstanceData], *args, **kwargs) → List[mmengine.structures.instance_data.InstanceData][source]¶
Split batch_gt_instances with batch size.
- Parameters
batch_gt_instances (Tensor) – Ground truth for the whole batch as a 2D tensor with shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.
- Returns
batch gt instances data, shape [batch_size, InstanceData]
- Return type
List
- static gt_kps_instances_preprocess(batch_gt_instances: torch.Tensor, batch_gt_keypoints, batch_gt_keypoints_visible, batch_size: int) → List[mmengine.structures.instance_data.InstanceData][source]¶
Split batch_gt_instances with batch size.
- Parameters
batch_gt_instances (Tensor) – Ground truth for the whole batch as a 2D tensor with shape [all_gt_bboxes, 6].
batch_size (int) – Batch size.
- Returns
batch gt instances data, shape [batch_size, InstanceData]
- Return type
List
- loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the detection head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.
- Returns
A dictionary of loss components.
- Return type
dict
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], kpt_preds: Sequence[torch.Tensor], vis_preds: Sequence[torch.Tensor], batch_gt_instances: torch.Tensor, batch_gt_keypoints: torch.Tensor, batch_gt_keypoints_visible: torch.Tensor, batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
In addition to the base class method, keypoint losses are also calculated in this method.
- predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, kpt_preds: Optional[List[torch.Tensor]] = None, vis_preds: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform a batch of output features extracted by the head into bbox and keypoint results.
In addition to the base class method, keypoint predictions are also calculated in this method.
- class mmyolo.models.dense_heads.YOLOXPoseHeadModule(num_keypoints: int, *args, **kwargs)[source]¶
YOLOXPoseHeadModule serves as a head module for YOLOX-Pose.
In comparison to YOLOXHeadModule, this module introduces branches for keypoint prediction.
- class mmyolo.models.dense_heads.YOLOv5Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'base_sizes': [[(10, 13), (16, 30), (33, 23)], [(30, 61), (62, 45), (59, 119)], [(116, 90), (156, 198), (373, 326)]], 'strides': [8, 16, 32], 'type': 'mmdet.YOLOAnchorGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'YOLOv5BBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xywh', 'eps': 1e-07, 'iou_mode': 'ciou', 'loss_weight': 0.05, 'reduction': 'mean', 'return_iou': True, 'type': 'IoULoss'}, loss_obj: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 1.0, 'reduction': 'mean', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, prior_match_thr: float = 4.0, near_neighbor_thr: float = 0.5, ignore_iof_thr: float = - 1.0, obj_level_weights: List[float] = [4.0, 1.0, 0.4], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv5Head head used in YOLOv5.
- Parameters
head_module (ConfigType) – Base module used for YOLOv5Head
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_obj (ConfigDict or dict) – Config of objectness loss.
prior_match_thr (float) – Defaults to 4.0.
ignore_iof_thr (float) – Defaults to -1.0.
obj_level_weights (List[float]) – Defaults to [4.0, 1.0, 0.4].
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the detection head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.
- Returns
A dictionary of loss components.
- Return type
dict
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
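The YOLOv5Head parameter list gives prior_match_thr only as a default value. The sketch below shows the shape-ratio matching rule commonly used in YOLOv5-style assignment, where a ground-truth box is matched to an anchor if neither side differs by more than the threshold factor. Treat it as an assumption about how the threshold is typically applied, not as a statement of this head's exact implementation; matches_prior is a hypothetical helper.

```python
from typing import Tuple


def matches_prior(gt_wh: Tuple[float, float], prior_wh: Tuple[float, float],
                  prior_match_thr: float = 4.0) -> bool:
    """Check whether a ground-truth box size is close enough to a prior size."""
    ratios = [g / p for g, p in zip(gt_wh, prior_wh)]
    # Take the worst side ratio in either direction (too large or too small).
    worst = max(max(r, 1.0 / r) for r in ratios)
    return worst < prior_match_thr


# Prior sizes taken from the default base_sizes in the class signature.
small_ok = matches_prior((40, 40), (33, 23))
small_edge = matches_prior((40, 40), (10, 13))
large_no = matches_prior((40, 40), (373, 326))
```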
- predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform a batch of output features extracted by the head into bbox results.
- Parameters
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
objectnesses (list[Tensor], Optional) – Score factor for all scale level, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration, if None, test_cfg would be used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, do nms before return boxes. Defaults to True.
- Returns
Object detection results of each image after post-processing. Each item usually contains the following keys.
scores (Tensor): Classification scores, has a shape (num_instances, ).
labels (Tensor): Labels of bboxes, has a shape (num_instances, ).
bboxes (Tensor): Has a shape (num_instances, 4), the last dimension 4 arranged as (x1, y1, x2, y2).
- Return type
list[InstanceData]
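What "return boxes in original image space" means for rescale=True can be sketched as undoing the resize and padding applied during preprocessing. The helper below is hypothetical; the (w_scale, h_scale) scale factor and the (left, top) padding offset are assumed to be available from the image meta information, and the real meta keys may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def rescale_boxes(boxes: List[Box], scale_factor: Tuple[float, float],
                  pad_offset: Tuple[float, float] = (0.0, 0.0)) -> List[Box]:
    """Map (x1, y1, x2, y2) boxes from network input space to the original image."""
    w_scale, h_scale = scale_factor
    pad_left, pad_top = pad_offset
    rescaled = []
    for x1, y1, x2, y2 in boxes:
        # Remove the padding first, then undo the resize.
        rescaled.append(((x1 - pad_left) / w_scale, (y1 - pad_top) / h_scale,
                         (x2 - pad_left) / w_scale, (y2 - pad_top) / h_scale))
    return rescaled


original_boxes = rescale_boxes([(100, 60, 300, 260)], scale_factor=(0.5, 0.5),
                               pad_offset=(0, 20))
```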
- class mmyolo.models.dense_heads.YOLOv5HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv5Head head module used in YOLOv5.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
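As a rough illustration of how these parameters interact, the sketch below derives per-level channel counts from a hypothetical head-module config. It assumes that in_channels is scaled by widen_factor and that each prior predicts 4 box values, 1 objectness score and num_classes class scores (the usual YOLOv5 layout). Both points are assumptions about the implementation, and every concrete value is illustrative.

```python
# Hypothetical config fragment; the values are illustrative, not from a released config.
head_module = dict(
    type='YOLOv5HeadModule',
    num_classes=80,
    in_channels=[256, 512, 1024],
    widen_factor=0.5,
    num_base_priors=3,
    featmap_strides=[8, 16, 32],
)

# Assumption: the module scales every input channel count by widen_factor.
scaled_in_channels = [
    int(c * head_module['widen_factor']) for c in head_module['in_channels']
]

# Assumption: each prior predicts 4 box values + 1 objectness + num_classes scores.
pred_channels_per_level = head_module['num_base_priors'] * (
    4 + 1 + head_module['num_classes'])

print(scaled_in_channels, pred_channels_per_level)
```

One entry of in_channels is expected per entry of featmap_strides, so both lists have the same length here.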
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- class mmyolo.models.dense_heads.YOLOv5InsHead(*args, mask_overlap: bool = True, loss_mask: Union[mmengine.config.config.ConfigDict, dict] = {'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_mask_weight=0.05, **kwargs)[source]¶
YOLOv5 Instance Segmentation and Detection head.
- Parameters
mask_overlap (bool) – Defaults to True.
loss_mask (ConfigDict or dict) – Config of mask loss.
loss_mask_weight (float) – The weight of mask loss.
- crop_mask(masks: torch.Tensor, boxes: torch.Tensor) → torch.Tensor[source]¶
Crop mask by the bounding box.
- Parameters
masks (Tensor) – Predicted mask results. Has shape (1, num_instance, H, W).
boxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).
- Returns
The masks cropped to their bounding boxes.
- Return type
Tensor
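The idea behind cropping can be sketched in plain Python: every mask value outside its box is set to zero. This is a minimal sketch using nested lists instead of tensors. The function name crop_mask_sketch and the half-open pixel convention (x1 <= col < x2) are assumptions for illustration, not the library's implementation.

```python
from typing import List

Mask = List[List[float]]


def crop_mask_sketch(masks: List[Mask], boxes: List[List[float]]) -> List[Mask]:
    """Zero out every mask value that lies outside its (x1, y1, x2, y2) box."""
    cropped = []
    for mask, (x1, y1, x2, y2) in zip(masks, boxes):
        cropped.append([
            [value if (x1 <= col < x2 and y1 <= row < y2) else 0.0
             for col, value in enumerate(line)]
            for row, line in enumerate(mask)
        ])
    return cropped


# One 4x4 mask filled with ones, cropped to the box (1, 1, 3, 3).
masks = [[[1.0] * 4 for _ in range(4)]]
boxes = [[1, 1, 3, 3]]
result = crop_mask_sketch(masks, boxes)
for line in result[0]:
    print(line)
```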
- loss(x: Tuple[torch.Tensor], batch_data_samples: Union[list, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the detection head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[DetDataSample], dict) – The Data Samples. It usually includes information such as gt_instance, gt_panoptic_seg and gt_sem_seg.
- Returns
A dictionary of loss components.
- Return type
dict
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], objectnesses: Sequence[torch.Tensor], coeff_preds: Sequence[torch.Tensor], proto_preds: torch.Tensor, batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_gt_masks: Sequence[torch.Tensor], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
coeff_preds (Sequence[Tensor]) – Mask coefficient for each scale level, each is a 4D-tensor, the channel number is num_priors * mask_channels.
proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).
batch_gt_instances (Sequence[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_gt_masks (Sequence[Tensor]) – Batch of gt_mask.
batch_img_metas (Sequence[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- predict_by_feat(cls_scores: List[torch.Tensor], bbox_preds: List[torch.Tensor], objectnesses: Optional[List[torch.Tensor]] = None, coeff_preds: Optional[List[torch.Tensor]] = None, proto_preds: Optional[torch.Tensor] = None, batch_img_metas: Optional[List[dict]] = None, cfg: Optional[mmengine.config.config.ConfigDict] = None, rescale: bool = True, with_nms: bool = True) → List[mmengine.structures.instance_data.InstanceData][source]¶
Transform a batch of output features extracted from the head into bbox results. Note: When score factors (objectnesses) are provided, cls_scores are usually multiplied by them to obtain the final scores used in NMS.
- Parameters
cls_scores (list[Tensor]) – Classification scores for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * num_classes, H, W).
bbox_preds (list[Tensor]) – Box energies / deltas for all scale levels, each is a 4D-tensor, has shape (batch_size, num_priors * 4, H, W).
objectnesses (list[Tensor], Optional) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
coeff_preds (list[Tensor]) – Mask coefficient predictions for all scale levels, each is a 4D-tensor, has shape (batch_size, mask_channels, H, W).
proto_preds (Tensor) – Mask prototype features extracted from the mask head, has shape (batch_size, mask_channels, H, W).
batch_img_metas (list[dict], Optional) – Batch image meta info. Defaults to None.
cfg (ConfigDict, optional) – Test / postprocessing configuration. If None, test_cfg is used. Defaults to None.
rescale (bool) – If True, return boxes in original image space. Defaults to True.
with_nms (bool) – If True, apply NMS before returning boxes. Defaults to True.
- Returns
Object detection and instance segmentation results of each image after post-processing. Each item usually contains the following keys.
scores (Tensor): Classification scores, has shape (num_instances, ).
labels (Tensor): Labels of bboxes, has shape (num_instances, ).
bboxes (Tensor): Has shape (num_instances, 4); the last dimension is arranged as (x1, y1, x2, y2).
masks (Tensor): Has shape (num_instances, h, w).
- Return type
list[InstanceData]
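The score fusion mentioned in the note can be sketched as follows: each class probability is multiplied by the objectness of its prior, and low-scoring candidates are dropped before NMS. This is a minimal plain-Python sketch. The function name fuse_scores and the score_thr parameter are hypothetical, and the inputs are assumed to be probabilities (already passed through a sigmoid).

```python
from typing import List, Tuple


def fuse_scores(cls_probs: List[List[float]],
                objectness: List[float],
                score_thr: float = 0.001) -> List[Tuple[int, int, float]]:
    """Return (prior_index, label, score) for candidates above score_thr.

    cls_probs[i][c] is the class-c probability of prior i and
    objectness[i] is its objectness probability.
    """
    kept = []
    for prior_idx, (row, obj) in enumerate(zip(cls_probs, objectness)):
        for label, cls_prob in enumerate(row):
            score = cls_prob * obj  # final score used for filtering and NMS
            if score > score_thr:
                kept.append((prior_idx, label, score))
    return kept


candidates = fuse_scores(
    cls_probs=[[0.9, 0.1], [0.5, 0.5]],
    objectness=[0.5, 0.0],
    score_thr=0.1,
)
print(candidates)
```

Only the first class of the first prior survives here, because the second prior has zero objectness and the remaining product falls below the threshold.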
- process_mask(mask_proto: torch.Tensor, mask_coeff_pred: torch.Tensor, bboxes: torch.Tensor, shape: Tuple[int, int], upsample: bool = False) → torch.Tensor[source]¶
Generate mask logits results.
- Parameters
mask_proto (Tensor) – Mask prototype features. Has shape (mask_channels, H, W).
mask_coeff_pred (Tensor) – Mask coefficients prediction for a single image. Has shape (num_instance, mask_channels).
bboxes (Tensor) – Tensor of the bbox. Has shape (num_instance, 4).
shape (Tuple) – Batch input shape of image.
upsample (bool) – Whether to upsample the mask results to the batch input shape. Defaults to False.
- Returns
Instance segmentation masks for each instance. Has shape (num_instance, H, W).
- Return type
Tensor
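Prototype-based mask heads commonly build each instance mask as a linear combination of the prototype maps, weighted by that instance's coefficients, followed by a sigmoid. The sketch below shows this combination in plain Python. It assumes coefficients of shape (num_instance, mask_channels) and prototypes of shape (mask_channels, H, W). The function name combine_prototypes is hypothetical, and cropping and upsampling are omitted.

```python
import math
from typing import List


def combine_prototypes(coeffs: List[List[float]],
                       protos: List[List[List[float]]]) -> List[List[List[float]]]:
    """Return one (H, W) mask of probabilities per instance."""
    height, width = len(protos[0]), len(protos[0][0])
    masks = []
    for inst_coeffs in coeffs:
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Weighted sum over mask channels, then sigmoid.
                logit = sum(k * protos[c][y][x]
                            for c, k in enumerate(inst_coeffs))
                row.append(1.0 / (1.0 + math.exp(-logit)))
            mask.append(row)
        masks.append(mask)
    return masks


# One instance, two mask channels, prototypes of size 1 x 2.
protos = [[[0.0, 2.0]], [[0.0, 0.0]]]
coeffs = [[1.0, -1.0]]
masks = combine_prototypes(coeffs, protos)
print(masks)
```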
- class mmyolo.models.dense_heads.YOLOv5InsHeadModule(*args, num_classes: int, mask_channels: int = 32, proto_channels: int = 256, widen_factor: float = 1.0, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]¶
Detection and Instance Segmentation Head of YOLOv5.
- Parameters
num_classes (int) – Number of categories excluding the background category.
mask_channels (int) – Number of channels in the mask feature map.
proto_channels (int) – Number of channels in the proto feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, objectnesses, and mask predictions.
- Return type
Tuple[List]
- class mmyolo.models.dense_heads.YOLOv6Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'alpha': 0.75, 'gamma': 2.0, 'iou_weighted': True, 'loss_weight': 1.0, 'reduction': 'sum', 'type': 'mmdet.VarifocalLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'giou', 'loss_weight': 2.5, 'reduction': 'mean', 'return_iou': False, 'type': 'IoULoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv6Head head used in YOLOv6.
- Parameters
head_module (ConfigType) – Base module used for YOLOv6Head.
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- class mmyolo.models.dense_heads.YOLOv6HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, reg_max=0, featmap_strides: Sequence[int] = (8, 16, 32), norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv6Head head module used in YOLOv6 (https://arxiv.org/pdf/2209.02976).
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores and bbox predictions.
- Return type
Tuple[List]
- forward_single(x: torch.Tensor, stem: torch.nn.modules.module.Module, cls_conv: torch.nn.modules.module.Module, cls_pred: torch.nn.modules.module.Module, reg_conv: torch.nn.modules.module.Module, reg_pred: torch.nn.modules.module.Module) → Tuple[torch.Tensor, torch.Tensor][source]¶
Forward feature of a single scale level.
- class mmyolo.models.dense_heads.YOLOv7Head(*args, simota_candidate_topk: int = 20, simota_iou_weight: float = 3.0, simota_cls_weight: float = 1.0, aux_loss_weights: float = 0.25, **kwargs)[source]¶
YOLOv7Head head used in YOLOv7.
- Parameters
simota_candidate_topk (int) – The number of top-k candidate IoUs used to calculate dynamic-k in BatchYOLOv7Assigner. Defaults to 20.
simota_iou_weight (float) – The scale factor for regression iou cost in BatchYOLOv7Assigner. Defaults to 3.0.
simota_cls_weight (float) – The scale factor for classification cost in BatchYOLOv7Assigner. Defaults to 1.0.
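For readers unfamiliar with dynamic-k, the sketch below shows the rule as it is usually stated for SimOTA-style assignment: for each ground-truth box, the top candidate IoUs are summed and the integer part, clamped to at least 1, gives the number of priors to assign. This is a plain-Python sketch under that assumption. Whether BatchYOLOv7Assigner follows it exactly is not confirmed here, and the function name dynamic_k is hypothetical.

```python
from typing import List


def dynamic_k(ious: List[float], candidate_topk: int = 20) -> int:
    """Number of priors to assign to one ground-truth box.

    ious holds the IoU between the box and every candidate prior.
    """
    topk = sorted(ious, reverse=True)[:min(candidate_topk, len(ious))]
    # Sum of the best IoUs, truncated to an integer and clamped to >= 1.
    return max(int(sum(topk)), 1)


k_good = dynamic_k([0.9, 0.8, 0.7, 0.1], candidate_topk=3)
k_poor = dynamic_k([0.1, 0.2])
print(k_good, k_poor)
```

A box that overlaps well with many priors receives more positives, while a poorly matched box still receives one.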
- loss_by_feat(cls_scores: Sequence[Union[torch.Tensor, List]], bbox_preds: Sequence[Union[torch.Tensor, List]], objectnesses: Sequence[Union[torch.Tensor, List]], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
objectnesses (Sequence[Tensor]) – Score factor for all scale levels, each is a 4D-tensor, has shape (batch_size, 1, H, W).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- class mmyolo.models.dense_heads.YOLOv7HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 3, featmap_strides: Sequence[int] = (8, 16, 32), init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv7Head head module used in YOLOv7.
- class mmyolo.models.dense_heads.YOLOv7p6HeadModule(*args, main_out_channels: Sequence[int] = [256, 512, 768, 1024], aux_out_channels: Sequence[int] = [320, 640, 960, 1280], use_aux: bool = True, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, **kwargs)[source]¶
YOLOv7Head head module used in YOLOv7.
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores, bbox predictions, and objectnesses.
- Return type
Tuple[List]
- class mmyolo.models.dense_heads.YOLOv8Head(head_module: Union[mmengine.config.config.ConfigDict, dict], prior_generator: Union[mmengine.config.config.ConfigDict, dict] = {'offset': 0.5, 'strides': [8, 16, 32], 'type': 'mmdet.MlvlPointGenerator'}, bbox_coder: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'DistancePointBBoxCoder'}, loss_cls: Union[mmengine.config.config.ConfigDict, dict] = {'loss_weight': 0.5, 'reduction': 'none', 'type': 'mmdet.CrossEntropyLoss', 'use_sigmoid': True}, loss_bbox: Union[mmengine.config.config.ConfigDict, dict] = {'bbox_format': 'xyxy', 'iou_mode': 'ciou', 'loss_weight': 7.5, 'reduction': 'sum', 'return_iou': False, 'type': 'IoULoss'}, loss_dfl={'loss_weight': 0.375, 'reduction': 'mean', 'type': 'mmdet.DistributionFocalLoss'}, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv8Head head used in YOLOv8.
- Parameters
head_module (ConfigDict or dict) – Base module used for YOLOv8Head.
prior_generator (dict) – Points generator feature maps in 2D points-based detectors.
bbox_coder (ConfigDict or dict) – Config of bbox coder.
loss_cls (ConfigDict or dict) – Config of classification loss.
loss_bbox (ConfigDict or dict) – Config of localization loss.
loss_dfl (ConfigDict or dict) – Config of Distribution Focal Loss.
train_cfg (ConfigDict or dict, optional) – Training config of anchor head. Defaults to None.
test_cfg (ConfigDict or dict, optional) – Testing config of anchor head. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
- loss_by_feat(cls_scores: Sequence[torch.Tensor], bbox_preds: Sequence[torch.Tensor], bbox_dist_preds: Sequence[torch.Tensor], batch_gt_instances: Sequence[mmengine.structures.instance_data.InstanceData], batch_img_metas: Sequence[dict], batch_gt_instances_ignore: Optional[List[mmengine.structures.instance_data.InstanceData]] = None) → dict[source]¶
Calculate the loss based on the features extracted by the detection head.
- Parameters
cls_scores (Sequence[Tensor]) – Box scores for each scale level, each is a 4D-tensor, the channel number is num_priors * num_classes.
bbox_preds (Sequence[Tensor]) – Box energies / deltas for each scale level, each is a 4D-tensor, the channel number is num_priors * 4.
bbox_dist_preds (Sequence[Tensor]) – Box distribution logits for each scale level with shape (bs, reg_max + 1, H*W, 4).
batch_gt_instances (list[InstanceData]) – Batch of gt_instance. It usually includes bboxes and labels attributes.
batch_img_metas (list[dict]) – Meta information of each image, e.g., image size, scaling factor, etc.
batch_gt_instances_ignore (list[InstanceData], optional) – Batch of gt_instances_ignore. It includes bboxes attribute data that is ignored during training and testing. Defaults to None.
- Returns
A dictionary of losses.
- Return type
dict[str, Tensor]
- class mmyolo.models.dense_heads.YOLOv8HeadModule(num_classes: int, in_channels: Union[int, Sequence], widen_factor: float = 1.0, num_base_priors: int = 1, featmap_strides: Sequence[int] = (8, 16, 32), reg_max: int = 16, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
YOLOv8HeadModule head module used in YOLOv8.
- Parameters
num_classes (int) – Number of categories excluding the background category.
in_channels (Union[int, Sequence]) – Number of channels in the input feature map.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_base_priors (int) – The number of priors (points) at a point on the feature grid.
featmap_strides (Sequence[int]) – Downsample factor of each feature map. Defaults to (8, 16, 32).
reg_max (int) – Max value of the integral set {0, ..., reg_max-1} in the QFL setting. Defaults to 16.
norm_cfg (ConfigDict or dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigDict or dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
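The role of reg_max can be illustrated with the integral decoding used in Generalized Focal Loss: each box side is predicted as a distribution over the set {0, ..., reg_max-1}, and the decoded distance is the expectation of that distribution. The sketch below is plain Python. It assumes one list of reg_max logits per box side, and the function name expected_distance is hypothetical.

```python
import math
from typing import List


def expected_distance(logits: List[float]) -> float:
    """Softmax over the bins, then the expected bin index."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return sum(index * e / total for index, e in enumerate(exps))


reg_max = 16
uniform = expected_distance([0.0] * reg_max)  # mean of 0..15
peaked_logits = [0.0] * reg_max
peaked_logits[3] = 50.0
peaked = expected_distance(peaked_logits)     # very close to bin 3
print(uniform, peaked)
```

The decoded value is expressed in units of the feature-map stride, so it would still need to be scaled to obtain pixel distances.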
- forward(x: Tuple[torch.Tensor]) → Tuple[List][source]¶
Forward features from the upstream network.
- Parameters
x (Tuple[Tensor]) – Features from the upstream network, each is a 4D-tensor.
- Returns
A tuple of multi-level classification scores and bbox predictions.
- Return type
Tuple[List]
detectors¶
- class mmyolo.models.detectors.YOLODetector(backbone: Union[mmengine.config.config.ConfigDict, dict], neck: Union[mmengine.config.config.ConfigDict, dict], bbox_head: Union[mmengine.config.config.ConfigDict, dict], train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_syncbn: bool = True)[source]¶
Implementation of the YOLO series.
- Parameters
backbone (ConfigDict or dict) – The backbone config.
neck (ConfigDict or dict) – The neck config.
bbox_head (ConfigDict or dict) – The bbox head config.
train_cfg (ConfigDict or dict, optional) – The training config of YOLO. Defaults to None.
test_cfg (ConfigDict or dict, optional) – The testing config of YOLO. Defaults to None.
data_preprocessor (ConfigDict or dict, optional) – Config of DetDataPreprocessor to process the input data. Defaults to None.
init_cfg (ConfigDict or list[ConfigDict] or dict or list[dict], optional) – Initialization config dict. Defaults to None.
use_syncbn (bool) – Whether to use SyncBatchNorm. Defaults to True.
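A detector is normally described as a nested config dict that mirrors the constructor arguments above. The fragment below is a sketch of that shape only. The backbone and neck type names are assumed, their sub-configs are deliberately left incomplete, and none of the values come from a released config.

```python
# Illustrative skeleton; sub-configs are placeholders and would need
# their own arguments before a real model could be built from them.
model = dict(
    type='YOLODetector',
    backbone=dict(type='YOLOv5CSPDarknet'),  # assumed backbone type name
    neck=dict(type='YOLOv5PAFPN'),           # assumed neck type name
    bbox_head=dict(
        type='YOLOv5Head',
        head_module=dict(
            type='YOLOv5HeadModule',
            num_classes=80,
            in_channels=[256, 512, 1024],
        ),
    ),
    train_cfg=None,
    test_cfg=None,
    use_syncbn=False,  # e.g. for single-GPU runs; the default is True
)

# The three required constructor arguments must all be present.
required = {'backbone', 'neck', 'bbox_head'}
missing = required - model.keys()
print(sorted(missing))
```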
layers¶
- class mmyolo.models.layers.BepC3StageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, hidden_ratio: float = 0.5, concat_all_layer: bool = True, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]¶
Beer-mug RepC3 Block.
- Parameters
in_channels (int) – Number of channels in the input image.
out_channels (int) – Number of channels produced by the convolution.
num_blocks (int) – Number of blocks. Defaults to 1.
hidden_ratio (float) – Hidden channel expansion. Defaults to 0.5.
concat_all_layer (bool) – Whether to concatenate all layers in the forward pass. Defaults to True.
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type='RepVGGBlock').
norm_cfg (ConfigType) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (ConfigType) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmyolo.models.layers.BiFusion(in_channels0: int, in_channels1: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'})[source]¶
BiFusion Block in YOLOv6.
BiFusion fuses current-, high- and low-level features. Compared with concatenation in PAN, it fuses an extra low-level feature.
- Parameters
in_channels0 (int) – The channels of current-level feature.
in_channels1 (int) – The input channels of lower-level feature.
out_channels (int) – The out channels of the BiFusion module.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='ReLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.layers.CSPLayerWithTwoConv(in_channels: int, out_channels: int, expand_ratio: float = 0.5, num_blocks: int = 1, add_identity: bool = True, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Cross Stage Partial Layer with 2 convolutions.
- Parameters
in_channels (int) – The input channels of the CSP layer.
out_channels (int) – The output channels of the CSP layer.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
num_blocks (int) – Number of blocks. Defaults to 1.
add_identity (bool) – Whether to add identity in blocks. Defaults to True.
conv_cfg (dict, optional) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (ConfigDict or dict or list[dict] or list[ConfigDict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.layers.DarknetBottleneck(in_channels: int, out_channels: int, expansion: float = 0.5, kernel_size: Sequence[int] = (1, 3), padding: Sequence[int] = (0, 1), add_identity: bool = True, use_depthwise: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
The basic bottleneck block used in Darknet.
Each block consists of two ConvModules, and the input is added to the final output. Each ConvModule is composed of Conv, BN, and an activation layer (SiLU by default, see act_cfg). The first conv layer has a filter size of k1 x k1 and the second one has a filter size of k2 x k2.
Note: This DarknetBottleneck differs slightly from MMDet's: the kernel size and padding of each conv can be changed.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
expansion (float) – Ratio of hidden channels to out_channels. Defaults to 0.5.
kernel_size (Sequence[int]) – The kernel size of each convolution. Defaults to (1, 3).
padding (Sequence[int]) – The padding size of each convolution. Defaults to (0, 1).
add_identity (bool) – Whether to add identity to the output. Defaults to True.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
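The interplay of expansion, kernel_size and padding can be checked with simple arithmetic. The sketch below assumes stride-1 convolutions, a hidden width of int(out_channels * expansion), and a residual connection that is only usable when the channel count and spatial size are unchanged. These are assumptions about the block, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def conv_out_size(size: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard output-size formula for a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def bottleneck_plan(in_channels: int,
                    out_channels: int,
                    expansion: float = 0.5,
                    kernel_size: Sequence[int] = (1, 3),
                    padding: Sequence[int] = (0, 1),
                    add_identity: bool = True,
                    input_size: int = 32) -> Tuple[int, int, bool]:
    """Return (hidden_channels, output_size, identity_is_used)."""
    hidden_channels = int(out_channels * expansion)
    size = input_size
    for kernel, pad in zip(kernel_size, padding):
        size = conv_out_size(size, kernel, pad)
    # The shortcut needs matching channels and matching spatial size.
    identity = add_identity and in_channels == out_channels and size == input_size
    return hidden_channels, size, identity


same_width = bottleneck_plan(64, 64)
wider = bottleneck_plan(64, 128)
print(same_width, wider)
```

With the default (1, 3) kernels and (0, 1) padding the spatial size is preserved, which is what makes the identity shortcut possible.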
- class mmyolo.models.layers.EELANBlock(num_elan_block: int, **kwargs)[source]¶
Expand efficient layer aggregation networks for YOLOv7.
- Parameters
num_elan_block (int) – The number of ELANBlock.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmyolo.models.layers.ELANBlock(in_channels: int, out_channels: int, middle_ratio: float, block_ratio: float, num_blocks: int = 2, num_convs_in_block: int = 1, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Efficient layer aggregation networks for YOLOv7.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels.
block_ratio (float) – The scaling ratio of the block layer based on the in_channels.
num_blocks (int) – The number of blocks in the main branch. Defaults to 2.
num_convs_in_block (int) – The number of convs per block. Defaults to 1.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='SiLU', inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.layers.EffectiveSELayer(channels: int, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'HSigmoid'})[source]¶
Effective Squeeze-Excitation.
From "CenterMask: Real-Time Anchor-Free Instance Segmentation" (https://arxiv.org/abs/1911.06667). The code is adapted from https://github.com/youngwanLEE/CenterMask/blob/72147e8aae673fcaf4103ee90a6a6b73863e7fa1/maskrcnn_benchmark/modeling/backbone/vovnet.py#L108-L121
- Parameters
channels (int) – The input and output channels of this Module.
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type='HSigmoid').
- class mmyolo.models.layers.ExpMomentumEMA(model: torch.nn.modules.module.Module, momentum: float = 0.0002, gamma: int = 2000, interval=1, device: Optional[torch.device] = None, update_buffers: bool = False)[source]¶
Exponential moving average (EMA) with exponential momentum strategy, which is used in YOLO.
- Parameters
model (nn.Module) – The model to be averaged.
momentum (float) – The momentum used for updating the EMA parameters. The EMA parameters are updated with the formula: averaged_param = (1-momentum) * averaged_param + momentum * source_param. Defaults to 0.0002.
gamma (int) – Use a larger momentum early in training and gradually anneal it to a smaller value so that the EMA model is updated smoothly. The momentum is calculated as (1 - momentum) * exp(-(1 + steps) / gamma) + momentum. Defaults to 2000.
interval (int) – Interval between two updates. Defaults to 1.
device (torch.device, optional) – If provided, the averaged model will be stored on the device. Defaults to None.
update_buffers (bool) – If True, it will compute running averages for both the parameters and the buffers of the model. Defaults to False.
- avg_func(averaged_param: torch.Tensor, source_param: torch.Tensor, steps: int)[source]¶
Compute the moving average of the parameters using the exponential momentum strategy.
- Parameters
averaged_param (Tensor) – The averaged parameters.
source_param (Tensor) – The source parameters.
steps (int) – The number of times the parameters have been updated.
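The two formulas given for momentum and gamma can be combined into a small scalar sketch. It uses plain floats instead of tensors, and the function names ema_momentum and ema_update are hypothetical. Only the formulas themselves come from the parameter descriptions above.

```python
import math


def ema_momentum(steps: int, momentum: float = 0.0002, gamma: int = 2000) -> float:
    """Effective momentum: large early in training, annealed toward `momentum`."""
    return (1 - momentum) * math.exp(-(1 + steps) / gamma) + momentum


def ema_update(averaged_param: float, source_param: float, steps: int,
               momentum: float = 0.0002, gamma: int = 2000) -> float:
    """averaged = (1 - m) * averaged + m * source, with m from ema_momentum."""
    m = ema_momentum(steps, momentum, gamma)
    return (1 - m) * averaged_param + m * source_param


early = ema_momentum(0)        # close to 1: the average tracks the source
late = ema_momentum(10 ** 6)   # close to 0.0002: the average moves slowly
print(early, late)
```

Early in training the averaged weights follow the source model almost exactly, which avoids averaging in the random initialization. Later updates change the average only slightly.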
- class mmyolo.models.layers.ImplicitA(in_channels: int, mean: float = 0.0, std: float = 0.02)[source]¶
Implicit add layer in YOLOv7.
- Parameters
in_channels (int) – The input channels of this Module.
mean (float) – Mean value of implicit module. Defaults to 0.
std (float) – Std value of implicit module. Defaults to 0.02.
- class mmyolo.models.layers.ImplicitM(in_channels: int, mean: float = 1.0, std: float = 0.02)[source]¶
Implicit multiplier layer in YOLOv7.
- Parameters
in_channels (int) – The input channels of this Module.
mean (float) – Mean value of implicit module. Defaults to 1.
std (float) – Std value of implicit module. Defaults to 0.02.
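The default means hint at how the two implicit layers behave at initialization: an additive term near 0 and a multiplicative term near 1 both start out close to the identity. The sketch below models one learnable value per channel in plain Python; the per-channel shape and the helper names are assumptions for illustration, not the layers' actual code.

```python
import random


def make_implicit(in_channels: int, mean: float, std: float, seed: int = 0):
    # One value per channel, drawn from a normal distribution (assumed layout).
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(in_channels)]


def implicit_add(x, implicit):
    # ImplicitA-style: add the per-channel value to the input.
    return [xi + ai for xi, ai in zip(x, implicit)]


def implicit_mul(x, implicit):
    # ImplicitM-style: multiply the input by the per-channel value.
    return [xi * mi for xi, mi in zip(x, implicit)]


x = [1.0, 2.0, 3.0]
added = implicit_add(x, make_implicit(3, mean=0.0, std=0.0))
scaled = implicit_mul(x, make_implicit(3, mean=1.0, std=0.0))
```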
- class mmyolo.models.layers.MaxPoolAndStrideConvBlock(in_channels: int, out_channels: int, maxpool_kernel_sizes: int = 2, use_in_channels_of_middle: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Max pooling and stride conv layer for YOLOv7.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
maxpool_kernel_sizes (int) – kernel sizes of pooling layers. Defaults to 2.
use_in_channels_of_middle (bool) – Whether to calculate middle channels based on in_channels. Defaults to False.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.layers.PPYOLOEBasicBlock(in_channels: int, out_channels: int, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, shortcut: bool = True, use_alpha: bool = False)[source]¶
PPYOLOE Backbone BasicBlock.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
shortcut (bool) – Whether to add inputs and outputs together at the end of this layer. Defaults to True.
use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. Defaults to False.
- class mmyolo.models.layers.RepStageBlock(in_channels: int, out_channels: int, num_blocks: int = 1, bottle_block: torch.nn.modules.module.Module = <class 'mmyolo.models.layers.yolo_bricks.RepVGGBlock'>, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'})[source]¶
RepStageBlock is a stage block with rep-style basic block.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
num_blocks (int, tuple[int]) – Number of blocks. Defaults to 1.
bottle_block (nn.Module) – Basic unit of RepStage. Defaults to RepVGGBlock.
block_cfg (ConfigType) – Config of RepStage. Defaults to dict(type=’RepVGGBlock’).
- class mmyolo.models.layers.RepVGGBlock(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]] = 3, stride: Union[int, Tuple[int]] = 1, padding: Union[int, Tuple[int]] = 1, dilation: Union[int, Tuple[int]] = 1, groups: Optional[int] = 1, padding_mode: Optional[str] = 'zeros', norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, use_se: bool = False, use_alpha: bool = False, use_bn_first=True, deploy: bool = False)[source]¶
RepVGGBlock is a basic rep-style block with separate training and deploy modes. This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple) – Stride of the convolution. Default: 1
padding (int, tuple) – Padding added to all four sides of the input. Default: 1
dilation (int or tuple) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
padding_mode (string, optional) – Default: ‘zeros’
use_se (bool) – Whether to use se. Default: False
use_alpha (bool) – Whether to use alpha parameter at 1x1 conv. In PPYOLOE+ model backbone, use_alpha will be set to True. Default: False.
use_bn_first (bool) – Whether to use bn layer before conv. In YOLOv6 and YOLOv7, this will be set to True. In PPYOLOE, this will be set to False. Default: True.
deploy (bool) – Whether in deploy mode. Default: False
- forward(inputs: torch.Tensor) → torch.Tensor[source]¶
Forward process.
- Parameters
inputs (Tensor) – The input tensor.
- Returns
The output tensor.
- Return type
Tensor
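The deploy mode of a RepVGG-style block relies on the fact that a convolution followed by batch normalization is itself an affine map, so each training branch can be folded into one kernel and bias and the branches then summed. The scalar sketch below shows that algebra for a single channel; it is a simplified illustration, not the block's own conversion code, and all names and numbers are made up for the example.

```python
import math


def conv_then_bn(x, weight, gamma, beta, running_mean, running_var, eps=1e-3):
    # Training-time branch: "conv" (a scalar multiply here), then BN in eval form.
    y = weight * x
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta


def fuse_conv_bn(weight, gamma, beta, running_mean, running_var, eps=1e-3):
    # Fold the BN statistics into an equivalent weight and bias.
    scale = gamma / math.sqrt(running_var + eps)
    return weight * scale, beta - running_mean * scale


# Two parallel branches: (weight, gamma, beta, running_mean, running_var).
branches = [(0.7, 1.3, -0.2, 0.4, 2.0), (0.1, 0.9, 0.05, -0.3, 0.5)]
fused = [fuse_conv_bn(*branch) for branch in branches]
deploy_weight = sum(w for w, _ in fused)
deploy_bias = sum(b for _, b in fused)

x = 1.5
train_out = sum(conv_then_bn(x, *branch) for branch in branches)
deploy_out = deploy_weight * x + deploy_bias
```

In the real block the 1x1 and identity branches are first padded to the 3x3 kernel shape before summing, but the per-channel arithmetic is the same.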
- class mmyolo.models.layers.SPPFBottleneck(in_channels: int, out_channels: int, kernel_sizes: Union[int, Sequence[int]] = 5, use_conv_first: bool = True, mid_channels_scale: float = 0.5, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Spatial pyramid pooling - Fast (SPPF) layer for YOLOv5, YOLOX and PPYOLOE by Glenn Jocher
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 5.
use_conv_first (bool) – Whether to use a conv layer before the pooling layers. Set to True in YOLOv5 and YOLOX, and to False in PPYOLOE. Defaults to True.
mid_channels_scale (float) – Channel multiplier; in_channels is multiplied by this amount to get mid_channels. This parameter is valid only when use_conv_first=True. Defaults to 0.5.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
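The "fast" in SPPF comes from reusing one small pooling kernel in series instead of running several large kernels in parallel. Assuming stride-1 max pooling with padding of kernel_size // 2 (as in the YOLOv5 design), applying a size-5 pool twice matches a size-9 pool and applying it three times matches a size-13 pool. The 1D sketch below checks this; the helper is hypothetical and works on plain lists.

```python
def max_pool_1d(values, kernel_size):
    """Stride-1 max pooling; windows are clipped at the borders, which is
    equivalent to padding with -inf on both sides."""
    pad = kernel_size // 2
    n = len(values)
    return [max(values[max(0, i - pad):min(n, i + pad + 1)]) for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
once = max_pool_1d(signal, 5)
twice = max_pool_1d(once, 5)     # same as a single size-9 pool
thrice = max_pool_1d(twice, 5)   # same as a single size-13 pool
```

The block can therefore concatenate the input with the three serial outputs and obtain the same features as a parallel SPP with kernels 5, 9 and 13, at lower cost.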
- class mmyolo.models.layers.SPPFCSPBlock(in_channels: int, out_channels: int, expand_ratio: float = 0.5, kernel_sizes: Union[int, Sequence[int]] = 5, is_tiny_version: bool = False, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Spatial pyramid pooling - Fast (SPPF) layer with CSP for YOLOv7
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
expand_ratio (float) – Expand ratio of SPPFCSPBlock. Defaults to 0.5.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 5.
is_tiny_version (bool) – Is tiny version of SPPFCSPBlock. If True, it means it is a yolov7 tiny model. Defaults to False.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.layers.TinyDownSampleBlock(in_channels: int, out_channels: int, middle_ratio: float = 1.0, kernel_sizes: Union[int, Sequence[int]] = 3, conv_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'negative_slope': 0.1, 'type': 'LeakyReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Down sample layer for YOLOv7-tiny.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The out channels of this Module.
middle_ratio (float) – The scaling ratio of the middle layer based on the in_channels. Defaults to 1.0.
kernel_sizes (int, tuple[int]) – Sequential or number of kernel sizes of pooling layers. Defaults to 3.
conv_cfg (dict) – Config dict for convolution layer. Defaults to None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’LeakyReLU’, negative_slope=0.1).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- forward(x) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
losses¶
- class mmyolo.models.losses.IoULoss(iou_mode: str = 'ciou', bbox_format: str = 'xywh', eps: float = 1e-07, reduction: str = 'mean', loss_weight: float = 1.0, return_iou: bool = True)[source]¶
IoULoss.
Computing the IoU loss between a set of predicted bboxes and target bboxes.
- Parameters
iou_mode (str) – Options are “ciou”. Defaults to “ciou”.
bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.
eps (float) – Eps to avoid log(0).
reduction (str) – Options are “none”, “mean” and “sum”.
loss_weight (float) – Weight of loss.
return_iou (bool) – If True, return loss and iou.
- forward(pred: torch.Tensor, target: torch.Tensor, weight: Optional[torch.Tensor] = None, avg_factor: Optional[float] = None, reduction_override: Optional[Union[str, bool]] = None) → Tuple[torch.Tensor, torch.Tensor][source]¶
Forward function.
- Parameters
pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h),shape (n, 4).
target (Tensor) – Corresponding gt bboxes, shape (n, 4).
weight (Tensor, optional) – Element-wise weights.
avg_factor (float, optional) – Average factor when computing the mean of losses.
reduction_override (str, bool, optional) – Same as built-in losses of PyTorch. Defaults to None.
- Returns
loss or tuple(loss, iou)
- class mmyolo.models.losses.OksLoss(metainfo: Optional[str] = None, loss_weight: float = 1.0)[source]¶
A PyTorch implementation of the Object Keypoint Similarity (OKS) loss as described in the paper “YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss” by Debapriya et al. (2022).
The OKS loss is used for keypoint-based object recognition and consists of a measure of the similarity between predicted and ground truth keypoint locations, adjusted by the size of the object in the image. The loss function takes as input the predicted keypoint locations, the ground truth keypoint locations, a mask indicating which keypoints are valid, and bounding boxes for the objects.
- Parameters
metainfo (str, optional) – Path to a JSON file containing information about the dataset’s annotations. Defaults to None.
loss_weight (float) – Weight for the loss. Defaults to 1.0.
- compute_oks(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Calculates the OKS loss.
- Parameters
output (Tensor) – Predicted keypoints in shape N x k x 2, where N is batch size, k is the number of keypoints, and 2 are the xy coordinates.
target (Tensor) – Ground truth keypoints in the same shape as output.
target_weights (Tensor) – Mask of valid keypoints in shape N x k, with 1 for valid and 0 for invalid.
bboxes (Optional[Tensor]) – Bounding boxes in shape N x 4, where 4 are the xyxy coordinates.
- Returns
The calculated OKS loss.
- Return type
Tensor
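For orientation, the sketch below computes OKS for a single instance using the standard COCO-style definition: each visible keypoint contributes exp(-d² / (2 · s² · k²)), where d is the distance to the ground truth, s² is the object area, and k = 2σ is a per-keypoint constant. This is an assumption about the general form, not a transcription of compute_oks; the function name, the sigma values, and the 1 - OKS loss convention are illustrative.

```python
import math


def oks_single(pred, target, visible, bbox, sigmas):
    """pred/target: lists of (x, y); visible: 0/1 mask; bbox: (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    area = (x2 - x1) * (y2 - y1)  # s^2 in the COCO formula
    total, count = 0.0, 0
    for (px, py), (tx, ty), v, sigma in zip(pred, target, visible, sigmas):
        if not v:
            continue  # invalid keypoints do not contribute
        d2 = (px - tx) ** 2 + (py - ty) ** 2
        k = 2 * sigma
        total += math.exp(-d2 / (2 * area * k ** 2))
        count += 1
    return total / count if count else 0.0


target = [(10.0, 10.0), (20.0, 15.0)]
bbox = (0.0, 0.0, 40.0, 30.0)
sigmas = [0.026, 0.079]  # illustrative per-keypoint constants

perfect = oks_single(target, target, [1, 1], bbox, sigmas)
shifted = oks_single([(12.0, 10.0), (20.0, 15.0)], target, [1, 1], bbox, sigmas)
masked = oks_single([(99.0, 99.0), (20.0, 15.0)], target, [0, 1], bbox, sigmas)
loss = 1.0 - shifted
```

Dividing by the object area makes the same pixel error cost more on a small person than on a large one.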
- forward(output: torch.Tensor, target: torch.Tensor, target_weights: torch.Tensor, bboxes: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmyolo.models.losses.bbox_overlaps(pred: torch.Tensor, target: torch.Tensor, iou_mode: str = 'ciou', bbox_format: str = 'xywh', siou_theta: float = 4.0, eps: float = 1e-07) → torch.Tensor[source]¶
Calculate the overlap between two sets of bboxes. Implementation of the paper “Enhancing Geometric Factors into Model Learning and Inference for Object Detection and Instance Segmentation”.
In the CIoU implementation of YOLOv5 and MMDetection, there is a slight difference in the way the alpha parameter is computed.
- mmdet version:
alpha = (ious > 0.5).float() * v / (1 - ious + v)
- YOLOv5 version:
alpha = v / (v - ious + (1 + eps))
- Parameters
pred (Tensor) – Predicted bboxes of format (x1, y1, x2, y2) or (x, y, w, h),shape (n, 4).
target (Tensor) – Corresponding gt bboxes, shape (n, 4).
iou_mode (str) – Options are (‘iou’, ‘ciou’, ‘giou’, ‘siou’). Defaults to “ciou”.
bbox_format (str) – Options are “xywh” and “xyxy”. Defaults to “xywh”.
siou_theta (float) – siou_theta for SIoU when calculate shape cost. Defaults to 4.0.
eps (float) – Eps to avoid log(0).
- Returns
Overlap values between pred and target, shape (n, ).
- Return type
Tensor
necks¶
- class mmyolo.models.necks.BaseYOLONeck(in_channels: List[int], out_channels: Union[int, List[int]], deepen_factor: float = 1.0, widen_factor: float = 1.0, upsample_feats_cat_first: bool = True, freeze_all: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, **kwargs)[source]¶
Base neck used in YOLO series.
P5 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
stride=32          +--------+    +-----------+
idx=2  +------+         ^              v
-----> |reduce|         |        +-----------+
       |layer2|---------+------->|    cat    |
       +------+                  +-----------+
                                       v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output2
                                 |  layer1   |    | layer2|
                                 +-----------+    +-------+

P6 neck model structure diagram
                   +--------+                     +-------+
                   |top_down|----------+--------->|  out  |---> output0
                   | layer1 |          |          | layer0|
                   +--------+          |          +-------+
stride=8                ^              |
idx=0  +------+    +--------+          |
-----> |reduce|--->|   cat  |          |
       |layer0|    +--------+          |
       +------+         ^              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer1 |    |  layer0   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer2 |--->|    cat    |
                   +--------+    +-----------+
stride=16               ^              v
idx=1  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output1
       |layer1|    +--------+    |   layer0  |    | layer1|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer2 |    |  layer1   |
                   +--------+    +-----------+
                        ^              |
                   +--------+          v
                   |top_down|    +-----------+
                   | layer3 |--->|    cat    |
                   +--------+    +-----------+
stride=32               ^              v
idx=2  +------+    +--------+    +-----------+    +-------+
-----> |reduce|--->|   cat  |    | bottom_up |--->|  out  |---> output2
       |layer2|    +--------+    |   layer1  |    | layer2|
       +------+         ^        +-----------+    +-------+
                        |              v
                   +--------+    +-----------+
                   |upsample|    |downsample |
                   | layer3 |    |  layer2   |
                   +--------+    +-----------+
stride=64               ^              v
idx=3  +------+         |        +-----------+
-----> |reduce|---------+------->|    cat    |
       |layer3|                  +-----------+
       +------+                        v
                                 +-----------+    +-------+
                                 | bottom_up |--->|  out  |---> output3
                                 |  layer2   |    | layer3|
                                 +-----------+    +-------+
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int or List[int]) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
upsample_feats_cat_first (bool) – Whether the upsampled features come first when they are concatenated in the top-down module. Defaults to True. Currently only YOLOv7 sets this to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to None.
act_cfg (dict) – Config dict for activation layer. Defaults to None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
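The effect of deepen_factor and widen_factor can be sketched with two small helpers: one scales a channel count and rounds it up to a multiple of a divisor, the other scales a block count and keeps at least one block. The rounding rules used here (ceil to a multiple of 8, round with a floor of 1) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math


def scale_channels(channels: int, widen_factor: float, divisor: int = 8) -> int:
    # Multiply by the width factor, then round up to a multiple of `divisor`.
    return math.ceil(channels * widen_factor / divisor) * divisor


def scale_blocks(num_blocks: int, deepen_factor: float) -> int:
    # Multiply by the depth factor, but never drop below one block.
    if num_blocks <= 1:
        return num_blocks
    return max(round(num_blocks * deepen_factor), 1)


base_channels = [256, 512, 1024]
small_channels = [scale_channels(c, 0.5) for c in base_channels]
small_blocks = scale_blocks(3, 0.33)
```

A single base configuration can therefore describe a family of model sizes: only the two factors change between the small and large variants.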
- class mmyolo.models.necks.CSPNeXtPAFPN(in_channels: Sequence[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, use_depthwise: bool = False, expand_ratio: float = 0.5, upsample_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'mode': 'nearest', 'scale_factor': 2}, conv_cfg: Optional[bool] = None, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = {'a': 2.23606797749979, 'distribution': 'uniform', 'layer': 'Conv2d', 'mode': 'fan_in', 'nonlinearity': 'leaky_relu', 'type': 'Kaiming'})[source]¶
Path Aggregation Network with CSPNeXt blocks.
- Parameters
in_channels (Sequence[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
use_depthwise (bool) – Whether to use depthwise separable convolution in blocks. Defaults to False.
expand_ratio (float) – Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(scale_factor=2, mode=’nearest’)
conv_cfg (dict, optional) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’)
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’SiLU’, inplace=True)
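init_cfg (dict or list[dict], optional) – Initialization config dict. Default: dict(type=’Kaiming’, layer=’Conv2d’, a=math.sqrt(5), distribution=’uniform’, mode=’fan_in’, nonlinearity=’leaky_relu’).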
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build bottom up layer.
- Parameters
idx (int) – layer idx.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build downsample layer.
- Parameters
idx (int) – layer idx.
- Returns
The downsample layer.
- Return type
nn.Module
- build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build out layer.
- Parameters
idx (int) – layer idx.
- Returns
The out layer.
- Return type
nn.Module
- build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build reduce layer.
- Parameters
idx (int) – layer idx.
- Returns
The reduce layer.
- Return type
nn.Module
- class mmyolo.models.necks.PPYOLOECSPPAFPN(in_channels: List[int] = [256, 512, 1024], out_channels: List[int] = [256, 512, 1024], deepen_factor: float = 1.0, widen_factor: float = 1.0, freeze_all: bool = False, num_csplayer: int = 1, num_blocks_per_layer: int = 3, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'shortcut': False, 'type': 'PPYOLOEBasicBlock', 'use_alpha': False}, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 1e-05, 'momentum': 0.1, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, drop_block_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None, use_spp: bool = False)[source]¶
CSPPAN in PPYOLOE.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int]) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
num_csplayer (int) – Number of CSPResLayer in per layer. Defaults to 1.
num_blocks_per_layer (int) – Number of blocks per CSPResLayer. Defaults to 3.
block_cfg (dict) – Config dict for block. Defaults to dict(type=’PPYOLOEBasicBlock’, shortcut=False, use_alpha=False).
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.1, eps=1e-5).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
drop_block_cfg (dict, optional) – Drop block config. Defaults to None. If you want to use Drop block after CSPResLayer, you can set this para as dict(type=’mmdet.DropBlock’, drop_prob=0.1, block_size=3, warm_iters=0).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
use_spp (bool) – Whether to use SPP in reduce layer. Defaults to False.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build bottom up layer.
- Parameters
idx (int) – layer idx.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build downsample layer.
- Parameters
idx (int) – layer idx.
- Returns
The downsample layer.
- Return type
nn.Module
- build_reduce_layer(idx: int)[source]¶
build reduce layer.
- Parameters
idx (int) – layer idx.
- Returns
The reduce layer.
- Return type
nn.Module
- class mmyolo.models.necks.YOLOXPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, use_depthwise: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOX.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
use_depthwise (bool) – Whether to use depthwise separable convolution. Defaults to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build bottom up layer.
- Parameters
idx (int) – layer idx.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build downsample layer.
- Parameters
idx (int) – layer idx.
- Returns
The downsample layer.
- Return type
nn.Module
- build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build out layer.
- Parameters
idx (int) – layer idx.
- Returns
The out layer.
- Return type
nn.Module
- build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build reduce layer.
- Parameters
idx (int) – layer idx.
- Returns
The reduce layer.
- Return type
nn.Module
- class mmyolo.models.necks.YOLOv5PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 1, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv5.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int or List[int]) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 1.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build bottom up layer.
- Parameters
idx (int) – layer idx.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build downsample layer.
- Parameters
idx (int) – layer idx.
- Returns
The downsample layer.
- Return type
nn.Module
- build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶
build reduce layer.
- Parameters
idx (int) – layer idx.
- Returns
The reduce layer.
- Return type
nn.Module
- build_top_down_layer(idx: int)[source]¶
build top down layer.
- Parameters
idx (int) – layer idx.
- Returns
The top down layer.
- Return type
nn.Module
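In practice these necks are usually selected through a config dict rather than constructed directly. The fragment below is a hedged sketch of such an entry for YOLOv5PAFPN; the keys mirror the constructor signature above, while the specific values (a small model with halved width) are illustrative assumptions and not copied from a shipped config.

```python
# Hypothetical neck entry for a model config; values are illustrative only.
neck = dict(
    type='YOLOv5PAFPN',
    deepen_factor=0.33,            # fewer blocks per CSP layer
    widen_factor=0.5,              # half the base channel widths
    in_channels=[256, 512, 1024],  # one entry per backbone output scale
    out_channels=[256, 512, 1024],
    num_csp_blocks=3,
    norm_cfg=dict(type='BN', momentum=0.03, eps=0.001),
    act_cfg=dict(type='SiLU', inplace=True),
)
```

The in_channels list must match the backbone's output scales in order, since each entry feeds the reduce layer with the same index.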
- class mmyolo.models.necks.YOLOv6CSPRepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv6 3.0.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.necks.YOLOv6CSPRepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, hidden_ratio: float = 0.5, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv6.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale)
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
block_act_cfg (dict) – Config dict for activation layer used in each stage. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- class mmyolo.models.necks.YOLOv6RepBiPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv6 3.0.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the top-down layer.
- Parameters
idx (int) – Layer index.
- Returns
The top down layer.
- Return type
nn.Module
- class mmyolo.models.necks.YOLOv6RepPAFPN(in_channels: List[int], out_channels: int, deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 12, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'ReLU'}, block_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'RepVGGBlock'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv6.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 12.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’ReLU’, inplace=True).
block_cfg (dict) – Config dict for the block used to build each layer. Defaults to dict(type=’RepVGGBlock’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the bottom-up layer.
- Parameters
idx (int) – Layer index.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the downsample layer.
- Parameters
idx (int) – Layer index.
- Returns
The downsample layer.
- Return type
nn.Module
- build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the reduce layer.
- Parameters
idx (int) – Layer index.
- Returns
The reduce layer.
- Return type
nn.Module
- build_top_down_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the top-down layer.
- Parameters
idx (int) – Layer index.
- Returns
The top down layer.
- Return type
nn.Module
- class mmyolo.models.necks.YOLOv7PAFPN(in_channels: List[int], out_channels: List[int], block_cfg: dict = {'block_ratio': 0.25, 'middle_ratio': 0.5, 'num_blocks': 4, 'num_convs_in_block': 1, 'type': 'ELANBlock'}, deepen_factor: float = 1.0, widen_factor: float = 1.0, spp_expand_ratio: float = 0.5, is_tiny_version: bool = False, use_maxpool_in_downsample: bool = True, use_in_channels_in_downsample: bool = False, use_repconv_outs: bool = True, upsample_feats_cat_first: bool = False, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv7.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int]) – Number of output channels per scale.
block_cfg (dict) – Config dict for block.
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
spp_expand_ratio (float) – Expand ratio of SPPCSPBlock. Defaults to 0.5.
is_tiny_version (bool) – Whether this is the tiny version of the neck. If True, the model is YOLOv7-tiny. Defaults to False.
use_maxpool_in_downsample (bool) – Whether max pooling is used in downsample layers. Defaults to True.
use_in_channels_in_downsample (bool) – Input parameter of the MaxPoolAndStrideConvBlock module. Defaults to False.
use_repconv_outs (bool) – Whether to use RepConv in the output layer. Defaults to True.
upsample_feats_cat_first (bool) – Whether the upsampled features are concatenated first in the top-down module. Defaults to False. Currently only YOLOv7 sets this to False.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the bottom-up layer.
- Parameters
idx (int) – Layer index.
- Returns
The bottom up layer.
- Return type
nn.Module
- build_downsample_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the downsample layer.
- Parameters
idx (int) – Layer index.
- Returns
The downsample layer.
- Return type
nn.Module
- build_out_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the out layer.
- Parameters
idx (int) – Layer index.
- Returns
The out layer.
- Return type
nn.Module
- build_reduce_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the reduce layer.
- Parameters
idx (int) – Layer index.
- Returns
The reduce layer.
- Return type
nn.Module
- class mmyolo.models.necks.YOLOv8PAFPN(in_channels: List[int], out_channels: Union[List[int], int], deepen_factor: float = 1.0, widen_factor: float = 1.0, num_csp_blocks: int = 3, freeze_all: bool = False, norm_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'eps': 0.001, 'momentum': 0.03, 'type': 'BN'}, act_cfg: Union[mmengine.config.config.ConfigDict, dict] = {'inplace': True, 'type': 'SiLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, List[Union[dict, mmengine.config.config.ConfigDict]]]] = None)[source]¶
Path Aggregation Network used in YOLOv8.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (List[int] | int) – Number of output channels (used at each scale).
deepen_factor (float) – Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int) – Number of bottlenecks in CSPLayer. Defaults to 3.
freeze_all (bool) – Whether to freeze the model. Defaults to False.
norm_cfg (dict) – Config dict for normalization layer. Defaults to dict(type=’BN’, momentum=0.03, eps=0.001).
act_cfg (dict) – Config dict for activation layer. Defaults to dict(type=’SiLU’, inplace=True).
init_cfg (dict or list[dict], optional) – Initialization config dict. Defaults to None.
- build_bottom_up_layer(idx: int) → torch.nn.modules.module.Module[source]¶
Build the bottom-up layer.
- Parameters
idx (int) – Layer index.
- Returns
The bottom up layer.
- Return type
nn.Module
task_modules¶
- class mmyolo.models.task_modules.BatchATSSAssigner(num_classes: int, iou_calculator: Union[mmengine.config.config.ConfigDict, dict] = {'type': 'mmdet.BboxOverlaps2D'}, topk: int = 9)[source]¶
Assign a batch of corresponding gt bboxes or background to each prior.
This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/atss_assigner.py
Each proposal will be assigned with 0 or a positive integer indicating the ground truth index.
0: negative sample, no assigned gt
positive integer: positive sample, index (1-based) of assigned gt
- Parameters
num_classes (int) – Number of classes.
iou_calculator (ConfigDict or dict) – Config dict for the IoU calculator. Defaults to dict(type='mmdet.BboxOverlaps2D').
topk (int) – Number of priors selected in each level. Defaults to 9.
- forward(pred_bboxes: torch.Tensor, priors: torch.Tensor, num_level_priors: List, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict[source]¶
Assign gt to priors.
The assignment is done in the following steps:
1. Compute the IoU between all priors (priors of all pyramid levels) and each gt.
2. Compute the center distance between all priors and each gt.
3. On each pyramid level, for each gt, select the k priors whose centers are closest to the gt center, so that k*l priors in total are selected as candidates for each gt (l is the number of pyramid levels).
4. Get the corresponding IoU for these candidates, compute the mean and std, and set mean + std as the IoU threshold.
5. Select the candidates whose IoU is greater than or equal to the threshold as positive.
6. Limit the positive samples’ centers to lie inside the gt.
- Parameters
pred_bboxes (Tensor) – Predicted bounding boxes, shape(batch_size, num_priors, 4)
priors (Tensor) – Model priors with stride, shape(num_priors, 4)
num_level_priors (List) – Number of bboxes in each level, len(3)
gt_labels (Tensor) – Ground truth label, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bbox, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)
- Returns
Assigned result, a dict with the following keys:
- assigned_labels (Tensor): shape(batch_size, num_gt)
- assigned_bboxes (Tensor): shape(batch_size, num_gt, 4)
- assigned_scores (Tensor): shape(batch_size, num_gt, num_classes)
- fg_mask_pre_prior (Tensor): shape(batch_size, num_gt)
- Return type
dict
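The candidate selection, the mean + std threshold and the center check described for forward can be sketched for a single gt box in plain Python. This is an illustration only, not the batched tensor implementation: the function name atss_positive_priors and its list-based inputs are hypothetical, the IoUs are assumed to be precomputed, and the sample standard deviation is an assumption.

```python
import math
import statistics


def atss_positive_priors(prior_centers, prior_ious, num_level_priors, gt_box, topk=9):
    """Return the indices of the priors assigned to one gt box (tl_x, tl_y, br_x, br_y)."""
    gt_cx = (gt_box[0] + gt_box[2]) / 2
    gt_cy = (gt_box[1] + gt_box[3]) / 2

    def distance(i):
        return math.hypot(prior_centers[i][0] - gt_cx, prior_centers[i][1] - gt_cy)

    # Per pyramid level, keep the `topk` priors whose centers are closest to the gt center.
    candidates = []
    start = 0
    for num in num_level_priors:
        level = sorted(range(start, start + num), key=distance)
        candidates.extend(level[:topk])
        start += num

    # IoU threshold for this gt: mean + std over the candidate IoUs.
    ious = [prior_ious[i] for i in candidates]
    std = statistics.stdev(ious) if len(ious) > 1 else 0.0
    threshold = statistics.mean(ious) + std

    def center_inside(i):
        cx, cy = prior_centers[i]
        return gt_box[0] < cx < gt_box[2] and gt_box[1] < cy < gt_box[3]

    # Positives: candidates at or above the threshold whose center lies inside the gt.
    return [i for i in candidates if prior_ious[i] >= threshold and center_inside(i)]


# Two pyramid levels with 4 and 2 priors, one gt box.
prior_centers = [(4, 4), (6, 6), (20, 20), (30, 30), (5, 5), (25, 25)]
prior_ious = [0.6, 0.5, 0.0, 0.0, 0.9, 0.0]
positives = atss_positive_priors(
    prior_centers, prior_ious, num_level_priors=[4, 2], gt_box=(0, 0, 10, 10), topk=2)
```

Here the candidates are priors 0, 1, 4 and 5; their IoUs have mean 0.5 and sample std of about 0.37, so only prior 4 (IoU 0.9) passes the threshold.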
- get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_inds: torch.Tensor, fg_mask_pre_prior: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Get target info.
- Parameters
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
assigned_gt_inds (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)
fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)
num_priors (int) – Number of priors.
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.
- Returns
- assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
- assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors)
- assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors)
- Return type
Tuple[Tensor, Tensor, Tensor]
- select_topk_candidates(distances: torch.Tensor, num_level_priors: List[int], pad_bbox_flag: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Select candidates based on the center distance.
- Parameters
distances (Tensor) – Distance between all bbox and gt, shape(batch_size, num_gt, num_priors)
num_level_priors (List[int]) – Number of bboxes in each level, len(3)
pad_bbox_flag (Tensor) – Ground truth bbox mask, shape(batch_size, num_gt, 1)
- Returns
- is_in_candidate_list (Tensor): Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors)
- candidate_idxs (Tensor): Candidate indices, shape(batch_size, num_gt, num_gt)
- Return type
Tuple[Tensor, Tensor]
- static threshold_calculator(is_in_candidate: List, candidate_idxs: torch.Tensor, overlaps: torch.Tensor, num_priors: int, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor][source]¶
Get the corresponding IoU for these candidates, compute the mean and std, and set mean + std as the IoU threshold.
- Parameters
is_in_candidate (Tensor) – Flag showing whether each level has topk candidates or not, shape(batch_size, num_gt, num_priors).
candidate_idxs (Tensor) – Candidate indices, shape(batch_size, num_gt, num_gt)
overlaps (Tensor) – Overlaps area, shape(batch_size, num_gt, num_priors).
num_priors (int) – Number of priors.
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.
- Returns
- overlaps_thr_per_gt (Tensor): Overlap threshold of each ground truth, shape(batch_size, num_gt, 1).
- candidate_overlaps (Tensor): Candidate overlaps, shape(batch_size, num_gt, num_priors).
- Return type
Tuple[Tensor, Tensor]
- class mmyolo.models.task_modules.BatchTaskAlignedAssigner(num_classes: int, topk: int = 13, alpha: float = 1.0, beta: float = 6.0, eps: float = 1e-07, use_ciou: bool = False)[source]¶
Batch task-aligned assigner based on the paper TOOD: Task-aligned One-stage Object Detection.
This code is based on https://github.com/meituan/YOLOv6/blob/main/yolov6/assigners/tal_assigner.py
Assign a corresponding gt bbox or background to each bbox in a batch of predicted bboxes. Each bbox will be assigned 0 or a positive integer indicating the ground truth index.
0: negative sample, no assigned gt
positive integer: positive sample, index (1-based) of assigned gt
- Parameters
num_classes (int) – Number of classes.
topk (int) – Number of bboxes selected in each level. Defaults to 13.
alpha (float) – Hyper-parameter related to alignment_metrics. Defaults to 1.0.
beta (float) – Hyper-parameter related to alignment_metrics. Defaults to 6.0.
eps (float) – Eps to avoid log(0). Defaults to 1e-7.
use_ciou (bool) – Whether to use CIoU while calculating IoU. Defaults to False.
- forward(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor) → dict[source]¶
Assign gt to bboxes.
The assignment is done in the following steps:
1. Compute the alignment metric between all bboxes (bboxes of all pyramid levels) and each gt.
2. Select the top-k bboxes as candidates for each gt.
3. Limit the positive samples’ centers to lie inside the gt (because an anchor-free detector can only predict positive distances).
- Parameters
pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
priors (Tensor) – Model priors, shape (num_priors, 4)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)
- Returns
Assigned result, a dict with the following keys:
- assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
- assigned_bboxes (Tensor): Assigned boxes, shape(batch_size, num_priors, 4)
- assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors, num_classes)
- fg_mask_pre_prior (Tensor): Force ground truth matching mask, shape(batch_size, num_priors)
- Return type
dict
- get_box_metrics(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor][source]¶
Compute alignment metric between all bbox and gt.
- Parameters
pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.
- Returns
- alignment_metrics (Tensor): Alignment metrics, shape(batch_size, num_gt, num_priors)
- overlaps (Tensor): Overlaps, shape(batch_size, num_gt, num_priors)
- Return type
Tuple[Tensor, Tensor]
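The alignment metric from the TOOD paper combines the classification score s and the IoU u of a prediction as s^alpha * u^beta. A minimal scalar sketch follows; the function name alignment_metric is hypothetical, and the real method works on batched tensors rather than single floats.

```python
def alignment_metric(score: float, iou: float, alpha: float = 1.0, beta: float = 6.0) -> float:
    """Task-alignment metric for one (prediction, gt) pair: score**alpha * iou**beta."""
    return (score ** alpha) * (iou ** beta)


# A confident but poorly localized prediction versus a modest but well localized one.
confident_but_loose = alignment_metric(score=0.9, iou=0.5)
modest_but_tight = alignment_metric(score=0.6, iou=0.9)
```

With the default alpha = 1.0 and beta = 6.0 the IoU term dominates, so the well localized prediction receives the much higher metric.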
- get_pos_mask(pred_bboxes: torch.Tensor, pred_scores: torch.Tensor, priors: torch.Tensor, gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, pad_bbox_flag: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Get possible mask.
- Parameters
pred_bboxes (Tensor) – Predicted bboxes, shape(batch_size, num_priors, 4)
pred_scores (Tensor) – Scores of predicted bboxes, shape(batch_size, num_priors, num_classes)
priors (Tensor) – Model priors, shape (num_priors, 2)
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
pad_bbox_flag (Tensor) – Ground truth bbox mask, 1 means bbox, 0 means no bbox, shape(batch_size, num_gt, 1)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.
- Returns
- pos_mask (Tensor): Possible mask, shape(batch_size, num_gt, num_priors)
- alignment_metrics (Tensor): Alignment metrics, shape(batch_size, num_gt, num_priors)
- overlaps (Tensor): Overlaps of gt_bboxes and pred_bboxes, shape(batch_size, num_gt, num_priors)
- Return type
Tuple[Tensor, Tensor, Tensor]
- get_targets(gt_labels: torch.Tensor, gt_bboxes: torch.Tensor, assigned_gt_idxs: torch.Tensor, fg_mask_pre_prior: torch.Tensor, batch_size: int, num_gt: int) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Get assigner info.
- Parameters
gt_labels (Tensor) – Ground truth labels, shape(batch_size, num_gt, 1)
gt_bboxes (Tensor) – Ground truth bboxes, shape(batch_size, num_gt, 4)
assigned_gt_idxs (Tensor) – Assigned ground truth indexes, shape(batch_size, num_priors)
fg_mask_pre_prior (Tensor) – Force ground truth matching mask, shape(batch_size, num_priors)
batch_size (int) – Batch size.
num_gt (int) – Number of ground truth.
- Returns
- assigned_labels (Tensor): Assigned labels, shape(batch_size, num_priors)
- assigned_bboxes (Tensor): Assigned bboxes, shape(batch_size, num_priors)
- assigned_scores (Tensor): Assigned scores, shape(batch_size, num_priors)
- Return type
Tuple[Tensor, Tensor, Tensor]
- select_topk_candidates(alignment_gt_metrics: torch.Tensor, using_largest_topk: bool = True, topk_mask: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Select the top-k candidates for each gt based on the alignment metrics.
- Parameters
alignment_gt_metrics (Tensor) – Alignment metric of gt candidates, shape(batch_size, num_gt, num_priors)
using_largest_topk (bool) – Whether to use the largest or the smallest elements. Defaults to True.
topk_mask (Tensor, optional) – Topk mask, shape(batch_size, num_gt, self.topk). Defaults to None.
- Returns
Topk candidates mask, shape(batch_size, num_gt, num_priors)
- Return type
Tensor
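The idea of turning per-prior metrics into a top-k candidates mask can be sketched for one gt with plain lists. The helper name make_topk_mask is hypothetical and this is only an illustration of the selection, not the batched tensor code.

```python
def make_topk_mask(metrics, k, largest=True):
    """Return a 0/1 mask marking the k largest (or smallest) entries of `metrics`."""
    order = sorted(range(len(metrics)), key=lambda i: metrics[i], reverse=largest)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(metrics))]


# Alignment metrics of five priors with respect to one gt.
metrics_for_one_gt = [0.10, 0.80, 0.05, 0.60, 0.30]
largest_mask = make_topk_mask(metrics_for_one_gt, k=2)
smallest_mask = make_topk_mask(metrics_for_one_gt, k=2, largest=False)
```

The mask has one entry per prior, which mirrors the (batch_size, num_gt, num_priors) shape of the returned tensor for a single image and a single gt.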
- class mmyolo.models.task_modules.YOLOXBBoxCoder(use_box_type: bool = False, **kwargs)[source]¶
YOLOX BBox coder.
This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).
- decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor[source]¶
Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).
- Parameters
priors (torch.Tensor) – Basic boxes or points, e.g. anchors.
pred_bboxes (torch.Tensor) – Encoded boxes with shape
stride (torch.Tensor | int) – Strides of bboxes.
- Returns
Decoded boxes.
- Return type
torch.Tensor
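A scalar sketch of this kind of decoding is shown below. It assumes the usual YOLOX parameterization (center offsets scaled by the stride and added to the prior point, width and height as exp of the prediction times the stride); the function name decode_yolox and the tuple inputs are hypothetical, and the real coder operates on tensors.

```python
import math


def decode_yolox(prior, pred, stride):
    """Decode one (delta_x, delta_y, w, h) prediction to (tl_x, tl_y, br_x, br_y)."""
    prior_x, prior_y = prior[:2]
    delta_x, delta_y, w, h = pred
    # Assumed parameterization: offsets are in units of the stride.
    center_x = delta_x * stride + prior_x
    center_y = delta_y * stride + prior_y
    # Assumed parameterization: sizes are log-encoded relative to the stride.
    box_w = math.exp(w) * stride
    box_h = math.exp(h) * stride
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


box = decode_yolox(prior=(64.0, 32.0), pred=(0.5, -0.25, 0.0, math.log(2.0)), stride=8)
```

For this input the decoded center is (68, 30) with a size of 8 by 16, which gives approximately (64, 22, 72, 38).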
- class mmyolo.models.task_modules.YOLOv5BBoxCoder(use_box_type: bool = False, **kwargs)[source]¶
YOLOv5 BBox coder.
This decoder decodes pred bboxes (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).
- decode(priors: torch.Tensor, pred_bboxes: torch.Tensor, stride: Union[torch.Tensor, int]) → torch.Tensor[source]¶
Decode regression results (delta_x, delta_y, w, h) to bboxes (tl_x, tl_y, br_x, br_y).
- Parameters
priors (torch.Tensor) – Basic boxes or points, e.g. anchors.
pred_bboxes (torch.Tensor) – Encoded boxes with shape
stride (torch.Tensor | int) – Strides of bboxes.
- Returns
Decoded boxes.
- Return type
torch.Tensor
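For comparison, here is a scalar sketch of anchor-based decoding in the YOLOv5 style. It assumes the commonly used YOLOv5 parameterization: predictions pass through a sigmoid, the center moves by (p - 0.5) * 2 * stride from the anchor center, and the size is (2p)^2 times the anchor size. The function name decode_yolov5 and the tuple inputs are hypothetical, and the exact behavior of the library coder should be checked in its source.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov5(prior, pred, stride):
    """Decode one raw prediction against an anchor box given as (tl_x, tl_y, br_x, br_y)."""
    anchor_cx = (prior[0] + prior[2]) / 2
    anchor_cy = (prior[1] + prior[3]) / 2
    anchor_w = prior[2] - prior[0]
    anchor_h = prior[3] - prior[1]
    px, py, pw, ph = (sigmoid(v) for v in pred)
    # Assumed parameterization: the center can move at most one stride from the anchor center.
    center_x = (px - 0.5) * 2 * stride + anchor_cx
    center_y = (py - 0.5) * 2 * stride + anchor_cy
    # Assumed parameterization: the size ranges from 0 to 4 times the anchor size.
    box_w = (pw * 2) ** 2 * anchor_w
    box_h = (ph * 2) ** 2 * anchor_h
    return (center_x - box_w / 2, center_y - box_h / 2,
            center_x + box_w / 2, center_y + box_h / 2)


# An all-zero raw prediction maps to sigmoid 0.5, which reproduces the anchor itself.
unchanged = decode_yolov5(prior=(8.0, 8.0, 24.0, 24.0), pred=(0.0, 0.0, 0.0, 0.0), stride=8)
```

Under these assumptions a zero prediction leaves the anchor unchanged, which makes a convenient sanity check.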
utils¶
- class mmyolo.models.utils.OutputSaveFunctionWrapper(func: Callable, spec: Optional[Dict])[source]¶
A class that wraps a function and saves its outputs.
This class can be used to decorate a function to save its outputs. It wraps the function with a __call__ method that calls the original function and saves the results in a log attribute.
- Parameters
func (Callable) – A function to wrap.
spec (Optional[Dict]) – A dictionary of global variables to use as the namespace for the wrapper. If None, the global namespace of the original function is used.
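The wrapping pattern described here (a __call__ method that forwards to the original function and stores each result in a log attribute) can be sketched in a few lines. SavingWrapper is a hypothetical stand-in written for illustration; it ignores the spec argument and is not the library class.

```python
from typing import Any, Callable, List


class SavingWrapper:
    """Call the wrapped function and keep every return value in `log`."""

    def __init__(self, func: Callable):
        self.func = func
        self.log: List[Any] = []

    def __call__(self, *args, **kwargs):
        result = self.func(*args, **kwargs)
        self.log.append(result)  # remember the output for later inspection
        return result


def area(width, height):
    return width * height


wrapped_area = SavingWrapper(area)
first = wrapped_area(2, 3)
second = wrapped_area(width=4, height=5)
```

After the two calls, wrapped_area.log holds both results in call order, while callers still receive the normal return values.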
- class mmyolo.models.utils.OutputSaveObjectWrapper(obj: Any)[source]¶
A wrapper class that saves the output of function calls on an object.
- mmyolo.models.utils.gt_instances_preprocess(batch_gt_instances: Union[torch.Tensor, Sequence], batch_size: int) → torch.Tensor[source]¶
Split batch_gt_instances with batch size.
From [all_gt_bboxes, box_dim+2] to [batch_size, number_gt, box_dim+1]. For horizontal boxes, box_dim=4; for rotated boxes, box_dim=5.
If an image has fewer gt bboxes than number_gt, the remaining entries are filled with zeros.
- Parameters
batch_gt_instances (Tensor | Sequence) – Ground truth instances for the whole batch, shape [all_gt_bboxes, box_dim+2]
batch_size (int) – Batch size.
- Returns
Batch gt instances data, shape [batch_size, number_gt, box_dim+1]
- Return type
Tensor
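The reshaping from a flat [all_gt_bboxes, box_dim+2] layout to a zero-padded [batch_size, number_gt, box_dim+1] layout can be sketched with nested lists. The sketch assumes that the first column of each row is the image index within the batch and that the remaining columns are the label followed by the box; the helper name split_gt_by_image is hypothetical.

```python
def split_gt_by_image(rows, batch_size):
    """Group rows by image index (assumed to be column 0) and zero-pad to equal length."""
    per_image = [[] for _ in range(batch_size)]
    for row in rows:
        per_image[int(row[0])].append(list(row[1:]))  # drop the image index column
    max_gt = max((len(items) for items in per_image), default=0)
    width = len(rows[0]) - 1 if rows else 0
    return [
        items + [[0.0] * width for _ in range(max_gt - len(items))]
        for items in per_image
    ]


# Rows are [image_idx, label, x1, y1, x2, y2]; image 1 has no boxes at all.
rows = [
    [0.0, 1.0, 10.0, 10.0, 20.0, 20.0],
    [0.0, 2.0, 30.0, 30.0, 40.0, 40.0],
    [2.0, 0.0, 5.0, 5.0, 15.0, 15.0],
]
batch = split_gt_by_image(rows, batch_size=3)
```

Every image ends up with two entries of five values each: image 0 keeps its two boxes, image 1 is all zeros, and image 2 has one box followed by one row of zeros.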