R-FCN still uses an RPN to obtain region proposals, but unlike the R-CNN family, it removes the FC layers after ROI pooling. Instead, the bulk of the computation is moved before ROI pooling, where it generates the score maps.
After ROI pooling, every region proposal reuses the same set of score maps to perform average voting, which is a cheap calculation. Because almost no per-region computation remains, R-FCN is faster than Faster R-CNN.
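To make the voting step concrete, here is a minimal NumPy sketch of position-sensitive pooling followed by average voting. It is an illustration under assumptions, not the reference implementation: the function name `ps_roi_vote`, the channel layout (bin row, bin column, class) and the integer feature-map ROI coordinates are all choices made for this example, and the ROI is assumed to lie fully inside the score maps.

```python
import numpy as np

def ps_roi_vote(score_maps, roi, k, num_classes):
    """Average-vote class scores for one ROI from shared score maps.

    score_maps: array of shape (k*k*(num_classes+1), H, W).
                Assumed channel layout: (bin_row, bin_col, class).
    roi:        (x0, y0, x1, y1) in feature-map coordinates, inside the map.
    """
    C = num_classes + 1                      # +1 for the background class
    _, H, W = score_maps.shape
    maps = score_maps.reshape(k, k, C, H, W)
    x0, y0, x1, y1 = roi
    xs = np.linspace(x0, x1, k + 1)          # bin edges along x
    ys = np.linspace(y0, y1, k + 1)          # bin edges along y
    pooled = np.zeros((C, k, k))
    for i in range(k):                       # bin row
        ya = int(np.floor(ys[i]))
        yb = max(int(np.ceil(ys[i + 1])), ya + 1)
        for j in range(k):                   # bin column
            xa = int(np.floor(xs[j]))
            xb = max(int(np.ceil(xs[j + 1])), xa + 1)
            # Bin (i, j) only reads the score maps dedicated to that position.
            pooled[:, i, j] = maps[i, j, :, ya:yb, xa:xb].mean(axis=(1, 2))
    # Average voting: each of the k*k bins casts one vote per class.
    return pooled.mean(axis=(1, 2))

rng = np.random.default_rng(0)
k, num_classes = 3, 20
score_maps = rng.standard_normal((k * k * (num_classes + 1), 40, 60))
scores = ps_roi_vote(score_maps, roi=(10, 5, 31, 26), k=k, num_classes=num_classes)
print(scores.shape)  # (21,)
```

Note that the per-ROI work is only slicing and averaging. There are no learned layers after the pooling, which is where the speed advantage comes from.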
SSD - Single Shot MultiBox Detector
The tasks of object localization and classification are done in a single forward pass of the network.
SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.
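The idea of default boxes can be sketched in a few lines of Python. This is a simplified illustration: the function name `default_boxes`, the per-level scales and the grid sizes below are hypothetical values chosen for the example, not the settings of any particular SSD model. Boxes are expressed as (cx, cy, w, h) relative to the image size.

```python
import itertools
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Return one default box per aspect ratio at every feature map location."""
    boxes = []
    for row, col in itertools.product(range(fmap_size), repeat=2):
        # Center of the feature map cell, normalized to [0, 1].
        cx = (col + 0.5) / fmap_size
        cy = (row + 0.5) / fmap_size
        for ar in aspect_ratios:
            # Same area for every ratio; only the shape changes.
            w = scale * math.sqrt(ar)
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes

# Hypothetical pyramid: coarser maps get larger boxes.
levels = [(8, 0.2), (4, 0.4), (2, 0.6)]      # (feature map size, box scale)
ratios = [1.0, 2.0, 0.5]
all_boxes = [b for size, scale in levels
             for b in default_boxes(size, scale, ratios)]
print(len(all_boxes))
```

For each of these boxes the network would output one score per class plus four offsets that adjust the box. Fine feature maps cover small objects and coarse ones cover large objects.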
MultiBox is the name of the bounding box regression technique that SSD builds on. The network is an object detector that also classifies the objects it detects.
SSD strikes a good balance between speed and accuracy. It runs a convolutional network on the input image only once and computes the feature maps from which all predictions are made.
YOLO — You Only Look Once
The region-based detectors above (the R-CNN family and R-FCN) use region proposals to localize objects within the image.
You Only Look Once (YOLO) is an object detection algorithm that differs substantially from the region-based algorithms seen above.
In YOLO, a single convolutional network predicts both the bounding boxes and the class probabilities for these boxes.
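As a rough sketch of what "a single network predicts boxes and class probabilities" means in practice, the snippet below decodes an output tensor shaped like the one in the original YOLO formulation: an S x S grid where each cell holds B boxes of (x, y, w, h, confidence) followed by C class probabilities. The exact ordering of values inside a cell, the function name `decode_yolo` and the threshold are assumptions made for this example.

```python
import numpy as np

def decode_yolo(output, S, B, C, threshold=0.25):
    """Turn an (S, S, B*5 + C) prediction tensor into a list of detections.

    Assumed per-cell layout: B boxes of (x, y, w, h, conf), then C class probs.
    x, y are offsets inside the cell; w, h are relative to the whole image.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row, col]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score: box confidence times class probability.
                scores = conf * class_probs
                cls = int(np.argmax(scores))
                if scores[cls] >= threshold:
                    cx = (col + x) / S       # box center in image coordinates
                    cy = (row + y) / S
                    detections.append((cx, cy, w, h, cls, float(scores[cls])))
    return detections

S, B, C = 2, 1, 2
output = np.zeros((S, S, B * 5 + C))
output[1, 0] = [0.5, 0.5, 0.2, 0.3, 0.9, 0.1, 0.8]   # one confident box
dets = decode_yolo(output, S, B, C)
print(dets)
```

A real pipeline would follow this with non-maximum suppression to remove duplicate boxes. That step is omitted here to keep the sketch short.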
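A limitation of the YOLO algorithm is that it struggles with small objects in the image, especially ones that appear in groups; for example, it may have difficulty detecting a flock of birds.
This is due to the spatial constraints of the algorithm: YOLO divides the image into a grid, and each grid cell predicts only a small, fixed number of boxes, so several small objects that fall into the same cell cannot all be detected.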
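The effect of this constraint can be illustrated with a toy calculation. In the original YOLO formulation the image is split into an S x S grid and each cell predicts a fixed number B of boxes, so a cell can account for at most B objects whose centers fall inside it. The helper below, `objects_lost_to_grid`, is a hypothetical function written for this example; it simply counts how many objects exceed that upper bound.

```python
from collections import Counter

def objects_lost_to_grid(centers, S, B):
    """Count objects that cannot get their own box under a per-cell limit of B.

    centers: list of (cx, cy) object centers, normalized to [0, 1].
    """
    per_cell = Counter(
        (min(int(cy * S), S - 1), min(int(cx * S), S - 1))
        for cx, cy in centers
    )
    return sum(max(0, n - B) for n in per_cell.values())

# Six small birds flying close together: all centers land in one grid cell.
flock = [(0.50 + 0.01 * i, 0.50) for i in range(6)]
print(objects_lost_to_grid(flock, S=7, B=2))

# Two objects far apart each get their own cell.
print(objects_lost_to_grid([(0.1, 0.1), (0.9, 0.9)], S=7, B=2))
```

Large, well-separated objects rarely collide in this way, which is why the problem shows up mainly with small objects that appear in groups.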