this problem can be almost perfectly solved by dbscan clustering algorithm. Below, I provide the implementation and result image. Gray blob means outlier or noise according to dbscan. I simply used boxes as input data. Initially, box centers were used for distance function. However for boxes, it is insufficient to correctly characterize distance. So, the current distance function use the minimum distance of all 8 corners of two boxes.
#include "opencv2/opencv.hpp"
using namespace cv;
#include