
Soln #6B: Social Distance Monitoring using Monocular images

A project log for Multi-Domain Depth AI Usecases on the Edge

SLAM, ADAS-CAS, Sensor Fusion, Touch-less Attendance, Elderly Assist, Monocular Depth, Gesture & Security Cam with OpenVINO, Math & RPi

Anand Uthaman • 10/25/2021 at 09:58

I have implemented the below algorithm to solve a Depth AI use case - Social Distance Monitoring - using monocular images. The output of the algorithm is shown at the bottom.


a) Feed the input image frames every couple of seconds

b) Fuse the disparity map with the object detection output, similar to visual sensor fusion. Concretely, take the disparity values of the pixels inside each object's bounding box and use their median as the object's depth estimate.

c) Find the centroid of each object bounding box and map the corresponding depth to the object.

d) For each pair of objects in the image, find the depth difference as well as the (x, y) centroid difference. Multiply the depth difference by a scaling factor.

e) Use the Pythagorean theorem to compute the Euclidean distance between each pair of bounding boxes, treating the scaled depth difference as an additional axis. The depth scaling factor needs to be estimated during the initial camera calibration.
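Steps (d) and (e) boil down to a small pairwise check; here is a standalone sketch (the function name, default scaling factor, and threshold are illustrative, matching the values used in the code below):

```python
import numpy as np

def is_violation(c1, c2, d1, d2, scaling_factor=1000, threshold=200):
    # c1, c2: (x, y) bounding-box centroids in pixels
    # d1, d2: median disparity of each bounding box (step b)
    # The depth difference is scaled into pixel-comparable units and
    # combined with the image-plane offset via the Pythagorean theorem.
    planar_sq = np.square(c1[0] - c2[0]) + np.square(c1[1] - c2[1])
    depth_sq = np.square((d1 - d2) * scaling_factor)
    return float(np.sqrt(planar_sq + depth_sq)) < threshold
```

Two people close in the image plane but far apart in depth are correctly not flagged, which is the point of fusing depth into the distance check.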

  # Assumes `detections` holds [xmin, ymin, xmax, ymax] boxes from the
  # object detection model, `disp` is the disparity map (numpy array),
  # and numpy (np) and cv2 are already imported
  boxcount = 0
  depths = []
  bboxMidXs = []
  bboxMidYs = []
  # Depth scaling factor, computed once during initial camera
  # calibration so that distances reflect real-world separation
  scalingFactor = 1000

  size = disp.shape[:2]
  for detection in detections:
      # Clamp the bounding box to the image before sampling the disparity
      xmin = max(int(detection[0]), 0)
      ymin = max(int(detection[1]), 0)
      xmax = min(int(detection[2]), size[1])
      ymax = min(int(detection[3]), size[0])

      # Median disparity inside the box estimates the object depth
      depths.append(np.median(disp[ymin:ymax, xmin:xmax]))
      bboxMidXs.append((xmin + xmax) / 2)
      bboxMidYs.append((ymin + ymax) / 2)

      boxcount += 1

      cv2.rectangle(disp, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
      cv2.putText(disp, '{} {}'.format('person', boxcount),
                  (xmin, ymin - 7), cv2.FONT_HERSHEY_COMPLEX, 0.6, (0, 255, 0), 1)


  for i in range(len(bboxMidXs)):
      for j in range(i + 1, len(bboxMidXs)):
          # Squared distance in the image plane plus the scaled depth
          # difference as an extra axis (Pythagorean theorem)
          dist = (np.square(bboxMidXs[i] - bboxMidXs[j]) +
                  np.square(bboxMidYs[i] - bboxMidYs[j]) +
                  np.square((depths[i] - depths[j]) * scalingFactor))

          # A distance below the 200 threshold flags a
          # social distance violation
          if np.sqrt(dist) < 200:
              color = (0, 0, 255)   # red: too close
              thickness = 3
          else:
              color = (0, 255, 0)   # green: safe
              thickness = 1

          cv2.line(original_img, (int(bboxMidXs[i]), int(bboxMidYs[i])),
                   (int(bboxMidXs[j]), int(bboxMidYs[j])), color, thickness)
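The scalingFactor used above comes from a one-time camera calibration. One minimal way to estimate it (a sketch under the assumption that two subjects stand at a known separation, expressed in pixel-equivalent units, during calibration) is to solve the same Pythagorean relation for the factor:

```python
import numpy as np

def estimate_scaling_factor(known_distance, planar_diff, disparity_diff):
    # known_distance: true separation of the two calibration subjects,
    #                 expressed in the same pixel-equivalent units
    # planar_diff:    their centroid distance in the image plane (pixels)
    # disparity_diff: difference of their median disparities
    # Solve known^2 = planar^2 + (disparity_diff * s)^2 for s
    depth_term_sq = np.square(known_distance) - np.square(planar_diff)
    return float(np.sqrt(depth_term_sq)) / abs(disparity_diff)
```

For example, subjects a known 500 units apart with a 300-pixel planar offset and a 0.4 disparity difference yield a scaling factor of 1000, the value used in the code above.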

Input Image:

social_distance5.png
Out of the 4 people in the image, two are very close

Disparity Map - Object Detection Fusion:

disp.png
Only the depth info inside the bounding boxes is considered for estimation

Output Image:

depthImg.png
People who are too close are connected in red, others in green

People who don't maintain the minimum threshold distance are thus identified as violating the social distancing norm, using only a monocular camera image.
