Pedestrian Detection and Tracking System


The pedestrian detection and tracking system described in this writing is developed in Python 3.7, with OpenCV 3 as the library for image processing and computer vision. The input of the program is video of pedestrians crossing the road, which is read frame by frame. Detection is run on the first frame of the video and then every 15 frames, because detection is computationally more expensive than tracking. The output of the detection stage is a bounding box for each detected object (in this case, a human). Unique features are then extracted from each detected object and used in the tracking process, which runs on every frame that is read. After that, the system determines whether anyone is crossing the road or standing at the side of the road waiting to cross, and displays the result. The whole process repeats until every frame of the video has been read (the end of the video is reached).

Resources I used:

  1. Python 3.7[1], the programming language used for the project (installed with Anaconda)
  2. OpenCV 3[2], a library for image processing and computer vision (installed with the Anaconda package installer)
  3. The Caffe version of the MobileNet-SSD model[3], the model used to detect pedestrians
  4. imutils[4], a convenience library for basic image processing functions such as resizing, rotation, and translation

PEDESTRIAN DETECTION

Before objects are detected, each frame that is read is resized to a predetermined size. The height (H) and width (W) of the resized frame are read, and a mask (binary image) of height H and width W is initialized with all elements equal to zero.

The resized frame is used as input to the MobileNet-SSD deep learning model, which outputs a list of objects detected in the frame. The system then iterates over this list and examines the degree of confidence of each object. "Person" objects detected with a confidence value above 0.2 are accepted by the system; the value 0.2 was chosen after several experiments and comparisons. The output of the detection stage, the bounding box of each detected human object, is used as a mask for searching for unique features on the detected objects.
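The confidence and class filtering step can be sketched as below. The SSD output layout (rows of `[image_id, class_id, confidence, x1, y1, x2, y2]` with normalized coordinates) and the "person" class index of 15 follow the common MobileNet-SSD (VOC) convention; treat both as assumptions about this particular model file.

```python
import numpy as np

PERSON_CLASS_ID = 15  # assumed index of "person" in the MobileNet-SSD (VOC) class list


def filter_person_boxes(detections, W, H, conf_threshold=0.2):
    """Parse the raw SSD output blob (shape 1x1xNx7) and keep only "person"
    detections with a confidence value above the threshold, scaling the
    normalized box coordinates back to the frame size W x H."""
    boxes = []
    for i in range(detections.shape[2]):
        class_id = int(detections[0, 0, i, 1])
        confidence = detections[0, 0, i, 2]
        if class_id == PERSON_CLASS_ID and confidence > conf_threshold:
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            boxes.append(box.astype(int))
    return boxes

# Typical use with OpenCV's dnn module (model file names are assumptions):
#   net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
#                                  "MobileNetSSD_deploy.caffemodel")
#   blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
#   net.setInput(blob)
#   boxes = filter_person_boxes(net.forward(), W, H)
```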

To find unique features, the Shi-Tomasi corner detector[5] is applied to the masked original frame. That way, the system looks for unique features only on the detected human objects, not across the entire frame: the search is restricted to each bounding box produced by the detection stage. The coordinates of these unique features are grouped by the bounding box in which they are located. This is necessary because the Lucas-Kanade optical flow implementation in OpenCV tracks only a set of pixel points on the object, so by itself the system would not know which object a tracked point originates from. By recording the bounding-box coordinates before running Shi-Tomasi corner detection, the system can tell which bounding box each tracked point belongs to. The output of this process is a frame marked with the unique features, an array of unique features (the array of corners in the flowchart), and an array that stores the bounding box from which each unique feature originates.

PEDESTRIAN TRACKING

At the tracking stage, the unique features obtained in the previous process are tracked and updated using the Lucas-Kanade optical flow method[6]. The displacements of the unique features are averaged per bounding-box group, and this average is taken as the displacement of the corresponding object bounding box. While tracking, the system periodically re-runs detection on the current frame to handle situations where a new object appears or an old object has disappeared.

RESOURCES & REFERENCES

  1. Anaconda Inc., n.d. Individual Edition - Your data science toolkit. Available at: anaconda.com [Retrieved: 15 Oct 2020]

  2. Anaconda Inc., 2017. Anaconda Cloud - menpo / packages / opencv3 3.2.0. Available at: anaconda.org [Retrieved: 15 Oct 2020]

  3. chuanqi305, 2018. MobileNet-SSD. Available at: github.com [Retrieved: 23 July 2019]

  4. Anaconda Inc., 2020. Anaconda Cloud - conda-forge / packages / imutils 0.5.3. Available at: anaconda.org [Retrieved: 15 Oct 2020]

  5. OpenCV, n.d. Shi-Tomasi Corner Detector & Good Features to Track. Available at: opencv-python-tutroals.readthedocs.io [Retrieved: 15 Oct 2020]

  6. OpenCV, n.d. Optical Flow. Available at: opencv-python-tutroals.readthedocs.io [Retrieved: 15 Oct 2020]