In this augmented reality project, the goal is to capture a video and seamlessly integrate a synthetic object into the scene. The process involves using 2D points from the video, which correspond to known 3D coordinates, to calibrate the camera for each frame. With the calibrated camera, a cube's 3D coordinates are projected onto the video frames using the camera projection matrix. When the calibration is accurate, the cube will consistently appear as part of the scene across all frames of the video.
We started off by capturing a video of a white box with drawn grid marks to keep track of 34 coordinates on a 4x4x1 cube. The original video is shown below:
We then marked the 2D position of each grid point in the initial keyframe of our video and paired it with its 3D coordinate in our world coordinate system. These are the 3D axes we used in our 3D world:
To track the coordinates across frames, we used the off-the-shelf trackers available in OpenCV, specifically cv2.TrackerMedianFlow_create(). For each point in the 2D grid, we initialized an individual tracker and then tracked it across the video frames. During tracking, we checked whether the tracked point stayed within a predefined threshold. If it did, we added it to the tracked dictionary. The visualization of the tracked points is shown below.
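The per-point tracking loop can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the helper names, the patch size, and the interpretation of the threshold as a maximum per-frame jump in pixels are all assumptions made for illustration. Newer OpenCV builds expose the tracker as cv2.legacy.TrackerMedianFlow_create, so the sketch tries both names.

```python
import numpy as np


def point_to_bbox(point, size=16):
    """Square (x, y, w, h) box centred on a 2D point; trackers need a box, not a point."""
    x, y = point
    return (x - size / 2, y - size / 2, size, size)


def bbox_center(bbox):
    """Centre of an (x, y, w, h) box, used as the tracked point location."""
    x, y, w, h = bbox
    return (x + w / 2, y + h / 2)


def within_threshold(prev, curr, max_jump):
    """Assumed check: the point moved at most max_jump pixels since it was last seen."""
    return bool(np.hypot(curr[0] - prev[0], curr[1] - prev[1]) <= max_jump)


def track_points(frames, initial_points, patch_size=16, max_jump=20.0):
    """Track each labelled point with its own MedianFlow tracker.

    frames: list of images; initial_points: {point_id: (x, y)} in frames[0].
    Returns one {point_id: (x, y)} dictionary per frame.
    """
    import cv2  # imported here so the helpers above work without OpenCV

    create = getattr(cv2, "TrackerMedianFlow_create", None)
    if create is None:  # OpenCV >= 4.5.1 moved it to the legacy module
        create = cv2.legacy.TrackerMedianFlow_create

    trackers = {}
    for pid, pt in initial_points.items():
        tracker = create()
        tracker.init(frames[0], point_to_bbox(pt, patch_size))
        trackers[pid] = tracker

    last_seen = dict(initial_points)
    tracked = [dict(initial_points)]
    for frame in frames[1:]:
        current = {}
        for pid, tracker in trackers.items():
            ok, bbox = tracker.update(frame)
            if not ok:
                continue
            center = bbox_center(bbox)
            # Keep the point only if it passes the (assumed) threshold check.
            if within_threshold(last_seen[pid], center, max_jump):
                current[pid] = center
                last_seen[pid] = center
        tracked.append(current)
    return tracked
```

Points that fail the check are simply left out of that frame's dictionary, so the calibration step only sees correspondences the tracker is reasonably sure about.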
Once we had the marked 2D points and their corresponding 3D coordinates, we could compute the camera projection matrix P. Because our system was overdetermined, we used least squares to find the matrix that maps the 3D points to the 2D points (4D and 3D vectors, respectively, in homogeneous coordinates). We wrote the constraints as a homogeneous system A*p = 0, where p holds the entries of P, and solved it with SVD. We did this for each frame in the video, so each frame has its own P.
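As an illustration of the least-squares step, here is a minimal sketch of the standard direct linear transform (DLT) solved with SVD. The function name and the array layout are assumptions for this example. Each correspondence contributes two rows to a matrix A, and the flattened P is taken as the right singular vector with the smallest singular value.

```python
import numpy as np


def estimate_projection_matrix(points_3d, points_2d):
    """Estimate the 3x4 camera matrix P (up to scale) from 3D-2D correspondences."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    n = len(points_3d)
    if n < 6:
        raise ValueError("need at least 6 correspondences for 11 unknowns")

    A = np.zeros((2 * n, 12))
    for i in range(n):
        X = np.append(points_3d[i], 1.0)  # homogeneous 3D point
        u, v = points_2d[i]
        # Row for u: (p1 . X) - u * (p3 . X) = 0
        A[2 * i, 0:4] = X
        A[2 * i, 8:12] = -u * X
        # Row for v: (p2 . X) - v * (p3 . X) = 0
        A[2 * i + 1, 4:8] = X
        A[2 * i + 1, 8:12] = -v * X

    # The least-squares solution of A p = 0 with ||p|| = 1 is the last row of Vt.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

The result is only defined up to a scale factor, which cancels when the projected homogeneous coordinates are divided by their last component. The 3D points also must not all lie in a single plane, otherwise the system does not determine P.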
We defined a rectangular prism by creating a matrix of its vertices. Then, for each frame in the video, we used that frame's camera projection matrix to transform the 3D box points into 2D video points. We then used the draw function to draw lines connecting these vertices onto the frame. Once this was done for each frame, we combined the frames into the final video, shown below.
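The projection and drawing step might look roughly like the sketch below. The box coordinates, function names, color, and line thickness are placeholders chosen for illustration, not the values used in the project. The edge list is built by connecting vertices that differ in exactly one coordinate.

```python
import numpy as np

# Hypothetical box: 8 corners spanning [1, 2] along each world axis.
BOX_VERTICES = np.array(
    [[x, y, z] for x in (1, 2) for y in (1, 2) for z in (1, 2)], dtype=float
)

# An edge joins two corners that differ in exactly one coordinate (12 edges).
BOX_EDGES = [
    (i, j)
    for i in range(8)
    for j in range(i + 1, 8)
    if np.sum(BOX_VERTICES[i] != BOX_VERTICES[j]) == 1
]


def project_points(P, points_3d):
    """Project Nx3 world points to Nx2 pixel coordinates with a 3x4 matrix P."""
    points_3d = np.asarray(points_3d, dtype=float)
    homogeneous = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    projected = (np.asarray(P, dtype=float) @ homogeneous.T).T
    # Divide by the third coordinate to leave homogeneous coordinates.
    return projected[:, :2] / projected[:, 2:3]


def draw_box(frame, P, color=(0, 255, 0), thickness=2):
    """Draw the projected box edges onto a frame in place and return it."""
    import cv2  # imported here so project_points works without OpenCV

    pixels = np.rint(project_points(P, BOX_VERTICES)).astype(int)
    for i, j in BOX_EDGES:
        pt1 = tuple(int(c) for c in pixels[i])
        pt2 = tuple(int(c) for c in pixels[j])
        cv2.line(frame, pt1, pt2, color, thickness)
    return frame
```

Calling draw_box once per frame with that frame's P, then writing the frames out in order, gives the composited video.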