Augmented Reality 👾

In this augmented reality project, the goal is to capture a video and seamlessly integrate a synthetic object into the scene. The process involves using 2D points from the video, which correspond to known 3D coordinates, to calibrate the camera for each frame. With the calibrated camera, a cube's 3D coordinates are projected onto the video frames using the camera projection matrix. When the calibration is accurate, the cube will consistently appear as part of the scene across all frames of the video.

Setup

We started by capturing a video of a white box with grid marks drawn on it, which gave us 34 points with known coordinates on a 4x4x1 box. The original video is shown below:

Keypoints with Known 3D World Coordinates

We then marked the 2D image location of each grid point in the initial keyframe of the video and paired it with its 3D world coordinate. This is the 3D axis we used in our 3D world:

Propagating Keypoints to Other Images in the Video

To track the points across frames, we used one of the off-the-shelf trackers available in OpenCV, cv2.TrackerMedianFlow_create(). For each marked 2D grid point, we initialized an individual tracker and tracked it through the video frames. During tracking, we checked whether each tracked point stayed within a predefined threshold; if it did, we added it to the tracked dictionary for that frame. The visualization of the tracked points is shown below.
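The per-point tracking loop can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names, the box_size and max_jump values, the reading of "threshold" as a limit on frame-to-frame displacement, and the choice to drop a point permanently once it fails are all assumptions. The tracker factory is looked up under cv2.legacy when that module exists, since newer OpenCV contrib builds keep MedianFlow there.

```python
import math


def make_median_flow_tracker():
    """Create a MedianFlow tracker; its location depends on the OpenCV build."""
    import cv2  # imported lazily so the tracking loop can be tested without OpenCV
    if hasattr(cv2, "legacy"):
        return cv2.legacy.TrackerMedianFlow_create()  # newer contrib builds
    return cv2.TrackerMedianFlow_create()  # older contrib builds


def track_points(frames, start_points, tracker_factory=make_median_flow_tracker,
                 box_size=16, max_jump=20.0):
    """Track each keypoint with its own tracker.

    frames: list of images; start_points: {point_id: (x, y)} in frames[0].
    Returns {frame_index: {point_id: (x, y)}}.
    box_size and max_jump are hypothetical values, not taken from the project.
    """
    half = box_size / 2
    trackers = {}
    last = dict(start_points)
    for pid, (x, y) in start_points.items():
        tracker = tracker_factory()
        # Each tracker follows a small square patch centred on the keypoint.
        tracker.init(frames[0], (int(x - half), int(y - half), box_size, box_size))
        trackers[pid] = tracker

    tracked = {0: dict(start_points)}
    for i, frame in enumerate(frames[1:], start=1):
        tracked[i] = {}
        for pid, tracker in list(trackers.items()):
            ok, box = tracker.update(frame)
            if ok:
                bx, by, bw, bh = box
                center = (bx + bw / 2, by + bh / 2)
            # Assumed threshold test: reject points that jump too far in one frame.
            if not ok or math.dist(center, last[pid]) > max_jump:
                del trackers[pid]  # assumption: a lost point is not recovered
                continue
            tracked[i][pid] = center
            last[pid] = center
    return tracked
```

Frames where a point is missing simply contribute fewer correspondences to that frame's calibration, so the threshold trades the number of points against their reliability.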

Calibrating the Camera

Once we had the marked 2D points and their corresponding 3D coordinates, we could compute the camera projection matrix P, the 3x4 matrix that maps 3D world points to 2D image points (4D and 3D vectors respectively in homogeneous coordinates). Because our system was overdetermined, with more equations than unknowns, we solved for P in the least-squares sense. We wrote the equations as a homogeneous system A*p = 0, where p holds the 12 entries of P, and solved it with SVD. We did this for each frame in the video, so each frame has its own P.
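A minimal sketch of this kind of SVD-based calibration (the standard direct linear transform) is given below, assuming NumPy. The function name and argument layout are illustrative rather than the project's actual code. Each 2D-3D correspondence contributes two rows to A, and the least-squares solution of A*p = 0 with unit norm is the right singular vector for the smallest singular value.

```python
import numpy as np


def calibrate_camera(world_pts, image_pts):
    """Estimate a 3x4 projection matrix from 3D-2D correspondences.

    world_pts: (N, 3) array-like, image_pts: (N, 2) array-like, with N >= 6
    and the 3D points not all on one plane. P is recovered up to scale.
    """
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        Xh = [X, Y, Z, 1.0]  # homogeneous world point
        # From u = (p1 . Xh) / (p3 . Xh) and v = (p2 . Xh) / (p3 . Xh):
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.array(rows)
    # The last row of Vt minimises ||A p|| subject to ||p|| = 1.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

Because P is only defined up to scale, two estimates should be compared by their reprojections, or after normalising by a common entry, rather than entry by entry.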

Projecting a Cube in the Scene

We defined a rectangular prism as a matrix of its eight 3D vertices. Then, for each frame in the video, we used that frame's camera projection matrix to transform the 3D box vertices into 2D image points, and drew lines connecting these projected vertices onto the frame. Once this was done for every frame, we combined the frames into the final video, shown below.
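The projection and drawing step might look like the sketch below, assuming NumPy for the math and OpenCV's cv2.line for drawing. The helper names, the box origin and size, and the line color are hypothetical choices, not values from the project.

```python
import numpy as np


def box_vertices(origin=(0.0, 0.0, 0.0), size=(1.0, 1.0, 1.0)):
    """Return the 8 corners of an axis-aligned box as an (8, 3) array.

    Corner k has x offset from bit 0 of k, y from bit 1, z from bit 2.
    """
    corners = [[x, y, z] for z in (0, 1) for y in (0, 1) for x in (0, 1)]
    return np.asarray(origin, dtype=float) + np.array(corners, dtype=float) * np.asarray(size, dtype=float)


# Two corners share an edge when their indices differ in exactly one bit.
BOX_EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
             if bin(i ^ j).count("1") == 1]


def project(P, pts3d):
    """Apply a 3x4 projection matrix to (N, 3) points and return (N, 2) pixels."""
    pts3d = np.asarray(pts3d, dtype=float)
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = homog @ np.asarray(P, dtype=float).T
    return proj[:, :2] / proj[:, 2:3]  # divide by the homogeneous coordinate


def draw_box(frame, P, vertices, color=(0, 255, 0), thickness=2):
    """Draw the projected wireframe onto a frame in place."""
    import cv2  # imported lazily so the projection math runs without OpenCV
    pts2d = project(P, vertices)
    for i, j in BOX_EDGES:
        p1 = tuple(int(round(c)) for c in pts2d[i])
        p2 = tuple(int(round(c)) for c in pts2d[j])
        cv2.line(frame, p1, p2, color, thickness)
    return frame
```

In use, draw_box would be called once per frame with that frame's own P, so any calibration error shows up directly as the box drifting relative to the scene.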