The goal of this project is to combine images into mosaics through perspective warping and matching keypoint locations.
In part 1, we capture a set of images and combine them into a mosaic by computing homographies with manually labeled keypoints.
To create an image mosaic, the first thing we need is images. For a perspective warp to work properly, and actually show what one image would look like on the same perspective plane as the other, the images must all be taken from the same center of projection. Otherwise, the effects of parallax combined with the 3D geometry of the scene mean that each image observes fundamentally different things.
I went to the Lawrence Hall of Science at night to capture the set of images I will be using for the mosaic. Here are the two images I will use for the part 1 mosaic.
To perform a perspective warp, we must first extract the appropriate homography matrix. The homography matrix takes a set of points and transforms them to the locations of another set of points. It also satisfies the equation:

$$w \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

where $w$ is a scaling factor for a corresponding point pair, $(x, y)$ and $(x', y')$ are a corresponding point pair, and $H$ is a matrix of the form:

$$H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}$$
Given a set of corresponding points, we can form a system of linear equations and solve for the 8 unknowns of the H matrix. We need at least 4 point pairs to solve for H. To define the corresponding points, I wrote a script with ginput:
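Once points are labeled, the homography solve itself is a small least-squares problem. Here is a minimal sketch of setting up and solving that linear system with $h_{33}$ fixed to 1 (the function name `compute_homography` is illustrative, not from my actual script):

```python
import numpy as np

def compute_homography(src, dst):
    """Solve for the 3x3 homography H mapping src points to dst points.

    src, dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    Each correspondence contributes two rows to the linear system
    A h = b for the 8 unknowns (with h33 fixed to 1), which we solve
    in the least-squares sense so extra correspondences are averaged.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return np.append(h, 1).reshape(3, 3)
```

With exactly 4 non-collinear point pairs the system has 8 equations and 8 unknowns, so the solution is exact; with more pairs it is a least-squares fit.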
Given a homography matrix, we implement a function to warp an image. We can find what position on the output image a given point on the input image corresponds to by multiplying each point by $H$ to get the output point $(wx', wy', w)$ and dividing by $w$. We use `scipy.interpolate.griddata` to interpolate between known values.
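The forward-mapping step can be sketched as follows. This is a grayscale sketch (color images can be handled one channel at a time), and the function name and output-shape convention are assumptions:

```python
import numpy as np
from scipy.interpolate import griddata

def warp_image(img, H, out_shape):
    """Forward-warp a grayscale image through homography H.

    Every input pixel is mapped through H and divided by its scale
    factor w; the resulting scattered samples are then interpolated
    onto a regular output grid of shape (rows, cols) with griddata.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.vstack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous (x, y, 1)
    mapped = H @ pts
    mapped = mapped[:2] / mapped[2]  # divide by w to return to image coordinates
    out_y, out_x = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    # interpolate the scattered forward-mapped samples onto the output grid
    return griddata(mapped.T, img.ravel().astype(float),
                    (out_x, out_y), method='linear', fill_value=0.0)
```

Pixels outside the warped image's footprint fall outside the convex hull of the mapped points and are filled with 0.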
To demonstrate that our homography warping computations work, we can “rectify” an image, that is, take a rectangular object that is imaged off axis and warp it to a rectangle. I do this by mapping the corners to a new set of points formed from the row and column averages of each side of the original quadrilateral, so that the new points form a rectangle. I do so with an image of a book:
As a note, the deviation from being rectangular in the top right corner comes from a physical bend in the book and not a flaw in the algorithm.
If we define point correspondences between multiple images taken from the same center of projection, sequentially warp them to match, and blend the images together, we can form an image mosaic. The images are combined with a mask formed by applying a distance filter to the overlapping regions of the final warped images. My final image mosaic is shown below:
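The distance-filter blending step can be sketched like this, using `scipy.ndimage.distance_transform_edt` as the distance filter (the function name and the exact weighting scheme are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_pair(im1, im2, mask1, mask2):
    """Blend two aligned color images with distance-transform weights.

    mask1/mask2 are boolean arrays marking the valid pixels of each
    warped image. Each image's weight is its distance to its own
    border, so in the overlap the blend transitions smoothly from
    one image to the other instead of leaving a hard seam.
    """
    d1 = distance_transform_edt(mask1)  # distance to nearest invalid pixel
    d2 = distance_transform_edt(mask2)
    total = d1 + d2
    w1 = np.where(total > 0, d1 / np.maximum(total, 1e-12), 0.0)
    return w1[..., None] * im1 + (1.0 - w1)[..., None] * im2
```

Where only one image is valid its weight is 1 and the other's is 0, so the non-overlapping regions pass through unchanged.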
In part 2, we automate image mosaicing by finding matching features and computing a homography through RANSAC.
To find keypoints, we use the Harris Interest Point Detection algorithm. We plot the outputs of keypoint detection below:
On the left are the resulting keypoints plotted on our input image. On the right is the Harris response at each pixel of our input image.
By itself, Harris corner detection outputs a very large number of points, where many are not particularly strong. Therefore, we would like to select a subset of strong points for matching. However, if we were to simply take the points with the highest Harris responses, we would extract very unevenly distributed keypoints. Since this would ultimately hurt the stability of perspective warps used in our final mosaicing, we must enforce some condition of even distribution.
To do this, we implement Adaptive Non-Maximal Suppression (ANMS). ANMS enforces some level of homogeneity in distribution by retaining a point only if it has the maximum Harris response within its suppression radius. In practice, we find the radius that produces a desired number of extracted corners.
We plot the result of ANMS below for our desired number of corners.
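The ANMS computation above can be sketched as follows. Rather than sweeping the radius directly, this version computes each point's suppression radius (distance to the nearest significantly stronger point) and keeps the largest ones; `c_robust = 0.9` is an assumed robustness factor, and this O(n²) formulation is for illustration:

```python
import numpy as np

def anms(coords, responses, n_keep, c_robust=0.9):
    """Adaptive Non-Maximal Suppression (O(n^2) sketch).

    For each corner, the suppression radius is the distance to the
    nearest corner whose response is significantly stronger (by the
    factor c_robust). Keeping the n_keep largest radii yields
    spatially well-distributed strong corners.
    """
    coords = np.asarray(coords, dtype=float)       # (N, 2) point locations
    responses = np.asarray(responses, dtype=float)  # (N,) Harris responses
    # d2[i, j] = squared distance from point i to point j
    d2 = np.sum((coords[:, None] - coords[None]) ** 2, axis=-1)
    # stronger[i, j]: point j is significantly stronger than point i
    stronger = responses[:, None] < c_robust * responses[None]
    # radius is inf for points with no stronger neighbor (e.g. the global max)
    radii = np.sqrt(np.where(stronger, d2, np.inf).min(axis=1))
    return coords[np.argsort(-radii)[:n_keep]]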
Now that we can find candidate keypoints through ANMS, we can find matching corners between images. To do this, we must first extract descriptors of our interest points. We form descriptors by downsampling the 40x40 window around each keypoint into an 8x8 patch. These patches are then flattened into vectors and normalized such that each descriptor has mean 0 and standard deviation 1.
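A sketch of the descriptor extraction, assuming a grayscale image and every-5th-pixel sampling to get 8x8 from the 40x40 window (function name and border handling are mine):

```python
import numpy as np

def extract_descriptors(img, coords, window=40, patch=8):
    """Extract bias/gain-normalized descriptors at keypoint locations.

    Samples the window x window region around each (y, x) keypoint
    down to patch x patch (every window//patch-th pixel), flattens it,
    and normalizes to mean 0 and standard deviation 1. Keypoints too
    close to the image border are skipped.
    """
    half = window // 2
    step = window // patch
    descs, kept = [], []
    for y, x in coords:
        y, x = int(y), int(x)
        if y - half < 0 or x - half < 0 or y + half > img.shape[0] or x + half > img.shape[1]:
            continue  # window would fall off the image
        win = img[y - half:y + half:step, x - half:x + half:step].astype(float)
        v = win.ravel()
        descs.append((v - v.mean()) / (v.std() + 1e-8))
        kept.append((y, x))
    return np.array(descs), np.array(kept)
```

The mean/std normalization is what makes the descriptors invariant to overall brightness and contrast changes between the two images.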
Given these feature descriptors, we can match by finding point pairs where the difference between the feature descriptors has the minimum L2 norm. While this gives the best match for each keypoint, not every keypoint necessarily exists in each image, and not every extracted match is correct. We therefore reject matches through Lowe thresholding: we only accept matches where the distance between the descriptors for the best match divided by that of the second-best match is below a certain threshold.
This yields fairly good results:
The points that share color between the two images are considered a match.
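The matching procedure with the ratio test can be sketched as follows (the ratio value 0.6 and the function name are illustrative assumptions):

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.6):
    """Match descriptor sets by L2 distance with Lowe's ratio test.

    A match is kept only when the best distance is below `ratio`
    times the second-best distance. Returns an (M, 2) array of index
    pairs (row i into d1, row j into d2).
    """
    # pairwise L2 distances between all descriptors: (N1, N2)
    dists = np.linalg.norm(d1[:, None] - d2[None], axis=-1)
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d1))
    keep = dists[rows, best] < ratio * dists[rows, second]
    return np.stack([rows[keep], best[keep]], axis=1)
```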
Given a set of keypoint pairs between images, we can compute a homography to join the images together. To compute the homography, we use four-point RANSAC:
By computing the final homography only considering the largest set of inliers, we build in resistance to the inclusion of any falsely identified matching keypoint pair.
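Four-point RANSAC can be sketched as below; the iteration count, inlier threshold, and function names are assumptions:

```python
import numpy as np

def ransac_homography(src, dst, n_iters=1000, thresh=2.0, seed=0):
    """Four-point RANSAC for a homography (sketch).

    Repeatedly fits a homography to a random sample of 4
    correspondences, counts inliers whose reprojection error is under
    `thresh` pixels, and refits on the largest inlier set found.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)

    def fit(s, d):  # least-squares homography with h33 fixed to 1
        A, b = [], []
        for (x, y), (xp, yp) in zip(s, d):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return np.append(h, 1).reshape(3, 3)

    def errors(H):  # per-point reprojection error in pixels
        p = H @ np.vstack([src.T, np.ones(len(src))])
        return np.linalg.norm(p[:2] / p[2] - dst.T, axis=0)

    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        inliers = errors(fit(src[idx], dst[idx])) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit(src[best_inliers], dst[best_inliers]), best_inliers
```

The final least-squares refit over all inliers is what gives the homography its resistance to the remaining bad matches: outliers never influence the final estimate.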
We can use the resulting homography matrix to join images in the same manner as in part 1. We produce the following mosaics from pairs of images:
I learned how Lowe thresholding can be used to reject incorrect keypoint pairs. Directly thresholding the distance between descriptors does not work because the ideal threshold would be different for different images. However, by comparing the best match with the second-best match, you can derive a more resilient thresholding method. Lowe thresholding rests on the assumption that only for a real match is the best keypoint pair significantly better than the second best.