Project 2: Fun with Filters and Frequencies!
Part 1: Fun with Filters
In this part, we will build intuitions about 2D convolutions and filtering.
1.1: Finite Difference Operator
Finite difference operators are a very common way to calculate the derivatives of an image, which carry important information for image processing. While there are many different types of operators, we are going to be using the following:

$$D_x = \begin{bmatrix} 1 & -1 \end{bmatrix}, \qquad D_y = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
As an example, we will be using a grayscale image of a camera man taking a picture.
The goal of this exercise is to extract the edges of the image, similar to an edge detection filter. To accomplish this, we break our process into a few steps.
First, we start by getting the partial derivatives of the image using the finite difference operators defined above. This involves the scipy.signal.convolve2d function, which takes two 2D arrays and convolves the first with the second. In our case, the image is the first argument, and we perform two separate convolutions, one with $D_x$ and another with $D_y$.
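As a rough sketch of this step (the file name here is a placeholder, not the project's actual code):

```python
import numpy as np
import skimage.io as skio
from scipy.signal import convolve2d

# Finite difference operators from above.
D_x = np.array([[1, -1]])
D_y = np.array([[1], [-1]])

# Placeholder file name; any grayscale image loaded as a 2D float array works.
img = skio.imread("cameraman.png", as_gray=True).astype(np.float64)

# Convolve the image with each operator to get the partial derivatives.
partial_x = convolve2d(img, D_x, mode="same", boundary="symm")
partial_y = convolve2d(img, D_y, mode="same", boundary="symm")
```

The results of these two convolutions are the following: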
In Figures 2 and 3, we can see that the two image derivatives highlight two slightly different sets of edges: the partial derivative in x picks up intensity changes along the horizontal direction (left-to-right), while the partial derivative in y picks up changes along the orthogonal, vertical direction (top-to-bottom).
Next, we need to calculate the gradient magnitude image, which shows us the combined set of edges from both partial derivatives. We compute it by taking the L2-norm of the gradient at each pixel: $\|\nabla I\| = \sqrt{(\partial I / \partial x)^2 + (\partial I / \partial y)^2}$.
While this is a decent edge detector so far, we need to remove the unnecessary noise visible in this image so that the more significant edges stand out. To do this, we first normalize our image and then binarize it using THRESHOLD = 0.0057.
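Continuing the sketch above, these two steps might look like the following (the normalization choice is an assumption; only the threshold value comes from the project):

```python
# L2-norm of the gradient at each pixel.
grad_mag = np.sqrt(partial_x**2 + partial_y**2)

# Normalize to [0, 1], then binarize with the threshold above.
grad_mag = grad_mag / grad_mag.max()
THRESHOLD = 0.0057
edges = (grad_mag > THRESHOLD).astype(np.float64)
```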
From the image, we can tell that a lot of the edges are a little coarse and sparse, since we had to balance suppressing noise against keeping the edges visible. In other words, our result is still noisy. However, there is a workaround for this problem: Gaussian filters.
1.2: Derivative of Gaussian (DoG) Filter
A Gaussian filter is a commonly used linear smoothing filter that blurs images and reduces noise. In this case, we construct a Gaussian filter using cv2.getGaussianKernel, which returns a 1D Gaussian kernel defined by the aperture size, ksize, and the standard deviation, sigma. We can compute a 2D Gaussian kernel by taking the outer product of this 1D Gaussian with itself.
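A minimal sketch of this construction (the ksize and sigma values here are placeholders chosen to satisfy the ksize = 6 * sigma rule used later):

```python
import cv2
import numpy as np

ksize, sigma = 12, 2                        # placeholder parameters
g1d = cv2.getGaussianKernel(ksize, sigma)   # (ksize, 1) column vector, normalized to sum to 1
g2d = g1d @ g1d.T                           # outer product -> (ksize, ksize) 2D Gaussian
```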
Now that we have defined our Gaussian filter, we can smooth our cameraman image by convolving the original image with the newly created 2D Gaussian filter.
After blurring the image, we can follow the same steps as before to see what our partial derivative images and computed gradient magnitude image look like.
From the above images, we can quickly tell that the edges are much more apparent, both in the individual partial derivatives and in the gradient magnitude image computed from them. While our first approach captured many finer details, it also captured extra noise that we didn't want. In this second iteration, we smooth out those finer details, making the edge detection much cleaner. The final binarized and thresholded image looks like:
The threshold in this case was a little over half that of the previous method, showing us how blurring changes the pixel values. During this process, we needed to perform two rounds of convolution: one with the Gaussian filter and then one with each of the finite difference operators. However, what if we first find the derivative of Gaussian (DoG) filters and then apply those to the original image directly?
The DoG filter is obtained by convolving the constructed 2D gaussian filter with each of the finite difference operators we defined earlier. Through these convolutions, we get the following visualizations showing the derivative of the 2D gaussian distribution.
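The construction itself is just a couple of convolutions; roughly, reusing g2d, D_x, D_y, and img from the earlier sketches:

```python
from scipy.signal import convolve2d

# Convolve the 2D Gaussian with each finite difference operator.
dog_x = convolve2d(g2d, D_x)
dog_y = convolve2d(g2d, D_y)

# A single convolution with each DoG filter now replaces the two-step
# blur-then-differentiate process.
edges_x = convolve2d(img, dog_x, mode="same", boundary="symm")
edges_y = convolve2d(img, dog_y, mode="same", boundary="symm")
```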
With these plots, we are able to grasp the shape of the 2D Gaussian filter, including the parameters that define the distribution. Convolving our original image with these two derivative filters, we get the same result as the two-step Gaussian approach, as expected, since convolution is associative.
Part 2: Fun with Frequencies!
2.1: Image "Sharpening"
In this section, we will be performing image sharpening, an image processing technique that amplifies the high frequencies in an image. A common filter used for this operation is the unsharp mask filter, which we will derive over the course of this section. As an example, we will be using the following image of the Taj Mahal for our exploration.
To perform image sharpening, we need to extract the higher frequencies from the image. We have already defined a Gaussian filter, which filters out the higher frequencies, leaving only the lower ones. To get the high frequencies, we can simply subtract the Gaussian-blurred image from the original Taj Mahal image.
While it is really faint, after subtracting the Gaussian-blurred image from the original, we get the image in Figure 2.2. Here we can see the outline of the Taj Mahal along with other prominent details in the image. These are the features we want to amplify to sharpen the image. Our formula for doing so is $\text{sharpened} = \text{original} + \alpha \cdot (\text{original} - \text{blurred})$.
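As a small sketch of this formula (assuming img and blur are float images scaled to [0, 1]):

```python
import numpy as np

def sharpen(img, blur, alpha):
    """Amplify high frequencies: original + alpha * (original - blurred)."""
    return np.clip(img + alpha * (img - blur), 0, 1)
```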
Let's look at the sharpening for different values of $\alpha$.
Out of the alpha values chosen for the different visualizations, alpha = 3 produced the best sharpened image, since it didn't over-contrast the features while still yielding a clearer result than the original.
This process requires multiple steps to reach a sharpened image, but what if we could combine all of it into one filter to convolve with? This filter is called the unsharp mask filter. Writing out the steps from above and rearranging terms, we get its definition: $f_{\text{unsharp}} = (1 + \alpha)e - \alpha g$, where $e$ is the unit impulse and $g$ is the Gaussian kernel. Applying a single convolution over the image with this filter yields the exact results from before.
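A sketch of building and applying this combined filter, reusing g2d and img from before (the alpha value is a placeholder):

```python
import numpy as np
from scipy.signal import convolve2d

# Unit impulse e, the same size as the Gaussian kernel.
e = np.zeros_like(g2d)
e[g2d.shape[0] // 2, g2d.shape[1] // 2] = 1.0

alpha = 3
unsharp_mask = (1 + alpha) * e - alpha * g2d

# One convolution now performs the entire sharpening process.
sharpened = np.clip(convolve2d(img, unsharp_mask, mode="same", boundary="symm"), 0, 1)
```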
Now, let’s try applying this filter to other images. To start, let’s do a similar analysis for a portrait of Lebron James, one of my favorite basketball players.
Here, we see very similar results, where the sharpened images for alpha = 3 and 5 turn out the best, suggesting we would generally see the best results around those values. This portrait of Lebron is starkly different from the Taj Mahal photo, both in the background and in the subject of the image, so even a sample size of 2 can give us a good idea of the potential of the unsharp mask filter.
To evaluate our filter’s capabilities, let’s take a sharpened image, blur it using a gaussian filter, and then re-sharpen it back to the original state. First, similar to before, we can look at the sharpening abilities of the filter.
Even with an alpha value of 1, we can see a stark difference in the cleanliness and crispness of the building. In particular, we can see the reflection of the rest of London much more clearly in the windows of the building, which I thought was super cool! Below, we take the result of the sharpening, and put it through the process once again.
Here, we can see that the sharpening is able to reach the same result as before even after blurring, showing the robustness of this filter. One point to note is that this result required alpha = 2, compared to alpha = 1 in the first case. Throughout both the Lebron James and London building explorations, we used a 2D Gaussian filter with ksize = 30 and sigma = 5, following the ksize = 6 * sigma rule.
2.2: Hybrid Images
In this section, we will be tackling hybrid images: a single image that contains information from two different images at completely different frequencies. In other words, the image will look very different up close and from far away.
Our example for this section will be Derek and the cat, Nutmeg.
To create a hybrid image, we take the low frequencies of one image and average them with the high frequencies of the other image. This way, both images are captured in one image!
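A minimal sketch, assuming im_low and im_high are already aligned float images and g_low, g_high are 2D Gaussian kernels with suitably chosen cutoff sigmas:

```python
from scipy.signal import convolve2d

# Keep the low frequencies of one image and the high frequencies of the other.
low = convolve2d(im_low, g_low, mode="same", boundary="symm")
high = im_high - convolve2d(im_high, g_high, mode="same", boundary="symm")

# Average the two bands into a single hybrid image.
hybrid = (low + high) / 2
```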
Bells and Whistles: Let's look at how color can affect the hybrid image.
When the low frequency image (Derek) has color, the image becomes clearer from a distance, overshadowing the high frequency image even more than in the grayscale case. Other than that minor difference, there isn't a major difference in how color impacts the overall effect of the hybrid image. The color in the high frequency image has little to no effect from what I can tell, since that image is so faint to begin with, making its color an insignificant factor.
Looking at the frequencies involved in this process, we can see some cool observations.
We can see a clear difference between the frequencies of the original images and those of the aligned, filtered images. For the high frequency image, most of the frequencies were kept; however, the plot is visibly sparser than the original. The low frequency plot looks very different from the original, as most of the frequencies have been filtered out, meaning our low-pass filter is working well. The final hybrid is an average of both, which is harder to see directly, but the structure of the high frequency spectrum remains relatively clear.
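For reference, spectra like these are typically produced with a log-magnitude Fourier plot; a sketch of such a helper (the function name is my own):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_spectrum(gray):
    """Display the log magnitude of the 2D Fourier transform of a grayscale image."""
    spectrum = np.log(np.abs(np.fft.fftshift(np.fft.fft2(gray))) + 1e-8)  # eps avoids log(0)
    plt.imshow(spectrum, cmap="gray")
    plt.axis("off")
    plt.show()
```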
Let’s look at other hybrid image pairs!
One of my favorite sports is basketball, and as a GSW fan, I had to create a hybrid image featuring Stephen Curry. Recently, another 3pt shooter has entered the scene from the WNBA side, challenging Steph's spot: Caitlin Clark. For this reason, I wanted to create a hybrid image of the two.
Let's run the same frequency analysis for this image as we did for the example case. We see a very similar, if not identical, trend to the previous example.
Another image pair I tried was my advising professor, Professor Goldberg, with another professor in BAIR, Sergey Levine.
Another pair I tried was meant to show the transformation of Biden over time; however, there aren't many RGB headshots of Biden, so aligning the present-day RGB photo with the old grayscale photo resulted in a bad hybrid image. To resolve this issue, image processing techniques could be used to scale one image so that the relative sizes of the two, when overlaid, stay roughly the same. In short, improving the alignment step would help the hybrid image turn out better.
2.3: Gaussian and Laplacian Stacks
The remainder of this project will focus on blending two images. To start on that task, we need to do some setup to enable blending in the next section: Gaussian and Laplacian Stacks. As our example, we will be trying to blend an apple and an orange:
A Gaussian Stack is a collection of consecutive Gaussian blurs applied to the same image. Generally, this process is done alongside downsampling the image, which gives us a Gaussian Pyramid. However, we can get around downsampling by manipulating our blending mask later in the process. In other words, our Gaussian Stack will be images of the same size, successively getting more and more blurry.
A Laplacian Stack is a collection of difference images capturing what changes between consecutive levels of the Gaussian Stack. We can define each image in our Laplacian Stack as $L_i = G_i - G_{i+1}$, where $G_i$ is the $i$-th image in the Gaussian Stack. The edge case in this scenario is the last element of the Laplacian Stack, which we simply set to the last image in the Gaussian Stack.
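A sketch of both stacks (the blur parameters are placeholders, and repeatedly blurring the previous level is one common convention, assumed here):

```python
import cv2

def gaussian_stack(img, levels, ksize=31, sigma=5):
    """Repeatedly blur the previous level; every level keeps the original size."""
    stack = [img]
    for _ in range(levels - 1):
        stack.append(cv2.GaussianBlur(stack[-1], (ksize, ksize), sigma))
    return stack

def laplacian_stack(g_stack):
    """L_i = G_i - G_{i+1}; the last level is the last Gaussian image itself."""
    diffs = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    return diffs + [g_stack[-1]]
```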
The resulting Gaussian and Laplacian Stacks for the apple and orange are below. The first image in each stack is the original image (either apple or orange), and each following image is the result of applying the corresponding formula above.
2.4: Multiresolution Blending (a.k.a. the Oraple!)
Now that we have the Gaussian and Laplacian Stacks defined for each of the images, we can actually blend the two images with an external blending mask. For the apple and orange case, we want to blend the two images down the middle, so we create a binary mask with 1s on the left half for the apple and 0s on the right half for the orange.
Earlier, we mentioned that we wanted to keep our Gaussian and Laplacian Stacks simple, which meant that we lost some of the blending effects that come with downsizing images. To compensate, we can create another Gaussian Stack, this time for the binary mask we are using. The following Gaussian Stack was created using ksize = 120 and sigma = 20, again following the ksize = 6 * sigma rule.
With this Gaussian Stack, we can define a rule for each pixel in our output based on the pixels in our Laplacian Stacks and the mask. In other words, the $(i,j)$-th pixel in our output image at each level can be defined as:

$$O_{ij} = m_{ij} \cdot A_{ij} + (1 - m_{ij}) \cdot B_{ij}$$

where $m_{ij}$ is the pixel located at $(i,j)$ in the mask image from the Gaussian Stack, and $A_{ij}$ and $B_{ij}$ are the pixels located at $(i,j)$ in the corresponding images from each Laplacian Stack for the images that need to be blended. In our case, $A$ is the Laplacian Stack for the apple and $B$ is the Laplacian Stack for the orange.
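A sketch of this per-level blend (the helper name is my own; summing the blended levels reconstructs the final image because the Laplacian levels telescope back to the original):

```python
import numpy as np

def blend(l_apple, l_orange, g_mask):
    """Blend each Laplacian level with the mask Gaussian Stack, then sum the levels."""
    out = np.zeros_like(l_apple[0])
    for a, b, m in zip(l_apple, l_orange, g_mask):
        if m.ndim < a.ndim:          # broadcast a 2D mask over color channels
            m = m[..., None]
        out += m * a + (1 - m) * b   # O = m * A + (1 - m) * B per pixel
    return np.clip(out, 0, 1)
```

Using this formula across all three color channels at each level gives us the following result.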
This figure is a recreation of Figure 3.42 in Szeliski (2nd ed., p. 167), with additional intermediate levels that weren't shown there, to give a fuller picture. In the end, the final blended picture we get is the following:
Now that we have an output for the oraple, let's apply this process to other images! The first example I wanted to explore was a vertical blend, as opposed to the horizontal blend we did for the oraple. For this task, I chose a rocket ship and a fish, since the general shapes of the two are similar while the subjects have almost nothing to do with each other.
As we did before, let’s start by creating the Gaussian and Laplacian Stacks for each.
Next, let’s create our mask for blending, this time for a vertical blend, and its corresponding Gaussian Stack.
Lastly, let’s blend the images together!
The final blended image looks like:
While there is some blending between the images, the output is not as clean as expected given the results of the apple-orange blend. I tinkered with the ksize and sigma values for each of the Gaussian Stacks, but the results stayed relatively the same, so it seems the blending fell short for other reasons. One reason could be the stark differences in background and noise between the two individual images.
Let's do one last blend, this time using an irregular mask. I wanted to combine two landscapes, so I chose a city skyline and an image of scattered space dust.
Like before, let’s start by creating the Gaussian and Laplacian Stacks for each.
Next, let’s create our mask for blending, this time for an irregular mask, and its corresponding Gaussian Stack. I created the mask using photoshop, and then followed the same image processing steps afterward.
Lastly, let’s blend the images together!
The final blended image looks like:
I really love this image because of the ethereal look of the blend and how harmoniously the two images blend together.
Conclusion