Project 3: Face Morphing and Modelling a Photo Collection

Background

In this project, we will be going through the process of morphing faces together, whether that be an animation transitioning between two friends or a caricature of your face. For morphing, we need to account for the two aspects of actually changing one image into another: the shape/features of the objects and the colors of the two images. By definition, a morph is a simultaneous warp and cross-dissolve of two or more images.

Let’s start by warping our images. The purpose of the warp is to ensure the alignment of our objects in each of the images. Otherwise, when we cross-dissolve, we would end up with misalignment, which would make our morph look messy, and more generally not like a morph. For the morphing process, we will use the following example between me and the popular hip-hop artist Kanye West.

Let’s jump into the morphing process!

Part 1: Defining Correspondences

The first step of warping is defining correspondence points or landmarks that we see in both images. To reach a common shape for our objects, we would need to match the correspondences in one image to the other. To define these correspondences, I utilized the provided labeling tool to get correspondences like the following.

In the figure, we can see the original correspondence points (orange) defined for each of the images, where there is a 1-to-1 correspondence between the two images. We also see another set of blue points on each of the images. The set of blue points is the target shape: the set of correspondence points that we want to warp each of the images to. How do we determine the target set of points?

Part 2: Computing the Midway Face

Before trying to create a morph animation, let’s look at the specific case of the midway face between our two images. In the previous part, we introduced a target set of correspondence points, and here we can create a more formal definition. To reduce the amount of warping we perform, we would ideally want these targets to be somewhere between our two images. In the special case of a midway face, we can define the target set to be the average of our two sets of correspondences.

After we have our target set, we need to define our warping algorithm. The choice of algorithm here is inverse warping, since we avoid needing to splat our result and instead just interpolate values. However, to perform a warp, we need a transformation matrix that we apply to each of the pixels in the source image to get the destination image.

We can simplify this process by creating a mesh representing the shape of the object. The method of choice is Delaunay Triangulation between the sets of correspondence points we found earlier. Since we have a 1-to-1 correspondence between all three sets (image 1, image 2, and the target set), we can perform Delaunay Triangulation on the target set to get triangles for each of the sets.
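As a small illustration, here is a minimal sketch of that step using scipy.spatial.Delaunay; the .npy file names are hypothetical placeholders for however the correspondence points are stored.

```python
import numpy as np
from scipy.spatial import Delaunay

# Corresponding landmarks for the two images, shape (N, 2) each
# (the file names here are hypothetical placeholders)
pts_a = np.load("pts_a.npy")
pts_b = np.load("pts_b.npy")

# The target shape for the midway face: the average of the two point sets
pts_mid = (pts_a + pts_b) / 2.0

# Triangulate once on the target; because the point sets are in 1-to-1
# correspondence, the same simplices index all three sets consistently.
triangles = Delaunay(pts_mid).simplices  # (T, 3) array of vertex indices
```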

The figure above shows the triangulation for each of the sets we saw, overlaid on the original two images. The individual original image meshes (orange) are derived using the simplices from the target set triangulation (blue). With this step complete, we have the average shape of the two faces.

The next step is to calculate and perform the affine transform to warp each image into the shape of the target set of correspondences. One reason for doing the triangulation was to simplify this part of the process, since we can use the triangles to define specific transforms for specific sections of the image. For each triangle in the original image and the corresponding triangle in the target, we can create homogeneous point matrices containing the vertices of the triangles. The transformation from the target points to the original image points is what we are trying to find to perform an inverse warp. Another way of looking at this process is writing out the six individual equations that define the system to find the six coefficients we need to fill out the affine transform matrix. This is the method I used to calculate a transformation for each triangle in both images to the corresponding triangle in the target image.
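As a sketch of this computation, here is one way to solve the same six-equation system, via a matrix inverse rather than writing the equations out by hand:

```python
import numpy as np

def compute_affine(tri_from, tri_to):
    """Return the 3x3 matrix A such that A @ [x, y, 1]^T maps the
    vertices of tri_from onto tri_to (both are (3, 2) arrays)."""
    src = np.vstack([tri_from.T, np.ones(3)])  # 3x3, columns are homogeneous points
    dst = np.vstack([tri_to.T, np.ones(3)])
    # dst = A @ src, so A = dst @ src^{-1} (valid for non-degenerate triangles)
    return dst @ np.linalg.inv(src)

# For inverse warping, we want the map from the target triangle back to the source:
# A_inv = compute_affine(target_tri_vertices, source_tri_vertices)
```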

After finding our affine transformations, we can apply each of them to the pixels of the original image to warp it into the shape of our target. For any given triangle, we can collect all of the points contained within the triangle using the skimage.draw.polygon function. We can reformat all of those points into homogeneous coordinates and apply the inverse transformations we found earlier. However, since we are generalizing a transformation on the triangle’s vertices to all of the pixels inside it, many of the mapped coordinates will not land on integer pixel locations. To address this, we interpolate the pixel values using a nearest-neighbor approach via the scipy.interpolate.griddata function. With this interpolation, our inverse warping algorithm is complete for one triangle and can be applied to all triangles in both images.
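A minimal sketch of the per-triangle fill, assuming the A_inv matrix from above; for brevity this rounds to the nearest source pixel, which is equivalent in effect to the nearest-neighbor interpolation that scipy.interpolate.griddata provides with method="nearest".

```python
import numpy as np
from skimage.draw import polygon

def warp_triangle(src_img, dst_img, A_inv, tri_target):
    """Fill one target triangle of dst_img by inverse-warping src_img.
    tri_target is a (3, 2) array of (row, col) vertices; A_inv maps
    homogeneous target coordinates back into the source image."""
    # All pixel coordinates inside the target triangle
    rr, cc = polygon(tri_target[:, 0], tri_target[:, 1], shape=dst_img.shape[:2])
    coords = np.vstack([rr, cc, np.ones(len(rr))])

    # Mapped source locations are generally non-integer...
    src_coords = A_inv @ coords

    # ...so sample with nearest-neighbor interpolation (rounding here; the
    # write-up uses scipy.interpolate.griddata with method="nearest")
    r = np.clip(np.round(src_coords[0]).astype(int), 0, src_img.shape[0] - 1)
    c = np.clip(np.round(src_coords[1]).astype(int), 0, src_img.shape[1] - 1)
    dst_img[rr, cc] = src_img[r, c]
```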

The figure above shows the two images warped to the midway shape we calculated earlier. With the warp done, we can apply a cross-dissolve where our factor of dissolution is 0.5.
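The cross-dissolve itself is just a per-pixel weighted average of the two warped images (assuming float images in [0, 1]):

```python
# warped_a, warped_b: the two images already warped to the average shape
midway = 0.5 * warped_a + 0.5 * warped_b
```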

This is our final midway face for the example used above. One thing to notice is how different the hair of the two people is, causing those differences to come through in the midway image. In the next example, with Stephen Curry and Kobe Bryant, this issue disappears since the faces share more structural similarities. Let’s look at the step-by-step process for that case!

Let’s look at the correspondence points and the respective triangulations for each image and the target.

Let’s look at the result of the individual warping processes and then the combined mid-way image.

Here, we can see that the face has a much better morph for the midway image due to more structural similarities. We do see an issue around the clothing, but that is because no correspondence points were defined on the clothing itself. Now, let’s try creating a sequence of frames to build the morph animation!

Part 3: Morph Sequence

For the morph sequence, we can define a morph function to perform the warp and cross-dissolve operations we defined in the previous part. However, to create an animation, we need to be able to create frames for the morph at different points between the two images instead of only the midway face. To do this, we pass warp_frac and dissolve_frac as parameters to the function. The warp_frac is the point between the two image shapes that we want to warp to as our target. In the midway face case, our warp_frac was 0.5. The equation for the target point set is now:

$$target = warp\_frac \cdot pts_{img\_a} + (1 - warp\_frac) \cdot pts_{img\_b}$$

Similarly, we define our new cross-dissolve equation as well.

$$target\_rgb = dissolve\_frac \cdot rgb_{img\_a} + (1 - dissolve\_frac) \cdot rgb_{img\_b}$$

Looping over the range between 0 and 1, we can get a series of frames to show the progression of the morph. The morph animation between my face and Kanye West is located at the following link: https://drive.google.com/file/d/13H-Osrf6H39NClcGPT-IuaPy6G0qcvTp/view?usp=drive_link. The morph animation between Stephen Curry and Kobe Bryant is at the following link: https://drive.google.com/file/d/12iUOzmrAK6RKOYpjhef1IRofA6WRccHi/view?usp=drive_link.
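Putting the two equations together, a per-frame morph might look like the following sketch; warp_image is a hypothetical helper wrapping the per-triangle warp from Part 2, and the frame count is arbitrary.

```python
import numpy as np

def morph(im_a, im_b, pts_a, pts_b, triangles, warp_frac, dissolve_frac):
    """One frame: warp both images to an intermediate shape, then blend."""
    target = warp_frac * pts_a + (1 - warp_frac) * pts_b
    warped_a = warp_image(im_a, pts_a, target, triangles)  # hypothetical helper
    warped_b = warp_image(im_b, pts_b, target, triangles)
    return dissolve_frac * warped_a + (1 - dissolve_frac) * warped_b

# Sweeping both fractions from 1 to 0 moves the frames from image A to image B
frames = [morph(im_a, im_b, pts_a, pts_b, triangles, t, t)
          for t in np.linspace(1, 0, 45)]
```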

There was an issue with adding the animations to the website, along with the remaining GIFs and videos that follow on this page.

Part 4: Population Mean Face

To create a caricature in the following part, we need an average face of a population or a subsample of one. In this part, we will derive the average face of the Danes dataset. Each image in the Danes dataset is a headshot of a Danish person standing in front of a green background, with corresponding key points similar to those we defined in Part 1 of this project.

To compute the average face, we need to warp all of the faces in our dataset to align with some average shape specific to that population. In our case, calculating the average shape of a face in the dataset is as simple as averaging across all of the key points that we want to include in our population. With this calculation, we can run a similar Delaunay triangulation on the target shape to get the following comparison.
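Concretely, with the dataset’s key points stacked into one array, the average shape is a single mean (the array layout here is an assumption):

```python
import numpy as np

# all_pts: (M, N, 2) array of N landmarks for each of the M Danes images
avg_shape = all_pts.mean(axis=0)  # (N, 2) average face shape
```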

We can notice that the images are much wider, so many of our triangles are correspondingly elongated. Using our predefined functions from before, we can warp all of the images in our dataset to the target shape we calculated. Below are a few examples showing the warping.

With our warping applied to each image, we can take an average across all of the images to get our average Dane!
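A sketch of that final averaging step, reusing the hypothetical warp_image helper from Part 3:

```python
import numpy as np

# Warp every face to the shared average shape, then average pixel-wise
warped = [warp_image(im, pts, avg_shape, triangles)
          for im, pts in zip(images, all_pts)]
mean_face = np.mean(warped, axis=0)
```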

With this population’s average face, we can do quite a few cool things. One output in particular is taking my image from before and seeing how it looks after being warped to the average Dane face, and vice versa.

On the left, we see my face warped to match the shape of the average Dane, and on the right, the average Dane face warped to the shape of mine. Due to the differences in image sizes and other structural differences, we do see some error in the warping, but we can clearly see a caricature-type style in these resulting images.

Part 5: Caricature

Taking the operation we performed at the end of the last part to the next level, we can create a caricature of our face. A caricature is an exaggerated picture of someone or something, where certain features are portrayed extremely. We can use the average Dane face and extract the “essence” of the average by subtracting the transformed image of my face from above from my original image. This essentially extracts the warping we are performing in the form of a matrix, which we can scale and apply. Performing this process with SCALE = 1 (without normalization) gives us a caricature like the following.
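A minimal sketch of the version described above, assuming float images in [0, 1]; me_to_avg is my face warped to the average Dane shape, and the clip stands in for the normalization the write-up skipped.

```python
import numpy as np

SCALE = 1.0  # larger values exaggerate the differences more

# The "essence": what the warp to the average shape changed about my face
essence = me - me_to_avg

# Scale the difference and add it back to exaggerate those features
# (the result shown above was produced without normalization)
caricature = np.clip(me + SCALE * essence, 0.0, 1.0)
```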

Let’s do some other cool things with this technique!

Part 6: Bells & Whistles

My first bell & whistle is changing the gender and race of my friend Kailash. The average population face I chose is for South Asian women.

There were three main variations here. The first changes only the shape of my friend’s face by warping it to the face shape defined for the population’s average face. The second changes only the appearance of my friend’s face by warping the average face of the population to my friend’s face shape and then applying a cross-dissolve. The final image is the midway image between the two images above!

The morph sequence for the images above can be found at: https://drive.google.com/file/d/1fQe3q-Ti7li1OSpv4W6nDhYvUHu5-Izv/view?usp=drive_link.

Another exploration I did was changing my own image’s race using the Danish average face I found in Part 4.

For my last bells and whistles, I created a morph sequence of a few of my friends, including Kailash above. My friends’ names are Advay, Anirudh, and Sasvath, in the order they are morphed after Kailash. To create this sequence, I applied three different morphs between pairs of images and then combined the sequences to get the final one. The GIF can be found at this link: https://drive.google.com/file/d/1Xh95lsMzIZomZrvmUXIA2vnkqCmmAQ51/view?usp=drive_link. The video can be found at this link: https://drive.google.com/file/d/12enmcHTR6QeUXpmIFdIAU7bIt0Ir37ti/view?usp=drive_link.

After creating a morph sequence, I was curious about the space of eigenfaces, since I have seen similar work in the past but never understood what it meant or where it came from. Using the Danes dataset, I was able to derive the eigenfaces of the collection, using the first 10 principal components as my dimensionality reduction. The eigenfaces are displayed below.
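One way to compute such eigenfaces is with scikit-learn’s PCA, as in the sketch below; the write-up’s exact implementation may differ.

```python
import numpy as np
from sklearn.decomposition import PCA

# faces: (M, H, W) grayscale dataset, flattened to (M, H*W) rows for PCA
X = faces.reshape(len(faces), -1)

pca = PCA(n_components=10)     # keep the first 10 principal components
codes = pca.fit_transform(X)   # each face projected into the eigenbasis

# Each principal component reshapes back into an image: an eigenface
eigenfaces = pca.components_.reshape(10, *faces.shape[1:])

# Low-dimensional reconstruction of the dataset from the 10 codes
reconstructed = pca.inverse_transform(codes).reshape(faces.shape)
```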

Here, the darker regions indicate higher-variance values, meaning those portions of the image capture the most information across the dataset. We can see a lot of differences between the eigenfaces, showing how the first 10 principal components capture a significant part of the variance in our dataset.

Projecting our original dataset onto this new eigenbasis was a major issue for me. While trying to convert back to RGB from the grayscale dataset needed for PCA, I hit issues with the scale of values causing some of the images to be inconsistent. Here is a small sample of images, including both images where reconstruction worked and some where it didn’t (these are the first 5 images in the dataset).

I tried a caricature with the first image in the dataset and was able to get very comparable results for both the projected and original images. The top comparison is at SCALE = 1, and the scale increases as you go down (SCALE = 1.2 and then SCALE = 1.5).

This essentially means we captured the necessary variance in the eigenbasis to be able to construct a very similar transformation in both spaces. While we lose information and introduce noise/artifacts into our images, the images in the eigenspace are still comparable to the results we see with the original images. However, we can see that the two transformations start to deviate from each other as we increase our SCALE parameter.

Overall, I really loved looking at the different aspects of morphing, even though the warping section took me much longer than the others. Trying out different bells and whistles was definitely my favorite part, especially the morph sequence. Thank you for taking a look at my project! If any of the links don’t work, please use the following link to gain access to all of the visuals: https://drive.google.com/file/d/1cX_kTCr2pg9m2plVX-wOEPAP_zt3EOky/view?usp=drive_link