Chapter 15 Singular Value Decomposition

15.2 SVD in R

\[\newcommand{\Mtx}[1]{\mathbf{#1}}\]

If \(\Mtx{A}\) is an \(n \times p\) matrix, and the singular value decomposition of \(\Mtx{A}\) is given by \(\Mtx{A} = \Mtx{U} \Mtx{S} \Mtx{V}^T\), then the columns of the matrix \(\Mtx{V}\) are the eigenvectors of the square matrix \(\Mtx{A}^T \Mtx{A}\) (sometimes referred to as the minor product of \(\Mtx{A}\)). The singular values of \(\Mtx{A}\) are equal to the square roots of the eigenvalues of \(\Mtx{A}^T \Mtx{A}\).

The svd() function computes the singular value decomposition of an arbitrary rectangular matrix. Below I demonstrate the use of the svd() function and confirm the relationships described above:
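Since the original example matrix isn't shown here, the chunk below is a minimal sketch using a small random matrix of my own; any rectangular matrix would do.

```r
set.seed(20)
A <- matrix(rnorm(15), nrow = 5, ncol = 3)  # a small example matrix

A.svd <- svd(A)
A.svd$d  # singular values
A.svd$u  # left singular vectors (columns)
A.svd$v  # right singular vectors (columns)
```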

We should be able to reconstruct \(\Mtx{A}\) from the SVD:
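A sketch of the reconstruction, using the A.svd object created above:

```r
A.recon <- A.svd$u %*% diag(A.svd$d) %*% t(A.svd$v)
all.equal(A, A.recon)  # TRUE, up to floating point error
```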

We can also demonstrate the relationship between the SVD of \(\Mtx{A}\) and the eigendecomposition of \(\Mtx{A}^T\Mtx{A}\):
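Something like the following confirms both relationships (note that eigenvectors are only determined up to sign, so corresponding columns may be flipped):

```r
AtA.eigen <- eigen(t(A) %*% A)

# singular values of A = square roots of the eigenvalues of t(A) %*% A
all.equal(A.svd$d, sqrt(AtA.eigen$values))

# columns of V = eigenvectors of t(A) %*% A (up to sign)
all.equal(abs(A.svd$v), abs(AtA.eigen$vectors))
```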

As we discussed in lecture, the eigenvectors of a square matrix, \(\Mtx{A}\), point in the directions that are unchanged by the transformation specified by \(\Mtx{A}\).

15.3 Writing our own PCA function using SVD

In lecture we discussed the relationship between SVD and PCA. Let’s walk through some code that carries out PCA via SVD, and then we’ll implement our own PCA function.
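Here is a sketch of those calculations, assuming the iris measurements (consistent with the biplots later in this chapter): mean-center the data, take the SVD, and recover the scores, loadings, and PC standard deviations.

```r
X <- as.matrix(iris[, 1:4])
X.ctr <- scale(X, center = TRUE, scale = FALSE)  # mean-center each column
n <- nrow(X.ctr)

X.svd <- svd(X.ctr)

pc.scores   <- X.svd$u %*% diag(X.svd$d)  # principal component scores
pc.loadings <- X.svd$v                    # loadings (eigenvectors of cov(X))
pc.sdev     <- X.svd$d / sqrt(n - 1)      # standard deviations of the PCs
```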

For comparison, here’s what the built-in prcomp() function gives us:
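Assuming the same iris data as above:

```r
iris.prcomp <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)

iris.prcomp$sdev      # should match pc.sdev
head(iris.prcomp$x)   # scores; should match pc.scores (up to sign)
iris.prcomp$rotation  # loadings; should match pc.loadings (up to sign)
```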

Now that we have a sense of the key calculations, let’s turn this into a function.
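Below is a minimal sketch of such a function; the name PCA and the exact return structure are my own choices, and the version developed in class may differ.

```r
PCA <- function(X, center = TRUE, scale. = FALSE) {
  X <- scale(as.matrix(X), center = center, scale = scale.)
  n <- nrow(X)
  X.svd <- svd(X)

  # if any singular value is effectively zero, the covariance matrix is singular
  tol <- max(dim(X)) * max(X.svd$d) * .Machine$double.eps
  if (any(X.svd$d < tol)) {
    warning("covariance matrix is singular: one or more singular values are ~0")
  }

  list(sdev     = X.svd$d / sqrt(n - 1),      # PC standard deviations
       rotation = X.svd$v,                    # loadings
       x        = X.svd$u %*% diag(X.svd$d),  # scores
       center   = attr(X, "scaled:center"),
       scale    = attr(X, "scaled:scale"))
}
```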

Note that I also included some code to warn the user when the covariance matrix is singular. Use the help system (e.g. ?.Machine) to read about the variables defined in .Machine.

Let’s put our function through its paces:
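For example, applied to the iris measurements:

```r
iris.pca <- PCA(iris[, 1:4])

iris.pca$sdev
head(iris.pca$x)

# agreement with prcomp (individual PCs may differ in sign)
all.equal(iris.pca$sdev, iris.prcomp$sdev)
```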

To bring things full circle, let’s make sure that the covariance matrix we reconstruct from our PCA analysis is equal to the covariance matrix calculated directly from the data set:
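The covariance matrix can be rebuilt from the loadings and the PC variances as \(\Mtx{V}\,\mathrm{diag}(s^2)\,\Mtx{V}^T\); a sketch using the objects above:

```r
cov.recon  <- iris.pca$rotation %*% diag(iris.pca$sdev^2) %*% t(iris.pca$rotation)
cov.direct <- cov(iris[, 1:4])

all.equal(cov.recon, cov.direct, check.attributes = FALSE)  # TRUE
```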

Great! It seems like things are working as expected.

15.4 Writing a biplot function

We can use the same basic framework above to create a function to compute a biplot for our PCA. The function below includes an argument emphasize.variables. If emphasize.variables is TRUE, the function returns the biplot corresponding to \(\alpha=0\), the column-metric preserving biplot. If this argument is FALSE, it returns the calculations corresponding to \(\alpha=1\), the row-metric preserving biplot.
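Here is one possible sketch of such a function; the name pca.biplot and the plotting details are my own choices, but the scalings follow the standard biplot definitions (observations drawn from \(\Mtx{U}\Mtx{S}^\alpha\), variables from \(\Mtx{V}\Mtx{S}^{1-\alpha}\)).

```r
pca.biplot <- function(X, emphasize.variables = TRUE) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  X.svd <- svd(X)
  U <- X.svd$u; D <- X.svd$d; V <- X.svd$v

  if (emphasize.variables) {
    obs  <- U               # alpha = 0: column-metric preserving
    vars <- V %*% diag(D)
  } else {
    obs  <- U %*% diag(D)   # alpha = 1: row-metric preserving
    vars <- V
  }

  # observations as points
  plot(obs[, 1], obs[, 2], pch = 16, col = "gray40", xlab = "PC1", ylab = "PC2")

  # variables as arrows, drawn on their own set of axes
  par(new = TRUE)
  xlim <- range(c(0, vars[, 1])) * 1.2
  ylim <- range(c(0, vars[, 2])) * 1.2
  plot(0, 0, type = "n", axes = FALSE, xlab = "", ylab = "",
       xlim = xlim, ylim = ylim)
  arrows(0, 0, vars[, 1], vars[, 2], length = 0.1, col = "red")
  text(vars[, 1] * 1.05, vars[, 2] * 1.05, labels = colnames(X), col = "red")
  axis(3, col = "red"); axis(4, col = "red")

  invisible(list(observations = obs, variables = vars))
}
```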

Let’s first create a biplot that emphasizes the relationships between the variables.
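For example, with the iris measurements:

```r
pca.biplot(iris[, 1:4], emphasize.variables = TRUE)
```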

The biplot we just created emphasizes information about the variables in our data, and their relationship to each other and to the principal components. Notice that the angle between Petal Width and Petal Length is very small, indicating that they are highly correlated, while Sepal Width is nearly uncorrelated with those two variables. Petal Width and Petal Length are also approximately parallel to PC1, indicating that they have large loadings on the first PC. The Petal Length vector is the longest, indicating that it has the largest standard deviation. This biplot, however, distorts the relationships among the objects (observations): the scatter of scores in this biplot is not an optimal representation of the true distances between the observations in the original space.

Now we create a biplot that emphasizes the relationships between the objects.
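Using the same sketch function:

```r
pca.biplot(iris[, 1:4], emphasize.variables = FALSE)
```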

In this view, the points representing the weighted scores are a better representation of the relationships among the observations, but the geometry of the vectors representing the associations among the variables is no longer optimal.

15.5 Data compression and noise filtering using SVD

Two common uses for singular value decomposition are data compression and noise filtering. We’ll illustrate these with two examples involving matrices that represent image data. This example is drawn from a tutorial article about SVD by David Austin on the American Mathematical Society website.

15.5.1 Example image

Download the file zeros.dat from the course wiki. This is a \(25 \times 15\) binary matrix that represents pixel values in a simple binary (black-and-white) image.
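A sketch of reading and displaying the matrix, assuming zeros.dat is a whitespace-delimited text file in your working directory:

```r
zero.mtx <- as.matrix(read.table("zeros.dat"))
dim(zero.mtx)  # 25 x 15

# display the matrix as an image; the row reversal and transpose
# orient it the same way the matrix is printed
image(t(zero.mtx[nrow(zero.mtx):1, ]), col = c("white", "black"), axes = FALSE)
```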

This data matrix can be thought of as being composed of just three types of vectors, each illustrated in the following figure:

The three vector types in the “zero” matrix

If SVD is working as expected, it should capture this feature of our input matrix, and we should be able to represent the entire image using just three singular values and their associated left- and right-singular vectors.
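A sketch of checking this, using the zero.mtx object read in above:

```r
zero.svd <- svd(zero.mtx)
round(zero.svd$d, 4)  # only the first three singular values are non-zero

# rank-3 reconstruction from the first three singular triplets
zero.rank3 <- zero.svd$u[, 1:3] %*% diag(zero.svd$d[1:3]) %*% t(zero.svd$v[, 1:3])
all.equal(zero.mtx, zero.rank3, check.attributes = FALSE)  # TRUE
```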

Our original matrix required \(25 \times 15\) (\(= 375\)) storage elements. Using the SVD we can represent the same data using only \(15 \times 3 + 25 \times 3 + 3 = 123\) units of storage (corresponding to the truncated U, V, and D in the example above). Thus the SVD allows us to represent the same data in less than \(1/3\) of the space required by the original matrix. In this case, because all the singular values after the third are zero, this is a lossless data compression procedure.

The file noisy-zero.dat is the same ‘zero’ image, but now sprinkled with Gaussian noise drawn from a normal distribution \(N(0,0.1)\). As in the data compression case, we can use SVD to approximate the input matrix with a lower-dimensional approximation. Here the SVD is “lossy”, as our approximation throws away information. In this case we hope to choose the approximating dimension such that the information we lose corresponds to the noise which is “polluting” our data.

Now we carry out SVD of the matrix representing this noisy image:
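A sketch, assuming noisy-zero.dat has the same layout as zeros.dat:

```r
noisy.mtx <- as.matrix(read.table("noisy-zero.dat"))

noisy.svd <- svd(noisy.mtx)
round(noisy.svd$d, 3)  # the first three values dominate; the rest reflect the noise
```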

Finally we plot the approximation of this image based on the first three singular vectors/values:
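One way to do this, reusing the plotting approach above:

```r
noisy.rank3 <- noisy.svd$u[, 1:3] %*% diag(noisy.svd$d[1:3]) %*% t(noisy.svd$v[, 1:3])

# noisy image and its rank-3 approximation side by side
par(mfrow = c(1, 2))
gray.scale <- gray(seq(1, 0, length.out = 64))
image(t(noisy.mtx[nrow(noisy.mtx):1, ]), col = gray.scale, axes = FALSE,
      main = "Noisy image")
image(t(noisy.rank3[nrow(noisy.rank3):1, ]), col = gray.scale, axes = FALSE,
      main = "Rank-3 approximation")
par(mfrow = c(1, 1))
```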

As you can see from the images you created, the approximation based on the SVD captures the major features of the matrix and filters out much (but not all) of the noise.

15.6 Image compression using SVD in R

We’ll illustrate another application of SVD – approximating a high dimensional data set using a lower dimensional approximation. We’ll apply this concept to image compression.

R doesn’t have native support for common image files like JPEG and PNG. However, there are packages we can install that will allow us to read in such files and treat them as matrices. We’ll use a package called “imager” to work with images.

Install the “imager” package via the standard installation mechanism:
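```r
install.packages("imager")
```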

Once imager is installed, download this picture of my dog Chester to your local filesystem and load it as follows:
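For example, assuming the downloaded file is saved as chester.jpg in your working directory (adjust the path to wherever you saved it):

```r
library(imager)
chester <- load.image("chester.jpg")
```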

We can plot images loaded by imager:
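```r
plot(chester)
```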

Let’s explore the object returned by imager:
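For instance:

```r
class(chester)  # "cimg"
dim(chester)    # width, height, depth (frames), colour channels
```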

imager stores the image information in a multidimensional array that has a class of “cimg” (type ?cimg). To manipulate this information using SVD we’ll need to turn it into a conventional matrix first.
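A sketch of one way to do this: convert to grayscale, then drop the length-1 dimensions to get an ordinary numeric matrix.

```r
chester.gray <- grayscale(chester)             # collapse the colour channels
chester.mtx  <- drop(as.array(chester.gray))   # drop the frame/channel dimensions
dim(chester.mtx)
```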

Now we’ll use SVD to create a low-dimensional approximation of this image.
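A sketch of a rank-15 approximation, reusing the truncation idea from the previous section:

```r
chester.svd <- svd(chester.mtx)

k <- 15  # rank of the approximation
chester.rank15 <- chester.svd$u[, 1:k] %*% diag(chester.svd$d[1:k]) %*%
                  t(chester.svd$v[, 1:k])

# clamp pixel values to [0, 1] and convert back to a cimg object for plotting
plot(as.cimg(pmin(pmax(chester.rank15, 0), 1)))
```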

Above we created a rank 15 approximation to the rank 556 original image matrix. This approximation is crude (as judged by the visual quality of the approximating image) but it does represent a very large savings in space. Our original image required the storage of \(605 \times 556 = 336380\) integer values. Our approximation requires the storage of only \(15 \times 556 + 15 \times 605 + 15 = 17430\) integers. This is a saving of roughly 95%. Of course, as with any lossy compression algorithm, you need to decide what is the appropriate tradeoff between compression and data loss for your given application.

Finally, let’s look at the “error term” associated with our approximation, i.e. what we did not capture in the 15 singular vectors.
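A sketch of visualizing the residual matrix, rescaled to [0, 1] for display:

```r
chester.error <- chester.mtx - chester.rank15

err.scaled <- (chester.error - min(chester.error)) / diff(range(chester.error))
plot(as.cimg(err.scaled))
```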