~/Principal Component Analysis Pt. 1

Brandon Rozek

PhD Student @ RPI, Writer of Tidbits, and Linux Enthusiast

What is PCA?

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of linearly uncorrelated variables called principal components.

The number of distinct principal components equals $\min(\text{number of variables}, \text{number of observations} - 1)$.
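
To see where that bound comes from, here is a minimal sketch using made-up random data with more variables than observations. The matrix `x` and the `1e-8` cutoff are assumptions chosen for illustration; `prcomp` is base R's built-in PCA.

```r
set.seed(1)
# Hypothetical data: 5 observations of 8 variables
x <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8)

pca <- prcomp(x)

# Count the components whose standard deviation is not numerically zero
n_components <- sum(pca$sdev > 1e-8)

# The bound from the text: min(# variables, # observations - 1)
max_components <- min(ncol(x), nrow(x) - 1)

print(c(found = n_components, bound = max_components))
```

Centering the data uses up one degree of freedom, which is why the bound is the number of observations minus one rather than the number of observations.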

The transformation is defined in such a way that the first principal component accounts for the largest possible share of the variance in the data.

Each succeeding component has the highest possible variance under the constraint of having to be orthogonal to the preceding components.
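
Both properties can be checked numerically. The sketch below assumes the built-in `iris` measurements as example data and uses base R's `prcomp`; it is only an illustration, not part of the procedure itself.

```r
# Example data (an assumption): the four numeric columns of iris
x <- as.matrix(iris[, 1:4])
pca <- prcomp(x)

# Variance captured by each component, in the order prcomp returns them
variances <- pca$sdev^2

# t(V) %*% V should be the identity if the components are orthonormal
gram <- crossprod(pca$rotation)

# The transformation only rotates the data, so total variance is preserved
total_before <- sum(diag(cov(x)))
total_after <- sum(variances)

print(variances)
print(round(gram, 10))
```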

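PCA is sensitive to the relative scaling of the original variables. A variable measured in larger units has a larger variance and will dominate the leading components unless the variables are standardized first.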
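
A small sketch of that sensitivity, assuming the `iris` measurements as example data: multiplying one column by 1000 (as a change of units would) is enough to make it dominate the first component. Standardizing with `prcomp`'s `scale. = TRUE` argument removes the effect.

```r
x <- as.matrix(iris[, 1:4])

# Pretend Sepal.Width was recorded in units 1000 times smaller
x_rescaled <- x
x_rescaled[, "Sepal.Width"] <- x_rescaled[, "Sepal.Width"] * 1000

raw <- prcomp(x)
rescaled <- prcomp(x_rescaled)

# Variable with the largest absolute weight in the first component
dominant_raw <- names(which.max(abs(raw$rotation[, 1])))
dominant_rescaled <- names(which.max(abs(rescaled$rotation[, 1])))

# With standardized variables the change of units no longer matters
std_a <- prcomp(x, scale. = TRUE)
std_b <- prcomp(x_rescaled, scale. = TRUE)

print(c(raw = dominant_raw, rescaled = dominant_rescaled))
```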

Results of a PCA

Results are discussed in terms of component scores, which are the values of the transformed variables, and loadings, which are the weights by which each original variable is multiplied to get the component score.
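
The relationship between the two can be written out directly: the scores are the centered data multiplied by the loadings. The sketch below assumes `iris` as example data and takes the loadings from an eigendecomposition of the covariance matrix.

```r
x <- as.matrix(iris[, 1:4])

# Center each column; scale = FALSE keeps the original variances
centered <- scale(x, center = TRUE, scale = FALSE)

decomp <- eigen(cov(centered))
loadings <- decomp$vectors   # one column of weights per component

# Each score is a weighted sum of the centered original variables
scores <- centered %*% loadings

# The variance of each score column matches the corresponding eigenvalue,
# and the score columns are uncorrelated with one another
score_variances <- apply(scores, 2, var)
score_cov <- cov(scores)

print(head(scores, 3))
```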

Assumptions of PCA

  1. Linearity
  2. Large variances are important and small variances denote noise
  3. Principal components are orthogonal

Why perform PCA?

Computing PCA

  1. Subtract off the mean of each measurement type
  2. Compute the covariance matrix
  3. Take the eigenvalues and eigenvectors of the covariance matrix

R Code

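pcal = function(data) {
  # Subtract each column's mean; scale = FALSE leaves the variances untouched
  centered_data = scale(data, center = TRUE, scale = FALSE)
  covariance = cov(centered_data)
  eigen_stuff = eigen(covariance)
  # Order the components by decreasing eigenvalue (variance explained)
  sorted_indices = sort(eigen_stuff$values, 
                        index.return = TRUE, 
                        decreasing = TRUE)$ix
  # The eigenvalues are the variances of the principal components
  variances = eigen_stuff$values[sorted_indices]
  # Eigenvectors are stored as columns, so reorder columns rather than rows;
  # each column holds the loadings of one component
  components = eigen_stuff$vectors[, sorted_indices]
  combined_list = list(variances, components)
  names(combined_list) = c("Variances", "Components")
  return(combined_list)
}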
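
One way to sanity-check a hand-rolled computation like the one above is to compare it with base R's `prcomp`. The sketch below assumes `iris` as example data and repeats the eigendecomposition inline so that it runs on its own. Eigenvectors are only defined up to sign, so the comparison uses absolute values.

```r
x <- as.matrix(iris[, 1:4])

# Eigendecomposition of the covariance matrix (cov() centers internally)
decomp <- eigen(cov(x))

# Reference result from the built-in implementation
reference <- prcomp(x)

# Eigenvalues should equal the squared standard deviations from prcomp
same_variances <- isTRUE(all.equal(decomp$values, reference$sdev^2))

# Eigenvectors should match prcomp's rotation matrix up to a sign flip
same_directions <- isTRUE(all.equal(abs(decomp$vectors),
                                    abs(unname(reference$rotation))))

print(c(variances = same_variances, directions = same_directions))
```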