Introduction to Dimensionality Reduction

Brandon Rozek

PhD Student @ RPI, Writer of Tidbits, and Linux Enthusiast

Motivations

We all have problems to solve, but the data at our disposal may be too sparse, or have so many features, that solving the problem becomes computationally difficult or even impossible.

Types of Problems

Prediction: Taking some input and trying to predict an output from it. For example, given a set of labeled pictures of people, the computer predicts who is in the next picture taken (face or object recognition).

Structure Discovery: Finding an alternative representation of the data. This is usually used to find groups or alternate visualizations.

Density Estimation: Finding the best model that describes the data. An example is explaining the price of a home in terms of several factors.

Advantages

Disadvantages

Data is lost through this method, so potentially insightful information may be removed. Features produced by dimensionality reduction are also typically harder to interpret, which can lead to more confusing models.
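
To make the data-loss point concrete, here is a minimal sketch in plain Python. It is not a method from this post: the points, the hand-picked direction (1, 1), and all function names are assumptions chosen for illustration. A real technique such as PCA would choose the direction from the data. The sketch reduces 2-D points to a single coordinate, rebuilds them, and measures how much was lost.

```python
import math

def project_onto_direction(points, direction):
    """Reduce 2-D points to 1-D coordinates along a (normalized) direction."""
    norm = math.hypot(direction[0], direction[1])
    unit = (direction[0] / norm, direction[1] / norm)
    coords = [x * unit[0] + y * unit[1] for x, y in points]
    return coords, unit

def reconstruct(coords, unit):
    """Map the 1-D coordinates back into 2-D along the same direction."""
    return [(c * unit[0], c * unit[1]) for c in coords]

def mean_squared_error(original, rebuilt):
    """Average squared distance between original and reconstructed points."""
    total = sum((x - rx) ** 2 + (y - ry) ** 2
                for (x, y), (rx, ry) in zip(original, rebuilt))
    return total / len(original)

# Hypothetical data: three points lie on the line y = x, one does not.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (2.0, 0.0)]

# Assumed direction (1, 1); chosen by hand, not learned from the data.
coords, unit = project_onto_direction(points, (1.0, 1.0))
rebuilt = reconstruct(coords, unit)
error = mean_squared_error(points, rebuilt)

print("1-D coordinates:", coords)
print("reconstruction error:", error)
```

The three points on the line y = x are recovered almost exactly, while (2, 0) comes back as roughly (1, 1): its distance from the line is the information that was thrown away. The single remaining feature is also a mix of both original features, which hints at why reduced features are harder to interpret.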