
PCA? More like PC-YAY! - Principal Component Analysis and Equity Research

Our previous blog post dove into how GIS can be used as a tool to illustrate spatial patterns of inequity. Today, we are going to break down how a statistical technique known as a Principal Component Analysis (PCA) can help us interpret and summarize those patterns.

To begin, what exactly is a PCA? As with most statistical processes, a quick Google search for “Principal Component Analysis” may leave you with more questions than answers. To attempt to explain in plain English, here is our take: a PCA helps make sense of multiple and possibly correlated variables. It does this by:

  1. grouping the data into “components” that have similar influence on the overall differences within the dataset; and

  2. weighting the data components based on their overall influence over the spread of the entire dataset.

Source: Analytics Vidhya, 2016

In slightly more detail, the PCA identifies variables that have similar, overlapping or correlated influence within the dataset (like the two “directions” in the above image). Variables that are similar in “shape” and “direction” are grouped into a component. Each component’s weight is defined by its influence on the overall variance (i.e., the percentage of the overall variation in the dataset that this grouping explains). Groupings or individual variables found to have negligible influence on variance can either be dropped or kept, in which case they contribute little to the final output value. These groupings and their weights are then used to reduce all of the variables to a single composite value for each observation.
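The steps described above can be sketched in a few lines of Python. This is a minimal illustration using NumPy, not the exact procedure behind our index. The function name `pca_composite`, the synthetic data, and the choice to combine component scores using their share of explained variance are all assumptions made for the example.

```python
import numpy as np

def pca_composite(X):
    """Return (explained_ratio, index) for an areas-by-indicators array X."""
    # Standardize each indicator so that units do not drive the result.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decompose the correlation matrix of the indicators.
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    # eigh returns eigenvalues in ascending order; put the largest first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Share of the total variance explained by each component.
    explained_ratio = eigvals / eigvals.sum()
    # Component scores for every row (for example, every neighbourhood).
    scores = Z @ eigvecs
    # Assumption: the composite is the variance-weighted sum of the scores.
    index = scores @ explained_ratio
    return explained_ratio, index

# Hypothetical data: 200 areas, 4 indicators, the first two strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.1 * rng.normal(size=200),
    base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])

explained_ratio, index = pca_composite(X)
print(np.round(explained_ratio, 3))
```

One caveat with a sketch like this: the sign of each component is arbitrary, so in practice each component would need to be oriented (so that a higher score consistently means more inequity) before the scores are combined.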

The result? The reduction of a large set of variables to a smaller one that still contains most of the information in the original set, something that would be impossible for the human brain to distill and interpret.

This method is especially useful where there is no known hypothesis or functional model available to describe a subjectively observed phenomenon. It allows for maximal input information, with a result customized to the data patterns present. In English: PCA is perfect for equity analysis because (as with most things in social planning) there is no generally agreed-upon model of what makes a place inequitable!

For example, a group of well-meaning stakeholders could brainstorm a long list of equity-related variables or factors. It would be impossible to suggest that overcrowding is more or less of an inequity concern than absence of greenspace without introducing individual bias. Other than equally weighting all indicators, which would not account for cross-correlated variables, a new composite model cannot generally be completed without introducing subjectivity and value-based considerations for the indicators. Additionally, while the selected indicators may have been used by other equity studies, they may not be responsible for variance in the specific geography the study in question is investigating.
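To see why equal weighting does not account for cross-correlated variables, consider a small illustration. The indicator names and the synthetic data below are hypothetical. Two of the three indicators are near-duplicates of each other, and the sketch checks how strongly each indicator is correlated with an equally weighted composite.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical indicators: two near-duplicates and one independent factor.
crowding = rng.normal(size=n)
crowding_alt = crowding + 0.05 * rng.normal(size=n)
greenspace = rng.normal(size=n)

X = np.column_stack([crowding, crowding_alt, greenspace])
# Standardize so that every indicator starts on the same scale.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Equal weighting: a simple average of the standardized indicators.
equal_weight = Z.mean(axis=1)

# Correlation of each indicator with the equally weighted composite.
corr_with_index = [np.corrcoef(Z[:, j], equal_weight)[0, 1] for j in range(3)]
print(np.round(corr_with_index, 2))
```

Because the overcrowding signal is effectively counted twice, it dominates the equally weighted composite, while the independent greenspace indicator has much less influence. This is the double counting that a correlation-aware method such as PCA is meant to address.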

To avoid introducing additional bias, and to avoid prioritizing one target group over another, a data-agnostic approach such as PCA is sound: it reduces the universe of indicators to a set that best describes the variations in the data without placing a value judgement on any one indicator. In our subject matter, the dataset resulting from the PCA is referred to as an Inequity Index. It highlights where several independent and unrelated factors contributing to inequity occur in the same time and place, and it serves as a focusing tool for deeper analysis. High volumes of individual indicators are an overwhelming starting point for analysis and discussion (what’s more important: access to transit or family income?). A single composite value provides a quantitative arrow pointing at where to start looking closer.

To put it simply, we can go from this:

To this:

All without introducing subjectivity in indicator weighting, and while accounting for cross-correlation!

PCA is an excellent tool that, when applied to Equity, allows us to distill information from a wide range of sources in a statistically sound way. GIS and statistics still don’t speak for all lived experience, and community consultation remains the foundational principle of Equity research. Nevertheless, GIS and the PCA allow us to explore inequity in our region and to guide future research questions, outreach and funding. Once we have an idea of the where, we can get into the why and the how, in the hope of pushing towards just and equitable outcomes for all.
