
Filtering observations within a PCA

This tutorial will help you implement a Principal Component Analysis (PCA) on a subset of particular observations.

Dataset for running a Principal Component Analysis

The data come from the US Census Bureau and describe the changes in the population of 51 states (the 50 states plus the District of Columbia) between 2000 and 2001. The initial dataset has been transformed to rates per 1000 inhabitants, and the analysis focuses on the data for 2001. We are interested in the large states, defined here as those with a population greater than the average population. A size variable (large/small) has been added to the dataset to mark them.
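As a rough illustration of how such a size variable can be derived, the sketch below labels each state as large or small by comparing its population with the average. The state names and populations are hypothetical placeholders, not the Census figures used in this tutorial.

```python
from statistics import mean

# Hypothetical populations (in thousands); NOT the tutorial's Census data.
populations = {"A": 500, "B": 1200, "C": 8000, "D": 20000}

# The threshold is the average population over all states.
average = mean(list(populations.values()))

# A state is "large" when its population is strictly above the average.
size = {
    state: "large" if pop > average else "small"
    for state, pop in populations.items()
}

print(average)  # 7425
print(size)
```

Note that an average-based threshold is pulled upward by a few very populous states, so the large group is typically smaller than the small one.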

Goal of this Principal Component Analysis

Our goal is to analyze the correlations between the variables and to find out whether the population changes in some states differ markedly from those in other states. We restrict the analysis to the large states by using the filter option of XLSTAT.

The only difference between this tutorial and the tutorial available here is that we decide to study only the large states.

Setting up a Principal Component Analysis

Once XLSTAT-Pro is activated, select the XLSTAT / Analyzing data / Principal components analysis command, or click on the corresponding button of the Analyzing Data toolbar (see below).

XLSTAT Analyzing Data menu / PCA

The Principal Component Analysis dialog box will appear.

Select the data on the Excel sheet. Choose the Observations/variables data format, because the input data have one row per observation (a state) and one column per variable.

The PCA type used for the computations is Pearson's correlation matrix, which corresponds to the classical correlation coefficient.
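For readers who want to see what a Pearson correlation matrix contains, here is a minimal sketch in plain Python. The variable names and values are hypothetical, and the code only illustrates the classical coefficient; it is not XLSTAT's implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_matrix(columns):
    """Square matrix of pairwise Pearson correlations, in column order."""
    names = list(columns)
    return [[pearson(columns[i], columns[j]) for j in names] for i in names]

# Hypothetical rates per 1000 inhabitants for four observations.
rates = {
    "births": [14.0, 12.5, 16.1, 13.2],
    "deaths": [8.0, 9.5, 6.9, 9.0],
    "migration": [2.1, -1.3, 4.0, 0.5],
}

matrix = correlation_matrix(rates)
for name, row in zip(rates, matrix):
    print(name, [round(value, 3) for value in row])
```

Because each coefficient is scale-free, a PCA based on this matrix gives every variable the same weight regardless of its units.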

XLSTAT principal component analysis dialog box general tab

In the Data options tab, select the filter option and select the size column of the dataset.

XLSTAT Principal Component Analysis dialog box data options tab

In the Charts tab, we want all the large states to be displayed, so we do not activate the filter option there.

Principal Component Analysis XLSTAT dialog box Variables charts tab

Principal Component Analysis XLSTAT dialog box Observations charts tab

Principal Component Analysis XLSTAT dialog box Biplot charts tab

Click on OK. A new dialog box asking you which group you want to keep is displayed. Select the large group and click OK.
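Conceptually, keeping one group amounts to dropping every row whose filter column has another value before any statistic is computed. The sketch below shows that idea outside XLSTAT; the rows, the column names and the helper keep_group are hypothetical.

```python
from statistics import mean

# Hypothetical rows; "size" plays the role of the filter column.
rows = [
    {"state": "A", "size": "small", "birth_rate": 12.0},
    {"state": "B", "size": "large", "birth_rate": 14.0},
    {"state": "C", "size": "large", "birth_rate": 16.0},
    {"state": "D", "size": "small", "birth_rate": 11.0},
    {"state": "E", "size": "large", "birth_rate": 15.0},
]

def keep_group(rows, column, group):
    """Return only the rows whose value in `column` equals `group`."""
    return [row for row in rows if row[column] == group]

large_states = keep_group(rows, "size", "large")
kept_names = [row["state"] for row in large_states]

# Any statistic computed afterwards describes the kept group only.
mean_birth_rate = mean([row["birth_rate"] for row in large_states])

print(kept_names)       # ['B', 'C', 'E']
print(mean_birth_rate)  # 15.0
```

Since the filter is applied before the analysis, means, standard deviations and correlations all refer to the kept rows rather than to the full dataset.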

group dialog box pca

The computations begin once you have clicked on OK. You are asked to confirm the number of rows and columns.

Then you should confirm the axes for which you want to display plots. In this example, the percentage of variability represented by the first two factors is not very high (72.09%); to avoid a misinterpretation of the results, we have decided to complement the results with a second chart on axes 1 and 3.
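The percentage of variability carried by each factor is its eigenvalue divided by the sum of all eigenvalues. The sketch below shows the calculation with hypothetical eigenvalues; they are not the ones from this analysis and do not reproduce the 72.09% figure.

```python
from itertools import accumulate

# Hypothetical eigenvalues, sorted in decreasing order (NOT the tutorial's).
eigenvalues = [3.0, 1.5, 1.0, 0.5]

total = sum(eigenvalues)

# Share of the total variability carried by each factor, in percent.
variability = [100 * value / total for value in eigenvalues]

# Running total: variability represented by the first k factors.
cumulative = list(accumulate(variability))

for axis, (pct, cum) in enumerate(zip(variability, cumulative), start=1):
    print(f"F{axis}: {pct:.2f}% (cumulative {cum:.2f}%)")
```

When the cumulative value for the first two factors is modest, a chart on another pair of axes, such as 1 and 3, shows part of the structure that the first map leaves out.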

Principal Component Analysis menu PC1 and PC2

Interpreting the results of a Principal Component Analysis applied on filtered data

The results for the large states are displayed. The first table gives some descriptive statistics.

pca filter descriptive statistics

Then, eigenvalues are displayed.

pca filter eigenvalues

We are interested in the maps for variables and observations. Regarding variables we have the following map:

pca filter variables map

We can see that the first axis opposes states with older populations to states with younger populations. The second axis opposes states with high domestic migration rates to states with lower domestic migration rates.

Regarding observations we have the following map.

pca filter observations map

This simple tool allows you to filter observations directly from your PCA dialog box and avoid complex data manipulation.
