In the following example, a principal components analysis was performed on the Chatterjee-Price attitude dataset, which contains the aggregated questionnaire answers of approximately 35 employees in each of 30 randomly selected departments of a large financial firm. The variables measure employee perceptions of managerial aspects of their employment experience, each expressed as the percentage of favorable ratings. These managerial aspects and their associated variables are listed in Table 1.
Table 1. Chatterjee-Price variable set.
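| Variable | Managerial aspect |
|---|---|
| complaints | Handling of employee complaints |
| privileges | Does not allow special privileges |
| learning | Opportunity to learn |
| raises | Raises based on performance |
| critical | Too critical |
| advance | Advancement |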
A correlation matrix of these favorable responses, shown in Table 2, was first calculated.
Table 2. Correlation coefficients calculated from Chatterjee-Price data.
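A correlation matrix of this kind can be computed with a short R sketch like the one below. It assumes the data are R's built-in `attitude` data frame and that the overall `rating` column was excluded, leaving six variables. Neither assumption is confirmed by the text, and the names `vars` and `cor_mat` are illustrative only.

```r
# Assumption: the built-in `attitude` data frame holds the Chatterjee-Price
# data, and the overall `rating` column is left out of the analysis.
vars <- attitude[, c("complaints", "privileges", "learning",
                     "raises", "critical", "advance")]

# Pearson correlation coefficients between the six variables
cor_mat <- cor(vars)

# Rounded for display
round(cor_mat, 2)
```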
As shown in Table 3, R’s princomp() function was used to form the components and calculate their standard deviations, along with the proportion and cumulative proportion of the data’s total variance that each accounts for. The first four components account for almost 98% of the data’s total variability, and the first three account for 93%.
Table 3. Standard deviations and proportions and cumulative proportions of variance associated with components.
| Proportion of Variance | 0.5322214 | 0.2510730 | 0.1465753 | 0.04831927 | 0.02181104 | 4.79E-17 |
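The quantities in Table 3 can be derived from a `princomp()` fit as sketched below. The exact call used for the analysis is not shown, so the choice of input (the built-in `attitude` data without `rating`) and the `cor = TRUE` setting are assumptions, and the output need not match the values reported here.

```r
# Assumption: six attitude variables, components formed from the
# correlation matrix (cor = TRUE); the original call may have differed.
vars <- attitude[, c("complaints", "privileges", "learning",
                     "raises", "critical", "advance")]
pca <- princomp(vars, cor = TRUE)

# Built-in summary: standard deviation, proportion and cumulative proportion
summary(pca)

# The same quantities computed by hand from the component standard deviations
variances <- pca$sdev^2                      # eigenvalues
prop_var  <- variances / sum(variances)      # proportion of variance
cum_prop  <- cumsum(prop_var)                # cumulative proportion
round(rbind(sdev = pca$sdev, prop_var, cum_prop), 4)
```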
Table 4 shows the loadings of the variables on the components, with the largest positive loading for each variable (within each row) outlined in red. Since the variables are described as reflecting percent favorable ratings, a positive loading is assumed to indicate a variable that positively influences the component, and the largest loadings indicate the variables that exert the largest influence on it. Component 1 loads negatively on complaints, privileges, learning, and raises, and its loading on advance is blank, indicating a near-zero value (R’s loadings printout omits small loadings). The only positive loading on Component 1 is therefore on critical. Although this variable represents “too critical,” a favorable rating on it indicates a supervisor viewed as not too critical.
Component 2’s positive loadings, in descending order of magnitude, are on complaints, privileges, and critical. The loading of privileges on Component 3 is higher than its loading on Component 2, which is also positive, so privileges is viewed as primarily influencing Component 3, leaving complaints to dominate Component 2. Component 4’s positive loadings are on raises and advance; privileges has already been assigned to Component 3, where its loading is higher. Component 5’s positive loadings are on complaints and advance, and Component 6’s only positive loading is on raises, where it is that variable’s second-highest loading.
Table 4. Component loadings.
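A loadings table of this form, together with the component carrying each variable's largest positive loading, can be obtained roughly as follows. This is a sketch under the same assumptions as before (built-in `attitude` data without `rating`, `cor = TRUE`); `L` and `top_component` are illustrative names.

```r
# Assumption: same six variables and correlation-based fit as above.
vars <- attitude[, c("complaints", "privileges", "learning",
                     "raises", "critical", "advance")]
pca <- princomp(vars, cor = TRUE)

# Printed loadings; entries smaller than `cutoff` in absolute value are blank
print(loadings(pca), cutoff = 0.1, digits = 3)

# Plain numeric matrix: rows are variables, columns are components
L <- unclass(loadings(pca))

# For each variable (row), the component with the largest positive loading
top_component <- apply(L, 1, function(row) names(which.max(row)))
top_component
```

The sign of each component's loadings is arbitrary, so a fit on another system may flip the signs of an entire column relative to a published table.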
The next step in the analysis is to determine how many components to retain. Several rules of thumb exist. One is to retain the smallest set of leading components whose cumulative proportion of variance reaches some threshold, for instance 90% of the observed variability in the data. This rule would indicate retaining only the first three components in this case, as shown in Table 3. However, the variables learning, raises, and advance have no positive loadings within the first three components.
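The cumulative-variance rule can be checked directly from the proportions of variance reported in Table 3. The sketch below uses those reported values as input; the 90% threshold is the example figure from the text, and the variable names are illustrative.

```r
# Proportions of variance as reported in Table 3
prop_var <- c(0.5322214, 0.2510730, 0.1465753,
              0.04831927, 0.02181104, 4.79e-17)

# Cumulative proportion of variance after each component
cum_prop <- cumsum(prop_var)
round(cum_prop, 4)

# Smallest number of leading components reaching the threshold
threshold <- 0.90
n_keep <- which(cum_prop >= threshold)[1]
n_keep
```

With these values the first three components reach about 93% and the first four about 98%, so a 90% threshold selects three components.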
Another rule of thumb relies on the scree plot, which displays the components’ eigenvalues plotted against component number. The scree plot for this analysis is shown in Figure 1. The rule of thumb is to select the component at the point of inflection (the “elbow”) of the plot, the point at which the rate of decrease begins to lessen. However, as shown in Figure 1, there are two such points, one at Component 2 and another at Component 4. Since three variables have no positive loadings within the first three components, Component 4 must be retained. The scree plot does not support retaining Component 5 as well, and including it would capture only 2% more of the data’s variability, even though it has the highest loading for the variable complaints. Therefore, the first four components are retained as the new variable set representing the Chatterjee-Price data.
Figure 1. Scree Graph.
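A scree plot like Figure 1 can be drawn with base R's `screeplot()`. The sketch below again assumes the built-in `attitude` data without `rating` and a correlation-based fit, so the shape of the curve may differ from the figure.

```r
# Assumption: same six variables and correlation-based fit as above.
vars <- attitude[, c("complaints", "privileges", "learning",
                     "raises", "critical", "advance")]
pca <- princomp(vars, cor = TRUE)

# Eigenvalues (component variances) plotted against component number
screeplot(pca, type = "lines", main = "Scree plot")

# Successive drops in eigenvalue; an elbow is where the drops level off
eig <- pca$sdev^2
diff(eig)
```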
Based on the decision to retain only the first four components, the original variables used to interpret each component are as shown in Table 5.
Table 5. Loadings selected for component interpretation.
The new, uncorrelated component variables presented in Table 5 can be broadly interpreted as follows. With negative loadings on all variables but critical, Component 1 can be interpreted as solely reflecting this variable, which is clearly perceived as an important aspect of the respondents’ experiences with management. With positive loadings on complaints and privileges, Component 2 can be interpreted as measuring perceptions of management fairness in dealing with employees: management addresses employee complaints and, although this is weighted less heavily, does not allow certain employees to have special privileges. Although privileges also loads positively on Component 2, its highest loading is on Component 3, which can be interpreted as primarily reflecting this variable. With high positive loadings on raises and advance, Component 4 can be interpreted as reflecting appropriate recognition of the quality of employee work.
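The retained components can be extracted as new variables, and their lack of correlation verified, along the following lines. As before, this is a sketch that assumes the built-in `attitude` data without `rating` and a correlation-based `princomp()` fit; `scores` is an illustrative name.

```r
# Assumption: same six variables and correlation-based fit as above.
vars <- attitude[, c("complaints", "privileges", "learning",
                     "raises", "critical", "advance")]
pca <- princomp(vars, cor = TRUE)

# Component scores for the four retained components (one row per department)
scores <- pca$scores[, 1:4]
head(scores)

# Correlations between the component scores: off-diagonal entries are
# zero up to rounding error
round(cor(scores), 3)
```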