HOW TO USE THE BLACK BOX

by

Keith T. Poole

Graduate School of Industrial Administration

Carnegie-Mellon University

Pittsburgh, PA 15213

3 August 1998

Abstract

            This paper is meant as a supplement to my August, 1998 AJPS article "Recovering a Basic Space From a Set of Issue Scales."  I promised the editor and the reviewers of my article that I would provide and support the computer programs used in the article.  Accordingly, this paper is part of a package of material that contains the FORTRAN code and the executables of the programs used in the article.


 

1.  Introduction

 

            This paper is meant as a supplement to my AJPS article "Recovering a Basic Space From a Set of Issue Scales."  The body of the paper presents material deleted from the original paper to conserve journal space, as well as additional empirical examples.  Appendices A, B, and C show researchers how to use the computer programs that implement the model presented in the article.

            Section 2 shows how the method I develop in my AJPS article is related to Factor Analysis.  Section 3 reports some Monte Carlo work that was cut out of the final version of the paper.  Section 4 shows the relationship between the method and Aldrich-McKelvey scaling (1977).  In effect, the method can be used to perform an Aldrich-McKelvey scaling of an issue scale in more than one dimension.  Finally, Section 5 shows some additional empirical applications.

           

 


2.  Relationship With Standard Factor Analysis

            A standard method of analyzing a rectangular data matrix is to compute a correlation matrix between variables (the columns of the data matrix) and then factor analyze (principal components or maximum likelihood) the correlation matrix.  Factor analysis has its own special nomenclature and the method of its presentation varies from author to author.  However, in its simplest form, principal components, it is simply eigenvalue/eigenvector decomposition and its connection to singular value decomposition is easily shown.

            For example, suppose the n by m data matrix, X, has no missing data and is standardized such that each column sums to zero and the squared entries of each column sum to one.  This is the transformation

                        x_{ij} = \frac{\tilde{x}_{ij} - \bar{x}_j}{\sqrt{n}\, s_j}                        (1)

where \tilde{x}_{ij} is the original matrix entry, \bar{x}_j is the mean of the jth column, and s_j is the standard deviation of the jth column (computed with divisor n).  Given the transformation in equation (1), the Pearson correlation matrix is simply R = X’X.

            Alternatively, most authors (e.g., Harman, 1970; Van de Geer, 1971) assume that X is in standard deviation form.  That is:

                        x_{ij} = \frac{\tilde{x}_{ij} - \bar{x}_j}{s_j}                        (2)

Written in this form, the Pearson correlation matrix is R = \frac{1}{n} X’X.  This approach has the awkward result of carrying the 1/n through various equations.  Accordingly, I use the simpler approach of equation (1).  This has no material effect on the discussion below except to simplify the expressions.
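            The two scalings can be compared numerically.  The following is a minimal sketch, assuming that equation (1) divides each centered entry by the square root of n times the column standard deviation (computed with divisor n) and that equation (2) divides by the standard deviation alone.  The data are simulated for illustration only, and the function name standardize_columns is hypothetical; nothing here is taken from the FORTRAN programs.

```python
import numpy as np

def standardize_columns(raw):
    """Center each column and scale it to unit sum of squares (assumed equation (1))."""
    raw = np.asarray(raw, dtype=float)
    n = raw.shape[0]
    col_mean = raw.mean(axis=0)
    col_sd = raw.std(axis=0)              # population SD: divides by n
    return (raw - col_mean) / (np.sqrt(n) * col_sd)

# Simulated data for illustration only (not the 1980 issue scales).
rng = np.random.default_rng(0)
raw = rng.normal(loc=4.0, scale=1.5, size=(200, 5))
n = raw.shape[0]

X = standardize_columns(raw)
R = X.T @ X                               # equation (1) scaling: R = X'X

Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standard deviation form
R_alt = (Z.T @ Z) / n                     # this form needs the extra 1/n

print(np.allclose(R, np.corrcoef(raw, rowvar=False)))
print(np.allclose(R, R_alt))
```

            Both routes give the same Pearson correlation matrix; the first simply absorbs the 1/n into the data.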

            Let the singular value decomposition of X be ULV’, where U is an n by m matrix such that U’U = I_m, L is an m by m diagonal matrix of singular values, and V is an m by m matrix such that V’V = I_m.

            To perform a principal components analysis, compute the correlation matrix,

                                    R = X’X = VL^2V’                                                    (2)

and then perform a standard eigenvalue/eigenvector decomposition of R.  Note that the eigenvalues of R are the squared singular values of X.  The factor matrix is the m by m matrix VL and the factor scores are the n by m matrix U, where U is from the singular value decomposition of X.  Using the terminology of Harman (1970):

                                    F = U   and   A = VL
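            The correspondence between the two decompositions is easy to check numerically.  The sketch below uses simulated data and assumes the column scaling described above (zero column sums, unit column sums of squares); the variable names are illustrative.

```python
import numpy as np

# Simulated correlated data for illustration only.
rng = np.random.default_rng(1)
raw = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
n = raw.shape[0]
X = (raw - raw.mean(axis=0)) / (np.sqrt(n) * raw.std(axis=0))

# Singular value decomposition: X = U diag(sing) V'
U, sing, Vt = np.linalg.svd(X, full_matrices=False)

# Principal components: eigenvalues of R = X'X, sorted in descending order
R = X.T @ X
eigvals = np.linalg.eigvalsh(R)[::-1]

F = U                  # factor scores
A = Vt.T * sing        # factor matrix VL: column k of V scaled by singular value k

print(np.allclose(eigvals, sing ** 2))   # eigenvalues are squared singular values
print(np.allclose(F @ A.T, X))           # FA' reproduces X
print(np.allclose(A @ A.T, R))           # AA' reproduces R
```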

In terms of the model I use in the AJPS article,

                                    X_0 = [YW' + J_n c']_0 + E_0

let c = 0, assume no missing data, and ignore the error term for the moment.  Then

                                    X = YW' = ULV' = FA'                                                    (3)

so that Y and W are simply related to the factor scores and the factor matrix, respectively.

            Indeed, even when X is in the form of the AJPS paper, [YW' + J_n c']_0 + E_0, the estimated W matrix, \hat{W}, will be highly correlated with the factor matrix, A.  For example, Table 1 shows the SPSS output for a principal components analysis of the 1980 issue scale example shown in the AJPS paper.  The data set was read into SPSS and the correlation matrix was computed using the pair-wise deletion option.  The first part of the table shows the eigenvalue table and the second part of the table shows the Factor Matrix, A, labeled “Component Matrix” in the SPSS output.

            The r-squares between the 3 columns of the Factor Matrix shown in Table 1 and the columns of the \hat{W} shown in Table 4 of the AJPS article are .929, .802, and .223, respectively.  In other words, the first two dimensions are essentially the same.  This makes sense because I found only two of the fourteen \hat{w}'s for the 3rd dimension to be statistically significant, whereas seven of the fourteen \hat{w}'s for the 2nd dimension and all fourteen for the 1st dimension were statistically significant.  The individual placements appear to be at most two-dimensional.

            Further evidence that the data are at most two-dimensional comes from the scree plots shown in Figure 1.  The dotted line in the upper plot shows the eigenvalues of the correlation matrix from the SPSS output.  The solid line in the upper plot shows the squared singular values for the data transformed as in equation (1).  That is, let X* = X_0 for the non-missing entries and let X* = YW’ (using three dimensions) for the missing entries.  Column means were computed using the non-missing entries of X_0, and each entry of X* was transformed as shown in equation (1).  The singular values of this matrix were then squared to make them comparable to the eigenvalues extracted from the correlation matrix.  The two series are virtually identical, with the eigenvalues/squared singular values falling off fairly smoothly from the elbow at the 3rd value through the 14th value.  This is a clear indication that the data are most likely two-dimensional.


Table 1

SPSS Output for 1980 Issue Scale Example

Total Variance Explained

                      Initial Eigenvalues                 Extraction Sums of Squared Loadings
Component     Total   % of Variance   Cumulative %        Total   % of Variance   Cumulative %
    1         4.432      31.654          31.654           4.432      31.654          31.654
    2         2.055      14.677          46.331           2.055      14.677          46.331
    3         1.155       8.252          54.583           1.155       8.252          54.583
    4          .916       6.546          61.129
    5          .879       6.281          67.410
    6          .790       5.644          73.054
    7          .780       5.570          78.625
    8          .630       4.497          83.122
    9          .599       4.278          87.399
   10          .541       3.865          91.264
   11          .409       2.919          94.183
   12          .394       2.812          96.995
   13          .223       1.595          98.590
   14          .197       1.410         100.000

Extraction Method: Principal Component Analysis.

 


Figure 1

Eigenvalues vs. Singular Values


            The lower plot shows the singular values of X* minus the actual column means, that is, X* - J_n c', where c is the vector of actual column means.  The column means are subtracted from X* because if the means are numerically large, as they are for issue scales, then the first singular value of X* will be very large compared to the others.  This results from the fact that the sum of the squared singular values is equal to the sum of the squared entries of the matrix.  That is:

                        \sum_{k=1}^{m} l_k^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij}^2

where l_k is the kth diagonal element of L.

            Here the singular values fall off a little less smoothly.  The decline from the 2nd to the 3rd value is not as dramatic as it is for the eigenvalues of the correlation matrix, but the plot clearly indicates that the data are at most three-dimensional (see the output for the 1980 issue scale example in Appendix A).  In this context, a singular value decomposition of X yields essentially the same information as a principal components analysis of the correlation matrix.
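            The reason for removing the column means before examining the singular values can be illustrated with a small simulation.  This is a sketch under assumed values: the matrix is random noise with a common column mean of 4 (roughly the midpoint of a seven-point scale), which is not the 1980 data.

```python
import numpy as np

# Simulated matrix: centered noise plus large column means (illustrative values).
rng = np.random.default_rng(2)
n, m = 300, 6
centered = rng.normal(size=(n, m))
centered -= centered.mean(axis=0)
means = np.full(m, 4.0)                  # hypothetical seven-point-scale means
X_star = centered + means                # centered part plus J_n c'

sv_raw = np.linalg.svd(X_star, compute_uv=False)
sv_centered = np.linalg.svd(centered, compute_uv=False)

# Sum of squared singular values equals the sum of squared matrix entries.
print(np.isclose((sv_raw ** 2).sum(), (X_star ** 2).sum()))

# Share of the total sum of squares carried by the first singular value.
share_raw = sv_raw[0] ** 2 / (sv_raw ** 2).sum()
share_centered = sv_centered[0] ** 2 / (sv_centered ** 2).sum()
print(share_raw, share_centered)
```

            With the means left in, nearly all of the sum of squares loads on the first singular value, which masks the structure of the remaining ones.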

3.  Additional Monte Carlo Tests of the Model

 

            This subsection was deleted from the final version of the paper in order to conserve journal space.  The purpose of these tests is to show the ability of the procedure to estimate the Eckart-Young lower rank approximation matrix of an arbitrary matrix of real numbers with missing entries.  The table and equation references are to those in the AJPS paper.  The table deleted along with this subsection was the original Table 4.  Here I will refer to it as Table 4* to avoid confusion with the AJPS article.

Estimating Eckart-Young Approximation Matrices

            The Monte Carlo results reported in Tables 2 and 3 show that the procedure does an excellent job of estimating Y, W, and c when the observed data are in the form shown in equation (1).  The purpose of this subsection is to show that the procedure also can be used as a general-purpose tool to obtain the Eckart-Young approximation matrix UL_sV' of any rectangular matrix of real numbers with missing entries.

            In this application \hat{Y}\hat{W}' + J_n\hat{c}' must be used with some caution.  Recall that if a matrix X is of rank s, then subtracting off the column means, X - J_n c', in most circumstances, does not change the rank.[1]  However, the converse is not necessarily true.  By construction, \hat{Y}\hat{W}' has rank s and the columns of \hat{Y}\hat{W}' sum to zero.  Adding a vector of constants, J_n\hat{c}', will usually increase the rank to s+1.  However, the first s singular values of \hat{Y}\hat{W}' + J_n\hat{c}' will be quite large in comparison to the s+1st singular value.  If the observed data are in the form given by equation (1), then the closer \hat{Y}\hat{W}' is to the true YW’ (which, by construction, is X - J_n c'), the smaller the s+1st singular value.  In addition, if the column means are not of interest, then the Monte Carlo results in Table 2 show that \hat{Y}\hat{W}' is an excellent approximation of the true YW’ matrix even at substantial levels of error and missing data.

            In order to test the ability of the procedure to estimate the Eckart-Young approximation matrix of an arbitrary matrix of real numbers of full rank, just the first s singular values of \hat{Y}\hat{W}' + J_n\hat{c}' were utilized.  That is, if the singular value decomposition of \hat{Y}\hat{W}' + J_n\hat{c}' is ULV’ where L is an s+1 by s+1 matrix, then UL_sV’ was used as the approximation matrix.

            Similar to the method used for Table 2, to construct X of full rank m, U and V were obtained from a singular value decomposition of an n by m matrix of uniform [-1,1] random numbers.  The first three singular values of X were set so that when the column means are subtracted from X, the singular values of the resulting matrix, YW' , were approximately 50, 35, 20, 5, 5, …, 5.  This was done so as to approximate a situation where the first three dimensions largely account for the structure of the matrix but the matrix is of full rank. 
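            The target of these tests, the Eckart-Young approximation matrix, can be sketched as follows for a complete matrix.  This is an illustration only: it builds a full-rank matrix with the singular values 50, 35, 20, 5, ..., 5 directly, ignores the column-mean adjustment and the missing-data step described in the text, and uses a hypothetical function name, eckart_young.

```python
import numpy as np

def eckart_young(X, s):
    """Best rank-s least-squares approximation: keep the s largest singular values."""
    U, sing, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :s] * sing[:s]) @ Vt[:s, :], sing

rng = np.random.default_rng(3)
n, m, s = 50, 10, 3

# U and V from the SVD of a uniform [-1, 1] random matrix, as in the text.
U0, _, V0t = np.linalg.svd(rng.uniform(-1.0, 1.0, size=(n, m)), full_matrices=False)

# Impose singular values so that three dimensions dominate a full-rank matrix.
target = np.array([50.0, 35.0, 20.0] + [5.0] * (m - 3))
X = (U0 * target) @ V0t

X_s, sing = eckart_young(X, s)

# The Frobenius norm of the residual equals the root sum of squares
# of the discarded singular values.
err = np.linalg.norm(X - X_s)
print(np.linalg.matrix_rank(X_s), err, np.sqrt((sing[s:] ** 2).sum()))
```

            The Monte Carlo tests ask how closely the procedure reproduces a matrix like X_s when a large share of the entries of X is unobserved.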

            Missing data were created in the same fashion as described for Table 2.

            In these tests there is no error process so the only difference between trials is the pattern of missing data.  Each entry in Table 4* is the average of 10 trials.  The standard deviations are shown in parentheses.

__________________

Table 4* About Here

__________________

 

            The first three columns of Table 4* show the number of rows, the number of columns, and the rank of the approximation matrix.  The fourth column, r-square with Eckart-Young, shows the average squared Pearson correlation between the nm elements in the true Eckart-Young matrix and the reproduced Eckart-Young matrix.  The fifth column shows the average squared Pearson correlation between the true basic dimensions and the estimated basic dimensions corresponding to the rank of the approximation; that is, the average of the r-squares computed between each column of the true Y matrix and its corresponding column in \hat{Y}.[2]  Finally, the sixth column shows the percentage of missing entries.

            Table 4* shows that the procedure does a good job estimating lower rank approximations when a substantial portion of the matrix is missing.  Not surprisingly, the lower the level of missing data and the larger the matrix, the better the approximation.  With 25 percent missing data the r-squares all exceed .97.  Even with 70 percent missing data the procedure will do a reasonable job if the size of the matrix is large enough.

 

4.  Empirical Application Deleted From AJPS Article

 

            This application was deleted from the final draft of the AJPS article to conserve journal space.  In this application the scaling method is applied to a transposed matrix in which the number of columns is much larger than the number of rows.  What I show below is that the general method developed in the AJPS article can be used to perform Aldrich-McKelvey scaling of an issue scale in more than one dimension.

            Aldrich and McKelvey (1977) in effect solved the Likert scale problem.  In my opinion, the Aldrich and McKelvey paper is the most underappreciated achievement in political methodology.  In part this is due to the fact that, as I note below, the standard error of the estimate of their model is clearly biased downwards.  However, as I have argued elsewhere (Palfrey and Poole, 1987), this is an advantage, not a defect!  Namely, the Aldrich-McKelvey scaling method can be used as a powerful filter.  Respondents who see the political universe as backward (namely, Reagan to the left of Carter) clearly have a very low level of information about politics.

            The big advantage of applications like the one shown below is that a researcher can check to see if the scale really is one-dimensional.  That is, scales with labeled endpoints are designed to be one-dimensional.  Performing the decomposition shown below can roughly test this.  Namely, the r-square in one dimension should be very large and the increment from adding a second dimension should be quite small.

            Note that because the number of political stimuli placed on an issue scale is usually small, this application should be done with some caution.  For the 1980 scale shown below, there are only 6 stimuli.  Consequently, if respondents who placed only 4 of the 6 stimuli are included and 3 dimensions are estimated, the r-square for these respondents will be 1.0!  Because of this, I estimated only 2 dimensions when allowing up to two missing responses per respondent (see Appendix C for the output files).
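            The perfect fit is a matter of counting parameters: with 3 dimensions plus a constant, each respondent has 4 free parameters, so 4 placements are reproduced exactly.  The sketch below illustrates this with made-up stimulus coordinates and made-up placements; it uses an ordinary least squares fit per respondent as a stand-in and is not the estimation procedure in the FORTRAN programs.

```python
import numpy as np

rng = np.random.default_rng(4)
n_stimuli, s = 6, 3

# Hypothetical stimulus coordinates and one respondent's seven-point placements.
Y = rng.normal(size=(n_stimuli, s))
placements = np.array([2.0, 6.0, 3.0, 5.0, 4.0, 7.0])

def respondent_r2(observed):
    """R-square from regressing the observed placements on Y plus a constant."""
    design = np.column_stack([Y[observed], np.ones(len(observed))])
    target = placements[observed]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    total = target - target.mean()
    return 1.0 - (resid @ resid) / (total @ total)

r2_four = respondent_r2([0, 1, 2, 3])          # 4 placements, 4 parameters
r2_six = respondent_r2(list(range(6)))         # 6 placements, 4 parameters
print(r2_four, r2_six)
```

            The respondent with only 4 placements is fit exactly regardless of what the placements are, so such respondents carry no information about fit in three dimensions.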

            The one-dimensional fit of the model was an r-square of .7541 and a standard error of the estimate of .9843.  In two dimensions the r-square was .8645 and the standard error of the estimate was .8447.  These fits plus the estimated configuration shown in Table 6* show, in my opinion, that the scale is indeed one-dimensional and that the second dimension is largely capturing respondent confusion about where to place Anderson on the scale.

            Note that in Table 6* I show the coordinates as \hat{Y}.  In the computer code I simply write out the singular vectors to make it easier to compare with the Aldrich-McKelvey coordinates (which form an eigenvector).

 


            Analysis of 1980 Post-Election Liberal-Conservative Scale

 

            The purpose of this application is to show the connection between the procedure developed here and the scaling method developed by Aldrich and McKelvey (1977) to analyze seven-point scales.  The Aldrich-McKelvey scaling method is a one-dimensional version of the model expressed in equation (1); that is, the original model as expressed in equation (1B) applied to a transposed matrix where m > n.  In this application I will analyze the responses to the 1980 Post-Election Liberal-Conservative seven-point scale.  This is one of the fourteen scales analyzed in the previous subsection.

            In the Aldrich-McKelvey framework, the matrix X is an n by m matrix where the rows are the respondents' perceived positions of the m stimuli on the scale.

The model they estimate is

                        X'W + J_m c' = y J_n' + E                        (17)

where y is an m length vector of underlying stimulus coordinates, W is an n by n diagonal matrix of weights, J_m is an m length vector of ones, J_n is an n length vector of ones, c is an n length vector of constants, and E is an m by n matrix of error terms.  Aldrich and McKelvey assume that the respondents correctly perceive the true underlying configuration subject to some random perceptual error, E, and report a linear transformation of that true configuration.  Their scaling method estimates y using X, and W and c are estimated using \hat{y} and X with ordinary least squares.

            The Aldrich-McKelvey scaling method is, in effect, an s = 1 version of equation (1).  Solving for X' in (17) produces

                        X' = yW*' + J_m c*' + E*                        (18)

where W* = W^{-1}J_n, c* = -W^{-1}c, and E* = EW^{-1}.  Equation (18) is identical to equation (1), the only difference being the reversal of the roles of n and m.
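            The algebra linking the two forms can be checked numerically.  The sketch below assumes that (17) has the Aldrich-McKelvey form X'W + J_m c' = y J_n' + E, with respondents in the rows of X; that form is an assumption made for this illustration, as are all of the simulated values.  Under it, the transformed error term is E multiplied by the inverse of W.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 8, 6                                  # n respondents, m stimuli

y = rng.normal(size=m)                       # stimulus coordinates
w = rng.uniform(0.5, 2.0, size=n)            # diagonal of W (nonzero weights)
c = rng.normal(size=n)                       # respondent constants
E = 0.1 * rng.normal(size=(m, n))            # perceptual error, m by n

W = np.diag(w)
W_inv = np.linalg.inv(W)
J_m, J_n = np.ones((m, 1)), np.ones((n, 1))

# Assumed form of (17): X'W + J_m c' = y J_n' + E, solved for X'.
Xt = (y[:, None] @ J_n.T + E - J_m @ c[None, :]) @ W_inv

# Form (18): X' = y W*' + J_m c*' + E*
W_star = W_inv @ J_n                         # n by 1
c_star = -W_inv @ c[:, None]                 # n by 1
E_star = E @ W_inv                           # m by n
Xt_18 = y[:, None] @ W_star.T + J_m @ c_star.T + E_star

print(np.allclose(Xt, Xt_18))
```

            Each column of X' is one respondent's reported placements, a linear transformation of y plus rescaled error, which is the s = 1 case of the basic space model with the roles of n and m reversed.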

            Aldrich and McKelvey require that y'y = 1, and missing entries are not allowed in X.  The procedure outlined in Section 2 based on the model stated in equation (1) can be regarded as a generalization of the Aldrich-McKelvey scaling procedure to more than one dimension.

            In this application the rows of the data set are the political stimuli and the columns are the respondents’ perceptions of where on the seven-point scale the stimuli are.  Consequently, there are n political stimuli (note the reversal of the role of n from the equations above) and the basic space coordinates of the political stimuli are given in \hat{Y}.  Recall that, by equation (1B), if X has no missing entries and no error, then it has rank s.  However, because m > n, subtracting off the column means reduces the rank of X by one provided that the columns do not already sum to zero.  In this case the number of basic dimensions is s-1, so that \hat{Y} is an n by s-1 matrix, \hat{W} is m by s-1, and \hat{c} is an m length vector, where \hat{W} and \hat{c} are the linear mappings for the m respondents.

            Table 6* shows \hat{Y} for two basic dimensions along with the corresponding one-dimensional vector estimated by the Aldrich-McKelvey procedure.  The first basic dimension is the liberal-conservative dimension and the order of the political stimuli, from Ted Kennedy at the far left to John Anderson near the center of the spectrum to Ronald Reagan at the far right, is intuitively appealing.  The second basic dimension essentially separates John Anderson from everyone else.  The standard errors were computed using a bootstrap procedure identical to that described earlier except now the columns (respondents) are being sampled with replacement.  The standard errors are based on 100 trials.

__________________

Table 6* About Here

__________________

 

            These standard errors must be taken with a grain of salt, however, because, as Aldrich and McKelvey (1977) note, a respondent “…who sees things backwards … contributes to a better fit to the ‘true’ space” (p. 116).  That is, respondents who perceive a mirror image of the true configuration improve the fit of the model so that the standard errors in Table 6* underestimate the true standard errors.  However, Monte Carlo work done by Aldrich and McKelvey and by Palfrey and Poole (1987) shows that the recovery of the stimulus configuration is robust to violations of the error assumptions and is very accurate even when the error level is very high and a large number of respondents are reporting mirror or semi-mirror images.

            The fourth column of Table 6* shows the first basic dimension normalized so that it can be directly compared to the Aldrich-McKelvey configuration shown in the fifth column.  The two configurations are, not surprisingly, virtually identical.  The differences are due to the slightly different samples analyzed by the two procedures.  Of the 888 respondents used to estimate the two basic dimensions, 643 had no missing data and were used in the Aldrich-McKelvey procedure.


5.  Additional Empirical Examples

a.      Recovering a Basic Space From the 1992 Issue Scales

            Table 2 shows an analysis of fifteen issu