Appendix to
Non-Parametric Unfolding of Binary Choice Data
Keith T. Poole
Graduate School of Industrial Administration
Carnegie-Mellon University
This appendix is a supplement to “Non-Parametric Unfolding of Binary Choice Data” (Political Analysis, 8:211-237). Section A1 shows how the new normal vector, $n_j^{\#}$, is obtained from the singular value decomposition of the p by s matrix $Y^*$ constructed from the legislator coordinates. Sections A2 and A3 report Monte-Carlo studies of the cutting plane procedure and the legislative procedure, respectively, with voting error. Finally, Section A4 reports a Monte-Carlo study of the unfolding algorithm with missing data and voting error in two and three dimensions.
Appendix
A1. The sth singular vector, $q_s$, is the Normal Vector, $n_j^{\#}$
Recall that the singular value decomposition of the p by s matrix $Y^*$ is
$$Y^* = U L Q'$$
and that the current estimate of the cutting hyperplane is obtained by applying the Eckart-Young theorem to $Y^*$ (Eckart and Young, 1936). Namely, the best-fitting hyperplane of rank s-1 through a set of points of rank s is found by performing a singular value decomposition of the matrix of points, $U L Q'$, inserting a zero in place of the sth singular value on the diagonal of $L$, and remultiplying. That is:
$$V = U L^{\#} Q'$$
where $L^{\#}$ is an s by s diagonal matrix identical to $L$ except that the sth singular value is replaced by zero. By construction, the p by s matrix $V$ has rank s-1.
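Let $n_j^{\#}$ be the normal vector of the hyperplane defined by $V$, scaled so that $(n_j^{\#})' n_j^{\#} = 1$, and let $q_s$ be the sth singular vector (the sth column) of $Q$. It is easy to show that $n_j^{\#} = q_s$ (or its reflection, $n_j^{\#} = -q_s$). To see this, note that by the definition of an orthogonal matrix:
$$q_s' Q = (0, 0, \ldots, 0, 1)$$
That is, the inner product of $q_s$ with the other s-1 singular vectors in $Q$ is zero, and its inner product with itself is one. Hence $Q' q_s$ is the sth unit vector, so $L^{\#} Q' q_s$ is the sth column of $L^{\#}$, which is zero:
$$L^{\#} Q' q_s = 0_s$$
and
$$V q_s = U L^{\#} Q' q_s = 0_p \qquad \text{(A1)}$$
where $0_s$ and $0_p$ are vectors of zeroes of length s and p, respectively. Equation (A1) is simply a restatement of the definition of a plane: every row of $V$ is orthogonal to $q_s$, and because $V$ has rank s-1, its rows span exactly the subspace orthogonal to $q_s$. The normal vector to the plane defined by $V$ is therefore $q_s$ (or its reflection, $-q_s$). Hence, $n_j^{\#} = q_s$.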
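The argument can be checked numerically. The sketch below uses a random stand-in for $Y^*$ because the legislator coordinates are not reproduced here. It zeroes the sth singular value and confirms that $V$ has rank s-1 and that $V q_s = 0_p$. NumPy's `svd` returns the singular values in descending order, so the sth singular value is the last one.

```python
import numpy as np

rng = np.random.default_rng(0)
p, s = 40, 3
# Stand-in p-by-s matrix of points (rows are points); random for illustration.
Y = rng.normal(size=(p, s))

# Singular value decomposition Y* = U L Q' (thin form: U is p-by-s).
U, sing, Qt = np.linalg.svd(Y, full_matrices=False)
Q = Qt.T

# Eckart-Young: replace the s-th (smallest) singular value by zero, remultiply.
sing_hash = sing.copy()
sing_hash[-1] = 0.0
V = U @ np.diag(sing_hash) @ Qt

q_s = Q[:, -1]  # s-th column of Q

print(np.linalg.matrix_rank(V))   # should equal s - 1
print(np.allclose(V @ q_s, 0.0))  # equation (A1): V q_s = 0_p
```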
A2. Monte-Carlo Studies of the Cutting Plane Procedure with Voting Error
Similar to the experiments shown in Table 1, 100 legislators and 500 pairs of policy points were randomly drawn from a uniform distribution through the unit hypersphere. The policy points were randomly drawn in such a way as to produce an average majority margin of about 67 percent. Error was introduced by making the legislators' choices probabilistic, so that the further a legislator is from the cutting plane, the less likely the legislator is to make a voting error. Specifically, an indirect utility function (McFadden, 1976), $u_{ijJ} + e_{ijJ}$, was created for each legislator, where $u_{ijJ}$ is the deterministic portion of the utility function for choice J = Yea, Nay, and $e_{ijJ}$ is the stochastic portion. The deterministic portion is assumed to be an exponential function of the negative of the squared distance from the legislator to the “y” and “n” alternatives. The stochastic terms $e_{ijy}$ and $e_{ijn}$ were drawn from the Normal, Uniform, or Logit distribution.
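Here is a minimal sketch of this error process. Several choices are assumptions made for illustration:

- the utility scale `beta`;
- the error spread `sigma`;
- the exact form $\beta \exp(-d^2/2)$;
- a plain uniform draw of the two policy points, instead of the margin-controlled draw described above.

To mimic the other two error distributions, `rng.uniform` or `rng.logistic` could be substituted for `rng.normal`.

```python
import numpy as np

def uniform_in_ball(n, s, rng):
    """Draw n points uniformly through the s-dimensional unit hypersphere."""
    directions = rng.normal(size=(n, s))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.uniform(size=(n, 1)) ** (1.0 / s)
    return directions * radii

def simulate_votes(X, yea, nay, beta, sigma, rng):
    """Return (observed, true) choices coded 1 = Yea, 0 = Nay.

    Deterministic utility beta * exp(-d^2 / 2) (an assumed form and scale);
    stochastic terms e_ijy and e_ijn drawn from Normal(0, sigma).
    """
    d2_y = np.sum((X - yea) ** 2, axis=1)
    d2_n = np.sum((X - nay) ** 2, axis=1)
    u_y = beta * np.exp(-d2_y / 2.0)
    u_n = beta * np.exp(-d2_n / 2.0)
    true = (d2_y < d2_n).astype(int)          # error-free spatial choice
    e_y = rng.normal(scale=sigma, size=len(X))
    e_n = rng.normal(scale=sigma, size=len(X))
    observed = (u_y + e_y > u_n + e_n).astype(int)
    return observed, true

rng = np.random.default_rng(1)
X = uniform_in_ball(100, 2, rng)
yea, nay = uniform_in_ball(2, 2, rng)         # hypothetical policy points
observed, true = simulate_votes(X, yea, nay, beta=5.0, sigma=1.0, rng=rng)
print("error rate:", np.mean(observed != true))
```

Raising `sigma` relative to `beta` raises the error rate. Legislators whose two utilities are nearly equal, and who therefore sit near the cutting plane, are the ones most likely to be flipped.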
Table A1 shows that the procedure does a good job of correctly classifying the true roll call choices and recovering the true normal vectors. This is especially true at the 15 percent error level, which is the approximate level of error found in U.S. congressional roll call data.[1] Finally, as one would expect, increasing the number of legislators increases the accuracy of the recovery.
__________________
Table A1 about Here
__________________
When error is present, the cutting plane procedure converges very quickly. Figure A1 shows an example that uses the same configuration of legislator ideal points as Figure 3. The choices of 78 of the 435 legislators have been modified so that they are “errors”: “N’s” on the “Y” side of the true cutting line and “Y’s” on the “N” side. The cutting plane procedure converges on the 30th iteration, as shown in Panel D. As Panels B and C show, in the error case the converged cutting plane may not be the one that maximizes classification; however, it will invariably be very close to the optimal cutting plane. This is easily dealt with by storing the iteration record and using the normal vector that corresponds to the best classification, which works very well in practice.
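The bookkeeping described above can be sketched as follows. The update step of the cutting plane procedure is not reproduced here, so the iteration record is filled with hypothetical noisy guesses at the normal vector. The Yea-side convention x'n > m and the helper names are also assumptions of this sketch.

```python
import numpy as np

def correct_classification(X, votes, n, m):
    """Number of votes correctly classified by the cutting plane x'n = m.

    Assumed convention: legislators with x'n > m are predicted to vote
    Yea (1) and the rest Nay (0).
    """
    predicted = (X @ n > m).astype(int)
    return int(np.sum(predicted == votes))

def best_iteration(record, X, votes):
    """Pick the stored (n, m) pair with the highest correct classification."""
    scores = [correct_classification(X, votes, n, m) for n, m in record]
    k = int(np.argmax(scores))
    return k, record[k], scores

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(435, 2))
n_true, m_true = np.array([0.6, 0.8]), 0.1
votes = (X @ n_true > m_true).astype(int)
flip = rng.choice(435, size=78, replace=False)  # 78 "errors", as in Figure A1
votes[flip] = 1 - votes[flip]

# Hypothetical iteration record: in practice each (n, m) would come from one
# pass of the cutting plane procedure; here they are noisy guesses.
record = []
for it in range(30):
    guess = n_true + rng.normal(scale=0.5 / (it + 1), size=2)
    record.append((guess / np.linalg.norm(guess), m_true))

k, (n_best, m_best), scores = best_iteration(record, X, votes)
print("best iteration:", k, "correct:", scores[k], "last:", scores[-1])
```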




Given the legislator coordinates, X, and their votes on the jth roll call, $t_j$, standard errors for the estimated normal vector, $n_j^*$, can be obtained via a simple bootstrapping analysis. In this context the rows of X and the corresponding elements in $t_j$ are sampled with replacement. In a simple binary limited dependent variable (LDV) context, let $X^{\#}$ be the matrix X bordered by a column of ones; then
$$t_j = X^{\#} b + e$$
In a Probit or logit analysis, suppose the estimated b’s for the independent variables, $b_1, b_2, \ldots, b_s$, are normalized so that their sum of squares is equal to one. They then constitute a normal vector to a plane on which the choice probabilities are exactly .5/.5. The intercept term, $b_0$, rescaled by the same factor, determines the cutting point, $m_j^*$. In this context X is a fixed set of numbers. In the roll call context, however, X is estimated from the roll calls. Consequently, the bootstrap standard errors reported below should be regarded as measuring the stability of the cutting plane procedure. In an LDV context where X is a matrix of independent variables, they are more akin to real standard errors.
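The sketch below works through that normalization on hypothetical coefficients. It assumes the cutting plane is written as $n'x = m$. The .5/.5 surface $b_0 + x'b = 0$ then maps to $n = b/\lVert b\rVert$ and $m = -b_0/\lVert b\rVert$. Under a different sign convention for the cutting point, m would simply change sign.

```python
import numpy as np

def ldv_to_cutting_plane(b0, b):
    """Rescale LDV coefficients so the slope vector has unit length.

    The .5/.5 surface is b0 + x'b = 0; dividing by ||b|| gives n'x = m
    with n = b / ||b|| and m = -b0 / ||b|| (assumed sign convention).
    """
    b = np.asarray(b, dtype=float)
    scale = np.linalg.norm(b)
    return b / scale, -b0 / scale

# Hypothetical coefficients: b0 = -1.5, (b1, b2) = (3, 4).
n, m = ldv_to_cutting_plane(-1.5, [3.0, 4.0])
print(n, m)  # normal vector (0.6, 0.8) and cutting point 0.3
```

Any point on the plane, for example x = m n, gives a zero index $b_0 + x'b$ and hence a .5/.5 choice probability.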
The bootstrap analysis is performed in the following manner. First, the rows of X and the corresponding elements in $t_j$ are sampled with replacement to form 100 matrices (that is, the sampling is by legislator with replacement). Second, the cutting plane procedure is applied to each of the 100 matrices. Finally, the standard errors are obtained by computing the sum of squared differences between the actual normal vector from the original data, $n_j^*$, and the 100 normal vectors from the bootstrap trials, dividing by 100, and taking the square root.
The matrix X was constructed by randomly drawing legislators from a uniform distribution through the unit hypersphere. Experiments were conducted using 50, 100, and 500 legislators. The elements of the true normal vector were all set equal to $1/\sqrt{s}$ so that the s dimensions would be equally salient. A cutting point was chosen along the normal vector to produce either a 50-50 margin or an 80-20 margin. Voting error was generated in the same way as in the experiments reported in Table A1; however, only normally distributed error was used in the experiments reported in Table A2. The cutting plane procedure was applied to the matrix with error to obtain the estimated normal vector, $n_j^*$. The standard errors for this $n_j^*$ were then computed using the bootstrap method described in the previous paragraph. This entire process was repeated 50 times, producing 50s (50 times s) standard errors. The mean and standard deviation of these 50s standard errors are reported for differing values of p (50, 100, 500), s (2, 3, 10), and margin (50-50, 80-20) in Table A2.
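One way to set up a single true roll call for these experiments is sketched below. Drawing uniformly through the unit hypersphere uses normalized Gaussian directions with radii $U^{1/s}$. The text does not say how the cutting point was located, so choosing it as an empirical quantile of the projections is an assumption of this sketch. The function names are hypothetical.

```python
import numpy as np

def uniform_in_ball(n, s, rng):
    """Draw n points uniformly through the s-dimensional unit hypersphere."""
    directions = rng.normal(size=(n, s))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    return directions * rng.uniform(size=(n, 1)) ** (1.0 / s)

def make_true_roll_call(p, s, yea_share, rng):
    """Equal-salience normal vector (all elements 1/sqrt(s)) and a cutpoint
    placed so that about `yea_share` of the p legislators vote Yea."""
    X = uniform_in_ball(p, s, rng)
    n = np.full(s, 1.0 / np.sqrt(s))           # unit length, equal elements
    proj = X @ n
    m = np.quantile(proj, 1.0 - yea_share)     # assumed way to set the margin
    votes = (proj > m).astype(int)             # 1 = Yea, 0 = Nay (error-free)
    return X, n, m, votes

rng = np.random.default_rng(6)
for yea_share in (0.5, 0.8):
    X, n, m, votes = make_true_roll_call(50, 3, yea_share, rng)
    print(yea_share, votes.sum(), "Yea of", len(votes))
```

With 50 legislators this gives 25 Yea votes for the 50-50 margin and 40 for the 80-20 margin, matching the example in the next paragraph. Voting error would then be layered on top.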
__________________
Table A2 about Here
__________________
For example, the first row of Table A2 shows the results for 50 legislators in 2 dimensions with 15 percent voting error (because of rounding, either 7 or 8 of the 50 legislators made voting errors). The average of the 100 bootstrapped standard errors (50 experiments times 2 dimensions) was 0.104 for the 50-50 margin votes (each true roll call was 25 Yea and 25 Nay). It was 0.177 for the 80-20 margin votes (each true roll call was 40 Yea and 10 Nay). The corresponding standard deviations were 0.029 and 0.065, respectively.
The average standard errors in Table A2 show that lopsided roll calls are estimated less precisely than close roll calls. They also show that roll calls with fewer legislators are estimated less precisely than roll calls with more. Neither finding is a surprise. The bottom line is that the cutting plane procedure is very stable at realistic numbers of observations and levels of error.
Table A3 shows the results of applying Probit and the cutting plane procedure to the Spector and Mazzeo (1980) “Grade” data used by Greene (1993, pp. 658-659) to analyze Manski’s Maximum Score Estimator (Manski, 1975, 1985; Manski and Thompson, 1986). The Probit coefficients and their standard errors are identical to those reported by Greene (1993, p. 646). The standardized Probit coefficients are very similar to the normal vector estimated by the cutting plane procedure. The standard errors for the elements of that normal vector were obtained by a simple bootstrapping analysis. The Spector and Mazzeo dataset was sampled by observation with replacement (that is, the rows of the data matrix were sampled with replacement) to form 100 matrices, and the cutting plane procedure was applied to each of the 100 matrices. The standard errors were obtained by computing the sum of squared differences between the actual normal vector from the original data and the 100 normal vectors from the bootstrap trials, dividing by 100, and taking the square root.[2] This is the same approach used in the Monte-Carlo work reported above.
__________________
Table A3 about Here
__________________
The pattern of “significance” for the three non-parametric cutting plane coefficients is the same as that for the Probit coefficients. Monte-Carlo work with artificial data points the same way when the underlying error distribution is symmetric. In that case, the cutting plane coefficients (using bootstrapping) will have patterns of significance nearly identical to those produced by a Probit analysis.
Table A4 shows a second empirical comparison of the cutting plane procedure and Probit analysis. The sample is 231 Republican members of the House of Representatives.[3] The dependent variable is whether or not they signed up as co-sponsors of a minimum wage increase.[4] The independent variables are the two-dimensional W-NOMINATE scores (Poole and Rosenthal, 1997) computed from votes taken in 1995, together with some characteristics of the representatives’ congressional districts (percent rural, percent Black, and median family income). (The independent variables were put in standard deviation form to facilitate comparisons.) The standardized Probit coefficients and the cutting plane coefficients are very close; the simple Pearson correlation between them is .961. Substantively, the coefficients in Table A4 indicate that Republican moderates from poorer, urban districts were more likely to support raising the minimum wage. Once again, the pattern of “significance” for the non-parametric cutting plane coefficients is the same as that for the Probit coefficients.
__________________
Table A4 about Here
__________________
A3. Monte-Carlo Studies of the Legislative Procedure With Voting Error
Table A5 is organized in the same fashion as Table A1. Not surprisingly, as the number of cutting planes increases with the error level held fixed, the precision of the recovery of the legislator coordinates increases dramatically. Even at the very high error level of 25 percent, the recovery of the legislator coordinates is very good with 500 roll calls in two or three dimensions.
__________________
Table A5 about Here
__________________
The legislative procedure is very stable, as shown by the small standard deviations of the correct classifications and the r-squares. In addition, the gap between the average worst r-square and the average best r-square across the s dimensions is not very large. For example, for 500 roll calls in 3 dimensions, the average worst r-square between the true and reproduced legislator coordinates was .968 and the average best r-square was .985. In other words, on average, the three r-squares computed between corresponding dimensions ranged between .968 and .985.
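The worst and best r-squares can be computed as in the sketch below. It assumes the recovered dimensions are already matched (and rotated) to the true ones. The "recovered" configuration in the demonstration is just the true one plus noise, not output from the legislative procedure.

```python
import numpy as np

def dimension_r_squares(X_true, X_est):
    """Squared Pearson correlation between true and recovered coordinates,
    computed separately for each of the s dimensions (columns)."""
    r = [np.corrcoef(X_true[:, k], X_est[:, k])[0, 1] for k in range(X_true.shape[1])]
    return np.square(r)

rng = np.random.default_rng(3)
X_true = rng.uniform(-1.0, 1.0, size=(100, 3))
X_est = X_true + rng.normal(scale=0.1, size=X_true.shape)  # stand-in recovery
r2 = dimension_r_squares(X_true, X_est)
print("worst r-square:", r2.min(), "best r-square:", r2.max())
```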
Given the normal vectors, N, the q cutpoints, $m_j$, and legislator i’s votes on the q roll calls, $t_i$, standard errors for the estimated legislator coordinates, $x_i^*$, can be obtained via a simple bootstrapping analysis. Two types of bootstrapping experiments are reported below. In the first, similar to the analysis of the cutting plane procedure above, the stability of the legislative procedure is assessed by assuming that N is fixed. In this context the rows of N and the corresponding $m_j$’s and elements in $t_i$ are sampled with replacement. The resulting standard errors for the entries of X are a useful descriptive measure of the stability of the legislative procedure.
In the second set of experiments, a matrix of roll calls is created and the unfolding algorithm is run until convergence to get the estimated legislator coordinates, $X^*$. Then 100 matrices are formed from the original roll call matrix by drawing roll calls with replacement. The unfolding algorithm is run on all 100 matrices to produce 100 estimated legislator matrices, and the standard errors for the legislator coordinates are computed from these 100 bootstrapped estimates. These standard errors are good descriptive measures of the stability of the unfolding algorithm as a whole, and they mimic a real-world application of the unfolding procedure.
The bootstrap analysis to assess the stability of the legislative procedure is performed in the following manner. First, the rows of N and the corresponding $m_j$’s and elements in $t_i$ are sampled with replacement to form 100 matrices (that is, the sampling is by roll call cutting plane with replacement). Second, the legislative procedure is applied to each of the 100 matrices. Finally, the standard errors are obtained by computing the sum of squared differences between the actual legislator coordinates from the original data, $x_i^*$, and the 100 $x_i$’s from the bootstrap trials, dividing by 100, and taking the square root.
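The sketch below illustrates the type 1 bootstrap. It resamples roll call cutting planes together with their cutpoints and votes. The legislative procedure itself is not reproduced. In its place, `grid_legislator_estimate` is a hypothetical two-dimensional brute-force stand-in: it returns the grid point that correctly classifies the most votes. The Yea-side convention $x'n_j > m_j$ is also assumed.

```python
import numpy as np

def grid_legislator_estimate(N, m, t, grid):
    """Hypothetical 2-D stand-in for the legislative procedure: return the
    grid point that correctly classifies the most of legislator i's votes.
    Assumed convention: predicted Yea (1) when x'n_j > m_j, else Nay (0)."""
    predicted = (grid @ N.T > m).astype(int)   # shape (n_grid, q)
    hits = (predicted == t).sum(axis=1)
    return grid[np.argmax(hits)]

def type1_bootstrap_se(N, m, t, grid, n_boot=100, rng=None):
    """Resample roll calls (rows of N with their m_j and t_ij) with replacement;
    return x* and sqrt( sum_b (x_b - x*)^2 / n_boot ) for each dimension."""
    rng = np.random.default_rng() if rng is None else rng
    x_star = grid_legislator_estimate(N, m, t, grid)
    q = len(m)
    draws = np.empty((n_boot, N.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, q, size=q)
        draws[b] = grid_legislator_estimate(N[idx], m[idx], t[idx], grid)
    return x_star, np.sqrt(np.sum((draws - x_star) ** 2, axis=0) / n_boot)

rng = np.random.default_rng(4)
q = 500
angles = rng.uniform(0.0, np.pi, size=q)
N = np.column_stack([np.cos(angles), np.sin(angles)])  # unit normal vectors
m = rng.uniform(-0.5, 0.5, size=q)                     # cutpoints
x_true = np.array([0.3, -0.2])
t = (N @ x_true > m).astype(int)                       # error-free votes
g = np.linspace(-1.0, 1.0, 41)
grid = np.array([[a, c] for a in g for c in g])
x_star, se = type1_bootstrap_se(N, m, t, grid, rng=rng)
print("x*:", x_star, "bootstrap SEs:", se)
```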
The bootstrap analysis to assess the stability of the unfolding algorithm as a whole is performed in the following manner. An artificial roll call matrix is created with a given level of voting error and an average majority margin that closely approximates actual congressional roll call data. The unfolding algorithm is then applied to this matrix to get the target legislator configuration, $X^*$. From this roll call matrix, 100 roll call matrices are formed by sampling roll calls with replacement, and the unfolding algorithm is applied to each of them. To remove any arbitrary rotation, each of the 100 estimated configurations is rotated to best fit the target configuration, $X^*$, using Schonemann’s (1966) method. (Empirically, the rotation of the configurations was extremely small.) The standard errors for each legislator on each dimension are then computed. The method is to sum the squared differences between the actual legislator coordinates from the original data, $x_i^*$, and the 100 rotated $x_i$’s from the bootstrap trials, divide by 100, and take the square root.
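Schonemann's (1966) orthogonal Procrustes solution has a short closed form. If the singular value decomposition of $X_b' X^*$ is $U S V'$, then the orthogonal matrix $T = U V'$ minimizes $\lVert X_b T - X^* \rVert$. The sketch below applies it to each bootstrap configuration and then computes the root-mean-square deviations described above. This form of T may include a reflection. The function names and the perturbed demonstration configurations are hypothetical.

```python
import numpy as np

def procrustes_rotate(X_boot, X_target):
    """Rotate X_boot by the orthogonal T minimizing ||X_boot @ T - X_target||_F."""
    U, _, Vt = np.linalg.svd(X_boot.T @ X_target)
    return X_boot @ (U @ Vt)

def type2_standard_errors(boot_configs, X_target):
    """Root-mean-square deviation from X* of the rotated bootstrap
    configurations, for every legislator on every dimension."""
    rotated = np.stack([procrustes_rotate(Xb, X_target) for Xb in boot_configs])
    return np.sqrt(np.sum((rotated - X_target) ** 2, axis=0) / len(boot_configs))

rng = np.random.default_rng(9)
X_target = rng.uniform(-1.0, 1.0, size=(100, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# Hypothetical bootstrap configurations: rotated, slightly perturbed copies of X*.
boots = [X_target @ R + rng.normal(scale=0.02, size=X_target.shape) for _ in range(100)]
se = type2_standard_errors(boots, X_target)
print("median SE:", np.median(se))
```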
Figure A2 shows the results of both types of bootstrapping experiments (type 1 and type 2) for 100 legislators and 500 roll calls in 2, 3, and 10 dimensions. The classification error introduced was 18 percent and the average majority margin was 68-32 in all of the experiments. Figure A2A shows the results of both types of bootstrapping for 2 dimensions, Figure A2B shows the results for 3 dimensions, and Figure A2C shows the results for 10 dimensions.



For example, in the 2 dimensional experiments, about 12 percent of the type 1 standard errors (one for each legislator on each dimension) and about 3 percent of the type 2 standard errors were between 0.03 and 0.04. The distributions of the standard errors for 2 and 3 dimensions are piled up to the left of the corresponding distributions for 10 dimensions. This makes sense because the numbers of legislators and roll calls are held fixed while the number of dimensions is increased. Recall that q roll calls in s dimensions create a maximum of $\sum_{k=0}^{s} \binom{q}{k}$ regions in the space. This number explodes as the number of dimensions increases. In high dimensional spaces it is therefore quite likely that there are multiple regions close to each other with the same correct classification for a legislator. This geometry is almost certainly responsible for the slightly larger standard errors in the 10 dimensional bootstrapping experiments. Nevertheless, even in 10 dimensions the bulk of the standard errors are reasonably small.
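The growth in the number of regions is easy to see numerically. Take q hyperplanes in general position in s dimensions. The maximum number of regions they create is the sum of the binomial coefficients C(q, k) for k = 0, ..., s. The sketch evaluates this for the 500 roll calls used in these experiments; `max_regions` is a hypothetical helper name.

```python
from math import comb

def max_regions(q, s):
    """Maximum number of regions into which q hyperplanes in general
    position divide s-dimensional space: sum over k = 0..s of C(q, k)."""
    return sum(comb(q, k) for k in range(s + 1))

for s in (2, 3, 10):
    print(f"s = {s:2d}: at most {max_regions(500, s):,} regions from 500 roll calls")
```

With 500 roll calls the bound is 125,251 regions in two dimensions and 20,833,751 in three. In ten dimensions it is larger by many more orders of magnitude.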
A4. Monte-Carlo Study of the Non-Parametric Unfolding Algorithm With Missing Data and Voting Error
Table A6 shows a set of experiments with and without error at various levels of missing data. Configurations of 100 legislators and 500 roll calls in 2 and 3 dimensions were randomly generated in the same fashion as those used in the Monte-Carlo experiments shown in Table 4. Error was introduced into the choices by making them probabilistic (see Sections A2 and A3 above). An error level of about 20 percent was chosen because it is somewhat above the approximate level of error in U.S. congressional roll call data. Matrix entries were randomly removed, and the remaining entries were then analyzed by the algorithm in one through five dimensions. The upper part of Table A6 shows two-dimensional experiments at four different levels of missing data with and without error, and the lower part shows three-dimensional experiments. Each randomly produced matrix was analyzed at each level of missing data. As a result, each row of the upper or lower part of the Table averages the same 10 matrices for two or three dimensions, with varying levels of missing entries.
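Random removal of entries can be sketched as follows. The coding 1 = Yea, -1 = Nay, 0 = missing is an assumption of this sketch, as are the helper names. Correct classification is computed over the non-missing entries only.

```python
import numpy as np

def remove_entries(votes, fraction, rng):
    """Return a copy of the roll call matrix with `fraction` of its entries,
    chosen at random without replacement, set to 0 (= missing)."""
    out = votes.copy()
    n_missing = int(round(fraction * out.size))
    positions = rng.choice(out.size, size=n_missing, replace=False)
    out.flat[positions] = 0
    return out

def percent_correct(observed, predicted):
    """Percent of non-missing observed entries matched by the predictions."""
    mask = observed != 0
    return 100.0 * np.mean(observed[mask] == predicted[mask])

rng = np.random.default_rng(11)
votes = rng.choice([-1, 1], size=(100, 500))   # hypothetical complete matrix
reduced = remove_entries(votes, 0.2, rng)      # 20 percent missing
print("missing entries:", np.sum(reduced == 0))
```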
__________________
Table A6 about Here
__________________
The accuracy of the recovery of the legislator configuration is quite good and only begins to fall off at 70 percent missing entries. With perfect data the procedure unambiguously finds the true dimensionality, and with error there are clear “elbows” at the true dimensionality. Correct classification tends to increase with the percentage of missing data. This happens because, with more missing data, each legislator faces fewer roll call cutting planes, so the legislator’s position is not as constrained as it is with complete data. Indeed, the average largest distance to a cutting plane increases with the level of missing data. This tends to increase the correct classification and decrease the correlation between the true and reproduced legislator configurations. In any event, the results shown in Table A6 suggest that the algorithm will perform well with real-world data at realistic levels of missing entries. In particular, with 20 percent missing data there is no appreciable deterioration in performance.
Appendix References
Eckart, Carl and Gale Young. 1936. “The Approximation of One Matrix by Another of Lower Rank.” Psychometrika, 1:211-218.
Greene, William H. 1993. Econometric Analysis. Englewood Cliffs, N.J.: Prentice Hall.
Manski, Charles F. 1975. “Maximum Score Estimation of the Stochastic Utility Model of Choice.” Journal of Econometrics, 3:205-228.
Manski, Charles F. 1985. “Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator.” Journal of Econometrics, 27:313-333.
Manski, Charles F. and T. Scott Thompson. 1986. “Operational Characteristics of Maximum Score Estimation.” Journal of Econometrics, 32:85-108.
McFadden, Daniel. 1976. “Quantal Choice Analysis: A Survey.” Annals of Economic and Social Measurement, 5:363-390.
Poole, Keith T. and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll Call Voting. New York: Oxford University Press.
Schonemann, Peter H. 1966. “A Generalized Solution of the Orthogonal Procrustes Problem.” Psychometrika, 31:1-10.
Spector, L. and M. Mazzeo. 1980. “Probit Analysis and Economic Education.” Journal of Economic Education, 11:37-44.
Table A1
Monte-Carlo Tests of Cutting Plane Procedure
500 Votes With Normal, Uniform, and Logit Error
(Each Entry Average of 10 Trials, Standard Deviations in Parentheses)
|
s |
p |
Average Percent Error |
Average Majority Margin |
Average Percent Correctly Classified Obs.[a] |
Average Percent Correctly Classified True[b] |
Average Fit With True Normal Vectors All[c] |
Average Fit With True Normal Vectors 10% Min.[d] |
|
1 |
100 |
24.9 (0.4) |
65.9 (0.6) |
77.9 (0.2) |
91.6 (0.5) |
.840 (.035) |
.894 (.015) |
|
1 |
100 |
15.7 (2.6) |
66.5 (0.8) |
86.5 (0.6) |
94.6 (0.3) |
.906 (.026) |
.939 (.007) |
|
2 |
100 |
25.5 (0.2) |
64.1 (0.2) |
78.8 (0.3) |
90.3 (0.3) |
.951 (.004) |
.952 (.003) |
|
2 |
100 |
15.0 (0.6) |
66.4 (0.8) |
89.7 (0.5) |
94.2 (0.2) |
.979 (.004) |
.986 (.001) |
|
3 |
100 |
25.1 |