POLI 279 MEASUREMENT THEORY
Seventh Assignment
Due 9 June 2006


  1. In this problem we are going to use the Parametric Bootstrap version of W-NOMINATE to generate standard errors for the 104th Senate. Download the program, control card file, and data file and place them in the same directory.

    WNOMJLEWIS -- W-NOMINATE Program
    W-NOMINATE is discussed in detail with several examples on the W-NOMINATE Page and the bootstrap file output is discussed in detail on the Parametric Bootstrap page.

    NOMSTART_JLEWIS.DAT looks like this:
    SEN104KH.ORD
     NOMINAL MULTIDIMENSIONAL UNFOLDING OF 104TH SENATE
      919    1   22
        2   36   11               Number Dimensions, Number Characters to Read from Header, Number of Bootstrap Trials
     15.0000  0.5000
      0.0250   20
    (36A1,15000I1)
    (1x,I4,36A1,1X,4i4,51f7.3)
    (I4,1X,36A1,80F10.4)
    It is identical to the NOMSTART.DAT used for question 5.e-g of Homework 6 except for the "11" at the end of the fourth line of the file. This is the number of bootstrap trials. Normally it is set to 1001, but we will use 101 in this problem.

    Put the files in the same directory and run WNOMJLEWIS. You will get 7 output files -- fort.26, fort.56, NOM21.DAT, NOM23.DAT, NOM31.DAT, NOM33.DAT, and NOM36.DAT. The NOM21.DAT - NOM36.DAT files are explained on the W-NOMINATE Page.

    FORT.26 contains the parametric bootstrapped legislator coordinates. The file will look something like this:
    
       1 1049990999 0USA     10000CLINTON       -0.9160   -0.3336   -0.8668   -0.2853    0.1039    0.2095    1.0000    0.1436    0.1436    1.0000
       2 1041470541 0ALABAMA 10000HEFLIN        -0.3923   -0.2917   -0.3546   -0.4203    0.0510    0.1876    1.0000    0.7794    0.7794    1.0000
       3 1049465941 0ALABAMA 20000SHELBY         0.5858   -0.3131    0.6871   -0.3176    0.1162    0.0603    1.0000    0.2238    0.2238    1.0000
       4 1041490781 0ALASKA  20000MURKOWSKI      0.6600   -0.1009    0.7623   -0.0338    0.1138    0.1090    1.0000    0.0790    0.0790    1.0000
       5 1041210981 0ALASKA  20000STEVENS        0.4199   -0.3162    0.5695   -0.3110    0.1645    0.1222    1.0000    0.1487    0.1487    1.0000
              etc etc etc
      98 1044930873 0WASHING 10000MURRAY        -0.9235   -0.1667   -0.8936   -0.1832    0.0555    0.0916    1.0000   -0.1436   -0.1436    1.0000
      99 104 136656 0WEST VI 10000BYRD, ROBER   -0.6504   -0.5292   -0.6249   -0.6165    0.0465    0.1304    1.0000    0.1703    0.1703    1.0000
     100 1041492256 0WEST VI 10000ROCKEFELLER   -0.8080   -0.0714   -0.7819   -0.0967    0.0416    0.1585    1.0000    0.6202    0.6202    1.0000
     101 1044930925 0WISCONS 10000FEINGOLD      -0.8300    0.5577   -0.7844    0.6024    0.0546    0.0564    1.0000    0.6076    0.6076    1.0000
     102 1041570325 0WISCONS 10000KOHL          -0.6772    0.5656   -0.6449    0.7452    0.0395    0.1934    1.0000    0.5894    0.5894    1.0000
     103 1041471068 0WYOMING 20000SIMPSON        0.3605   -0.4309    0.5124   -0.4402    0.1725    0.1322    1.0000    0.4797    0.4797    1.0000
     104 1041563368 0WYOMING 20000THOMAS         0.7165    0.3480    0.7850    0.3535    0.0824    0.1281    1.0000    0.5629    0.5629    1.0000
    The legislator coordinates are the first two columns of numbers after the name of the legislator. For example, former President Clinton's coordinates are -0.9160 and -0.3336.

    1. Run 101 bootstrap trials using WNOMJLEWIS (change 11 to 101 in the fourth line of the NOMSTART_JLEWIS.DAT file and run the program). E-mail me the NOM21.DAT file from the run.

    2. Use R to plot the legislators in two dimensions from the FORT.26 file. Use "D" for Non-Southern Democrats, "S" for Southern Democrats, "R" for Republicans, and "P" for President Clinton. This graph should be in the same format as the one you did for question 2.f of Homework 5.

    3. The fifth and sixth columns of numbers after the name (0.1039 and 0.2095 for Clinton; read as se1 and se2 by the R program in the next item) are the bootstrapped standard errors for the first and second dimensions, respectively. The last four columns are the 2 by 2 matrix of Pearson correlations between the bootstrapped first and second dimension coordinates, computed across the bootstrap trials. Because we are working with only two dimensions, only the correlation between the first and second dimension estimates is relevant. It is the off-diagonal entry, which appears twice (0.1436 for Clinton).
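
To make the meaning of these columns concrete, here is a minimal sketch of how a standard error and a correlation can be computed across bootstrap trials for a single legislator. The draws are simulated and every number is hypothetical. The sketch uses the plain sample standard deviation; the assignment does not state the exact formula WNOMJLEWIS uses (for example, whether deviations are taken around the bootstrap mean or the original point estimate), so treat this as an illustration rather than a replication.

```r
# Sketch only: simulated bootstrap draws for ONE hypothetical legislator.
set.seed(279)
n.trials <- 101

# Hypothetical draws of the first and second dimension coordinates,
# one value per bootstrap trial (all numbers are made up)
dim1 <- rnorm(n.trials, mean = -0.39, sd = 0.05)
dim2 <- -0.29 + 0.5 * (dim1 + 0.39) + rnorm(n.trials, sd = 0.18)
draws <- cbind(dim1, dim2)

# Spread of the draws on each dimension (analogous to se1 and se2)
se <- apply(draws, 2, sd)

# 2 x 2 Pearson correlation matrix across trials
# (analogous to the last four columns: r11, r12, r21, r22)
r <- cor(draws)

print(round(se, 4))
print(round(r, 4))
```

The diagonal of r is always 1 and the two off-diagonal entries are equal, which matches the pattern of the last four columns in the FORT.26 rows shown above.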

    4. We are going to make a graph like the ones shown in Figures 4.4 and 4.5 of my book. Download the following R program:

      Plot_Bootstrap_New.r -- R Program to Plot Parametric Bootstrap Output, FORT.26

      Here is what Plot_Bootstrap_New.r looks like:
      #
      #  Plot_Bootstrap_New.r -- Program reads parametric bootstrap files posted
      #                      at http://voteview.org/Lewis_and_Poole.htm
      #                      and plots the legislator ideal points and the
      #                      standard errors
      #
      #  Remove all objects just to be safe
      #
      rm(list=ls(all=TRUE))
      #
      library(MASS)
      library(stats)
      library(ellipse)   # You will need to download and install this library
      #
      #  Set up to read Parametric Bootstrap File
      #
      rc.file <- "c:/ucsd_homework_7/fort.26"
      #
      # The variable fields and their widths
      #
      rc.fields <- c("counter","cong","id","state","dist","lstate","party",
                         "eh1","eh2","name","wnom1","wnom2","wnom1bs","wnom2bs",
                         "se1","se2","r11","r12","r21","r22") 
      #
      #  Note -- For some files the field widths will be (e.g., H108_BS_1000_2.DAT):
      #                  (5,3,5,2,2,7,4,1,1,11,8,7,7,7,7,7,7)
      #
      rc.fieldWidths <- c(4,4,5,2,2,7,4,1,1,11,10,10,10,10,10,10,10,10,10,10)
      #
      # Read the vote data from fwf (FIXED WIDTH FORMAT -- FWF)
      #
      TT <- read.fwf(file=rc.file,widths=rc.fieldWidths,as.is=TRUE,col.names=rc.fields)
      party <- TT[,7]
      state <- TT[,4]
      wnom1 <- TT[,11]
      wnom2 <- TT[,12]
      std1 <- TT[,15]
      std2 <- TT[,16]
      corr12 <- TT[,18]
      #
      nrow <- length(TT[,1])
      ncol <- length(TT[1,])
      #
      plot(TT[,11],TT[,12],type="n",asp=1,
             main="",
             xlab="",
             ylab="",
             xlim=c(-1.0,1.0),ylim=c(-1.0,1.0),font=2)
      points(wnom1[party == 100 & state >= 40 & state <= 51],wnom2[party == 100 & state >= 40 & state <= 51],pch='S',col="red")
      points(wnom1[party == 100 & state == 53],wnom2[party == 100 & state == 53],pch='S',col="red")
      points(wnom1[party == 100 & state == 54],wnom2[party == 100 & state == 54],pch='S',col="red")
      points(wnom1[party == 100 & (state < 40 | state > 54)],wnom2[party == 100 & (state < 40 | state > 54)],pch='D',col="red")
      points(wnom1[party == 100 & state == 52],wnom2[party == 100 & state == 52],pch='D',col="red")
      points(wnom1[party == 200],wnom2[party == 200],pch='R',col="blue")
      # Main title
      mtext("104th Senate From W-NOMINATE\nWith Bootstrapped Standard Errors",side=3,line=1.50,cex=1.2,font=2)
      # x-axis title
      mtext("Liberal - Conservative",side=1,line=2.75,cex=1.2)
      # y-axis title
      mtext("Social/Lifestyle Issues",side=2,line=2.5,cex=1.2)
      #
      #
      #  This code does the cross-hairs for the standard errors.  If the
      #       absolute value of the correlation between the two dimensions
      #       is greater than .15, the 95% confidence ellipse is shown
      #
      for (i in 1:nrow) {
      #
      #  These two statements do the cross-hairs
      #
      lines(c(wnom1[i],wnom1[i]),c(wnom2[i]-1.96*std2[i],wnom2[i]+1.96*std2[i]),col="gray")
      lines(c(wnom1[i]-1.96*std1[i],wnom1[i]+1.96*std1[i]),c(wnom2[i],wnom2[i]),col="gray")
      #
      #  This if statement does the ellipse
      #
        if (abs(corr12[i]) > .15){
           lines(ellipse(x=corr12[i],scale=c(std1[i],std2[i]),
           centre=c(wnom1[i],wnom2[i])),
           col="gray")
        }
      }
      #
      Run this program and turn in the plot. It should look similar to this:



    5. FORT.56 contains the average Optimal Classification coordinates generated by running the Optimal Classification Program on every bootstrap draw. The file will look something like this:
      
         1 1049990999 0USA     10000CLINTON       -0.3160   -0.7979   -0.8467   -0.2906    0.5711    0.5775    0.1089    0.2070    1.0000   -0.8322   -0.8322    1.0000
         2 1041470541 0ALABAMA 10000HEFLIN        -0.2219    0.0278    0.0011   -0.2066    0.3139    0.3403    0.1975    0.2220    1.0000   -0.9512   -0.9512    1.0000
         3 1049465941 0ALABAMA 20000SHELBY         0.6878   -0.0150    0.7162   -0.1751    0.0633    0.1796    0.0529    0.0582    1.0000   -0.3841   -0.3841    1.0000
         4 1041490781 0ALASKA  20000MURKOWSKI      0.7285   -0.0996    0.7283   -0.0236    0.0781    0.0929    0.0741    0.0445    1.0000   -0.5790   -0.5790    1.0000
         5 1041210981 0ALASKA  20000STEVENS        0.6308   -0.1836    0.6209   -0.2080    0.0682    0.0844    0.0639    0.0763    1.0000   -0.4706   -0.4706    1.0000
                     etc   etc   etc
        98 1044930873 0WASHING 10000MURRAY        -0.8772   -0.0015   -0.9068   -0.1126    0.0506    0.1520    0.0378    0.0919    1.0000    0.0408    0.0408    1.0000
        99 104 136656 0WEST VI 10000BYRD, ROBER   -0.6919   -0.5972   -0.6128   -0.3697    0.0959    0.2762    0.0448    0.1300    1.0000   -0.7018   -0.7018    1.0000
       100 1041492256 0WEST VI 10000ROCKEFELLER   -0.7673   -0.0990   -0.7713   -0.1318    0.0585    0.0952    0.0554    0.0842    1.0000    0.2288    0.2288    1.0000
       101 1044930925 0WISCONS 10000FEINGOLD      -0.9161    0.3985   -0.8439    0.3498    0.0901    0.1056    0.0457    0.0875    1.0000    0.7287    0.7287    1.0000
       102 1041570325 0WISCONS 10000KOHL          -0.6915    0.2468   -0.7519    0.4653    0.1107    0.2343    0.0859    0.0404    1.0000   -0.2221   -0.2221    1.0000
       103 1041471068 0WYOMING 20000SIMPSON        0.5253   -0.1192    0.5113   -0.0559    0.0463    0.1029    0.0417    0.0743    1.0000   -0.3008   -0.3008    1.0000
       104 1041563368 0WYOMING 20000THOMAS         0.7193    0.1231    0.7215    0.1730    0.0458    0.1302    0.0434    0.1130    1.0000   -0.4295   -0.4295    1.0000
      The average legislator coordinates are the third and fourth columns of numbers after the name of the legislator. For example, former President Clinton's coordinates are -0.8467 and -0.2906.

      Use R to plot the legislators in two dimensions from the FORT.56 file. Use "D" for Non-Southern Democrats, "S" for Southern Democrats, "R" for Republicans, and "P" for President Clinton. This graph should be in the same format as the one you did for question 2.f of Homework 5.
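
One possible way to pick the plotting token for each row is sketched below. The Southern/Non-Southern split copies the state-code conditions used in Plot_Bootstrap_New.r. The function name assign.token is hypothetical, and treating state code 99 as the President is an assumption read off the Clinton row in the sample listings, not something the assignment states.

```r
# Sketch: choose a plotting token from the party and state codes.
# assign.token is a hypothetical helper, not part of any posted program.
assign.token <- function(party, state) {
  # Same Southern definition as the points() calls in Plot_Bootstrap_New.r
  south <- (state >= 40 & state <= 51) | state == 53 | state == 54
  token <- rep(NA_character_, length(party))
  token[party == 100 & !south] <- "D"   # Non-Southern Democrats
  token[party == 100 & south]  <- "S"   # Southern Democrats
  token[party == 200]          <- "R"   # Republicans
  # ASSUMPTION: state code 99 identifies the President's row
  token[state == 99]           <- "P"
  token
}

# Codes read off the sample rows for Clinton, Heflin, Shelby,
# Murkowski, and Feingold
party  <- c(100, 100, 200, 200, 100)
state  <- c( 99,  41,  41,  81,  25)
tokens <- assign.token(party, state)
print(tokens)
```

The resulting vector can be passed as the labels argument of text() in place of the separate points() calls. Note also that the FORT.56 rows shown above have twelve columns of numbers after the name rather than ten, so the field names and widths used for FORT.26 would presumably need two additional entries before read.fwf can read FORT.56.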

    6. Write an Epsilon keyboard macro as a text file that combines the legislator coordinates from FORT.26 with those from FORT.56. Assume that the macro begins with FORT.26 in the top window, FORT.56 in the second window, and the combined file in the third window (see question 1.a of Homework 5). Leave the header on each record. Turn in a listing of the macro and a neatly formatted listing of the file.

    7. Let A be the matrix of legislator coordinates from FORT.26 after subtracting off the column means, and let B be the matrix of legislator coordinates from FORT.56 after subtracting off the column means. Note that subtracting off the column means of both matrices centers both at the origin, (0.0, 0.0). Solve for the orthogonal procrustes rotation matrix, T, for B. Namely, we want to minimize:

      L(T) = tr(A - BT)(A - BT)'

      The solution is:

      T = VU' where

      A'B = ULV'

      where ULV' is the Singular Value Decomposition of A'B (see Borg and Groenen, pp. 430-432).

      In R you can perform the decomposition with the svd command. For example:

      C <- t(A)%*%B
      svddecomp <- svd(C)


      svddecomp$u has the matrix U
      svddecomp$v has the matrix V
      svddecomp$d has the diagonal of L

      Note that you can check your work as we discussed in class by doing the following:

      D <- diag(svddecomp$d)
      U <- svddecomp$u
      V <- svddecomp$v
      ABCHECK <- U%*%D%*%t(V)
      errorcheck <- sum((C-ABCHECK)^2)


      Solve for T and turn in a neatly formatted listing. Compute the Pearson r-squares between the corresponding columns of A and B before and after rotating B.
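
The steps above can be sketched end to end on made-up data. In this sketch A.raw and B.raw are simulated stand-ins for the FORT.26 and FORT.56 coordinates (B is a rotated, shifted, noisy copy of A), so none of the numbers correspond to the 104th Senate. The rotation matrix is stored as T.rot rather than T because T is shorthand for TRUE in R.

```r
# Sketch only: orthogonal Procrustes rotation on simulated coordinates.
set.seed(279)
n <- 104

# Hypothetical stand-ins for the two sets of legislator coordinates
A.raw <- matrix(runif(n * 2, -1, 1), ncol = 2)
theta <- pi / 6
R     <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
B.raw <- A.raw %*% R + matrix(rnorm(n * 2, sd = 0.05), ncol = 2) + 0.1

# Subtract off the column means so both matrices are centered at the origin
A <- scale(A.raw, center = TRUE, scale = FALSE)
B <- scale(B.raw, center = TRUE, scale = FALSE)

# SVD of A'B = ULV', then T = VU'
C         <- t(A) %*% B
svddecomp <- svd(C)
T.rot     <- svddecomp$v %*% t(svddecomp$u)

# Rotate B and compare it with A
B.rot <- B %*% T.rot
loss  <- function(X, Y) sum((X - Y)^2)   # tr(X - Y)(X - Y)'

# Pearson r-squares between corresponding columns, before and after rotation
rsq.before <- diag(cor(A, B))^2
rsq.after  <- diag(cor(A, B.rot))^2

print(round(T.rot, 4))
print(round(c(loss.before = loss(A, B), loss.after = loss(A, B.rot)), 4))
print(round(rbind(rsq.before, rsq.after), 4))
```

A quick sanity check is that t(T.rot) %*% T.rot is the identity matrix (up to rounding) and that the loss after rotation is no larger than the loss before.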

  2. In this problem we are going to use Simon Jackman's Bayesian MCMC Quadratic-Normal scaling program IDEAL.

    Go to the IDEAL beta website and download rollcall_0.3.3.zip. (If the site is down, rollcall_0.3.3.zip is here.) Install the package using Install Package From Local Zip File in the Packages drop-down menu in R.

    Download the R programs below along with the Rdata file for the 104th Senate roll calls and put them in the same directory:

    idealKeith.r -- Program to run IDEAL on the 104th Senate.
    idealKeith.r looks like this:
    #
    #  idealKeith.r -- Implements Simon's IDEAL in R
    #
    rm(list=ls(all=TRUE))
    #
    library(rollcall)
    load("C:/ucsd_homework_7/s104.Rdata")
    rc <- rollcall(s104)
    csts <- constrain.legis(rc,x=list("KENNEDY, ED"=-1, "HELMS"=1),d=1)  # This sets up the constraints
    kpideal <- ideal(rc, priors=csts, startvals=csts, store.item=TRUE)
    sumkpideal <- summary(kpideal,include.beta=TRUE)
    write.table(sumkpideal$x.quant,"c:/ucsd_homework_7/tab_simon2_104a.txt")
    write.table(sumkpideal$beta.quant,"c:/ucsd_homework_7/tab_simon3_104a.txt")
    #
    The tab_simon2_104a.txt file has the estimated legislator ideal points in a format similar to those from MCMCPack that you estimated for question 1 of Homework 5. The file should look something like this:
    "Posterior Mean" "2.5%" "97.5%"
    "HEFLIN" -0.260539161797188 -0.324277175440612 -0.199695576187364
    "SHELBY" 1.23019206655678 1.06776517589545 1.36349750197643
    "MURKOWSKI" 1.63087942782254 1.40849022832179 1.84561135364064
    "STEVENS" 1.04610545982508 0.896182928917575 1.17542884158099
    "KYL" 2.37619418505450 2.08958357054717 2.75349078670169
           etc etc etc
    "ROCKEFELLER" -1.10508436917944 -1.25041868040856 -0.982633707475099
    "FEINGOLD" -1.03161420524957 -1.14931288681370 -0.923294768025014
    "KOHL" -0.792982659833952 -0.884197325699718 -0.720465242777936
    "SIMPSON" 0.912732416007879 0.815117422171576 1.02451977202073
    "THOMAS" 1.76901626257382 1.58540354678476 1.96376294419763
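
If you want to inspect this file in R before formatting it, a file written by write.table can be read back with read.table. The sketch below pastes in the first five rows shown above instead of opening the file, and the object names tab.text and tab are arbitrary. Because the header line has one fewer field than the data lines, read.table uses the first field of each row (the Senator's name) as the row name.

```r
# Sketch: read write.table() output back into R.
# The text below is the first five rows of the sample listing above.
tab.text <- '"Posterior Mean" "2.5%" "97.5%"
"HEFLIN" -0.260539161797188 -0.324277175440612 -0.199695576187364
"SHELBY" 1.23019206655678 1.06776517589545 1.36349750197643
"MURKOWSKI" 1.63087942782254 1.40849022832179 1.84561135364064
"STEVENS" 1.04610545982508 0.896182928917575 1.17542884158099
"KYL" 2.37619418505450 2.08958357054717 2.75349078670169'

# check.names = FALSE keeps the column names exactly as written in the file
tab <- read.table(text = tab.text, header = TRUE, check.names = FALSE)

# Width of the interval between the 2.5% and 97.5% quantiles
tab$width <- tab[["97.5%"]] - tab[["2.5%"]]

print(round(tab, 4))
```

To read the actual output, replace the text argument with the file path used in idealKeith.r, for example read.table("c:/ucsd_homework_7/tab_simon2_104a.txt", header = TRUE, check.names = FALSE).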
    1. Write a macro similar to the one you used in question 1.a. and 1.b. of Homework 5 to make a nicely formatted file of the legislator coordinates. Turn in a copy of this macro.

    2. Replicate question 1.c. of Homework 5. Write an R program that graphs the rank ordering from Optimal Classification (horizontal axis) against the IDEAL medians (vertical axis). Label the axes appropriately and label a few of the Senators including Campbell (D-CO) and Campbell (R-CO).

    3. Report the correlation between the OC rank ordering and the IDEAL medians.

    4. Use Epsilon to combine the file you created in item 1 of this problem with that from question 1.a. and 1.b. of Homework 5 and include in that file the 2.5% and 97.5% quantiles corresponding to the medians of both procedures. Report the Pearson correlation between the lengths of the "confidence intervals" for the two Bayesian procedures and also report the Pearson correlation between the IDEAL medians and MCMCPack medians.
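
Once the combined file exists, the two correlations can be computed along the following lines. This is only a sketch: the data frame and its column names are hypothetical, the IDEAL values are rounded from the sample listing above, and the MCMCPack values are invented purely for illustration.

```r
# Sketch only: correlations from a hypothetical combined file.
combined <- data.frame(
  name      = c("HEFLIN", "SHELBY", "MURKOWSKI", "STEVENS", "KYL"),
  # IDEAL columns: rounded from the sample tab_simon2_104a.txt rows
  ideal.med = c(-0.26, 1.23, 1.63, 1.05, 2.38),
  ideal.lo  = c(-0.32, 1.07, 1.41, 0.90, 2.09),
  ideal.hi  = c(-0.20, 1.36, 1.85, 1.18, 2.75),
  # MCMCPack columns: MADE-UP numbers, not real output
  mcmc.med  = c(-0.55, 2.41, 3.30, 2.05, 4.70),
  mcmc.lo   = c(-0.70, 2.10, 2.85, 1.78, 4.05),
  mcmc.hi   = c(-0.41, 2.75, 3.78, 2.36, 5.45)
)

# Lengths of the intervals between the 2.5% and 97.5% quantiles
ideal.len <- combined$ideal.hi - combined$ideal.lo
mcmc.len  <- combined$mcmc.hi  - combined$mcmc.lo

# Pearson correlations (method = "pearson" is the default for cor)
r.len <- cor(ideal.len, mcmc.len)
r.med <- cor(combined$ideal.med, combined$mcmc.med)

print(round(c(interval.lengths = r.len, medians = r.med), 4))
```

Pearson correlations are unaffected by rescaling either variable by a positive constant, so the fact that the two procedures place Kennedy and Helms at different values does not change either correlation.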

    5. (20 Points Extra Credit) Write an R program that reads the above file and produces a two dimensional plot of the legislator ideal points where the horizontal dimension coordinates are the IDEAL medians and the vertical dimension coordinates are the MCMCPack medians. Put cross-hairs through the points with lengths equal to the corresponding distance between the 2.5% and 97.5% quantiles. Plot Northern Democrats with "D" tokens, Southern Democrats with "S" tokens, and Republicans with "R" tokens. Note that you might want to divide all of the MCMCPack quantile values by 2 because IDEAL has Kennedy/Helms at -1/+1 and MCMCPack has Kennedy/Helms at -2/+2.

  3. In this problem we will continue our comparison of Optimal Classification, IDEAL, and MCMCPack using the 102nd Senate. Download these data files:

    Sen102kh.ord -- ASCII Data file for 102nd Senate (used with PERFL.EXE)
    Sen102kh.dta -- Stata Data file for 102nd Senate (used with keith2.r)
    S102.Rdata -- IDEAL Data file for 102nd Senate (used with idealKeith.r)

    and place them in the appropriate directories where you have their associated programs.

    1. Run Optimal Classification (PERFL.EXE) in one dimension on SEN102KH.ORD. There were 550 roll calls in the 102nd Senate. Note that the Optimal Classification Program Page has detailed instructions on how to set up PERFSTRT.DAT. Turn in a copy of PERFSTRT.DAT and PERF21.DAT.

    2. Run keith2.r to generate sen102kh.rda and then run keithMCMC.r to get the Senator medians and 2.5% and 97.5% quantiles. Combine the rank ordering from Optimal Classification with the tab2.txt output from MCMCPack and graph the rank ordering (horizontal axis) against the Bayesian MCMC medians (vertical axis) as you did for Question 2.c of Homework 5.

    3. Report the Pearson correlation between the OC rank ordering and the MCMCPack medians.

    4. Graph the rank ordering from Optimal Classification (horizontal axis) against the IDEAL medians (vertical axis).

    5. Report the correlation between the OC rank ordering and the IDEAL medians.

    6. Report the Pearson correlation between the lengths of the "confidence intervals" (2.5%, 97.5%) for the two Bayesian procedures and also report the Pearson correlation between the IDEAL medians and MCMCPack medians.

  4. In this problem we are going to partially replicate Problem 5 of Homework 4 using the 104th Senate -- SEN104KH.ORD. Place HOUSYM3.EXE, SEN104KH.ORD, and SYMSTRT3.DAT in the same directory. Use Epsilon to change SYMSTRT3.DAT so that HOUSYM3.EXE reads SEN104KH.ORD and writes SEN104.DAT. Note that there are 919 roll calls in SEN104KH.ORD.

    1. Turn in a copy of the HOUSYM3.DAT output file.

    2. Turn in a plot of the eigenvalues. Use R to do the plot.

    3. Use Epsilon to enter the following commands on top of your agreement score output file:
      
      TORSCA
      PRE-ITERATIONS=3
      DIMMAX=3,DIMMIN=1
      COORDINATES=ROTATE
      ITERATIONS=50
      REGRESSION=DESCENDING
      DATA,LOWERHALFMATRIX,DIAGONAL=PRESENT,CUTOFF=.01
      104TH U.S. SENATE AGREEMENT SCORES
      104  1  1
      (36X,104F4.0)
      *********agreement score file*********
      COMPUTE
      STOP
      Be sure to put the COMPUTE and STOP lines on the bottom of the agreement score file.

      Run KYST on this file and report the STRESS values for one, two, and three dimensions.

    4. Use Epsilon to combine the one-dimensional KYST coordinates with the OC rank orderings, the MCMCPack medians, and the IDEAL medians with the headers on the file. Turn in a neatly formatted listing of this file. Compute the 4 by 4 matrix of Pearson correlations and turn in a neatly formatted and clearly labeled table of these correlations.
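
As a rough sketch of the final step, the 4 by 4 matrix can be obtained by applying cor to a data frame with one column per procedure. The data below are simulated from a single made-up latent dimension and the column names are hypothetical, so the printed correlations say nothing about the real 104th Senate results.

```r
# Sketch only: 4 x 4 Pearson correlation matrix on simulated scores.
set.seed(279)
n      <- 104
latent <- rnorm(n)   # made-up underlying dimension

scores <- data.frame(
  kyst  = latent + rnorm(n, sd = 0.20),         # stand-in for KYST coordinates
  oc    = rank(latent + rnorm(n, sd = 0.20)),   # stand-in for OC rank ordering
  mcmc  = 2 * latent + rnorm(n, sd = 0.30),     # stand-in for MCMCPack medians
  ideal = latent + rnorm(n, sd = 0.15)          # stand-in for IDEAL medians
)

# Pearson correlations between every pair of columns
cormat <- cor(scores)
print(round(cormat, 3))
```

With real output, keep in mind that the direction of a scaling dimension is arbitrary, so a coordinate set such as KYST's may come out with liberals on the right; in that case its correlations with the other three measures will be negative but equally strong in absolute value.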

    5. Any thoughts about what the above table of correlations tells us about this enterprise as a cumulative science?