45-734 Probability and Statistics II
(4th Mini AY 1997-98 Flex-Mode and Flex-Time)
Assignment #5: Due 23 April 1998
In this first problem we are going to look at voting for President
in the 1992 and 1996 elections. The data are in:
There are 407 observations in the dataset corresponding to the 407
Congressional Districts that were not redistricted between 1993 and 1997
(recall that there are 435 members of the U.S. House of Representatives).
These are the districts created after the 1990 Census and first implemented in
the 1992 elections. Subsequently, various Federal Court rulings invalidated
the district boundaries of 28 districts. This leaves us with 407 districts that
are the same for the 1992 and 1996 Presidential elections. By law,
the population in a congressional district must be as close as possible to
being equal to: (total population of United States)/435. Hence, in the
analyses below -- for purposes of interpreting coefficients --
you can assume that population is uniformly distributed over the 407
districts.
The variables in the dataset are: AFRAM, percent African-American
population in the Congressional district; BUSH88, percent voting for
George Bush for President in 1988; BUSH92, percent voting for George
Bush for President in 1992; CLINT92, percent voting for Bill Clinton
for President in 1992; CLINT96, percent voting for Bill Clinton in
1996; DOLE96, percent voting for Bob Dole in 1996;
DUK88, percent voting for Michael Dukakis for President in 1988;
HISP, percent Hispanic; INCOME, median income in the district in
thousands of 1996 dollars; LCECON103, for members of the 103rd
House (1993-94), a measure of liberalism/conservatism
on economic issues that ranges from -1 (liberal) to +1 (conservative);
LCECON104, liberalism/conservatism on economic issues for members of the
104th House (1995-96); LCSOC103, for members
of the 103rd House, a measure of liberalism/conservatism
on social issues (e.g., abortion, gay marriage) that ranges from -1 (liberal) to +1
(conservative); LCSOC104, liberalism/conservatism on social issues for
members of the 104th House; PEROT92,
percent voting for Ross Perot for President in 1992; PEROT96, percent
voting for Ross Perot in 1996; REP103, REP104, and REP105 are
indicator variables that equal 1 if the representative of the district is a
Republican, and 0 if the representative is a Democrat; and SOUTH, an
indicator variable that is equal to 1 if the Congressional District is in a
Southern state (the 11 States of the Confederacy plus Kentucky and Oklahoma).
Note that there are three types of variables: 1) the percentage vote for
the various Presidential candidates (BUSH88, BUSH92, CLINT92, CLINT96,
DOLE96, DUK88, PEROT92, PEROT96; these will be our dependent variables);
2) demographic variables for the Congressional District (AFRAM, HISP,
INCOME, SOUTH); and 3) variables measuring personal characteristics
(ideology and party) of
the district's representative in the House of Representatives (LCECON103,
LCECON104, LCSOC103, LCSOC104, REP103, REP104, REP105).
The basic theory we are going to test is that the presidential vote is a
function of demographics, ideology, and political party. It is well known
that in American politics, ceteris paribus, African-Americans,
(non-Cuban) Hispanics, economic liberals, and social liberals tend
to favor candidates of the Democratic party; people with higher incomes,
Southerners, economic conservatives, and social conservatives
tend to favor the Republican party.
(a) Test the basic theory on the 1992 Presidential election.
Run separate regressions for Clinton, Bush, and Perot (use LCECON103 and
LCSOC103 for the ideology variables, do not use party indicator
variables) and interpret the coefficients. Compare the Perot
coefficients with those for Clinton and Bush. What sort of voters did
Perot draw his support from and who did he hurt more -- Clinton or Bush?
What do the relative magnitudes of the ideological coefficients tell you
about American politics?
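As a sketch of one way to read the specification described above (the coefficient symbols are notation introduced here for illustration, not taken from the assignment), the Clinton regression for 1992 could be written as:

```latex
\begin{align*}
\text{CLINT92}_i = {} & \beta_0 + \beta_1\,\text{AFRAM}_i + \beta_2\,\text{HISP}_i
                       + \beta_3\,\text{INCOME}_i + \beta_4\,\text{SOUTH}_i \\
                     & + \beta_5\,\text{LCECON103}_i + \beta_6\,\text{LCSOC103}_i
                       + \varepsilon_i, \qquad i = 1, \dots, 407
\end{align*}
```

The Bush and Perot regressions would use the same right-hand side with BUSH92 or PEROT92 as the dependent variable and their own coefficients. Under this reading, a coefficient such as \beta_1 is the change in Clinton's vote share, in percentage points, associated with a one percentage point increase in AFRAM, holding the other variables fixed.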
The data set (courtesy of Dennis Epple):
(b) Test the basic theory on the 1996 Presidential election.
Run separate regressions for Clinton, Dole, and Perot (use LCECON104 and
LCSOC104 for the ideology variables, do not use party indicator
variables) and interpret the coefficients. Compare the coefficients
with those for 1992 (Clinton vs. Clinton, Bush vs. Dole, Perot vs. Perot).
Compare the Perot
coefficients with those for Clinton and Dole. What sort of voters did
Perot draw his support from and who did he hurt more -- Clinton or Dole?
(c) The confrontation between the Congressional Republicans and
President Clinton that shut down the government off and on from November
of 1995 to January 1996 is alleged to have hurt the Republicans politically.
In the 1996 elections the Republican majority in the House was cut from
236 seats to 228 seats (218 are needed for control). For this sample of
407 House districts, 20 districts switched from Republican
to Democrat in the 1996 elections and 12 districts switched from Democrat to
Republican. Create indicator variables for congressional
districts that switched parties due to the 1996 elections (one for seats
that switched Republican to Democrat; and one for seats that switched Democrat
to Republican). You can use the indicator variables REP104 and
REP105 to create the party switch indicator variables. Add these
party switch indicator variables to the regressions you ran for part (b)
and interpret the coefficients. Do a Wald test on the party switch
indicator variables constraining one coefficient to equal the negative of
the other coefficient. What does this test tell you?
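The construction of the switch indicators and the restriction being tested can be sketched in Python as follows. This is only an illustration under stated assumptions: the names R_TO_D and D_TO_R are hypothetical, the coefficient estimates, variances, and covariance would have to come from your own regression output, and the statistic shown is the large-sample chi-square form of the Wald test with one restriction.

```python
import math

def switch_indicators(rep104, rep105):
    """Build the two party-switch indicators from 0/1 lists REP104 and REP105.

    R_TO_D = 1 if the seat was Republican in the 104th House and Democrat in the 105th.
    D_TO_R = 1 if the seat was Democrat in the 104th House and Republican in the 105th.
    (R_TO_D and D_TO_R are hypothetical names, not variables in the dataset.)
    """
    r_to_d = [a * (1 - b) for a, b in zip(rep104, rep105)]
    d_to_r = [(1 - a) * b for a, b in zip(rep104, rep105)]
    return r_to_d, d_to_r

def wald_sum_zero(b1, b2, var1, var2, cov12):
    """Wald statistic and p-value for H0: b1 = -b2, i.e. b1 + b2 = 0.

    b1, b2       : estimated coefficients on the two switch indicators
    var1, var2   : their estimated variances
    cov12        : their estimated covariance
    """
    w = (b1 + b2) ** 2 / (var1 + var2 + 2 * cov12)
    # Upper tail of a chi-square with 1 degree of freedom: P(X > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2))
    return w, p_value
```

A small p-value would suggest that the two kinds of switch are not mirror images of each other; a large one is consistent with effects that are equal in size and opposite in sign.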
The second data set contains information about condominium prices and
characteristics for an area of central Boston that is much sought after.
(The data were compiled and analyzed by Denise DiPasquale and William
Wheaton in their book Urban Economics and Real Estate Markets.)
The data include sale price (PRICE), floor area (SQFT),
number of bedrooms
(BED), number of bathrooms (BATH), number of stories in
the building (STORY),
distance in feet from the Boston Common (CDIST),
an indicator variable denoting the
availability of parking in the building (PARK), and
indicator variables denoting the street on which the unit is located:
MDUM if on Marlborough Street,
BDUM if on Beacon Street, CDUM if on Commonwealth Ave.
All units not on
one of these streets are on Beacon Hill.
(a) If price is proportional to floor area, explain why the
following two regression equations are equivalent.
(b) Estimate the above regressions. Notice that the coefficients
of the variables are approximately the same for both the regressions.
Which model is better and why? Why is the R^2 statistic so
(c) Suppose that you own a parcel of land on Marlborough Street
exactly one-half mile (2640 feet) from the Boston Common and you wish to
construct a condominium building. Suppose that you have
decided to build a condominium with all units having three bedrooms, two
bathrooms, and parking in the building. Suppose that construction cost
per square foot increases with the number of floors according to the
following equation: Cost = 40*STORY + 2*STORY^2. Revenue per
square foot per floor is given by PRICE/SQFT. Hence, revenue per square
foot with more than one floor is given by STORY*(PRICE/SQFT). Using the
appropriate regression, how many stories would you build to maximize the
profit per square foot of developed property? That is, find the number
of stories to maximize:
profit = STORY*(PRICE/SQFT) - (40*STORY + 2*STORY^2)
Hint: Substitute the values for the variables given above into
your estimated equation for price per square foot. Note that the only
unknown in the resulting expression will then be the number of stories.
You can use the GENR command of EVIEWS to compute profit for
each possible value of STORY and find the value that maximizes it.
An easy way to start this process is to define a new variable called
FLOORS. (The data set already has a variable named STORY and you
don't want to
confuse your new variable with STORY!) Let the first observation have the
value 1, the second 2, and so forth. Then plug that into the equation.
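The same search over FLOORS can be sketched outside EVIEWS, for instance in Python. This sketch assumes that, once the given values are substituted, your fitted price per square foot is linear in the number of stories (intercept + slope*FLOORS); that form is an assumption, and the intercept and slope must come from your own estimated equation. The function names and the upper limit of 50 floors are hypothetical choices made for illustration.

```python
def profit_per_sqft(floors, intercept, slope):
    """Profit for a building with `floors` stories.

    intercept, slope : describe the fitted PRICE/SQFT as intercept + slope * floors
                       (assumed linear form; take the values from your regression).
    """
    price_per_sqft = intercept + slope * floors
    revenue = floors * price_per_sqft          # STORY * (PRICE/SQFT)
    cost = 40 * floors + 2 * floors ** 2       # 40*STORY + 2*STORY^2
    return revenue - cost

def best_floors(intercept, slope, max_floors=50):
    """Try FLOORS = 1, 2, ..., max_floors and return the profit-maximizing value."""
    candidates = range(1, max_floors + 1)
    return max(candidates, key=lambda f: profit_per_sqft(f, intercept, slope))
```

For example, `best_floors(a, b)` with your own substituted intercept `a` and STORY coefficient `b` returns the integer number of stories with the highest profit among those tried; checking the neighboring values with `profit_per_sqft` shows how flat the maximum is.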
Here is an easy way to make