Multiple Regression
Exercises
Conduct all tests at the 5% significance level.
18.1 A developer who specializes in summer cottage properties is
considering purchasing a large tract of land adjoining a lake. The current owner
of the tract has already subdivided the land into separate building lots and
has prepared the lots by removing some of the trees. The developer wants to
forecast the value of each lot. From previous experience, she knows that the
most important factors affecting the price of the lot are size, number of
mature trees, and distance to the lake. From a nearby area, she gathers the
relevant data for 60 recently sold lots. These data are stored in file XR 18-01. (Column 1 = price in thousands of
dollars; column 2 = lot size in thousands of square feet; column 3 = number of
mature trees; column 4 = distance to the lake in feet.) A multiple regression
analysis was performed.
a) What is the standard error of estimate?
Interpret its value.
b) What is the coefficient of determination?
What does this statistic tell you?
c) What is the coefficient of determination,
adjusted for degrees of freedom? Why does this value differ from the
coefficient of determination? What does this tell you about the model?
d) Test the overall validity of the model. What
does the p-value of the test statistic tell you?
e) Interpret each of the coefficients.
f) Test to determine whether each of the
independent variables is linearly related to the price of the lot.
18.6 The admissions officer of a university is trying to develop a
formal system of deciding which students to admit to the university. She
believes that determinants of success include the standard variables: high
school grades and SAT scores. However, she also believes that students who have
participated in extracurricular activities are more likely to succeed than
those who have not. To investigate the issue, she randomly sampled 100
fourth-year students and recorded the following variables.
1. GPA for the first 3 years at the university (range: 0 to 12)
2. GPA from high school (range: 0 to 12)
3. SAT score (range: 200 to 800)
4. Number of hours on average spent per week in organized extracurricular
activities in the last year of high school
The data are stored in
columns 1 to 4 of file XR 18-06.
a) Develop a model that
helps the admissions officer decide which students to admit, and use the
computer to generate the usual statistics.
b) What is the standard
error of estimate? What does this statistic tell you?
c) What is the
coefficient of determination? Interpret its value.
d) What is the
coefficient of determination, adjusted for degrees of freedom? Interpret its
value.
e) Test the overall validity
of the model. What does the p-value of the test statistic tell you?
f) Interpret each of the
coefficients.
g) Test to determine
whether each of the independent variables is linearly related to the dependent
variable.
h) Predict with 95% confidence
the GPA for the first 3 years of university for a student whose high school GPA
is 10, whose SAT score is 600, and who worked an average of 2 hours per week on
organized extracurricular activities in the last year of high school.
i) Estimate with 90%
confidence the mean GPA for the first 3 years of university for all students
whose high school GPA is 8, whose SAT score is 550, and who worked an average
of 10 hours per week on organized extracurricular activities in the last year
of high school.
Solutions
18.1
a) The standard error of estimate is $s_\varepsilon = 40.24$. It is the
estimate of the standard deviation of the error variable, measured in
thousands of dollars (the units of price).
b) The coefficient of determination is $R^2 = .2425$; 24.25% of the
variation in prices is explained by the model.
c) The coefficient of determination adjusted for degrees of freedom is
.2019. It differs from $R^2$ because it includes an adjustment for the
number of independent variables.
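The adjustment in part (c) can be written out explicitly. The block below is a worked check, assuming the standard definition of the adjusted coefficient of determination with n = 60 lots and k = 3 independent variables; the solution itself does not show the formula.

```latex
\[
\text{Adjusted } R^2
  = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
  = 1 - (1 - .2425)\,\frac{59}{56}
  \approx .2019
\]
```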
d) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1$: At least one $\beta_i$ is not equal to zero
F = MSR/MSE
Conclusion: F = 5.97, p-value = .0013. There is enough
evidence to conclude that the model is valid.
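As a rough cross-check of the F statistic, it can be recomputed from the coefficient of determination alone. This is a minimal sketch, assuming the standard identity F = (R²/k) / ((1 − R²)/(n − k − 1)); the function name `f_statistic` is illustrative and does not come from the text. Because the reported R² values are rounded, the result agrees with the printed F only to within rounding.

```python
def f_statistic(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic of a regression with k independent variables
    and n observations, computed from the coefficient of determination."""
    explained_per_df = r_squared / k
    unexplained_per_df = (1.0 - r_squared) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# Exercise 18.1: R^2 = .2425, n = 60 lots, k = 3 independent variables
f_18_1 = f_statistic(0.2425, n=60, k=3)

# Exercise 18.6: R^2 = .2882, n = 100 students, k = 3 independent variables
f_18_6 = f_statistic(0.2882, n=100, k=3)

print(f"18.1: F = {f_18_1:.2f}")
print(f"18.6: F = {f_18_6:.2f}")
```

The p-value would additionally require the F distribution with (k, n − k − 1) degrees of freedom, which is left to statistical software here.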
e) $b_1 = .700$; for each additional thousand square feet, the price
increases on average by .700 thousand dollars, provided that the other
variables remain constant.
$b_2 = .679$; for each additional tree, the price increases on average by
.679 thousand dollars, provided that the other variables remain constant.
$b_3 = -.378$; for each additional foot from the lake, the price decreases
on average by .378 thousand dollars, provided that the other variables
remain constant.
f) $H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$ (for i = 1, 2, 3)
Lot size: t = 1.25, p-value = .2156
Trees: t = 2.96, p-value = .0045
Distance: t = -1.94, p-value = .0577
Conclusions: At the 5% significance level only the number of
trees is linearly related to price.
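The decision rule behind this conclusion is a comparison of each p-value with the significance level. The sketch below uses the p-values reported above; the dictionary layout and variable names are illustrative only.

```python
ALPHA = 0.05  # significance level stated at the top of the exercises

# Two-tail p-values reported for the t-tests in exercise 18.1(f)
p_values = {"Lot size": 0.2156, "Trees": 0.0045, "Distance": 0.0577}

# Reject H0 (no linear relationship) whenever the p-value is below alpha
significant = [name for name, p in p_values.items() if p < ALPHA]

for name, p in p_values.items():
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{name}: p-value = {p:.4f} -> {decision}")
```

Distance narrowly misses the 5% cutoff (.0577), so its conclusion is sensitive to the choice of significance level.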
18.6
a)
Excel Printout

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.5369 |
| R Square | 0.2882 |
| Adjusted R Square | 0.2660 |
| Standard Error | 2.03 |
| Observations | 100 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 160.2 | 53.41 | 12.96 | 0.0000 |
| Residual | 96 | 395.7 | 4.12 | | |
| Total | 99 | 555.9 | | | |

| | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 0.721 | 1.87 | 0.39 | 0.7006 |
| HS GPA | 0.611 | 0.101 | 6.06 | 0.0000 |
| SAT | 0.0027 | 0.0029 | 0.94 | 0.3482 |
| Activities | 0.0463 | 0.0640 | 0.72 | 0.4720 |
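The summary statistics in the printout all follow from the ANOVA sums of squares and degrees of freedom. The following is a minimal sketch of those relationships using the rounded figures from the printout, so the recomputed values match the printed ones only up to rounding; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom as printed in the ANOVA table
ss_regression, ss_residual = 160.2, 395.7
df_regression, df_residual = 3, 96

ss_total = ss_regression + ss_residual          # total variation in university GPA
n = df_regression + df_residual + 1             # number of observations

msr = ss_regression / df_regression             # mean square for regression
mse = ss_residual / df_residual                 # mean square for error
f_stat = msr / mse                              # F = MSR/MSE

r_squared = ss_regression / ss_total            # coefficient of determination
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error = math.sqrt(mse)                      # standard error of estimate

print(f"n = {n}, F = {f_stat:.2f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
print(f"standard error of estimate = {std_error:.2f}")
```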
b) The standard error of estimate is $s_\varepsilon = 2.03$. It is the
standard deviation of the error variable.
c) The coefficient of determination is $R^2 = .2882$; 28.82% of the
variation in university GPAs is explained by the model.
d) The coefficient of determination adjusted for degrees of
freedom is .2660.
e) $H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_1$: At least one $\beta_i$ is not equal to zero
F = MSR/MSE
Conclusion: F = 12.96, p-value = 0 (to four decimal places). There is
enough evidence to conclude that the model is valid.
f) $b_1 = .611$; for each additional point of high school GPA, university
GPA increases on average by .611, provided that the other variables remain
constant.
$b_2 = .0027$; for each additional point of SAT, university GPA increases
on average by .0027, provided that the other variables remain constant.
$b_3 = .0463$; for each additional hour of activities, university GPA
increases on average by .0463, provided that the other variables remain
constant.
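Taken together with the intercept from the printout in part (a), these coefficients give the estimated regression equation. The labels $x_1$, $x_2$, $x_3$ for high school GPA, SAT score, and weekly activity hours are notation assumed here for illustration.

```latex
\[
\hat{y} = 0.721 + 0.611\,x_1 + 0.0027\,x_2 + 0.0463\,x_3
\]
```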
g) $H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$ (for i = 1, 2, 3)
High school GPA: t = 6.06, p-value = 0
SAT: t = .94, p-value = .3482
Activities: t = .72, p-value = .4720
Conclusions: Only high school GPA is linearly related to
university GPA.
h)
Excel Printout

| 0.95 | Prediction Interval |
|---|---|
| Predicted value = | 8.55 |
| Lower limit = | 4.45 |
| Upper limit = | 12.65 |
The computed upper limit of 12.65 exceeds the maximum possible GPA of 12,
so we predict that the student's GPA will fall between 4.45 and 12.00.
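The predicted value at the center of this interval can be approximated from the rounded coefficients in the printout for part (a). This is a sketch only: the function name `predict_gpa` is hypothetical, and because the printed coefficients are rounded, the result (about 8.54) differs slightly from the 8.55 that the software reports. The interval limits themselves require the full data set and are not recomputed here.

```python
# Rounded coefficients taken from the Excel printout in part (a)
INTERCEPT = 0.721
B_HS_GPA = 0.611
B_SAT = 0.0027
B_ACTIVITIES = 0.0463

GPA_MAX = 12.0  # top of the GPA scale stated in the exercise


def predict_gpa(hs_gpa: float, sat: float, activities: float) -> float:
    """Point prediction of university GPA from the estimated regression equation."""
    return INTERCEPT + B_HS_GPA * hs_gpa + B_SAT * sat + B_ACTIVITIES * activities


# Student in part (h): high school GPA 10, SAT 600, 2 activity hours per week
point_prediction = predict_gpa(hs_gpa=10, sat=600, activities=2)

# The printed upper limit (12.65) is capped at the scale maximum
reported_upper = min(12.65, GPA_MAX)

print(f"point prediction = {point_prediction:.2f}")
print(f"usable interval  = 4.45 to {reported_upper:.2f}")
```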
i)
Excel Printout

| 0.9 | Confidence Interval Estimate |
|---|---|
| Lower limit = | 6.90 |
| Upper limit = | 8.22 |
We estimate that the mean GPA of all such students lies between 6.90
and 8.22.
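As a consistency check, the midpoint of this interval should equal the point estimate from the regression equation. The arithmetic below uses the rounded coefficients from the printout in part (a), so agreement is only to within rounding.

```latex
\[
\hat{y} = 0.721 + 0.611(8) + 0.0027(550) + 0.0463(10) = 7.557,
\qquad
\frac{6.90 + 8.22}{2} = 7.56
\]
```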