Multiple Regression Exercises

 

Conduct all tests at the 5% significance level.

 

18.1 A developer who specializes in summer cottage properties is considering purchasing a large tract of land adjoining a lake. The current owner of the tract has already subdivided the land into separate building lots and has prepared the lots by removing some of the trees. The developer wants to forecast the value of each lot. From previous experience, she knows that the most important factors affecting the price of the lot are size, number of mature trees, and distance to the lake. From a nearby area, she gathers the relevant data for 60 recently sold lots. These data are stored in file XR 18-01. (Column 1 = price in thousands of dollars; column 2 = lot size in thousands of square feet; column 3 = number of mature trees; column 4 = distance to the lake in feet.) A multiple regression analysis was performed.

 

a)  What is the standard error of estimate? Interpret its value.

 

b)  What is the coefficient of determination? What does this statistic tell you?

 

c)  What is the coefficient of determination, adjusted for degrees of freedom? Why does this value differ from the coefficient of determination? What does this tell you about the model?

 

d)  Test the overall validity of the model. What does the p-value of the test statistic tell you?

 

e)  Interpret each of the coefficients.

 

f)  Test to determine whether each of the independent variables is linearly related to the price of the lot.

 

18.6 The admissions officer of a university is trying to develop a formal system for deciding which students to admit to the university. She believes that determinants of success include the standard variables: high school grades and SAT scores. However, she also believes that students who have participated in extracurricular activities are more likely to succeed than those who have not. To investigate the issue, she randomly sampled 100 fourth-year students and recorded the following variables.

 

1. GPA for the first 3 years at the university (range: 0 to 12)

2. GPA from high school (range: 0 to 12)

3. SAT score (range: 200 to 800)

4. Number of hours on average spent per week in organized extracurricular activities in the last year of high school

 

The data are stored in columns 1 to 4 of file XR 18-06.

 

a) Develop a model that helps the admissions officer decide which students to admit, and use the computer to generate the usual statistics.

 

b) What is the standard error of estimate? What does this statistic tell you?

 

c) What is the coefficient of determination? Interpret its value.

 

d) What is the coefficient of determination, adjusted for degrees of freedom? Interpret its value.

 

e) Test the overall validity of the model. What does the p-value of the test statistic tell you?

 

f) Interpret each of the coefficients.

 

g) Test to determine whether each of the independent variables is linearly related to the dependent variable.

 

h) Predict with 95% confidence the GPA for the first 3 years of university for a student whose high school GPA is 10, whose SAT score is 600, and who worked an average of 2 hours per week on organized extracurricular activities in the last year of high school.

 

i) Estimate with 90% confidence the mean GPA for the first 3 years of university for all students whose high school GPA is 8, whose SAT score is 550, and who worked an average of 10 hours per week on organized extracurricular activities in the last year of high school.

 

Solutions

 

18.1

a)  The standard error of estimate is $s_\varepsilon = 40.24$ (thousand dollars). It is the estimated standard deviation of the error variable.

 

b)  The coefficient of determination is $R^2 = .2425$; 24.25% of the variation in prices is explained by the model.

 

c)  The coefficient of determination adjusted for degrees of freedom is .2019. It differs from $R^2$ because it includes an adjustment for the number of independent variables.
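
As a check on part c), the adjusted value can be recomputed from $R^2$, the sample size n = 60, and the k = 3 independent variables with the standard formula: adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. The following is a minimal Python sketch; the function name is illustrative and does not come from the exercise.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjust R^2 for degrees of freedom (n observations, k independent variables)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Values from Exercise 18.1: R^2 = .2425, n = 60 lots, k = 3 independent variables.
adj = adjusted_r_squared(0.2425, n=60, k=3)
print(f"Adjusted R^2 = {adj:.4f}")
```

The result agrees with the reported .2019 to four decimal places.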

 

d)  $H_0: \beta_1 = \beta_2 = \beta_3 = 0$

    $H_1$: At least one $\beta_i$ is not equal to zero

    F = MSR/MSE

Conclusion: F = 5.97, p-value = .0013. There is enough evidence to conclude that the model is valid.
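
The F statistic can also be recovered from $R^2$ alone, since MSR/MSE is algebraically equal to $(R^2/k) / ((1 - R^2)/(n - k - 1))$. The sketch below applies that identity to the rounded $R^2$ from part b); the function name is an assumption made for illustration. Because $R^2$ is rounded to four decimals, the result matches the printed 5.97 only approximately.

```python
def f_from_r_squared(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic expressed through R^2; equivalent to MSR/MSE."""
    return (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))


# Exercise 18.1: R^2 = .2425, n = 60 lots, k = 3 independent variables.
f_stat = f_from_r_squared(0.2425, n=60, k=3)
print(f"F = {f_stat:.3f}")
```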

 

e)

$b_1 = .700$; for each additional thousand square feet of lot size, the price increases on average by .700 thousand dollars, provided that the other variables remain constant.

 

$b_2 = .679$; for each additional mature tree, the price increases on average by .679 thousand dollars, provided that the other variables remain constant.

 

$b_3 = -.378$; for each additional foot of distance from the lake, the price decreases on average by .378 thousand dollars, provided that the other variables remain constant.

 

f)  $H_0: \beta_i = 0$

    $H_1: \beta_i \neq 0$  (for i = 1, 2, 3)

           

Lot size: t = 1.25, p-value = .2156

 

Trees: t = 2.96, p-value = .0045

 

Distance: t = -1.94, p-value = .0577

 

Conclusions: At the 5% significance level only the number of trees is linearly related to price.
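
The decision rule behind this conclusion is simply "reject $H_0$ when the p-value is below the significance level." A short Python sketch of that comparison, using the p-values reported above (the variable names are illustrative):

```python
ALPHA = 0.05  # significance level stated at the top of the exercises

# Two-tail p-values reported in part f) of Exercise 18.1
p_values = {"Lot size": 0.2156, "Trees": 0.0045, "Distance": 0.0577}

# A variable is judged linearly related to price when its p-value is below ALPHA.
significant = [name for name, p in p_values.items() if p < ALPHA]
print(significant)  # ['Trees']
```

Distance misses the 5% cutoff only narrowly (p-value = .0577), so its status depends on the significance level chosen.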

 

18.6

a)

Excel Printout

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.5369
R Square             0.2882
Adjusted R Square    0.2660
Standard Error       2.03
Observations         100

ANOVA
              df     SS       MS       F        Significance F
Regression    3      160.2    53.41    12.96    0.0000
Residual      96     395.7    4.12
Total         99     555.9

              Coefficients    Standard Error    t Stat    P-value
Intercept     0.721           1.87              0.39      0.7006
HS GPA        0.611           0.101             6.06      0.0000
SAT           0.0027          0.0029            0.94      0.3482
Activities    0.0463          0.0640            0.72      0.4720


 

b) The standard error of estimate is $s_\varepsilon = 2.03$. It is the estimated standard deviation of the error variable.

 

c) The coefficient of determination is $R^2 = .2882$; 28.82% of the variation in university GPAs is explained by the model.

 

d) The coefficient of determination adjusted for degrees of freedom is .2660.
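
The statistics quoted in parts b) to d), and the F statistic used in part e), all follow from the sums of squares in the ANOVA table of part a). The sketch below recomputes them in Python from the rounded printout values, so small differences in the last digit are expected; the variable names are illustrative.

```python
import math

# Sums of squares from the ANOVA table in part a) (rounded printout values)
ssr, sse, sst = 160.2, 395.7, 555.9
n, k = 100, 3
df_resid = n - k - 1                 # 96 residual degrees of freedom

msr = ssr / k                        # mean square for regression
mse = sse / df_resid                 # mean square for error
f_stat = msr / mse                   # overall F statistic
std_error = math.sqrt(mse)           # standard error of estimate
r_squared = ssr / sst                # coefficient of determination
adj_r_squared = 1 - mse / (sst / (n - 1))

print(f"F = {f_stat:.2f}, s = {std_error:.2f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}")
```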

 

e)  $H_0: \beta_1 = \beta_2 = \beta_3 = 0$

    $H_1$: At least one $\beta_i$ is not equal to zero

    F = MSR/MSE

Conclusion: F = 12.96, p-value = 0 (to four decimal places). There is enough evidence to conclude that the model is valid.

 

f)

$b_1 = .611$; for each additional point of high school GPA, university GPA increases on average by .611, provided that the other variables remain constant.

 

$b_2 = .0027$; for each additional point of SAT score, university GPA increases on average by .0027, provided that the other variables remain constant.

 

$b_3 = .0463$; for each additional hour of activities per week, university GPA increases on average by .0463, provided that the other variables remain constant.

 

g)  $H_0: \beta_i = 0$

    $H_1: \beta_i \neq 0$  (for i = 1, 2, 3)

           

High school GPA: t = 6.06, p-value = 0 (to four decimal places)

SAT: t = .94, p-value = .3482

Activities: t = .72, p-value = .4720

Conclusions: Only high school GPA is linearly related to university GPA.
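
Each t statistic in the printout is the coefficient divided by its standard error. The following Python sketch reproduces that ratio from the rounded values in part a); because the inputs are rounded, the results differ from the printed t statistics by about .01 at most. The dictionary layout is illustrative.

```python
# (coefficient, standard error) pairs from the Excel printout in part a)
coefficients = {
    "HS GPA": (0.611, 0.101),
    "SAT": (0.0027, 0.0029),
    "Activities": (0.0463, 0.0640),
}

# t statistic for H0: beta_i = 0 is b_i divided by its standard error
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```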

 

h)

Excel Printout

0.95 Prediction Interval

Predicted value = 8.55

Lower limit = 4.45

Upper limit = 12.65

We predict that the student's GPA will fall between 4.45 and 12.00 (12 is the maximum).
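
The predicted value comes from substituting the student's values into the estimated regression equation, and the reported upper limit is then cut off at the top of the GPA scale. A minimal Python sketch under those assumptions follows; it uses the rounded coefficients from part a), so the point prediction comes out slightly below the printed 8.55. The function and constant names are illustrative.

```python
# Rounded coefficients from the Excel printout in part a)
INTERCEPT, B_HS_GPA, B_SAT, B_ACTIVITIES = 0.721, 0.611, 0.0027, 0.0463


def predict_gpa(hs_gpa: float, sat: float, activities: float) -> float:
    """Point prediction of university GPA from the estimated regression equation."""
    return INTERCEPT + B_HS_GPA * hs_gpa + B_SAT * sat + B_ACTIVITIES * activities


point = predict_gpa(hs_gpa=10, sat=600, activities=2)

# Prediction interval limits from the printout, truncated to the 0-12 GPA scale
GPA_MIN, GPA_MAX = 0.0, 12.0
lower, upper = 4.45, 12.65
clipped = (max(lower, GPA_MIN), min(upper, GPA_MAX))

print(f"Point prediction = {point:.2f}")
print(f"Interval on the GPA scale = {clipped}")
```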

 

i)

Excel Printout

0.9 Confidence Interval Estimate

Lower limit = 6.90

Upper limit = 8.22

 

We estimate that the mean GPA of all such students lies between 6.90 and 8.22.
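
A confidence interval for the mean is centered on the point estimate from the regression equation, which gives a quick consistency check on the printout. The Python sketch below compares the interval midpoint with the value obtained from the rounded coefficients in part a); the variable names are illustrative.

```python
# Confidence interval limits from the printout in part i)
lower, upper = 6.90, 8.22
midpoint = (lower + upper) / 2
half_width = (upper - lower) / 2

# Point estimate for HS GPA = 8, SAT = 550, activities = 10 hours,
# using the rounded coefficients from part a)
point = 0.721 + 0.611 * 8 + 0.0027 * 550 + 0.0463 * 10

print(f"Midpoint = {midpoint:.2f}, half-width = {half_width:.2f}")
print(f"Point estimate = {point:.2f}")
```

The midpoint and the point estimate agree to two decimal places, as expected for a symmetric interval.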