Stat 423 Section 02 Spring 2020 Name ______________________________________
Exam 3 (100 points) ID Number __________________________
Part I. Workout Problems. Show your work in support of your answers. Unsupported answers will not receive full
credit. (61 points)
1. A $2^{6-2}$ fractional factorial involving factors A, B, C, D, E and F is to be run. Practitioners have these two sets of
generators in mind:
Design 1 Generators: E=ABD and F=ACD
Design 2 Generators: E=ABCD and F=ABD
a. Consider Design 1. Which treatments in this experiment will have both factors A and B at their high (+)
levels? [6 pts]
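A minimal Python sketch of part (a). It builds Design 1 from a full $2^4$ in the base factors A, B, C, D, applies E = ABD and F = ACD, and lists the runs that have A and B both high. Levels are coded −1/+1, and the helper names `design1_runs` and `label` are hypothetical.

```python
from itertools import product

def design1_runs():
    # Full 2^4 in base factors A, B, C, D (levels coded -1/+1);
    # E and F are generated as E = ABD and F = ACD.
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append(dict(A=a, B=b, C=c, D=d, E=a * b * d, F=a * c * d))
    return runs

def label(run):
    # Treatment label: lowercase letters of the factors at their high (+) level.
    name = "".join(k.lower() for k in "ABCDEF" if run[k] == 1)
    return name or "(1)"

both_high = [label(r) for r in design1_runs() if r["A"] == 1 and r["B"] == 1]
print(both_high)
```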
b. Consider Design 1. Derive its defining relation and determine its resolution. [8 pts]
c. The defining relation for Design 2 is I=CEF=ABDF=ABCDE. Which design (1 or 2) is better? Explain briefly
and give at least one reason for your choice. [3 pts]
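The defining relation consists of all products of the generator words, where a letter that appears twice cancels (for example, A·A = I). The resolution is the length of the shortest word. The sketch below checks both designs and assumes nothing beyond the generators stated above.

```python
from itertools import combinations

def word_product(*words):
    # Multiply effect words mod 2: letters appearing an even number of times cancel.
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters))

def defining_relation(generator_words):
    # Every product of a nonempty subset of the generator words.
    words = set()
    for k in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, k):
            words.add(word_product(*combo))
    return sorted(words, key=lambda w: (len(w), w))

def resolution(words):
    # Resolution = length of the shortest word in the defining relation.
    return min(len(w) for w in words)

design1 = defining_relation(["ABDE", "ACDF"])   # from E = ABD and F = ACD
design2 = defining_relation(["ABCDE", "ABDF"])  # from E = ABCD and F = ABD
print(design1, resolution(design1))
print(design2, resolution(design2))
```

The output shows that Design 1 has resolution IV and Design 2 has resolution III. With higher resolution, main effects are aliased only with higher-order interactions.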
2. A $2^{4-1}$ fractional factorial was conducted to study the effects of four factors on the bond strength of an
integrated circuit mounted on metallized glass substrate. The four factors (and their levels) that engineers
identified as potentially important determiners of bond strength are listed in the table below.
Factor Levels
A – Adhesive Type D2A (−) vs. H-1-E (+)
B – Conductor Material Copper (−) vs. Nickel (+)
C – Cure Time at 90°C 90 min (−) vs. 120 min (+)
D – Deposition Material Tin (−) vs. Silver (+)
Let $\alpha_2$ = main effect of A, $\beta_2$ = main effect of B, $\gamma_2$ = main effect of C, $\delta_2$ = main effect of D, with products of these letters denoting interaction
effects. Summary statistics and the results of the Yates algorithm for computing fitted effects are given below.
Treatment  Replication n  Sample Variance s²  Sample Mean ȳ  Yates Cycle 1  Cycle 2  Cycle 3  Fitted Effect
(1)        5               2.452              73.48          157.36         314.54   650.84    81.355
ad         5               4.233              83.88          157.18         336.30     7.84     0.980
bd         5               0.647              81.58          166.60           4.42     2.92     0.365
ab         5              26.711              75.60          169.70           3.42     2.08     0.260
cd         5               0.503              87.06           10.40          −0.18    21.76     2.720
ac         5               8.562              79.54           −5.98           3.10    −1.00    −0.125
bc         5               1.982              79.38           −7.52         −16.38     3.28     0.410
abcd       5               3.977              90.32           10.94          18.46    34.84     4.355
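The Yates columns above can be reproduced with a short Python sketch. The means are listed in the standard order of the base factors A, B, C. Because D = ABC, the treatment labels acquire a d wherever ABC is high. Each cycle writes pairwise sums in the top half and pairwise differences (second minus first) in the bottom half, and the final cycle is divided by $2^3 = 8$.

```python
def yates(means):
    # Yates algorithm for a 2^k layout given in standard order.
    n = len(means)
    k = n.bit_length() - 1
    assert n == 2 ** k, "number of treatments must be a power of 2"
    col = list(means)
    for _ in range(k):
        pairs = [(col[i], col[i + 1]) for i in range(0, n, 2)]
        col = [x + y for x, y in pairs] + [y - x for x, y in pairs]
    return [v / n for v in col]  # fitted effects = last cycle / 2^k

# Sample means in the order (1), ad, bd, ab, cd, ac, bc, abcd from the table.
means = [73.48, 83.88, 81.58, 75.60, 87.06, 79.54, 79.38, 90.32]
effects = yates(means)
print([round(e, 3) for e in effects])
```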
a. The replications and the sample variances of the 8 treatment combinations are given in the 2nd and 3rd
columns, respectively, of the table above. Compute $\Delta(0.05)$ for judging whether a fitted effect is statistically
significant at the $\alpha = 0.05$ level. Note that the sum of the variances is 49.067. [8 pts]
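A sketch of the arithmetic for part (a). It assumes the usual half-width for a fitted effect in a $2^{p-q}$ study, $\Delta = t\, s_P \,\frac{1}{2^{p-q}} \sqrt{\sum 1/n_i}$, with the pooled $s_P$ on $\sum (n_i - 1) = 32$ degrees of freedom. The critical value 2.037 is taken from a t table and is hard-coded as an assumption.

```python
import math

n_i = [5] * 8                                     # replications per treatment
s2 = [2.452, 4.233, 0.647, 26.711, 0.503, 8.562, 1.982, 3.977]
df = sum(n - 1 for n in n_i)                      # 32 error degrees of freedom
sp = math.sqrt(sum((n - 1) * v for n, v in zip(n_i, s2)) / df)  # pooled s_P
t_crit = 2.037       # assumed: upper 2.5% point of t with 32 df (t table)
num_runs = len(n_i)  # 2^(p-q) = 2^(4-1) = 8 treatment combinations
delta = t_crit * sp * (1 / num_runs) * math.sqrt(sum(1 / n for n in n_i))
print(df, round(sp, 4), round(delta, 3))
```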
b. The generator and defining relation were D = ABC and I = ABCD, respectively. If you have no answer in (a), use
the supplied value of $\Delta(0.05)$.
i. Based on your answer in (a), is the fitted effect 0.980 statistically significant? [2 pts]
Select one: NO YES
ii. What sum of effects does the fitted effect 0.980 estimate? Your answer should be a sum of
subscripted/superscripted Greek letters, written in the effect notation defined above. [4 pts]
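Under I = ABCD, you find the alias of an effect by multiplying its word by ABCD (letters that appear twice cancel). The sketch below lists the aliases of the base-factor Yates rows. The `alias` helper is a hypothetical name used only for illustration.

```python
def alias(effect, defining_word="ABCD"):
    # Multiply the effect word by the defining word mod 2.
    return "".join(sorted(set(effect) ^ set(defining_word)))

# Base-factor Yates rows (A, B, AB, C, AC, BC, ABC) and their aliases under I = ABCD.
for e in ["A", "B", "AB", "C", "AC", "BC", "ABC"]:
    print(e, "=", alias(e))
```

The row labelled ad corresponds to base effect A. Its fitted effect therefore estimates the A main effect plus the BCD three-factor interaction.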
3. The diameter $x$ of a tree at breast height (in cm, relatively easy to measure) is used to predict the height $y$ of a
tree (in m, difficult to measure). Summary data on $n = 36$ white spruce trees (in British Columbia) are given
below.
$\sum x = 655.1$, $\sum x^2 = 12711.47$, $\sum y = 644.7$, $\sum y^2 = 11824.45$,
$\sum xy = 12112.34$, $S_{xx} = 790.4697$, $SST = S_{yy} = 278.9475$, $\bar x = 18.1972$, $\bar y = 17.9083$.
a. Do some calculations to show that the least-squares line is $\hat y = 9.1468 + 0.4815x$. [10 pts]
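A sketch of the least-squares calculation from the summary sums alone. It uses $S_{xy} = \sum xy - (\sum x)(\sum y)/n$, $b_1 = S_{xy}/S_{xx}$, and $b_0 = \bar y - b_1 \bar x$.

```python
n = 36
sum_x, sum_x2 = 655.1, 12711.47
sum_y, sum_y2 = 644.7, 11824.45
sum_xy = 12112.34

sxx = sum_x2 - sum_x ** 2 / n           # about 790.4697
syy = sum_y2 - sum_y ** 2 / n           # about 278.9475
sxy = sum_xy - sum_x * sum_y / n        # about 380.59
b1 = sxy / sxx                          # slope
b0 = sum_y / n - b1 * sum_x / n         # intercept = ybar - b1 * xbar
print(round(sxy, 4), round(b0, 4), round(b1, 4))
```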
b. Compute the sample correlation $r$ between $x$ and $y$. Give a quick interpretation. [6 pts]
Interpretation:
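A sketch of the correlation computation, $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$. Only $S_{xx}$ and $S_{yy}$ are listed, so $S_{xy}$ is recomputed from the raw sums.

```python
import math

n = 36
sum_x, sum_y, sum_xy = 655.1, 644.7, 12112.34
sxx, syy = 790.4697, 278.9475             # as given in the summary data
sxy = sum_xy - sum_x * sum_y / n
r = sxy / math.sqrt(sxx * syy)
print(round(r, 4), round(r ** 2, 4))
```

A value near 0.81 indicates a fairly strong, positive linear relationship: trees with larger breast-height diameters tend to be taller.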
c. Construct an interval with 95% confidence for the height of a new spruce tree with a breast height diameter
$x = 19$ cm. Plug numbers into a formula and do not simplify. Use $n = 36$, $\bar x = 18.1972$, $S_{xx} = 790.4697$,
$s^2 = MSE = 2.815$. [8 pts]
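The target is a single new tree, not a mean, so the interval is a prediction interval, $\hat y \pm t\, s \sqrt{1 + 1/n + (x - \bar x)^2/S_{xx}}$. The sketch below evaluates it numerically. The critical value 2.032 (t with $n - 2 = 34$ df) is an assumed table value.

```python
import math

n, xbar, sxx, mse = 36, 18.1972, 790.4697, 2.815
b0, b1 = 9.1468, 0.4815          # least-squares line from part (a)
x_new = 19.0
t_crit = 2.032   # assumed: upper 2.5% point of t with 34 df (t table)

y_hat = b0 + b1 * x_new
se_pred = math.sqrt(mse * (1 + 1 / n + (x_new - xbar) ** 2 / sxx))
lower, upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(round(y_hat, 4), round(lower, 2), round(upper, 2))
```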
Problem 3 (continued).
d. A scatterplot of the data and SSE values for the linear and quadratic model fits are given below. Also, the total
sum of squares for either model is $SST = 278.9475$. Which of the two models provides a better description of
the data? Explain briefly. In your explanation, use both graphical AND numeric results. [6 pts]
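A sketch of the numeric comparison. It uses $SST = S_{yy} = 278.9475$ computed from the summary sums, which is the value consistent with SSE = 95.703 for the linear fit. Here $k$ is the number of predictor terms in each model.

```python
n, sst = 36, 278.9475                     # SST = S_yy from the summary data
models = {"linear": (95.703, 1), "quadratic": (63.007, 2)}  # (SSE, k)
results = {}
for name, (sse, k) in models.items():
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    results[name] = (r2, r2_adj)
    print(name, round(100 * r2, 2), round(100 * r2_adj, 2))
```

Both $R^2$ and adjusted $R^2$ are higher for the quadratic fit. Read this numeric evidence together with the curvature visible in the scatterplot.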
Part II. Multiple Choice. Circle the letter of the correct/best answer. (39 points)
1. Which of the following statements is NOT true?
A. The simple linear regression model is $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ is a random variable that is normally
distributed with mean 0 and variance $\sigma^2$.
B. In simple linear regression, the independent variable $x$ is also referred to as the predictor or explanatory
variable.
C. The goal of least-squares regression is to find the curve that maximizes the sum of the squared distances
between the curve and the data points.
D. A first step in a regression analysis involving two variables is to construct a scatter plot.
2. In fitting $Y = \beta_0 + \beta_1 x + \varepsilon$ through data, (1.7, 2.5) is a 90% confidence interval for $\beta_1$. What is a 90%
confidence interval for the mean change in $Y$ when we reduce $x$ by 0.65?
A. (−1.625, −1.105)
B. (1.05, 1.85)
C. (1.105, 1.625)
D. (2.35, 3.15)
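The mean change in $Y$ for a change of $-0.65$ in $x$ is $-0.65\,\beta_1$. Multiplying both endpoints by a negative constant therefore flips their order, as this small sketch shows.

```python
lo, hi = 1.7, 2.5      # 90% CI for beta_1
dx = -0.65             # x is reduced by 0.65
# Mean change in Y is beta_1 * dx; a negative multiplier swaps the endpoints.
ends = sorted([lo * dx, hi * dx])
print([round(v, 3) for v in ends])
```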
3. Which of the following is/are TRUE about the correlation coefficient $r$ between $x$ and $y$?
A. For simple linear regression, $100\% \times r^2 = R^2$, where $R^2$ is the coefficient of determination (in %).
B. A correlation of $r = -0.87$ is weaker than a correlation of $r = 0.25$.
C. The correlation $r$ is a measure of the strength of the linear relationship between $x$ and $y$.
D. If $r = -0.1$ and we convert $x$ (in inches) to centimeters (1 in = 2.54 cm), then the correlation becomes
$2.54 \times (-0.1) = -0.254$.
E. Both (A) and (C).
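You can check statement D numerically: multiplying $x$ by a positive constant such as 2.54 leaves $r$ unchanged. The data below are made up purely for illustration, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    # Sample correlation r = S_xy / sqrt(S_xx * S_yy).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the exam).
x_in = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
x_cm = [2.54 * v for v in x_in]           # inches -> centimeters
print(pearson_r(x_in, y), pearson_r(x_cm, y))
```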
For Problem 3(d):
Model                                                SSE
$Y = \beta_0 + \beta_1 x + \varepsilon$                       95.703
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$         63.007
[Figure: scatterplot of Height y versus Breast-Height Diameter x]
4. Is $Y = \beta_0 \cdot \beta_1^{x}$ intrinsically linear? If yes, what is the appropriate transformation to obtain a linear model?
Recall: $\log(ab) = \log(a) + \log(b)$, $\log(a^b) = b \cdot \log(a)$
A. No.
B. Yes, $\log(Y) = \log(\beta_0) + \log(\beta_1) \cdot x$
C. Yes, $\log(Y) = \log(\beta_0) + \beta_1 \cdot \log(x)$
D. Yes, $\log(Y) = \log(\beta_0) + \beta_1 \cdot x$
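Taking logs of $Y = \beta_0 \beta_1^{x}$ gives a model that is linear in $x$, with intercept $\log \beta_0$ and slope $\log \beta_1$. The sketch fits noise-free, hypothetical data generated with $\beta_0 = 2$ and $\beta_1 = 3$, then back-transforms the estimates.

```python
import math

def least_squares(x, y):
    # Simple linear regression: returns (intercept, slope).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical noise-free data from Y = beta0 * beta1**x with beta0 = 2, beta1 = 3.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * 3.0 ** v for v in x]
intercept, slope = least_squares(x, [math.log(v) for v in y])
print(math.exp(intercept), math.exp(slope))  # back-transformed beta0, beta1
```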
For Problems 5 to 8: A study investigated the effects of $x_1$ = Seal Temperature, $x_2$ = Cooling Bar Temperature, and
$x_3$ = % Polyethylene Additive on the seal strength $y$. The three models in the Model column of the table below were fit to the
data.
There were $n = 20$ observations, and the total sum of squares (for all 3 models) is $SST = 82.17$ (total df = 19).
5. What is SSE for Model (1)?
A. 30.96
B. 51.21
C. 21.36
D. 60.81
6. What is $R^2_{\text{adj}}$ for Model (2)?
A. 49.42%
B. 76.66%
C. 23.34%
D. 84.03%
7. What is the F statistic for testing $H_0\colon \beta_1 = \beta_2 = \cdots = \beta_9 = 0$ versus $H_a\colon H_0$ is false, with Model (3)?
A. 6.59
B. 9.69
C. 3.23
D. 5.36
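A sketch of the arithmetic behind Problems 5 to 7, using the table for Problems 5 to 8. It relies on $SSE = (1 - R^2)\,SST$, $R^2_{\text{adj}} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, and $F = \frac{SSR/k}{SSE/(n-k-1)}$. The predictor counts $k$ (3, 6, and 9) are read off the model formulas.

```python
n, sst = 20, 82.17

def sse_from_r2(r2):
    return (1 - r2) * sst

def adj_r2(sse, k):
    # k = number of predictor terms (excluding the intercept)
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def overall_f(sse, k):
    ssr = sst - sse
    return (ssr / k) / (sse / (n - k - 1))

sse1 = sse_from_r2(0.3768)   # Model (1), k = 3
adj2 = adj_r2(13.1231, 6)    # Model (2), k = 6
f3 = overall_f(11.8593, 9)   # Model (3), k = 9
print(round(sse1, 2), round(100 * adj2, 2), round(f3, 2))
```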
8. In the fit of Model (2), we get $\hat\beta_7 = -0.5$ and $s_{\hat\beta_7} = 0.3552$, and find that the P-value is 0.1827 for testing
$H_0\colon \beta_7 = 0$ versus $H_a\colon \beta_7 \neq 0$. What are the $t$ test statistic and the conclusion at the $\alpha = 0.10$ significance level?
A. $t = -1.41$. There is NO significant interaction between $x_1$ and $x_3$.
B. $t = 1.41$. The predictor $x_7$ has NO significant effect on the response $y$.
C. $t = -0.84$. There is NO significant interaction between $x_1$ and $x_3$.
D. $t = -1.41$. There is significant interaction between $x_1$ and $x_3$.
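The $t$ statistic is the estimate divided by its standard error, and the decision compares the P-value with $\alpha$. A minimal sketch:

```python
b7, se_b7, p_value, alpha = -0.5, 0.3552, 0.1827, 0.10
t = b7 / se_b7              # t = estimate / standard error
reject = p_value < alpha    # reject H0 only if P-value < alpha
print(round(t, 2), reject)
```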
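Table for Problems 5 to 8
Model  Fitted model                                                                                  R²       R²_adj   SSE
(1)    $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$                                 37.68%   25.99%   ?
(2)    $Y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_3 + \varepsilon$   84.03%   ?        13.1231
(3)    $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_8 x_1 x_2 + \beta_7 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon$   85.57%   72.58%   11.8593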
9. Which of the following is not true about $2^{p-q}$ fractional factorial studies?
A. The loss of information and ambiguity (confounding) can be held to a minimum by careful planning and
wise analysis.
B. A loss of information is usually expected because we are unable to observe responses at all of the $2^p$
factor combinations.
C. If two effects are aliased or confounded together, it means that we can discuss their significance together
but not apart from each other.
D. None of the above.
10. A fitted multiple regression model is $\hat y = 10 - 4x_1 + 3x_2$. If $x_1$ is decreased by 2 while $x_2$ is held fixed,
then we can expect $\hat y$
A. to increase by 8
B. to decrease by 6
C. to increase by 6
D. to decrease by 8
E. to remain the same
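With $x_2$ held fixed, the change in $\hat y$ is $-4 \times (\text{change in } x_1)$. The sketch evaluates the fitted model at an arbitrary, hypothetical starting point. The result does not depend on which point you choose.

```python
def y_hat(x1, x2):
    # Fitted model from the question.
    return 10 - 4 * x1 + 3 * x2

x1, x2 = 5.0, 1.0          # hypothetical starting point
change = y_hat(x1 - 2, x2) - y_hat(x1, x2)
print(change)
```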
11. Suppose that the least-squares line is $\hat y = -2.12 + 15.75x$. If the $F$ test statistic for testing $H_0\colon \beta_1 = 0$
against $H_a\colon \beta_1 \neq 0$ is $F = 2.1$ (from the ANOVA table), what is the $t$ test statistic for testing the same
hypotheses?
A. $t = 1.45$
B. $t = -4.41$
C. $t = -1.45$
D. $t = 4.41$
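In simple linear regression, the slope test satisfies $F = t^2$, and $t$ carries the sign of the slope estimate $b_1$. A minimal sketch:

```python
import math

b1, f_stat = 15.75, 2.1
# |t| = sqrt(F); the sign of t matches the sign of the fitted slope.
t = math.copysign(math.sqrt(f_stat), b1)
print(round(t, 2))
```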
12. Which of the following statements is true?
A. Model 1, with more predictor terms, is not necessarily better than Model 2, with fewer predictor
terms, even though Model 1's coefficient of multiple determination $R^2$ is larger.
B. To balance the cost of using more parameters against the gain in the coefficient of multiple determination
$R^2$, many statisticians use $R^2_{\text{adj}}$ (the adjusted $R^2$).
C. An objective of regression analysis is to find a model that is simple (relatively few parameters) and provides
a good fit to the data.
D. All of the above.
13. A study investigated the effects of three explanatory variables $x_1$, $x_2$, and $x_3$ on the response $y$. The model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ provided a good $R^2$ value. Which of the following is NOT appropriate in assessing
the (statistical) significance of the relationship between $x_3$ and $y$?
A. a $t$ test of $H_0\colon \beta_3 = 0$ versus $H_a\colon \beta_3 \neq 0$
B. a prediction interval
C. a confidence interval for $\beta_3$
D. the sample correlation between $x_3$ and $y$
E. a comparison of $R^2_{\text{adj}}$ values for $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ and $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$