ISSCM 491 Statistics for managers Excel PHstat My Occupational group is Group # 3 Admin Support – everything else should be pretty straight forward in the
ISSCM 491 Statistics for managers Excel PHstat My Occupational group is Group # 3 Admin Support
– everything else should be pretty straight forward in the excel spreadsheet it shows and tells you exactly what to do. – the file for this assignment is hw7.xlsx
Also the Survey Data ISSCM491 file is attached as (if needed) to look at this for this homework assignment (if it is) mentioned inside hw7.xlsx
Townes.HW1.xlsx will show you the The Correct way on how to answer and submit this assignment, so you could take a look at that to help how to format to answer the questions. Homework 7
(This homework has four questions)
Note: The answers to all questions should be provided (written down).
Do not show them on the computer output.
Question 1.
You are interested in determining the relationship between the employee INCOME (Y), and of
the years of WORK (X1), years of EDUCation (X2) and GENDER (X3) (male or female) for the
employees in your occupational group. (You will be using the file that you have created for your
occupational group as part of homework 1). You can use PHStat output to answer these questions.
Suppose you wanted to use all three variables (X1= years of WORK, X2= EDUCation and
X3=GENDER) as the independent variables to predict the INCOME of the employees
in your occupational group.
(a) Using the method of least squares, determine the equation for the estimated
regression line Y = b0 + b1X1 + b2X2 + b3X3
(b) State the meaning of the slope b3 for the GENDER of the employees.
(c) Using the estimated multiple regression equation, calculate the predicted INCOME of a
FEMALE employee with 20 years of WORK experience and 14 years of EDUCation.
(d) If you had used the average INCOME as your estimate, what would be the total sum of
squared errors (SST)?
(e) If you use multiple regression prediction values as your estimate for employee INCOMES,
what would be the squared errors – error sum of squares (SSE)?
(f) How much of the total variation is explained by the years of WORK, EDUCation, and
GENDER (SSR)?
(g) What percentage of the total variation in employee INCOMES is explained by the years
of WORK, EDUCATION, and GENDER (r2Y.123)?
(h) Construct 95% confidence interval for the mean INCOME of the MALE employees
with 20 years WORK experience and 14 years of EDUCation.
(i) In testing Ho that b3=0 against the two sided alternative, should the Null hypothesis
be rejected or accepted at a sinficance level of 0.05?
What does it mean to reject the Null hypothesis (b3=0) in this case?
(j) Based on the hypothesis testing result of (i) above, does the GENDER have a significant
effect on the INCOME of the employees in your occupational group?
(k) Now, compare your conclusion in (j) above with the conclusion you had in your answer
to Question #6 of homework 3. Are your answers consistent?
Page 1 of 6 (Homework 7)
Important:
PHStat REGRESSION procedure, when CIE & PIE is selected, sometimes fails to work.
If that happens, do the following:
(a) Open a new file and copy your data into this new file
(b) Repeat PHStat –> Multiple-Sample Test –> One-Way ANOVA
(c) If the same problem continues
(i) Close and restart your computer
(ii) Open a new file and copy your data to this new file
(iii) Repeat PHStat –> Multiple-Sample Test –> One-Way ANOVA
Question 2.
The marketing manager of StarDisk company wanted to study the effect of price on the
demand for their 1GB USB Flash drives. The company has selected a sample of 11 stores
with equal sales potential each, and used different weekly prices/promotions in each store.
The prices charged (X) and the weekly demand (Y) for the flash drives in these stores
are given below:
1
2
3
4
5
6
7
8
9
10
11
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
(k)
(l)
Y
287
288
290
295
300
300
300
299
297
296
295
X
7,5
8,0
8,5
9,0
9,5
10,0
10,5
11,0
11,5
12,0
12,5
Y = Demand
X = Price
Determine the equation for the estimated regression line Y = b0 + b1X1 (Call this Model 1)
State the meaning of the slope b1 for the PRICE.
Is PRICE a significant factor in determining the sales? Use a significance level (a) of 0.05?
What percentage of the total variation in SALES is explained by the PRICE (What is r2)?
Construct 95% confidence interval for the mean SALES of the 1 GB Flasdrives when
the price is $10.
Develop a regression model that includes price (X), and the squard prices (X2) to predict
Y = b0 + b1X1 + b2X12 to predict the weekly sales. Determine the equation for the
estimated regression line. (Call this Model 2)
(Note that you need to create a new column that will have the square prices. That coulmn
will be the second X variable in the table)
Is squared prices (X12) a significant factor in determining the sales? Use a significance
level (a) of 0.05?
Predict the weekly sales for the 1GB Flashdrive for a price of $10 by using Model 2.
Using Model 2, construct 95% confidence interval for the mean SALES of the 1 GB USB Flash
drives when the price is $10.
What percentage of the total variation in SALES is explained by the PRICE and
Squared PRICES (What is r2 equal to?)
Which regression model is more appropriate, Model 1 or Model 2? Why?
What price do you suggest the company charge for 1 GB USB Flash drives? Why?
Page 2 of 6 (Homework 7)
Question 3.
Note this his problem was covered in homework 3 and in Example 1 of class 4.
The CFO of a Healthcare organization wanted to examine whether two of the
hospitals (Hospitals A and B) operated by the organization differed on the average
cost per case. The financial manager has randomly taken a random sample of 6 cases
from hospital A and hospital B and assigned the following costs to each discharge.
Cases
1
2
3
4
5
6
Hospital A Hospital B
4,5
2,0
5,0
6,0
5,5
4,0
3,5
3,5
6,5
4,5
5,0
4,0
We have already shown in class that this problem could be solved by using
(a) Hypothesis testing for the difference between two means
(b) One-Way Analysis of Variance
Your job is to check if regression analysis can also be used to answer this question.
For regression purposes, the observations can be (re)stated as follows:
Cost
4,5
5,0
5,5
3,5
6,5
5,0
2,0
6,0
4,0
3,5
4,5
4,0
Hospital
A
A
A
A
A
A
B
B
B
B
B
B
First, you need to assign dummy variables for hospitals (0=Hospital A, 1=Hospital B) to use
regression to predict the costs.
(a) Determine the equation for the estimated regression line Y = b0 + b1X1, where X1 is
a dummy variable used for the hospitals.
(b) State the meaning of the slope b1 for the dummy variable (used for hospitals).
(c) Is the coefficient b1 significant at significance level (a) of 0.05? What is the P-Value
associated with it?
(d) Using your answer to (c) above, answer the following question:
Is there strong evidence to show that there is a significant cost
difference in the average costs between hospitals A and B?
(e) Is this answer consistent with the answers we have when obtained by using hypothesis
testing and ANOVA approaches covered in Class 4?
Page 3 of 6 (Homework 7)
=3.80-3.85 = -0.05
Question 4.
(This is a nonlinear regression question)
Enterprise Industries produces Fresh, a brand of liquid laundry detergent.
In order to more effectively manage its inventory, the company would like
to better predict demand for Fresh. To develop a prediction model, the
company has gathered data concerning demand for Fresh over the last 30
sales periods (each sales period is considered to be a four-week period).
The demand data are given in Table 1 with the following variables:
Variables:
Y the demand for large size bottle of Fresh (in hundreds of thousands
of bottles) in the sales period
X1 The price (in dollars) of Fresh as offered by Enterprise Industries in
the sales period
X2 The average industry price (in dollars) of competitors’ similar
detergents in the sales period
X3 Enterprise Industries’ advertising expenditure (in hundreds of
thousands of dollars) to promote Fresh in the sales period
The Enterprise Industries believes on theoretical grounds that the single
independent variable X4 = X2-X1 (The “price difference” in the sales
period) adequately describes the effects of X1 and X2 on Y. That is,
perhaps demand for Fresh depends more on how the price for Fresh
compares to competitors’ price than it does on the absolute levels of the
prices for Fresh and other competing detergents. This makes sense since
most consumers must buy a certain amount of detergent no matter what
the price might be.
First, you need to complete column X4 in the Table 1.
I have completed the first two rows.
Questions:
(1) Answer the following questions by using model Y = b0 + b1X4 + b2X3
(a) What is the estimated regression equation?
(b) Interpret the slopes of b1 and b2.
(c) Do the p-values for the independent variables in this model indicate
that these independent variables (X4, X3) are significant?
(d) What percent of the total variation in observed demands is explained
by this model? (What is r2 equal to?)
(2) Consider the demand for Fresh Detergent in the future sales period
when Enterprise Industries’ price for Fresh will be X1 = 3.70,
Table 1
FRESH DETERGENT DATA
Price
X1
3,85
3,75
3,70
3,70
3,60
3,60
3,60
3,80
3,80
3,85
3,90
3,90
3,70
3,75
3,75
3,80
3,70
3,80
3,70
3,80
3,80
3,75
3,70
3,55
3,60
3,65
3,70
3,75
3,80
3,70
IndPrice Demand
X2
Y
3,80
7,38
4,00
8,51
4,30
9,52
3,70
7,50
3,85
9,33
3,80
8,28
3,75
8,75
3,85
7,87
3,65
7,10
4,00
8,00
4,10
7,89
4,00
8,15
4,10
9,10
4,20
8,86
4,10
8,90
4,10
8,87
4,20
9,26
4,30
9,00
4,10
8,75
3,75
7,95
3,75
7,65
3,65
7,27
3,90
8,00
3,65
8,50
4,10
8,75
4,25
9,21
3,65
8,27
3,75
7,67
3,85
7,93
4,25
9,26
X2-X1
X4
-0,05
0,25
AdvExp
X3
5,50
6,75
7,25
5,50
7,00
6,50
6,75
5,25
5,25
6,00
6,50
6,25
7,00
6,90
6,80
6,80
7,10
7,00
6,80
6,50
6,25
6,00
6,50
7,00
6,80
6,80
6,50
5,75
5,80
6,80
of competitors’ similar detergent will be X2 = 3.90 and Enterprise
the average price Industries’ advertising expenditure for Fresh
will be X3 = 6.50.
(a) What is the point prediction of this demand?
(b) Identify the 95 percent confidence interval estimation for the avergae demand.
Page 4 of 6 (Homework 8)
USING PHStat TO OBTAIN SIMPLE REGRESSION
Choose the following:
Check ANOVA and Coefficients Tables
PHStat —> Regression —> Simple Linear Regression… —>
Output
(Fill out the window as shown below)
Data
X Value
Confidence Level
1.652,96
95%
Intermediate Calculations
Sample Size
70
Degrees of Freedom
68
t Value
1,9955
XBar, Sample Mean of X
214,9286
Sum of Squared Differences from
7.606.537
XBar
Standard Error of the Estimate 15,9436
h Statistic
0,0143
Predicted Y (YHat)
214,929
17
For Average Y
Interval Half Width
3,803
Confidence Interval Lower Limit 211,126
Confidence Interval Upper Limit 218,731
18a
For Individual Response Y
Interval Half Width
32,041
Prediction Interval Lower Limit 182,887
Prediction Interval Upper Limit 246,970
Regression Statistics
Multiple R
0,8617
R Square
0,7425
Adjusted R Square
0,7388
Standard Error
15,9436
Observations
70
18b
15
16
14cba
ANOVA
Regression
Residual
Total
Intercept
AREA
df
1
68
69
Coefficients
81,110
0,081
11
SS
49.853,253
17.285,390
67.138,643
Standard Error
9,744
0,006
MS
49.853,253
254,197
F
196,121
Signif F
0,00000
t Stat
8,324
14,004
P-value
0,00000
0,00000
Lower 95%
19a
61,67
0,07
Upper 95%
100,55
0,09
19b
USING PHStat TO OBTAIN MULTIPLE LINEAR REGRESSION
Choose the following:
PHStat —> Regression —> Multiple Regression… —>
Check ANOVA and Coefficients Tables
(Fill out the window as shown below)
Confidence Interval Estimate and Prediction Interval
Example 1-2
Data
Confidence Level
AREA given value
ASSESS given value
0,95
1
1.652,96
137,60
Enter
these
Predicted Y (YHat)
214,906
27
For Average Predicted Y (YHat)
Interval Half Width
3,193
Confidence Interval Lower Limit
211,714
Confidence Interval Upper Limit
218,099
For Individual Response Y
Interval Half Width
26,901
Prediction Interval Lower Limit
188,005
Prediction Interval Upper Limit
241,807
SUMMARY OUTPUT
Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations
28a
28b
Example 1-2
0,9063
0,8213
0,8160
13,3820
70
25
26
24cba
ANOVA
df
2
67
69
Regression
Residual
Total
Intercept
AREA
ASSESS
Coefficients
53,5824
0,0646
0,3961
21
SS
55.140,48
11.998,17
67.138,64
Stand Error
9,6202
0,0057
0,0729
MS
27.570,24
179,077
t Stat
5,5698
11,3225
5,4337
F
153,9574
Significance F
P-value
0,0000
0,0000
0,0000
Lower 99% Upper 99%
28,0771
79,0877
0,0495
0,0798
0,2028
0,5893
29a
0,0000
29b
nter
ese
Homework 1
Note that each student will prepare and submit the homework assignments individually.
It is important that you complete this first assignment since the remaining assignments will be using the
work on this assignment. You will be using MS Excel or PHStat software. It is acceptable, if you prefer
to use another software for the study. Please let me know if you need help with the use of Excel.
You will be using the survey data given in ISSCM491 file for most of the homework problems.
These 800 employees have different occupations. The categorical variables used for occupation are
0. Managerial 1. Professional 2. Production 3. Admin. Support 4. Laborer
5. Finance 6. Corporate 8. Information Systems 9. Technical 10. Service
You will be working with the survey data of a specific occupation. The occupational group data that
you will be working is provided on the Announcements page of the course web site.
For all the questions during this quarter, the term Occupational Group refers to
the occupational group that you will be working.
The survey data (file ISSCM491.xlsx) can be downloaded from the Assignments page of the course website.
The Excel file ISSCM491.xlsx has the following three worksheets
Includes the survey information (questions).
Has the survey data as observed.
Has the same survey data sorted according to the occupational groups.
(This sorted data will be helpful to obtain the sample values of your occupational group).
Questions Part I:
1. Extract the employees of your group and save them with a meaningful file name
such as manag.xls, profe.xls, prod.xls etc.
2. Calculate the Descriptive Statistics for INCOME, Years WORKed and EDUCation for your Group.
3. Find the frequency distribution and Histogram for the INCOME of your Group
(See next page for using the Descriptive Statistics and Histogram options of MS Excel).
(a) Calculate the Descriptive Statistics for INCOME, Years WORKed and EDUCation for FEMALE
employees in your Group.
(b) Calculate the Descriptive Statistics for INCOME, Years WORKed and EDUCation for MALE
employees in your Group.
5. Select a random sample of 16 employees from this 800 observations.
(Note that this sample will be randomly selected from the whole 800 sample survey results,
not from the group you are working with).
(a) Explain how you have selected your sample to ensure that it is a random sample.
(b) What is the average (mean) AGE of the employees in your sample?
4.
Questions Part II: (These are general questions that do not use the sample survey data.
Your are asked to identify the population and suggest a sample selection for each case).
6. You would like to know, if there is a convincing evidence that newborns from mothers who smoke
have a different average birth weight than newborns from mothers who don’t smoke?
(a) What is the population for this study?
(b) How do you select the sample for this study?
7. Suppose a pharmaceutical company had developed a new drug for lowering blood pressure,
and they are preparing a clinical trial (experiment) to test the drug’s effectiveness.
(a) What is the population in this study?
(b) How do you proceed with this study? i.e., How do you collect the sample for this study?
Submit the file containing your answers to questions (1), (2), (3), (4), (5), (6) and (7) electronically
on the course website on Sakai.
Please write your name and your Occupational Group number on the first page of your submission.
Page 1 of 3 (Homework 1)
USING PHStat to calculate DECSRIPTIVE STATISTICS
PHStat–> Descritive Statistisc –> Descriptive Summary –>
Page 2
Page 2 of 3 (Homework 1)
To calculate Descriptive Statistics Using Excel
Step 1
Step 2
Select Data pane
Click Data Analysis …
Page 3
(Data analysis dialog window will open)
If Data Analysis… is not available on your tools menu, then the Analysis Toolpack has not been
installed in your system. See the FAQ on the course website.
Step 3
Choose Descriptive Statistics by clicking it once and then clicking the OK button
(The Descriptive Statistics dialog window will open. Provide the necessary information as shown below)
Step 4
Click the OK button. A new worksheet will be created with the results.
To construct a Histogram and Frequency Distribution Using Excel
Step 1
Step 2
Step 3
Select Data pane
Click Data Analysis …
(Data analysis dialog window will open)
Select Histogram option and click the OK button
(Next you will get the Histogram window. Provide the necessary information as shown below).
Step 4
Click the OK button. A new worksheet will be created with the results.
Page 3 of 3 (Homework 1)
QUESTION 1
1
2
3
4
5
6
7
8
9
ID_No
OCCUP
Resp
INCOME
Years
WORKed
EDUC
GENDER
AGE
Work
HRS
No
EARN
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
35.2
25.8
34.3
34.1
29.3
31.7
30.6
29.7
37.1
27.8
33.1
29.3
26.7
25.9
35.8
30.0
26.2
44.6
32.7
30.0
29.7
30.5
28.9
23.2
48.7
24.4
23.4
25.3
25.5
72.8
27.8
25.4
25.0
50.7
27.1
27.7
25.3
28.1
15
20
7
33
10
15
20
27
21
20
25
8
20
9
45
15
10
10
13
25
15
20
12
22
8
6
26
17
21
19
10
15
11
24
15
20
3
22
12
14
12
14
13
12
14
12
13
12
16
12
12
12
12
12
12
16
15
13
12
12
12
12
12
16
12
13
14
12
15
14
12
18
13
13
12
13
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
33
50
24
50
26
53
52
43
39
40
42
28
37
35
63
42
35
27
29
61
32
55
29
38
32
29
42
38
47
36
27
37
32
44
41
40
20
65
50
40
49
42
40
40
40
40
36
45
36
50
40
43
35
40
40
40
42
40
40
40
32
40
40
40
40
35
60
40
45
40
40
40
40
45
40
40
1
1
2
3
1
1
1
2
3
1
1
1
2
1
1
2
2
2
2
1
2
1
2
2
1
3
1
1
3
1
1
1
2
2
2
3
2
1
2
14
21
31
50
77
94
98
99
100
108
110
112
113
114
126
137
140
150
163
166
176
187
198
200
212
213
218
220
236
249
250
252
269
280
300
307
312
350
359
360
390
400
450
500
550
600
650
700
750
800
19
29
40
58
82
109
115
134
207
222
233
254
315
385
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
32.5
25.0
34.2
31.9
25.5
46.1
28.1
25.6
37.3
24.2
33.2
20.3
26.8
43.1
24.7
25.4
32.5
27.9
36.8
33.9
60.5
26.3
55.8
29.3
62.7
56.6
29.3
Total 2134.90
Average 32.84
25
6
30
21
34
36
4
24
20
20
37
20
27
7
15
16
10
17
18
10
20
24
24
16
31
32
18
12
12
12
12
12
16
14
14
12
17
13
13
12
16
16
12
12
14
13
13
16
12
12
9
14
12
12
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
61
23
55
31
56
56
30
42
38
38
58
57
69
25
35
32
27
27
37
29
37
41
41
33
48
48
35
40
44
50
47
40
40
48
37
40
37
32
32
29
40
40
40
40
40
40
41
50
72
45
56
80
40
40
1
2
2
3
2
1
2
1
4
1
1
2
1
1
2
1
3
2
4
2
2
1
1
1
2
4
2
1216.00
18.71
851.00
13.09
14.00
0.22
2602.00
40.03
2745.00
42.23
113.00
1.74
Marques Townes
Occup = 3
10
11
12
13
14
15
16
Fam
Inc
Job
Sat
Job
Char
Get
Ahead
Traumat
Events
Employ
Years
Number
Prom
25.2
17.4
27.4
69.6
19.6
35.7
47.3
57.8
31.2
17.8
34.2
19.3
26.5
15.9
38.7
23.0
43.4
69.3
43.1
20.0
38.9
20.5
21.9
16.7
38.7
28.3
29.7
15.3
68.7
63.4
17.8
15.4
25.4
49.2
32.4
31.4
27.1
18.2
1
3
3
2
2
2
2
1
2
1
4
2
2
1
1
1
2
1
2
1
2
2
1
2
2
3
1
1
4
1
1
1
2
3
2
2
1
1
1
1
4
5
3
5
5
5
5
1
5
1
1
5
2
2
1
5
2
1
5
1
5
5
1
1
4
5
5
4
5
1
3
1
5
5
4
5
2
1
2
1
1
2
1
1
1
1
2
1
1
1
3
3
1
1
3
2
1
2
1
1
1
1
1
1
3
1
3
1
1
2
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
1
0
0
0
1
0
0
0
1
0
0
0
0
0
0
4
10
5
3.5
4
12.5
3
9.41
16
0.5
12
3
10
1.5
22
7.08
2.5
5
4.33
18.33
11.58
15
3
1
6
3
1.08
4
1
2.16
2
15
10
22
4
7
1.5
13
3
0
1
1
5
1
1
1
0
0
1
0
0
1
1
0
3
4
0
0
3
2
1
0
0
0
1
1
0
0
0
3
0
0
0
1
0
1
Bins
15.0
25.0
35.0
45.0
55.0
65.0
75.0
22.5
37.3
27.9
40.2
44.5
36.9
41.8
16.4
72.6
49.2
23.2
42.5
16.8
33.1
46.9
15.4
32.5
46.5
34.0
43.0
59.2
16.3
49.4
23.2
56.1
71.0
45.0
1
3
2
1
2
1
2
4
1
2
2
2
1
4
2
1
1
2
3
2
2
1
1
1
4
1
2
5
2
1
1
5
1
1
1
4
5
5
1
5
5
5
1
5
5
4
4
5
5
5
4
1
5
1
3
1
2
1
1
1
2
2
1
1
2
2
1
3
2
2
1
1
1
3
1
3
1
1
3
1
1
0
0
0
0
0
0
1
0
0
0
1
1
0
0
3
0
0
2
0
0
0
1
0
0
0
0
0
17
2.5
30
10
21
30
1
2
10.5
5
5
3
25
1
0.75
0.16
5
0.5
16
3
8.5
18
8
4.5
9
25
8
1
0
3
0
0
3
1
0
4
0
0
0
3
1
2
0
0
0
0
2
2
0
4
0
1
1
0
2284.80
35.15
119.00
1.83
217.00
3.34
99.00
1.5…
Purchase answer to see full
attachment