Statistics Assignment
Once you have made your corrections, you will compile your information from Phase 1, Phase 2, Phase 3 and your final conclusion into one submission and submit this as your rough draft for Phase 4 of the course project. Below is a summary of the expectations for Phase 4 of the course project:
Introduce your scenario and data set.
Provide a brief overview of the scenario you are given above and the data set that you will be analyzing.
Classify the variables in your data set.
Which variables are quantitative/qualitative?
Which variables are discrete/continuous?
Describe the level of measurement for each variable included in your data set.
Discuss the importance of the Measures of Center and the Measures of Variation.
What are the measures of center and why are they important?
What are the measures of variation and why are they important?
Calculate the measures of center and measures of variation. Interpret your results in context of the selected topic.
Mean
Median
Mode
Midrange
Range
Variance
Standard Deviation
Discuss the importance of constructing confidence intervals for the population mean.
What are confidence intervals?
What is a point estimate?
What is the best point estimate for the population mean? Explain.
Why do we need confidence intervals?
Based on your selected topic, evaluate the following:
Find the best point estimate of the population mean.
Construct a 95% confidence interval for the population mean. Assume that your data is normally distributed and σ is unknown.
Please show your work for the construction of this confidence interval and be sure to use the Equation Editor to format your equations.
Write a statement that correctly interprets the confidence interval in context of your selected topic.
Based on your selected topic, evaluate the following:
Find the best point estimate of the population mean.
Construct a 99% confidence interval for the population mean. Assume that your data is normally distributed and σ is unknown.
Please show your work for the construction of this confidence interval and be sure to use the Equation Editor to format your equations.
Write a statement that correctly interprets the confidence interval in context of your selected topic.
Compare and contrast your findings for the 95% and 99% confidence interval.
Did you notice any changes in your interval estimate? Explain.
What conclusion(s) can be drawn about your interval estimates when the confidence level is increased? Explain.
Discuss the process for hypothesis testing.
Discuss the 8 steps of hypothesis testing.
When performing the 8 steps for hypothesis testing, which method do you prefer: the P-value method or the critical value method? Why?
Original Claim: The average age of all patients admitted to the hospital with infectious diseases is less than 65 years of age.
Test the claim using α = 0.05 and assume your data is normally distributed and σ is unknown.
Based on your selected topic, answer the following:
Write the null and alternative hypothesis symbolically and identify which hypothesis is the claim.
Is the test two-tailed, left-tailed, or right-tailed? Explain.
Which test statistic will you use for your hypothesis test; z-test or t-test? Explain.
What is the value of the test-statistic? What is the P-value? What is the critical value?
What is your decision; reject the null or do not reject the null?
Explain why you made your decision including the results for your p-value and the critical value.
State the final conclusion in non-technical terms.
Conclusion
Recap your ideas by summarizing the information presented in context of your chosen scenario.
Please be sure to show all of your work and use the Equation Editor to format your equations.
This assignment should be formatted using APA guidelines and be a minimum of 2 pages in length.
Running head: Hypothesis Testing
Hypothesis Testing
Mary Gibson
01-23-2019
Steps of Hypothesis Testing.
1. State the null and alternative hypotheses (H0 and H1).
2. State the level of significance, for example α = 0.05. The purpose of hypothesis testing is
not to question the computed value of the sample statistic but to make a judgement about
the difference between the sample statistic and a hypothesized population parameter.
Therefore, we need to decide on the criterion to use for deciding whether to
reject or not reject the null hypothesis. A higher significance level means a higher chance of
rejecting the null hypothesis when it is true. Also decide which distribution, that is, which
test statistic, to use in the hypothesis test (Efron, 2014).
3. Determine the critical region or regions.
4. Compute the value of the test statistic.
5. Make a statistical decision. That is, decide whether to reject or not reject the null
hypothesis depending on whether the test statistic lies in the critical region.
6. Finally, make the management decision.
When performing hypothesis testing the best method to use is the critical value method. In the
critical value method, also called the critical region method, the values of the test statistic that
lead to rejection of the null hypothesis form the critical region of the test, while those that do
not lead to rejection form the acceptance region (Johansen, 2011).
Consider the diagram below to clearly understand the critical value method.
If α is 0.05, the rejection region lies in the shaded region and the acceptance region lies in the
unshaded region. In a two-tailed test there are two critical regions whose areas sum to the level
of significance α.
Performing hypothesis testing.
α = 0.05
Claim; the average age of the patients is less than 65 years.
Step 1;
Statement of the hypotheses.
H0: µ = 65
H1: µ < 65 (claim)
Step 2;
This is a left-tailed test with α = 0.05.
Consider the diagram below to support the argument and facilitate understanding.
This is a left-tailed test because the alternative hypothesis states that the mean is less than 65,
so the rejection region (the shaded region) lies in the left tail of the distribution. Any value of the
test statistic that lies in the rejection region means that we reject the null hypothesis
(Johansen, 2011).
Step 3;
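We are going to use the Z-test statistic because this is a large sample: the sample size is greater
than 30 (that is, n = 60). The t-test statistic is normally used for a small sample, with fewer than
30 observations (n < 30). Strictly, because σ is unknown the t distribution with n − 1 = 59 degrees
of freedom applies, but for a sample this large it is closely approximated by the standard normal
distribution.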
Step 4;
Table area = 0.5 − 0.05 = 0.45
From the standard normal tables, Zα = Z0.05 = 1.645. Because the alternative hypothesis is µ < 65,
the test is left-tailed and the critical value is −1.645.
Step 5;
Test statistic.
Since the population standard deviation, σ, is unknown we use the sample standard deviation, S.
S = 8.779, x̄ = 61.917, µ = 65, n = 60
The test statistic is
$$Z = \frac{\bar{x} - \mu}{S / \sqrt{n}} = \frac{61.917 - 65}{8.779 / \sqrt{60}} = -2.72$$
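The arithmetic in Step 5 can be checked with a short Python sketch. It assumes only the summary values quoted in the text (sample mean 61.917, sample standard deviation 8.779, n = 60, hypothesized mean 65); the variable names are illustrative and not part of the original work.

```python
from math import sqrt

# Summary values quoted in the text (assumed correct as reported).
x_bar = 61.917   # sample mean age
mu_0 = 65.0      # hypothesized population mean under H0
s = 8.779        # sample standard deviation
n = 60           # sample size

# Standard error of the mean, then the standardized test statistic.
standard_error = s / sqrt(n)
z = (x_bar - mu_0) / standard_error

print(f"standard error = {standard_error:.4f}")
print(f"test statistic = {z:.2f}")
```

The printed test statistic should agree with the hand calculation to two decimal places.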
Step 6; statistical decision.
Since the sample test statistic Z = -2.72 is less than the left-tail critical value of -1.645 at
α = 0.05, it lies in the critical region and we reject the null hypothesis.
Step 7; interpretation.
Therefore there is sufficient evidence to support the claim that the average age of the patients who visit the hospital is less than 65.
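As a cross-check of the decision, the sketch below applies both the critical value method and the P-value method to the test statistic Z = -2.72, treating the test as left-tailed (H1: µ < 65) at α = 0.05. It uses the standard normal distribution from Python's statistics module; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

z = -2.72      # test statistic computed in the text
alpha = 0.05   # level of significance stated in the text

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Left-tailed test: the P-value is the area to the left of z,
# and the critical value cuts off an area of alpha in the left tail.
p_value = std_normal.cdf(z)
z_critical = std_normal.inv_cdf(alpha)

reject_by_critical_value = z < z_critical
reject_by_p_value = p_value < alpha

print(f"critical value = {z_critical:.3f}")
print(f"P-value        = {p_value:.4f}")
print("Reject H0" if reject_by_p_value else "Do not reject H0")
```

Both methods always lead to the same decision, because z < z_critical exactly when the left-tail area of z is smaller than α.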
Conclusion.
From week one we carried out a number of statistical computations in search of information that
will help us make a decision for the hospital patients. We saw that the average age of the
patients who visit the hospital is 61.917 years, and the modal age came to 70.75 years.
This was not enough statistical information to draw a conclusion for the management of the
hospital, so we performed a hypothesis test of the claim made by the management at the 0.05
level of significance. The test statistic, Z = -2.72, fell in the left-tail rejection region, so the
hypothesis test supports the claim that the average age of the patients admitted to NCLEX
Memorial Hospital with infectious diseases is less than 65. Therefore the management should not
focus its policy only on patients aged 65 years and above, since the average patient admitted with
an infectious disease is younger than 65 (Efron, 2014).
References
Efron, B. (2014). Large-scale simultaneous hypothesis testing: the choice of a null
hypothesis. Journal of the American Statistical Association, 99(465), 96-104.
Johansen, S. (2011). Estimation and hypothesis testing of co-integration vectors in Gaussian
vector autoregressive models. Econometrica: Journal of the Econometric Society, 1551-1580.
Patient #    Infectious Disease    Age
1            Yes                   69
2            Yes                   35
3            Yes                   60
4            Yes                   55
5            Yes                   49
6            Yes                   60
7            Yes                   72
8            Yes                   70
9            Yes                   70
10           Yes                   73
11           Yes                   68
12           Yes                   72
13           Yes                   74
14           Yes                   69
15           Yes                   46
16           Yes                   48
17           Yes                   70
18           Yes                   55
19           Yes                   49
20           Yes                   60
21           Yes                   72
22           Yes                   70
23           Yes                   76
24           Yes                   56
25           Yes                   59
26           Yes                   64
27           Yes                   71
28           Yes                   69
29           Yes                   55
30           Yes                   61
31           Yes                   70
32           Yes                   55
33           Yes                   45
34           Yes                   69
35           Yes                   54
36           Yes                   48
37           Yes                   60
38           Yes                   61
39           Yes                   50
40           Yes                   59
41           Yes                   60
42           Yes                   62
43           Yes                   63
44           Yes                   53
45           Yes                   64
46           Yes                   50
47           Yes                   69
48           Yes                   52
49           Yes                   68
50           Yes                   70
51           Yes                   69
52           Yes                   59
53           Yes                   58
54           Yes                   69
55           Yes                   65
56           Yes                   61
57           Yes                   59
58           Yes                   71
59           Yes                   71
60           Yes                   68

Descriptives              Age
Mean                      61.82
Median                    61.5
Mode                      69
Midrange                  55.5
Range                     41
Variance                  79.64
Standard deviation        8.92

95% Confidence interval - mean
confidence level          95%
mean                      61.82
std. dev.                 8.92
n                         60
z                         1.960
half-width                2.2581
upper confidence limit    64.0748
lower confidence limit    59.5585

99% Confidence interval - mean
confidence level          99%
mean                      61.82
std. dev.                 8.92
n                         60
z                         2.576
half-width                2.9677
upper confidence limit    64.7843
lower confidence limit    58.8490
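The descriptive statistics for the Age column can be reproduced from the 60 ages listed in the data set. The following is a minimal sketch using Python's statistics module, assuming the variance and standard deviation are the sample (n − 1) versions; the names ages and summary are illustrative.

```python
from statistics import mean, median, mode, variance, stdev

# Ages of patients 1 to 60, in the order listed in the data set.
ages = [
    69, 35, 60, 55, 49, 60, 72, 70, 70, 73, 68, 72, 74, 69, 46,
    48, 70, 55, 49, 60, 72, 70, 76, 56, 59, 64, 71, 69, 55, 61,
    70, 55, 45, 69, 54, 48, 60, 61, 50, 59, 60, 62, 63, 53, 64,
    50, 69, 52, 68, 70, 69, 59, 58, 69, 65, 61, 59, 71, 71, 68,
]

summary = {
    "mean": mean(ages),
    "median": median(ages),
    "mode": mode(ages),
    "midrange": (max(ages) + min(ages)) / 2,  # average of largest and smallest
    "range": max(ages) - min(ages),
    "variance": variance(ages),               # sample variance (divides by n - 1)
    "std dev": stdev(ages),                   # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:<10}{value:8.2f}")
```

Rounded to two decimal places, these values should match the Descriptives column.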
Running head: CONFIDENCE INTERVAL
Confidence interval
Mary Gibson
Confidence interval
A confidence interval is a type of interval estimate in statistics: a range of values, computed
from sample data, that is likely to contain a population parameter. The interval has an
associated degree of confidence, which describes how often intervals constructed in this way
contain the population parameter. Thus, the confidence interval gives the range of values that are
likely to contain the true parameter of the population (Bland, 2015).
Point estimate
A point estimate is a single value that is calculated from sample data to represent
the best guess for an unknown parameter of a population. Notably, the sample mean is the best point
estimate for the true mean of the population since it gives the average of a data sample obtained
from the given population.
Confidence intervals are important in decision making since they reduce the risk of
making an error by placing the true value within a stated range at a stated level of confidence.
Point estimate for the population mean of the dataset
Using the Option #2 dataset, the point estimate is the sample mean of the age of the
patients which is 61.82 years.
99% and 95% confidence intervals
The confidence intervals can be evaluated by the formula (Bland, 2015);
$$CI = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}}$$
Where $\bar{x}$ represents the sample mean, z represents the z-score of the degree of
confidence, s represents the standard deviation of the sample, and n represents the sample size.
Using Excel, the 95% confidence interval can be given as;
95% Confidence interval - mean
confidence level          95%
mean                      61.82
std. dev.                 8.92
n                         60
z                         1.960
half-width                2.2581
upper confidence limit    64.0748
lower confidence limit    59.5585
Thus, the 95% confidence interval is (59.56, 64.07). This implies that we can be 95%
confident that the population mean for the age of patients is between 59.56 years and 64.07
years.
The 99% confidence interval can be obtained as;
99% Confidence interval - mean
confidence level          99%
mean                      61.82
std. dev.                 8.92
n                         60
z                         2.576
half-width                2.9677
upper confidence limit    64.7843
lower confidence limit    58.8490
The 99% confidence interval is (58.85, 64.78). This implies that we can be 99% confident
that the population mean for the age of the patients is between 58.85 years and 64.78 years.
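Both intervals can be recomputed with a short Python sketch. It assumes the rounded summary values reported in the text (mean 61.82, standard deviation 8.92, n = 60) and uses z-scores from the standard normal distribution, as the Excel output does; the function name z_interval is hypothetical. Because the inputs are rounded, the limits may differ from the Excel output in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sample_sd, n, confidence):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Rounded summary values reported in the text.
x_bar, s, n = 61.82, 8.92, 60

lower95, upper95 = z_interval(x_bar, s, n, 0.95)
lower99, upper99 = z_interval(x_bar, s, n, 0.99)

print(f"95% CI: ({lower95:.2f}, {upper95:.2f}), width {upper95 - lower95:.2f}")
print(f"99% CI: ({lower99:.2f}, {upper99:.2f}), width {upper99 - lower99:.2f}")
```

The 99% interval comes out wider than the 95% interval because its z-score is larger while the standard error is unchanged.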
Interpretation
The 95% confidence interval is (59.56, 64.07) whilst the 99% confidence interval is
(58.85, 64.78). As such, the 99% confidence interval is wider than the 95% confidence
interval. This implies that as the degree of confidence increased, the interval estimate widened.
Increasing the confidence level reduces the chance that the interval misses the population
mean, because a wider range of values is covered. The cost is a less precise estimate: the
interval is more likely to contain the population mean but says less about where it lies.
References
Bland, M. (2015). An introduction to medical statistics. Oxford University Press (UK).
Descriptive statistics for Age (60 patients; minimum 35, maximum 76)
Mean                  61.82
Median                61.50
Mode                  69.00
Midrange              55.50
Range                 41.00
Variance              79.64
Standard Deviation    8.92