Question 1193085: Two-sample t test versus matched pairs t test. Consider the following data set. The data were actually collected in pairs, and each row represents a pair.
Group 1 Group 2
48.86 48.88
50.60 52.63
51.02 52.55
47.99 50.94
54.20 53.02
50.66 50.66
45.91 47.78
48.79 48.44
47.76 48.92
51.13 51.63
(A) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative.
(B) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value.
(C) Describe the differences in the two test results.
Answer by math_tutor2020(3817):
Given Data
Group 1 | Group 2
48.86   | 48.88
50.60   | 52.63
51.02   | 52.55
47.99   | 50.94
54.20   | 53.02
50.66   | 50.66
45.91   | 47.78
48.79   | 48.44
47.76   | 48.92
51.13   | 51.63
Part (A)
Let's compute the sample mean of Group 1.
To get the sample mean, we first add up the data values
48.86+50.6+51.02+47.99+54.2+50.66+45.91+48.79+47.76+51.13 = 496.92
Then we divide that sum by the sample size n = 10
496.92/n = 496.92/10 = 49.692
This is the sample mean (xbar) for group 1.
I'll refer to this as xbar1.
Follow similar steps for group 2 to find that
xbar2 = 50.545
Note: the 2 is just a label for group 2; it does NOT mean xbar squared.
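If you'd like to double check the means with software, here's a minimal Python sketch (the list names group1 and group2 are just my own labels):

# Sample means for both groups -- plain Python, no libraries needed
group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

xbar1 = sum(group1) / len(group1)   # should be about 49.692
xbar2 = sum(group2) / len(group2)   # should be about 50.545
print(xbar1, xbar2)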
Now let's calculate the variance for group 1.
Here's the data values for group 1 only, which I'll call X.
X
48.86
50.6
51.02
47.99
54.2
50.66
45.91
48.79
47.76
51.13
For each X value, subtract off the value of xbar = 49.692
Then square the difference.
For example, we have (X-xbar)^2 = (48.86-49.692)^2 = 0.692224 in the first row.
X     | (X-xbar)^2
48.86 | 0.692224
50.6  | 0.824464
51.02 | 1.763584
47.99 | 2.896804
54.2  | 20.322064
50.66 | 0.937024
45.91 | 14.303524
48.79 | 0.813604
47.76 | 3.732624
51.13 | 2.067844
Sum everything in the second column and you should get 48.35376
This is the sum of squared deviations from the mean, often labeled the Sum of Squared Errors (SSE)
Divide the SSE value over n-1 = 10-1 = 9 to compute the sample variance
sample variance = (SSE)/(n-1)
sample variance = (48.35376)/9
sample variance = 5.37264
I'll refer to this as V1 to represent the variance of group 1.
Follow similar steps to find that V2 = 3.70316 is the approximate sample variance of group 2.
A calculator with a built-in standard deviation function will make quick work of finding the variance.
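As a software check (a sketch, assuming Python's standard library is available), statistics.variance uses the same n-1 divisor:

import statistics

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

V1 = statistics.variance(group1)   # sample variance, about 5.37264
V2 = statistics.variance(group2)   # sample variance, about 3.70316
print(V1, V2)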
Now onto the standard error (SE)
SE = sqrt( (V1)/(n1) + (V2)/(n2) )
SE = sqrt( (5.37264)/(10) + (3.70316)/(10) )
SE = 0.95266993234803
Which helps us find the t statistic
t = ((xbar1 - xbar2) - (mu1 - mu2))/(SE)
t = ((49.692 - 50.545) - (0))/(0.95266993234803)
t = -0.89537831628381
t = -0.8954
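Here is the same standard error and t statistic computation as a short Python sketch (plugging in the rounded values from above, so the last few decimal places may differ slightly):

from math import sqrt

xbar1, xbar2 = 49.692, 50.545     # sample means found above
V1, V2 = 5.37264, 3.70316         # sample variances found above
n1 = n2 = 10

SE = sqrt(V1/n1 + V2/n2)          # standard error, about 0.95267
t = (xbar1 - xbar2) / SE          # t statistic, about -0.8954
print(SE, t)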
A conservative choice for the degrees of freedom is the smaller of n1-1 and n2-1 (statistical software often uses the larger Welch-Satterthwaite value instead).
Because n1 = n2 = 10, we can simply think of it as n-1.
The degrees of freedom is df = n-1 = 10-1 = 9
Use a calculator like this one
https://stattrek.com/online-calculator/t-distribution.aspx
to find that P(T < -0.8954) = 0.1969 approximately when we have df = 9
This doubles to 2*0.1969 = 0.3938 because we have a two-sided test (the problem asks for the "P-value for the two-sided alternative").
The result 0.3938 is the approximate P-value.
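If you have SciPy installed, the same P-value can be reproduced from the t distribution with df = 9. (Note that scipy.stats.ttest_ind with equal_var=False would instead use the Welch-Satterthwaite degrees of freedom, so its P-value would differ slightly from this conservative df = 9 answer.)

from scipy import stats

t_stat = -0.8954
df = 9
p_lower = stats.t.cdf(t_stat, df)   # P(T < -0.8954), about 0.1969
p_value = 2 * p_lower               # two-sided P-value, about 0.3938
print(p_value)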
------------------------------
Summary:
xbar1 = 49.692 and xbar2 = 50.545 are the sample means
V1 = 5.37264 and V2 = 3.70316 are the sample variances
t = -0.8954 is the test statistic
df = 9 is the degrees of freedom
p-value = 0.3938
==============================================================================================================
Part (B)
Here are the two original groups of data.
Group 1 | Group 2
48.86   | 48.88
50.60   | 52.63
51.02   | 52.55
47.99   | 50.94
54.20   | 53.02
50.66   | 50.66
45.91   | 47.78
48.79   | 48.44
47.76   | 48.92
51.13   | 51.63
For each row, compute the difference in the form X1 - X2, where X1 is the value from group 1 and X2 is the value from group 2.
For instance, the first row has 48.86 - 48.88 = -0.02 as the difference.
We'll list the differences in the column labeled "d"
Group 1 | Group 2 | d
48.86   | 48.88   | -0.02
50.6    | 52.63   | -2.03
51.02   | 52.55   | -1.53
47.99   | 50.94   | -2.95
54.2    | 53.02   | 1.18
50.66   | 50.66   | 0
45.91   | 47.78   | -1.87
48.79   | 48.44   | 0.35
47.76   | 48.92   | -1.16
51.13   | 51.63   | -0.5
If you were to compute the sample mean of the d column, you should find that the mean is -0.853
We call this value dbar in much the same way xbar is denoted. The "bar" refers to the horizontal line up top.
dbar = -0.853
The sample variance of the d column will follow the same type of steps as described in part (A) when I detailed how to compute the variance of group 1.
You should get a sample variance of 1.610668 which leads to the sample standard deviation of sqrt(1.610668) = 1.269121
I'll refer to this standard deviation as Sd to indicate "Standard deviation of the differences".
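A short Python sketch for the differences, their mean, variance, and standard deviation (standard library only):

import statistics

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

d = [x1 - x2 for x1, x2 in zip(group1, group2)]   # paired differences X1 - X2
dbar = statistics.mean(d)                         # about -0.853
var_d = statistics.variance(d)                    # about 1.610668
Sd = statistics.stdev(d)                          # about 1.269121
print(dbar, var_d, Sd)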
Now onto the standard error (SE)
SE = Sd/sqrt(n)
SE = 1.269121/sqrt(10)
SE = 0.40133129863506
SE = 0.401331
It allows us to compute the test statistic
t = (dbar - mu_d)/SE
t = (-0.853 - 0)/0.40133129863506
t = -2.12542605797524
t = -2.1254
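The same standard error and t statistic in Python (again plugging in the rounded values from above):

from math import sqrt

dbar = -0.853      # mean of the differences
Sd = 1.269121      # standard deviation of the differences
n = 10

SE = Sd / sqrt(n)         # standard error, about 0.401331
t = (dbar - 0) / SE       # t statistic, about -2.1254
print(SE, t)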
The degrees of freedom is n-1 = 10-1 = 9
I'll then use this calculator again
https://stattrek.com/online-calculator/t-distribution.aspx
to find that P(T < -2.1254) = 0.0312 when df = 9, which doubles to 2*0.0312 = 0.0624 since we're doing a two-tailed test.
The result 0.0624 is the approximate P-value.
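As a one-call check (assuming SciPy is installed), scipy.stats.ttest_rel runs the matched pairs test directly and should agree with the hand computation up to rounding:

from scipy import stats

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

result = stats.ttest_rel(group1, group2)    # two-sided paired t test by default
print(result.statistic, result.pvalue)      # about -2.1254 and 0.0624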
------------------------------
Summary:
dbar = -0.853 is the sample mean of the differences (d column)
1.610668 is the approximate sample variance of the differences (d column)
t = -2.1254 is the approximate test statistic
df = 9 = degrees of freedom
P-value = 0.0624
==============================================================================================================
Part (C)
Admittedly there are a lot of numbers and variables to keep track of.
It might be overwhelming if you aren't very familiar with statistics.
Though if I had to pick one value to focus on, it would be the P-value.
In many scientific journals, the researchers report the P-value to the reader to indicate how (in)significant the results were.
In part (A), we got a P-value of roughly 0.3938
In part (B), we got a P-value of roughly 0.0624
That's quite a gap.
Recall that the P-value, compared against the significance level, determines whether you reject or fail to reject the null.
Let's say the significance level is alpha = 0.05 which is the default level.
At this alpha value, we'd fail to reject the null for both part (A) and part (B). Why? Because the P-value in each case is not less than alpha = 0.05.
We reject the null only if the p-value is smaller than alpha.
If we set alpha = 0.10, then we'd fail to reject in part (A) but reject the null in part (B).
Sometimes you may see a significance level of alpha = 0.10 (it depends on the context).
As you can see, the paired analysis in part (B) leads to a situation where we are more likely to reject the null. Pairing removes the pair-to-pair variability, which shrinks the standard error and makes the test more sensitive to the difference between the groups.