Question 1193085: Two-sample t test versus matched pairs t test. Consider the following data set. The data were actually collected in pairs, and each row represents a pair.
Group 1 Group 2
48.86 48.88
50.60 52.63
51.02 52.55
47.99 50.94
54.20 53.02
50.66 50.66
45.91 47.78
48.79 48.44
47.76 48.92
51.13 51.63
(A) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative.
(B) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value.
(C) Describe the differences in the two test results.
Answer by math_tutor2020(3817):
Given Data
Group 1 | Group 2
48.86   | 48.88
50.60   | 52.63
51.02   | 52.55
47.99   | 50.94
54.20   | 53.02
50.66   | 50.66
45.91   | 47.78
48.79   | 48.44
47.76   | 48.92
51.13   | 51.63
Part (A)
Let's compute the sample mean of Group 1.
To get the sample mean, we first add up the data values
48.86+50.6+51.02+47.99+54.2+50.66+45.91+48.79+47.76+51.13 = 496.92
Then we divide that sum by the sample size n = 10
496.92/n = 496.92/10 = 49.692
This is the sample mean (xbar) for group 1.
I'll refer to this as xbar1.
Follow similar steps for group 2 to find that
xbar2 = 50.545
Note: the 2 is just a label for group 2; it does NOT mean xbar squared.
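If you'd like to double check the means with software, here's a minimal Python sketch (the list names group1 and group2 are just my own labels):

# Sample means for both groups -- plain Python, no libraries needed
group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

xbar1 = sum(group1) / len(group1)   # should be about 49.692
xbar2 = sum(group2) / len(group2)   # should be about 50.545
print(xbar1, xbar2)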
Now let's calculate the variance for group 1.
Here's the data values for group 1 only, which I'll call X.
X
48.86
50.6
51.02
47.99
54.2
50.66
45.91
48.79
47.76
51.13
For each X value, subtract off the value of xbar = 49.692
Then square the difference.
For example, we have (X-xbar)^2 = (48.86-49.692)^2 = 0.692224 in the first row.
X     | (X-xbar)^2
48.86 | 0.692224
50.6  | 0.824464
51.02 | 1.763584
47.99 | 2.896804
54.2  | 20.322064
50.66 | 0.937024
45.91 | 14.303524
48.79 | 0.813604
47.76 | 3.732624
51.13 | 2.067844
Sum everything in the second column and you should get 48.35376
This is the sum of squared deviations from the mean, often labeled the Sum of Squared Errors (SSE)
Divide the SSE value over n-1 = 10-1 = 9 to compute the sample variance
sample variance = (SSE)/(n-1)
sample variance = (48.35376)/9
sample variance = 5.37264
I'll refer to this as V1 to represent the variance of group 1.
Follow similar steps to find that V2 = 3.70316 is the approximate sample variance of group 2.
A calculator with a built-in standard deviation function will make quick work of finding the variance.
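As a software check (a sketch, assuming Python's standard library is available), statistics.variance uses the same n-1 divisor:

import statistics

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

V1 = statistics.variance(group1)   # sample variance, about 5.37264
V2 = statistics.variance(group2)   # sample variance, about 3.70316
print(V1, V2)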
Now onto the standard error (SE)
SE = sqrt( (V1)/(n1) + (V2)/(n2) )
SE = sqrt( (5.37264)/(10) + (3.70316)/(10) )
SE = 0.95266993234803
Which helps us find the t statistic
t = ((xbar1 - xbar2) - (mu1 - mu2))/(SE)
t = ((49.692 - 50.545) - (0))/(0.95266993234803)
t = -0.89537831628381
t = -0.8954
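Here is the same standard error and t statistic computation as a short Python sketch (plugging in the rounded values from above, so the last few decimal places may differ slightly):

from math import sqrt

xbar1, xbar2 = 49.692, 50.545     # sample means found above
V1, V2 = 5.37264, 3.70316         # sample variances found above
n1 = n2 = 10

SE = sqrt(V1/n1 + V2/n2)          # standard error, about 0.95267
t = (xbar1 - xbar2) / SE          # t statistic, about -0.8954
print(SE, t)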
A conservative choice for the degrees of freedom is the smaller of n1-1 and n2-1 (statistical software often uses the larger Welch-Satterthwaite value instead).
Because n1 = n2 = 10, we can simply think of it as n-1.
The degrees of freedom is df = n-1 = 10-1 = 9
Use a calculator like this one
https://stattrek.com/online-calculator/t-distribution.aspx
to find that P(T < -0.8954) = 0.1969 approximately when we have df = 9
This doubles to 2*0.1969 = 0.3938 because we have a two-sided test (the problem asks for the "P-value for the two-sided alternative").
The result 0.3938 is the approximate P-value.
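If you have SciPy installed, the same P-value can be reproduced from the t distribution with df = 9. (Note that scipy.stats.ttest_ind with equal_var=False would instead use the Welch-Satterthwaite degrees of freedom, so its P-value would differ slightly from this conservative df = 9 answer.)

from scipy import stats

t_stat = -0.8954
df = 9
p_lower = stats.t.cdf(t_stat, df)   # P(T < -0.8954), about 0.1969
p_value = 2 * p_lower               # two-sided P-value, about 0.3938
print(p_value)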
------------------------------
Summary:
xbar1 = 49.692 and xbar2 = 50.545 are the sample means
V1 = 5.37264 and V2 = 3.70316 are the sample variances
t = -0.8954 is the test statistic
df = 9 is the degrees of freedom
p-value = 0.3938
==============================================================================================================
Part (B)
Here are the two original groups of data.
Group 1 | Group 2
48.86   | 48.88
50.60   | 52.63
51.02   | 52.55
47.99   | 50.94
54.20   | 53.02
50.66   | 50.66
45.91   | 47.78
48.79   | 48.44
47.76   | 48.92
51.13   | 51.63
For each row, compute the difference in the form X1 - X2, where X1 is the value from group 1 and X2 is the value from group 2.
For instance, the first row has 48.86 - 48.88 = -0.02 as the difference.
We'll list the differences in the column labeled "d"
Group 1 | Group 2 | d
48.86   | 48.88   | -0.02
50.6    | 52.63   | -2.03
51.02   | 52.55   | -1.53
47.99   | 50.94   | -2.95
54.2    | 53.02   | 1.18
50.66   | 50.66   | 0
45.91   | 47.78   | -1.87
48.79   | 48.44   | 0.35
47.76   | 48.92   | -1.16
51.13   | 51.63   | -0.5
If you were to compute the sample mean of the d column, you should find that the mean is -0.853
We call this value dbar in much the same way xbar is denoted. The "bar" refers to the horizontal line up top.
dbar = -0.853
The sample variance of the d column will follow the same type of steps as described in part (A) when I detailed how to compute the variance of group 1.
You should get a sample variance of 1.610668 which leads to the sample standard deviation of sqrt(1.610668) = 1.269121
I'll refer to this standard deviation as Sd to indicate "Standard deviation of the differences".
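A short Python sketch for the differences, their mean, variance, and standard deviation (standard library only):

import statistics

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

d = [x1 - x2 for x1, x2 in zip(group1, group2)]   # paired differences X1 - X2
dbar = statistics.mean(d)                         # about -0.853
var_d = statistics.variance(d)                    # about 1.610668
Sd = statistics.stdev(d)                          # about 1.269121
print(dbar, var_d, Sd)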
Now onto the standard error (SE)
SE = Sd/sqrt(n)
SE = 1.269121/sqrt(10)
SE = 0.40133129863506
SE = 0.401331
It allows us to compute the test statistic
t = (dbar - mu_d)/SE
t = (-0.853 - 0)/0.40133129863506
t = -2.12542605797524
t = -2.1254
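The same standard error and t statistic in Python (again plugging in the rounded values from above):

from math import sqrt

dbar = -0.853      # mean of the differences
Sd = 1.269121      # standard deviation of the differences
n = 10

SE = Sd / sqrt(n)         # standard error, about 0.401331
t = (dbar - 0) / SE       # t statistic, about -2.1254
print(SE, t)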
The degrees of freedom is n-1 = 10-1 = 9
I'll then use this calculator again
https://stattrek.com/online-calculator/t-distribution.aspx
to find that P(T < -2.1254) = 0.0312 when df = 9, which doubles to 2*0.0312 = 0.0624 since we're doing a two-tailed test.
The result 0.0624 is the approximate P-value.
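As a one-call check (assuming SciPy is installed), scipy.stats.ttest_rel runs the matched pairs test directly and should agree with the hand computation up to rounding:

from scipy import stats

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

result = stats.ttest_rel(group1, group2)    # two-sided paired t test by default
print(result.statistic, result.pvalue)      # about -2.1254 and 0.0624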
------------------------------
Summary:
dbar = -0.853 is the sample mean of the differences (d column)
1.610668 is the approximate sample variance of the differences (d column)
t = -2.1254 is the approximate test statistic
df = 9 = degrees of freedom
P-value = 0.0624
==============================================================================================================
Part (C)
Admittedly there are a lot of numbers and variables to keep track of.
It might be overwhelming if you aren't very familiar with statistics.
Though if I had to pick one value to focus on, it would be the P-value.
In many scientific journals, the researchers report the P-value to the reader to indicate how (in)significant the results were.
In part (A), we got a P-value of roughly 0.3938
In part (B), we got a P-value of roughly 0.0624
That's quite a gap.
Recall that the P-value, compared against the significance level, determines whether you reject or fail to reject the null.
Let's say the significance level is alpha = 0.05 which is the default level.
At this alpha value, we'd fail to reject the null for both part (A) and part (B). Why? Because the P-value in each case is not less than alpha = 0.05.
We reject the null only if the p-value is smaller than alpha.
If we set alpha = 0.10, then we'd fail to reject in part (A) but reject the null in part (B).
Sometimes you may see a significance level of alpha = 0.10 (it depends on the context).
As you can see, the paired analysis in part (B) leads to a situation where we are more likely to reject the null. Pairing removes the pair-to-pair variability, which shrinks the standard error and makes the test more sensitive to the difference between the groups.