Question 1193085: Two-sample t test versus matched pairs t test. Consider the following data set. The data were actually collected in pairs, and each row represents a pair.
 Group 1     Group 2
 48.86       48.88
 50.60       52.63
 51.02       52.55
 47.99       50.94
 54.20       53.02
 50.66       50.66
 45.91       47.78
 48.79       48.44
 47.76       48.92
 51.13       51.63
 (A) Suppose that we ignore the fact that the data were collected in pairs and mistakenly treat this as a two-sample problem. Compute the sample mean and variance for each group. Then compute the two-sample t statistic, degrees of freedom, and P-value for the two-sided alternative.
 (B) Now analyze the data in the proper way. Compute the sample mean and variance of the differences. Then compute the t statistic, degrees of freedom, and P-value.
 (C) Describe the differences in the two test results.
 Answer by math_tutor2020(3817)
 Given Data
 
 
| Group 1 | Group 2 |
| 48.86 | 48.88 |
| 50.60 | 52.63 |
| 51.02 | 52.55 |
| 47.99 | 50.94 |
| 54.20 | 53.02 |
| 50.66 | 50.66 |
| 45.91 | 47.78 |
| 48.79 | 48.44 |
| 47.76 | 48.92 |
| 51.13 | 51.63 |

 Part (A)
 
 Let's compute the sample mean of Group 1.
 To get the sample mean, we first add up the data values
 48.86+50.6+51.02+47.99+54.2+50.66+45.91+48.79+47.76+51.13 = 496.92
 
 Then we divide that over the sample size n = 10
 496.92/n = 496.92/10 = 49.692
 This is the sample mean (xbar) for group 1.
 I'll refer to this as xbar1.
 
 Follow similar steps for group 2 to find that
 xbar2 = 50.545
 Note: the 2 in xbar2 labels group 2. It does NOT mean xbar squared.
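 The two sample means can be double-checked with a few lines of Python. This is only a sketch: the list names group1 and group2 and the variable names xbar1 and xbar2 are my own labels, chosen to match the notation above.

```python
from statistics import mean

# Data copied from the problem statement; each position is one pair.
group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

# Sample mean = (sum of the values) / (sample size)
xbar1 = mean(group1)
xbar2 = mean(group2)

print(round(xbar1, 3), round(xbar2, 3))
```

 The printed values should agree with xbar1 = 49.692 and xbar2 = 50.545.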
 
 Now let's calculate the variance for group 1.
 Here are the data values for group 1 only, which I'll call X.

| X |
| 48.86 |
| 50.6 |
| 51.02 |
| 47.99 |
| 54.2 |
| 50.66 |
| 45.91 |
| 48.79 |
| 47.76 |
| 51.13 |

 For each X value, subtract off the value of xbar = 49.692
 Then square the difference.
 For example, we have (X-xbar)^2 = (48.86-49.692)^2 = 0.692224 in the first row.
 
 
| X | (X-xbar)^2 |
| 48.86 | 0.692224 |
| 50.6 | 0.824464 |
| 51.02 | 1.763584 |
| 47.99 | 2.896804 |
| 54.2 | 20.322064 |
| 50.66 | 0.937024 |
| 45.91 | 14.303524 |
| 48.79 | 0.813604 |
| 47.76 | 3.732624 |
| 51.13 | 2.067844 |

 Sum everything in the second column and you should get 48.35376
 This is the sum of the squared deviations from the mean, which I'll call the Sum of the Squared Errors (SSE)
 Divide the SSE value over n-1 = 10-1 = 9 to compute the sample variance
 
 sample variance = (SSE)/(n-1)
 sample variance = (48.35376)/9
 sample variance = 5.37264
 I'll refer to this as V1 to represent the variance of group 1.
 
 Follow similar steps to find that V2 = 3.70316 is the approximate sample variance of group 2.
 A calculator with a built-in standard deviation function will make quick work of finding the variance: compute the sample standard deviation, then square it.
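 The variance steps above can also be sketched in Python. The helper name sample_variance is hypothetical (my own label). It follows the same recipe: subtract the mean, square, add up, divide by n-1. The standard library's statistics.variance is used only as a cross-check.

```python
from statistics import mean, variance

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by n-1."""
    xbar = mean(xs)
    sse = sum((x - xbar) ** 2 for x in xs)  # the (X-xbar)^2 column, summed
    return sse / (len(xs) - 1)

v1 = sample_variance(group1)
v2 = sample_variance(group2)

print(round(v1, 5), round(v2, 5))
# Cross-check against the built-in sample variance.
print(round(variance(group1), 5), round(variance(group2), 5))
```

 Both lines of output should agree with V1 = 5.37264 and V2 = 3.70316.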
 
 Now onto the standard error (SE)
 SE = sqrt( (V1)/(n1) + (V2)/(n2) )
 SE = sqrt( (5.37264)/(10) + (3.70316)/(10) )
 SE = 0.95266993234803
 
 Which helps us find the t statistic
 t = ((xbar1 - xbar2) - (mu1 - mu2))/(SE)
 t = ((49.692 - 50.545) - (0))/(0.95266993234803)
 t = -0.89537831628381
 t = -0.8954
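 Here is a short Python sketch of the same standard error and t statistic calculation, assuming the null hypothesis mu1 - mu2 = 0. The variable names (se, t_stat, and so on) are my own.

```python
from math import sqrt
from statistics import mean, variance

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

n1, n2 = len(group1), len(group2)
v1, v2 = variance(group1), variance(group2)  # sample variances (n-1 divisor)

# Standard error of xbar1 - xbar2 when the groups are treated as independent
se = sqrt(v1 / n1 + v2 / n2)

# Two-sample t statistic; the hypothesized difference mu1 - mu2 is 0
hypothesized_diff = 0
t_stat = ((mean(group1) - mean(group2)) - hypothesized_diff) / se

print(round(se, 4), round(t_stat, 4))
```

 The output should be close to SE = 0.9527 and t = -0.8954.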
 
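 For a conservative choice, the degrees of freedom is the smaller of n1-1 and n2-1 (statistical software typically uses a different approximation, so its df and P-value may differ slightly from the ones here).
 Because n1 = n2 = 10, we can simply think of it as n-1
 The degrees of freedom is df = n-1 = 10-1 = 9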
 
 Use a calculator like this one
 https://stattrek.com/online-calculator/t-distribution.aspx
 to find that P(T < -0.8954) = 0.1969 approximately when we have df = 9
 This doubles to 2*0.1969 = 0.3938 because the test is two-sided (the problem asks for the "P-value for the two-sided alternative").
 The result 0.3938 is the approximate P-value.
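 If you'd rather not rely on an online calculator, the P-value can be approximated in plain Python. This is a sketch that swaps in a different technique from the calculator: it integrates the t density numerically with Simpson's rule. The function names t_pdf and two_sided_p are hypothetical, and steps = 2000 is an arbitrary (even) choice.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided P-value, 2*P(T > |t|), using Simpson's rule.

    steps must be even. The area from 0 to |t| is subtracted from 0.5
    (half of the symmetric distribution) to get one tail, then doubled.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)

p_a = two_sided_p(-0.8954, 9)
print(round(p_a, 4))
```

 The result should land very close to the 0.3938 found with the calculator; small differences in the last digit come from rounding.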
 
 ------------------------------
 
 Summary:
 
 xbar1 = 49.692 and xbar2 = 50.545 are the sample means
 V1 = 5.37264 and V2 = 3.70316 are the sample variances
 t = -0.8954 is the test statistic
 df = 9 is the degrees of freedom
 p-value = 0.3938
 
 ==============================================================================================================
 Part (B)
 
 Here are the two original groups of data.
 
 
| Group 1 | Group 2 |
| 48.86 | 48.88 |
| 50.60 | 52.63 |
| 51.02 | 52.55 |
| 47.99 | 50.94 |
| 54.20 | 53.02 |
| 50.66 | 50.66 |
| 45.91 | 47.78 |
| 48.79 | 48.44 |
| 47.76 | 48.92 |
| 51.13 | 51.63 |

 For each row, subtract the values in the form X1 - X2
 X1 is from group 1
 X2 is from group 2
 For instance, the first row has 48.86 - 48.88 = -0.02 as the difference
 We'll list the differences in the column labeled "d"
 
 
| Group 1 | Group 2 | d |
| 48.86 | 48.88 | -0.02 |
| 50.6 | 52.63 | -2.03 |
| 51.02 | 52.55 | -1.53 |
| 47.99 | 50.94 | -2.95 |
| 54.2 | 53.02 | 1.18 |
| 50.66 | 50.66 | 0 |
| 45.91 | 47.78 | -1.87 |
| 48.79 | 48.44 | 0.35 |
| 47.76 | 48.92 | -1.16 |
| 51.13 | 51.63 | -0.5 |

 If you were to compute the sample mean of the d column, you should find that the mean is -0.853
 We call this value dbar in much the same way xbar is denoted. The "bar" refers to the horizontal line up top.
 dbar = -0.853
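 The d column and its mean can be rebuilt with a short Python sketch. The names d and dbar mirror the notation above; rounding each difference to 2 decimals is my own choice, made only to strip floating-point noise.

```python
from statistics import mean

group1 = [48.86, 50.60, 51.02, 47.99, 54.20, 50.66, 45.91, 48.79, 47.76, 51.13]
group2 = [48.88, 52.63, 52.55, 50.94, 53.02, 50.66, 47.78, 48.44, 48.92, 51.63]

# Row-by-row differences X1 - X2, keeping the pairing intact
d = [round(x1 - x2, 2) for x1, x2 in zip(group1, group2)]

dbar = mean(d)

print(d)
print(round(dbar, 3))
```

 Note that dbar equals xbar1 - xbar2 = 49.692 - 50.545, so the numerator of the t statistic is the same in both parts.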
 
 The sample variance of the d column will follow the same type of steps as described in part (A) when I detailed how to compute the variance of group 1.
 You should get a sample variance of 1.610668 which leads to the sample standard deviation of sqrt(1.610668) = 1.269121
 I'll refer to this standard deviation as Sd to indicate "Standard deviation of the differences".
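 As a quick check of these two values, here is a sketch using Python's statistics module. The names var_d and sd_d are my own; the list d is typed in from the d column above.

```python
from statistics import stdev, variance

# Differences X1 - X2 from the d column
d = [-0.02, -2.03, -1.53, -2.95, 1.18, 0.0, -1.87, 0.35, -1.16, -0.5]

var_d = variance(d)  # sample variance of the differences (n-1 divisor)
sd_d = stdev(d)      # Sd, the square root of the sample variance

print(round(var_d, 6), round(sd_d, 6))
```

 The output should agree with the variance 1.610668 and Sd = 1.269121 up to rounding.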
 
 Now onto the standard error (SE)
 SE = Sd/sqrt(n)
 SE = 1.269121/sqrt(10)
 SE = 0.40133129863506
 SE = 0.401331
 
 It allows us to compute the test statistic
 t = (dbar - mu_d)/SE
 t = (-0.853 - 0)/0.40133129863506
 t = -2.12542605797524
 t = -2.1254
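 The paired standard error and t statistic can be sketched the same way in Python, assuming the null hypothesis mu_d = 0. The variable names are my own.

```python
from math import sqrt
from statistics import mean, stdev

# Differences X1 - X2 from the d column
d = [-0.02, -2.03, -1.53, -2.95, 1.18, 0.0, -1.87, 0.35, -1.16, -0.5]

n = len(d)
se_d = stdev(d) / sqrt(n)  # SE = Sd / sqrt(n)

hypothesized_mu_d = 0
t_paired = (mean(d) - hypothesized_mu_d) / se_d

print(round(se_d, 6), round(t_paired, 4))
```

 The output should be close to SE = 0.401331 and t = -2.1254.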
 
 The degrees of freedom is n-1 = 10-1 = 9
 
 I'll then use this calculator again
 https://stattrek.com/online-calculator/t-distribution.aspx
 to find that P(T < -2.1254) = 0.0312 when df = 9 which doubles to 2*0.0312 = 0.0624 since we're doing a two-tailed test.
 The result 0.0624 is the approximate P-value.
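 As an independent check on the calculator, here is a Python sketch based on a closed-form expression for the t distribution that applies only when df is an odd whole number (df = 9 qualifies). This is not how the online calculator works as far as I know; it is a swapped-in technique, and the function name two_sided_p_odd_df is hypothetical.

```python
from math import atan, cos, pi, sin, sqrt

def two_sided_p_odd_df(t, df):
    """Two-sided P-value for Student's t, valid for odd integer df only.

    With theta = atan(|t| / sqrt(df)), the central probability is
    P(-|t| < T < |t|) = (2/pi) * (theta + sin(theta) * series), where
    series = cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ... up to cos^(df-2).
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form requires an odd df >= 1")
    theta = atan(abs(t) / sqrt(df))
    c = cos(theta)
    series = 0.0
    if df > 1:
        term = c
        series = term
        k = 2
        while k <= df - 3:
            term *= k / (k + 1) * c * c  # next coefficient and cosine power
            series += term
            k += 2
    central = (2 / pi) * (theta + sin(theta) * series)
    return 1 - central

p_b = two_sided_p_odd_df(-2.1254, 9)
print(round(p_b, 4))
```

 The result should be close to the 0.0624 found above, with any difference in the last digit due to rounding.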
 
 ------------------------------
 
 Summary:
 
 dbar = -0.853 is the sample mean of the differences (d column)
 1.610668 is the approximate sample variance of the differences (d column)
 t = -2.1254 is the approximate test statistic
 df = 9 = degrees of freedom
 P-value = 0.0624
 
 ==============================================================================================================
 Part (C)
 
 Admittedly there are a lot of numbers and variables to keep track of.
 It might be overwhelming if you aren't very familiar with statistics.
 
 Though if I had to pick one variable to focus on, I would say it's the P-value.
 In many scientific journals, the researchers report the P-value to the reader to indicate how (in)significant the results were.
 
 In part (A), we got a P-value of roughly 0.3938
 In part (B), we got a P-value of roughly 0.0624
 That's quite a gap.
 
 Recall that comparing the P-value to the significance level alpha determines whether you reject or fail to reject the null.
 Let's say the significance level is alpha = 0.05, which is the most common default.

 At this alpha value, we'd fail to reject the null for both part (A) and part (B). Why? Because neither P-value is less than alpha = 0.05
 We reject the null only if the P-value is smaller than alpha.
 
 If we set alpha = 0.10, then we'd fail to reject in part (A) but reject the null in part (B)
 Sometimes you may see a significance level of alpha = 0.10 (of course it depends on the context).
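 As you can see, part (B) leads to a situation where we are more likely to reject the null. The difference in means is the same in both parts (-0.853), but pairing removes the variation between pairs, so the standard error drops from about 0.9527 to about 0.4013 and the t statistic moves from -0.8954 to -2.1254.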
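 The decision rule can be laid out for both significance levels with a small Python sketch. The dictionary keys and the name decisions are my own labels; the P-values are the rounded ones from parts (A) and (B).

```python
# Approximate two-sided P-values from parts (A) and (B)
p_values = {"two-sample (A)": 0.3938, "paired (B)": 0.0624}

decisions = {}
for alpha in (0.05, 0.10):
    for name, p in p_values.items():
        # Reject the null only when the P-value is smaller than alpha
        decisions[(name, alpha)] = "reject" if p < alpha else "fail to reject"

for (name, alpha), decision in decisions.items():
    print(f"alpha = {alpha:.2f}, {name}: {decision} the null")
```

 Only the paired test at alpha = 0.10 leads to rejecting the null.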
 