 
 
Question 1196643: TABLE: https://imagizer.imageshack.com/img924/3557/S7r4QJ.jpg
 Disk drives have been getting larger. Their capacity is now often given in terabytes (TB) where 1TB=1000 gigabytes, or about a trillion bytes. A search of prices for external disk drives on a large shopping website in a recent year found the accompanying data. Find and interpret the value of R^2.
 Answer by math_tutor2020(3817)
If you are in a hurry, then you can use technology to quickly compute the r and r^2 values.
 Two examples would be the LinReg command on a TI-83 (or similar calculator) and the CORREL function in a spreadsheet.
 There are many other options to choose from. Feel free to search out your favorite.
 You should find these approximations
 r = 0.9878
 r^2 = 0.9757
 Since r^2 is very close to 1, the linear regression is a good fit. Approximately 97.57% of the variation in y (price) is explained by the linear relationship with x (capacity).
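
 If a calculator or spreadsheet is not handy, the same quick check can be sketched in a few lines of Python. This is only an illustration: the variable names are my own, and the data values are copied from the table in the question. Because it works from the raw data rather than the rounded summary values, the last digit may differ slightly from the hand calculations.

```python
from math import sqrt

# Data copied from the table (x = capacity in TB, y = price in dollars)
capacity_tb = [0.5, 1, 2, 3, 4, 6, 8, 12, 32]
price_usd = [60.99, 77.99, 112.97, 110.99, 151.99, 425.34, 597.11, 1081.99, 4459]

n = len(capacity_tb)
x_mean = sum(capacity_tb) / n
y_mean = sum(price_usd) / n

# Sums of squared deviations and cross products about the means
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(capacity_tb, price_usd))
sxx = sum((x - x_mean) ** 2 for x in capacity_tb)
syy = sum((y - y_mean) ** 2 for y in price_usd)

# Pearson correlation coefficient and its square
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

 Both printed values should land close to the approximations quoted above.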
 
 
 If you have more time to go over the math, then there are various ways to calculate the correlation coefficient.
 I'll go over two slightly different methods.
 
 -------------------------------------------------------------------------------------------------------------
 
 Method 1
 
 x = capacity of hard drive in terabytes (TB)
 y = price in dollars
 
 Given info:
 n = 9 = sample size = number of x,y pairs
 xbar = 7.611
 ybar = 786.49
 SD(x) = 9.854
 SD(y) = 1417.82
 
 The term "xbar" refers to x with a horizontal bar over it, i.e. \overline{x}, which is the mean of the x values.
 Likewise, "ybar" is the mean of the y values.
 
 Given Data
 
 
| x | y |
| 0.5 | 60.99 |
| 1 | 77.99 |
| 2 | 112.97 |
| 3 | 110.99 |
| 4 | 151.99 |
| 6 | 425.34 |
| 8 | 597.11 |
| 12 | 1081.99 |
| 32 | 4459 |

 Form a third column which is the product of the x and y columns.
 Eg: 0.5*60.99 = 30.495 in the first row

| x | y | xy |
| 0.5 | 60.99 | 30.495 |
| 1 | 77.99 | 77.99 |
| 2 | 112.97 | 225.94 |
| 3 | 110.99 | 332.97 |
| 4 | 151.99 | 607.96 |
| 6 | 425.34 | 2552.04 |
| 8 | 597.11 | 4776.88 |
| 12 | 1081.99 | 12983.88 |
| 32 | 4459 | 142688 |

 I strongly recommend using spreadsheet software. It's not only fast and efficient, but also something that is expected in real world applications.
 I'm using LibreOffice but you could use Excel or Google Sheets or whichever app you prefer most.
 
 Add up the values in the xy column to get 164,276.155
 Then we subtract off the value of n*xbar*ybar = 9*7.611*786.49 = 53,873.77851
 
 So we have:
 Sum(xy) - n*xbar*ybar = 164,276.155 - 53,873.77851 = 110,402.37649
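
 If you would rather script this step than use a spreadsheet, here is a minimal Python sketch of the xy column and the subtraction above. The variable names are my own, and xbar and ybar are the rounded values from the given info.

```python
# Data from the table (x = capacity in TB, y = price in dollars)
x_values = [0.5, 1, 2, 3, 4, 6, 8, 12, 32]
y_values = [60.99, 77.99, 112.97, 110.99, 151.99, 425.34, 597.11, 1081.99, 4459]

# Given summary values (rounded, as stated in the problem)
n = 9
xbar = 7.611
ybar = 786.49

# Third column: the product of each x,y pair
xy_column = [x * y for x, y in zip(x_values, y_values)]
sum_xy = sum(xy_column)

# Numerator of the correlation formula: Sum(xy) - n*xbar*ybar
numerator = sum_xy - n * xbar * ybar

print(round(sum_xy, 3))
print(round(numerator, 5))
```

 Up to floating-point rounding, the two printed numbers should agree with 164,276.155 and 110,402.37649.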
 
 We'll divide that result by (n-1) times the product of the given standard deviation values.
 So,
 (n-1)*SD(x)*SD(y) = (9-1)*(9.854)*(1417.82) = 111,769.58624
 
 
 Therefore,
 r = (110,402.37649)/(111,769.58624)
 r = 0.98776760480204
 r^2 = (0.98776760480204)^2
 r^2 = 0.97568484109636
 r^2 = 0.9757
 which is approximate.
 
 Since r^2 is very close to 1, the linear regression is a good fit. Approximately 97.57% of the variation in y (price) is explained by the linear relationship with x (capacity).
 
 Note: The formula I used just now is
 
 $$r = \frac{\sum(\text{x}\text{y}) - \text{n}*\overline{\text{x}}*\overline{\text{y}}}{(\text{n}-1)*SD(\text{x})*SD(\text{y})}$$

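
 For anyone who wants to reuse Method 1, the formula r = (Sum(xy) - n*xbar*ybar)/((n-1)*SD(x)*SD(y)) can be wrapped in a small Python function. This is just a sketch; the function name and parameter names are my own invention, and the inputs are the rounded values used in this solution.

```python
def correlation_from_summary(sum_xy, n, xbar, ybar, sd_x, sd_y):
    """Correlation coefficient from Sum(xy) and the summary statistics.

    sd_x and sd_y are assumed to be sample standard deviations
    (the kind that divide by n-1).
    """
    numerator = sum_xy - n * xbar * ybar
    denominator = (n - 1) * sd_x * sd_y
    return numerator / denominator


# Values taken from the Method 1 work above
r = correlation_from_summary(
    sum_xy=164276.155, n=9, xbar=7.611, ybar=786.49, sd_x=9.854, sd_y=1417.82
)
r_squared = r ** 2

print(round(r, 4), round(r_squared, 4))
```

 The result should match the Method 1 values of r and r^2 to about four decimal places.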
 -------------------------------------------------------------------------------------------------------------
 
 Method 2
 
 x = capacity of hard drive in terabytes (TB)
 y = price in dollars
 
 Given info:
 n = 9 = sample size = number of x,y pairs
 xbar = 7.611
 ybar = 786.49
 SD(x) = 9.854
 SD(y) = 1417.82
 
 Given Data
 
| x | y |
| 0.5 | 60.99 |
| 1 | 77.99 |
| 2 | 112.97 |
| 3 | 110.99 |
| 4 | 151.99 |
| 6 | 425.34 |
| 8 | 597.11 |
| 12 | 1081.99 |
| 32 | 4459 |

 Instead of an xy column, we'll form the Zx column
 Zx = (x - xbar)/(SD(x))
 We're computing the z score for each x term
 
 For instance, in the first row we have
 Zx = (x-xbar)/(SD(x))
 Zx = (0.5-7.611)/(9.854)
 Zx = -0.72163588390501
 Zx = -0.721636
 Do the same thing for each item in the x column. The values of xbar and SD(x) will remain constant.
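
 Here is a short Python sketch of that repeated z score step, in case you want to check the Zx column without a spreadsheet. The names are my own, and xbar and SD(x) are the rounded given values.

```python
# x column from the table (capacity in TB)
x_values = [0.5, 1, 2, 3, 4, 6, 8, 12, 32]

# Given summary values for x (rounded, as stated in the problem)
xbar = 7.611
sd_x = 9.854

# z score of each x term, rounded to six decimal places like the table
zx_column = [round((x - xbar) / sd_x, 6) for x in x_values]

for x, zx in zip(x_values, zx_column):
    print(x, zx)
```

 The same pattern works for the Zy column if you swap in the y values, ybar, and SD(y).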
 
 This is what the updated table looks like
 
| x | y | Zx |
| 0.5 | 60.99 | -0.721636 |
| 1 | 77.99 | -0.670895 |
| 2 | 112.97 | -0.569413 |
| 3 | 110.99 | -0.467932 |
| 4 | 151.99 | -0.36645 |
| 6 | 425.34 | -0.163487 |
| 8 | 597.11 | 0.039476 |
| 12 | 1081.99 | 0.445403 |
| 32 | 4459 | 2.475036 |

 Follow similar steps for the Zy column.
 For example, we'll have the following calculation for the 1st row.
 Zy = (y - ybar)/(SD(y))
 Zy = (60.99 - 786.49)/(1417.82)
 Zy = -0.511701
 
 We have this so far
 
| x | y | Zx | Zy |
| 0.5 | 60.99 | -0.721636 | -0.511701 |
| 1 | 77.99 | -0.670895 | -0.499711 |
| 2 | 112.97 | -0.569413 | -0.475039 |
| 3 | 110.99 | -0.467932 | -0.476436 |
| 4 | 151.99 | -0.36645 | -0.447518 |
| 6 | 425.34 | -0.163487 | -0.254722 |
| 8 | 597.11 | 0.039476 | -0.133571 |
| 12 | 1081.99 | 0.445403 | 0.208419 |
| 32 | 4459 | 2.475036 | 2.590251 |

 Then we'll multiply the Zx and Zy items for each row.
 Eg: Zx*Zy = (-0.721636)*(-0.511701) = 0.369262 in row one
 
 This is what the fully completed table looks like
 
| x | y | Zx | Zy | ZxZy |
| 0.5 | 60.99 | -0.721636 | -0.511701 | 0.369262 |
| 1 | 77.99 | -0.670895 | -0.499711 | 0.335254 |
| 2 | 112.97 | -0.569413 | -0.475039 | 0.270493 |
| 3 | 110.99 | -0.467932 | -0.476436 | 0.22294 |
| 4 | 151.99 | -0.36645 | -0.447518 | 0.163993 |
| 6 | 425.34 | -0.163487 | -0.254722 | 0.041644 |
| 8 | 597.11 | 0.039476 | -0.133571 | -0.005273 |
| 12 | 1081.99 | 0.445403 | 0.208419 | 0.09283 |
| 32 | 4459 | 2.475036 | 2.590251 | 6.410964 |

 Add up the values in the final column and you should get roughly 7.902107
 
 So,
 r = Sum(ZxZy)/(n-1)
 r = 7.902107/(9-1)
 r = 7.902107/8
 r = 0.987763375
 r^2 = (0.987763375)^2
 r^2 = 0.9756764849914
 r^2 = 0.9757
 
 Since r^2 is very close to 1, the linear regression is a good fit. Approximately 97.57% of the variation in y (price) is explained by the linear relationship with x (capacity).
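
 To double check Method 2 from start to finish, here is a minimal Python sketch of r = Sum(ZxZy)/(n-1). The names are my own, and the means and standard deviations are the rounded given values. The z scores are not rounded along the way here, so the sum may differ from 7.902107 in the last digit or so.

```python
# Data from the table (x = capacity in TB, y = price in dollars)
x_values = [0.5, 1, 2, 3, 4, 6, 8, 12, 32]
y_values = [60.99, 77.99, 112.97, 110.99, 151.99, 425.34, 597.11, 1081.99, 4459]

# Given summary values (rounded, as stated in the problem)
n = 9
xbar, ybar = 7.611, 786.49
sd_x, sd_y = 9.854, 1417.82

# z scores for each column
zx_column = [(x - xbar) / sd_x for x in x_values]
zy_column = [(y - ybar) / sd_y for y in y_values]

# Product column and its sum
zxzy_column = [zx * zy for zx, zy in zip(zx_column, zy_column)]
sum_zxzy = sum(zxzy_column)

# Correlation coefficient and its square
r = sum_zxzy / (n - 1)
r_squared = r ** 2

print(round(sum_zxzy, 4), round(r, 4), round(r_squared, 4))
```

 The printed r^2 should round to the same 0.9757 found above.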
 
 Answer: r^2 = 0.9757 approximately
 