Question 1165702: Compute and interpret the correlation coefficient for the following coded data:
X (height) 15 18 11 19 20 9
Y (weight) 17 21 24 10 25 16
Fit the regression lines of Y on X and of X on Y, and find the coefficient of correlation.

Answer by CPhill(2138):
```python
import numpy as np
# Data
X = np.array([15, 18, 11, 19, 20, 9])
Y = np.array([17, 21, 24, 10, 25, 16])
n = len(X)
# 1. Compute sums for manual calculation (for verification and required components)
sum_X = np.sum(X)
sum_Y = np.sum(Y)
sum_X_sq = np.sum(X**2)
sum_Y_sq = np.sum(Y**2)
sum_XY = np.sum(X * Y)
# Shared numerator (Num) and the X and Y denominators (Den_X_sq, Den_Y_sq);
# these equal n^2 times the population covariance and variances, respectively
Num = n * sum_XY - sum_X * sum_Y
Den_X_sq = n * sum_X_sq - sum_X**2
Den_Y_sq = n * sum_Y_sq - sum_Y**2
# Correlation Coefficient (r)
r = Num / np.sqrt(Den_X_sq * Den_Y_sq)
# 2. Regression Line Y on X: Y = b0 + b1*X
b1_yx = Num / Den_X_sq
mean_X = np.mean(X)
mean_Y = np.mean(Y)
b0_yx = mean_Y - b1_yx * mean_X
# 3. Regression Line X on Y: X = b'0 + b'1*Y
b1_xy = Num / Den_Y_sq
b0_xy = mean_X - b1_xy * mean_Y
print(f"n = {n}")
print(f"Sum X = {sum_X}, Sum Y = {sum_Y}")
print(f"Sum X^2 = {sum_X_sq}, Sum Y^2 = {sum_Y_sq}")
print(f"Sum XY = {sum_XY}")
print("-" * 30)
print(f"Correlation Coefficient (r): {r}")
print(f"Regression Y on X (b1): {b1_yx}")
print(f"Regression Y on X (b0): {b0_yx}")
print(f"Regression X on Y (b'1): {b1_xy}")
print(f"Regression X on Y (b'0): {b0_xy}")
# Verify r^2 = b1_yx * b1_xy
r_sq_check = b1_yx * b1_xy
print(f"Check r^2 = b1_yx * b1_xy: {r_sq_check}")
print(f"r^2: {r**2}")
```
```text
n = 6
Sum X = 92, Sum Y = 113
Sum X^2 = 1512, Sum Y^2 = 2287
Sum XY = 1731
------------------------------
Correlation Coefficient (r): -0.013137168189865763
Regression Y on X (b1): -0.01644736842105263
Regression Y on X (b0): 19.085526315789473
Regression X on Y (b'1): -0.01049317943336831
Regression X on Y (b'0): 15.530954879328437
Check r^2 = b1_yx * b1_xy: 0.00017258518804882087
r^2: 0.0001725851880488209
```
The computations reveal a very weak, almost non-existent, negative linear relationship between the coded height (X) and weight (Y) data.
-----
## 1. Coefficient of Correlation ($r$)
The **correlation coefficient ($r$)** is calculated as:
$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$
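Using the computed summary statistics:
* $n = 6$
* $\sum X = 92$
* $\sum Y = 113$
* $\sum X^2 = 1512$
* $\sum Y^2 = 2287$
* $\sum XY = 1731$

Substituting, the numerator is $6(1731) - (92)(113) = 10386 - 10396 = -10$, and the two bracketed terms in the denominator are $6(1512) - 92^2 = 608$ and $6(2287) - 113^2 = 953$:
$$r = \frac{-10}{\sqrt{(608)(953)}} \approx \mathbf{-0.0131}$$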
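As a cross-check, the same coefficient can be obtained from deviations about the means, $r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \sum (Y-\bar{Y})^2}}$, which is algebraically equivalent to the sum formula above. The following is a minimal sketch using only the Python standard library; the helper name `pearson_r` is chosen here for illustration and is not part of the original solution.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from deviations about the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

X = [15, 18, 11, 19, 20, 9]
Y = [17, 21, 24, 10, 25, 16]
r_dev = pearson_r(X, Y)
print(f"r (deviation form) = {r_dev:.4f}")
```

The result should agree with the value from the sum formula up to floating-point rounding.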
### Interpretation of $r$
The correlation coefficient $r \approx -0.0131$ indicates a **negligible negative linear relationship** between the two variables.
* The value is very close to $0$, which suggests there is **virtually no linear association** between the coded height (X) and weight (Y) in this sample. Since $r^2 \approx 0.00017$, a straight-line fit on X accounts for only about $0.02\%$ of the variation in Y.
* The negative sign means that Y (weight) tends to decrease slightly as X (height) increases, but with only $n = 6$ observations this trend is too small to be meaningful.
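One way to put a number on "too small to be meaningful" is the usual t-statistic for a correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$, compared against Student's t with $n-2 = 4$ degrees of freedom. The sketch below is illustrative only: it assumes the six pairs are a random sample from a roughly bivariate normal population, which the question does not state, and it hard-codes the standard two-sided 5% table value of 2.776 for 4 degrees of freedom.

```python
import math

n = 6
# r rebuilt from the summary sums: numerator -10, denominator terms 608 and 953
r = -10 / math.sqrt(608 * 953)

# t-statistic for testing whether the population correlation is zero
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

T_CRIT = 2.776  # two-sided 5% critical value, Student's t with 4 degrees of freedom
significant = abs(t) > T_CRIT

print(f"t = {t:.4f}")
print(f"significant at the 5% level: {significant}")
```

Under these assumptions, $|t|$ is about $0.026$, far below the critical value, so the sample gives no evidence of a linear relationship.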
-----
## 2. Regression Line Y on X
The regression line used to predict Y (weight) from X (height) is $\hat{Y} = b_0 + b_1X$.
### Slope ($b_1$)
$$b_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} = \frac{6(1731) - (92)(113)}{6(1512) - 92^2} = \frac{-10}{608} \approx \mathbf{-0.0164}$$
### Intercept ($b_0$)
$$b_0 = \bar{Y} - b_1\bar{X} \approx 18.8333 - (-0.016447)(15.3333) \approx \mathbf{19.0855}$$
The equation for the regression line Y on X is:
$$\mathbf{\hat{Y} = 19.0855 - 0.0164X}$$
-----
## 3. Regression Line X on Y
The regression line used to predict X (height) from Y (weight) is $\hat{X} = b'_0 + b'_1Y$.
### Slope ($b'_1$)
$$b'_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum Y^2) - (\sum Y)^2} = \frac{6(1731) - (92)(113)}{6(2287) - 113^2} = \frac{-10}{953} \approx \mathbf{-0.0105}$$
### Intercept ($b'_0$)
$$b'_0 = \bar{X} - b'_1\bar{Y} \approx 15.3333 - (-0.010493)(18.8333) \approx \mathbf{15.5310}$$
The equation for the regression line X on Y is:
$$\mathbf{\hat{X} = 15.5310 - 0.0105Y}$$
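The two regression lines are not the same line rearranged. Their slopes multiply to $r^2$, and $r$ takes their common sign. The sketch below checks this with exact fractions built from the summary sums; the variable names are chosen here for illustration.

```python
import math
from fractions import Fraction

# Summary sums taken from the printed output
n = 6
sum_x, sum_y = 92, 113
sum_xx, sum_yy, sum_xy = 1512, 2287, 1731

num = n * sum_xy - sum_x * sum_y             # shared numerator: -10
b_yx = Fraction(num, n * sum_xx - sum_x**2)  # slope of Y on X: -10/608
b_xy = Fraction(num, n * sum_yy - sum_y**2)  # slope of X on Y: -10/953

# r^2 is the product of the two slopes; r carries their common sign
r_squared = b_yx * b_xy
r = math.copysign(math.sqrt(r_squared), num)

# Slope of the X-on-Y line when drawn with Y on the vertical axis
xy_line_as_dy_dx = 1 / b_xy

print(f"r^2 = {float(r_squared):.8f}, r = {r:.4f}")
print(f"Y-on-X slope = {float(b_yx):.4f}")
print(f"X-on-Y line drawn as dY/dX = {float(xy_line_as_dy_dx):.1f}")
```

Drawn on the same axes, the Y-on-X line has slope of about $-0.0164$ while the X-on-Y line has slope $-95.3$, so the two lines are almost perpendicular, which is what one expects when $r$ is close to $0$.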