Algebra.Com
Question 1165702: Compute and interpret the correlation coefficient for the following coded data:
X (height) 15 18 11 19 20 9
Y(weight) 17 21 24 10 25 16
Fit a regression line Y on X and X on Y and coefficient of correlation?
Answer by CPhill(2138):
```python
import numpy as np
# Data
X = np.array([15, 18, 11, 19, 20, 9])
Y = np.array([17, 21, 24, 10, 25, 16])
n = len(X)
# 1. Compute sums for manual calculation (for verification and required components)
sum_X = np.sum(X)
sum_Y = np.sum(Y)
sum_X_sq = np.sum(X**2)
sum_Y_sq = np.sum(Y**2)
sum_XY = np.sum(X * Y)
# Numerator shared by r and both slopes (Num), and the X and Y denominators (Den_X_sq, Den_Y_sq)
Num = n * sum_XY - sum_X * sum_Y
Den_X_sq = n * sum_X_sq - sum_X**2
Den_Y_sq = n * sum_Y_sq - sum_Y**2
# Correlation Coefficient (r)
r = Num / np.sqrt(Den_X_sq * Den_Y_sq)
# 2. Regression Line Y on X: Y = b0 + b1*X
b1_yx = Num / Den_X_sq
mean_X = np.mean(X)
mean_Y = np.mean(Y)
b0_yx = mean_Y - b1_yx * mean_X
# 3. Regression Line X on Y: X = b'0 + b'1*Y
b1_xy = Num / Den_Y_sq
b0_xy = mean_X - b1_xy * mean_Y
print(f"n = {n}")
print(f"Sum X = {sum_X}, Sum Y = {sum_Y}")
print(f"Sum X^2 = {sum_X_sq}, Sum Y^2 = {sum_Y_sq}")
print(f"Sum XY = {sum_XY}")
print("-" * 30)
print(f"Correlation Coefficient (r): {r}")
print(f"Regression Y on X (b1): {b1_yx}")
print(f"Regression Y on X (b0): {b0_yx}")
print(f"Regression X on Y (b'1): {b1_xy}")
print(f"Regression X on Y (b'0): {b0_xy}")
# Verify r^2 = b1_yx * b1_xy
r_sq_check = b1_yx * b1_xy
print(f"Check r^2 = b1_yx * b1_xy: {r_sq_check}")
print(f"r^2: {r**2}")
```
```text
n = 6
Sum X = 92, Sum Y = 113
Sum X^2 = 1512, Sum Y^2 = 2287
Sum XY = 1731
------------------------------
Correlation Coefficient (r): -0.013137168189865763
Regression Y on X (b1): -0.01644736842105263
Regression Y on X (b0): 19.085526315789473
Regression X on Y (b'1): -0.01049317943336831
Regression X on Y (b'0): 15.530954879328437
Check r^2 = b1_yx * b1_xy: 0.00017258518804882087
r^2: 0.0001725851880488209
```
The computations reveal a very weak, almost non-existent, negative linear relationship between the coded height (X) and weight (Y) data.
-----
## 1\. Coefficient of Correlation ($r$)
The **correlation coefficient ($r$)** is calculated as:
$$r = \frac{n(\sum XY) - (\sum X)(\sum Y)}{\sqrt{[n(\sum X^2) - (\sum X)^2][n(\sum Y^2) - (\sum Y)^2]}}$$
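Using the computed summary statistics:
* $n = 6$
* $\sum X = 92$
* $\sum Y = 113$
* $\sum X^2 = 1512$
* $\sum Y^2 = 2287$
* $\sum XY = 1731$
$$r = \frac{6(1731) - (92)(113)}{\sqrt{[6(1512) - 92^2][6(2287) - 113^2]}} = \frac{-10}{\sqrt{(608)(953)}} \approx \mathbf{-0.0131}$$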
### Interpretation of $r$
The correlation coefficient $r \approx -0.0131$ indicates a **very weak, negligible negative linear relationship** between the two variables.
* The value is very close to $0$, which suggests there is **virtually no linear association** between the coded height (X) and weight (Y) in this sample.
* The negative sign means that, in this sample, Y (weight) tends to decrease very slightly as X (height) increases, but the trend is too small to be meaningful.
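As a cross-check on the hand formula, the same coefficient can be recomputed with a library routine. The sketch below assumes SciPy is available and uses `scipy.stats.pearsonr`; the variable names `r_check` and `p_value` are illustrative and not part of the original solution. The p-value it reports is one way to gauge whether such a small $r$ is distinguishable from zero with only six points.

```python
import numpy as np
from scipy.stats import pearsonr

X = np.array([15, 18, 11, 19, 20, 9])
Y = np.array([17, 21, 24, 10, 25, 16])

# pearsonr returns the sample correlation and a two-sided p-value
# for the null hypothesis of zero correlation.
r_check, p_value = pearsonr(X, Y)

print(f"r = {r_check:.4f}")
print(f"two-sided p-value = {p_value:.3f}")
```

A p-value close to 1 would be consistent with the interpretation above that there is virtually no linear association in this sample.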
-----
## 2\. Regression Line Y on X
The regression line used to predict Y (weight) from X (height) is $\hat{Y} = b_0 + b_1X$.
### Slope ($b_1$)
$$b_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} = \frac{-10}{608} \approx \mathbf{-0.0164}$$
### Intercept ($b_0$)
$$b_0 = \bar{Y} - b_1\bar{X} \approx 18.8333 - (-0.016447)(15.3333) \approx \mathbf{19.0855}$$
The equation for the regression line Y on X is:
$$\mathbf{\hat{Y} = 19.0855 - 0.0164X}$$
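The fitted line can be reproduced and evaluated with a short sketch. It assumes NumPy's `np.polyfit` as an alternative least-squares routine (the original solution uses the summation formulas instead), and `predict_weight` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

X = np.array([15, 18, 11, 19, 20, 9])
Y = np.array([17, 21, 24, 10, 25, 16])

# Degree-1 least-squares fit; coefficients come back highest power first.
slope_yx, intercept_yx = np.polyfit(X, Y, 1)

def predict_weight(height):
    """Hypothetical helper: evaluate the fitted Y-on-X line."""
    return intercept_yx + slope_yx * height

print(f"Y-hat = {intercept_yx:.4f} + ({slope_yx:.4f}) X")
print(f"prediction at X = 15: {predict_weight(15):.2f}")
# Change in the prediction across the observed X range (9 to 20)
print(f"change in Y-hat from X = 9 to X = 20: {predict_weight(20) - predict_weight(9):.3f}")
```

Because the slope is so close to zero, the prediction barely moves across the whole observed range of X, so the line is nearly the horizontal line through the mean of Y.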
-----
## 3\. Regression Line X on Y
The regression line used to predict X (height) from Y (weight) is $\hat{X} = b'_0 + b'_1Y$.
### Slope ($b'_1$)
$$b'_1 = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum Y^2) - (\sum Y)^2} = \frac{-10}{953} \approx \mathbf{-0.0105}$$
### Intercept ($b'_0$)
$$b'_0 = \bar{X} - b'_1\bar{Y} \approx 15.3333 - (-0.010493)(18.8333) \approx \mathbf{15.5310}$$
The equation for the regression line X on Y is:
$$\mathbf{\hat{X} = 15.5310 - 0.0105Y}$$
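A brief sketch can confirm the X-on-Y line and two standard properties of a pair of regression lines: both pass through the point of means $(\bar{X}, \bar{Y})$, and the product of their slopes equals $r^2$. It assumes SciPy's `scipy.stats.linregress`; the names `fit_xy` and `fit_yx` are illustrative only.

```python
import numpy as np
from scipy.stats import linregress

X = np.array([15, 18, 11, 19, 20, 9])
Y = np.array([17, 21, 24, 10, 25, 16])

# linregress(x, y) regresses its second argument on its first,
# so pass Y first to get the X-on-Y line.
fit_xy = linregress(Y, X)
fit_yx = linregress(X, Y)

# Both least-squares lines pass through the point of means.
x_at_mean_y = fit_xy.intercept + fit_xy.slope * Y.mean()
y_at_mean_x = fit_yx.intercept + fit_yx.slope * X.mean()

print(f"X-hat = {fit_xy.intercept:.4f} + ({fit_xy.slope:.4f}) Y")
print(f"X-hat at mean Y = {x_at_mean_y:.4f}, mean X = {X.mean():.4f}")
print(f"Y-hat at mean X = {y_at_mean_x:.4f}, mean Y = {Y.mean():.4f}")
print(f"slope product = {fit_xy.slope * fit_yx.slope:.6f}, r^2 = {fit_xy.rvalue**2:.6f}")
```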