Question 704868: If there's a positive correlation between data, does it matter which set of data is represented on the x-axis?
Answer by KMST(5328):
Positive correlation means:
1) there is a correlation (we agree that the x and y values are related)
2) the slope of the graph is positive (the larger the x, the larger the y)
A negative correlation would give a graph with negative slope (larger x values correspond to smaller y values)
A correlation of any kind could be weak or strong.
A weak correlation shows a lot of scatter (the statistician notices the correlation, but other people may not see it). That is the kind of correlation that we often see in biology, pharmacology, medicine, and epidemiology.
If there is a strong correlation, everyone agrees that the variables are related. That's what we want in analytical chemistry, physics, and engineering.
Consider the following (x,y) strongly positively correlated data points:
(1.0,1.1), (2.0,2.2), (3.0,3.3), (4.0,3.9), (5.0,4.8)
Linear regression says:
r=0.9946, slope=0.91, y-intercept=0.33
Predictions: y(0.0)=0.33, y(6.0)=0.91(6.0)+0.33=5.79
give us predicted points (0.0,0.33), (6.0,5.79) for the regression line
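If you want to check those numbers yourself, here is a minimal sketch of the least-squares calculation in plain Python. The function name linear_fit is just my own label (it is not from any library), and it only assumes the usual formulas: slope = Sxy/Sxx, intercept = mean(y) - slope*mean(x), r = Sxy/sqrt(Sxx*Syy).

```python
from math import sqrt

def linear_fit(xs, ys):
    """Least-squares fit of ys on xs; returns (r, slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return r, slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.2, 3.3, 3.9, 4.8]

r, slope, intercept = linear_fit(xs, ys)
y_at_0 = intercept                 # predicted y at x = 0.0
y_at_6 = slope * 6.0 + intercept   # predicted y at x = 6.0

print(round(r, 4), round(slope, 2), round(intercept, 2))
print(round(y_at_0, 2), round(y_at_6, 2))
```

Any spreadsheet or calculator with a linear regression function should give the same r, slope, and intercept.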
Reversing them, the (x,y) pairs would be:
(1.1,1.0), (2.2,2.0), (3.3,3.0), (3.9,4.0), (4.8,5.0)
Linear regression says:
r=0.9946, slope=1.087, y-intercept=-0.326
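y=1.087x-0.326 --> x=(y+0.326)/1.087
(in this reversed fit, y stands for the original x values and x stands for the original y values)
Predictions: x(0.33)=(0.33+0.326)/1.087=0.60, x(5.79)=(5.79+0.326)/1.087=5.63
give us predicted points (0.33,0.60), (5.79,5.63) for the regression line, plotted on the original axes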
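The reversed fit is easy to get mixed up, so here is a rough sketch of it in plain Python. It fits the original x values as a function of the original y values, then solves that line for the original y so it can be drawn on the same axes as the first line. The names fit and reversed_line_y are my own, chosen only for this illustration.

```python
from math import sqrt

def fit(xs, ys):
    """Least-squares fit of ys on xs; returns (r, slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sxy / sqrt(sxx * syy), slope, my - slope * mx

orig_x = [1.0, 2.0, 3.0, 4.0, 5.0]
orig_y = [1.1, 2.2, 3.3, 3.9, 4.8]

# Reversed fit: the original y values now play the role of x.
r_rev, slope_rev, intercept_rev = fit(orig_y, orig_x)

def reversed_line_y(x):
    """Height of the reversed line on the original axes.

    The reversed fit says orig_x = slope_rev * orig_y + intercept_rev,
    so we solve that equation for orig_y.
    """
    return (x - intercept_rev) / slope_rev

# Evaluate at 0.33 and 5.79, the two values plugged in above.
low = reversed_line_y(0.33)
high = reversed_line_y(5.79)

print(round(r_rev, 4), round(slope_rev, 3), round(intercept_rev, 3))
print(round(low, 2), round(high, 2))
```

Note that r comes out the same as before, because the formula for r treats x and y symmetrically.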
Conclusion:
Even with obviously strongly correlated data,
for the same data points, the correlation coefficient, r, is the same,
but the calculated regression line is a bit different, and depends on what we take as the x.
Points (1.0,1.1), (2.0,2.2), (3.0,3.3), (4.0,3.9), (5.0,4.8)
and the two regression lines (green and blue) are plotted below.

If I had made up a set of points more widely scattered, the difference would be more dramatic.
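To put a number on that, the slope of the y-on-x line times the slope of the x-on-y line always equals r squared, so the two lines only coincide when r is exactly 1 or -1. The sketch below compares the data above with a second set of points that I made up purely for illustration (the "scattered" y values are hypothetical, not from any measurement), and the function name is my own.

```python
def slopes_on_original_axes(xs, ys):
    """Return (slope of y-on-x line,
               slope of x-on-y line redrawn as y versus x,
               r squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_y_on_x = sxy / sxx
    slope_x_on_y = sxy / syy
    # Redrawn on the original axes, the x-on-y line has slope 1/slope_x_on_y.
    # The product of the two fitted slopes is sxy**2 / (sxx * syy) = r**2.
    return slope_y_on_x, 1 / slope_x_on_y, slope_y_on_x * slope_x_on_y

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
tight = slopes_on_original_axes(xs, [1.1, 2.2, 3.3, 3.9, 4.8])
# Hypothetical, more widely scattered y values (made up for this example)
scattered = slopes_on_original_axes(xs, [2.0, 1.0, 4.0, 3.0, 5.0])

print([round(v, 3) for v in tight])      # slopes about 0.91 and 0.92
print([round(v, 3) for v in scattered])  # slopes 0.8 and 1.25
```

With the tight data the two lines are almost on top of each other; with the made-up scattered data (r = 0.8) they clearly fan apart.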