SOLUTION: Because it is not practical to weigh bears in the field, researchers sought to develop a model to predict a bear's weight based on its length. Here are the results for a sample:

Question 1196665: Because it is not practical to weigh bears in the field, researchers sought to develop a model to predict a bear's weight based on its length. Here are the results for a sample:
Total Length (cm) Weight (kg)
139.0 110
138.0 60
139.0 90
120.5 60
149.0 85
141.0 100
141.0 95
150.0 85
166.0 155
151.5 140
129.5 105
150.0 110
The residual associated with the bear whose length is 149.0 cm and weight is 85 kg is _______kg. (round your answer to three digits after the decimal)

Answer by math_tutor2020(3817):

Answer: -24.960

============================================================

Explanation:

x = total length (cm)
y = weight (kg)

Use technology to find the equation of the regression line
I used GeoGebra to get y = 1.69417x - 142.47092 approximately.
You could use a spreadsheet program or any linear regression calculator to get the same thing.
I'll go into further detail about where this equation comes from in the next section below.

Plug in x = 149.0 to find that,
y = 1.69417x - 142.47092
y = 1.69417*149.0 - 142.47092
y = 109.96041

A length of x = 149.0 cm leads to a predicted weight of about y = 109.96041 kg.
The observed weight for this bear is y = 85 kg.

The residual is the difference between the observed y value and the predicted y value
residual = (observed y value) - (predicted y value)
residual = 85 - 109.96041
residual = -24.96041
residual = -24.960
This is approximate and rounded to three decimal places (nearest thousandth)
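
The same arithmetic can be sketched in Python. This uses the rounded coefficients from the regression line above, and the function name `predict_weight` is just a label I picked for illustration.

```python
# Rounded coefficients of the regression line y = 1.69417x - 142.47092
slope = 1.69417
intercept = -142.47092

def predict_weight(length_cm):
    """Predicted weight (kg) for a bear of the given total length (cm)."""
    return slope * length_cm + intercept

observed = 85                      # observed weight (kg) of the 149.0 cm bear
predicted = predict_weight(149.0)  # predicted weight (kg) from the line
residual = observed - predicted    # residual = observed - predicted

print(f"predicted = {predicted:.5f}")
print(f"residual  = {residual:.3f}")
```

A negative residual means the line predicts a heavier bear than the one actually observed.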

---------------------------------------------

This section goes into further detail about where the regression line came from.
Depending on your teacher, this section is optional.

In real world settings, you won't need to know the formulas. It's much more efficient to use calculators, software, or spreadsheets.
However, it's still good to know what's going on under the hood. I'll leave out the proofs and derivations of each formula.
Those are better suited for calculus and linear algebra settings.

Here's the original data set of x and y values paired up together

x y
139 110
138 60
139 90
120.5 60
149 85
141 100
141 95
150 85
166 155
151.5 140
129.5 105
150 110

We'll form the following columns:
x^2
xy

The x^2 column is where we square each x value.
E.g. 139 squares to 139^2 = 139*139 = 19321

The xy column has us multiply the x and y values together within each row.
E.g. 139*110 = 15290 in the first row of this column.

I strongly recommend using spreadsheet software rather than doing it all by hand.

Here's what all that looks like

x y x^2 xy
139 110 19321 15290
138 60 19044 8280
139 90 19321 12510
120.5 60 14520.25 7230
149 85 22201 12665
141 100 19881 14100
141 95 19881 13395
150 85 22500 12750
166 155 27556 25730
151.5 140 22952.25 21210
129.5 105 16770.25 13597.5
150 110 22500 16500

Next we add up the values of each column
P = sum of the x values = 1714.5
Q = sum of the y values = 1195
R = sum of the x^2 values = 246447.75
S = sum of the xy values = 173257.5
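
If you don't have a spreadsheet handy, the columns and their sums can be built with a short Python sketch like this one. The names `data`, `x_squared` and `xy` are my own. P, Q, R, S match the labels used above.

```python
# (x, y) pairs: total length (cm), weight (kg)
data = [(139.0, 110), (138.0, 60), (139.0, 90), (120.5, 60),
        (149.0, 85), (141.0, 100), (141.0, 95), (150.0, 85),
        (166.0, 155), (151.5, 140), (129.5, 105), (150.0, 110)]

# Extra columns: x^2 and x*y, computed row by row
x_squared = [x * x for x, _ in data]
xy = [x * y for x, y in data]

# Column sums
P = sum(x for x, _ in data)  # sum of the x values
Q = sum(y for _, y in data)  # sum of the y values
R = sum(x_squared)           # sum of the x^2 values
S = sum(xy)                  # sum of the xy values

print(P, Q, R, S)
```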

The linear regression equation is of the form y = mx+b
m = slope
b = y intercept

To calculate m and b, we use these two formulas

m = (n*S - P*Q)/(n*R - P^2)
b = (Q*R - P*S)/(n*R - P^2)

where P,Q,R,S are the column sums computed above.

The numerators are different, but the denominators are identical.
The n refers to the sample size. It's the number of x,y pairs of values. In this case we have n = 12 such items.

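So,
m = (n*S - P*Q)/(n*R - P^2)
m = (12*173257.5 - 1714.5*1195)/(12*246447.75 - 1714.5^2)
m = 30262.5/17862.75
m = 1.69417
is the approximate slope

And,
b = (Q*R - P*S)/(n*R - P^2)
b = (1195*246447.75 - 1714.5*173257.5)/(12*246447.75 - 1714.5^2)
b = -2544922.5/17862.75
b = -142.47092
is the approximate y intercept.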
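
Here is a small Python sketch of that calculation. It assumes the standard least-squares formulas m = (n*S - P*Q)/(n*R - P^2) and b = (Q*R - P*S)/(n*R - P^2), with the sums P, Q, R, S typed in from the table above.

```python
n = 12          # number of (x, y) pairs
P = 1714.5      # sum of the x values
Q = 1195        # sum of the y values
R = 246447.75   # sum of the x^2 values
S = 173257.5    # sum of the xy values

# Both formulas share the same denominator
denominator = n * R - P ** 2

m = (n * S - P * Q) / denominator  # slope
b = (Q * R - P * S) / denominator  # y intercept

print(f"m = {m:.5f}")
print(f"b = {b:.5f}")
```

Computing the shared denominator once keeps the two formulas easy to compare and avoids repeating the same arithmetic.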

Therefore, the template y = mx+b updates to the approximation of y = 1.69417x - 142.47092
This was the linear regression equation (aka line of best fit) mentioned in the previous section.
Follow the steps mentioned in the previous section to get an answer of -24.960