Question 706393: Problem 3) The following data come from your book, problem 26 on page 298. Here is the data:
Mean daily calories    Infant mortality rate (per 1,000 births)
1523                   154
3495                   6
1941                   114
2678                   24
1610                   107
3443                   6
1640                   153
3362                   7
3429                   44
2671                   7
a) For the above data, construct a scatter plot using SPSS or Excel (follow the instructions on page 324 of your textbook). What does the scatter plot show? Can you determine a type of relationship? Are there any outliers that you can see?
b) Using the same data, conduct a correlation analysis using SPSS or Excel. What is the correlation coefficient? Is it a strong, moderate, or weak correlation? Is the correlation significant or not? If it is, what does that mean?
Answer by KMST(5328):
a)
When you get your points plotted (even if you did it by hand), with mean daily calories on the x-axis and infant mortality rate on the y-axis,
you would see that there is some scatter, but it seems obvious that the two variables are related.

The greater the mean daily calorie intake, the lower the infant mortality rate.
The scatter plot shows that and more.
It looks like a straight line would be a good model to represent the data, so you would say that there is a linear relationship.
The relationship line drawn to fit the points would slope down (negative slope), so you would say that infant mortality rate is negatively correlated with mean daily calorie intake.
I do not like to eliminate suspected outliers.
You may think that the point (2671,7), representing a low 7 per 1,000 births infant mortality in a place with a mean daily calorie intake of only 2671 calories is an outlier. Or maybe you think (3429,44) is an outlier. Or maybe you think both are.
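If you would rather put numbers on the suspected outliers than judge them by eye, one option is to fit the least-squares line and see how far each point sits from it. What follows is only a minimal sketch in Python (my choice of tool, not something the problem asks for, since the assignment calls for SPSS or Excel), and the variable names are made up for illustration.

```python
# Data from the problem: (mean daily calories, infant mortality per 1,000 births)
data = [
    (1523, 154), (3495, 6), (1941, 114), (2678, 24), (1610, 107),
    (3443, 6), (1640, 153), (3362, 7), (3429, 44), (2671, 7),
]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Least-squares slope and intercept of y on x
sxx = sum((x - mean_x) ** 2 for x, _ in data)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the fitted line
residuals = [((x, y), y - (intercept + slope * x)) for x, y in data]

# Sort so that the points farthest from the line come first
by_distance = sorted(residuals, key=lambda item: abs(item[1]), reverse=True)

for (x, y), resid in by_distance:
    print(f"({x}, {y}): residual {resid:+.1f}")
```

The two points farthest from the line come out as (2671, 7) and (3429, 44). A large residual only flags a point for a closer look; it is not, by itself, a reason to delete it.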
b)
With your data, Excel gives me the equation of the trendline and its R-squared value.
I would say that
the correlation coefficient is -0.902,
it is a strong correlation (because when it is that close to -1 or 1, everyone agrees to call it strong), and
it is significant (because the tabulated critical values for 10 data pairs/points, at the usual 0.05 level with 10 - 2 = 8 degrees of freedom, are -0.632 and 0.632, and -0.902 is not between those critical values, meaning that it would be very unlikely to get 10 data points that so strongly suggest correlation if the variables were really unrelated).
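To see where numbers like these come from, here is a rough Python sketch (again my own choice of tool, not part of the assignment) that computes the correlation coefficient from its definition and converts a tabulated critical t value into a critical r. The 0.05 two-tailed level is an assumption on my part, and 2.306 is the table value of t for 8 degrees of freedom at that level, typed in by hand.

```python
import math

calories = [1523, 3495, 1941, 2678, 1610, 3443, 1640, 3362, 3429, 2671]
mortality = [154, 6, 114, 24, 107, 6, 153, 7, 44, 7]

n = len(calories)
mean_x = sum(calories) / n
mean_y = sum(mortality) / n

sxx = sum((x - mean_x) ** 2 for x in calories)
syy = sum((y - mean_y) ** 2 for y in mortality)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calories, mortality))

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Test statistic for "no linear correlation", with n - 2 degrees of freedom
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# ASSUMPTION: two-tailed test at the 0.05 level; 2.306 is the tabulated t for df = 8
t_crit = 2.306
# The same cutoff expressed on the r scale
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

print(f"r = {r:.3f}")
print(f"t = {t_stat:.2f} (critical values: -{t_crit} and {t_crit})")
print(f"critical r = {r_crit:.3f}")
print("significant" if abs(r) > r_crit else "not significant")
```

Comparing r with the critical r and comparing t with the critical t are the same test written two ways, so they always agree.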
The significant strong negative correlation does not mean that low calorie intake causes infant mortality, or that infant mortality causes low calorie intake. The two variables are related, but statisticians are supposed to say "correlation does not imply causation". (It is up to politicians and journalists to give that correlation the spin of their choice).
If you have Excel, keep in mind that there are many versions of it, and the menus have become harder to find with all the new features added in the last 10 years.
You need to enter your data table first.
(I would have 2 columns with titles x, for mean daily calories,
and y, for infant mortality rate).
To graph with Excel, select the data first,
then go to the Insert menu and select Chart, choosing the chart type "XY (Scatter)" with the option that shows just the points, no lines.
You might find "Chart Wizard" icons to get to Chart a little faster, going through fewer menus.
The software guides you through the steps to get your points plotted.
You get x and y axes and your points plotted.
(I like to get rid of the gray "fill" painted into the chart,
by right clicking on that gray area, selecting format plot area,
and selecting "none" for border and fill.
I also like to delete the grid lines, and the "Series 1" legend).
Then, somewhere in the Chart menu, I would find "Add Trendline" and use it to add my regression line (linear type).
Then, right-clicking on the line, I would "Format Trendline" to get the desired "Patterns" (dashed, dotted, or solid; thick or thin; black or another color).
(I also like to use "Format Trendline" "Options" to
"Display equation on chart", "Display R-squared value on chart", and extend the line "Forward" and "Backward" a bit using "Forecast").
Outside/without the graph, Excel can give you the correlation coefficient, R, using the "CORREL" function,
and/or you can use the "LINEST" function as an array function, with its last ("stats") argument set to TRUE, to get ten linear regression parameters tabulated in two columns.
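For anyone without Excel, the ten numbers that LINEST tabulates can be worked out directly from the data. The sketch below is in Python (my assumption; any tool would do) and lays the results out in the five-rows-by-two-columns order Excel uses for a single x variable. The variable names are just for illustration.

```python
import math

calories = [1523, 3495, 1941, 2678, 1610, 3443, 1640, 3362, 3429, 2671]
mortality = [154, 6, 114, 24, 107, 6, 153, 7, 44, 7]

n = len(calories)
mean_x = sum(calories) / n
mean_y = sum(mortality) / n
sxx = sum((x - mean_x) ** 2 for x in calories)
syy = sum((y - mean_y) ** 2 for y in mortality)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calories, mortality))

# Regression line y = slope * x + intercept
slope = sxy / sxx
intercept = mean_y - slope * mean_x

ss_reg = slope * sxy        # regression sum of squares
ss_resid = syy - ss_reg     # residual sum of squares
df = n - 2                  # degrees of freedom
r_squared = ss_reg / syy
se_y = math.sqrt(ss_resid / df)                      # standard error of the y estimate
se_slope = se_y / math.sqrt(sxx)
se_intercept = se_y * math.sqrt(sum(x * x for x in calories) / (n * sxx))
f_stat = ss_reg / (ss_resid / df)

# Same layout as LINEST with stats = TRUE and one x variable
table = [
    (slope, intercept),
    (se_slope, se_intercept),
    (r_squared, se_y),
    (f_stat, df),
    (ss_reg, ss_resid),
]
for left, right in table:
    print(f"{left:>14.4f} {right:>14.4f}")

# CORREL equivalent: the square root of R-squared, with the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)
print(f"r = {r:.3f}")
```

The first row is the trendline equation that Excel displays on the chart, and the left entry of the third row is the R-squared value.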