Correlation is a statistical measure that quantifies the strength and direction of
the linear relationship between two quantitative variables.

\(\bullet\) This value is unitless, so it is not affected by the location or scale of the variables,
and it is bounded between -1 and 1.

\(\bullet\) It is typically denoted by \(r\) for a sample or by \(\rho\) for a population.

\(\bullet\) A correlation of -1 means that the data have a perfect negative linear relationship: in the scatterplot, every point falls exactly on
a line with a negative slope.

\(\bullet\) Similarly, a correlation of 1 means that the data have a perfect positive linear relationship: in the scatterplot, every point falls exactly on a line
with a positive slope.

\(\bullet\) If the data have no linear relationship, they would have a correlation of 0 and would typically appear as a random
scatter of points in the scatterplot. A correlation of 0 does not rule out a nonlinear relationship.

\(\bullet\) The correlation presented in this application is the Pearson
correlation coefficient.

\(\bullet\) The formula used to calculate this value is \(r={1\over n-1}\sum_{i=1}^n{(x_i-\bar{x})(y_i-\bar{y})\over s_xs_y}\), where \(s_x\) and \(s_y\) are the sample standard deviations of \(x\) and \(y\).
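
The formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code this application runs: the function name `pearson_r` and the sample data are hypothetical, and \(s_x\) and \(s_y\) are taken to be sample standard deviations (divisor \(n-1\)).

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sample standard deviations, using the n - 1 divisor (assumed here).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    # Sum of cross-products of deviations, scaled by (n - 1) * s_x * s_y.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))  # roughly 0.8
```

Because the deviations are divided by the standard deviations, shifting or rescaling either variable by a positive factor leaves the result unchanged, which is the location and scale invariance described earlier.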

\(\bullet\) The population regression line is: \(Y=\beta_0+\beta_1X\)

\(\bullet\) When given a random sample of data, we estimate this by: \(\hat{y}=b_0+b_1x\)

\(\bullet\) To assess whether the estimated line is the best line, we can look at two values.

\(\qquad\circ\) The best line minimizes a value called the sum of squared errors, denoted \(SSE=\sum_{i=1}^n(y_i-\hat{y}_i)^2\).

\(\qquad\circ\) Equivalently, the best line maximizes a value called the coefficient of determination. We denote this value as
\(R^2=1-{SSE \over SST}\), where \(SST=\sum_{i=1}^n(y_i-\bar{y})^2\) is the total sum of squares.
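
As a rough illustration of the two criteria, the sketch below fits a line and evaluates \(SSE\) and \(R^2\) for it. The text does not state how \(b_0\) and \(b_1\) are computed, so the standard least-squares solution \(b_1=\sum(x_i-\bar{x})(y_i-\bar{y})/\sum(x_i-\bar{x})^2\), \(b_0=\bar{y}-b_1\bar{x}\) is assumed here. The function names and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 (standard closed form, assumed)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def r_squared(x, y, b0, b1):
    """Coefficient of determination, R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse(x, y, b0, b1) / sst


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
b0, b1 = fit_line(x, y)
print(b0, b1)                          # roughly 0.6 and 0.8
print(sse(x, y, b0, b1))               # roughly 3.6
print(r_squared(x, y, b0, b1))         # roughly 0.64
print(sse(x, y, b0 + 0.5, b1))         # any other line gives a larger SSE
```

Since \(SST\) depends only on the data, a smaller \(SSE\) always means a larger \(R^2\), which is why the two criteria select the same line. For a single predictor, the \(R^2\) of the least-squares line equals the square of the Pearson correlation \(r\).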