Science & Tech

Sum of Squares: The Sum of Squares Formula

Written by MasterClass

Last updated: Nov 22, 2022 • 4 min read

In regression analysis, the total sum of squares measures standard deviations from the data set’s mean value. Learn how to calculate and use the sum of squares measurement.

Learn From the Best

What Is the Sum of Squares?

Sum of squares is a common statistical technique in regression analysis that measures the dispersion of data points. The sum of squares calculates the deviation of data points away from the sample size’s mean value. Investors often calculate this practical analysis of variance to help inform investment decisions as it illuminates the variance in asset values.

How to Use the Sum of Squares

The sum of squares is a statistical measurement of the deviation from the mean. This variation refers to the relative distance between a given data point and the overall set’s average value. This determines the best fit line in a graph; if the line does not pass through each data point, there is some variability.

Investors and market analysts rely on the sum of the squares to forecast volatility in a given company’s stock values. They might chart two companies' stock values daily over two years to see how far each company deviates from their mean value. A company with more significant variability might make for a less wise investment.

The Total Sum of Squares Formula

For a set X of n items, the sum of squares equals:
The Total Sum of Squares Formula

In the sum of squares equation, “X sub i” equals the “i to the th power” item in the set; “X-bar” equals the mean of all items in the set; and the difference of the two equals the deviation of each item from the mean.

How to Calculate the Sum of Squares

To calculate the total sum of squares, subtract the data points from their mean, square that difference to yield a positive number, then calculate the summation of the difference and the squared differences. Follow these steps:

  1. 1. Calculate. First, add all the data points and divide them by the number of values in the data set.
  2. 2. Subtract. Second, subtract all the data points from that mean value.
  3. 3. Square. Third, square that difference.
  4. 4. Add. Finally, add the difference you found in step two to the squared difference you found in step three.

A sum of squares that totals a large number indicates higher variability, while a lower value shows little variability from the sample mean.

3 Types of Sum of Squares Figures

There are three main variations on the sum of squares:

  1. 1. Regression sum of squares: This sum of squares shows the relationship between the given data and the regression model, where the regression model denotes whether there is a relationship between one or multiple variables. A low regression sum of squares shows a better best fit within the data.
  2. 2. Residual sum of squares: A residual sum of squares speaks to a sum of squares whose best fit line does not pass through many data points, showing unexplained variance in the data set.
  3. 3. Total sum of squares: The total sum of squares measures the distance between a given data point and the set’s mean value. The total sum of squares can help determine the residual and regression sum of squares.

Sum of Squares Example

Calculating the sum of squares for a company’s monthly revenue can tell you how much variance there is. Take the example company Adventure Movies. To find the sum of squares, follow these steps:

  • Add the ticket prices. Each month, Adventure Movies releases a new film and measures their ticket sales in millions of dollars. In January, they make $7.0, in February $5.2, in March $7.5, in April $8.0, and in May $8.4.
  • Calculate the mean. To find the sum of squares, first find the mean: $7.0 + $5.2 + $7.5 + $8.0 + $8.4 = $36.1. Then divide the mean by five (the number of months in this data set) to get $7.2 million.
  • Find the difference of each price from the mean. To calculate the sum of squares, first find the difference of each price from the mean. The sum of squares = ($7.0 - $7.2)2 + ($5.2 - $7.2)2 + ($7.5 - $7.2)2 + ($8.0 - $7.2)2 + ($8.4 - $7.2)2 .
  • Square the differences. Next, square the differences. The sum of squares = (-0.2)2 + (-2.0)2 + (0.3)2 + (0.8)2 + (1.2)2.
  • Add the differences together. Finally, add the differences. In this example, the sum of squares = 0.04 + 4.00 + 0.09 + 0.64 + 1.44 = 6.21.

This brings the total sum of squares to 6.21, a higher sum of squares, showing a wider variability between a month’s given revenue and the overall mean. This can be challenging for investors, who might not know exactly how much money Adventure Movies will make in a given month. For example, if the mean is $7.2 million per month, but February only made $5.2, that is a significant, $2 million difference.

Learn More

Get the MasterClass Annual Membership for exclusive access to video lessons taught by science luminaries, including Terence Tao, Bill Nye, Neil deGrasse Tyson, Chris Hadfield, Jane Goodall, and more.