The variance of a data set measures how much the values in the data set deviate from the mean value of the data set. Variance is calculated differently from mean deviation: it averages the squared deviations rather than the absolute deviations. Variance is also the basis for calculating standard deviation, which is the square root of the variance.
Variance is calculated in four steps. First, you calculate the deviation of each observation (value) from the mean value of all observations. Second, you raise each deviation to the power of 2 (multiply it by itself). Third, you sum all the squared deviations. Fourth, you divide the sum by the number of observations in the data set.
If the data set is a sample (subset) of the full population (full data set), you divide the sum by the number of observations minus 1 instead. If the data set is the full population, you divide by the number of observations itself.
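The four steps, together with the choice between dividing by the number of observations and dividing by the number of observations minus 1, can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function name `variance` and the `sample` flag are assumptions introduced here for illustration.

```python
def variance(values, sample=False):
    """Return the variance of a sequence of numbers.

    sample=False treats the values as the full population (divide by n).
    sample=True treats the values as a sample (divide by n - 1).
    The name and the flag are hypothetical, chosen for this illustration.
    """
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations to calculate variance")

    # Step 1: deviation of each observation from the mean.
    mean = sum(values) / n
    deviations = [x - mean for x in values]

    # Step 2: raise each deviation to the power of 2.
    squared_deviations = [d ** 2 for d in deviations]

    # Step 3: sum the squared deviations.
    total = sum(squared_deviations)

    # Step 4: divide by n (population) or n - 1 (sample).
    divisor = n - 1 if sample else n
    return total / divisor
```

For example, `variance([1, 2, 3, 4])` returns 1.25, because the squared deviations from the mean 2.5 sum to 5 and are divided by 4.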
To see how to calculate variance in practice, look at this sample data set:
2, 4, 6, 8, 10
The mean value for the data set is:
mean = (2 + 4 + 6 + 8 + 10) / 5 = 30 / 5 = 6
Once you know the mean value of the data set, you calculate the deviation of each observation from it. The deviation is simply the observation value minus the mean value. Here are the deviations for each observation of the example data set:
-4, -2, 0, 2, 4
Now we raise the deviations to the power of 2, meaning we multiply each deviation by itself. The squared deviations are:
16, 4, 0, 4, 16
We then sum all these values:
16 + 4 + 0 + 4 + 16 = 40
Now we divide the sum by the number of observations in the data set, which is 5. Since the data set represents the full population, we do not subtract one from the number of observations. The result is then:
variance = 40 / 5 = 8
Thus, the variance of the example data set is 8.
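The worked example can be reproduced step by step in Python. The sketch below uses the values 2, 4, 6, 8, 10 from the mean calculation above. The variable names are chosen here for illustration, and the result is cross-checked against `pvariance` and `variance` from Python's standard `statistics` module.

```python
import statistics

data = [2, 4, 6, 8, 10]

# Mean of the data set.
mean = sum(data) / len(data)                 # 6.0

# Deviation of each observation from the mean.
deviations = [x - mean for x in data]        # [-4.0, -2.0, 0.0, 2.0, 4.0]

# Each deviation raised to the power of 2.
squared = [d ** 2 for d in deviations]       # [16.0, 4.0, 0.0, 4.0, 16.0]

# Sum of the squared deviations.
total = sum(squared)                         # 40.0

# Full population: divide by the number of observations.
population_variance = total / len(data)      # 8.0

# If the same values were a sample instead: divide by n - 1.
sample_variance = total / (len(data) - 1)    # 10.0

# Cross-check against the standard library.
assert population_variance == statistics.pvariance(data)
assert sample_variance == statistics.variance(data)

print(population_variance, sample_variance)
```

If the same five values were only a sample of a larger population, the divisor would be 4 instead of 5, and the variance would be 10 rather than 8.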