Average (Mean)
Jakob Jenkov |
The term average refers to a specific ratio calculated from a data set. The term mean is often used as a synonym for average. The average (or mean) is defined like this:
average(data) = ratio( sum(data), count(data) )
In other words, the average is the sum of all values in a data set divided by the number of records in the data set. The above is a functional notation which I will use in many other tutorials in this mathematical analysis trail.
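The definition can be mirrored almost one to one in Python. This is a minimal sketch: `ratio` and `average` are hypothetical helpers named after the notation above, not functions from any library. The built-in `sum` plays the role of sum(data) and `len` the role of count(data).

```python
def ratio(numerator, denominator):
    # Hypothetical helper mirroring the ratio() notation: numerator / denominator.
    if denominator == 0:
        raise ValueError("ratio() is undefined when the denominator is 0")
    return numerator / denominator


def average(data):
    # average(data) = ratio( sum(data), count(data) )
    return ratio(sum(data), len(data))


print(average([2, 4, 9]))  # 5.0
```

An empty data set has no average, which is why this sketch raises an error instead of returning 0.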
If the records in the data set contain multiple properties, you can calculate the average value of each numeric property. As an example, look at this data set:
Item | Amount | Order Id | Customer Id |
---|---|---|---|
Hard disk | 99.95 | 790 | 23 |
Monitor | 195.95 | 791 | 45 |
Mouse | 19.95 | 792 | 23 |
Keyboard | 29.95 | 793 | 23 |
Hard disk | 79.95 | 794 | 76 |
Mouse | 17.95 | 795 | 34 |
Keyboard | 24.95 | 796 | 34 |
Monitor | 249.95 | 797 | 67 |
USB Storage | 49.95 | 798 | 67 |
Hard disk | 119.95 | 799 | 87 |
The average value of the amount property for the records in this data set is:
average(data, "amount") = ratio( sum(data, "amount"), count(data) )
The sum of the amount property is 888.5 and the count is 10. Thus, the average value of the amount property is:
average(data, "amount") = ratio(888.5, 10)
which is 88.85.
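The same calculation over the order table can be sketched in Python as follows. This is an illustration under some assumptions: each record is a dict whose field names (`item`, `amount`, `order_id`, `customer_id`) were chosen for this sketch, `average_of` is a hypothetical helper, and amounts are stored as `decimal.Decimal` so that values like 99.95 are represented exactly rather than as binary floats.

```python
from decimal import Decimal

# The order table as a list of records (field names chosen for this sketch).
# Amounts are Decimal so that values like 99.95 are stored exactly.
orders = [
    {"item": "Hard disk",   "amount": Decimal("99.95"),  "order_id": 790, "customer_id": 23},
    {"item": "Monitor",     "amount": Decimal("195.95"), "order_id": 791, "customer_id": 45},
    {"item": "Mouse",       "amount": Decimal("19.95"),  "order_id": 792, "customer_id": 23},
    {"item": "Keyboard",    "amount": Decimal("29.95"),  "order_id": 793, "customer_id": 23},
    {"item": "Hard disk",   "amount": Decimal("79.95"),  "order_id": 794, "customer_id": 76},
    {"item": "Mouse",       "amount": Decimal("17.95"),  "order_id": 795, "customer_id": 34},
    {"item": "Keyboard",    "amount": Decimal("24.95"),  "order_id": 796, "customer_id": 34},
    {"item": "Monitor",     "amount": Decimal("249.95"), "order_id": 797, "customer_id": 67},
    {"item": "USB Storage", "amount": Decimal("49.95"),  "order_id": 798, "customer_id": 67},
    {"item": "Hard disk",   "amount": Decimal("119.95"), "order_id": 799, "customer_id": 87},
]


def average_of(records, prop):
    # average(data, prop) = ratio( sum(data, prop), count(data) )
    if not records:
        raise ValueError("an empty data set has no average")
    total = sum(record[prop] for record in records)
    return total / len(records)


print(sum(record["amount"] for record in orders))  # 888.50
print(average_of(orders, "amount"))                # 88.85
```

With plain floats the sum would typically carry a tiny rounding error, which is why this sketch uses `Decimal` for the monetary amounts.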
Subset Average
You can also calculate the average of property values for a subset of a data set. For instance, we could calculate the average amount of the orders placed by customers who have made exactly 2 orders.
The average for this subset is defined like this:
average(data, "amount", "orders of customers with exactly 2 orders") = ratio( sum(data, "amount", "orders of customers with exactly 2 orders"), count(data, "orders of customers with exactly 2 orders") )
Only 2 customers have made exactly 2 orders: customers 34 and 67. The sum of their order amounts is 342.8, and the total count of orders from these customers is 4. The average is thus:
average(data, "amount", "orders of customers with exactly 2 orders") = ratio(342.8, 4)
The result is 85.7.
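The subset average can be sketched in Python by first counting the orders per customer, then averaging the amounts of only the matching records. This is a minimal sketch under the same assumptions as a plain record list: the field names, the `Decimal` amounts and the `subset_average` helper are all hypothetical choices for illustration, and the subset is passed in as a predicate function.

```python
from collections import Counter
from decimal import Decimal

# The order table as a list of records (field names chosen for this sketch).
orders = [
    {"item": "Hard disk",   "amount": Decimal("99.95"),  "order_id": 790, "customer_id": 23},
    {"item": "Monitor",     "amount": Decimal("195.95"), "order_id": 791, "customer_id": 45},
    {"item": "Mouse",       "amount": Decimal("19.95"),  "order_id": 792, "customer_id": 23},
    {"item": "Keyboard",    "amount": Decimal("29.95"),  "order_id": 793, "customer_id": 23},
    {"item": "Hard disk",   "amount": Decimal("79.95"),  "order_id": 794, "customer_id": 76},
    {"item": "Mouse",       "amount": Decimal("17.95"),  "order_id": 795, "customer_id": 34},
    {"item": "Keyboard",    "amount": Decimal("24.95"),  "order_id": 796, "customer_id": 34},
    {"item": "Monitor",     "amount": Decimal("249.95"), "order_id": 797, "customer_id": 67},
    {"item": "USB Storage", "amount": Decimal("49.95"),  "order_id": 798, "customer_id": 67},
    {"item": "Hard disk",   "amount": Decimal("119.95"), "order_id": 799, "customer_id": 87},
]


def subset_average(records, prop, in_subset):
    # average(data, prop, subset) = ratio( sum(subset, prop), count(subset) )
    subset = [record for record in records if in_subset(record)]
    if not subset:
        raise ValueError("the subset is empty, so it has no average")
    return sum(record[prop] for record in subset) / len(subset)


# Count the orders per customer and keep the customers with exactly 2 orders.
orders_per_customer = Counter(record["customer_id"] for record in orders)
two_order_customers = {cid for cid, n in orders_per_customer.items() if n == 2}

result = subset_average(
    orders, "amount", lambda record: record["customer_id"] in two_order_customers
)
print(sorted(two_order_customers))  # [34, 67]
print(result)                       # 85.70
```

Passing the subset as a predicate keeps the averaging step independent of how the subset is chosen, so the same helper works for any other filter on the data set.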