# Sum

Jakob Jenkov

In mathematical analysis, the term *sum* means "the sum of values stored in the records of a data set".
A sum may refer either to the sum of values in all records in a data set, or to the sum of values in a subset of
the records in the data set. I will illustrate both types of sums in this tutorial.

The sum of values in records is interesting both by itself and as part of composite calculations (e.g. average customer lifetime value: the average sum of all orders placed by a customer during their time as a customer with you).

To illustrate sum operations on a data set I will use the following example data set:

Item | Amount | Order Id | Customer Id |
---|---|---|---|
Hard disk | 99.95 | 790 | 23 |
Monitor | 195.95 | 791 | 45 |
Mouse | 19.95 | 792 | 23 |
Keyboard | 29.95 | 793 | 23 |
Hard disk | 79.95 | 794 | 76 |
Mouse | 17.95 | 795 | 34 |
Keyboard | 24.95 | 796 | 34 |
Monitor | 249.95 | 797 | 67 |
USB Storage | 49.95 | 798 | 67 |
Hard disk | 119.95 | 799 | 87 |

## Total Sum

The total sum of a value in a data set is the sum of that value from all records in the data set. For the example data set above, the total sum of order amounts is 888.50.

In other tutorials in this mathematical analysis trail I will use the following notation for total sum:

sum(data, property)

This is a functional notation, where the name of the function is `sum`, and the parameters passed
to the `sum` function are the data set (`data`) and the name of the `property` of each record
to sum. For instance:

sum(data, "amount")
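The notation above can be sketched in Python. This is only a minimal illustration, not a definitive implementation: the name `total_sum` (chosen to avoid shadowing Python's built-in `sum`), the dictionary keys, and the use of `Decimal` to avoid floating-point rounding are all assumptions made for this example.

```python
from decimal import Decimal

# The example data set from this tutorial, one dict per record.
# The key names are assumptions made for this sketch.
data = [
    {"item": "Hard disk",   "amount": Decimal("99.95"),  "order_id": 790, "customer_id": 23},
    {"item": "Monitor",     "amount": Decimal("195.95"), "order_id": 791, "customer_id": 45},
    {"item": "Mouse",       "amount": Decimal("19.95"),  "order_id": 792, "customer_id": 23},
    {"item": "Keyboard",    "amount": Decimal("29.95"),  "order_id": 793, "customer_id": 23},
    {"item": "Hard disk",   "amount": Decimal("79.95"),  "order_id": 794, "customer_id": 76},
    {"item": "Mouse",       "amount": Decimal("17.95"),  "order_id": 795, "customer_id": 34},
    {"item": "Keyboard",    "amount": Decimal("24.95"),  "order_id": 796, "customer_id": 34},
    {"item": "Monitor",     "amount": Decimal("249.95"), "order_id": 797, "customer_id": 67},
    {"item": "USB Storage", "amount": Decimal("49.95"),  "order_id": 798, "customer_id": 67},
    {"item": "Hard disk",   "amount": Decimal("119.95"), "order_id": 799, "customer_id": 87},
]


def total_sum(data, prop):
    """Sum the value of property `prop` across all records in `data`."""
    return sum((record[prop] for record in data), Decimal("0"))


print(total_sum(data, "amount"))  # 888.50
```

The data set is passed in explicitly, mirroring the `sum(data, property)` notation, so the same function works for any list of records that have the named property.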

## Subset Sum

The subset sum of a value in a data set is the sum of that value from a subset of the records in the data set. For the example data set above, the subset sum of the orders made by the customer with customer id 23 is 149.85 (99.95 + 19.95 + 29.95).

In the other tutorials in this mathematical analysis trail I will use the following notation for subset sum:

sum(data, property, criteria)

The `data` parameter to the `sum` function is the data set. The `property` is the
name of the value to sum from each record. The `criteria` parameter selects which records
to sum the values for. For example:

sum(data, "amount", "records by customers with more than 3 orders")

In this example the `property` to sum is the "amount" property. The records to sum from are
"records by customers with more than 3 orders". Written this way, the criteria are not directly executable by a computer. In a real
program you would have to use a criteria syntax that is executable by a computer, like SQL or a lambda expression
of some kind. Exactly what syntax to use depends on what tools you are using to store the data set.
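As one possible illustration of executable criteria, the sketch below expresses them as Python lambda expressions. The function name `subset_sum`, the dictionary keys, and the use of `Decimal` and `Counter` are assumptions made for this example, not part of the notation used in this trail.

```python
from collections import Counter
from decimal import Decimal

# The example data set from this tutorial, one dict per record.
# The key names are assumptions made for this sketch.
data = [
    {"item": "Hard disk",   "amount": Decimal("99.95"),  "order_id": 790, "customer_id": 23},
    {"item": "Monitor",     "amount": Decimal("195.95"), "order_id": 791, "customer_id": 45},
    {"item": "Mouse",       "amount": Decimal("19.95"),  "order_id": 792, "customer_id": 23},
    {"item": "Keyboard",    "amount": Decimal("29.95"),  "order_id": 793, "customer_id": 23},
    {"item": "Hard disk",   "amount": Decimal("79.95"),  "order_id": 794, "customer_id": 76},
    {"item": "Mouse",       "amount": Decimal("17.95"),  "order_id": 795, "customer_id": 34},
    {"item": "Keyboard",    "amount": Decimal("24.95"),  "order_id": 796, "customer_id": 34},
    {"item": "Monitor",     "amount": Decimal("249.95"), "order_id": 797, "customer_id": 67},
    {"item": "USB Storage", "amount": Decimal("49.95"),  "order_id": 798, "customer_id": 67},
    {"item": "Hard disk",   "amount": Decimal("119.95"), "order_id": 799, "customer_id": 87},
]


def subset_sum(data, prop, criteria):
    """Sum property `prop` over the records for which `criteria(record)` is true."""
    return sum((r[prop] for r in data if criteria(r)), Decimal("0"))


# Criteria 1: records belonging to the customer with customer id 23.
print(subset_sum(data, "amount", lambda r: r["customer_id"] == 23))  # 149.85

# Criteria 2: "records by customers with more than 3 orders".
# Count the orders per customer first, then select records by that count.
orders_per_customer = Counter(r["customer_id"] for r in data)
more_than_3_orders = lambda r: orders_per_customer[r["customer_id"]] > 3

# No customer in the example data set has more than 3 orders, so nothing matches.
print(subset_sum(data, "amount", more_than_3_orders))  # 0
```

Passing the criteria as a function keeps the summing logic separate from the selection logic, which is the same split the `sum(data, property, criteria)` notation makes.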
