Count
Jakob Jenkov
In mathematical analysis, the term "count" means "the number of records (observations)" in a data set. A count may refer either to the total number of records in a data set, or to the number of records in a subset of the data set. I will illustrate both types of counts in this tutorial.
The count of records is interesting both by itself (e.g. the total number of customers) and as part of composite calculations (e.g. what percentage of our customers are from a specific country).
To illustrate count operations on a data set I will use the following example data set:
Item | Amount | Order Id | Customer Id |
---|---|---|---|
Hard disk | 99.95 | 790 | 23 |
Monitor | 195.95 | 791 | 45 |
Mouse | 19.95 | 792 | 23 |
Keyboard | 29.95 | 793 | 23 |
Hard disk | 79.95 | 794 | 76 |
Mouse | 17.95 | 795 | 34 |
Keyboard | 24.95 | 796 | 34 |
Monitor | 249.95 | 797 | 67 |
USB Storage | 49.95 | 798 | 67 |
Hard disk | 119.95 | 799 | 87 |
Total Count
The term "total count" usually refers to the total number of records in the data set. For the example data set above, the total count is 10.
In other tutorials in this mathematical analysis trail, I will use the following notation for count:
count(data)
This is a functional notation where count is a function applied to data, which is the data set.
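If the data set is kept in memory, count(data) maps directly onto the length of a list. Here is a minimal Python sketch, assuming each record is stored as an `Order` named tuple. The `Order` type and its field names are my own illustrative choices, not part of the notation.

```python
from typing import NamedTuple


class Order(NamedTuple):
    """One record (observation) in the example data set."""
    item: str
    amount: float
    order_id: int
    customer_id: int


# The example data set from the table above, one record per order.
data = [
    Order("Hard disk", 99.95, 790, 23),
    Order("Monitor", 195.95, 791, 45),
    Order("Mouse", 19.95, 792, 23),
    Order("Keyboard", 29.95, 793, 23),
    Order("Hard disk", 79.95, 794, 76),
    Order("Mouse", 17.95, 795, 34),
    Order("Keyboard", 24.95, 796, 34),
    Order("Monitor", 249.95, 797, 67),
    Order("USB Storage", 49.95, 798, 67),
    Order("Hard disk", 119.95, 799, 87),
]


def count(data):
    """Total count: the number of records in the data set."""
    return len(data)


print(count(data))  # prints 10
```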
Subset Count
A count operation may also count only the subset of records that match certain criteria. For instance, in the example data set above, the count of keyboard orders is 2, and the count of hard disk orders is 3. Similarly, counting the distinct customer ids gives a total of 6 customers.
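One way to sketch subset counting in Python is to pass the selection criteria as a predicate function that returns `True` for the records to include. This representation, the `Order` type, and the optional `criteria` parameter are my own assumptions for illustration. Note that counting customers is a slightly different operation: it counts distinct customer ids rather than order records.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Order(NamedTuple):
    item: str
    amount: float
    order_id: int
    customer_id: int


data = [
    Order("Hard disk", 99.95, 790, 23),
    Order("Monitor", 195.95, 791, 45),
    Order("Mouse", 19.95, 792, 23),
    Order("Keyboard", 29.95, 793, 23),
    Order("Hard disk", 79.95, 794, 76),
    Order("Mouse", 17.95, 795, 34),
    Order("Keyboard", 24.95, 796, 34),
    Order("Monitor", 249.95, 797, 67),
    Order("USB Storage", 49.95, 798, 67),
    Order("Hard disk", 119.95, 799, 87),
]


def count(
    data: Sequence[Order],
    criteria: Optional[Callable[[Order], bool]] = None,
) -> int:
    """Count all records, or only those for which criteria(record) is True."""
    if criteria is None:
        return len(data)
    return sum(1 for record in data if criteria(record))


keyboard_orders = count(data, lambda order: order.item == "Keyboard")
hard_disk_orders = count(data, lambda order: order.item == "Hard disk")

# Counting customers means counting distinct customer ids, not order records.
distinct_customers = len({order.customer_id for order in data})

print(keyboard_orders, hard_disk_orders, distinct_customers)  # prints 2 3 6
```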
I will be using this notation for subset count in other tutorials in this mathematical analysis trail:
count(data, criteria)
The criteria part specifies the criteria by which the subset is selected. These criteria will typically be expressed in text, like:
count(data, "customers with more than 1 order");
count(data, "customers that bought a keyboard");
This notation is not directly executable by a computer, since a computer cannot easily make sense of a textual selection criterion. In a real computer program you would have to specify the selection criteria using a syntax the computer can understand.
Exactly what this syntax would be depends on the tools you use to analyze the data. If you are using a relational database, the syntax could be SQL. If you are keeping all data in memory and analyzing it with code, it could be another function, etc. Use your imagination here.
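As an example, here is a minimal Python sketch of how the two textual criteria could be translated into code, assuming the data set is held in memory as a list of `(item, amount, order_id, customer_id)` tuples. Both criteria count customers rather than order records, so the sketch first tallies orders per customer id with `collections.Counter`, or collects customer ids into a set.

```python
from collections import Counter

# Example data set as (item, amount, order_id, customer_id) tuples.
data = [
    ("Hard disk", 99.95, 790, 23),
    ("Monitor", 195.95, 791, 45),
    ("Mouse", 19.95, 792, 23),
    ("Keyboard", 29.95, 793, 23),
    ("Hard disk", 79.95, 794, 76),
    ("Mouse", 17.95, 795, 34),
    ("Keyboard", 24.95, 796, 34),
    ("Monitor", 249.95, 797, 67),
    ("USB Storage", 49.95, 798, 67),
    ("Hard disk", 119.95, 799, 87),
]

# Number of orders per customer id.
orders_per_customer = Counter(customer_id for _, _, _, customer_id in data)

# count(data, "customers with more than 1 order")
customers_with_multiple_orders = sum(
    1 for n in orders_per_customer.values() if n > 1
)

# count(data, "customers that bought a keyboard")
keyboard_customers = len(
    {customer_id for item, _, _, customer_id in data if item == "Keyboard"}
)

print(customers_with_multiple_orders, keyboard_customers)  # prints 3 2
```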
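For the relational database case, here is a sketch that uses Python's built-in `sqlite3` module with an in-memory database, so the same counts can be expressed in SQL. The table name `orders` and its column names are assumptions made for this example.

```python
import sqlite3

# Example data set as (item, amount, order_id, customer_id) tuples.
rows = [
    ("Hard disk", 99.95, 790, 23),
    ("Monitor", 195.95, 791, 45),
    ("Mouse", 19.95, 792, 23),
    ("Keyboard", 29.95, 793, 23),
    ("Hard disk", 79.95, 794, 76),
    ("Mouse", 17.95, 795, 34),
    ("Keyboard", 24.95, 796, 34),
    ("Monitor", 249.95, 797, 67),
    ("USB Storage", 49.95, 798, 67),
    ("Hard disk", 119.95, 799, 87),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (item TEXT, amount REAL, order_id INTEGER, customer_id INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)

# count(data)
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# count(data, "orders of a keyboard")
keyboards = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE item = ?", ("Keyboard",)
).fetchone()[0]

# Number of distinct customers.
customers = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM orders"
).fetchone()[0]

# count(data, "customers with more than 1 order")
repeat_customers = conn.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT customer_id FROM orders GROUP BY customer_id HAVING COUNT(*) > 1"
    ")"
).fetchone()[0]

conn.close()

print(total, keyboards, customers, repeat_customers)  # prints 10 2 6 3
```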