Mathematical analysis is the use of math to analyze data, and it is frequently used as part of data science projects. Mathematical analysis includes areas like statistics and probability, but also many simpler formulas and calculations, like simple conversion ratio calculations.
This mathematical analysis tutorial will contain a list of mathematical formulas and calculations which are often used in data science projects. These calculations fall within the following subcategories of mathematical analysis:
- Descriptive Statistics
- Inferential Statistics
- Probability
Here is a brief explanation of each category.
Descriptive statistics is concerned with describing an observed entity. If the observed entity were the population of a country, descriptive statistics would be concerned with calculations describing that population. For instance, how many people there are in total, in different age groups, with different types of education, in different salary ranges, etc.
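As a minimal sketch of such descriptive calculations, the Python snippet below counts a handful of records in total, per 10-year age group, and per education level. The records, the `age_group` helper and the 10-year bucketing are hypothetical choices made only for illustration, not data from a real population.

```python
from collections import Counter

# Hypothetical records, invented purely for illustration: (age, education level)
people = [
    (23, "bachelor"),
    (35, "master"),
    (41, "none"),
    (29, "bachelor"),
    (67, "none"),
    (52, "phd"),
]

def age_group(age):
    """Hypothetical helper: bucket an age into a 10-year group, e.g. 23 -> '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Describe the observed data: total count, count per age group, count per education level.
total = len(people)
by_age_group = Counter(age_group(age) for age, _ in people)
by_education = Counter(edu for _, edu in people)

print("Total:", total)
print("By age group:", dict(by_age_group))
print("By education:", dict(by_education))
```

Each of these numbers only summarizes the records that were actually observed; nothing is inferred beyond them.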
Inferential statistics is concerned with probabilities based on observations. For instance, if N% of people in the past had three or more years of university education, what is the probability that a newborn child will get such an education? What would the probability be if one of the child's parents had a long university education? What if both parents did? What if the child grew up in a single-parent family? With a single mom? With a single dad? What if the child is a boy? A girl?
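A simple starting point for questions like these is to estimate a probability as a relative frequency in past observations, optionally restricted to observations matching a condition. The sketch below assumes hypothetical observations of the form (child has a degree, number of parents with a degree) and a hypothetical `relative_frequency` helper; real inferential statistics goes further (for instance by quantifying how uncertain such an estimate is), so treat this only as an illustration of "probability based on observations".

```python
from fractions import Fraction

# Hypothetical observations, invented for illustration:
# (child_has_degree, number_of_parents_with_degree)
observations = [
    (True, 2), (True, 1), (False, 0), (False, 1),
    (True, 2), (False, 0), (True, 0), (False, 2),
]

def relative_frequency(records, condition=lambda r: True):
    """Share of records matching `condition` in which the child has a degree."""
    matching = [r for r in records if condition(r)]
    if not matching:
        return None  # no observations to base an estimate on
    with_degree = sum(1 for child_degree, _ in matching if child_degree)
    return Fraction(with_degree, len(matching))

# Estimated probability of a degree, overall and given that both parents have one.
p_degree = relative_frequency(observations)
p_degree_given_two_parents = relative_frequency(observations, lambda r: r[1] == 2)

print(p_degree, p_degree_given_two_parents)
```

Changing the condition (one parent, no parents, and so on) answers the other "what if" questions in the same way, as long as there are observations matching the condition.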
Probability is concerned with how you calculate the probability of an event occurring. For instance, what is the probability of rolling a 6 with a single die? What is the probability of rolling two 6s in two consecutive rolls? What is the probability that two rolls add up to a total of 7? Or of 11?
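For fair dice, these questions can be answered by counting: the probability of an event is the number of favourable outcomes divided by the number of equally likely outcomes. The sketch below assumes fair six-sided dice and enumerates all 36 outcomes of two rolls; `probability` is a hypothetical helper name, and `Fraction` keeps the results exact.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # a fair six-sided die

# Single die: one favourable outcome (6) out of six equally likely outcomes.
p_six = Fraction(sum(1 for f in FACES if f == 6), len(FACES))

# Two rolls: all 36 equally likely (first, second) pairs.
pairs = list(product(FACES, repeat=2))

def probability(event):
    """Probability that a pair of rolls satisfies `event`."""
    favourable = sum(1 for pair in pairs if event(pair))
    return Fraction(favourable, len(pairs))

p_two_sixes = probability(lambda p: p == (6, 6))
p_sum_7 = probability(lambda p: sum(p) == 7)
p_sum_11 = probability(lambda p: sum(p) == 11)

print(p_six, p_two_sixes, p_sum_7, p_sum_11)  # 1/6 1/36 1/6 1/18
```

A total of 7 can be made in six ways (1+6, 2+5, ..., 6+1), but a total of 11 only in two (5+6 and 6+5), which is why 7 is three times as likely as 11.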
Mathematical vs. Computer Science Terms
Mathematicians and computer scientists / developers often use different terms for the same concepts. It is good to know what these terms are, and what they mean in the two camps.
As mentioned in the data science introduction, data science projects often attempt to extract aggregate information from individual records in data sets. Developers would call the full collection of data a "data set", and each individual record in the data set a "record" or "object".
Mathematicians look at data from a slightly different perspective. If a data set covers the full "group" being examined, then the data set is called a "population". For instance, if you are measuring the number of disease outbreaks in a country, and you record all disease outbreaks in the whole population, then your data set is a "population". If your data set only covers part of the population (part of the full group being studied), then mathematicians call it a "sample". Each record in a data set is called an "observation".
As you can see, what computer scientists call a "data set", mathematicians call a "sample" or "population" depending on what the data set contains. Computer scientists tend to call each item in the data set a "record" or "object", and mathematicians call them "observations". I will be using both terms in this mathematical analysis tutorial.
This mathematical analysis tutorial consists of many pages covering smaller aspects of mathematical analysis. See the menu at the top left of the page for a list of the topics covered in the other texts.
Mathematical vs. Computer Science Notations
Mathematicians use their own special notation to express formulas and equations. This notation involves the use of Greek letters (e.g. the capital sigma, Σ, which denotes a sum) annotated with other letters and numbers.
Software developers tend to write mathematical formulas and expressions in a notation that looks like how you would write them in a programming language. For instance, a function name such as sum() would be used instead of the Greek letter sigma.
Since the target audience of this tutorial is software developers, I will be using the notation used by software developers throughout this mathematical analysis tutorial.
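To make the difference concrete: the sum that mathematicians write as $\sum_{i=1}^{n} x_i$ (add up the values $x_1$ through $x_n$) is, in developer notation, simply a call to a summing function. The Python sketch below shows both an explicit loop and the built-in `sum()`; the values in `x` are hypothetical and only serve as an example. Note that Python lists are indexed from 0, while the mathematical notation here counts from 1, but both forms add up every element.

```python
# Hypothetical sample values, for illustration only.
x = [4, 8, 15, 16, 23, 42]

# Explicit loop: the direct reading of "add x_i for i = 1..n".
total_loop = 0
for value in x:
    total_loop += value

# Developer-style notation: Python's built-in sum().
total_builtin = sum(x)

print(total_loop, total_builtin)  # 108 108
```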