Let's Build a Simple Statistics Class in ES6
I needed a simple statistics tool for analyzing some timing information across a small section of our application. I only needed it while debugging (it won't ship to production), and I didn't want to pull in a large library just for the handful of small functions I needed to test against.
In this tutorial, we are going to build this simple statistics analyzer class. You can find this class in this gist: https://gist.github.com/jimpoulakos/faf17fe4dbfc385d5c3f.
Getting Started
The first thing we need is a basic set of helpers: two helper methods, sum and id, plus a simple constructor that takes the array you want to analyze.
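```javascript
class analyzer {
  constructor (target) {
    // The array of numbers we want to analyze
    this.arr = target;
  }
  // Add up every value in the array, starting from 0
  sum () {
    return this.arr.reduce((previous, current) => previous + current, 0);
  }
  // Identity map: returns a shallow copy of the array
  id () {
    return this.arr.map(value => value);
  }
}
```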
You will notice that I am using arrow functions here, much like I would use lambdas in C#. Our sum function iterates over the array using Array.prototype.reduce(). The reduce callback receives four arguments (the accumulator, the current value, the current index, and the array itself). We only need the accumulator (the running total) and the current value, so we ignore the other two. We also pass zero as the initial value, so the running total starts at 0.
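Why pass `0` as the initial value? Without it, `reduce()` uses the first element as the starting accumulator. That works for non-empty arrays but throws a `TypeError` on an empty one. Here is a quick standalone check. The two function names are hypothetical and used only for illustration.

```javascript
// Hypothetical helper names, for illustration only.
// With an explicit initial value of 0 (the approach sum() uses):
const sumWithInitial = arr => arr.reduce((previous, current) => previous + current, 0);

// Without an initial value, reduce() starts from the first element instead:
const sumWithoutInitial = arr => arr.reduce((previous, current) => previous + current);

console.log(sumWithInitial([1, 2, 3]));    // 6
console.log(sumWithoutInitial([1, 2, 3])); // 6
console.log(sumWithInitial([]));           // 0

try {
  sumWithoutInitial([]);
} catch (err) {
  // An empty array with no initial value gives reduce() nothing to start from.
  console.log(err instanceof TypeError);   // true
}
```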
Our id (identity) function returns a shallow copy of the array. There are other uses for an id function, but we won't be dealing with those in this article.
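The copy matters whenever we need to sort, because `Array.prototype.sort()` sorts in place. Sorting the stored array directly would silently reorder the caller's data. The minimal sketch below shows the difference using a stripped-down class that has only the constructor and `id()`.

```javascript
// Stripped-down version of the class: constructor and id() only.
class analyzer {
  constructor (target) {
    this.arr = target;
  }
  // Identity map: returns a new (shallow) copy of the array.
  id () {
    return this.arr.map(value => value);
  }
}

const data = [3, 1, 2];
const a = new analyzer(data);

// Sort the copy in place; the original array keeps its order.
const copy = a.id();
copy.sort((x, y) => x - y);

console.log(copy.join(", ")); // 1, 2, 3
console.log(data.join(", ")); // 3, 1, 2
console.log(copy !== data);   // true
```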
Basic Statistics
In most statistics libraries, you are going to come across Mean, Median and Mode functions. We want our little statistics analyzer class to also include these three (and more). Let's get to it.
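```javascript
class analyzer {
  // Removed above code for brevity
  mean () {
    return this.sum() / this.arr.length;
  }
  median () {
    // Sort a copy numerically so the original array is left untouched
    let temp = this.id();
    temp.sort((a, b) => a - b);
    let midpoint = Math.floor(temp.length / 2);
    return temp.length % 2 ?
      temp[midpoint] :                              // odd length: the middle value
      (temp[midpoint - 1] + temp[midpoint]) / 2;    // even length: average of the two middle values
  }
  mode () {
    let map = {},
        max = null;
    // Count how many times each value occurs
    this.arr.forEach(value => {
      map[value] = map[value] || 0;
      map[value]++;
    });
    // Keep the key with the highest count
    for (let key in map) {
      if (max === null) {
        max = key;
      }
      max = map[max] < map[key] ? key : max;
    }
    // Object keys are strings, so convert the winner back to a number
    return max === null ? null : Number(max);
  }
}
```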
The mean, or average, uses the sum helper: we divide the sum of all values by the number of values in the array.
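The median is the middle value of the sorted data, or the average of the two middle values when the array has an even length. To find it we have to sort, so we sort a copy (from id) rather than reordering the caller's array. The comparator is a simple lambda-style arrow function, (a, b) => a - b. Without it, sort() compares the values as strings and would place 10 before 9.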
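The standalone sketch below shows why the comparator is needed and walks through the odd and even cases. The free function `median` is a hypothetical stand-alone version of the method, taking the array as an argument.

```javascript
// Default sort compares string forms: "10" < "9" because "1" < "9".
console.log([10, 9, 1].sort().join(", "));                // 1, 10, 9
console.log([10, 9, 1].sort((a, b) => a - b).join(", ")); // 1, 9, 10

// Same steps as median(): copy, sort numerically, pick the middle.
function median (arr) {
  const temp = arr.slice().sort((a, b) => a - b);
  const midpoint = Math.floor(temp.length / 2);
  return temp.length % 2
    ? temp[midpoint]                              // odd length: middle element
    : (temp[midpoint - 1] + temp[midpoint]) / 2;  // even length: average of the two middle elements
}

console.log(median([7, 1, 3]));     // 3
console.log(median([10, 9, 2, 1])); // 5.5
```

In the even case, the sorted copy is [1, 2, 9, 10]. The midpoint index is 2, so the result is (2 + 9) / 2.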
The mode is the most frequently occurring value. We count how many times each value appears, using the value itself as the key and incrementing its count as we go, and then return the value with the highest count. Again, a lambda-style arrow function keeps the counting loop short.
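Counting with a plain object has one subtlety: property keys are always strings, so iterating with `for...in` hands back `"2"` rather than `2`. The sketch below shows the same counting idea using an ES6 `Map`, which keeps the original numeric values as keys. It is a design alternative, not the gist's code, and the standalone `mode` function is hypothetical.

```javascript
// Plain-object keys are coerced to strings.
const objectCounts = {};
objectCounts[2] = 1;
console.log(typeof Object.keys(objectCounts)[0]); // string

// Hypothetical Map-based mode: counts each value, keeping numbers as numbers.
function mode (arr) {
  const counts = new Map();
  arr.forEach(value => counts.set(value, (counts.get(value) || 0) + 1));

  let best = null;
  let bestCount = 0;
  // Map iterates in insertion order, so on a tie the value seen first wins.
  for (const [value, count] of counts) {
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  }
  return best; // null for an empty array
}

console.log(mode([1, 2, 2, 3]));        // 2
console.log(typeof mode([1, 2, 2, 3])); // number
console.log(mode([5, 4, 4, 5]));        // 5 (tie: 5 appears first)
```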
Advanced Statistics
The basic functions are useful, but we need a bit more. Next, we'll add functions for the sample variance and the sample standard deviation.
```javascript
class analyzer {
  // Removed above code for brevity
  // Sum the given array, or the initialized array when none is passed
  sum (arr = this.arr) {
    return arr.reduce((previous, current) => previous + current, 0);
  }
  // Square every value; returns a new array
  squared (arr = this.arr) {
    return arr.map(value => value * value);
  }
  // Sample variance: sum of squared differences from the mean, divided by n - 1
  standardVariance () {
    let mean = this.mean(),
        differences = this.arr.map(value => value - mean),
        squares = this.squared(differences);
    return this.sum(squares) / (this.arr.length - 1);
  }
  // Sample standard deviation
  standardDeviation () {
    return Math.sqrt(this.standardVariance());
  }
}
```
We now need to sum something other than the initialized array (the squared differences), so we modify our sum function to accept an optional array. When no array is passed, it falls back to the initialized array.
We've also added a new helper function, squared, which squares every value in an array and returns the results as a new array. Like sum, it accepts an optional array and otherwise defaults to our initialized array.
To calculate the sample variance, we take the difference between each value and the mean of the whole dataset, square those differences, and sum them. We then divide by the number of values minus one (n - 1). Dividing by n - 1 rather than n (Bessel's correction) makes the result an unbiased estimate of the variance of the larger population the sample came from. You can see that we are using another lambda-style arrow function to map the list of differences.
The sample standard deviation is simply the square root of the sample variance.
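A worked example helps check the arithmetic. Take the dataset [2, 4, 4, 4, 5, 5, 7, 9]:

- The mean is 40 / 8 = 5.
- The squared differences are 9, 1, 1, 1, 0, 0, 4, 16, which sum to 32.
- The sample variance is 32 / 7 ≈ 4.571, and the sample standard deviation is about 2.138.

The standalone sketch below follows the same steps using free functions rather than the class.

```javascript
// Example data; any numeric array works.
const values = [2, 4, 4, 4, 5, 5, 7, 9];

const sum = arr => arr.reduce((previous, current) => previous + current, 0);

const mean = sum(values) / values.length;                  // 40 / 8 = 5
const squares = values.map(value => {
  const difference = value - mean;                         // distance from the mean
  return difference * difference;                          // 9, 1, 1, 1, 0, 0, 4, 16
});
const sampleVariance = sum(squares) / (values.length - 1); // 32 / 7
const sampleDeviation = Math.sqrt(sampleVariance);

console.log(sampleVariance.toFixed(3));  // 4.571
console.log(sampleDeviation.toFixed(3)); // 2.138
```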
Now that we have these, what else can we derive from them? For starters, we can get the values one standard deviation (1σ) above and below the mean.
```javascript
class analyzer {
  // Removed above code for brevity
  maximumStandardDeviation () {
    return this.mean() + this.standardDeviation();
  }
  minimumStandardDeviation () {
    return this.mean() - this.standardDeviation();
  }
}
```
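In other words, these two methods return the edges of the band formed by the mean plus or minus one sample standard deviation. The rough sketch below computes that band for the example data [2, 4, 4, 4, 5, 5, 7, 9] and counts how many values fall inside it. The free-standing variable names are hypothetical.

```javascript
const values = [2, 4, 4, 4, 5, 5, 7, 9];

const sum = arr => arr.reduce((previous, current) => previous + current, 0);
const mean = sum(values) / values.length;
// Sample standard deviation (divides by n - 1), as in standardDeviation().
const sampleDeviation = Math.sqrt(
  sum(values.map(value => (value - mean) * (value - mean))) / (values.length - 1)
);

// Same quantities as maximumStandardDeviation() / minimumStandardDeviation().
const upper = mean + sampleDeviation;
const lower = mean - sampleDeviation;

// Values lying within one standard deviation of the mean.
const inside = values.filter(value => value >= lower && value <= upper);

console.log(lower.toFixed(3), upper.toFixed(3)); // 2.862 7.138
console.log(inside.length);                      // 6
```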
More Advanced Statistics
As before, we build the class incrementally. We start with small units and work our way up to more involved functions, building on existing methods or adding new ones as needed.
Our next step is the other main pair of dispersion functions: population variance and population standard deviation.
```javascript
class analyzer {
  // Removed above code for brevity
  differences (mean) {
    return this.arr.map(value => value - mean);
  }
  populationVariance () {
    let mean = this.mean(),
        differences = this.differences(mean),
        squares = this.squared(differences);
    return this.sum(squares) / this.arr.length;
  }
  populationDeviation () {
    return Math.sqrt(this.populationVariance());
  }
}
```
I've created a new function here for the differences. We can retrofit our standardVariance function to use it instead of the inline lambda-style arrow function. Then, if we ever need to change how the differences are computed, we only have to do it in one place.
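Below is a sketch of what that retrofit could look like. It is assembled into a trimmed-down class with only the methods the variance calculations need. The method names follow the article. The assembly itself is an assumption, and sum and squared use ES6 default parameters here, which for array arguments behave like the `arr || this.id()` fallback.

```javascript
class analyzer {
  constructor (target) {
    this.arr = target;
  }
  sum (arr = this.arr) {
    return arr.reduce((previous, current) => previous + current, 0);
  }
  squared (arr = this.arr) {
    return arr.map(value => value * value);
  }
  mean () {
    return this.sum() / this.arr.length;
  }
  // Shared helper: each value's distance from the mean.
  differences (mean) {
    return this.arr.map(value => value - mean);
  }
  // Sample variance, now reusing differences() instead of an inline map.
  standardVariance () {
    const squares = this.squared(this.differences(this.mean()));
    return this.sum(squares) / (this.arr.length - 1);
  }
  // Population variance uses the same helpers; only the divisor changes.
  populationVariance () {
    const squares = this.squared(this.differences(this.mean()));
    return this.sum(squares) / this.arr.length;
  }
}

const stats = new analyzer([2, 4, 4, 4, 5, 5, 7, 9]);
console.log(stats.standardVariance().toFixed(3)); // 4.571
console.log(stats.populationVariance());          // 4
```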
The population variance and population standard deviation functions are almost identical to their sample counterparts. The only difference is that we divide by n (the array length) instead of n - 1. Use them when the array contains the entire population rather than a sample drawn from it.
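For reference, here are the two variance formulas in standard notation. The values in the array are x_1 through x_n, with mean \bar{x}, and the standard deviations are the square roots of these. The only difference between the formulas is the divisor.

```latex
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}\,\sigma^2
```

For [2, 4, 4, 4, 5, 5, 7, 9], the sum of squared differences is 32. That gives \sigma^2 = 32/8 = 4 and s^2 = 32/7 \approx 4.571.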
Next Steps
You can build additional functionality on top of this simple analyzer class by forking the gist. Some other useful functions you could add:
- Interquartile range
- Skewness
- Kurtosis
- Correlation
- Regression analysis
These are just a few ideas you could implement with this base class definition.