By Roberto Lopez, Artelnics.
In machine learning, benchmarking is the practice of comparing tools to identify the best-performing technologies in the industry.
However, comparing different machine learning platforms can be a difficult task due to the large number of factors involved in the performance of a tool.
This post aims to identify the most critical key performance indicators (KPIs) and define a consistent measurement process.
As we know, the volume, variety, and velocity of information stored in organizations are increasing significantly.
Therefore, for machine learning tools to be efficient, they need to process large amounts of data in the shortest time possible.
Key performance indicators typically measured here are data capacity, training speed, inference speed, and model precision.
Benchmarking measures performance with a specific indicator, which results in a metric that can then be compared across tools.
This allows organizations to develop plans for making improvements or adopting specific best practices, usually to increase some aspect of performance.
In this way, they learn how well the targets perform and, more importantly, the business processes that explain why these firms are successful.
Nowadays, common datasets used in machine learning might contain thousands of variables and millions of samples.
However, machine learning platforms may crash due to memory problems when building models with big datasets.
Therefore, tools capable of processing these volumes of data are necessary.
The data capacity of a machine learning platform can be defined as the biggest dataset that it can process. That is, the tool should be able to perform all the essential tasks with that dataset.
We can measure data capacity as the number of samples that a machine learning platform can process for a given number of variables.
This metric depends on numerous factors:
To compare the data capacity of machine learning platforms, we follow these steps:
Note that the selection of a dataset suite is necessary.
The following figure illustrates the result of a data capacity test with two platforms.
As we can see, Platform A can analyze up to 400,000 samples, while Platform B can analyze up to 600,000 samples. Therefore, we can say that the capacity of Platform B is 1.5 times the capacity of Platform A.
As a practical case, consider that our computer has 16 GB of RAM and our dataset has 500,000 samples. Platform A would throw a memory allocation error, while Platform B would train the model.
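The capacity test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `measure_data_capacity` and `make_stub_platform` are hypothetical names, the stub platforms simply raise `MemoryError` above a fixed limit instead of training a real model, and the choice of 1,000 variables is arbitrary.

```python
from typing import Callable, Iterable, Optional


def measure_data_capacity(
    train: Callable[[int, int], None],
    variables: int,
    sample_counts: Iterable[int],
) -> Optional[int]:
    """Return the largest sample count that trains without a memory error."""
    capacity = None
    for samples in sorted(sample_counts):
        try:
            train(samples, variables)
        except MemoryError:
            break  # Larger datasets are assumed to fail as well.
        capacity = samples
    return capacity


def make_stub_platform(limit: int) -> Callable[[int, int], None]:
    """Hypothetical platform that runs out of memory above `limit` samples."""
    def train(samples: int, variables: int) -> None:
        if samples > limit:
            raise MemoryError(f"cannot allocate {samples} x {variables} values")
    return train


# Assumed test grid: 100,000 to 1,000,000 samples, 1,000 variables.
sizes = range(100_000, 1_000_001, 100_000)
capacity_a = measure_data_capacity(make_stub_platform(400_000), 1_000, sizes)
capacity_b = measure_data_capacity(make_stub_platform(600_000), 1_000, sizes)
print(capacity_a, capacity_b, capacity_b / capacity_a)
```

In a real benchmark, `train` would load a dataset of the given size and run the platform's training routine, and every platform would be tested on the same machine.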
One of the most critical factors in machine learning platforms is the time they need to train the models. Indeed, modeling big data sets is very expensive in computational terms.
Training machine learning models with big datasets can take several hours. Moreover, before deploying a model, it is usually necessary to train many candidate models to select the best-performing one. This can make it impractical to use some platforms for some applications.
The training speed of a machine learning platform depends on numerous factors:
Training speed is usually measured as the number of samples per second that the platform processes during training.
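As a minimal sketch of this measurement, we can time a training run and divide the number of processed samples by the elapsed time. The names `measure_training_speed` and `stub_train` are hypothetical, the stub loop only stands in for a real training run, and we assume that every epoch visits each training sample once.

```python
import time
from typing import Callable


def measure_training_speed(
    train: Callable[[], None],
    samples: int,
    epochs: int,
) -> float:
    """Return the samples processed per second during one call to `train`."""
    start = time.perf_counter()
    train()
    elapsed = time.perf_counter() - start
    # Assumption: each epoch processes every training sample exactly once.
    return samples * epochs / elapsed


def stub_train() -> None:
    """Hypothetical stand-in for a platform's training run."""
    total = 0.0
    for i in range(200_000):  # 1,000 samples x 200 epochs
        total += i * 0.5


training_speed = measure_training_speed(stub_train, samples=1_000, epochs=200)
print(f"{training_speed:,.0f} samples/second")
```

The printed value depends on the machine, which is why all platforms should be timed on the same hardware.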
To compare the training speed of machine learning platforms, we follow these steps:
The following figure illustrates the result of a training speed test with two platforms.
As we can see, the training speed of Platform A is 200,000 samples/second, while that of Platform B is 350,000 samples/second. Therefore, we can say that the training speed of Platform B is 1.75 times that of Platform A.
To illustrate that, consider a dataset with 1,000,000 training samples and an optimization algorithm that runs for 1,000 epochs. The training time for Platform A is 01:23:20 (hours:minutes:seconds), and the training time for Platform B is 00:47:37.
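The arithmetic behind these training times can be checked with a short script. It assumes that each epoch processes every training sample once, so the training time is samples × epochs / speed; the function names are ours and purely illustrative.

```python
def training_time_seconds(samples: int, epochs: int, speed: float) -> float:
    """Training time in seconds, assuming each epoch visits every sample once."""
    return samples * epochs / speed


def format_hms(seconds: float) -> str:
    """Format a duration as hh:mm:ss, dropping the fractional second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


time_a = training_time_seconds(1_000_000, 1_000, 200_000)  # 5,000 seconds
time_b = training_time_seconds(1_000_000, 1_000, 350_000)  # about 2,857 seconds
print(format_hms(time_a))  # 01:23:20
print(format_hms(time_b))  # 00:47:37
```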
The main objective of machine learning is to develop models that attain high accuracy.
We can define precision as the mean error of a model against a testing data set.
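The following sketch computes such a mean error on a small testing set. The text does not say which error is meant, so the mean absolute error is used here as an assumption; the function name and the numbers are made up for illustration.

```python
from typing import Sequence


def mean_absolute_error(targets: Sequence[float], outputs: Sequence[float]) -> float:
    """Mean absolute difference between testing targets and model outputs."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and equal in length")
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)


# Illustrative testing data, not taken from any real benchmark.
testing_targets = [1.0, 2.0, 3.0, 4.0]
model_outputs = [1.5, 2.0, 2.5, 4.0]
error = mean_absolute_error(testing_targets, model_outputs)
print(error)  # 0.25
```

Other choices, such as the mean squared error, would follow the same pattern with a different per-sample term.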
The precision of a machine learning platform depends on numerous factors:
We follow these steps to compare the precision of different machine learning platforms:
The dataset used here should make it possible to reach zero error.
The following table illustrates the result of a precision test with two platforms:
As we can see, Platform A can build a model with a correlation of 0.8. On the other hand, Platform B can build a model with a correlation of 0.9. Therefore, we can say that the precision of Platform B is 1.125 times that of Platform A.
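Since this example reports precision as a correlation, here is a minimal sketch of how such a value could be computed between testing targets and model outputs. We assume the Pearson correlation coefficient is meant; the function name and the sample data are illustrative only.

```python
import math
from typing import Sequence


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally sized sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


# Outputs that are an exact linear function of the targets correlate perfectly.
perfect = pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Ratio of the two correlations reported above.
precision_ratio = round(0.9 / 0.8, 3)
print(precision_ratio)  # 1.125
```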
In many applications, especially real-time ones, the response time of the model is a critical factor. Indeed, even a few milliseconds of additional inference time can make the model impractical.
Inference speed reflects the time needed to calculate the outputs of the model as a function of its inputs. We measure it as the number of samples processed per second.
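One way to obtain this number is to time the model on a batch of inputs and divide the batch size by the elapsed time. The sketch below makes several assumptions: `measure_inference_speed` and `stub_predict` are hypothetical names, the "model" is a fixed linear combination of two inputs, and the fastest of several repeats is kept to reduce timing noise.

```python
import time
from typing import Callable, List, Sequence


def measure_inference_speed(
    predict: Callable[[Sequence[Sequence[float]]], List[float]],
    inputs: Sequence[Sequence[float]],
    repeats: int = 5,
) -> float:
    """Return samples per second, using the fastest of several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(inputs)
        best = min(best, time.perf_counter() - start)
    return len(inputs) / best


def stub_predict(batch: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical model: a fixed linear combination of two inputs."""
    return [0.3 * x[0] + 0.7 * x[1] for x in batch]


batch = [[float(i), float(i + 1)] for i in range(100_000)]
inference_speed = measure_inference_speed(stub_predict, batch)
print(f"{inference_speed:,.0f} samples/second")
```

With a real platform, `predict` would call the deployed model, and the batch would come from the benchmark's dataset suite.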
The inference speed of a machine learning platform depends on numerous factors.
To compare the inference speed of machine learning platforms, we follow these steps:
The following figure illustrates the result of an inference speed test with two platforms.
As we can see, in one second, Platforms A and B can calculate the outputs for 700,000 and 900,000 inputs, respectively. Therefore, we can say that the inference speed of Platform B is about 1.29 times that of Platform A.
This post aims to define the most important KPIs in machine learning platforms.
It also describes the most relevant factors that might affect those key performance indicators.
Finally, it describes how to design and measure performance tests for data capacity, training speed, model precision, and inference speed.
The machine learning platform Neural Designer implements high-performance techniques so that you can get maximum productivity.
You can download Neural Designer now and try it for free.