How to benchmark the performance of machine learning platforms

By Roberto Lopez, Artelnics.

In machine learning, benchmarking aims to compare different tools to identify the best-performing technologies in the industry.

However, comparing different machine learning platforms can be a difficult task due to the large number of factors involved in the performance of a tool.

This post aims to identify the most critical key performance indicators (KPIs) and define a consistent measurement process.

Contents:

  1. Performance benchmarking.
  2. Data capacity tests.
  3. Training speed tests.
  4. Model precision tests.
  5. Inference speed tests.
  6. Conclusions.

Performance benchmarking

As we know, the volume, variety, and velocity of information stored in organizations are increasing significantly.

Therefore, for machine learning tools to be efficient, they need to process large amounts of data in the shortest time possible.

Key performance indicators typically measured here are data capacity, training speed, inference speed, and model precision.

Benchmarking measures performance against a specific indicator, producing a metric that can then be compared across tools.

This allows organizations to develop improvement plans or adopt specific best practices, usually to increase some aspect of performance.

In this way, they learn how well the benchmarked tools perform and, more importantly, which practices explain why the best ones are successful.

Data capacity tests

Nowadays, common datasets used in machine learning might contain thousands of variables and millions of samples.

However, machine learning platforms may crash due to memory problems when trying to build models with big datasets.

Therefore, tools that are capable of processing these volumes of data are necessary.

The data capacity of a machine learning platform can be defined as the largest dataset it can process, that is, the largest dataset with which the tool can still perform all its essential tasks.

Data capacity can be measured as the number of samples that a machine learning platform can process for a given number of variables.

This metric depends on numerous factors, such as the available memory, the numerical precision used to store the data, and how the platform loads and manages the dataset.

To compare the data capacity of machine learning platforms, we follow these steps:

  1. Choose a reference computer (CPU, GPU, RAM...).
  2. Choose a reference dataset suite (a fixed number of variables, increasing numbers of samples).
  3. Choose a reference model (number of layers, number of neurons...).
  4. Choose a reference training strategy (loss index, optimization algorithm...).
  5. Choose a stopping criterion (loss goal, number of epochs, maximum time...).
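The steps above can be sketched in code. The following is a minimal, hypothetical harness: it assumes the platform exposes a training routine that raises `MemoryError` when the dataset no longer fits, and it grows a synthetic dataset until that happens. The function and parameter names are illustrative, not any platform's actual API.

```python
import numpy as np

def measure_data_capacity(train_fn, n_variables: int,
                          start: int = 100_000, step: int = 100_000,
                          max_samples: int = 1_000_000) -> int:
    """Return the largest number of samples (up to max_samples) that
    train_fn can process without running out of memory."""
    capacity = 0
    n = start
    while n <= max_samples:
        try:
            X = np.random.rand(n, n_variables)   # synthetic reference data
            y = np.random.rand(n)
            train_fn(X, y)                       # the platform's training routine
            capacity = n
        except MemoryError:
            break
        n += step
    return capacity

# Simulated platform that, like Platform A below, fails beyond 400,000 samples.
def fake_train(X, y):
    if len(X) > 400_000:
        raise MemoryError("simulated out-of-memory")

capacity = measure_data_capacity(fake_train, n_variables=10)
```

In practice, `train_fn` would run the full reference training strategy, since some platforms fail during training rather than during data loading.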

Note that a suite of datasets of increasing size is needed here, so that the largest dataset the platform can process can be located.

The following figure illustrates the result of a data capacity test with two platforms.

As we can see, Platform A can analyze up to 400,000 samples, while Platform B can analyze up to 600,000 samples. Therefore, we can say that the capacity of Platform B is 1.5 times the capacity of Platform A.

As a practical case, suppose our computer has 16 GB of RAM and our dataset has 500,000 samples. Platform A would throw a memory allocation error, while Platform B would train the model.

Training speed tests

One of the most critical factors in machine learning platforms is the time they need to train the models. Indeed, modeling big data sets is very expensive in computational terms.

Training machine learning models with big datasets can take several hours. Moreover, before deploying a model, it is usually necessary to train many candidate models to select the best-performing one. This can make it impractical to use some platforms for some applications.

The training speed of a machine learning platform depends on numerous factors, such as the optimization algorithm, the use of CPU or GPU parallelism, and the efficiency of the underlying implementation.

Training speed is usually measured as the number of samples per second that the platform processes during training.

To compare the training speed of machine learning platforms, we follow these steps:

  1. Choose a reference benchmark (data set, neural network, training strategy...).
  2. Choose a reference computer (CPU, GPU, RAM...).
  3. Compare the training speed.
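A minimal sketch of the measurement itself, assuming the platform exposes a routine that performs one training pass over the dataset (the toy gradient-descent "platform" below is purely illustrative):

```python
import time
import numpy as np

def measure_training_speed(train_fn, X, y, n_epochs: int) -> float:
    """Training throughput in samples per second over n_epochs passes."""
    start = time.perf_counter()
    for _ in range(n_epochs):
        train_fn(X, y)               # one pass over the whole dataset
    elapsed = time.perf_counter() - start
    return len(X) * n_epochs / elapsed

# Toy "platform": one full-batch gradient-descent step of linear regression.
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))
y = rng.standard_normal(10_000)
w = np.zeros(5)

def gd_step(X, y, learning_rate=0.01):
    global w
    w = w - learning_rate * X.T @ (X @ w - y) / len(X)

speed = measure_training_speed(gd_step, X, y, n_epochs=10)
```

For a fair comparison, every platform should be timed on the same reference computer, with the same dataset, model, and training strategy.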

The following figure illustrates the result of a training speed test with two platforms.

As we can see, the training speed of Platform A is 200,000 samples/second, while that of Platform B is 350,000 samples/second. Therefore, we can say that the training speed of Platform B is 1.75 times that of Platform A.

To illustrate that, consider a dataset with 1,000,000 training samples and an optimization algorithm that runs for 1,000 epochs. The training time on Platform A is 1:23:20 (hours:minutes:seconds), while on Platform B it is 0:47:37.
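These timings follow directly from the throughput figures; a few lines of arithmetic reproduce them (the speeds are the ones from the example above):

```python
def training_time_seconds(n_samples: int, n_epochs: int,
                          samples_per_second: float) -> float:
    """Total training time: every epoch processes the whole dataset once."""
    return n_samples * n_epochs / samples_per_second

def format_hms(seconds: float) -> str:
    """Format a duration as H:MM:SS, rounding to the nearest second."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

time_a = training_time_seconds(1_000_000, 1_000, 200_000)  # Platform A
time_b = training_time_seconds(1_000_000, 1_000, 350_000)  # Platform B

print(format_hms(time_a))  # 1:23:20
print(format_hms(time_b))  # 0:47:37
```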

Model precision tests

The main objective of machine learning is to develop models that attain high accuracy.

We can measure precision through the mean error of a model on a testing dataset: the lower the error, the higher the precision.

The precision of a machine learning platform depends on numerous factors, such as the model architecture, the training strategy, and the numerical methods that the platform implements.

We follow these steps to compare the precision of different machine learning platforms:

  1. Choose a reference computer (CPU, GPU, RAM...).
  2. Choose a reference dataset (number of variables and samples).
  3. Choose a reference model (number of layers, number of neurons...).
  4. Choose a reference training strategy (loss index, optimization algorithm...).
  5. Choose a stopping criterion (loss goal, number of epochs, maximum time...).
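The two metrics involved here, mean error and output correlation, can be sketched as follows. This is an illustrative implementation, assuming the trained model is available as a function mapping inputs to outputs:

```python
import numpy as np

def mean_error(model_fn, X_test, y_test) -> float:
    """Mean absolute error of the model's outputs on a testing dataset."""
    return float(np.mean(np.abs(model_fn(X_test) - y_test)))

def output_correlation(model_fn, X_test, y_test) -> float:
    """Linear correlation between the model's outputs and the targets."""
    return float(np.corrcoef(model_fn(X_test), y_test)[0, 1])

# A model that recovers the targets exactly should reach zero error
# and a correlation of 1.
X_test = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_test = 2.0 * X_test[:, 0]
perfect_model = lambda X: 2.0 * X[:, 0]

error = mean_error(perfect_model, X_test, y_test)
correlation = output_correlation(perfect_model, X_test, y_test)
```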

The reference dataset should, in principle, allow the model to reach zero error, so that any residual error can be attributed to the platform.

The following table illustrates the result of a precision test with two platforms:

As we can see, Platform A can build a model with a correlation of 0.8, while Platform B can build a model with a correlation of 0.9. Therefore, we can say that the precision of Platform B is 1.125 times that of Platform A.

Inference speed tests

In many applications, especially real-time ones, the response time of the model is a critical factor. Indeed, an inference time of even a few milliseconds can make a model impractical.

The inference speed can be defined as the rate at which the model calculates its outputs from given inputs. We measure this metric in samples per second.

The inference speed of a machine learning platform depends on numerous factors, such as the size of the model, the hardware it runs on, and the efficiency of the implementation.

We follow these steps to compare the inference speed of machine learning platforms:

  1. Choose a reference computer (CPU, GPU, RAM...).
  2. Choose a reference input set (number of variables and samples).
  3. Choose a reference model (number of layers, number of neurons...).
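The measurement itself amounts to timing one batched prediction over the reference input set. A minimal sketch, assuming the model is available as a prediction function (the single-layer toy model below is an assumption for illustration):

```python
import time
import numpy as np

def measure_inference_speed(predict_fn, X) -> float:
    """Inference throughput: outputs computed per second for input set X."""
    start = time.perf_counter()
    predict_fn(X)
    elapsed = time.perf_counter() - start
    return len(X) / elapsed

# Toy model: a single dense layer with a tanh activation.
rng = np.random.default_rng(0)
weights = rng.standard_normal((10, 1))
X = rng.standard_normal((100_000, 10))

def predict(X):
    return np.tanh(X @ weights)

speed = measure_inference_speed(predict, X)
```

For stable figures, it is advisable to run a warm-up prediction first and average the timing over several repetitions.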

The following figure illustrates the result of an inference speed test with two platforms.

As we can see, in one second, Platforms A and B can calculate the outputs for 700,000 and 900,000 inputs, respectively. Therefore, we can say that the inference speed of Platform B is about 1.29 times that of Platform A.

Conclusions

This post defines the most important KPIs for machine learning platforms.

It also describes the most relevant factors that might affect those key performance indicators.

Finally, it describes how to design and measure performance tests for data capacity, training speed, model precision, and inference speed.

The machine learning platform Neural Designer implements high-performance techniques so that you can get maximum productivity.

You can download Neural Designer now and try it for free.
