{"id":3417,"date":"2025-10-28T10:59:21","date_gmt":"2025-10-28T09:59:21","guid":{"rendered":"https:\/\/neuraldesigner.com\/blog\/samples\/"},"modified":"2025-11-27T14:55:44","modified_gmt":"2025-11-27T13:55:44","slug":"samples","status":"publish","type":"blog","link":"https:\/\/www.neuraldesigner.com\/blog\/samples\/","title":{"rendered":"Training, validation and testing samples in machine learning"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"3417\" class=\"elementor elementor-3417\" data-elementor-post-type=\"blog\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-40f764d0 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"40f764d0\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-759a008c\" data-id=\"759a008c\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-46d4388e elementor-widget elementor-widget-text-editor\" data-id=\"46d4388e\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t\t\t\t\t\t<h2>Introduction<\/h2>\n<p>In machine learning, each sample records a value for every variable in the dataset, and the samples are divided into training, selection (validation), and testing subsets.<\/p>\n<section><p>A sample contains one or more features and possibly a label. Samples can be labeled or unlabeled.<\/p>\n<p>Labeled samples have a label that describes or characterizes them, and unlabeled samples have no associated label.<\/p>\n<p>The samples are the rows of the data matrix.<\/p>\n<p>Let $p$ and $q$ be the number of rows and columns, respectively, in the data matrix $d$.<\/p>\n<p>A sample is a vector $u \\in \\mathbb{R}^{q}$. 
In this regard, the data matrix contains $p$ samples,<\/p>\n<p>\\begin{eqnarray}<br>u_{i}:=row_{i}(d), \\quad i=1,\\ldots,p.<br>\\end{eqnarray}<\/p>\n<p>Designing a neural network to memorize a data set is not helpful.<\/p>\n<p>Instead, we want the neural network to perform accurately on new data, that is, to generalize.<\/p>\n<p>To achieve that, we divide the data set into three different subsets:<\/p>\n<ul>\n<li>The first subset is the training set. We use it for constructing different candidate models.<\/li>\n<li>The second subset is the selection set, used to select the model exhibiting the best properties.<\/li>\n<li>Finally, the third subset is the testing set used to validate the final model.<\/li>\n<\/ul>\n<p>The following figure illustrates the potential applications of a sample within a dataset.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.neuraldesigner.com\/images\/sample_uses.webp\" alt=\"outline\" width=\"900\"><\/p>\n<p>Usually, 60% of the samples are used for training, 20% for selection, and 20% for testing.<\/p>\n<p>The sample splitting might be performed in sequential order or randomly.<\/p>\n<p>Next, we provide a detailed description of the use of the training, selection, and testing samples.<\/p>\n<h3>Contents<\/h3>\n<section>\n<ol style=\"font-size: 20px;\">\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#TrainingSamples\">Training Samples<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#SelectionSamples\">Selection samples<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#Testingsamples\">Testing samples<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a 
href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#Unusedsamples\">Unused samples<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#Conclusions\">Conclusions<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#TutorialVideo\">Tutorial video<\/a>.<\/li>\n<li style=\"font-size: 20px;\"><a href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/?elementor-preview=3417&amp;ver=1703156003#References\">References<\/a>.<\/li>\n<\/ol>\n<\/section>\n<\/section>\n<section id=\"Trainingsamples\">\n<h2>2. Training samples<\/h2>\n<p>Training samples are the subset of the data used to fit the model, allowing it to learn the features and patterns of the data.<\/p>\n<p>In each epoch, the full training set is fed through the neural network, and the model continues to adjust its parameters to those patterns.<\/p>\n<p>The training set should cover a diverse range of inputs so that the model sees as many scenarios as possible and can make reasonable predictions for unseen samples.<\/p>\n<\/section>\n<section id=\"Selectionsamples\">\n<h2>3. Selection samples<\/h2>\n<p>The selection, or validation, set is the subset of the data used to evaluate the performance of our model during training.<\/p>\n<p>This evaluation provides information that helps us tune the model&#8217;s hyperparameters and configuration.<\/p>\n<p>The model is trained on the training set, and its performance is evaluated on the selection set after each epoch.<\/p>\n<p>We use the selection samples to choose the neural network with the best generalization properties.<\/p>\n<\/section>\n<section id=\"Testingsamples\">\n<h2>4. 
Testing samples<\/h2>\n<p>The test set is the subset of the data used to assess how the model performs once training is complete.<\/p>\n<p>Because these samples are used neither for training nor for selection, they provide an unbiased estimate of model performance in terms of metrics such as accuracy or precision.<\/p>\n<\/section>\n<section id=\"Unusedsamples\">\n<h2>5. Unused samples<\/h2>\n<p>Some samples might distort the model instead of providing helpful information. For example, outliers in the data can cause the neural network to perform poorly. To fix those problems, we set such samples to $unused$. We can also set repeated samples to $unused$ since they provide only redundant information to the model.<\/p>\n<p>In this regard, we can record the use of every sample in a vector,<\/p>\n<p>\\begin{eqnarray}<br>sample\\_use \\in \\{ training, selection, testing, unused \\}^{p}.<br>\\end{eqnarray}<\/p>\n<p>The size of this vector is $p$, the number of samples in the data set.<\/p>\n<p>Note that a sample cannot have two uses at the same time. For example, we cannot use the same sample both to train and to test a model.<\/p>\n<p>When designing a model, we usually test different configurations, i.e., we build candidate models with different architectures and compare their performance.<\/p>\n<p>The first step is to train the candidate models with the training samples.<\/p>\n<p>We then select the one that performs best with the selection samples.<\/p>\n<p>Finally, we test its capabilities with the test samples.<\/p>\n<h3>Example: Car price assignment<\/h3>\n<p>An automotive company wants to understand the factors that affect the price of cars in the market.<\/p>\n<p>For this study, we gathered an extensive data set of different types of cars on the market to create the model.<\/p>\n<p>The data set has 205 samples. 
Sixty percent of the samples are used for training, 20% for selection, and 20% for testing, that is, 123, 41, and 41 samples, respectively.<\/p>\n<p>The first step is to build several models and train them with the training samples.<\/p>\n<p>For the first model, we use all 25 variables in the dataset. The following table shows the training error and selection error.<\/p>\n<div style=\"overflow-x: auto;\">\n<table>\n<tbody>\n<tr>\n<th>Training error<\/th>\n<th>Selection error<\/th>\n<\/tr>\n<tr>\n<td style=\"text-align: right;\">0.0006<\/td>\n<td style=\"text-align: right;\">0.1630<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>For the second model, we use the 10 variables that have the highest correlation with the target variable. The following table shows the training error and selection error.<\/p>\n<div style=\"overflow-x: auto;\">\n<table>\n<tbody>\n<tr>\n<th>Training error<\/th>\n<th>Selection error<\/th>\n<\/tr>\n<tr>\n<td style=\"text-align: right;\">0.0296<\/td>\n<td style=\"text-align: right;\">0.1760<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>We then select the one that performs best with the selection samples.<\/p>\n<p>The selection error measures how well each model performs on samples it was not trained on.<\/p>\n<p>The best-performing model is the first because its selection error is smaller (0.1630 versus 0.1760).<\/p>\n<p>Finally, we test the chosen model&#8217;s capabilities using the test samples.<\/p>\n<\/section>\n<section id=\"Conclusions\">\n<h2>Conclusions<\/h2>\n<p>The samples are the rows of the data matrix and are divided into training, selection, and test samples.<\/p>\n<p>We train the candidate models with the training samples. 
We then select the best-performing model using the selection samples and, finally, test its capabilities with the test samples.<\/p>\n<\/section>\n<section id=\"Tutorialvideo\">\n<h2>Tutorial video<\/h2>\n<p>You can watch the video tutorial to help you complete this article.<\/p>\n<p><iframe title=\"YouTube video player\" src=\"https:\/\/www.youtube.com\/embed\/YYulr5S4dQQ\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<\/section>\n<section id=\"References\">\n<h2>References<\/h2>\n<ul>\n<li>Kaggle Machine Learning Repository. <a href=\"https:\/\/www.kaggle.com\/hellbuoy\/car-price-prediction\">Car Price Assignment Data Set<\/a>.<\/li>\n<\/ul>\n<h2>Related posts<\/h2>\n<\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"author":10,"featured_media":1590,"template":"","categories":[],"tags":[36],"class_list":["post-3417","blog","type-blog","status-publish","has-post-thumbnail","hentry","tag-tutorials"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Training, validation and testing samples in machine learning<\/title>\n<meta name=\"description\" content=\"In machine learning, samples measure all variables in the data set and are divided into training, selection, and test samples.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Training, validation and testing samples in machine learning\" \/>\n<meta property=\"og:description\" content=\"In machine learning, samples measure all variables in the data set and are divided into 
training, selection, and test samples.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.neuraldesigner.com\/blog\/samples\/\" \/>\n<meta property=\"og:site_name\" content=\"Neural Designer\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-27T13:55:44+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1586\" \/>\n\t<meta property=\"og:image:height\" content=\"480\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@NeuralDesigner\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/\",\"url\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/\",\"name\":\"Training, validation and testing samples in machine learning\",\"isPartOf\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp\",\"datePublished\":\"2025-10-28T09:59:21+00:00\",\"dateModified\":\"2025-11-27T13:55:44+00:00\",\"description\":\"In machine learning, samples measure all variables in the data set and are divided into training, selection, and test 
samples.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.neuraldesigner.com\/blog\/samples\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage\",\"url\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp\",\"contentUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp\",\"width\":1586,\"height\":480},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/samples\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.neuraldesigner.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blog\",\"item\":\"https:\/\/www.neuraldesigner.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Training, validation and testing samples in machine learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#website\",\"url\":\"https:\/\/www.neuraldesigner.com\/\",\"name\":\"Neural Designer\",\"description\":\"Explanable AI Platform\",\"publisher\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.neuraldesigner.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#organization\",\"name\":\"Neural 
Designer\",\"url\":\"https:\/\/www.neuraldesigner.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png\",\"contentUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png\",\"width\":1024,\"height\":223,\"caption\":\"Neural Designer\"},\"image\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/NeuralDesigner\",\"https:\/\/es.linkedin.com\/showcase\/neuraldesigner\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Training, validation and testing samples in machine learning","description":"In machine learning, samples measure all variables in the data set and are divided into training, selection, and test samples.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.neuraldesigner.com\/blog\/samples\/","og_locale":"en_US","og_type":"article","og_title":"Training, validation and testing samples in machine learning","og_description":"In machine learning, samples measure all variables in the data set and are divided into training, selection, and test samples.","og_url":"https:\/\/www.neuraldesigner.com\/blog\/samples\/","og_site_name":"Neural Designer","article_modified_time":"2025-11-27T13:55:44+00:00","og_image":[{"width":1586,"height":480,"url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp","type":"image\/webp"}],"twitter_card":"summary_large_image","twitter_site":"@NeuralDesigner","twitter_misc":{"Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/","url":"https:\/\/www.neuraldesigner.com\/blog\/samples\/","name":"Training, validation and testing samples in machine learning","isPartOf":{"@id":"https:\/\/www.neuraldesigner.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage"},"image":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage"},"thumbnailUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp","datePublished":"2025-10-28T09:59:21+00:00","dateModified":"2025-11-27T13:55:44+00:00","description":"In machine learning, samples measure all variables in the data set and are divided into training, selection, and test samples.","breadcrumb":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.neuraldesigner.com\/blog\/samples\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/#primaryimage","url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp","contentUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/sample_uses.webp","width":1586,"height":480},{"@type":"BreadcrumbList","@id":"https:\/\/www.neuraldesigner.com\/blog\/samples\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.neuraldesigner.com\/"},{"@type":"ListItem","position":2,"name":"Blog","item":"https:\/\/www.neuraldesigner.com\/blog\/"},{"@type":"ListItem","position":3,"name":"Training, validation and testing samples in machine learning"}]},{"@type":"WebSite","@id":"https:\/\/www.neuraldesigner.com\/#website","url":"https:\/\/www.neuraldesigner.com\/","name":"Neural Designer","description":"Explanable AI 
Platform","publisher":{"@id":"https:\/\/www.neuraldesigner.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.neuraldesigner.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.neuraldesigner.com\/#organization","name":"Neural Designer","url":"https:\/\/www.neuraldesigner.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png","contentUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png","width":1024,"height":223,"caption":"Neural Designer"},"image":{"@id":"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/NeuralDesigner","https:\/\/es.linkedin.com\/showcase\/neuraldesigner\/"]}]}},"_links":{"self":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog\/3417","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/users\/10"}],"version-history":[{"count":1,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog\/3417\/revisions"}],"predecessor-version":[{"id":21399,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog\/3417\/revisions\/21399"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/media\/1590"}],"wp:attachment":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/media?parent=3417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/categories?post=3417"},{"taxono
my":"post_tag","embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/tags?post=3417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}