{"id":3409,"date":"2023-08-31T10:59:21","date_gmt":"2023-08-31T10:59:21","guid":{"rendered":"https:\/\/neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/"},"modified":"2025-09-15T15:28:21","modified_gmt":"2025-09-15T13:28:21","slug":"octanol_water_partition_coeff_prediction","status":"publish","type":"blog","link":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/","title":{"rendered":"Assess a chemical&#8217;s partition coefficient using machine learning"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"3409\" class=\"elementor elementor-3409\" data-elementor-post-type=\"blog\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-d68adb8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"d68adb8\" data-element_type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-1e2b34d8\" data-id=\"1e2b34d8\" data-element_type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-58b6559d elementor-widget elementor-widget-text-editor\" data-id=\"58b6559d\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t\t\t\t\t\t<p>In this example, we will build a machine learning model to assess the n-octanol-water partition coefficient.<\/p>\n<p>The n-octanol-water partition coefficient is a partition coefficient for the two-phase system consisting of n-octanol and water. 
It measures how a substance distributes between the two phases, which reflects its relative solubility in each.<\/p>\n<section>\n<p>The original data used in this example was downloaded from the <a href=\"https:\/\/www.fda.gov\/drugs\/development-approval-process-drugs\/drug-approvals-and-databases\" target=\"_blank\" rel=\"noopener\">FDA<\/a> website.<\/p>\n<ul>\n<li><a href=\"#introduction\"> 1. Introduction<\/a><\/li>\n<li><a href=\"#mandm\"> 2. Materials and methods<\/a>:\n<ul>\n<li><a href=\"#dataset\"> 2.1. Dataset<\/a><\/li>\n<li><a href=\"#modeldel\"> 2.2. Model development<\/a><\/li>\n<\/ul>\n<\/li>\n<li><a href=\"#results\"> 3. Results<\/a><\/li>\n<\/ul>\n<p>We will use the free version of <a href=\"https:\/\/www.neuraldesigner.com\">Neural Designer<\/a> to build this model. You can download a free trial <a href=\"https:\/\/www.neuraldesigner.com\/free-trial\">here<\/a>.<\/p>\n<\/section>\n<section>\n<h2>Introduction<\/h2>\n<p>The n-octanol-water partition coefficient, K<sub>ow<\/sub>, usually reported as its logarithm logK<sub>ow<\/sub>, measures the relationship between a substance&#8217;s fat solubility (lipophilicity) and water solubility (hydrophilicity). A substance is more soluble in fat-like solvents such as n-octanol if K<sub>ow<\/sub> exceeds one, that is, if logK<sub>ow<\/sub> is positive. On the other hand, if K<sub>ow<\/sub> is less than one (negative logK<sub>ow<\/sub>), it is more soluble in water.<\/p>\n<p><!-- IMAGE HERE --><br><img decoding=\"async\" style=\"width: 40%; height: 40%;\" src=\"https:\/\/www.neuraldesigner.com\/images\/octanol_water.webp\"><\/p>\n<p>This value is used, among other applications, to assess the environmental fate of persistent organic pollutants. 
Compounds with high coefficients (logK<sub>ow<\/sub> values greater than 5) tend to accumulate in the fatty tissue of organisms (bioaccumulation).<\/p>\n<p>This value is also important in drug research because it offers a reliable estimate of how a substance distributes within a cell, between membranes (lipophilic) and the cytosol (hydrophilic).<\/p>\n<p>This value is not measurable for all substances, so a good predictive model is useful in the development of drugs for treating diseases.<\/p>\n<\/section>\n<section>\n<h2>Materials and methods<\/h2>\n<h3>Dataset<\/h3>\n<p><!-- Explanation of what the dataset contains --><\/p>\n<p>This dataset contains physicochemical properties for 16,523 chemical compounds. <a href=\"https:\/\/pubchem.ncbi.nlm.nih.gov\">PubChem<\/a>, a website detailing chemical compounds and storing their properties, is the source of these properties. Each property was determined either experimentally in the laboratory or computationally via software.<\/p>\n<p>We downloaded the data from the <a href=\"https:\/\/www.fda.gov\/drugs\/development-approval-process-drugs\/drug-approvals-and-databases\" target=\"_blank\" rel=\"noopener\">FDA<\/a>, and the raw files were processed into tabular format. We also used the PubChem API to retrieve the physicochemical properties of all the compounds PubChem has records of.<\/p>\n<p>The final merged data has 16,523 rows, each corresponding to a chemical compound. Each compound has data for 34 physicochemical properties, including the xlogp. 
This is the value we are going to predict.<\/p>\n<p>Experts calculated this variable using various computational methods and confirmed its accuracy through experimental procedures.<\/p>\n<h3>Model development<\/h3>\n<p>We will build an approximation model, since the target (xlogp) is a continuous variable.<\/p>\n<p>As the loss index, we will use the normalized squared error with an L2 regularization term. As for the optimization algorithm, we will use the Quasi-Newton method.<\/p>\n<p>Also, we will only use variables that we can calculate or infer from the chemical formula of the compound, that is, 9 of the 34 properties retrieved from PubChem:<\/p>\n<ul>\n<li><b>MolecularWeight<\/b>: mass of a molecule. It is calculated as the sum of the mass of each constituent atom multiplied by the number of atoms of that element in the molecular formula.<\/li>\n<li><b>HeavyAtomCount<\/b>: number of non-hydrogen atoms in the chemical structure.<\/li>\n<li><b>Complexity<\/b>: rough estimate of how complicated a structure is, in terms of both the elements it contains and its structural features, including symmetry. 
This complexity rating is computed using the Bertz\/Hendrickson\/Ihlenfeldt formula.<\/li>\n<li><b>BondStereoCount<\/b>: total number of bonds with planar (sp2) stereo [e.g., (E)- or (Z)-configuration].<\/li>\n<li><b>DefinedAtomStereoCount<\/b>: number of atoms with defined tetrahedral (sp3) stereo.<\/li>\n<li><b>UndefinedAtomStereoCount<\/b>: number of atoms with undefined tetrahedral (sp3) stereo.<\/li>\n<li><b>HBondAcceptorCount<\/b>: the number of hydrogen bond acceptors in the structure.<\/li>\n<li><b>HBondDonorCount<\/b>: the number of hydrogen bond donors in the structure.<\/li>\n<\/ul>\n<p>To use these properties as input variables, the network starts with a scaling layer that has one neuron per input.<\/p>\n<p>Next comes a perceptron layer with ten neurons that uses the hyperbolic tangent as its activation function, followed by a second perceptron layer with a linear activation function. We also include unscaling and bounding layers, each with one neuron.<\/p>\n<p>We keep the number of neurons small to obtain better model regularization.<\/p>\n<p>Finally, the single output of the network gives us the predicted xlogp value.<\/p>\n<p><!-- network plot --><br><img decoding=\"async\" style=\"width: 50%; height: 50%;\" src=\"https:\/\/www.neuraldesigner.com\/images\/octanol_water_network.webp\"><\/p>\n<p>The image above shows the architecture of the model we will train in the next steps, with all the layers described previously.<\/p>\n<\/section>\n<section>\n<h2>Results<\/h2>\n<p>We have built a model for predicting the xlogp of chemical compounds. 
This model gives us the estimated value of this coefficient for a compound.<\/p>\n<p>For our trained model, the <b>training error<\/b> is <b>0.308775<\/b>, and the <b>selection error<\/b> is <b>0.337858<\/b>.<\/p>\n<p><!-- errors plot --><br><img decoding=\"async\" src=\"https:\/\/www.neuraldesigner.com\/images\/octanol_water_errors.webp\"><\/p>\n<p>Next, we will use the goodness of fit of our model to describe how well it fits a set of observations. The value we will look at is the R<sup>2<\/sup>, which describes the fraction of the variability in the data that our model explains. The larger this value, the better the fit.<\/p>\n<p><!-- goodness of fit plot --><br><img decoding=\"async\" src=\"https:\/\/www.neuraldesigner.com\/images\/octanol_water_gofit.webp\"><\/p>\n<p>The image above shows how well the model predicts our data. In the ideal case, all the points would lie on the black line, meaning that each predicted value equals the real value. There is some scatter, but most of our data points cluster near the line. 
Our model has an R<sup>2<\/sup> of 0.64, which means we can explain 64% of the variability in the data.<\/p>\n<p>With this, we can conclude that the model provides a useful, though approximate, estimate of the xlogp of a compound.<\/p>\n<\/section>\n<section>\n<h2>References<\/h2>\n<ul>\n<li>Image adapted from: <a href=\"https:\/\/doi.org\/10.1021\/acsomega.7b01102\">ACS Omega 2017, 2, 9, 6244-6249 September 28, 2017<\/a><\/li>\n<\/ul>\n<\/section>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"author":19,"featured_media":1788,"template":"","categories":[],"tags":[40],"class_list":["post-3409","blog","type-blog","status-publish","has-post-thumbnail","hentry","tag-chemistry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Assess a chemical&#039;s partition coefficient using machine learning<\/title>\n<meta name=\"description\" content=\"Build a machine learning model to assess the n-octanol-water partition coefficient, used to measure the solubility of substances.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Octanol-water partition coefficient assessment\" \/>\n<meta property=\"og:description\" content=\"Octanol-water partition coefficient assessment from chemical compound properties.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/\" \/>\n<meta property=\"og:site_name\" content=\"Neural Designer\" \/>\n<meta property=\"article:modified_time\" 
content=\"2025-09-15T13:28:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"750\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Octanol-water partition coefficient assessment\" \/>\n<meta name=\"twitter:description\" content=\"Octanol-water partition coefficient assessment from chemcical compound properties.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp\" \/>\n<meta name=\"twitter:site\" content=\"@NeuralDesigner\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/\",\"url\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/\",\"name\":\"Assess a chemical's partition coefficient using machine learning\",\"isPartOf\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp\",\"datePublished\":\"2023-08-31T10:59:21+00:00\",\"dateModified\":\"2025-09-15T13:28:21+00:00\",\"description\":\"Build a machine learning model to assess the The n-octanol-water partition 
coefficient, used to measure the solubility of substances.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage\",\"url\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp\",\"contentUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp\",\"width\":1000,\"height\":750},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.neuraldesigner.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Blog\",\"item\":\"https:\/\/www.neuraldesigner.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Assess a chemical&#8217;s partition coefficient using machine learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#website\",\"url\":\"https:\/\/www.neuraldesigner.com\/\",\"name\":\"Neural Designer\",\"description\":\"Explanable AI Platform\",\"publisher\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.neuraldesigner.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#organization\",\"name\":\"Neural 
Designer\",\"url\":\"https:\/\/www.neuraldesigner.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png\",\"contentUrl\":\"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png\",\"width\":1024,\"height\":223,\"caption\":\"Neural Designer\"},\"image\":{\"@id\":\"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/NeuralDesigner\",\"https:\/\/es.linkedin.com\/showcase\/neuraldesigner\/\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Assess a chemical's partition coefficient using machine learning","description":"Build a machine learning model to assess the The n-octanol-water partition coefficient, used to measure the solubility of substances.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/","og_locale":"en_US","og_type":"article","og_title":"Octanol-water partition coefficient assessment","og_description":"Octanol-water partition coefficient assessment from chemcical compound properties.","og_url":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/","og_site_name":"Neural Designer","article_modified_time":"2025-09-15T13:28:21+00:00","og_image":[{"width":1000,"height":750,"url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp","type":"image\/webp"}],"twitter_card":"summary_large_image","twitter_title":"Octanol-water partition coefficient assessment","twitter_description":"Octanol-water partition coefficient assessment from chemcical compound 
properties.","twitter_image":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp","twitter_site":"@NeuralDesigner","twitter_misc":{"Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/","url":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/","name":"Assess a chemical's partition coefficient using machine learning","isPartOf":{"@id":"https:\/\/www.neuraldesigner.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage"},"image":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage"},"thumbnailUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp","datePublished":"2023-08-31T10:59:21+00:00","dateModified":"2025-09-15T13:28:21+00:00","description":"Build a machine learning model to assess the The n-octanol-water partition coefficient, used to measure the solubility of 
substances.","breadcrumb":{"@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#primaryimage","url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp","contentUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/06\/octanol_water_chem.webp","width":1000,"height":750},{"@type":"BreadcrumbList","@id":"https:\/\/www.neuraldesigner.com\/blog\/octanol_water_partition_coeff_prediction\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.neuraldesigner.com\/"},{"@type":"ListItem","position":2,"name":"Blog","item":"https:\/\/www.neuraldesigner.com\/blog\/"},{"@type":"ListItem","position":3,"name":"Assess a chemical&#8217;s partition coefficient using machine learning"}]},{"@type":"WebSite","@id":"https:\/\/www.neuraldesigner.com\/#website","url":"https:\/\/www.neuraldesigner.com\/","name":"Neural Designer","description":"Explanable AI Platform","publisher":{"@id":"https:\/\/www.neuraldesigner.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.neuraldesigner.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.neuraldesigner.com\/#organization","name":"Neural 
Designer","url":"https:\/\/www.neuraldesigner.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png","contentUrl":"https:\/\/www.neuraldesigner.com\/wp-content\/uploads\/2023\/05\/logo-neural-1.png","width":1024,"height":223,"caption":"Neural Designer"},"image":{"@id":"https:\/\/www.neuraldesigner.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/NeuralDesigner","https:\/\/es.linkedin.com\/showcase\/neuraldesigner\/"]}]}},"_links":{"self":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog\/3409","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog"}],"about":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/types\/blog"}],"author":[{"embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/users\/19"}],"version-history":[{"count":0,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/blog\/3409\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/media\/1788"}],"wp:attachment":[{"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/media?parent=3409"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/categories?post=3409"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.neuraldesigner.com\/api\/wp\/v2\/tags?post=3409"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}