Statistical inference on a very small dataset
I have been working with machine learning for about a year now, but mostly with large datasets. However, I am currently working on a problem with a very small dataset. Here is my problem: I am creating a rocket fuel with 4 ingredients, x1, x2, x3, x4, and I want to maximize reaction strength, y. I have already mixed them in the arrangements below to get the corresponding values.




  1. (0.9)x1 + (0.0)x2 + (0.1)x3 + (0.0)x4 = 16.5

  2. (0.0)x1 + (0.9)x2 + (0.1)x3 + (0.0)x4 = 8.6

  3. (0.45)x1 + (0.45)x2 + (0.0)x3 + (0.1)x4 = 12.6

  4. (0.6)x1 + (0.3)x2 + (0.05)x3 + (0.05)x4 = 18.9

  5. (0.3)x1 + (0.9)x2 + (0.05)x3 + (0.05)x4 = 9.8



My question is: how should I design my next few mixtures to maximize the reaction strength? Can you suggest any algorithms or statistical frameworks to get me started? Much appreciated.

predictive-modeling statistics bayesian

asked Mar 15 '18 at 17:10 by mnalavadi

  • Bayesian linear regression? – ncasas, Mar 15 '18 at 17:14

  • Do you have reason to believe the relationship is linear? If so, @ncasas's idea is good. Otherwise, read about active learning. Welcome to the site. – Emre, Mar 15 '18 at 17:37

  • Thanks for the tips. To clarify, would I need something like a multiple Bayesian regression, since I am regressing on multiple variables? And is there a tool (Python library?) you recommend to implement a solution? – mnalavadi, Mar 15 '18 at 19:00

  • Are your ingredients single chemicals or compounds? – FirefoxMetzger, Mar 15 '18 at 19:21
2 Answers
There are two separate issues:




  1. Sampling - Picking the optimal ingredient levels for the next experiment to run. Given that you only have 4 explanatory variables, just plot them, either all pairwise or in a couple of 3D charts, with the outcome variable on the y or z axis. You'll then see the trend in the data and can decide whether to get more data for interpolation (between the data points you already have) or extrapolation (outside the current range). There are frameworks, such as Bayesian Optimization, but they may be more machinery than is warranted given the small dimensionality. (A minimal plotting sketch follows below.)


  2. Inference - Predicting performance for new data. Given the data you have seen thus far (the sample data), estimate parameters. In your example that would be estimating the contribution of each of the 4 ingredients, either individually or through interactions. Those parameters could be scalar coefficients or distributions. (A regression sketch follows below as well.)
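
To make the plotting suggestion concrete, here is a minimal sketch in Python (my illustration, assuming matplotlib and numpy are available; run 5 as posted sums to 1.3, so the sketch takes x2 = 0.6 as the intended value):

    import matplotlib.pyplot as plt
    import numpy as np

    # The five mixtures from the question (run 5 assumes x2 = 0.6 was intended).
    X = np.array([[0.90, 0.00, 0.10, 0.00],
                  [0.00, 0.90, 0.10, 0.00],
                  [0.45, 0.45, 0.00, 0.10],
                  [0.60, 0.30, 0.05, 0.05],
                  [0.30, 0.60, 0.05, 0.05]])
    y = np.array([16.5, 8.6, 12.6, 18.9, 9.8])

    # One scatter panel per ingredient: outcome vs. that ingredient's fraction.
    fig, axes = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
    for i, ax in enumerate(axes):
        ax.scatter(X[:, i], y)
        ax.set_xlabel("x%d fraction" % (i + 1))
    axes[0].set_ylabel("reaction strength y")
    plt.tight_layout()
    plt.show()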

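For the inference side, and since the comments ask about a Python library, one hedged option (my choice, not a prescribed tool) is scikit-learn's BayesianRidge. Because the fractions sum to 1, the columns are collinear with the intercept, so the coefficients are only identified up to the ridge prior, but the predictive uncertainty is still informative:

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    X = np.array([[0.90, 0.00, 0.10, 0.00],
                  [0.00, 0.90, 0.10, 0.00],
                  [0.45, 0.45, 0.00, 0.10],
                  [0.60, 0.30, 0.05, 0.05],
                  [0.30, 0.60, 0.05, 0.05]])
    y = np.array([16.5, 8.6, 12.6, 18.9, 9.8])

    model = BayesianRidge().fit(X, y)
    print(model.coef_)  # posterior-mean contribution of each ingredient

    # Predictions come with an uncertainty estimate, valuable with only 5 samples.
    mean, std = model.predict(np.array([[0.5, 0.3, 0.1, 0.1]]), return_std=True)
    print(mean, std)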






answered Mar 15 '18 at 19:48 – Brian Spiering

This is a perfect problem for active learning. Methods based on Bayesian Optimization are particularly powerful for optimizing black-box functions that are expensive to evaluate (e.g., running an experiment in the lab). There are a few BO packages out there which may be of interest; Martin Krasser's blog has a nice overview.



I noticed that the features in your last experiment don't add up to 1, which I am assuming is a typo. For the demo I changed that entry to x2 = 0.6.



Here is a sample I threw together in Python using GPyOpt, a Gaussian-process-based package:



    import numpy as np
    import GPyOpt

    # The five experiments run so far (run 5 entered with x2 = 0.6 so the
    # fractions sum to 1).
    x_init = np.array([[0.9, 0.0, 0.1, 0.0],
                       [0.0, 0.9, 0.1, 0.0],
                       [0.45, 0.45, 0.0, 0.1],
                       [0.6, 0.3, 0.05, 0.05],
                       [0.3, 0.6, 0.05, 0.05]])

    # Measured reaction strength for each mixture.
    y_init = np.array([[16.5], [8.6], [12.6], [18.9], [9.8]])

    # Each ingredient fraction may range over [0, 1].
    domain = [{'name': 'x1', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x2', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x3', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x4', 'type': 'continuous', 'domain': (0, 1.0)}]

    # GPyOpt treats a constraint as satisfied when its expression is <= 0,
    # so these two together pin the sum of the fractions to ~1.
    constraints = [
        {'name': 'const_1', 'constraint': '(x[:,0] + x[:,1] + x[:,2] + x[:,3]) - 1 - 0.001'},
        {'name': 'const_2', 'constraint': '1 - (x[:,0] + x[:,1] + x[:,2] + x[:,3]) - 0.001'}]

    # f=None because the objective (a lab experiment) cannot be called in code;
    # we only ask for the next suggested mixture.
    bo_step = GPyOpt.methods.BayesianOptimization(f=None,
                                                  domain=domain,
                                                  constraints=constraints,
                                                  X=x_init,
                                                  Y=y_init,
                                                  maximize=True)

    x_next = bo_step.suggest_next_locations()

    print(x_next)          # suggested next experiment
    print(np.sum(x_next))  # sanity check: should be ~1


Note: GPyOpt only accepts constraints in a certain form (an expression that must be non-positive when satisfied); that's why there are two of them, which together constrain the sum of the fractions to the interval [0.999, 1.001].



This example suggests that your next experiment should be run at:



    x1 = 0.04
    x2 = 0.78
    x3 = 0.00
    x4 = 0.18



BO algorithms can be tuned to give different results based on your preferences for exploiting existing information versus exploring new areas of the space. I'm not sure what GPyOpt's default settings are, so if you are interested it could be worth looking at the documentation. (A hedged example of one such setting is sketched below.)
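
For instance, here is a sketch of one such knob, assuming GPyOpt's acquisition_type and acquisition_weight constructor arguments (verify them against the documentation for your version):

    # Reusing domain/constraints/x_init/y_init from the code above. LCB
    # (lower confidence bound) with a larger weight favours exploring
    # uncertain regions over exploiting the current best mixtures.
    bo_explore = GPyOpt.methods.BayesianOptimization(
        f=None,
        domain=domain,
        constraints=constraints,
        X=x_init,
        Y=y_init,
        maximize=True,
        acquisition_type='LCB',
        acquisition_weight=2.0,
    )
    print(bo_explore.suggest_next_locations())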






answered 5 hours ago – b-shields (new contributor)