Statistical inference on a very small dataset
I have been working with machine learning for about a year now, but mostly with large datasets. However, I am currently working on a problem with a very small dataset. Here is my problem: I am creating a rocket fuel with 4 ingredients, x1, x2, x3, x4, and I want to maximize reaction strength, y. I have already mixed them in the arrangements below to get the corresponding values.




  1. (0.9)x1 + (0.0)x2 + (0.1)x3 + (0.0)x4 = 16.5

  2. (0.0)x1 + (0.9)x2 + (0.1)x3 + (0.0)x4 = 8.6

  3. (0.45)x1 + (0.45)x2 + (0.0)x3 + (0.1)x4 = 12.6

  4. (0.6)x1 + (0.3)x2 + (0.05)x3 + (0.05)x4 = 18.9

  5. (0.3)x1 + (0.9)x2 + (0.05)x3 + (0.05)x4 = 9.8



My question is: how should I design my next few mixtures to maximize the reaction strength? Can you suggest any algorithms or statistical frameworks to get me started? Much appreciated.

predictive-modeling statistics bayesian

asked Mar 15 '18 at 17:10 by mnalavadi

  • Bayesian linear regression? – ncasas, Mar 15 '18 at 17:14

  • Do you have reason to believe the relationship is linear? If so, @ncasas's idea is good. Otherwise, read about active learning. Welcome to the site. – Emre, Mar 15 '18 at 17:37

  • Thanks for the tips. To clarify, would I need something like a multiple Bayesian regression, since I am regressing on multiple variables? And is there a tool (Python library?) you recommend to implement a solution? – mnalavadi, Mar 15 '18 at 19:00

  • Are your ingredients single chemicals or compounds? – FirefoxMetzger, Mar 15 '18 at 19:21
2 Answers
There are two separate issues:




  1. Sampling - Picking the optimal ingredient levels for the next experiment to run. Given that you only have 4 explanatory variables, just plot them, either all pairwise or in a couple of 3D charts, with the outcome variable on the y or z axis. You'll then see the trend in the data and can decide whether to get more data for interpolation (between the data points you already have) or extrapolation (outside the current range). There are frameworks, such as Bayesian Optimization, but they may be more machinery than is warranted given the small dimensionality. (A minimal plotting sketch follows below.)


  2. Inference - Predicting performance for new data. Given the data you have seen thus far (the sample data), estimate parameters. In your example that would be estimating the contribution of each of the 4 ingredients, either individually or through interactions. Those parameters could be scalar coefficients or distributions. (A regression sketch follows below as well.)
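
To make the plotting suggestion concrete, here is a minimal sketch in Python (my illustration, assuming matplotlib and numpy are available; run 5 as posted sums to 1.3, so the sketch takes x2 = 0.6 as the intended value):

    import matplotlib.pyplot as plt
    import numpy as np

    # The five mixtures from the question (run 5 assumes x2 = 0.6 was intended).
    X = np.array([[0.90, 0.00, 0.10, 0.00],
                  [0.00, 0.90, 0.10, 0.00],
                  [0.45, 0.45, 0.00, 0.10],
                  [0.60, 0.30, 0.05, 0.05],
                  [0.30, 0.60, 0.05, 0.05]])
    y = np.array([16.5, 8.6, 12.6, 18.9, 9.8])

    # One scatter panel per ingredient: outcome vs. that ingredient's fraction.
    fig, axes = plt.subplots(1, 4, figsize=(12, 3), sharey=True)
    for i, ax in enumerate(axes):
        ax.scatter(X[:, i], y)
        ax.set_xlabel("x%d fraction" % (i + 1))
    axes[0].set_ylabel("reaction strength y")
    plt.tight_layout()
    plt.show()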

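For the inference side, and since the comments ask about a Python library, one hedged option (my choice, not a prescribed tool) is scikit-learn's BayesianRidge. Because the fractions sum to 1, the columns are collinear with the intercept, so the coefficients are only identified up to the ridge prior, but the predictive uncertainty is still informative:

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    X = np.array([[0.90, 0.00, 0.10, 0.00],
                  [0.00, 0.90, 0.10, 0.00],
                  [0.45, 0.45, 0.00, 0.10],
                  [0.60, 0.30, 0.05, 0.05],
                  [0.30, 0.60, 0.05, 0.05]])
    y = np.array([16.5, 8.6, 12.6, 18.9, 9.8])

    model = BayesianRidge().fit(X, y)
    print(model.coef_)  # posterior-mean contribution of each ingredient

    # Predictions come with an uncertainty estimate, valuable with only 5 samples.
    mean, std = model.predict(np.array([[0.5, 0.3, 0.1, 0.1]]), return_std=True)
    print(mean, std)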






answered Mar 15 '18 at 19:48 – Brian Spiering

This is a perfect problem for active learning. Methods based on Bayesian Optimization are particularly powerful for optimizing black-box functions that are expensive to evaluate (e.g., running an experiment in the lab). There are a few BO packages out there which may be of interest; Martin Krasser's blog has a nice overview.



I noticed that the features in your last experiment don't add up to 1, which I am assuming is a typo. For the demo I changed that entry to x2 = 0.6.



Here is a sample I threw together in Python using GPyOpt, a Gaussian-process-based package:



    import numpy as np
    import GPyOpt

    # The five experiments run so far (run 5 entered with x2 = 0.6 so the
    # fractions sum to 1).
    x_init = np.array([[0.9, 0.0, 0.1, 0.0],
                       [0.0, 0.9, 0.1, 0.0],
                       [0.45, 0.45, 0.0, 0.1],
                       [0.6, 0.3, 0.05, 0.05],
                       [0.3, 0.6, 0.05, 0.05]])

    # Measured reaction strength for each mixture.
    y_init = np.array([[16.5], [8.6], [12.6], [18.9], [9.8]])

    # Each ingredient fraction may range over [0, 1].
    domain = [{'name': 'x1', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x2', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x3', 'type': 'continuous', 'domain': (0, 1.0)},
              {'name': 'x4', 'type': 'continuous', 'domain': (0, 1.0)}]

    # GPyOpt treats a constraint as satisfied when its expression is <= 0,
    # so these two together pin the sum of the fractions to ~1.
    constraints = [
        {'name': 'const_1', 'constraint': '(x[:,0] + x[:,1] + x[:,2] + x[:,3]) - 1 - 0.001'},
        {'name': 'const_2', 'constraint': '1 - (x[:,0] + x[:,1] + x[:,2] + x[:,3]) - 0.001'}]

    # f=None because the objective (a lab experiment) cannot be called in code;
    # we only ask for the next suggested mixture.
    bo_step = GPyOpt.methods.BayesianOptimization(f=None,
                                                  domain=domain,
                                                  constraints=constraints,
                                                  X=x_init,
                                                  Y=y_init,
                                                  maximize=True)

    x_next = bo_step.suggest_next_locations()

    print(x_next)          # suggested next experiment
    print(np.sum(x_next))  # sanity check: should be ~1


Note: GPyOpt only accepts constraints in a certain form (an expression that must be non-positive when satisfied); that's why there are two of them, which together constrain the sum of the fractions to the interval [0.999, 1.001].



This example suggests that your next experiment should be run at:



    x1 = 0.04
    x2 = 0.78
    x3 = 0.00
    x4 = 0.18



BO algorithms can be tuned to give different results based on your preferences for exploiting existing information versus exploring new areas of the space. I'm not sure what GPyOpt's default settings are, so if you are interested it could be worth looking at the documentation. (A hedged example of one such setting is sketched below.)
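
For instance, here is a sketch of one such knob, assuming GPyOpt's acquisition_type and acquisition_weight constructor arguments (verify them against the documentation for your version):

    # Reusing domain/constraints/x_init/y_init from the code above. LCB
    # (lower confidence bound) with a larger weight favours exploring
    # uncertain regions over exploiting the current best mixtures.
    bo_explore = GPyOpt.methods.BayesianOptimization(
        f=None,
        domain=domain,
        constraints=constraints,
        X=x_init,
        Y=y_init,
        maximize=True,
        acquisition_type='LCB',
        acquisition_weight=2.0,
    )
    print(bo_explore.suggest_next_locations())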






answered 5 hours ago – b-shields (new contributor)