Contextual bandits for online learning
























$begingroup$


Which of the algorithms in the current literature for contextual bandits can be implemented for online learning and which ones can't? I'd really appreciate it if someone could provide a link to papers too! Thanks for the help!



















$endgroup$















  • 2




    $begingroup$
    I'm pretty sure all bandit algorithms are necessarily online learning algorithms
    $endgroup$
    – David Marx
    Jan 7 '18 at 21:55










  • $begingroup$
    I agree with David. Generally, the whole point of the bandit problem, and how it is framed in the literature, is to maximise reward (or minimise "regret") during an active learning process. "Offline contextual bandits" are essentially just a supervised learning/regression problem.
    $endgroup$
    – Neil Slater
    Jan 7 '18 at 22:02












  • $begingroup$
    So, for example, is the epoch-greedy algorithm described at hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf online? I ask because the algorithm itself solves a supervised learning problem, so I'm interested in whether it is implementable in practice.
    $endgroup$
    – Pavan Sangha
    Jan 8 '18 at 9:03










  • $begingroup$
    Epoch-greedy does not appear to be implemented in, e.g., Vowpal Wabbit, but you can ask by opening an issue on its GitHub repository. Note that Vowpal Wabbit has a learning curve, as its usage is only sparsely documented.
    $endgroup$
    – matanster
    Jul 23 '18 at 8:53










  • $begingroup$
    Just to note: you typically prime an online model with a model trained offline beforehand, rather than unleash a fresh, untrained online model. One reason is that you typically know little about whether your feature set is good enough for the model to fit your problem. Another is that a model pre-trained on historical data, if you have any, can spare you some aggregate real-world cost. Now replace 'typically' with 'sometimes', as it depends on the scenario and on how confident you are about it, e.g. from a-priori knowledge.
    $endgroup$
    – matanster
    Jul 23 '18 at 9:11
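
On the Epoch-Greedy question in the comments above: the following is a minimal sketch, not the paper's exact procedure, of why an algorithm with a supervised-learning subroutine can still run online. Each epoch takes one uniformly random exploration step, refits a policy on the exploration samples only, and then exploits that policy for a number of steps. The finite context set, the table-based policy class, the `env_reward` callback, the `matching_reward` toy environment and the `exploit_steps` schedule are all assumptions made for illustration; the paper ties the number of exploitation steps to a sample-complexity bound instead.

```python
import random
from collections import defaultdict


def epoch_greedy(env_reward, contexts, n_actions, n_epochs, exploit_steps, rng):
    """Simplified Epoch-Greedy loop over a finite set of contexts.

    env_reward(x, a, rng) -> float is a hypothetical environment callback.
    exploit_steps(epoch) -> int is a hypothetical exploitation schedule.
    Returns the final policy (dict context -> action) and the average reward.
    """
    # Summed reward per (context, action), built from exploration steps only.
    reward_sum = defaultdict(float)
    total, steps = 0.0, 0
    policy = {}
    for epoch in range(1, n_epochs + 1):
        # 1) Exploration: one step with a uniformly random action.
        x = rng.choice(contexts)
        a = rng.randrange(n_actions)
        r = env_reward(x, a, rng)
        reward_sum[(x, a)] += r
        total += r
        steps += 1

        # 2) Supervised step: empirical reward maximisation on exploration data.
        #    Under uniform exploration, the best table policy picks, for each
        #    context, the action with the largest summed reward.
        policy = {
            c: max(range(n_actions), key=lambda k: reward_sum[(c, k)])
            for c in contexts
        }

        # 3) Exploitation: follow the fitted policy; these samples are not
        #    used for learning.
        for _ in range(exploit_steps(epoch)):
            x = rng.choice(contexts)
            total += env_reward(x, policy[x], rng)
            steps += 1
    return policy, total / steps


def matching_reward(x, a, rng):
    """Toy environment (assumption): reward 1 when the action equals the context."""
    return 1.0 if a == x else 0.0
```

For example, `epoch_greedy(matching_reward, [0, 1], 2, 200, lambda epoch: epoch, random.Random(0))` runs the loop on the toy environment. The supervised step is refit after every exploration sample, which is what makes the procedure usable online as long as that refit is cheap enough.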









































machine-learning reinforcement-learning online-learning randomized-algorithms
















asked Jan 7 '18 at 21:29









Pavan Sangha


















1 Answer






























$begingroup$

My answer is only partial, since I have not compiled a list, but I believe all the algorithms implemented here are implemented for both offline and online mode. This one can also be implemented for online mode.



I'm not implying that you should use that implementation, but it serves as living proof, which goes beyond what can be deduced analytically from articles. The thing to understand is that certain contextual bandit (CB) algorithms are paired with rather benign procedures for training them on offline-accumulated data, which are in turn paired with mathematical proofs that the loss incurred in offline training is a good predictor of the loss they will incur in online mode (provided the real world is still 'sufficiently similar' to the one the data was logged from).
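
One standard way to make "offline loss predicts online loss" concrete is inverse propensity scoring (IPS). This is a swapped-in illustration, not necessarily the method used by the implementations referred to above. The sketch assumes each logged record stores the probability with which the logging policy chose its action; the record layout and the names `ips_value` and `target_policy` are hypothetical.

```python
def ips_value(logged, target_policy):
    """Estimate the average reward a policy would obtain online, from logs.

    logged: iterable of (context, action, reward, propensity) tuples, where
            propensity is the probability that the logging policy chose
            `action` in `context` (assumed to be recorded and > 0).
    target_policy: function mapping a context to the action it would take.
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logged:
        # Only records where the target policy agrees with the logged action
        # contribute; dividing by the propensity corrects for how often the
        # logging policy took that action.
        if target_policy(context) == action:
            total += reward / propensity
        n += 1
    return total / n if n else 0.0
```

The estimate is only meaningful if the logging policy gave non-zero probability to every action the target policy might take, and if the environment has not drifted since the data was logged, which is the 'sufficiently similar' condition mentioned above.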



Some algorithms (other than those mentioned above) may be applicable only to offline training; at least, I'm not aware of a theoretical argument ruling out that an algorithm may train better offline in a way that precludes its direct use for online learning. In any case, many algorithms are implemented in software only for offline evaluation, as a lot of research focuses on the offline setting. So I think it's a good question!



If an article seems really helpful to you but doesn't make this 100% clear, I think you should certainly email one of its authors to ask specifically; in rare cases they might even point you to a solid online implementation! Do note that online usage entails more production-readiness considerations, and may require going an extra mile in terms of software quality.

















$endgroup$






















        edited Jul 23 '18 at 9:48

























        answered Jul 23 '18 at 9:18









        matanster






























