contextual bandits for online learning

Which of the algorithms in the current literature for contextual bandits can be implemented for online learning and which ones can't? I'd really appreciate it if someone could provide a link to papers too! Thanks for the help!

asked Jan 7 '18 at 21:29

Pavan Sangha

1314


2

I'm pretty sure all bandit algorithms are necessarily online learning algorithms.
– David Marx
Jan 7 '18 at 21:55

I agree with David. Generally the whole point of the bandit problem, and how it is framed in the literature, is to maximise reward (or minimise "regret") during an active learning process. "Offline contextual bandits" are essentially just a supervised learning/regression problem.
– Neil Slater
Jan 7 '18 at 22:02

So, for example, is the epoch-greedy algorithm described at hunch.net/~jl/projects/interactive/sidebandits/bandit.pdf online? I ask because the algorithm itself solves a supervised learning problem, so I'm interested in whether it can be implemented in practice.
– Pavan Sangha
Jan 8 '18 at 9:03

It does not look like epoch-greedy is implemented in, e.g., Vowpal Wabbit, but you can ask by opening an issue on its GitHub repo. Note that Vowpal Wabbit has a learning curve, as its usage is only sparsely documented.
– matanster
Jul 23 '18 at 8:53

Just to note: you typically prime an online model with a model trained offline beforehand, rather than unleash a fresh, naive, untrained online model. One reason is that you typically have little knowledge of whether your feature set is good enough for the model to fit your problem. Another is that you can spare some aggregate real-world cost by starting from a model pre-trained on history, if you have any. Now replace "typically" with "sometimes", as it depends on the scenario and on how confident you are about it, e.g. from a-priori knowledge.
– matanster
Jul 23 '18 at 9:11
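To make the last comment concrete, here is a rough sketch of priming an online contextual bandit with logged data before it goes live. Everything in it is an assumption made for illustration: the class `EpsilonGreedyLinearBandit`, the `warm_start` method, the toy `simulate` environment, and the choice of epsilon-greedy with one linear model per arm. It is not epoch-greedy and not Vowpal Wabbit's API, just a small stand-in showing that the same per-example update can serve both offline replay and online learning.

```python
import random


class EpsilonGreedyLinearBandit:
    """Hypothetical minimal online contextual bandit (not from any library)."""

    def __init__(self, n_arms, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # One linear reward model (a weight vector) per arm.
        self.weights = [[0.0] * n_features for _ in range(n_arms)]

    def predict(self, arm, context):
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.predict(a, context))

    def update(self, context, arm, reward):
        # One SGD step on squared error, only for the arm that was played.
        error = reward - self.predict(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x for w, x in zip(self.weights[arm], context)
        ]

    def warm_start(self, logged):
        # Replay logged (context, arm, reward) tuples before going live.
        for context, arm, reward in logged:
            self.update(context, arm, reward)


def simulate(bandit, rounds, seed=1):
    """Toy environment (assumed): arm 0 pays when x[0] > x[1], else arm 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        context = [rng.random(), rng.random()]
        arm = bandit.choose(context)
        best = 0 if context[0] > context[1] else 1
        reward = 1.0 if arm == best else 0.0
        bandit.update(context, arm, reward)  # learn from this round only
        total += reward
    return total / rounds


if __name__ == "__main__":
    bandit = EpsilonGreedyLinearBandit(n_arms=2, n_features=2)
    history = [([1.0, 0.0], 0, 1.0), ([0.0, 1.0], 1, 1.0)] * 50
    bandit.warm_start(history)  # offline priming on logged history
    print("average online reward:", simulate(bandit, 2000))
```

The point of the design is that `update` only ever sees one (context, arm, reward) triple at a time, so nothing changes between replaying history and learning live.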


machine-learning reinforcement-learning online-learning randomized-algorithms


1 Answer
1


My answer is only partial, since I have not compiled a list, but I believe all the algorithms implemented here are implemented for both offline and online mode. This one can also be implemented for online mode.

I'm not trying to imply you should use that implementation, but it is a kind of living proof, which goes beyond deducing the answer analytically from articles. The thing to understand is that certain contextual bandit (CB) algorithms are paired with rather benign algorithms for training them on offline-accumulated data, and these in turn are paired with mathematical proofs that the loss they incur in offline training is a good predictor of the loss they will incur in online mode (if the real world is still 'sufficiently similar' to the one the data was logged from).
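One common way to estimate from logged data how a policy would do online is inverse propensity scoring (IPS). I'm not claiming this is the exact method behind the implementations mentioned above; it is a minimal sketch, and the function name `ips_value`, the tuple layout of the log, and the toy data are all assumptions made for the example. It also assumes the logging policy recorded the probability with which it chose each action.

```python
def ips_value(logged, target_policy):
    """Inverse propensity scoring estimate of a policy's average reward.

    logged: iterable of (context, action, reward, propensity) tuples, where
        propensity is the probability the logging policy gave to `action`.
    target_policy: function mapping a context to an action (deterministic).
    """
    total = 0.0
    n = 0
    for context, action, reward, propensity in logged:
        n += 1
        # Only rounds where the target policy agrees with the logged action
        # contribute, reweighted by how unlikely that action was to be logged.
        if target_policy(context) == action:
            total += reward / propensity
    return total / n if n else 0.0


if __name__ == "__main__":
    # Toy log (assumed): two contexts, two actions, uniform logging policy.
    log = [
        ("a", 0, 1.0, 0.5),
        ("a", 1, 0.0, 0.5),
        ("b", 0, 0.0, 0.5),
        ("b", 1, 1.0, 0.5),
    ]
    matched = lambda ctx: 0 if ctx == "a" else 1
    always_zero = lambda ctx: 0
    print(ips_value(log, matched), ips_value(log, always_zero))
```

The estimate is only trustworthy under the same caveat as in the paragraph above: the logged world has to resemble the live one, and the logging policy must have given every action the target policy might take a non-zero probability.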

Some algorithms (other than those mentioned above) may be applicable only to offline training; at least, I'm not aware of any theoretical result ruling out that an algorithm could train better offline in a way that precludes using the same algorithm directly for online learning. In addition, many algorithms are implemented in software only for offline evaluation, since a lot of research focuses on the offline setting. So I think it's a good question!

If an article seems really helpful to you but doesn't make this 100% clear, I think you should certainly email one of its authors and ask specifically; in rare cases they might even point you to a solid online implementation. Do note that online usage brings additional production-readiness considerations, and may require an extra mile in terms of the software quality expected.

edited Jul 23 '18 at 9:48

answered Jul 23 '18 at 9:18

matanster

1063
