What's the best classification model for this recommendation engine?
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
bumped to the homepage by Community♦ 12 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
add a comment |
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
bumped to the homepage by Community♦ 12 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
$begingroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
$endgroup$
I'm not a data scientist but I'm trying to implement a recommendation engine on my company. My application runs on PHP but I'll use Python to process this data.
My company is an online school, with 40 online courses as of now. I have a CSV file with around 30k users preferences, and it looks like this:
0 means that user is not subscribed (I consider here that he has no interest), while 1 means subscribed (interested).
My idea is to compare one single user array such as [0,1,0,0,0,1,1...] with all this data and return a grade for each course with the probability of interest for this user.
I was thinking of using a Multinomial Logistic Regression, but as far as I know (and I don't know much) it would return me a binary result, right?
What classification model would you recommend me to use? Ideally, my result should be something like:
[0.95, 0.1, 0.54, 0.3, 0.87...]
Cheers!
python recommender-system multiclass-classification
python recommender-system multiclass-classification
asked May 21 '18 at 14:34
grpaivagrpaiva
61
61
bumped to the homepage by Community♦ 12 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
bumped to the homepage by Community♦ 12 mins ago
This question has answers that may be good or bad; the system has marked it active so that they can be reviewed.
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
1
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31932%2fwhats-the-best-classification-model-for-this-recommendation-engine%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
$endgroup$
Without more information about your dataset, it's impossible to recommend one particular classifier over another.
If you want your classifier to return a vector of probabilities, then if you're using the sklearn library, you could use the predict_proba
method.
Here's an example:
from sklearn.datasets import load_digits
digits = load_digits(2)
from sklearn.linear_model import LogisticRegression
preds = LogisticRegression().fit(digits.data, digits.target).predict_proba(digits.data)
print([i[1] for i in preds])
answered May 21 '18 at 14:47
marco_gorellimarco_gorelli
4819
4819
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned byshape
gives you the number of rows, so(360, 64)
and(360,)
is exactly what we'd expect.
$endgroup$
– marco_gorelli
May 22 '18 at 8:16
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:
ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
Thanks for your answer @Lupacante! What I don't get here is that when I print digits.data.shape and digits.target.shape I get: (360, 64) and (360,). Shouldn't the target shape be something like(64,)? My dataset's shape looks like this: (27920, 46) and (46,). I'm getting an error:
ValueError: Found input variables with inconsistent numbers of samples: [27920, 46]
$endgroup$
– grpaiva
May 21 '18 at 18:02
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by
shape
gives you the number of rows, so (360, 64)
and (360,)
is exactly what we'd expect.$endgroup$
– marco_gorelli
May 22 '18 at 8:16
$begingroup$
The predictors and target from the training set should have the same number of rows. The first number in the tuple returned by
shape
gives you the number of rows, so (360, 64)
and (360,)
is exactly what we'd expect.$endgroup$
– marco_gorelli
May 22 '18 at 8:16
add a comment |
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f31932%2fwhats-the-best-classification-model-for-this-recommendation-engine%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
$begingroup$
Formulate the problem as a Collaborative filtering task.
$endgroup$
– Fadi Bakoura
May 21 '18 at 14:52
$begingroup$
Thanks @FadiBakoura, will research on this and let you know.
$endgroup$
– grpaiva
May 21 '18 at 18:03
$begingroup$
Can you include more information about the user? (sex, age ...) An user single with 18 years old may like a course that another 50 years old do not like ...
$endgroup$
– Intruso
Aug 20 '18 at 13:50
$begingroup$
Seems like a prediction problem, not one of classification, so a neural network? Have you tried loading this data into Orange3? Seems you could test out your models pretty quickly. Orange3 uses Scikit, so once you find your workflow, you can use Python. By the way, if it is a neural network solution, TensorFlow has PHP bindings, so you could do the whole thing in PHP. Both may save you time.
$endgroup$
– davmor
Nov 18 '18 at 11:10