Predicting Intent to do X with a confidence score or intent percentage score?

I have a data set like:

did_purchase  action_1_30d  action_2_20d  action_2_10d  ....
False         10            20            100
True          ....etc

Here, did_purchase shows whether the customer purchased, and the other columns give the number of times each action was taken in the window before the purchase (or non-purchase) event.

For example, the customer in the first row performed action_1 10 times in the 30 days before the event, but did not purchase in the end.

I have been using sklearn's LogisticRegression to predict did_purchase (False/True), and I get about 89% accuracy, which is nice.

However, I'd like a percentage intent score instead, so that it could say, for example, that user-321 has a 46% chance of purchasing in the next 10 days.

What would be a good algorithm or approach for this?

asked 12 hours ago

LittleBobbyTables

1061

New contributor

You mention 89% accuracy. What is the distribution of class labels? Is there a class imbalance? If so, accuracy may not be the right metric here.
– Wes
12 hours ago

Sorry, I meant F1 is 0.89. The class labels were imbalanced (1% Yes, 99% No), but I SMOTE'd them.
– LittleBobbyTables
12 hours ago
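
To illustrate why accuracy can mislead under the imbalance described in these comments, here is a minimal sketch. It assumes a made-up sample of 1000 customers with the stated 1% Yes / 99% No split and a trivial baseline that always predicts "no purchase"; none of these numbers come from the asker's actual data.

```python
# Hypothetical sample with the 1% Yes / 99% No split from the comments.
y_true = [True] * 10 + [False] * 990

# Trivial baseline (an assumption for illustration): always predict "no purchase".
y_pred = [False] * len(y_true)

# Accuracy: fraction of rows where the prediction matches the label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the purchase class: fraction of real purchasers that were found.
true_positives = sum(t and p for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The baseline scores 99% accuracy while finding no purchasers at all, which is why a metric that focuses on the minority class, such as recall or F1, tends to be more informative here.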


logistic-regression


1 Answer

You could use the probabilities output by LogisticRegression's predict_proba method.
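
As a rough sketch of what that might look like: the example below fits a LogisticRegression on synthetic action counts and reads the purchase probability from predict_proba. The data, the rule used to generate labels, and the variable names (X, y, intent_pct) are all assumptions made for illustration, not the asker's data or pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the action-count columns (assumption, not real data).
rng = np.random.default_rng(0)
X = rng.poisson(lam=[10, 20, 100], size=(500, 3)).astype(float)

# Hypothetical rule so that the labels depend on the features.
logits = 0.3 * (X[:, 0] - 10) - 0.05 * (X[:, 2] - 100)
y = rng.random(500) < 1 / (1 + np.exp(-logits))

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one column per class, ordered as in model.classes_,
# so look up the column that corresponds to did_purchase == True.
purchase_col = list(model.classes_).index(True)
proba = model.predict_proba(X)[:, purchase_col]

# Express the probability as a percentage "intent score" per customer.
intent_pct = 100 * proba
print(f"first customer: {intent_pct[0]:.1f}% estimated chance of purchase")
```

One caveat, given the SMOTE step mentioned in the comments: a model trained on resampled data sees a different class balance than the real population, so its raw probabilities may need calibration before they can be read as real-world percentages.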

answered 12 hours ago

Wes

1065

New contributor
