Predicting Intent to do X with a confidence score or intent percentage score?












I have a data set like:

did_purchase  action_1_30d  action_2_20d  action_2_10d  ....
False         10            20            100
True          ....etc

Here did_purchase shows whether the customer purchased or not, and the other columns give the number of times each action was taken before the purchase (or non-purchase) event.

So, in the first row, the customer performed action_1 10 times within 30 days of the purchase event but didn't purchase in the end.

I have been using sklearn's LogisticRegression to predict whether did_purchase is False or True, and I get about 89% accuracy, which is nice.

However, I'd like a percentage intent score instead, so that it could say, for example, that user-321 has a 46% chance of purchasing in the next 10 days.

What would be a good algorithm or approach for this?












  • You mention 89% accuracy. What is the distribution of class labels? Is there a class imbalance? If so, accuracy may not be the right metric here.
    – Wes, 12 hours ago










  • Sorry, I meant that F1 is 0.89. The class labels were imbalanced (1% Yes, 99% No), but I applied SMOTE to them.
    – LittleBobbyTables, 12 hours ago
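
To illustrate the point raised in the comments about accuracy under class imbalance, here is a minimal sketch. It assumes a hypothetical set of 1,000 customers with the 1% Yes / 99% No split mentioned above, and a baseline that always predicts "no purchase"; the data are made up for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical labels: 10 purchasers (1%) and 990 non-purchasers (99%).
y_true = np.array([True] * 10 + [False] * 990)

# A trivial baseline that predicts "no purchase" for every customer.
y_pred = np.zeros_like(y_true)

# Accuracy is 0.99 even though no purchaser is ever identified.
acc = accuracy_score(y_true, y_pred)

# F1 for the positive class is 0 because there are no true positives.
f1 = f1_score(y_true, y_pred, zero_division=0)

print(acc, f1)
```

With labels this skewed, a high accuracy figure on its own says little about how well purchasers are detected, which is why F1 (or precision and recall) is the more informative number here.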
















logistic-regression






asked 12 hours ago by LittleBobbyTables

1 Answer
You could use the probabilities output by LogisticRegression's predict_proba method.
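
A minimal sketch of that suggestion, assuming scikit-learn and NumPy are installed. The data are synthetic: the three count features stand in for the action_* columns in the question, and the rule used to generate did_purchase is purely hypothetical so that the toy data has some signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's table (assumption: three count features).
rng = np.random.default_rng(0)
X = rng.poisson(lam=[10, 20, 100], size=(500, 3)).astype(float)

# Hypothetical labelling rule, only there to give the toy data some structure.
logits = 0.3 * (X[:, 0] - 10) - 0.1 * (X[:, 1] - 20)
did_purchase = rng.random(500) < 1 / (1 + np.exp(-logits))

model = LogisticRegression(max_iter=1000).fit(X, did_purchase)

# predict_proba returns one column per class, ordered as in model.classes_.
purchase_col = list(model.classes_).index(True)
proba = model.predict_proba(X)[:, purchase_col]

# Percentage "intent score" for the first customer.
intent_pct = 100 * proba[0]
print(f"{intent_pct:.0f}% chance of purchasing")
```

One thing to keep in mind: if the model was fit on resampled data (such as the SMOTE step mentioned in the comments), these probabilities reflect the resampled class balance rather than the original 1% purchase rate, so they may need recalibrating before being read as literal percentages.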













answered 12 hours ago by Wes
