Finding P value - Explain












$begingroup$


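from scipy import stats

def get_pvalue(con_conv, test_conv, con_size, test_size):
    # Negative absolute difference between the two conversion rates
    lift = -abs(test_conv - con_conv)
    # Variance of each group's observed conversion rate: p * (1 - p) / n
    scale_one = con_conv * (1 - con_conv) * (1 / con_size)
    scale_two = test_conv * (1 - test_conv) * (1 / test_size)
    # Standard error of the difference between the two rates
    scale_val = (scale_one + scale_two) ** 0.5
    # Two-sided p-value: twice the lower-tail probability
    p_value = 2 * stats.norm.cdf(lift, loc=0, scale=scale_val)
    return p_value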


I have this function and I would like to know what it is actually doing and how it is actually calculating the p-value.



The function is meant to test whether the conversion rates of the control group and the test group in an A/B test differ.



con_conv --> conversion rate of the control group
test_conv --> conversion rate of the test group
con_size --> sample size (number of users) of the control group
test_size --> sample size (number of users) of the test group


I understand that scale_one and scale_two are the variances for each group, but I don't understand why the two are added before taking the square root to get the standard deviation, or why the cdf is multiplied by 2 to get the p_value.

















$endgroup$

















      python statistics






      edited 18 hours ago by Stephen Rauch















      asked 20 hours ago by Kartikeya Sharma


























          1 Answer












          $begingroup$

          p_value = 2 * stats.norm.cdf(lift, loc = 0, scale = scale_val )


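          This line is the key to your question. The p-value is the probability, assuming the null hypothesis is true, of observing a difference at least as large as the one you measured. It is not the probability that the null hypothesis itself is true.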



          Null hypothesis: the control and test groups have the same conversion rate.
          Alternative hypothesis: the conversion rates of the two groups differ.



          Because the test assumes (among other things) that the difference in conversion rates is approximately normally distributed, the probability is computed from a normal distribution.



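          The function stats.norm.cdf returns the probability that a normal variable with mean 0 and standard deviation scale_val is less than or equal to lift. Since lift is the negative absolute difference between the two rates, this is the lower-tail probability of seeing a difference that far below zero when there is really no difference between the groups. A p-value below 0.01 therefore tells us that a difference this large would be very unlikely if the groups were equal, which is evidence that the groups differ.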
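
As a rough sketch of what that call computes: evaluating a normal CDF with standard deviation scale_val at lift gives the same number as evaluating the standard normal CDF at the z-score lift / scale_val. The snippet uses the standard library's statistics.NormalDist as a stand-in for stats.norm so it has no third-party dependency, and the input numbers are made up for illustration, not taken from the question.

```python
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
con_conv, test_conv = 0.10, 0.12
con_size, test_size = 5000, 5000

lift = -abs(test_conv - con_conv)
scale_val = (con_conv * (1 - con_conv) / con_size
             + test_conv * (1 - test_conv) / test_size) ** 0.5

# Lower-tail probability computed two equivalent ways.
tail_scaled = NormalDist(mu=0.0, sigma=scale_val).cdf(lift)
z = lift / scale_val                  # z-score of the observed difference
tail_standard = NormalDist().cdf(z)   # standard normal: mean 0, sd 1

print(z, tail_scaled, tail_standard)
```

Both tail values should agree up to floating-point rounding, which is why the function can pass scale_val as the scale instead of standardizing lift first.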



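          The factor of 2 comes from the test being two-tailed: the difference can go either way (A greater than B, or B greater than A), so both tails of the distribution count as "at least as extreme". Because the normal distribution is symmetric, the two tails have equal probability, so you can compute one tail and double it.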
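
The doubling can be checked numerically. This is a minimal sketch under assumed values: two_sided_p is a hypothetical helper (not part of the question's code) that adds the lower and upper tails explicitly, and statistics.NormalDist again stands in for stats.norm.

```python
from statistics import NormalDist

def two_sided_p(diff, se):
    """Two-sided p-value for an observed difference with standard error se."""
    null = NormalDist(mu=0.0, sigma=se)  # distribution of the difference if the groups are equal
    lower = null.cdf(-abs(diff))         # P(D <= -|diff|)
    upper = 1.0 - null.cdf(abs(diff))    # P(D >= +|diff|)
    return lower + upper

# Hypothetical numbers, for illustration only.
diff, se = 0.02, 0.00625
p_sum = two_sided_p(diff, se)
p_doubled = 2 * NormalDist(0.0, se).cdf(-abs(diff))

print(p_sum, p_doubled)
```

The two values match up to rounding, and the result does not depend on the sign of diff, which is the reason the function takes the absolute value before calling the CDF.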



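          The two variances (not the standard deviations) are added because the variance of a difference of independent quantities is the sum of their variances:
          $Var(X - Y) = Var(X) + Var(Y)$ if $X$ and $Y$ are independent.
          Taking the square root of that sum gives the standard deviation of the difference between the two conversion rates, which is scale_val.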
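
A small simulation can illustrate the variance rule for two independent groups. This is only a sketch: the true rates, sample sizes, and number of trials are assumed values, and sample_rate is a hypothetical helper introduced for the example.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical true rates and sample sizes, for illustration only.
p_con, p_test = 0.10, 0.12
n_con, n_test = 400, 400
trials = 3000

def sample_rate(p, n):
    """Observed conversion rate from n simulated visitors who each convert with probability p."""
    return sum(random.random() < p for _ in range(n)) / n

# Difference in observed rates across many simulated A/B tests.
diffs = [sample_rate(p_test, n_test) - sample_rate(p_con, n_con)
         for _ in range(trials)]

simulated_var = pvariance(diffs)
formula_var = p_con * (1 - p_con) / n_con + p_test * (1 - p_test) / n_test

print(simulated_var, formula_var)
```

The simulated variance of the difference should land close to the sum of the two per-group variances, which is the quantity scale_one + scale_two in the question's function.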












          $endgroup$






















                edited 19 hours ago






























                answered 20 hours ago by Juan Esteban de la Calle



