Finding P value - Explain
$begingroup$
    from scipy import stats

    def get_pvalue(con_conv, test_conv, con_size, test_size):
        # Negative absolute difference between the two conversion rates
        lift = -abs(test_conv - con_conv)
        # Per-group terms of the form p * (1 - p) / n
        scale_one = con_conv * (1 - con_conv) * (1 / con_size)
        scale_two = test_conv * (1 - test_conv) * (1 / test_size)
        scale_val = (scale_one + scale_two)**0.5
        p_value = 2 * stats.norm.cdf(lift, loc=0, scale=scale_val)
        return p_value
I have this function and I would like to know what it is actually doing and how it calculates the p-value.
It is used to test the difference between the conversion rates of the control and test groups in an A/B test.
con_conv --> Conversion rate for control group
test_conv --> Conversion rate for test group
con_size --> population size for control group
test_size --> population size for test group
I understand that scale_one and scale_two calculate the variance for each group, but I don't understand why they are added together to calculate the standard deviation, or why the cdf is multiplied by 2 to get the p_value.
python statistics
$endgroup$
edited 18 hours ago
Stephen Rauch♦
asked 20 hours ago
Kartikeya Sharma
1 Answer
$begingroup$
p_value = 2 * stats.norm.cdf(lift, loc = 0, scale = scale_val )
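This line is the key to your question. The p-value is the probability, assuming the null hypothesis is true, of observing a difference at least as extreme as the one you measured. It is not the probability that the null hypothesis is true.
Null hypothesis: there is no difference between the conversion rates of the two groups.
Alternative hypothesis: the conversion rates of the two groups differ.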
The test assumes (among other things) that the difference between the two observed conversion rates is approximately normally distributed, so the probability is computed from a normal distribution.
The function stats.norm.cdf returns the probability that a normal variable with mean 0 (the difference expected under the null hypothesis) and standard deviation scale_val is less than or equal to lift. Because lift is the negative absolute difference, this is the probability of a difference at least that far below zero. A p-value of <0.01 tells us that a difference this large would occur less than 1% of the time if the groups were truly equal, which is evidence that your groups are different.
The 2 comes from the test being two-tailed: the difference can arise from A being greater than B or from B being greater than A. The normal distribution is symmetric, so the probability in both tails is twice the probability in one tail.
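To make the factor of 2 concrete, here is a minimal sketch that adds the two tail probabilities explicitly and compares the result with the shortcut used in the question's function. The conversion rates and group sizes are hypothetical values chosen only for illustration, and the variable names other than those in the question are my own.

```python
from scipy import stats

# Hypothetical inputs, chosen only for illustration.
con_conv, test_conv = 0.10, 0.12      # observed conversion rates
con_size, test_size = 5000, 5000      # visitors per group

diff = test_conv - con_conv
# Standard deviation of the difference under the normal approximation.
se = (con_conv * (1 - con_conv) / con_size
      + test_conv * (1 - test_conv) / test_size) ** 0.5

# Lower tail: P(D <= -|diff|). Upper tail: P(D >= +|diff|).
lower_tail = stats.norm.cdf(-abs(diff), loc=0, scale=se)
upper_tail = stats.norm.sf(abs(diff), loc=0, scale=se)

p_two_sided = lower_tail + upper_tail
p_shortcut = 2 * lower_tail           # what get_pvalue computes

# Equivalent form using the standardized z statistic.
z = diff / se
p_from_z = 2 * stats.norm.sf(abs(z))

print(p_two_sided, p_shortcut, p_from_z)
```

All three printed values should agree up to floating-point error, since the two tails of a normal distribution centered at zero are mirror images of each other.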
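The variances (not the standard deviations) are added because of the rule:
$Var(X - Y) = Var(X) + Var(Y)$ if $X$ and $Y$ are independent.
Here $X$ and $Y$ are the observed conversion rates of the two groups, and the square root of the summed variances (scale_val) is the standard deviation of their difference.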
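If you want to check the variance rule empirically, a rough simulation along these lines can help. It assumes each group's conversions follow a binomial distribution with hypothetical true rates and sizes (none of these numbers come from the question), simulates many independent A/B tests, and compares the spread of the observed differences with the square root of the summed variances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true rates and group sizes, for illustration only.
p_con, p_test = 0.10, 0.12
n_con, n_test = 2000, 3000
n_experiments = 200_000

# Simulate many independent A/B tests and record each observed rate.
con_rates = rng.binomial(n_con, p_con, size=n_experiments) / n_con
test_rates = rng.binomial(n_test, p_test, size=n_experiments) / n_test
diffs = test_rates - con_rates

# Variance of each observed rate: p * (1 - p) / n
var_con = p_con * (1 - p_con) / n_con
var_test = p_test * (1 - p_test) / n_test

se_formula = (var_con + var_test) ** 0.5   # same form as scale_val
se_simulated = diffs.std()                 # empirical spread of the difference

print(se_formula, se_simulated)
```

The two printed values should be close to each other, which is what the independence rule predicts: the variances add even though the quantity of interest is a difference.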
$endgroup$
edited 19 hours ago
answered 20 hours ago
Juan Esteban de la Calle