How to create a new column based on two other columns in Pandas?
I am looking for a way to create a new column in my data. I tried using iterrows(), but it was extremely slow on my dataset of 40 lakh (4 million) rows. Here is what I want.
I have two columns: EventID and TeamID. I want to add a new column containing the number of unique TeamID values under each EventID. In other words, I want the number of teams participating in each event as a new column.
machine-learning python pandas dataframe
asked Jan 3 at 16:56 by Arjun Chandra
Something like a groupby?
– Matthieu Brucher, Jan 3 at 16:59
2 Answers
You can try something like this to get a new dataframe that has one row per EventID with its TeamCount:
    # Count the distinct TeamID values within each EventID group.
    event_id_team_count = data.groupby('EventID').agg({'TeamID': 'nunique'})
    event_id_team_count.rename(columns={"TeamID": "TeamCount"}, inplace=True)
If you want this new column in the original dataframe, join the original dataframe with the one you have just created:
    # Look up each row's EventID in the index of the per-event counts.
    data = data.join(other=event_id_team_count, on="EventID")
answered Jan 3 at 17:32 by mskl
Thanks a lot... It worked!!!
– Arjun Chandra, Jan 4 at 6:25
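As a possible one-step variant of the groupby-and-join approach, the per-event count can be broadcast straight back onto the rows with `transform`. This is only a minimal sketch: the toy dataframe below is hypothetical, and only the column names `EventID` and `TeamID` come from the question.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the question.
data = pd.DataFrame({
    "EventID": ["A", "A", "A", "C", "C", "D"],
    "TeamID":  [1, 2, 2, 5, 6, 9],
})

# transform("nunique") computes the number of distinct TeamID values per
# EventID and returns a result aligned with the original rows, so no
# separate join step is needed.
data["TeamCount"] = data.groupby("EventID")["TeamID"].transform("nunique")

print(data)
```

Because the result keeps the original index, the assignment works even when the rows are not sorted by event.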
- Create a dictionary that maps each EventID to its number of unique TeamID values
    uCountDict = dict(data.groupby("EventID").TeamID.nunique())
    uCountDict
Sample output
    {'A': 4,
     'C': 3,
     'D': 2,
     'F': 1
    }
- Now create a new column holding the unique count for each row's EventID by mapping the dictionary onto the EventID column
    data["TeamCount"] = data.EventID.map(uCountDict)
answered 5 hours ago by Noor (new contributor)
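For the dictionary-style lookup, it matters whether rows or distinct teams are counted per event. The sketch below, on hypothetical toy data in which one team appears twice under the same event, contrasts `count()` with `nunique()` and then maps the distinct counts back onto the rows. The variable names `row_counts` and `team_counts` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data: team 2 appears twice under event A.
data = pd.DataFrame({
    "EventID": ["A", "A", "A", "C"],
    "TeamID":  [1, 2, 2, 5],
})

# count() returns the number of non-null rows per event (duplicates included).
row_counts = data.groupby("EventID")["TeamID"].count()

# nunique() returns the number of distinct teams per event.
team_counts = data.groupby("EventID")["TeamID"].nunique()

# Series.map accepts a Series (or a dict) and looks up each EventID in it.
data["TeamCount"] = data["EventID"].map(team_counts)

print(row_counts.to_dict())
print(team_counts.to_dict())
print(data)
```

When every (EventID, TeamID) pair occurs only once in the data, the two counts agree; they diverge as soon as a team is listed more than once for an event.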