After grouping to the minimum value in pandas, how do I display the entire matching row along with the min() value?
The dataframe contains
>> df
A B C
A
196512 196512 1325 12.9010511000000
196512 196512 114569 12.9267705000000
196512 196512 118910 12.8983353775637
196512 196512 100688 12.9505091000000
196795 196795 28978 12.7805170314276
196795 196795 34591 12.8994111000000
196795 196795 13078 12.9135746000000
196795 196795 24173 12.8769653100000
196341 196341 118910 12.8983353775637
196341 196341 100688 12.9505091000000
196641 196641 28972 12.7805170314276
196641 196641 34591 12.8994111000000
196346 196341 118910 12.8983353775637
196346 196341 100688 12.9505091000000
196646 196641 28980 12.7805170314276
196646 196641 34591 12.8994111000000
I tried to get the minimum value for each group and display it using the following code:
df.columns = ['a','b','c']
df.index = df.a.astype(str)
dd=df.groupby('a').min()['c']
It gives the result:
196512 12.7805170314276
196795 12.7805170314276
196341 12.7805170314276
196346 12.7805170314276
But after grouping, I want to get the row with the minimum 'c' value for each group of column 'a', and display that full matching row in the result, like:
196512 118910 12.8983353775637
196795 28978 12.7805170314276
196341 28972 12.7805170314276
196346 28980 12.7805170314276
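For reference, here is a sketch of why the attempted `groupby('a').min()` cannot return whole rows by itself: it reduces every column independently, so the values in one output row can come from different input rows. It uses only group 196512 from the data above, typed in by hand (a hand-copied subset, not the full frame):

```python
import pandas as pd

# Group 196512 from the question (hand-copied subset)
df = pd.DataFrame({
    'a': [196512, 196512, 196512, 196512],
    'b': [1325, 114569, 118910, 100688],
    'c': [12.9010511, 12.9267705, 12.8983353775637, 12.9505091],
})

# min() is taken column by column within each group
mins = df.groupby('a').min()
print(mins)
# 'b' is the smallest b (1325, from row 0), while 'c' is the smallest c
# (12.898..., from row 2), so this pair is not an actual row of df.
```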
python pandas dataframe
edited Aug 25 '18 at 18:06 by Stephen Rauch
asked Jan 5 '18 at 4:27 by Sam Joe
4 Answers
You can do this, though I doubt its efficiency.
>>> import pandas as pd
>>> df = pd.DataFrame({'a':[1,1,3,3],'b':[4,5,6,3], 'c':[1,2,3,5]})
>>> df
a b c
0 1 4 1
1 1 5 2
2 3 6 3
3 3 3 5
>>> df[df['c'].isin(df.groupby('a').min()['c'].values)]
a b c
0 1 4 1
2 3 6 3
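One caveat worth checking on your own data: `isin` compares each row against the minima of *all* groups, so a row can be selected because its 'c' happens to equal some *other* group's minimum. Below is a minimal sketch of a per-group comparison using `groupby(...).transform('min')` (a swapped-in technique, not part of the answer above), which broadcasts each group's minimum back onto that group's rows. The data is a hypothetical variant of the example, chosen to trigger the problem:

```python
import pandas as pd

# Row 1 (a=1, c=3) is NOT the minimum of group a=1,
# but 3 happens to be the minimum of group a=3.
df = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [4, 5, 6, 3], 'c': [1, 3, 3, 5]})

# isin-based filter: matches against the set of all group minima {1, 3}
by_isin = df[df['c'].isin(df.groupby('a')['c'].min().values)]

# transform-based filter: compares each row with its own group's minimum
group_min = df.groupby('a')['c'].transform('min')
by_transform = df[df['c'] == group_min]

print(by_isin.index.tolist())       # [0, 1, 2]
print(by_transform.index.tolist())  # [0, 2]
```

Unlike `idxmin`, this comparison keeps every row that ties for its group's minimum.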
answered Jan 5 '18 at 8:11 by Kiritee Gak
In case this helps anyone else, here is a solution that is more computationally efficient.
TL;DR version
If each row already has a unique index, then do this:
>>> df.loc[df.groupby('A')['C'].idxmin()]
If you've already indexed by 'A', then convert 'A' back into a column first.
>>> df2 = df.reset_index()
>>> df2.loc[df2.groupby('A')['C'].idxmin()]
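A self-contained sketch of both TL;DR cases, using a small subset of the question's data typed in by hand (an assumption; the answer itself reads the full data from a CSV). The second case simulates a frame that has already been indexed by 'A' via `set_index`:

```python
import pandas as pd

# A few rows from the question's data (hand-copied subset)
df = pd.DataFrame({
    'A': [196512, 196512, 196795, 196795, 196341, 196341],
    'B': [1325, 118910, 28978, 34591, 118910, 100688],
    'C': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# Case 1: default RangeIndex, so every row label is unique
rows = df.loc[df.groupby('A')['C'].idxmin()]

# Case 2: 'A' has been made the index; turn it back into a column first
indexed = df.set_index('A')
df2 = indexed.reset_index()
rows2 = df2.loc[df2.groupby('A')['C'].idxmin()]

print(rows)
```

Both cases select the same rows, one per value of 'A'.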
Step-by-step explanation:
Step 1.
First, make sure each row in your dataframe is uniquely indexed. This is the default when importing CSV data, e.g.:
>>> df = pd.read_csv('questionData.csv'); df
A B C
0 196512 1325 12.901051
1 196512 114569 12.926770
2 196512 118910 12.898335
3 196512 100688 12.950509
4 196795 28978 12.780517
5 196795 34591 12.899411
6 196795 13078 12.913575
7 196795 24173 12.876965
8 196341 118910 12.898335
9 196341 100688 12.950509
10 196641 28972 12.780517
11 196641 34591 12.899411
12 196346 118910 12.898335
13 196346 100688 12.950509
14 196646 28980 12.780517
15 196646 34591 12.899411
Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()
Step 2.
Use idxmin on the grouped 'C' column (SeriesGroupBy.idxmin) to retrieve the index label of the minimum in each group.
In words, the example below says: "group by 'A', then look only at the 'C' column of each group, and finally return the index label corresponding to the minimum 'C' in each group."
>>> indices = df.groupby('A')['C'].idxmin(); indices
A
196341 8
196346 12
196512 2
196641 10
196646 14
196795 4
Name: C, dtype: int64
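One behavior worth knowing: when several rows in a group tie for the minimum, `idxmin` returns only the first such label. A small sketch with made-up values (hypothetical, not from the question) to illustrate:

```python
import pandas as pd

# Hypothetical data: group 'x' has two rows (labels 1 and 2) tied at C = 1.0
df = pd.DataFrame({'A': ['x', 'x', 'x', 'y'],
                   'C': [2.0, 1.0, 1.0, 5.0]})

indices = df.groupby('A')['C'].idxmin()
print(indices.tolist())  # [1, 3]: only the first of the tied rows is returned
```

If you need every tied row, comparing 'C' against `df.groupby('A')['C'].transform('min')` is one way to keep them all.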
Step 3.
Finally, pass the retrieved indices to pandas.DataFrame.loc on the original dataframe to get the rows corresponding to the minimum value of 'C' in each group of 'A'.
>>> df.loc[indices]
A B C
8 196341 118910 12.898335
12 196346 118910 12.898335
2 196512 118910 12.898335
10 196641 28972 12.780517
14 196646 28980 12.780517
4 196795 28978 12.780517
Note: by default, groupby('A') sorts the groups by 'A', so 'indices' is sorted by 'A'. To restore the original row order, just do:
>>> df.loc[indices].sort_index()
A B C
2 196512 118910 12.898335
4 196795 28978 12.780517
8 196341 118910 12.898335
10 196641 28972 12.780517
12 196346 118910 12.898335
14 196646 28980 12.780517
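Alternatively, here is a sketch assuming you would rather skip the re-sort: `groupby` accepts `sort=False`, which keeps the groups in order of first appearance. Because each 'A' value in this data occupies a contiguous block of rows, that coincides with the original row order here (it need not in general, e.g. if groups are interleaved). The data is a hand-copied subset of the question's:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [196512, 196512, 196795, 196795, 196341, 196341],
    'C': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# Default: groups sorted by key, so rows come out ordered by 'A'
sorted_rows = df.loc[df.groupby('A')['C'].idxmin()]

# sort=False: groups in order of first appearance in df
unsorted_rows = df.loc[df.groupby('A', sort=False)['C'].idxmin()]

print(sorted_rows.index.tolist())    # [4, 1, 2]
print(unsorted_rows.index.tolist())  # [1, 2, 4]
```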
answered Apr 25 '18 at 22:03 by Bon Ryu (edited Apr 25 '18 at 22:09)
First, check your data: you can't get the value 12.780517 for index 196341. That value belongs to index 196641.
Now, to get column 'b' in your result, use pd.merge. Don't make column 'a' the index; leave it as a column.
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [196512, 196512, 196512, 196512, 196795, 196795, 196795, 196795, 196341, 196341, 196641, 196641, 196346, 196346, 196646, 196646], 'b': [1325, 114569, 118910, 100688, 28978, 34591, 13078, 24173, 118910, 100688, 28972, 34591, 118910, 100688, 28980, 34591], 'c': [12.9010511000000, 12.9267705000000, 12.8983353775637, 12.9505091000000, 12.7805170314276, 12.8994111000000, 12.9135746000000, 12.8769653100000, 12.8983353775637, 12.9505091000000, 12.7805170314276, 12.8994111000000, 12.8983353775637, 12.9505091000000, 12.7805170314276, 12.8994111000000]})
>>> df1 = df.groupby(['a'])['c'].min()
>>> df1
a
196341 12.898335
196346 12.898335
196512 12.898335
196641 12.780517
196646 12.780517
196795 12.780517
Name: c, dtype: float64
df1 is a Series here. Convert it to a DataFrame:
>>> df1 = pd.DataFrame(df1, columns = ['c'])
Also, turn the index of df1 into column 'a' and replace the index with a default integer range:
>>> df1['a'] = df1.index
>>> df1.index = range(df1.shape[0])
>>> df1
c a
0 12.898335 196341
1 12.898335 196346
2 12.898335 196512
3 12.780517 196641
4 12.780517 196646
5 12.780517 196795
Now merge df1 with df on both columns 'a' and 'c':
>>> pd.merge(df,df1, on = ['a','c'])
a b c
0 196512 118910 12.898335
1 196795 28978 12.780517
2 196341 118910 12.898335
3 196641 28972 12.780517
4 196346 118910 12.898335
5 196646 28980 12.780517
And you have your output as desired!
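The manual Series-to-DataFrame conversion above can be shortened. A sketch, on a hand-copied subset of the data, that passes `as_index=False` so the per-group minima come back as a DataFrame with 'a' as an ordinary column, then merges exactly as this answer does. Merging on the float column 'c' works here because the minima are copied verbatim from df and so compare exactly equal; if several rows tie for a group's minimum, all of them are kept:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [196512, 196512, 196795, 196795, 196341, 196341],
    'b': [1325, 118910, 28978, 34591, 118910, 100688],
    'c': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# Per-group minimum of 'c', with 'a' kept as a column (no index juggling)
df1 = df.groupby('a', as_index=False)['c'].min()

# Inner merge on both keys keeps only rows whose 'c' is their group's minimum
result = pd.merge(df, df1, on=['a', 'c'])
print(result)
```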
answered Jan 5 '18 at 7:40 by Ankit Seth (edited Jan 5 '18 at 8:40)
You can sort the DataFrame by column 'c' and then remove duplicates of 'a', so the first (smallest 'c') row of each group is kept. I think it's easier:
df.sort_values('c').drop_duplicates(['a'])
Credit to Rani from StackOverflow
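A sketch of this approach on a hand-copied subset of the question's data. `drop_duplicates` keeps the first row per 'a' by default; requesting the stable `mergesort` (an addition, not part of the one-liner above) means that among rows tied on 'c' within a group, the one that came first in the original frame is kept. A trailing `sort_index()` restores the original row order:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [196512, 196512, 196795, 196795, 196341, 196341],
    'b': [1325, 118910, 28978, 34591, 118910, 100688],
    'c': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# Sort ascending by 'c' (stable), then keep the first, i.e. smallest-'c', row per 'a'
result = (df.sort_values('c', kind='mergesort')
            .drop_duplicates(['a'])  # keep='first' is the default
            .sort_index())           # back to the original row order
print(result)
```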
answered yesterday by Noah