After grouping by minimum value in pandas, how do I display the entire matching row along with the min() value?

The dataframe contains

>> df

        A          B            C

A

196512  196512    1325  12.9010511000000

196512  196512  114569  12.9267705000000

196512  196512  118910  12.8983353775637

196512  196512  100688  12.9505091000000

196795  196795   28978  12.7805170314276

196795  196795   34591  12.8994111000000

196795  196795   13078  12.9135746000000

196795  196795   24173  12.8769653100000

196341  196341  118910  12.8983353775637

196341  196341  100688  12.9505091000000

196641  196641   28972  12.7805170314276

196641  196641   34591  12.8994111000000

196346  196341  118910  12.8983353775637

196346  196341  100688  12.9505091000000

196646  196641   28980  12.7805170314276

196646  196641   34591  12.8994111000000

I tried to get the minimum value for each group and display it using the following code:

df.columns = ['a','b','c']

df.index = df.a.astype(str)

dd=df.groupby('a').min()['c']

It gives the result:

196512    12.7805170314276

196795    12.7805170314276

196341    12.7805170314276

196346    12.7805170314276

But after grouping by column 'a', I want to get the row with the minimum 'c' value in each group and display that full matching row in the result, like this:

196512    118910      12.8983353775637  

196795     28978      12.7805170314276

196341     28972      12.7805170314276

196346     28980      12.7805170314276

edited Aug 25 '18 at 18:06

Stephen Rauch

1,51551129

asked Jan 5 '18 at 4:27

Sam Joe

38118

python pandas dataframe


4 Answers

You can do it like this, though I doubt it is efficient.



>> import pandas as pd

>> df = pd.DataFrame({'a':[1,1,3,3],'b':[4,5,6,3], 'c':[1,2,3,5]})

>> df

   a  b  c

0  1  4  1

1  1  5  2

2  3  6  3

3  3  3  5

>> df[df['c'].isin(df.groupby('a').min()['c'].values)]

   a  b  c

0  1  4  1

2  3  6  3
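
One caveat worth checking with the filter above: isin compares each 'c' against the minima of all groups, so a row can be kept because its value equals the minimum of a different group. The sketch below uses made-up data (not from the question) to show this, and compares each row against its own group's minimum via transform('min') as one possible alternative.

```python
import pandas as pd

# Hypothetical data: row 1 (group 1) has c == 3, which is the minimum of group 3.
df = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [4, 5, 6, 3], 'c': [1, 3, 3, 5]})

# isin-based filter: keeps any row whose 'c' equals the minimum of *any* group.
loose = df[df['c'].isin(df.groupby('a')['c'].min().values)]

# Per-group comparison: transform('min') broadcasts each group's minimum
# back onto the rows of that group, so every row is checked against its own group.
strict = df[df['c'] == df.groupby('a')['c'].transform('min')]

print(loose)
print(strict)
```

With this data the isin filter also returns row 1, while the per-group comparison returns only rows 0 and 2. Note that the per-group comparison keeps every tied row when a group has several rows at its minimum.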

answered Jan 5 '18 at 8:11

Kiritee Gak

1,2311420


In case this helps anyone else, here is a solution that is more computationally efficient.

TL;DR version

If each row already has a unique index, then do this:

>>> df.loc[df.groupby('A')['C'].idxmin()]

If you've already indexed by 'A', then convert 'A' back into a column first.

>>> df2 = df.reset_index()

>>> df2.loc[df2.groupby('A')['C'].idxmin()]
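
As a minimal sketch of the second case, assume a small frame (a four-row subset of the question's data, chosen here for illustration) that is already indexed by 'A'. Because idxmin returns index labels, the labels need to be unique for .loc to pick exactly one row per group, which is what reset_index() provides.

```python
import pandas as pd

# Illustrative subset of the question's data, already indexed by 'A'
# (so the index labels are duplicated within each group).
df = pd.DataFrame(
    {'B': [1325, 118910, 28978, 34591],
     'C': [12.9010511, 12.8983353775637, 12.7805170314276, 12.8994111]},
    index=pd.Index([196512, 196512, 196795, 196795], name='A'),
)

# reset_index() moves 'A' back into a column and gives each row a unique 0..n-1 label.
df2 = df.reset_index()

# idxmin() now returns one unique row label per group, which .loc can select directly.
result = df2.loc[df2.groupby('A')['C'].idxmin()]
print(result)
```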

Step by Step explanation:

Step 1.

First, make sure each row in your dataframe is uniquely indexed. This is the default when importing CSV data, e.g.:

    >>> df = pd.read_csv('questionData.csv'); df

        A       B       C

    0   196512  1325    12.901051

    1   196512  114569  12.926770

    2   196512  118910  12.898335

    3   196512  100688  12.950509

    4   196795  28978   12.780517

    5   196795  34591   12.899411

    6   196795  13078   12.913575

    7   196795  24173   12.876965

    8   196341  118910  12.898335

    9   196341  100688  12.950509

    10  196641  28972   12.780517

    11  196641  34591   12.899411

    12  196346  118910  12.898335

    13  196346  100688  12.950509

    14  196646  28980   12.780517

    15  196646  34591   12.899411

Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()

Step 2.

Use idxmin on the grouped 'C' column (the groupby counterpart of pandas.DataFrame.idxmin) to retrieve the index label of the minimum of each group.

The semantics of the example below are: "group by 'A', then look only at the 'C' column of each group, and finally return the index corresponding to the minimum 'C' in each group."

>>> indices = df.groupby('A')['C'].idxmin(); indices

A

196341     8

196346    12

196512     2

196641    10

196646    14

196795     4

Name: C, dtype: int64

Step 3.

Finally, pass the retrieved indices to pandas.DataFrame.loc on the original dataframe to get the rows corresponding to the minimum value of 'C' in each group of 'A'.

>>> df.loc[indices]

    A       B       C

8   196341  118910  12.898335

12  196346  118910  12.898335

2   196512  118910  12.898335

10  196641  28972   12.780517

14  196646  28980   12.780517

4   196795  28978   12.780517

Note: The groupby('A') operation returns groups sorted by A, so 'indices' is sorted by A. If we want the original row order, we just have to sort by index:

>>> df.loc[indices].sort_index()

    A       B       C

2   196512  118910  12.898335

4   196795  28978   12.780517

8   196341  118910  12.898335

10  196641  28972   12.780517

12  196346  118910  12.898335

14  196646  28980   12.780517
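
The steps above can be tried end to end without the CSV file. The following sketch rebuilds the question's 16 rows in memory (an assumption made here, since questionData.csv is not part of the post) and runs the same group, idxmin, loc sequence.

```python
import pandas as pd

# The question's 16 rows, rebuilt in memory as a stand-in for questionData.csv.
df = pd.DataFrame({
    'A': [196512] * 4 + [196795] * 4 + [196341] * 2
         + [196641] * 2 + [196346] * 2 + [196646] * 2,
    'B': [1325, 114569, 118910, 100688, 28978, 34591, 13078, 24173,
          118910, 100688, 28972, 34591, 118910, 100688, 28980, 34591],
    'C': [12.9010511, 12.9267705, 12.8983353775637, 12.9505091,
          12.7805170314276, 12.8994111, 12.9135746, 12.87696531,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111],
})

# idxmin() has to be called (note the parentheses); it returns one index label per group.
indices = df.groupby('A')['C'].idxmin()

# Select those rows, then sort by index to recover the original row order.
result = df.loc[indices].sort_index()
print(result)
```

If a group contains several rows tied for the minimum, idxmin returns only the first of them, so this approach yields exactly one row per group.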

edited Apr 25 '18 at 22:09

answered Apr 25 '18 at 22:03

Bon Ryu

11116


First, check your data: you can't get the value 12.780517 for index 196341. That value belongs to index 196641.

Now, to get column 'b' in your result, use pd.merge. Don't make column 'a' the index; leave it as a regular column.

>>> df = pd.DataFrame({'a':[196512, 196512, 196512, 196512, 196795, 196795, 196795, 196795, 196341, 196341, 196641, 196641, 196346, 196346, 196646, 196646],'b':[1325 , 114569 , 118910 , 100688 , 28978 ,34591 , 13078 ,  24173 , 118910 , 100688 , 28972 , 34591 , 118910 , 100688 , 28980 , 34591 ],'c':[12.9010511000000 ,12.9267705000000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ,12.9135746000000 ,12.8769653100000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ]})

>>> df1 = df.groupby(['a'])['c'].min()

>>> df1

a

196341    12.898335

196346    12.898335

196512    12.898335

196641    12.780517

196646    12.780517

196795    12.780517

Name: c, dtype: float64

df1 is a Series here. Convert it to a DataFrame.

>>> df1 = pd.DataFrame(df1, columns = ['c'])

Also, copy the index of df1 into a column 'a' and reset the index to a plain range.

>>> df1['a'] = df1.index

>>> df1.index = range(df1.shape[0])

>>> df1

       c       a

0  12.898335  196341

1  12.898335  196346

2  12.898335  196512

3  12.780517  196641

4  12.780517  196646

5  12.780517  196795

Now you have to join this df1 with df on both columns 'a' and 'c'.

>>> pd.merge(df,df1, on = ['a','c'])

    a       b          c

0  196512  118910  12.898335

1  196795   28978  12.780517

2  196341  118910  12.898335

3  196641   28972  12.780517

4  196346  118910  12.898335

5  196646   28980  12.780517

And you have your output as desired!
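
The same merge idea can probably be written more compactly. The sketch below is a variation, not the exact steps above: it assumes a six-row subset of the question's data and uses groupby(..., as_index=False) so that the per-group minima come back as a DataFrame with 'a' already a column.

```python
import pandas as pd

# Illustrative six-row subset of the question's data.
df = pd.DataFrame({
    'a': [196512, 196512, 196795, 196795, 196341, 196341],
    'b': [1325, 118910, 28978, 34591, 118910, 100688],
    'c': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# as_index=False keeps 'a' as a regular column, so no Series-to-DataFrame
# conversion or manual index handling is needed.
mins = df.groupby('a', as_index=False)['c'].min()

# The inner join on both columns keeps only the rows whose 'c' equals
# the minimum of their own group.
result = pd.merge(df, mins, on=['a', 'c'])
print(result)
```

Because the join matches on the value of 'c', a group with several rows tied at its minimum contributes all of them to the result.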

edited Jan 5 '18 at 8:40

answered Jan 5 '18 at 7:40

Ankit Seth

969218


You can sort the DataFrame by column 'c' and then drop duplicates in 'a', which keeps the first (smallest 'c') row of each group. I think it's easier:

df.sort_values('c').drop_duplicates(['a'])

Credit to Rani from StackOverflow
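
A minimal sketch of this approach on a six-row subset of the question's data (the subset is chosen here for illustration). drop_duplicates keeps the first occurrence by default, so after sorting by 'c' the surviving row in each group is one with the smallest 'c'. The result comes back ordered by 'c', so sort_index() is added to restore the original row order.

```python
import pandas as pd

# Illustrative six-row subset of the question's data.
df = pd.DataFrame({
    'a': [196512, 196512, 196795, 196795, 196341, 196341],
    'b': [1325, 118910, 28978, 34591, 118910, 100688],
    'c': [12.9010511, 12.8983353775637, 12.7805170314276,
          12.8994111, 12.8983353775637, 12.9505091],
})

# Sort ascending by 'c', then keep the first row seen for each value of 'a'.
result = df.sort_values('c').drop_duplicates(['a'])

# Optional: put the surviving rows back in their original order.
result_in_order = result.sort_index()
print(result_in_order)
```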

answered yesterday

Noah

New contributor

