After grouping by minimum value in pandas, how to display the entire matching row along with the min() value












4












$begingroup$


The dataframe contains:



>>> df
             A       B                 C
A
196512  196512    1325  12.9010511000000
196512  196512  114569  12.9267705000000
196512  196512  118910  12.8983353775637
196512  196512  100688  12.9505091000000
196795  196795   28978  12.7805170314276
196795  196795   34591  12.8994111000000
196795  196795   13078  12.9135746000000
196795  196795   24173  12.8769653100000
196341  196341  118910  12.8983353775637
196341  196341  100688  12.9505091000000
196641  196641   28972  12.7805170314276
196641  196641   34591  12.8994111000000
196346  196341  118910  12.8983353775637
196346  196341  100688  12.9505091000000
196646  196641   28980  12.7805170314276
196646  196641   34591  12.8994111000000


I tried to get the minimum value for each group and display it using the following code:



df.columns = ['a','b','c']
df.index = df.a.astype(str)
dd=df.groupby('a').min()['c']


It gives the result:



196512    12.7805170314276
196795    12.7805170314276
196341    12.7805170314276
196346    12.7805170314276


But after grouping, I want the full row that holds the minimum 'c' value within each group of column 'a' displayed in the result, for example:



196512    118910    12.8983353775637
196795     28978    12.7805170314276
196341     28972    12.7805170314276
196346     28980    12.7805170314276









$endgroup$

















      python pandas dataframe






      edited Aug 25 '18 at 18:06 by Stephen Rauch

      asked Jan 5 '18 at 4:27 by Sam Joe






















          4 Answers


















          3












          $begingroup$

          You can do this, although I doubt it is efficient.




          >>> import pandas as pd
          >>> df = pd.DataFrame({'a':[1,1,3,3],'b':[4,5,6,3], 'c':[1,2,3,5]})
          >>> df
          a b c
          0 1 4 1
          1 1 5 2
          2 3 6 3
          3 3 3 5
          >>> df[df['c'].isin(df.groupby('a').min()['c'].values)]
          a b c
          0 1 4 1
          2 3 6 3
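One caveat with the isin filter, offered as a sketch rather than a correction of the answer: it compares each 'c' value against the minima of all groups at once, so a row can match the minimum of a different group. The toy frame below is a hypothetical variation of the one above (row 1 now has c = 3), and the transform('min') comparison is a swapped-in alternative, not part of the original answer.

```python
import pandas as pd

# Hypothetical variation of the toy frame: row 1 (group a=1) has c=3,
# which happens to equal the minimum of group a=3.
df = pd.DataFrame({'a': [1, 1, 3, 3], 'b': [4, 5, 6, 3], 'c': [1, 3, 3, 5]})

# isin approach: tests membership in the set of ALL group minima {1, 3}.
global_match = df[df['c'].isin(df.groupby('a')['c'].min())]

# Group-aware alternative: broadcast each group's minimum back onto its
# own rows, then compare row by row.
group_min = df.groupby('a')['c'].transform('min')
per_group = df[df['c'] == group_min]

print(global_match.index.tolist())  # [0, 1, 2] -> row 1 is a false positive
print(per_group.index.tolist())     # [0, 2]
```

If the minima of different groups can never coincide with other values, both filters return the same rows; otherwise the group-aware comparison is the safer of the two.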





          $endgroup$





















            11












            $begingroup$

            In case this helps anyone else, here is a solution that is more computationally efficient.



            TL;DR version



            If each row already has a unique index, then do this:



            >>> df.loc[df.groupby('A')['C'].idxmin()]


            If you've already indexed by 'A', then convert 'A' back into a column first.



            >>> df2 = df.reset_index()
            >>> df2.loc[df2.groupby('A')['C'].idxmin()]




            Step by Step explanation:



            Step 1.



            First, make sure each row in your dataframe is uniquely indexed. This is the default when importing CSV data, e.g.:



            >>> df = pd.read_csv('questionData.csv'); df
            A B C
            0 196512 1325 12.901051
            1 196512 114569 12.926770
            2 196512 118910 12.898335
            3 196512 100688 12.950509
            4 196795 28978 12.780517
            5 196795 34591 12.899411
            6 196795 13078 12.913575
            7 196795 24173 12.876965
            8 196341 118910 12.898335
            9 196341 100688 12.950509
            10 196641 28972 12.780517
            11 196641 34591 12.899411
            12 196346 118910 12.898335
            13 196346 100688 12.950509
            14 196646 28980 12.780517
            15 196646 34591 12.899411


            Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()



            Step 2.



            Use the idxmin method of the grouped column (the groupby version of pandas.Series.idxmin) to retrieve the index label of the minimum of each group.



            The example below reads: "group by 'A', then look only at the 'C' column of each group, and finally return the index label of the minimum 'C' in each group."



            >>> indices = df.groupby('A')['C'].idxmin(); indices
            A
            196341 8
            196346 12
            196512 2
            196641 10
            196646 14
            196795 4
            Name: C, dtype: int64
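When a group contains its minimum more than once, idxmin returns the label of the first occurrence, so exactly one row per group is selected. A minimal sketch with a made-up frame (the values are hypothetical, chosen only to produce ties):

```python
import pandas as pd

# Hypothetical data with ties: group 1 has two rows with C == 0.2,
# group 2 has two rows with C == 0.9.
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'C': [0.5, 0.2, 0.2, 0.9, 0.9]})

# One index label per group: the first row holding the group's minimum.
indices = df.groupby('A')['C'].idxmin()
print(indices.tolist())  # [1, 3]
```

If every tied row is wanted instead, comparing 'C' against groupby('A')['C'].transform('min') keeps all of them.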


            Step 3.



            Finally, pass the retrieved indices to pandas.DataFrame.loc on the original dataframe to get the rows corresponding to the minimum value of 'C' in each group of 'A'.



            >>> df.loc[indices]
            A B C
            8 196341 118910 12.898335
            12 196346 118910 12.898335
            2 196512 118910 12.898335
            10 196641 28972 12.780517
            14 196646 28980 12.780517
            4 196795 28978 12.780517


            Note: The groupby('A') operation returns groups sorted by A, so 'indices' is sorted by A as well. To restore the original row order, sort the result by index:



            >>> df.loc[indices].sort_index()
            A B C
            2 196512 118910 12.898335
            4 196795 28978 12.780517
            8 196341 118910 12.898335
            10 196641 28972 12.780517
            12 196346 118910 12.898335
            14 196646 28980 12.780517
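The three steps can be run end to end as below. This is a sketch that builds the frame inline from the values posted in the question instead of reading 'questionData.csv', which is not available here.

```python
import pandas as pd

# Data copied from the question, with a default unique integer index.
df = pd.DataFrame({
    'A': [196512, 196512, 196512, 196512, 196795, 196795, 196795, 196795,
          196341, 196341, 196641, 196641, 196346, 196346, 196646, 196646],
    'B': [1325, 114569, 118910, 100688, 28978, 34591, 13078, 24173,
          118910, 100688, 28972, 34591, 118910, 100688, 28980, 34591],
    'C': [12.9010511, 12.9267705, 12.8983353775637, 12.9505091,
          12.7805170314276, 12.8994111, 12.9135746, 12.87696531,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111],
})

# Step 2: index label of the minimum 'C' within each 'A' group.
indices = df.groupby('A')['C'].idxmin()

# Step 3: full rows for those labels (ordered by group key).
by_group = df.loc[indices]

# Optional: restore the original row order.
original_order = by_group.sort_index()
print(original_order)
```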





            $endgroup$





















              2












              $begingroup$

              First, check your data. You can't get the value 12.780517 for index 196341; that value belongs to index 196641.



              Now, to get column 'b' in your result, use pd.merge. Don't set column 'a' as the index; leave it as a regular column.



              >>> import pandas as pd
              >>> df = pd.DataFrame({
              ...     'a': [196512, 196512, 196512, 196512, 196795, 196795, 196795, 196795,
              ...           196341, 196341, 196641, 196641, 196346, 196346, 196646, 196646],
              ...     'b': [1325, 114569, 118910, 100688, 28978, 34591, 13078, 24173,
              ...           118910, 100688, 28972, 34591, 118910, 100688, 28980, 34591],
              ...     'c': [12.9010511000000, 12.9267705000000, 12.8983353775637, 12.9505091000000,
              ...           12.7805170314276, 12.8994111000000, 12.9135746000000, 12.8769653100000,
              ...           12.8983353775637, 12.9505091000000, 12.7805170314276, 12.8994111000000,
              ...           12.8983353775637, 12.9505091000000, 12.7805170314276, 12.8994111000000],
              ... })
              >>> df1 = df.groupby(['a'])['c'].min()
              >>> df1
              a
              196341 12.898335
              196346 12.898335
              196512 12.898335
              196641 12.780517
              196646 12.780517
              196795 12.780517
              Name: c, dtype: float64


              df1 is a Series here. Convert it to a DataFrame.



              >>> df1 = pd.DataFrame(df1, columns = ['c'])


              Also, copy the index of df1 into a column 'a' and replace the index with a default integer range.



              >>> df1['a'] = df1.index
              >>> df1.index = range(df1.shape[0])
              >>> df1
              c a
              0 12.898335 196341
              1 12.898335 196346
              2 12.898335 196512
              3 12.780517 196641
              4 12.780517 196646
              5 12.780517 196795


              Now join df1 with df on both columns 'a' and 'c'.



              >>> pd.merge(df,df1, on = ['a','c'])
              a b c
              0 196512 118910 12.898335
              1 196795 28978 12.780517
              2 196341 118910 12.898335
              3 196641 28972 12.780517
              4 196346 118910 12.898335
              5 196646 28980 12.780517


              And you have your output as desired!
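The same merge can be written more compactly. The sketch below is a variation on the steps above, not the answer's exact code: as_index=False keeps 'a' as a regular column, so the Series-to-DataFrame conversion and manual index handling are not needed. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: two groups in column 'a'.
df = pd.DataFrame({'a': [1, 1, 3, 3],
                   'b': [4, 5, 6, 3],
                   'c': [1.0, 2.0, 3.0, 5.0]})

# Per-group minimum of 'c' as a DataFrame with columns ['a', 'c'].
mins = df.groupby('a', as_index=False)['c'].min()

# Inner join on both keys keeps only the rows holding their group's minimum.
result = df.merge(mins, on=['a', 'c'])
print(result)
```

Unlike idxmin, the merge returns every row that ties for the minimum within a group, which may or may not be what is wanted.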






              $endgroup$





















                0












                $begingroup$

                You can sort the DataFrame by column 'c' and then drop duplicates on 'a', which keeps the first (smallest 'c') row of each group. I think it's easier:



                df.sort_values('c').drop_duplicates(['a'])


                Credit to Rani from StackOverflow
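A small runnable sketch of the sort-then-deduplicate idea, using a hypothetical toy frame. drop_duplicates keeps the first occurrence by default, which after sorting by 'c' is the row with the smallest 'c' in each group; the trailing sort_index() is an optional addition that restores the original row order.

```python
import pandas as pd

# Hypothetical toy data, deliberately not pre-sorted by 'c'.
df = pd.DataFrame({'a': [1, 1, 3, 3],
                   'b': [4, 5, 6, 3],
                   'c': [2, 1, 5, 3]})

# Sort ascending by 'c', keep the first row seen for each 'a',
# then put the surviving rows back in their original order.
result = df.sort_values('c').drop_duplicates(['a']).sort_index()
print(result)
```

When several rows of a group share the minimum, which one survives depends on the sort; passing kind='stable' to sort_values keeps ties in their original order.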






                answered by Noah (new contributor)
                $endgroup$













The first answer (score 3, the isin approach) was posted Jan 5 '18 at 8:11 by Kiritee Gak.























                          11












                          $begingroup$

                          In case this can help anyone else. Here is a solution that is more computationally efficient.



                          TL;DR version



                          If each row already has a unique index, then do this:



                          >>> df.loc[df.groupby('A')['C'].idxmin()]


                          If you've already indexed by 'A', then convert 'A' back into a column first.



                          >>> df2 = df.reset_index()
                          >>> df2.loc[df2.groupby('A')['C'].idxmin()]




                          Step by Step explanation:



                          Step 1.



                          First, make sure each row in your dataframe is uniquely indexed. This is the default when importing csv data. e.g.



                              >>> df = pd.read_csv('questionData.csv'); df
                          A B C
                          0 196512 1325 12.901051
                          1 196512 114569 12.926770
                          2 196512 118910 12.898335
                          3 196512 100688 12.950509
                          4 196795 28978 12.780517
                          5 196795 34591 12.899411
                          6 196795 13078 12.913575
                          7 196795 24173 12.876965
                          8 196341 118910 12.898335
                          9 196341 100688 12.950509
                          10 196641 28972 12.780517
                          11 196641 34591 12.899411
                          12 196346 118910 12.898335
                          13 196346 100688 12.950509
                          14 196646 28980 12.780517
                          15 196646 34591 12.899411


                          Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()



                          Step 2.



                          Use the pandas.DataFrame.idxmin function to retrieve the indices of the minimum of each group.



                          The semantics of the example below is this: "group by 'A', then just look at the 'C' column of each group, and finally return the index corresponding to the minimum 'C' in each group.



                          >>> indices = df.groupby('A')['C'].idxmin; indices
                          A
                          196341 8
                          196346 12
                          196512 2
                          196641 10
                          196646 14
                          196795 4
                          Name: C, dtype: int64


                          Step 3.



                          Finally, use the retrieved indices in the original dataframe using pandas.DataFrame.loc to get the rows of the original dataframe correponding to the minimum values of 'C' in each group that was grouped by 'A'.



                          >>> df.loc[indices]
                          A B C
                          8 196341 118910 12.898335
                          12 196346 118910 12.898335
                          2 196512 118910 12.898335
                          10 196641 28972 12.780517
                          14 196646 28980 12.780517
                          4 196795 28978 12.780517


                          Note: The groupby('A') operation returns groups sorted by A. Thus 'indices' is sorted by A. If we want the original order, we just have to do



                          >>> df.loc[indices].sort_index()
                          A B C
                          2 196512 118910 12.898335
                          4 196795 28978 12.780517
                          8 196341 118910 12.898335
                          10 196641 28972 12.780517
                          12 196346 118910 12.898335
                          14 196646 28980 12.780517





                          share|improve this answer











                          $endgroup$


















                            11












                            $begingroup$

                            In case this can help anyone else. Here is a solution that is more computationally efficient.



                            TL;DR version



                            If each row already has a unique index, then do this:



                            >>> df.loc[df.groupby('A')['C'].idxmin()]


                            If you've already indexed by 'A', then convert 'A' back into a column first.



                            >>> df2 = df.reset_index()
                            >>> df2.loc[df2.groupby('A')['C'].idxmin()]




                            Step by Step explanation:



                            Step 1.



                            First, make sure each row in your dataframe is uniquely indexed. This is the default when importing csv data. e.g.



                                >>> df = pd.read_csv('questionData.csv'); df
                            A B C
                            0 196512 1325 12.901051
                            1 196512 114569 12.926770
                            2 196512 118910 12.898335
                            3 196512 100688 12.950509
                            4 196795 28978 12.780517
                            5 196795 34591 12.899411
                            6 196795 13078 12.913575
                            7 196795 24173 12.876965
                            8 196341 118910 12.898335
                            9 196341 100688 12.950509
                            10 196641 28972 12.780517
                            11 196641 34591 12.899411
                            12 196346 118910 12.898335
                            13 196346 100688 12.950509
                            14 196646 28980 12.780517
                            15 196646 34591 12.899411


                            Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()



                            Step 2.



                            Use the pandas.DataFrame.idxmin function to retrieve the indices of the minimum of each group.



                            The semantics of the example below is this: "group by 'A', then just look at the 'C' column of each group, and finally return the index corresponding to the minimum 'C' in each group.



                            >>> indices = df.groupby('A')['C'].idxmin; indices
                            A
                            196341 8
                            196346 12
                            196512 2
                            196641 10
                            196646 14
                            196795 4
                            Name: C, dtype: int64


                            Step 3.



                            Finally, use the retrieved indices in the original dataframe using pandas.DataFrame.loc to get the rows of the original dataframe correponding to the minimum values of 'C' in each group that was grouped by 'A'.



                            >>> df.loc[indices]
                            A B C
                            8 196341 118910 12.898335
                            12 196346 118910 12.898335
                            2 196512 118910 12.898335
                            10 196641 28972 12.780517
                            14 196646 28980 12.780517
                            4 196795 28978 12.780517


                            Note: The groupby('A') operation returns groups sorted by A. Thus 'indices' is sorted by A. If we want the original order, we just have to do



                            >>> df.loc[indices].sort_index()
                            A B C
                            2 196512 118910 12.898335
                            4 196795 28978 12.780517
                            8 196341 118910 12.898335
                            10 196641 28972 12.780517
                            12 196346 118910 12.898335
                            14 196646 28980 12.780517





                            share|improve this answer











                            $endgroup$
















                              11












                              11








                              11





                              $begingroup$

                              In case this can help anyone else. Here is a solution that is more computationally efficient.



                              TL;DR version



                              If each row already has a unique index, then do this:



                              >>> df.loc[df.groupby('A')['C'].idxmin()]


                              If you've already indexed by 'A', then convert 'A' back into a column first.



                              >>> df2 = df.reset_index()
                              >>> df2.loc[df2.groupby('A')['C'].idxmin()]




                              Step by Step explanation:



                              Step 1.



                              First, make sure each row in your dataframe is uniquely indexed. This is the default when importing csv data. e.g.



                                  >>> df = pd.read_csv('questionData.csv'); df
                              A B C
                              0 196512 1325 12.901051
                              1 196512 114569 12.926770
                              2 196512 118910 12.898335
                              3 196512 100688 12.950509
                              4 196795 28978 12.780517
                              5 196795 34591 12.899411
                              6 196795 13078 12.913575
                              7 196795 24173 12.876965
                              8 196341 118910 12.898335
                              9 196341 100688 12.950509
                              10 196641 28972 12.780517
                              11 196641 34591 12.899411
                              12 196346 118910 12.898335
                              13 196346 100688 12.950509
                              14 196646 28980 12.780517
                              15 196646 34591 12.899411


                              Aside: If you already converted column 'A' into an index, then you can turn the index back into a column (https://stackoverflow.com/questions/20461165/how-to-convert-pandas-index-in-a-dataframe-to-a-column) by doing: df.reset_index()



                              Step 2.



                              Use the pandas.DataFrame.idxmin function to retrieve the indices of the minimum of each group.



                              The semantics of the example below is this: "group by 'A', then just look at the 'C' column of each group, and finally return the index corresponding to the minimum 'C' in each group.



                              >>> indices = df.groupby('A')['C'].idxmin; indices
                              A
                              196341 8
                              196346 12
                              196512 2
                              196641 10
                              196646 14
                              196795 4
                              Name: C, dtype: int64


                              Step 3.



                              Finally, use the retrieved indices in the original dataframe using pandas.DataFrame.loc to get the rows of the original dataframe correponding to the minimum values of 'C' in each group that was grouped by 'A'.



                              >>> df.loc[indices]
                              A B C
                              8 196341 118910 12.898335
                              12 196346 118910 12.898335
                              2 196512 118910 12.898335
                              10 196641 28972 12.780517
                              14 196646 28980 12.780517
                              4 196795 28978 12.780517


                              Note: The groupby('A') operation returns groups sorted by A. Thus 'indices' is sorted by A. If we want the original order, we just have to do



                              >>> df.loc[indices].sort_index()
                              A B C
                              2 196512 118910 12.898335
                              4 196795 28978 12.780517
                              8 196341 118910 12.898335
                              10 196641 28972 12.780517
                              12 196346 118910 12.898335
                              14 196646 28980 12.780517
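The three steps above can be put together as a self-contained sketch. It rebuilds the question's data inline rather than reading 'questionData.csv' (that inline construction is an assumption made here so the example runs on its own); the variable name `result` is also introduced only for illustration.

```python
import pandas as pd

# The question's data, built inline so the example needs no CSV file.
df = pd.DataFrame({
    "A": [196512] * 4 + [196795] * 4 + [196341] * 2
         + [196641] * 2 + [196346] * 2 + [196646] * 2,
    "B": [1325, 114569, 118910, 100688, 28978, 34591, 13078, 24173,
          118910, 100688, 28972, 34591, 118910, 100688, 28980, 34591],
    "C": [12.9010511, 12.9267705, 12.8983353775637, 12.9505091,
          12.7805170314276, 12.8994111, 12.9135746, 12.87696531,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111,
          12.8983353775637, 12.9505091, 12.7805170314276, 12.8994111],
})

# Step 2: index label of the minimum 'C' within each 'A' group.
indices = df.groupby("A")["C"].idxmin()

# Step 3: select those rows, then restore the original row order.
result = df.loc[indices].sort_index()
print(result)
```

Because the default index is unique, each label in `indices` selects exactly one row.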
















                              $endgroup$



                              edited Apr 25 '18 at 22:09

























                              answered Apr 25 '18 at 22:03









                              Bon Ryu
























                                  2












                                  $begingroup$

                                  First, check your data: group 196341 cannot have a minimum of 12.780517. That value belongs to group 196641; the minimum for 196341 is 12.898335.



                                  Next, to carry column 'b' into the result, use pd.merge. Do not set column 'a' as the index; leave it as a regular column.



                                  >>> import pandas as pd
                                  >>> df = pd.DataFrame({'a':[196512, 196512, 196512, 196512, 196795, 196795, 196795, 196795, 196341, 196341, 196641, 196641, 196346, 196346, 196646, 196646],'b':[1325 , 114569 , 118910 , 100688 , 28978 ,34591 , 13078 ,  24173 , 118910 , 100688 , 28972 , 34591 , 118910 , 100688 , 28980 , 34591 ],'c':[12.9010511000000 ,12.9267705000000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ,12.9135746000000 ,12.8769653100000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ,12.8983353775637 ,12.9505091000000 ,12.7805170314276 ,12.8994111000000 ]})
                                  >>> df1 = df.groupby(['a'])['c'].min()
                                  >>> df1
                                  a
                                  196341 12.898335
                                  196346 12.898335
                                  196512 12.898335
                                  196641 12.780517
                                  196646 12.780517
                                  196795 12.780517
                                  Name: c, dtype: float64


                                  df1 is a Series here. Convert it to a DataFrame.



                                  >>> df1 = pd.DataFrame(df1, columns = ['c'])


                                  Also, copy the index of df1 into a column 'a' and replace the index with a plain 0..n-1 range.



                                  >>> df1['a'] = df1.index
                                  >>> df1.index = range(df1.shape[0])
                                  >>> df1
                                  c a
                                  0 12.898335 196341
                                  1 12.898335 196346
                                  2 12.898335 196512
                                  3 12.780517 196641
                                  4 12.780517 196646
                                  5 12.780517 196795


                                  Now join df1 with df on both columns 'a' and 'c'.



                                  >>> pd.merge(df,df1, on = ['a','c'])
                                  a b c
                                  0 196512 118910 12.898335
                                  1 196795 28978 12.780517
                                  2 196341 118910 12.898335
                                  3 196641 28972 12.780517
                                  4 196346 118910 12.898335
                                  5 196646 28980 12.780517


                                  And you have your output as desired!
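One behaviour of the merge approach is worth seeing on a small case: when several rows tie for a group's minimum, the join returns all of them, whereas idxmin returns only the first. The sketch below uses made-up data with a deliberate tie. The names `mins`, `via_merge` and `via_idxmin` are introduced here for illustration, and `groupby(..., as_index=False)` is used as a shorter way to get the two-column frame that the steps above build by hand.

```python
import pandas as pd

# Hypothetical data with a tie: group 1 reaches its minimum 'c' twice.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": [10, 11, 12, 20, 21],
    "c": [0.5, 0.5, 0.9, 0.7, 0.3],
})

# Per-group minimum as a frame with columns 'a' and 'c'.
mins = df.groupby("a", as_index=False)["c"].min()

# Inner join on both keys keeps every row whose 'c' equals its group's minimum.
via_merge = pd.merge(df, mins, on=["a", "c"])

# idxmin returns one label per group (the first occurrence of the minimum).
via_idxmin = df.loc[df.groupby("a")["c"].idxmin()]

print(via_merge)
print(via_idxmin)
```

Which behaviour is preferable depends on whether tied rows should all be reported or only one row per group is wanted.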

















                                  $endgroup$


















                                      edited Jan 5 '18 at 8:40

























                                      answered Jan 5 '18 at 7:40









                                      Ankit Seth
























                                          0












                                          $begingroup$

                                          You can sort the DataFrame by column 'c' and then drop duplicates on column 'a', which keeps the first (smallest 'c') row for each value of 'a'. I think it's easier:



                                          df.sort_values('c').drop_duplicates(['a'])


                                          Credit to Rani from StackOverflow
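A minimal sketch of this one-liner on a shortened, hypothetical slice of the question's data is shown below. It relies on drop_duplicates keeping the first occurrence by default; the final sort_index() is an optional extra added here to restore the original row order.

```python
import pandas as pd

# Hypothetical slice of the question's data (two groups, two rows each).
df = pd.DataFrame({
    "a": [196512, 196512, 196795, 196795],
    "b": [1325, 118910, 28978, 34591],
    "c": [12.901051, 12.898335, 12.780517, 12.899411],
})

# Sort ascending by 'c', then keep the first row seen for each 'a'
# (keep='first' is the default for drop_duplicates).
result = df.sort_values("c").drop_duplicates(["a"])

# Optional: put the surviving rows back in their original order.
result = result.sort_index()
print(result)
```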












                                          $endgroup$


















                                              answered yesterday









                                              Noah






























