Python 3 pandas.groupby.filter












I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter



    >>> df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
    ...                           'foo', 'bar'],
    ...                    'B' : [1, 2, 3, 4, 5, 6],
    ...                    'C' : [2.0, 5., 8., 1., 2., 9.]})
    >>> grouped = df.groupby('A')
    >>> grouped.filter(lambda x: x['B'].mean() > 3.)
         A  B    C
    1  bar  2  5.0
    3  bar  4  1.0
    5  bar  6  9.0


I am trying to return a DataFrame that has all 3 columns, but only 2 rows. Those 2 rows contain the minimum values of column B, after grouping by column A. I tried the following line of code:



    grouped.filter(lambda x: x['B'] == x['B'].min())

but it doesn't work and I get this error:

    TypeError: filter function returned a Series, but expected a scalar bool

The DataFrame I am trying to return should look like this:

         A  B    C
    0  foo  1  2.0
    1  bar  2  5.0

I would appreciate any help you can provide. Thank you in advance.







python pandas dataframe













edited 1 hour ago by ALollz
asked 2 hours ago by FinProg













The docstring wording can seem a bit ambiguous: "Return a copy of a DataFrame excluding elements from groups that do not satisfy..." You aren't excluding individual elements from within groups; you are excluding from the DataFrame all elements of any group that does not satisfy the single condition.

– ALollz, 1 hour ago












4 Answers
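    # For each group of A, pick the row where B is smallest and keep columns B and C;
    # reset_index() then turns the group key A back into a regular column.
    df.groupby('A').apply(lambda x: x.loc[x['B'].idxmin(), ['B','C']]).reset_index()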






answered 2 hours ago by kudeh
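A possible variation on the idxmin idea, sketched under the assumption that whole rows with their original dtypes are wanted. Selecting one row at a time inside apply returns each row as a Series, which may upcast B to float; looking the idxmin labels up with df.loc avoids that. The names min_labels and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5., 8., 1., 2., 9.]})

# Index label of the row holding the smallest B within each group of A.
min_labels = df.groupby('A')['B'].idxmin()

# Look those labels up in the original frame, then restore the original row order.
result = df.loc[min_labels].sort_index()
print(result)
```

Because idxmin returns a single label per group, this keeps exactly one row per group even when the minimum is tied.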

























No need for groupby :-)

    df.sort_values('B').drop_duplicates('A')
    Out[288]:
         A  B    C
    0  foo  1  2.0
    1  bar  2  5.0






answered 1 hour ago by Wen-Ben
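A small sketch of how the sort-and-deduplicate approach behaves when the minimum is tied, using made-up data (the frame tied is hypothetical and not from the question). drop_duplicates keeps only the first row per value of A, so exactly one row per group survives. A stable sort (kind='mergesort') is assumed here so that the earliest of the tied rows is the one kept.

```python
import pandas as pd

# Hypothetical data: group 'foo' has two rows sharing the minimum B == 1.
tied = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar'],
                     'B': [1, 1, 3, 2, 4],
                     'C': [2.0, 8.0, 5.0, 1.0, 9.0]})

# Sort by B with a stable algorithm, then keep the first row seen for each A.
one_per_group = tied.sort_values('B', kind='mergesort').drop_duplicates('A')
print(one_per_group)
```

If every tied row is needed instead, a Boolean mask against the per-group minimum is the more suitable tool.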























There's a fundamental difference: in the documentation example, there is a single Boolean value per group. That is, you return the entire group if the mean is greater than 3. In your example, you want to filter specific rows within a group.

For your task the usual trick is to sort the values and use .head or .tail to keep the row with the smallest or largest value, respectively:

    df.sort_values('B').groupby('A').head(1)

    #     A  B    C
    #0  foo  1  2.0
    #1  bar  2  5.0

For more complicated queries you can use .transform or .apply to create a Boolean Series to slice with. This is also the safer option if multiple rows share the minimum and you need all of them:

    df[df.groupby('A').B.transform(lambda x: x == x.min())]

    #     A  B    C
    #0  foo  1  2.0
    #1  bar  2  5.0






edited 1 hour ago; answered 1 hour ago by ALollz
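To illustrate the remark about tied minima, here is a sketch on hypothetical data (the frame tied is invented for the example). It uses transform('min'), a built-in aggregation name, in place of the lambda; the two should select the same rows.

```python
import pandas as pd

# Hypothetical data: group 'foo' has two rows sharing the minimum B == 1.
tied = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar'],
                     'B': [1, 1, 3, 2, 4],
                     'C': [2.0, 8.0, 5.0, 1.0, 9.0]})

# Broadcast each group's minimum of B back onto every row of that group,
# then keep the rows whose own B equals that minimum.
mask = tied['B'] == tied.groupby('A')['B'].transform('min')
all_minima = tied[mask]
print(all_minima)
```

Both tied 'foo' rows are kept here, whereas sorting and taking head(1) would return only one of them.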























The short answer:

    grouped.apply(lambda x: x[x['B'] == x['B'].min()])

... and the longer one:

Your grouped object has 2 groups:

    In[25]: for item in grouped:
       ...:     print(item)
       ...:
    ('bar',
         A  B    C
    1  bar  2  5.0
    3  bar  4  1.0
    5  bar  6  9.0)

    ('foo',
         A  B    C
    0  foo  1  2.0
    2  foo  3  8.0
    4  foo  5  2.0)

The filter() method of a GroupBy object filters groups as whole entities, NOT their individual rows. So with filter() you can obtain only 4 results:

• an empty DataFrame (0 rows),

• the rows of the group 'bar' (3 rows),

• the rows of the group 'foo' (3 rows),

• the rows of both groups (6 rows).

Nothing else, regardless of the Boolean function you pass to filter().
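The point above can be checked with a minimal sketch on the question's data. The variable names are illustrative, and the constant predicates are only there to show the two extreme outcomes.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5., 8., 1., 2., 9.]})
grouped = df.groupby('A')

# One Boolean per group: mean(B) is 4.0 for 'bar' and 3.0 for 'foo',
# so only the three 'bar' rows are kept.
kept = grouped.filter(lambda g: g['B'].mean() > 3.)

# Constant predicates keep every group or no group at all.
everything = grouped.filter(lambda g: True)
nothing = grouped.filter(lambda g: False)

# A row-wise comparison returns a Series per group, which filter() rejects.
raised = False
try:
    grouped.filter(lambda g: g['B'] == g['B'].min())
except TypeError as err:
    raised = True
    print(err)

print(len(kept), len(everything), len(nothing))
```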





                                      So you have to use some other method. An appropriate one is the very flexible apply() method, which lets you apply an arbitrary function which




                                      • takes a DataFrame (a group of GroupBy object) as its only parameter,

                                      • returns either a Pandas object or a scalar.


                                      In your case that function should return (for every of your 2 groups) the 1-row DataFrame having the minimal value in the column 'B', so we will use the Boolean mask



                                      group['B'] == group['B'].min()


                                      to select that row (or possibly several rows, if the minimum occurs more than once):



                                      In[26]: def select_min_b(group):
                                      ...:     return group[group['B'] == group['B'].min()]
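The "several rows" case is easy to check on its own. This is a small sketch with a hypothetical single group (`tied`, not part of the question's data) in which the minimum of 'B' occurs twice, so the mask keeps both rows:

```python
import pandas as pd

def select_min_b(group):
    # Keep every row whose B equals the smallest B in this group.
    return group[group['B'] == group['B'].min()]

# Hypothetical group in which the minimum value 1 is tied.
tied = pd.DataFrame({'A': ['foo', 'foo', 'foo'],
                     'B': [1, 1, 5],
                     'C': [2.0, 8.0, 2.0]})

picked = select_min_b(tied)
print(picked)
```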


                                      Now, passing this function to the apply() method of the GroupBy object grouped, we obtain



                                      In[27]: grouped.apply(select_min_b)
                                      Out[27]:
                                               A  B    C
                                      A
                                      bar 1  bar  2  5.0
                                      foo 0  foo  1  2.0




                                      Note:

                                      The same result as a single command (using a lambda function):

                                      grouped.apply(lambda group: group[group['B'] == group['B'].min()])
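The apply() result carries the group key as an extra index level, whereas the question asks for a flat frame with the original row labels. One way to get that shape is sketched below. It uses `transform('min')` rather than the apply() approach from this answer, and the names `group_min` and `result` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

# transform('min') broadcasts each group's minimum of B back onto that group's rows,
# giving a Series aligned with df that can be compared row by row.
group_min = df.groupby('A')['B'].transform('min')
result = df[df['B'] == group_min]

print(result)
#      A  B    C
# 0  foo  1  2.0
# 1  bar  2  5.0
```

Like the Boolean mask in select_min_b, this keeps every tied row when a group's minimum occurs more than once.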














                                      edited 12 mins ago

























                                      answered 1 hour ago









                                      MarianD





























