Numpy array from pandas dataframe












I am new to using Python for data science.

What is the difference between selecting a column with df['name'].values, df.iloc[:,1].values, and df.iloc[:,1:2].values? They return different types of NumPy vectors. Why?







python dataset pandas numpy dataframe






asked 14 hours ago by 3nomis

Assuming 'name' is the second column, they should be identical. Please include some example code so we can help you.
– Louis T, 5 hours ago






2 Answers
Assuming 'name' is the second column, df['name'].values and df.iloc[:,1].values should be identical. pandas uses 0-based indexing, so the first column has index 0 and the second has index 1. That might be the 'gotcha' here.
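A minimal sketch of the 0-based point, using a hypothetical three-column frame (the column names other than 'name' and all the values are made up for illustration). It looks up the position of 'name' and compares label-based and position-based selection:

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; 'name' is deliberately the second column.
# Explicit object dtype so that .values is a plain NumPy array here.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": pd.Series(["Ann", "Bob", "Cy"], dtype=object),
    "score": [9.5, 7.0, 8.25],
})

# Positions are 0-based, so the second column sits at position 1.
name_pos = df.columns.get_loc("name")

by_label = df["name"].values         # select by column label
by_position = df.iloc[:, 1].values   # select by integer position

# Both selections yield the same one-dimensional data.
same_values = np.array_equal(by_label, by_position)
print(name_pos, same_values)
```

If 'name' were the first column instead, the matching positional call would be df.iloc[:, 0].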







answered 5 hours ago by Louis T

I'm not entirely sure what you mean by "numpy vectors", but I assume the question is why these methods return almost, but not quite, the same output.

Reference: pandas docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html



df['name'] returns a "Series corresponding to colname". In other words, you're selecting that column's data, and calling .values puts it in a one-dimensional NumPy array.



.iloc is "Purely integer-location based indexing for selection by position". It works the same as above, except that you select the column by its position: df.iloc[:, 1] means all rows of the column at position 1, i.e. the second column, because positions start at 0. It is probably an easier way to select several consecutive columns of a DataFrame than writing out each column name.
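As a rough illustration of the consecutive-columns point, the sketch below selects two adjacent columns once by position and once by name. The frame and its column names other than 'name' are hypothetical:

```python
import pandas as pd

# Hypothetical example frame with four columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": pd.Series(["Ann", "Bob", "Cy"], dtype=object),
    "score": [9.5, 7.0, 8.25],
    "age": [31, 45, 27],
})

# Positional slice: columns at positions 1 and 2 (the end, 3, is excluded).
by_position = df.iloc[:, 1:3]

# The same columns, written out by label.
by_label = df[["name", "score"]]

# True when labels, dtypes, and values all match.
same_frame = by_position.equals(by_label)
print(list(by_position.columns), same_frame)
```

The positional form stays the same length no matter how many columns the slice covers, which is where it saves typing.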



df.iloc[:,1:2].values creates a two-dimensional array of shape (number of rows, 1): the outer array holds one single-element subarray per row. This is because 1:2 is a slice rather than a single position. A slice selects a range of columns (here only the column at position 1, since the end of the slice is excluded), so the result stays a DataFrame instead of becoming a Series, and the .values of a DataFrame is always two-dimensional.
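The difference is easiest to see by checking the types and shapes directly. This is a small sketch on a hypothetical frame (values and the extra column names are assumptions), not output from the asker's data:

```python
import pandas as pd

# Hypothetical example frame; 'name' is the second column.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": pd.Series(["Ann", "Bob", "Cy"], dtype=object),
    "score": [9.5, 7.0, 8.25],
})

single = df.iloc[:, 1]     # single position -> Series
sliced = df.iloc[:, 1:2]   # slice -> DataFrame with one column

one_d = single.values      # 1-D array, shape (3,)
two_d = sliced.values      # 2-D array, shape (3, 1)

print(type(single).__name__, one_d.shape)
print(type(sliced).__name__, two_d.shape)

# Flattening the 2-D result recovers the same values as the 1-D one.
flattened = two_d.ravel()
print((flattened == one_d).all())
```

Going the other way, one_d.reshape(-1, 1) turns the one-dimensional array into a single-column two-dimensional one.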







answered 3 hours ago by Cat C.
