How to use a one-hot encoded nominal feature in a classifier in Scikit Learn?












I'm working on a genre classification problem on a songs dataset. Since genre is a nominal feature, I used sklearn's LabelBinarizer to get the one-hot encoding of this feature for every row in the dataset. I'm left with a DataFrame (df_train_num) with two numeric columns, and a Series (df_train_genre) in which every row value is a numpy array: the one-hot encoding of the genre. I now want to fit a classifier on this data. What I did was:

svm_classifier = LinearSVC()
svm_classifier.fit(df_train_num, df_train_genre)

This gives me a ValueError: Unknown label type: 'unknown'

What exactly is causing this error? Am I not allowed to use a Series object together with a DataFrame object to fit a classifier? Replacing df_train_genre with df_train_genre.values, so as to pass the numpy array directly to the fit method, doesn't change anything either; I get the same error.

Here is a view of the two pandas objects:

df_train_num.head(5)

        Unique_Word_Count  Sentiment Polarity
157277                126            0.027766
90109                 114           -0.199545
106224                 16            0.000000
221087                103           -0.058025
247082                409           -0.170143

df_train_genre.head(5)

157277    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
90109     [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, ...
106224    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
221087    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
247082    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
Name: Genre_Encoded, dtype: object
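A likely explanation, offered as a sketch rather than a confirmed diagnosis: LinearSVC expects the target to be a 1-D array of class labels, and a Series of dtype object holding one array per row is not a label format scikit-learn recognises, which would explain the 'unknown' label type. Passing .values does not help because the result is still an object array of arrays. The example below uses made-up toy data (the genre names and numeric values are assumptions, not taken from the dataset above) and shows one way to recover 1-D labels from such a Series with LabelBinarizer.inverse_transform before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the songs dataset.
df_train_num = pd.DataFrame({
    "Unique_Word_Count": [126, 114, 16, 103, 409, 90],
    "Sentiment Polarity": [0.03, -0.20, 0.00, -0.06, -0.17, 0.10],
})
genres = pd.Series(["rock", "pop", "jazz", "rock", "pop", "jazz"])

# One-hot encode the genre, then store one array per row (as in the question).
binarizer = LabelBinarizer()
one_hot = binarizer.fit_transform(genres)
df_train_genre = pd.Series(list(one_hot), index=df_train_num.index)

# Stack the per-row arrays back into a 2-D (n_samples, n_classes) matrix.
one_hot_matrix = np.stack(df_train_genre.tolist())

# Map each one-hot row back to its original class label (a 1-D array).
y_labels = binarizer.inverse_transform(one_hot_matrix)

# Fit on the numeric features with the 1-D labels as the target.
svm_classifier = LinearSVC()
svm_classifier.fit(df_train_num, y_labels)
predictions = svm_classifier.predict(df_train_num)
```

If the original genre column is still available, passing it directly as the target avoids the round trip altogether, since scikit-learn classifiers accept string labels and handle the multiclass encoding internally.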

















      machine-learning scikit-learn nlp pandas






      asked 1 hour ago by Mudit Jha
