How can I check the correlation between features and target variable?












4












$begingroup$


I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?



This is my sample dataset



     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    

0 LP001002 Male No 0 Graduate No 5849
1 LP001003 Male Yes 1 Graduate No 4583
2 LP001005 Male Yes 0 Graduate Yes 3000
3 LP001006 Male Yes 0 Not Graduate No 2583
4 LP001008 Male No 0 Graduate No 6000

CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Area Loan_Status
0.0 123 360.0 1.0 Urban Y
1508.0 128.0 360.0 1.0 Rural N
0.0 66.0 360.0 1.0 Urban Y
2358.0 120.0 360.0 1.0 Urban Y
0.0 141.0 360.0 1.0 Urban Y


I am trying to predict LoanAmount column based on the features available above.



I just want to see if there's a correlation between the features and target variable. I tried LinearRegression, GradientBoostingRegressor and I'm hardly getting a accuracy of around 0.30 - 0.40%.



Any suggestions on algorithms, params etc that I should use for better prediction?










share|improve this question











$endgroup$












  • $begingroup$
    Is there a special function for this in R?
    $endgroup$
    – alkanschtein
    13 hours ago
















4












$begingroup$


I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?



This is my sample dataset



     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    

0 LP001002 Male No 0 Graduate No 5849
1 LP001003 Male Yes 1 Graduate No 4583
2 LP001005 Male Yes 0 Graduate Yes 3000
3 LP001006 Male Yes 0 Not Graduate No 2583
4 LP001008 Male No 0 Graduate No 6000

CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Area Loan_Status
0.0 123 360.0 1.0 Urban Y
1508.0 128.0 360.0 1.0 Rural N
0.0 66.0 360.0 1.0 Urban Y
2358.0 120.0 360.0 1.0 Urban Y
0.0 141.0 360.0 1.0 Urban Y


I am trying to predict LoanAmount column based on the features available above.



I just want to see if there's a correlation between the features and target variable. I tried LinearRegression, GradientBoostingRegressor and I'm hardly getting a accuracy of around 0.30 - 0.40%.



Any suggestions on algorithms, params etc that I should use for better prediction?










share|improve this question











$endgroup$












  • $begingroup$
    Is there a special function for this in R?
    $endgroup$
    – alkanschtein
    13 hours ago














4












4








4


1



$begingroup$


I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?



This is my sample dataset



     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    

0 LP001002 Male No 0 Graduate No 5849
1 LP001003 Male Yes 1 Graduate No 4583
2 LP001005 Male Yes 0 Graduate Yes 3000
3 LP001006 Male Yes 0 Not Graduate No 2583
4 LP001008 Male No 0 Graduate No 6000

CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Area Loan_Status
0.0 123 360.0 1.0 Urban Y
1508.0 128.0 360.0 1.0 Rural N
0.0 66.0 360.0 1.0 Urban Y
2358.0 120.0 360.0 1.0 Urban Y
0.0 141.0 360.0 1.0 Urban Y


I am trying to predict LoanAmount column based on the features available above.



I just want to see if there's a correlation between the features and target variable. I tried LinearRegression, GradientBoostingRegressor and I'm hardly getting a accuracy of around 0.30 - 0.40%.



Any suggestions on algorithms, params etc that I should use for better prediction?










share|improve this question











$endgroup$




I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?



This is my sample dataset



     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    

0 LP001002 Male No 0 Graduate No 5849
1 LP001003 Male Yes 1 Graduate No 4583
2 LP001005 Male Yes 0 Graduate Yes 3000
3 LP001006 Male Yes 0 Not Graduate No 2583
4 LP001008 Male No 0 Graduate No 6000

CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History Area Loan_Status
0.0 123 360.0 1.0 Urban Y
1508.0 128.0 360.0 1.0 Rural N
0.0 66.0 360.0 1.0 Urban Y
2358.0 120.0 360.0 1.0 Urban Y
0.0 141.0 360.0 1.0 Urban Y


I am trying to predict LoanAmount column based on the features available above.



I just want to see if there's a correlation between the features and target variable. I tried LinearRegression, GradientBoostingRegressor and I'm hardly getting a accuracy of around 0.30 - 0.40%.



Any suggestions on algorithms, params etc that I should use for better prediction?







machine-learning scikit-learn regression linear-regression






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Oct 3 '18 at 18:55







user214

















asked Oct 3 '18 at 18:43









user214user214

1485




1485












  • $begingroup$
    Is there a special function for this in R?
    $endgroup$
    – alkanschtein
    13 hours ago


















  • $begingroup$
    Is there a special function for this in R?
    $endgroup$
    – alkanschtein
    13 hours ago
















$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago




$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago










2 Answers
2






active

oldest

votes


















2












$begingroup$

Your data can be put into a pandas DataFrame using



import pandas as pd
data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],
'Dependents': [0, 1, 0, 0, 0],
'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],
'Income': [5849, 4583, 3000, 2583, 6000],
'Coapplicant Income': [0, 1508, 0, 2358, 0],
'LoanAmount': [123, 128, 66, 120, 141],
'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],
'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']}
df = pd.DataFrame(data)


Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.



df = pd.DataFrame(data)
df['Married'] =df['Married'].astype('category').cat.codes
df['Education'] =df['Education'].astype('category').cat.codes
df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes
df['Area'] =df['Area'].astype('category').cat.codes
df['Loan Status'] =df['Loan Status'].astype('category').cat.codes


Now we can get the correlation between the 'LoanAmount' and all the other features.



df[df.columns[1:]].corr()['LoanAmount'][:]


enter image description here





Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.



You need at at least 10 times more instances than features in order to expect to get some good results.





To only obtain the correlation between a feature and a subset of the features you can do



df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]


This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.






share|improve this answer











$endgroup$













  • $begingroup$
    Is it possible to choose what features to keep when looking at correlation of the features?
    $endgroup$
    – user214
    Oct 5 '18 at 4:16










  • $begingroup$
    @user2475, I added to the answer regarding your question.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:33










  • $begingroup$
    my pleasure! let us know if other questions come up.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:39










  • $begingroup$
    @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
    $endgroup$
    – user214
    Oct 5 '18 at 5:55








  • 1




    $begingroup$
    It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
    $endgroup$
    – JahKnows
    Oct 6 '18 at 22:48



















3












$begingroup$

Method in Python



One way to check the correlation of every feature against the target variable is to run the code:



# Your data should be a pandas dataframe for this example
import pandas
yourdata = ...
corr_matrix = yourdata.corr()
print(corr_matrix["your_target_variable"].sort_values(ascending=False))


The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)



To plot the correlations on plots instead, run the code:



# make sure to specify some features that you might want to focus on or the plots might be too big
from pandas.tools.plotting import scatter_matrix
attributes = [list of whatever features you want to plot against the target variable]
scatter_matrix(yourdata[attributes], figsize=(12, 8))


For the figsize argument for the scatter_matrix function, input whatever size fits best.






share|improve this answer











$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "557"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f39137%2fhow-can-i-check-the-correlation-between-features-and-target-variable%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2












    $begingroup$

    Your data can be put into a pandas DataFrame using



    import pandas as pd
    data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],
    'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Dependents': [0, 1, 0, 0, 0],
    'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],
    'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],
    'Income': [5849, 4583, 3000, 2583, 6000],
    'Coapplicant Income': [0, 1508, 0, 2358, 0],
    'LoanAmount': [123, 128, 66, 120, 141],
    'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],
    'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']}
    df = pd.DataFrame(data)


    Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.



    df = pd.DataFrame(data)
    df['Married'] =df['Married'].astype('category').cat.codes
    df['Education'] =df['Education'].astype('category').cat.codes
    df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes
    df['Area'] =df['Area'].astype('category').cat.codes
    df['Loan Status'] =df['Loan Status'].astype('category').cat.codes


    Now we can get the correlation between the 'LoanAmount' and all the other features.



    df[df.columns[1:]].corr()['LoanAmount'][:]


    enter image description here





    Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.



    You need at at least 10 times more instances than features in order to expect to get some good results.





    To only obtain the correlation between a feature and a subset of the features you can do



    df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]


    This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.






    share|improve this answer











    $endgroup$













    • $begingroup$
      Is it possible to choose what features to keep when looking at correlation of the features?
      $endgroup$
      – user214
      Oct 5 '18 at 4:16










    • $begingroup$
      @user2475, I added to the answer regarding your question.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:33










    • $begingroup$
      my pleasure! let us know if other questions come up.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:39










    • $begingroup$
      @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
      $endgroup$
      – user214
      Oct 5 '18 at 5:55








    • 1




      $begingroup$
      It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
      $endgroup$
      – JahKnows
      Oct 6 '18 at 22:48
















    2












    $begingroup$

    Your data can be put into a pandas DataFrame using



    import pandas as pd
    data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],
    'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Dependents': [0, 1, 0, 0, 0],
    'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],
    'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],
    'Income': [5849, 4583, 3000, 2583, 6000],
    'Coapplicant Income': [0, 1508, 0, 2358, 0],
    'LoanAmount': [123, 128, 66, 120, 141],
    'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],
    'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']}
    df = pd.DataFrame(data)


    Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.



    df = pd.DataFrame(data)
    df['Married'] =df['Married'].astype('category').cat.codes
    df['Education'] =df['Education'].astype('category').cat.codes
    df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes
    df['Area'] =df['Area'].astype('category').cat.codes
    df['Loan Status'] =df['Loan Status'].astype('category').cat.codes


    Now we can get the correlation between the 'LoanAmount' and all the other features.



    df[df.columns[1:]].corr()['LoanAmount'][:]


    enter image description here





    Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.



    You need at at least 10 times more instances than features in order to expect to get some good results.





    To only obtain the correlation between a feature and a subset of the features you can do



    df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]


    This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.






    share|improve this answer











    $endgroup$













    • $begingroup$
      Is it possible to choose what features to keep when looking at correlation of the features?
      $endgroup$
      – user214
      Oct 5 '18 at 4:16










    • $begingroup$
      @user2475, I added to the answer regarding your question.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:33










    • $begingroup$
      my pleasure! let us know if other questions come up.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:39










    • $begingroup$
      @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
      $endgroup$
      – user214
      Oct 5 '18 at 5:55








    • 1




      $begingroup$
      It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
      $endgroup$
      – JahKnows
      Oct 6 '18 at 22:48














    2












    2








    2





    $begingroup$

    Your data can be put into a pandas DataFrame using



    import pandas as pd
    data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],
    'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Dependents': [0, 1, 0, 0, 0],
    'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],
    'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],
    'Income': [5849, 4583, 3000, 2583, 6000],
    'Coapplicant Income': [0, 1508, 0, 2358, 0],
    'LoanAmount': [123, 128, 66, 120, 141],
    'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],
    'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']}
    df = pd.DataFrame(data)


    Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.



    df = pd.DataFrame(data)
    df['Married'] =df['Married'].astype('category').cat.codes
    df['Education'] =df['Education'].astype('category').cat.codes
    df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes
    df['Area'] =df['Area'].astype('category').cat.codes
    df['Loan Status'] =df['Loan Status'].astype('category').cat.codes


    Now we can get the correlation between the 'LoanAmount' and all the other features.



    df[df.columns[1:]].corr()['LoanAmount'][:]


    enter image description here





    Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.



    You need at at least 10 times more instances than features in order to expect to get some good results.





    To only obtain the correlation between a feature and a subset of the features you can do



    df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]


    This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.






    share|improve this answer











    $endgroup$



    Your data can be put into a pandas DataFrame using



    import pandas as pd
    data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],
    'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],
    'Dependents': [0, 1, 0, 0, 0],
    'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],
    'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],
    'Income': [5849, 4583, 3000, 2583, 6000],
    'Coapplicant Income': [0, 1508, 0, 2358, 0],
    'LoanAmount': [123, 128, 66, 120, 141],
    'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],
    'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']}
    df = pd.DataFrame(data)


    Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.



    df = pd.DataFrame(data)
    df['Married'] =df['Married'].astype('category').cat.codes
    df['Education'] =df['Education'].astype('category').cat.codes
    df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes
    df['Area'] =df['Area'].astype('category').cat.codes
    df['Loan Status'] =df['Loan Status'].astype('category').cat.codes


    Now we can get the correlation between the 'LoanAmount' and all the other features.



    df[df.columns[1:]].corr()['LoanAmount'][:]


    enter image description here





    Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.



    You need at at least 10 times more instances than features in order to expect to get some good results.





    To only obtain the correlation between a feature and a subset of the features you can do



    df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]


    This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Oct 5 '18 at 4:33

























    answered Oct 5 '18 at 4:07









    JahKnowsJahKnows

    4,627524




    4,627524












    • $begingroup$
      Is it possible to choose what features to keep when looking at correlation of the features?
      $endgroup$
      – user214
      Oct 5 '18 at 4:16










    • $begingroup$
      @user2475, I added to the answer regarding your question.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:33










    • $begingroup$
      my pleasure! let us know if other questions come up.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:39










    • $begingroup$
      @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
      $endgroup$
      – user214
      Oct 5 '18 at 5:55








    • 1




      $begingroup$
      It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
      $endgroup$
      – JahKnows
      Oct 6 '18 at 22:48


















    • $begingroup$
      Is it possible to choose what features to keep when looking at correlation of the features?
      $endgroup$
      – user214
      Oct 5 '18 at 4:16










    • $begingroup$
      @user2475, I added to the answer regarding your question.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:33










    • $begingroup$
      my pleasure! let us know if other questions come up.
      $endgroup$
      – JahKnows
      Oct 5 '18 at 4:39










    • $begingroup$
      @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
      $endgroup$
      – user214
      Oct 5 '18 at 5:55








    • 1




      $begingroup$
      It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
      $endgroup$
      – JahKnows
      Oct 6 '18 at 22:48
















    $begingroup$
    Is it possible to choose what features to keep when looking at correlation of the features?
    $endgroup$
    – user214
    Oct 5 '18 at 4:16




    $begingroup$
    Is it possible to choose what features to keep when looking at correlation of the features?
    $endgroup$
    – user214
    Oct 5 '18 at 4:16












    $begingroup$
    @user2475, I added to the answer regarding your question.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:33




    $begingroup$
    @user2475, I added to the answer regarding your question.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:33












    $begingroup$
    my pleasure! let us know if other questions come up.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:39




    $begingroup$
    my pleasure! let us know if other questions come up.
    $endgroup$
    – JahKnows
    Oct 5 '18 at 4:39












    $begingroup$
    @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
    $endgroup$
    – user214
    Oct 5 '18 at 5:55






    $begingroup$
    @JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
    $endgroup$
    – user214
    Oct 5 '18 at 5:55






    1




    1




    $begingroup$
    It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
    $endgroup$
    – JahKnows
    Oct 6 '18 at 22:48




    $begingroup$
    It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
    $endgroup$
    – JahKnows
    Oct 6 '18 at 22:48











    3












    $begingroup$

    Method in Python



    One way to check the correlation of every feature against the target variable is to run the code:



    # Your data should be a pandas dataframe for this example
    import pandas
    yourdata = ...
    corr_matrix = yourdata.corr()
    print(corr_matrix["your_target_variable"].sort_values(ascending=False))


    The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)



    To plot the correlations on plots instead, run the code:



    # make sure to specify some features that you might want to focus on or the plots might be too big
    from pandas.tools.plotting import scatter_matrix
    attributes = [list of whatever features you want to plot against the target variable]
    scatter_matrix(yourdata[attributes], figsize=(12, 8))


    For the figsize argument for the scatter_matrix function, input whatever size fits best.






    share|improve this answer











    $endgroup$


















      3












      $begingroup$

      Method in Python



      One way to check the correlation of every feature against the target variable is to run the code:



      # Your data should be a pandas dataframe for this example
      import pandas
      yourdata = ...
      corr_matrix = yourdata.corr()
      print(corr_matrix["your_target_variable"].sort_values(ascending=False))


      The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)



      To plot the correlations on plots instead, run the code:



      # make sure to specify some features that you might want to focus on or the plots might be too big
      from pandas.tools.plotting import scatter_matrix
      attributes = [list of whatever features you want to plot against the target variable]
      scatter_matrix(yourdata[attributes], figsize=(12, 8))


      For the figsize argument for the scatter_matrix function, input whatever size fits best.






      share|improve this answer











      $endgroup$
















        3












        3








        3





        $begingroup$

        Method in Python



        One way to check the correlation of every feature against the target variable is to run the code:



        # Your data should be a pandas dataframe for this example
        import pandas
        yourdata = ...
        corr_matrix = yourdata.corr()
        print(corr_matrix["your_target_variable"].sort_values(ascending=False))


        The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)



        To plot the correlations on plots instead, run the code:



        # make sure to specify some features that you might want to focus on or the plots might be too big
        from pandas.tools.plotting import scatter_matrix
        attributes = [list of whatever features you want to plot against the target variable]
        scatter_matrix(yourdata[attributes], figsize=(12, 8))


        For the figsize argument for the scatter_matrix function, input whatever size fits best.






        share|improve this answer











        $endgroup$



        Method in Python



        One way to check the correlation of every feature against the target variable is to run the code:



        # Your data should be a pandas dataframe for this example
        import pandas
        yourdata = ...
        corr_matrix = yourdata.corr()
        print(corr_matrix["your_target_variable"].sort_values(ascending=False))


        The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)



        To plot the correlations on plots instead, run the code:



        # make sure to specify some features that you might want to focus on or the plots might be too big
        from pandas.tools.plotting import scatter_matrix
        attributes = [list of whatever features you want to plot against the target variable]
        scatter_matrix(yourdata[attributes], figsize=(12, 8))


        For the figsize argument for the scatter_matrix function, input whatever size fits best.







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Oct 7 '18 at 22:11









        Stephen Rauch

        1,53751129




        1,53751129










        answered Oct 5 '18 at 1:23









        Ethan YunEthan Yun

        846




        846






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Data Science Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f39137%2fhow-can-i-check-the-correlation-between-features-and-target-variable%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Callistus I

            Tabula Rosettana

            How to label and detect the document text images