How can I check the correlation between features and target variable?

I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?

This is my sample dataset

     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    



0   LP001002    Male    No         0        Graduate      No            5849    

1   LP001003    Male    Yes        1        Graduate      No            4583    

2   LP001005    Male    Yes        0        Graduate     Yes            3000    

3   LP001006    Male    Yes        0        Not Graduate  No            2583    

4   LP001008    Male    No         0        Graduate      No            6000    



CoapplicantIncome  LoanAmount   Loan_Amount_Term  Credit_History Area Loan_Status

      0.0               123          360.0            1.0        Urban     Y

      1508.0          128.0          360.0            1.0        Rural     N

      0.0              66.0          360.0            1.0        Urban     Y

      2358.0          120.0          360.0            1.0        Urban     Y

      0.0             141.0          360.0            1.0        Urban     Y

I am trying to predict LoanAmount column based on the features available above.

I just want to see if there's a correlation between the features and target variable. I tried LinearRegression, GradientBoostingRegressor and I'm hardly getting a accuracy of around 0.30 - 0.40%.

Any suggestions on algorithms, params etc that I should use for better prediction?

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago

add a comment |

I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?

This is my sample dataset

     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    



0   LP001002    Male    No         0        Graduate      No            5849    

1   LP001003    Male    Yes        1        Graduate      No            4583    

2   LP001005    Male    Yes        0        Graduate     Yes            3000    

3   LP001006    Male    Yes        0        Not Graduate  No            2583    

4   LP001008    Male    No         0        Graduate      No            6000    



CoapplicantIncome  LoanAmount   Loan_Amount_Term  Credit_History Area Loan_Status

      0.0               123          360.0            1.0        Urban     Y

      1508.0          128.0          360.0            1.0        Rural     N

      0.0              66.0          360.0            1.0        Urban     Y

      2358.0          120.0          360.0            1.0        Urban     Y

      0.0             141.0          360.0            1.0        Urban     Y

I am trying to predict LoanAmount column based on the features available above.

Any suggestions on algorithms, params etc that I should use for better prediction?

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago

add a comment |

I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?

This is my sample dataset

     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    



0   LP001002    Male    No         0        Graduate      No            5849    

1   LP001003    Male    Yes        1        Graduate      No            4583    

2   LP001005    Male    Yes        0        Graduate     Yes            3000    

3   LP001006    Male    Yes        0        Not Graduate  No            2583    

4   LP001008    Male    No         0        Graduate      No            6000    



CoapplicantIncome  LoanAmount   Loan_Amount_Term  Credit_History Area Loan_Status

      0.0               123          360.0            1.0        Urban     Y

      1508.0          128.0          360.0            1.0        Rural     N

      0.0              66.0          360.0            1.0        Urban     Y

      2358.0          120.0          360.0            1.0        Urban     Y

      0.0             141.0          360.0            1.0        Urban     Y

I am trying to predict LoanAmount column based on the features available above.

Any suggestions on algorithms, params etc that I should use for better prediction?

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

I am trying to build a Regression model and I am looking for a way to check whether there's any correlation between features and target variables?

This is my sample dataset

     Loan_ID    Gender  Married Dependents  Education Self_Employed ApplicantIncome    



0   LP001002    Male    No         0        Graduate      No            5849    

1   LP001003    Male    Yes        1        Graduate      No            4583    

2   LP001005    Male    Yes        0        Graduate     Yes            3000    

3   LP001006    Male    Yes        0        Not Graduate  No            2583    

4   LP001008    Male    No         0        Graduate      No            6000    



CoapplicantIncome  LoanAmount   Loan_Amount_Term  Credit_History Area Loan_Status

      0.0               123          360.0            1.0        Urban     Y

      1508.0          128.0          360.0            1.0        Rural     N

      0.0              66.0          360.0            1.0        Urban     Y

      2358.0          120.0          360.0            1.0        Urban     Y

      0.0             141.0          360.0            1.0        Urban     Y

I am trying to predict LoanAmount column based on the features available above.

Any suggestions on algorithms, params etc that I should use for better prediction?

machine-learning scikit-learn regression linear-regression

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

edited Oct 3 '18 at 18:55

asked Oct 3 '18 at 18:43

user214

1485

asked Oct 3 '18 at 18:43

user214

1485

asked Oct 3 '18 at 18:43

user214

1485

$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago

add a comment |

$begingroup$
Is there a special function for this in R?
$endgroup$
– alkanschtein
13 hours ago

Is there a special function for this in R?

– alkanschtein
13 hours ago

add a comment |

2 Answers
2

active

oldest

votes

Your data can be put into a pandas DataFrame using

import pandas as pd

data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],

        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],

        'Dependents': [0, 1, 0, 0, 0],

        'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],

        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],

        'Income': [5849, 4583, 3000, 2583, 6000],

        'Coapplicant Income': [0, 1508, 0, 2358, 0],

        'LoanAmount': [123, 128, 66, 120, 141],

        'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],

        'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']} 

df = pd.DataFrame(data)

Now to get a correlation we need to convert our categorical features to numerical ones. Of course the choice of order will affect the correlation but luckily all of our categories seem to be binary. If this is not the case you will need to devise a custom ordering.

df = pd.DataFrame(data)

df['Married'] =df['Married'].astype('category').cat.codes

df['Education'] =df['Education'].astype('category').cat.codes

df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes

df['Area'] =df['Area'].astype('category').cat.codes

df['Loan Status'] =df['Loan Status'].astype('category').cat.codes

Now we can get the correlation between the 'LoanAmount' and all the other features.

df[df.columns[1:]].corr()['LoanAmount'][:]

enter image description here

Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.

You need at at least 10 times more instances than features in order to expect to get some good results.

To only obtain the correlation between a feature and a subset of the features you can do

df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]

This will take a subset of the DataFrame and then apply the same corr() function as above. Make sure that the subset of columns selected includes the column with which you want to calculate the correlation, in this example that's 'LoanAmount'.

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

$begingroup$
Is it possible to choose what features to keep when looking at correlation of the features?
$endgroup$
– user214
Oct 5 '18 at 4:16

$begingroup$
@user2475, I added to the answer regarding your question.
$endgroup$
– JahKnows
Oct 5 '18 at 4:33

$begingroup$
my pleasure! let us know if other questions come up.
$endgroup$
– JahKnows
Oct 5 '18 at 4:39

$begingroup$
@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
$endgroup$
– user214
Oct 5 '18 at 5:55

1

$begingroup$
It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
$endgroup$
– JahKnows
Oct 6 '18 at 22:48

add a comment |

Method in Python

One way to check the correlation of every feature against the target variable is to run the code:

# Your data should be a pandas dataframe for this example

import pandas

yourdata = ...

corr_matrix = yourdata.corr()

print(corr_matrix["your_target_variable"].sort_values(ascending=False))

The following correlation output should list all the variables and their correlations to the target variable. The negative correlations mean that as the target variable decreases in value, the feature variable increases in value. (Linearly)

To plot the correlations on plots instead, run the code:

# make sure to specify some features that you might want to focus on or the plots might be too big

from pandas.tools.plotting import scatter_matrix

attributes = [list of whatever features you want to plot against the target variable]

scatter_matrix(yourdata[attributes], figsize=(12, 8))

For the figsize argument for the scatter_matrix function, input whatever size fits best.

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f39137%2fhow-can-i-check-the-correlation-between-features-and-target-variable%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

Your data can be put into a pandas DataFrame using

import pandas as pd

data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],

        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],

        'Dependents': [0, 1, 0, 0, 0],

        'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],

        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],

        'Income': [5849, 4583, 3000, 2583, 6000],

        'Coapplicant Income': [0, 1508, 0, 2358, 0],

        'LoanAmount': [123, 128, 66, 120, 141],

        'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],

        'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']} 

df = pd.DataFrame(data)

df = pd.DataFrame(data)

df['Married'] =df['Married'].astype('category').cat.codes

df['Education'] =df['Education'].astype('category').cat.codes

df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes

df['Area'] =df['Area'].astype('category').cat.codes

df['Loan Status'] =df['Loan Status'].astype('category').cat.codes

Now we can get the correlation between the 'LoanAmount' and all the other features.

df[df.columns[1:]].corr()['LoanAmount'][:]

enter image description here

Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.

You need at at least 10 times more instances than features in order to expect to get some good results.

To only obtain the correlation between a feature and a subset of the features you can do

df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

$begingroup$
Is it possible to choose what features to keep when looking at correlation of the features?
$endgroup$
– user214
Oct 5 '18 at 4:16

$begingroup$
@user2475, I added to the answer regarding your question.
$endgroup$
– JahKnows
Oct 5 '18 at 4:33

$begingroup$
my pleasure! let us know if other questions come up.
$endgroup$
– JahKnows
Oct 5 '18 at 4:39

$begingroup$
@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
$endgroup$
– user214
Oct 5 '18 at 5:55

1

$begingroup$
It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
$endgroup$
– JahKnows
Oct 6 '18 at 22:48

add a comment |

Your data can be put into a pandas DataFrame using

import pandas as pd

data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],

        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],

        'Dependents': [0, 1, 0, 0, 0],

        'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],

        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],

        'Income': [5849, 4583, 3000, 2583, 6000],

        'Coapplicant Income': [0, 1508, 0, 2358, 0],

        'LoanAmount': [123, 128, 66, 120, 141],

        'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],

        'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']} 

df = pd.DataFrame(data)

df = pd.DataFrame(data)

df['Married'] =df['Married'].astype('category').cat.codes

df['Education'] =df['Education'].astype('category').cat.codes

df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes

df['Area'] =df['Area'].astype('category').cat.codes

df['Loan Status'] =df['Loan Status'].astype('category').cat.codes

Now we can get the correlation between the 'LoanAmount' and all the other features.

df[df.columns[1:]].corr()['LoanAmount'][:]

enter image description here

Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.

You need at at least 10 times more instances than features in order to expect to get some good results.

To only obtain the correlation between a feature and a subset of the features you can do

df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

$begingroup$
Is it possible to choose what features to keep when looking at correlation of the features?
$endgroup$
– user214
Oct 5 '18 at 4:16

$begingroup$
@user2475, I added to the answer regarding your question.
$endgroup$
– JahKnows
Oct 5 '18 at 4:33

$begingroup$
my pleasure! let us know if other questions come up.
$endgroup$
– JahKnows
Oct 5 '18 at 4:39

$begingroup$
@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
$endgroup$
– user214
Oct 5 '18 at 5:55

1

$begingroup$
It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
$endgroup$
– JahKnows
Oct 6 '18 at 22:48

add a comment |

Your data can be put into a pandas DataFrame using

import pandas as pd

data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],

        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],

        'Dependents': [0, 1, 0, 0, 0],

        'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],

        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],

        'Income': [5849, 4583, 3000, 2583, 6000],

        'Coapplicant Income': [0, 1508, 0, 2358, 0],

        'LoanAmount': [123, 128, 66, 120, 141],

        'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],

        'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']} 

df = pd.DataFrame(data)

df = pd.DataFrame(data)

df['Married'] =df['Married'].astype('category').cat.codes

df['Education'] =df['Education'].astype('category').cat.codes

df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes

df['Area'] =df['Area'].astype('category').cat.codes

df['Loan Status'] =df['Loan Status'].astype('category').cat.codes

Now we can get the correlation between the 'LoanAmount' and all the other features.

df[df.columns[1:]].corr()['LoanAmount'][:]

enter image description here

Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.

You need at at least 10 times more instances than features in order to expect to get some good results.

To only obtain the correlation between a feature and a subset of the features you can do

df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

Your data can be put into a pandas DataFrame using

import pandas as pd

data = {'Loan ID': ['LP001002', 'LP001003', 'LP001005', 'LP001006', 'LP001008'],

        'Married': ['No', 'Yes', 'Yes', 'Yes', 'No'],

        'Dependents': [0, 1, 0, 0, 0],

        'Education': ['Graduate', 'Graduate', 'Graduate', 'Not Graduate', 'Graduate'],

        'Self_Employed': ['No', 'No', 'Yes', 'No', 'No'],

        'Income': [5849, 4583, 3000, 2583, 6000],

        'Coapplicant Income': [0, 1508, 0, 2358, 0],

        'LoanAmount': [123, 128, 66, 120, 141],

        'Area': ['Urban', 'Rural', 'Urban', 'Urban', 'Urban'],

        'Loan Status': ['Y', 'N', 'Y', 'Y', 'Y']} 

df = pd.DataFrame(data)

df = pd.DataFrame(data)

df['Married'] =df['Married'].astype('category').cat.codes

df['Education'] =df['Education'].astype('category').cat.codes

df['Self_Employed'] =df['Self_Employed'].astype('category').cat.codes

df['Area'] =df['Area'].astype('category').cat.codes

df['Loan Status'] =df['Loan Status'].astype('category').cat.codes

Now we can get the correlation between the 'LoanAmount' and all the other features.

df[df.columns[1:]].corr()['LoanAmount'][:]

enter image description here

Now using some machine learning on this data is not likely to work. There just is not sufficient data to extract some relevant information between your large number of features and the loan amount.

You need at at least 10 times more instances than features in order to expect to get some good results.

To only obtain the correlation between a feature and a subset of the features you can do

df[['Income', 'Education', 'LoanAmount']].corr()['LoanAmount'][:]

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

edited Oct 5 '18 at 4:33

answered Oct 5 '18 at 4:07

JahKnows

4,627524

answered Oct 5 '18 at 4:07

JahKnows

4,627524

answered Oct 5 '18 at 4:07

JahKnows

4,627524

$begingroup$
Is it possible to choose what features to keep when looking at correlation of the features?
$endgroup$
– user214
Oct 5 '18 at 4:16

$begingroup$
@user2475, I added to the answer regarding your question.
$endgroup$
– JahKnows
Oct 5 '18 at 4:33

$begingroup$
my pleasure! let us know if other questions come up.
$endgroup$
– JahKnows
Oct 5 '18 at 4:39

$begingroup$
@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
$endgroup$
– user214
Oct 5 '18 at 5:55

1

$begingroup$
It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
$endgroup$
– JahKnows
Oct 6 '18 at 22:48

add a comment |

$begingroup$
Is it possible to choose what features to keep when looking at correlation of the features?
$endgroup$
– user214
Oct 5 '18 at 4:16

$begingroup$
@user2475, I added to the answer regarding your question.
$endgroup$
– JahKnows
Oct 5 '18 at 4:33

$begingroup$
my pleasure! let us know if other questions come up.
$endgroup$
– JahKnows
Oct 5 '18 at 4:39

$begingroup$
@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.
$endgroup$
– user214
Oct 5 '18 at 5:55

1

$begingroup$
It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.
$endgroup$
– JahKnows
Oct 6 '18 at 22:48

Is it possible to choose what features to keep when looking at correlation of the features?

– user214
Oct 5 '18 at 4:16

@user2475, I added to the answer regarding your question.

– JahKnows
Oct 5 '18 at 4:33

my pleasure! let us know if other questions come up.

– JahKnows
Oct 5 '18 at 4:39

@JabKnows Just one doubt. Do I need to scale my feature values. I mean should I use standardscaler or min_max_scalar. I want to do a classification task.

– user214
Oct 5 '18 at 5:55

It depends on the algorithm you select. Some require it others don't, you'd need to check the algorithm. It is good however to keep the scale of the different features similar.

– JahKnows
Oct 6 '18 at 22:48

add a comment |

Method in Python

One way to check the correlation of every feature against the target variable is to run the code:

# Your data should be a pandas dataframe for this example

import pandas

yourdata = ...

corr_matrix = yourdata.corr()

print(corr_matrix["your_target_variable"].sort_values(ascending=False))

To plot the correlations on plots instead, run the code:

# make sure to specify some features that you might want to focus on or the plots might be too big

from pandas.tools.plotting import scatter_matrix

attributes = [list of whatever features you want to plot against the target variable]

scatter_matrix(yourdata[attributes], figsize=(12, 8))

For the figsize argument for the scatter_matrix function, input whatever size fits best.

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

add a comment |

Method in Python

One way to check the correlation of every feature against the target variable is to run the code:

# Your data should be a pandas dataframe for this example

import pandas

yourdata = ...

corr_matrix = yourdata.corr()

print(corr_matrix["your_target_variable"].sort_values(ascending=False))

To plot the correlations on plots instead, run the code:

# make sure to specify some features that you might want to focus on or the plots might be too big

from pandas.tools.plotting import scatter_matrix

attributes = [list of whatever features you want to plot against the target variable]

scatter_matrix(yourdata[attributes], figsize=(12, 8))

For the figsize argument for the scatter_matrix function, input whatever size fits best.

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

add a comment |

Method in Python

One way to check the correlation of every feature against the target variable is to run the code:

# Your data should be a pandas dataframe for this example

import pandas

yourdata = ...

corr_matrix = yourdata.corr()

print(corr_matrix["your_target_variable"].sort_values(ascending=False))

To plot the correlations on plots instead, run the code:

# make sure to specify some features that you might want to focus on or the plots might be too big

from pandas.tools.plotting import scatter_matrix

attributes = [list of whatever features you want to plot against the target variable]

scatter_matrix(yourdata[attributes], figsize=(12, 8))

For the figsize argument for the scatter_matrix function, input whatever size fits best.

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

Method in Python

One way to check the correlation of every feature against the target variable is to run the code:

# Your data should be a pandas dataframe for this example

import pandas

yourdata = ...

corr_matrix = yourdata.corr()

print(corr_matrix["your_target_variable"].sort_values(ascending=False))

To plot the correlations on plots instead, run the code:

# make sure to specify some features that you might want to focus on or the plots might be too big

from pandas.tools.plotting import scatter_matrix

attributes = [list of whatever features you want to plot against the target variable]

scatter_matrix(yourdata[attributes], figsize=(12, 8))

For the figsize argument for the scatter_matrix function, input whatever size fits best.

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

edited Oct 7 '18 at 22:11

Stephen Rauch

1,53751129

answered Oct 5 '18 at 1:23

Ethan Yun

846

answered Oct 5 '18 at 1:23

Ethan Yun

846

answered Oct 5 '18 at 1:23

Ethan Yun

846

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Htydjtk