How can I fill NaN values in a pandas data frame?

Greetings, everyone. I am trying to learn data analysis and machine learning by working through some problems, and I found "House Prices", a Kaggle playground competition. Since I am very new to this field, I got confused after exploring the data. The data has 81 columns, one of which is the target column (the house price). Several columns have "NaN" for the majority of their values. When I ran

nulls = data.isnull().sum()
nulls[nulls > 0]

This shows the columns with missing values:

LotFrontage     259
Alley           1369
MasVnrType      8
MasVnrArea      8
BsmtQual        37
BsmtCond        37
BsmtExposure    38
BsmtFinType1    37
BsmtFinType2    38
Electrical      1
FireplaceQu     690
GarageType      81
GarageYrBlt     81
GarageFinish    81
GarageQual      81
GarageCond      81
PoolQC          1453
Fence           1179
MiscFeature     1406

At this point I am totally lost and I don't know how to get rid of these "NaN" values. Any help would be appreciated.

edited Nov 16 '17 at 1:38

timleathart


asked Dec 25 '16 at 22:29

Ahmed Dhanani



python data-cleaning kaggle


2 Answers

You can use the DataFrame.fillna function to fill the NaN values in your data. For example, assuming your data is in a DataFrame called df,

df.fillna(0, inplace=True)

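will replace the missing values with the constant value 0. You can also do more clever things, such as replacing the missing values with the mean of each column (numeric_only=True skips non-numeric columns, whose mean is undefined; since pandas 2.0, df.mean() raises an error on string columns without it):

df.fillna(df.mean(numeric_only=True), inplace=True)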

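or carry forward the last non-missing value seen in each column (a forward fill):

df.ffill(inplace=True)

(Older code writes this as df.fillna(method='ffill', inplace=True); the method argument of fillna is deprecated as of pandas 2.1.)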
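A self-contained sketch of the three approaches above, run on a small invented frame (the column names are borrowed from the question; the values are made up). It uses the current pandas spellings, df.mean(numeric_only=True) and df.ffill(), and assigns each result to a new variable instead of using inplace=True so the variants can be compared side by side.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; column names borrowed from the question.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, np.nan],
    "MasVnrArea": [196.0, 0.0, np.nan, 350.0],
    "Alley": [np.nan, "Grvl", np.nan, "Pave"],
})

# 1. Constant fill: every NaN becomes 0, including in the string column.
const_filled = df.fillna(0)

# 2. Mean fill: numeric_only=True leaves the string column out of the means,
#    so its NaNs are not touched.
mean_filled = df.fillna(df.mean(numeric_only=True))

# 3. Forward fill: each NaN takes the last non-missing value above it;
#    a NaN in the first row has nothing above it and stays NaN.
ffilled = df.ffill()

print(const_filled, mean_filled, ffilled, sep="\n\n")
```

Note that the constant 0 also lands in the string column Alley, which may not be what you want for categorical data, and that forward fill only makes sense when row order carries meaning (for example, time series).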

Filling in NaN values is called imputation. Try a range of imputation methods and compare how each one affects your model's validation score to see which works best for your data.
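One way to try several imputation methods is to wrap them in a small function and swap the strategy. The sketch below is a hypothetical helper (impute_by_dtype is not part of pandas) that fills numeric columns with their median or mean and non-numeric columns with an assumed placeholder string "None"; the model-scoring step used to compare strategies is left out.

```python
import numpy as np
import pandas as pd


def impute_by_dtype(df: pd.DataFrame, numeric: str = "median",
                    categorical_fill: str = "None") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with NaNs filled per column type.

    Numeric columns get their median or mean; all other columns get a
    constant placeholder (an assumption; pick what suits your data).
    """
    out = df.copy()  # leave the caller's frame untouched
    num_cols = out.select_dtypes(include="number").columns
    other_cols = out.columns.difference(num_cols)
    if numeric == "median":
        stats = out[num_cols].median()
    elif numeric == "mean":
        stats = out[num_cols].mean()
    else:
        raise ValueError(f"unsupported numeric strategy: {numeric!r}")
    out[num_cols] = out[num_cols].fillna(stats)
    out[other_cols] = out[other_cols].fillna(categorical_fill)
    return out


# Usage on invented data: compare the two numeric strategies.
df = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 70.0, 110.0],
    "FireplaceQu": [np.nan, "TA", "Gd", np.nan],
})
for strategy in ("median", "mean"):
    filled = impute_by_dtype(df, numeric=strategy)
    # the missing LotFrontage becomes 70.0 (median) or 80.0 (mean)
    print(strategy, filled["LotFrontage"].tolist())
```

The median is less sensitive to outliers than the mean, which is one reason to try both on skewed columns such as lot sizes.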

answered Dec 26 '16 at 0:06

timleathart


Thanks for the response. The dataset also contains string values, and I think df.fillna() will only work on float or integer values. Any pointers on converting string values to numeric values?
– Ahmed Dhanani
Dec 26 '16 at 13:07
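fillna is not limited to numbers: it fills object (string) columns as well, and passing a dict lets each column get its own fill value. For this dataset a placeholder category is often reasonable, because the competition's data description states that NA in columns such as Alley and PoolQC means "no alley access" or "no pool" rather than an unknown value. The placeholder "None" and the toy values below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Invented values; Alley and PoolQC are string (object) columns in the question's data.
df = pd.DataFrame({
    "Alley": [np.nan, "Grvl", np.nan],
    "PoolQC": [np.nan, np.nan, "Ex"],
    "LotFrontage": [65.0, np.nan, 80.0],
})

# A dict maps column name -> fill value, so string and numeric
# columns can be filled differently in a single call.
filled = df.fillna({"Alley": "None", "PoolQC": "None", "LotFrontage": 0.0})
print(filled)
```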

Ah, I had assumed the data was numeric for some reason. By string values, do you mean categorical data, i.e. strings drawn from a fixed set of values? Then you can use scikit-learn's LabelEncoder. Natural language, on the other hand, is more difficult to deal with. Bag-of-words is probably the easiest approach to think about, but have a look at these options.
– timleathart
Dec 26 '16 at 22:01
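A sketch of turning a categorical string column into numbers with plain pandas, on an invented GarageFinish column. Integer codes are close in spirit to LabelEncoder; note that scikit-learn's documentation describes LabelEncoder as intended for target labels, with OrdinalEncoder and OneHotEncoder as the counterparts for input features. Filling NaN with a "None" category first is an assumption, not a requirement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"GarageFinish": ["RFn", "Unf", np.nan, "Fin", "Unf"]})

# Treat "missing" as its own category so no NaN reaches the encoding step.
garage = df["GarageFinish"].fillna("None")

# Integer codes: one column, categories numbered in sorted order
# (Fin=0, None=1, RFn=2, Unf=3). This implies an ordering that may be meaningless.
codes = garage.astype("category").cat.codes
print(codes.tolist())  # [2, 3, 1, 0, 3]

# One-hot encoding: one 0/1 column per category, no implied ordering.
dummies = pd.get_dummies(garage, prefix="GarageFinish", dtype=int)
print(dummies)
```

Integer codes keep the frame compact, while one-hot columns avoid suggesting that, say, "Unf" is "greater than" "Fin"; which one works better depends on the model.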


# Taking care of missing data
# (Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it)
import numpy as np
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

Suppose my array is named X and I want to handle the missing data in the columns at indices 1 and 2 by replacing it with each column's mean. scikit-learn's imputer classes (sklearn.impute.SimpleImputer in current versions) are a convenient way to do this.
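A runnable version of the same idea on a small invented array, using SimpleImputer (the replacement for the Imputer class, which was removed in scikit-learn 0.22). The array values are made up; only columns 1 and 2 contain gaps.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy 4x3 array with invented values; columns 1 and 2 have gaps.
X = np.array([
    [1.0, 10.0, np.nan],
    [2.0, np.nan, 4.0],
    [3.0, 30.0, 8.0],
    [4.0, 20.0, np.nan],
])

# SimpleImputer always works column-wise, so there is no axis argument.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)
print(imputer.statistics_)  # per-column means learned during fit
```

Keeping fit and transform separate matters once you have a train/test split: fit the imputer on the training rows only and reuse it to transform the test rows, so test data does not leak into the means. For string columns, strategy="most_frequent" or strategy="constant" (with fill_value) can be used instead of "mean".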

answered 45 mins ago

smit patel

