How can I fill NaN values in a pandas data frame?
Greetings, everyone. I am trying to learn data analysis and machine learning by working through some problems. I found the "House Prices" competition, which is a playground competition. Since I am very new to this field, I got confused after exploring the data. The data has 81 columns, one of which is the target column, the house value. Several of the columns consist mostly of "NaN" values. When I ran

    nulls = data.isnull().sum()
    nulls[nulls > 0]

it showed the columns with missing values and the number of missing entries in each:

    LotFrontage      259
    Alley           1369
    MasVnrType         8
    MasVnrArea         8
    BsmtQual          37
    BsmtCond          37
    BsmtExposure      38
    BsmtFinType1      37
    BsmtFinType2      38
    Electrical         1
    FireplaceQu      690
    GarageType        81
    GarageYrBlt       81
    GarageFinish      81
    GarageQual        81
    GarageCond        81
    PoolQC          1453
    Fence           1179
    MiscFeature     1406

At this point I am totally lost and I don't know how to get rid of these "NaN" values. Any help would be appreciated.

python data-cleaning kaggle
edited Nov 16 '17 at 1:38 by timleathart
asked Dec 25 '16 at 22:29 by Ahmed Dhanani
2 Answers
You can use the DataFrame.fillna function to fill the NaN values in your data. For example, assuming your data is in a DataFrame called df,

    df.fillna(0, inplace=True)

will replace the missing values with the constant value 0. You can also do more clever things, such as replacing the missing values with the mean of that column (numeric columns only):

    df.fillna(df.mean(numeric_only=True), inplace=True)

or take the last value seen for a column:

    df.ffill(inplace=True)

Filling the NaN values is called imputation. Try a range of different imputation methods and see which ones work best for your data.
answered Dec 26 '16 at 0:06 by timleathart
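Since the data in the question mixes numeric and string columns, one way to apply the idea above is to pick a fill value per column type. The following is a minimal sketch, not a recommendation for this particular dataset: the column names are borrowed from the question, the values are made up, and both the median and the "Missing" placeholder are assumptions that should be checked against what NaN actually means in each column.

```python
import numpy as np
import pandas as pd

# Toy frame: column names mirror the question, values are invented.
df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "MasVnrArea": [196.0, 0.0, np.nan, 162.0],
    "Alley": [np.nan, "Grvl", np.nan, "Pave"],
})

# Split the columns by dtype so each group gets a suitable fill value.
numeric_cols = df.select_dtypes(include="number").columns
other_cols = df.columns.difference(numeric_cols)

# Numeric columns: fill each NaN with that column's median.
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Non-numeric columns: fill with an explicit placeholder label (an assumption).
df[other_cols] = df[other_cols].fillna("Missing")

print(df)
```

The median is used here only because it is less sensitive to outliers than the mean; swapping in `mean()` or a constant is a one-line change.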
Thanks for the response. The dataset also contains string values. I think df.fillna() will only work on float or integer values. Any pointers on converting string values to numeric values?
– Ahmed Dhanani, Dec 26 '16 at 13:07
Ah, I had assumed the data was numeric for some reason. By string values, do you mean categorical data, i.e. strings from a particular set of values? If so, you can use scikit-learn's LabelEncoder. Natural language, on the other hand, is more difficult to deal with. Bag-of-words is probably the easiest to think about, but have a look at these options.
– timleathart, Dec 26 '16 at 22:01
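To illustrate the categorical case raised in the comments, here is a small pandas-only sketch. It uses `pd.factorize` rather than the LabelEncoder mentioned above, so it is a swapped-in technique: factorize numbers the categories in order of first appearance, whereas LabelEncoder sorts them. The series and the "Missing" label are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with one missing entry.
s = pd.Series(["Grvl", np.nan, "Pave", "Grvl"], name="Alley")

# Treat missingness as its own category before encoding.
filled = s.fillna("Missing")

# factorize assigns integer codes in order of first appearance
# and returns the distinct labels alongside them.
codes, uniques = pd.factorize(filled)

print(list(codes))
print(list(uniques))
```

Integer codes imply an ordering that the categories may not have, so for models sensitive to that, one-hot encoding (for example with `pd.get_dummies`) may be the safer choice.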
    # Taking care of missing data
    import numpy as np
    from sklearn.impute import SimpleImputer  # replaces Imputer, removed in scikit-learn 0.22

    # Replace NaN with the column mean (SimpleImputer always works column-wise)
    imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
    imputer = imputer.fit(X[:, 1:3])
    X[:, 1:3] = imputer.transform(X[:, 1:3])

Suppose my array is named X and I want to handle the missing data in the columns indexed 1 and 2 by replacing it with the column mean. SimpleImputer from the sklearn library is a great class for this.
answered 45 mins ago by smit patel
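For readers who want to see what column-mean imputation does without scikit-learn, the same replacement can be sketched with NumPy alone. The array X below is made up for illustration, and this is only an approximation of the idea, not the library's implementation.

```python
import numpy as np

# Made-up array; columns 1 and 2 contain missing values.
X = np.array([
    [1.0, 10.0, np.nan],
    [2.0, np.nan, 6.0],
    [3.0, 30.0, 12.0],
])

# Work on columns 1 and 2 only, as in the answer above.
block = X[:, 1:3]

# Per-column mean that ignores NaN entries.
col_means = np.nanmean(block, axis=0)

# Keep existing values; substitute the column mean where a value is NaN.
X[:, 1:3] = np.where(np.isnan(block), col_means, block)

print(X)
```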