Numpy array from pandas dataframe
I am new to using Python for data science.
What is the difference between selecting a column with:
df['name'].values
df.iloc[:,1].values
df.iloc[:,1:2].values
They return different types of numpy vectors. Why?
python dataset pandas numpy dataframe
asked 14 hours ago by 3nomis
Assuming 'name' is the second column, they should be identical. Please include some example code so we can help you. – Louis T, 5 hours ago
2 Answers
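Assuming 'name' is the second column, df['name'].values and df.iloc[:,1].values should be identical. pandas uses 0-based indexing, so the first column is at index 0 and the second at index 1. That might be the 'gotcha' here. df.iloc[:,1:2].values holds the same values, but as a 2-D array with a single column.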
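As a minimal sketch of the 0-based indexing point, assuming a small made-up DataFrame in which 'name' happens to be the second column (the data and the 'id' and 'age' columns are hypothetical, not taken from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; 'name' is just the label of the second column.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": [1.5, 2.5, 3.5],
    "age": [31, 45, 27],
})

# Positions are 0-based, so the second column sits at position 1.
name_pos = df.columns.get_loc("name")

by_label = df["name"].values        # select by column label
by_position = df.iloc[:, 1].values  # select by integer position

# Both selections hold the same values in the same order.
same_values = np.array_equal(by_label, by_position)
print(name_pos, same_values)
```

If 'name' were at a different position, the two selections would pick different columns, so looking the position up with `df.columns.get_loc` is a quick way to check.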
answered 5 hours ago by Louis T
Not entirely sure what you mean by "numpy vectors", but I'm assuming the question is why these three expressions return almost, but not quite, the same output.
Reference: pandas docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
df['name'].values
df['name'] is a "Series corresponding to colname". In other words, you're selecting the data from that column and putting it in an array by calling .values. The result is a 1-D array of shape (n_rows,).
df.iloc[:,1].values
.iloc is "Purely integer-location based indexing for selection by position". This is the same as above, except that the column is selected by position: df.iloc[:, 1] means all rows of the column at position 1, which is the second column because positions start at 0. It also returns a Series, so .values is again 1-D. With a slice, .iloc is probably an easier way to select several consecutive columns in a DataFrame than writing out each individual column name.
df.iloc[:,1:2].values
This creates a 2-D array (an array of arrays) in which each row's value is contained in its own one-element subarray, giving shape (n_rows, 1). The reason is that 1:2 is a slice between column positions 1 and 2 rather than the single position 1 used above. A column slice returns a DataFrame even when it covers only one column, and the .values of a DataFrame is 2-D, with one inner array per row.
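To illustrate the shape difference, here is a small sketch on a made-up DataFrame (the data and the 'id' and 'age' columns are assumptions for illustration only). It compares the integer selection with the slice selection and then flattens the 2-D result:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; 'name' is just the label of the second column.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": [1.5, 2.5, 3.5],
    "age": [31, 45, 27],
})

single = df.iloc[:, 1]    # integer position -> Series (1-D)
sliced = df.iloc[:, 1:2]  # slice -> DataFrame with one column (2-D)

print(type(single).__name__, single.values.shape)
print(type(sliced).__name__, sliced.values.shape)

# Flattening the 2-D array recovers the same 1-D values.
flat = sliced.values.ravel()
print(np.array_equal(flat, single.values))
```

Here the integer selection gives a Series whose array has shape (3,), while the slice gives a DataFrame whose array has shape (3, 1). Calling `ravel()` on the 2-D array is one way to get back to the 1-D form.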
answered 3 hours ago by Cat C.