Why are pandas/R/etc so common in a data scientist or analyst's workflow when one can resort to UDFs in an...

A common workflow for an analyst working with an RDBMS seems to be: use SQL to pull a subset of data from the database, export it or read it through a connector, and then apply a data mining or modelling algorithm (e.g. kNN, regression models) using something like Pandas, R or Matlab. In other words, one works inside the DBMS and then moves outside of it.
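
To make the workflow concrete, here is a minimal sketch of it, assuming an in-memory SQLite database and a hypothetical `measurements` table (the table, columns and values are made up for illustration). A hand-written least-squares fit stands in for whatever Pandas, R or Matlab would do once the data has left the database.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (x REAL, y REAL, region TEXT)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        (1.0, 3.0, "north"),
        (2.0, 5.0, "north"),
        (3.0, 7.0, "north"),
        (4.0, 100.0, "south"),
    ],
)

# Step 1 (inside the DBMS): use SQL to select the subset of interest.
rows = conn.execute(
    "SELECT x, y FROM measurements WHERE region = ? ORDER BY x",
    ("north",),
).fetchall()


# Step 2 (outside the DBMS): fit y = slope * x + intercept
# by ordinary least squares on the exported rows.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(rows)
print(slope, intercept)
```

Note that after step 1 the data lives in a plain Python list, so any further filtering or joining on the model output has to happen outside SQL or be written back to the database first.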

One can instead use user-defined functions (UDFs) to implement the algorithm of choice in many languages (e.g. build a UDF in C for PostgreSQL) and stay within the DBMS environment, i.e. one does not give up the benefits of operating inside a DBMS. Furthermore, the UDF integrates seamlessly with SQL constructs, which is very powerful if one needs to perform subsequent SQL operations on data that has passed through a model.
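
As a rough illustration of the UDF route, the sketch below registers a Python function as a scalar UDF in SQLite through `sqlite3.Connection.create_function` and then calls it from SQL. This is a stand-in for the C/PostgreSQL setup mentioned above, not an example of it; the `inputs` table and the pre-fitted coefficients are hypothetical.

```python
import sqlite3

# Hypothetical coefficients of an already-fitted linear model.
SLOPE, INTERCEPT = 2.0, 1.0


def predict(x):
    """Score one input value with the (assumed) fitted model."""
    return SLOPE * x + INTERCEPT


conn = sqlite3.connect(":memory:")

# Register the Python function as a one-argument SQL function named "predict".
conn.create_function("predict", 1, predict)

conn.execute("CREATE TABLE inputs (id INTEGER PRIMARY KEY, x REAL)")
conn.executemany("INSERT INTO inputs (x) VALUES (?)", [(1.0,), (2.0,), (5.0,)])

# The model output is used directly in SELECT and WHERE,
# so subsequent SQL operations stay inside the database.
high = conn.execute(
    "SELECT id, predict(x) AS y_hat FROM inputs "
    "WHERE predict(x) > ? ORDER BY id",
    (4.0,),
).fetchall()
print(high)
```

Here the scoring step and the filter on its result are a single query, which is the kind of integration the paragraph above refers to.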

I was wondering what the reason is for operating outside of the DBMS, given that such functionality is available and that the DBMS is, ironically, good at managing data (supported by many years of research).

There are other discussions which seem to conclude that SQL is for preprocessing and Pandas is for data analysis. To clarify, I am building on this conclusion: why did things end up this way, when the DBMS is very well suited for data analysis as well?

edited yesterday

asked yesterday

Zeruno

1062

New contributor

Possible duplicate of Why do people prefer Pandas to SQL?
– Simon Larsson
yesterday

That discussion, while relevant, is not technically the same. Here we are discussing why UDFs don't seem to cut it for the data scientist.
– Zeruno
yesterday

I agree that it is not necessarily a duplicate. Just wanted to bring it in here since I think it has quite a bit of overlap and some good answers.
– Simon Larsson
yesterday

r data-mining pandas matlab databases

