Accuracy differs between MATLAB and scikit-learn for a decision tree
Can the same dataset give different accuracy in MATLAB and in a Jupyter notebook running Python?
I first used the dataset in MATLAB and got 96% accuracy with a decision tree. Then I used the same dataset in a Jupyter notebook with Python and got 53% accuracy for C4.5 (decision tree) using k-fold cross-validation.
I don't understand why the same dataset and the same method give such different accuracies.
My Python code is below:
```python
import pandas as pd
import numpy as np
from sklearn import tree
from sklearn.model_selection import KFold, cross_val_score

train = pd.read_csv('E://New.csv')
train.head()

# define X and y
feature_cols = ['Past', 'Family_History', 'Current',
                'current or previous workplace',
                'diagnosed with a mental health condition by a medical professional?',
                'do you feel that it interferes with your work when being treated effectively?',
                'Gender']
X = train[feature_cols]
# y is the target column, selected by name
y = train['Diagonised condition']

# 10 consecutive folds in file order (no shuffling)
kfold = KFold(n_splits=10, random_state=None)
model = tree.DecisionTreeClassifier(criterion='gini')
results = cross_val_score(model, X, y, cv=kfold, scoring='accuracy')
result = results.mean() * 100
std = results.std() * 100
print(result)
```

python scikit-learn decision-trees accuracy matlab
edited 17 mins ago by Brian Spiering
asked Jan 23 at 15:37 by IS2057
Please post the MATLAB code so it can be compared to the Python code.
– Brian Spiering, Jan 23 at 16:12
In MATLAB I use the classification app (decision tree): I load my dataset and then calculate the accuracy.
– IS2057, Jan 23 at 18:05
Are you sure that all the other parameters of your decision tree are the same?
– Majid Mortazavi, Jan 24 at 6:23
@MajidMortazavi, yes, I am sure. I use the same dataset and the same parameters.
– IS2057, Jan 24 at 6:49
1 Answer
It is hard to make a direct comparison between a white-box implementation (scikit-learn) and a black-box implementation (MATLAB).
One possibility is that they use different algorithms: scikit-learn uses an optimized version of the CART algorithm, while MATLAB may use ID3, C4.5, or something else. Another possibility is that the two implementations use different hyperparameters (e.g., splitting criterion, maximum depth, minimum node size).
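A minimal sketch of the hyperparameter point, assuming scikit-learn's built-in breast-cancer dataset as a stand-in for the asker's CSV (which is not available here). The folds are held fixed and only the tree settings change, so any spread in accuracy comes from the settings alone. The "capped" configuration is hypothetical; read the actual split criterion and split limit from the MATLAB session and substitute them.

```python
# Sketch: same data, same folds, different tree hyperparameters.
# Assumption: the built-in breast-cancer dataset stands in for the
# asker's CSV, which is not available here.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Every hyperparameter the scikit-learn tree uses, with its default value.
print(DecisionTreeClassifier().get_params())

# Shuffled but seeded folds: identical splits for every configuration.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

configs = {
    # scikit-learn defaults: Gini impurity, grow until leaves are pure
    "default (gini)": dict(criterion="gini"),
    # entropy (information gain) criterion instead of Gini
    "entropy": dict(criterion="entropy"),
    # Hypothetical cap of 100 splits: a binary tree with k splits has
    # k + 1 leaves. Nodes with fewer than 10 samples are not split.
    "capped (hypothetical)": dict(criterion="gini", max_leaf_nodes=101,
                                  min_samples_split=10),
    # very shallow tree for contrast
    "max_depth=2": dict(criterion="gini", max_depth=2),
}

scores = {}
for name, params in configs.items():
    model = DecisionTreeClassifier(random_state=0, **params)
    acc = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    scores[name] = acc.mean()
    print(f"{name:>22}: {100 * acc.mean():.1f}% (std {100 * acc.std():.1f})")
```

`max_leaf_nodes` is used for the cap because scikit-learn has no direct "maximum number of splits" option; in a binary tree, k splits always give k + 1 leaves.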
Since decision trees are white-box models, you can examine their internal structure. Plot both trained trees and check how each one splits the data and how many splits it makes.
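A minimal sketch of inspecting a fitted scikit-learn tree, using the built-in iris dataset as a stand-in for the asker's data. It reports depth, leaf count, and split count, and prints the top of the tree as text so the splits can be compared side by side with the tree MATLAB shows.

```python
# Sketch: inspect the structure of a fitted scikit-learn tree.
# Assumption: the iris dataset stands in for the asker's CSV.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
model = DecisionTreeClassifier(criterion="gini", random_state=0)
model.fit(data.data, data.target)

n_leaves = model.get_n_leaves()
# every internal (non-leaf) node of a binary tree is one split
n_splits = model.tree_.node_count - n_leaves

print("depth:", model.get_depth())
print("leaves:", n_leaves)
print("splits:", n_splits)

# text rendering of the first two levels: feature <= threshold, then class
rules = export_text(model, feature_names=list(data.feature_names), max_depth=2)
print(rules)
```

To draw the tree instead of printing it, `sklearn.tree.plot_tree(model)` renders the same structure with matplotlib; on the MATLAB side, `view(tree, 'Mode', 'graph')` should display a trained classification tree for comparison.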
edited 2 days ago
answered Jan 24 at 18:17 by Brian Spiering
Yes, I understand it. Thanks for answering.
– IS2057, 8 hours ago