Accuracy differs between MATLAB and scikit-learn for a decision tree
























$begingroup$


Is it possible for the accuracy on the same dataset to differ between MATLAB and Python code run in a Jupyter notebook?



Using the same dataset, I first trained a decision tree in MATLAB and got 96% accuracy. I then used that same dataset in a Jupyter notebook with Python code, where I got 53% accuracy for C4.5 (decision tree) using k-fold cross-validation.



I don't understand why I get different accuracy for the same dataset and the same method.



My procedure in Python is given below:



import pandas as pd
import numpy as np
from sklearn import tree
from sklearn.model_selection import KFold, cross_val_score

train=pd.read_csv('E://New.csv')
train.head()





# define X and y
feature_cols = [
    'Past',
    'Family_History',
    'Current',
    'current or previous workplace',
    'diagnosed with a mental health condition by a medical professional?',
    'do you feel that it interferes with your work when being treated effectively?',
    'Gender',
]
X = train[feature_cols]

# y is the target vector, selected by its column name
y = train['Diagonised condition']

kfold = KFold(n_splits=10,random_state=None)
model = tree.DecisionTreeClassifier(criterion='gini')

results = cross_val_score(model, X, y, cv=kfold,scoring = 'accuracy')
result = results.mean()*100

std = results.std()*100
print (result)























$endgroup$












  • Please post the MATLAB code so it can be compared to the Python code.
    – Brian Spiering, Jan 23 at 16:12










  • In MATLAB I use the classification app (decision tree), load my dataset, and then calculate the accuracy.
    – IS2057, Jan 23 at 18:05










  • Are you sure that all other parameters for your decision tree are the same?
    – Majid Mortazavi, Jan 24 at 6:23










  • @MajidMortazavi, yes, I am sure. I use the same dataset and the same parameters.
    – IS2057, Jan 24 at 6:49























python scikit-learn decision-trees accuracy matlab














edited 17 mins ago by Brian Spiering










asked Jan 23 at 15:37 by IS2057






















1 Answer












$begingroup$

It is hard to make a direct comparison between a white-box implementation (scikit-learn) and a black-box implementation (MATLAB).



One guess is that they are using different algorithms. scikit-learn uses an optimized version of the CART algorithm. Maybe MATLAB uses ID3, C4.5, or something else. Another guess is that the two implementations are using different hyperparameters (e.g., different splitting criteria, maximum depth, or minimum node size).
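To make the hyperparameter guess checkable, here is a minimal sketch that lists every setting scikit-learn applies by default and then scores a few configurations on one fixed cross-validation split. The synthetic data, the three configurations, and the shuffled `KFold` are my assumptions for illustration; none of them come from the question, so swap in the real `X` and `y` and the values shown in the MATLAB app.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; replace with the real X and y).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

# List every hyperparameter scikit-learn will use, so each one can be
# checked against the settings on the MATLAB side.
defaults = DecisionTreeClassifier().get_params()
for name, value in sorted(defaults.items()):
    print(f"{name} = {value}")

# One fixed split for all configurations, so differences in the scores
# come from the hyperparameters and not from the folds.
# Shuffling is an assumption here; the question uses unshuffled KFold.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Hypothetical configurations to compare.
configs = {
    "gini, unlimited depth": {"criterion": "gini"},
    "entropy, unlimited depth": {"criterion": "entropy"},
    "gini, max_depth=4": {"criterion": "gini", "max_depth": 4},
}

scores = {}
for label, params in configs.items():
    model = DecisionTreeClassifier(random_state=0, **params)
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    scores[label] = fold_scores.mean()
    print(f"{label}: {scores[label]:.3f}")
```

If one of these settings moves the accuracy a lot on the real data, that setting is a good candidate for the mismatch between the two tools.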



Since decision trees are white-box models, you can examine their internal structure. Plot both trained trees, then compare how each one makes its splits and how many splits it makes.
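On the scikit-learn side, a text dump of the fitted tree is often enough for this comparison. The sketch below is only an illustration: the synthetic data, the placeholder feature names, and `max_depth=3` (chosen to keep the printout short) are my assumptions, not values from the question.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (an assumption; replace with the real X and y).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)
# Hypothetical feature names, used only to label the printed rules.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Human-readable split rules: which feature and threshold at each node.
rules = export_text(clf, feature_names=feature_names)
print(rules)

# Size summary to set against the tree produced by the other tool.
print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())
print("nodes:", clf.tree_.node_count)
```

If the two tools report very different depths or leaf counts for the same data, they are almost certainly not growing the tree with the same settings.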

















$endgroup$













  • Yes, I understand it. Thanks for answering.
    – IS2057, 8 hours ago












































edited 2 days ago
answered Jan 24 at 18:17 by Brian Spiering





























