What exactly is a Gini Index?
I am going through the tutorial at this site, where the author explains the derivation of the Gini index. I want to understand the following terms:
- Group
- Classes: As far as I understand, these are the possible values of the labels in the data we are supposed to classify. Please correct me if I am wrong.
The website here states that it is the difference between 1 and the probabilities of the classified values within the dataset when creating the split, but the first link adds some more points to this simple derivation. Can anyone please explain the derivation of the Gini index in layman's terms?
machine-learning decision-trees
asked Sep 25 '17 at 14:36
Neeleshkumar Srinivasan Mannur
1 Answer
A class is simply a label you use to categorize a bunch of objects. For example, if you were trying to create an email filter, you might have a `spam` class and a `non-spam` class.
The Gini index is used in decision trees. A single decision in a decision tree is called a node, and the Gini index is a way to measure how "impure" a single node is.
Suppose you have a data set that lists several attributes for a bunch of animals and you're trying to predict whether each animal is a mammal or not. You would have two classes, `mammal` and `not-mammal`. You start building your decision tree by asking whether an animal is warm-blooded or not, and split your data set into two groups based on this splitting criterion. If an animal is cold-blooded, it belongs to the `not-mammal` class; if it is warm-blooded, however, it may or may not belong to the `mammal` class. This new node (i.e., decision) might contain a mix, or group, of animals that may or may not be mammals (for example, the group could contain both birds and mammals). A 50/50 split between mammals and non-mammals at this node would mean the node is as impure as it can be with two classes (a Gini index of 0.5). A completely pure node would have a Gini index of 0, which indicates that the node is made up of only one class.
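To make the numbers above concrete, here is a minimal sketch that assumes the usual definition of Gini impurity for a node, $G = 1 - \sum_k p_k^2$, where $p_k$ is the fraction of items in the node that belong to class $k$. The function names `gini_impurity` and `split_gini` and the animal lists are hypothetical, chosen only for illustration, and the size-weighted average in `split_gini` is one common way to score a split rather than necessarily the exact calculation in the tutorial the question refers to.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(groups):
    """Size-weighted average impurity of the groups produced by one split."""
    total = sum(len(g) for g in groups)
    if total == 0:
        return 0.0
    return sum(len(g) / total * gini_impurity(g) for g in groups)


# Hypothetical animals, made up for illustration only.
cold_blooded = ["not-mammal", "not-mammal", "not-mammal"]
warm_blooded = ["mammal", "mammal", "not-mammal", "not-mammal"]

print(gini_impurity(cold_blooded))  # 0.0 (pure node, only one class)
print(gini_impurity(warm_blooded))  # 0.5 (50/50 mix of two classes)
print(split_gini([cold_blooded, warm_blooded]))  # roughly 0.286
```

Here each list is a "group" (the rows that end up on one side of the split) and the distinct labels inside it are the "classes". A tree-building algorithm would typically compare this score across candidate splits and prefer the one with the lowest value.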
answered 15 mins ago
darksinge