Knn distance plot for determining eps of DBSCAN
I would like to use the k-NN distance plot to figure out which eps value I should choose for the DBSCAN algorithm.
Based on this page:
> The idea is to calculate the average of the distances of every point
> to its k nearest neighbors. The value of k will be specified by the
> user and corresponds to MinPts. Next, these k-distances are plotted in
> an ascending order. The aim is to determine the “knee”, which
> corresponds to the optimal eps parameter.
Using Python with numpy/sklearn, I have the following points, with the following distances for 6-NN:
import numpy as np
from sklearn.neighbors import NearestNeighbors
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
nbrs = NearestNeighbors(n_neighbors=len(X)).fit(X)
distances, indices = nbrs.kneighbors(X)
# Indices
[[0 1 2 3 4 5]
[1 0 2 3 4 5]
[2 1 0 3 4 5]
[3 4 5 0 1 2]
[4 3 5 0 1 2]
[5 4 3 0 1 2]]
# Distances
[[ 0. 1. 2.23606798 2.82842712 3.60555128 5. ]
[ 0. 1. 1.41421356 3.60555128 4.47213595 5.83095189]
[ 0. 1.41421356 2.23606798 5. 5.83095189 7.21110255]
[ 0. 1. 2.23606798 2.82842712 3.60555128 5. ]
[ 0. 1. 1.41421356 3.60555128 4.47213595 5.83095189]
[ 0. 1.41421356 2.23606798 5. 5.83095189 7.21110255]]
Then I computed the average distance:
distances.mean()
2.9269575028354495
The problem is that I don't understand how exactly I could reproduce their plot in Python, with the distances on the y-axis and the points, ordered by distance, on the x-axis.
Thanks for your help.
python clustering parameter-estimation dbscan
edited Mar 2 '16 at 15:50 by Kasra Manshaei
asked Feb 9 '16 at 16:29 by marcL
![enter image description here](i.stack.imgur.com/KFDbs.png) Why does my neighboring point graph have this shape? Please help me!
– Dung Le
Oct 10 '17 at 0:37
2 Answers
You
- take the last column of that matrix (the distance to the k-th nearest neighbor)
- sort it in descending order
- plot distance against array index
- hope to see a knee (if the distance does not work well, there might be none)
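The steps above can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the answer: `k_distances` and `plot_k_distances` are hypothetical helper names, `k = 2` is chosen only because the question's toy data has six points, and `kneighbors()` is called without arguments so that a point is not counted as its own neighbor.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def k_distances(X, k):
    """Return each point's distance to its k-th nearest neighbor, sorted descending."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    # kneighbors() with no argument queries the training points themselves
    # and leaves each point out of its own neighbor list.
    distances, _ = nbrs.kneighbors()
    kth = distances[:, -1]      # last column = distance to the k-th neighbor
    return np.sort(kth)[::-1]   # descending, as in the answer


def plot_k_distances(sorted_kth, k):
    """Plot the sorted k-distances against their array index."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.plot(np.arange(len(sorted_kth)), sorted_kth, marker="o")
    plt.xlabel("points, sorted by k-distance (array index)")
    plt.ylabel(f"distance to neighbor number {k}")
    plt.show()


# Toy data from the question; k=2 is an illustrative choice, not a recommendation.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
sorted_kth = k_distances(X, k=2)
print(sorted_kth)
```

Calling `plot_k_distances(sorted_kth, k=2)` draws the curve. With only six points there is little of a knee to look for, so the sketch is meant to show the mechanics rather than a usable eps.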
answered Feb 9 '16 at 19:34 by Anony-Mousse
On the same plot, do I do this for several values of k, or for only one k per plot as in the example? And what do you mean by "index"?
– marcL
Feb 9 '16 at 20:53
Using 6-NN when you only have 6 points is of course nonsense. Do it for an appropriate k. Index as in "array index", because you need two dimensions to plot.
– Anony-Mousse
Feb 9 '16 at 20:57
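One practical detail behind choosing k, sketched with scikit-learn's `NearestNeighbors` on the question's points: when the fitted data is passed back to `kneighbors(X)`, column 0 holds each point's zero distance to itself, so the k-th real neighbor sits in column k rather than column k - 1. The variable names and `k = 2` are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
k = 2  # illustrative; must be smaller than the number of points

# Querying with the training data itself: every point is its own nearest
# neighbor at distance 0, so k + 1 neighbors are needed to reach the k-th.
with_self, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)

# Querying with no argument: each point is excluded from its own neighbors.
without_self, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()

print(with_self[:, 0])       # self-distances
print(with_self[:, -1])      # distance to the k-th real neighbor
print(without_self[:, -1])   # the same values, without the self column
```

This is also why `n_neighbors=len(X)` in the question leaves nothing to learn from the last column: it is simply each point's distance to the farthest point in the data set.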
So I only use the last column of the distance matrix? I ask because the example talks about averaging distances.
– marcL
Feb 9 '16 at 22:26
That post is incorrect there, and in at least one other place (you don't need to set a seed).
– Anony-Mousse
Feb 9 '16 at 22:46
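To see why the k-th neighbor distance, rather than the average, is the quantity that relates to eps, here is a small check against scikit-learn's `DBSCAN`. It assumes scikit-learn's convention that `min_samples` counts the point itself (hence `min_samples = k + 1`); `k = 2` and `eps = 1.7` are values picked only to make the difference visible on the question's six points.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
k, eps = 2, 1.7  # illustrative values, not recommendations

# Distances to the k nearest neighbors of each point, excluding the point itself.
distances, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors()
kth_dist = distances[:, -1]          # distance to the k-th neighbor
mean_dist = distances.mean(axis=1)   # average over the k neighbors

# Core points as reported by DBSCAN itself.
core = DBSCAN(eps=eps, min_samples=k + 1).fit(X).core_sample_indices_

# Points whose k-distance, or mean distance, falls within eps.
core_from_kth = np.flatnonzero(kth_dist <= eps)
below_mean = np.flatnonzero(mean_dist <= eps)

print(core, core_from_kth, below_mean)
```

Under this convention a point is a core point exactly when its k-th neighbor lies within eps, so thresholding the k-distance reproduces DBSCAN's core points, whereas thresholding the mean distance selects a different set.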
You only have one k. Why don't you use the DBSCAN paper, rather than mashing up various low-quality websites?
– Anony-Mousse
Feb 9 '16 at 22:53
Why do we take the last column of the distance matrix? Please elaborate.
answered 7 mins ago by Neha (new contributor)