Are there algorithms for clustering objects with pairwise distances, without computing all pairwise...
$begingroup$
I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.
Normally pairwise clustering is done like this: (see here)
- Compute full distance matrix between all pairwise combination of objects
- Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.
Here comes the however:
Computing the full distance matrix for all pairwise combination of objects is computationally very expensive. So my though was, whether there are some clustering algorithms that only do lookups on a subset of the pairwise distances, so it is not necessary to compute the full matrix?
I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?
Exited to hear your ideas, thanks!
clustering similarity graphs distance
New contributor
$endgroup$
add a comment |
$begingroup$
I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.
Normally pairwise clustering is done like this: (see here)
- Compute full distance matrix between all pairwise combination of objects
- Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.
Here comes the however:
Computing the full distance matrix for all pairwise combination of objects is computationally very expensive. So my though was, whether there are some clustering algorithms that only do lookups on a subset of the pairwise distances, so it is not necessary to compute the full matrix?
I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?
Exited to hear your ideas, thanks!
clustering similarity graphs distance
New contributor
$endgroup$
add a comment |
$begingroup$
I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.
Normally pairwise clustering is done like this: (see here)
- Compute full distance matrix between all pairwise combination of objects
- Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.
Here comes the however:
Computing the full distance matrix for all pairwise combination of objects is computationally very expensive. So my though was, whether there are some clustering algorithms that only do lookups on a subset of the pairwise distances, so it is not necessary to compute the full matrix?
I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?
Exited to hear your ideas, thanks!
clustering similarity graphs distance
New contributor
$endgroup$
I'm looking for a clustering algorithm that clusters objects, by using their pairwise distances, without needing to calculate all pairwise distances.
Normally pairwise clustering is done like this: (see here)
- Compute full distance matrix between all pairwise combination of objects
- Assuming that the distances there are non-euclidean, one might use Spectral Clustering or Affinity propagation on the distance matrix and retrieve the clustering results.
Here comes the however:
Computing the full distance matrix for all pairwise combination of objects is computationally very expensive. So my though was, whether there are some clustering algorithms that only do lookups on a subset of the pairwise distances, so it is not necessary to compute the full matrix?
I know Spectral Clustering works also on sparse matrices, but since it is theoretically possible to compute all pairwise distances, which ones should be left out?
Exited to hear your ideas, thanks!
clustering similarity graphs distance
clustering similarity graphs distance
New contributor
New contributor
New contributor
asked 2 days ago
TaizameTaizame
61
61
New contributor
New contributor
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
$begingroup$
Well, one may argue that DBSCAN is based on all pairwise distances, but it uuses data index to avoid computing all of them using geometric bounds.
And there are other examples if you browse through literature.
For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.
And there are many more such techniques.
$endgroup$
add a comment |
$begingroup$
Quadtree can be used for this purpose.
This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'
http://wiki.gis.com/wiki/index.php/Quadtree
DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN
$endgroup$
add a comment |
$begingroup$
You can use Locality Sensitive Hashing technique Wiki article
With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.
There is a great example with Python code for MinHash. What I get from the article is the bellow quote
In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.
MinHash explanation with Python code
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Taizame is a new contributor. Be nice, and check out our Code of Conduct.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46950%2fare-there-algorithms-for-clustering-objects-with-pairwise-distances-without-com%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Well, one may argue that DBSCAN is based on all pairwise distances, but it uuses data index to avoid computing all of them using geometric bounds.
And there are other examples if you browse through literature.
For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.
And there are many more such techniques.
$endgroup$
add a comment |
$begingroup$
Well, one may argue that DBSCAN is based on all pairwise distances, but it uuses data index to avoid computing all of them using geometric bounds.
And there are other examples if you browse through literature.
For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.
And there are many more such techniques.
$endgroup$
add a comment |
$begingroup$
Well, one may argue that DBSCAN is based on all pairwise distances, but it uuses data index to avoid computing all of them using geometric bounds.
And there are other examples if you browse through literature.
For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.
And there are many more such techniques.
$endgroup$
Well, one may argue that DBSCAN is based on all pairwise distances, but it uuses data index to avoid computing all of them using geometric bounds.
And there are other examples if you browse through literature.
For example, the classic CLARA method is an approximation to PAM that avoids computing all pairwise distances.
And there are many more such techniques.
answered yesterday
Anony-MousseAnony-Mousse
4,896624
4,896624
add a comment |
add a comment |
$begingroup$
Quadtree can be used for this purpose.
This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'
http://wiki.gis.com/wiki/index.php/Quadtree
DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN
$endgroup$
add a comment |
$begingroup$
Quadtree can be used for this purpose.
This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'
http://wiki.gis.com/wiki/index.php/Quadtree
DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN
$endgroup$
add a comment |
$begingroup$
Quadtree can be used for this purpose.
This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'
http://wiki.gis.com/wiki/index.php/Quadtree
DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN
$endgroup$
Quadtree can be used for this purpose.
This algo divides 2 D space into clusters. In this example ; we can exclude point 'C' from comparison with 'E' and 'F'
http://wiki.gis.com/wiki/index.php/Quadtree
DBSCAN is also useful in some scenarios : https://en.wikipedia.org/wiki/DBSCAN
answered 2 days ago
Shamit VermaShamit Verma
78426
78426
add a comment |
add a comment |
$begingroup$
You can use Locality Sensitive Hashing technique Wiki article
With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.
There is a great example with Python code for MinHash. What I get from the article is the bellow quote
In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.
MinHash explanation with Python code
$endgroup$
add a comment |
$begingroup$
You can use Locality Sensitive Hashing technique Wiki article
With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.
There is a great example with Python code for MinHash. What I get from the article is the bellow quote
In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.
MinHash explanation with Python code
$endgroup$
add a comment |
$begingroup$
You can use Locality Sensitive Hashing technique Wiki article
With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.
There is a great example with Python code for MinHash. What I get from the article is the bellow quote
In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.
MinHash explanation with Python code
$endgroup$
You can use Locality Sensitive Hashing technique Wiki article
With this, you can estimate either the Jaccard Similarity (MinHash) or Cosine Similarity (SimHash) between two documents and then apply clustering on the documents collection.
There is a great example with Python code for MinHash. What I get from the article is the bellow quote
In the example code, we have a collection of 10,000 articles which contain, on average, 250 shingles each. Computing the Jaccard similarities directly for all pairs takes 20 minutes on my PC, while generating and comparing the MinHash signatures takes only about 2 minutes and 45 seconds.
MinHash explanation with Python code
answered 2 days ago
TasosTasos
995630
995630
add a comment |
add a comment |
Taizame is a new contributor. Be nice, and check out our Code of Conduct.
Taizame is a new contributor. Be nice, and check out our Code of Conduct.
Taizame is a new contributor. Be nice, and check out our Code of Conduct.
Taizame is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f46950%2fare-there-algorithms-for-clustering-objects-with-pairwise-distances-without-com%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown