Multidimensional scaling producing different results for different seeds
$begingroup$
I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:
import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt
data = pd.read_csv("european_city_distances.csv", index_col='Cities')
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)
cities = data.columns
coords = results.embedding_
fig = plt.figure(figsize=(12,10))
plt.subplots_adjust(bottom = 0.1)
plt.scatter(coords[:, 0], coords[:, 1])
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
plt.annotate(
label,
xy = (x, y),
xytext = (-20, 20),
textcoords = 'offset points'
)
plt.show()

Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?
The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?
I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:
mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

python dimensionality-reduction geospatial
$endgroup$
add a comment |
$begingroup$
I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:
import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt
data = pd.read_csv("european_city_distances.csv", index_col='Cities')
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)
cities = data.columns
coords = results.embedding_
fig = plt.figure(figsize=(12,10))
plt.subplots_adjust(bottom = 0.1)
plt.scatter(coords[:, 0], coords[:, 1])
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
plt.annotate(
label,
xy = (x, y),
xytext = (-20, 20),
textcoords = 'offset points'
)
plt.show()

Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?
The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?
I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:
mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

python dimensionality-reduction geospatial
$endgroup$
add a comment |
$begingroup$
I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:
import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt
data = pd.read_csv("european_city_distances.csv", index_col='Cities')
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)
cities = data.columns
coords = results.embedding_
fig = plt.figure(figsize=(12,10))
plt.subplots_adjust(bottom = 0.1)
plt.scatter(coords[:, 0], coords[:, 1])
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
plt.annotate(
label,
xy = (x, y),
xytext = (-20, 20),
textcoords = 'offset points'
)
plt.show()

Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?
The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?
I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:
mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

python dimensionality-reduction geospatial
$endgroup$
I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:
import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt
data = pd.read_csv("european_city_distances.csv", index_col='Cities')
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)
cities = data.columns
coords = results.embedding_
fig = plt.figure(figsize=(12,10))
plt.subplots_adjust(bottom = 0.1)
plt.scatter(coords[:, 0], coords[:, 1])
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
plt.annotate(
label,
xy = (x, y),
xytext = (-20, 20),
textcoords = 'offset points'
)
plt.show()

Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?
The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?
I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:
mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

python dimensionality-reduction geospatial
python dimensionality-reduction geospatial
edited 13 hours ago
Kristada673
asked 14 hours ago
Kristada673Kristada673
1615
1615
add a comment |
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f49316%2fmultidimensional-scaling-producing-different-results-for-different-seeds%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Data Science Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f49316%2fmultidimensional-scaling-producing-different-results-for-different-seeds%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown