Multidimensional scaling producing different results for different seeds

I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

enter image description here

In particular, I want to plot the cities in a 2D space, and see how much it matches their real locations in a geographic map from just the information about how far they are from each other, without any explicit latitude and longitude information. This is my code:

import pandas as pd

import numpy as np

from sklearn import manifold

import matplotlib.pyplot as plt



data = pd.read_csv("european_city_distances.csv", index_col='Cities')



mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)

results = mds.fit(data.values)



cities = data.columns

coords = results.embedding_



fig = plt.figure(figsize=(12,10))



plt.subplots_adjust(bottom = 0.1)

plt.scatter(coords[:, 0], coords[:, 1])



for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):

    plt.annotate(

        label,

        xy = (x, y), 

        xytext = (-20, 20),

        textcoords = 'offset points'

    )

plt.show()

enter image description here

Most of the cities seem to be around the correct general location relative to each other, except a few infractions - Dublin is too far away from London, Istanbul is in the wrong location, etc. However, if I give a different random_state value, it produces a different "map". For example, random_state=1 produces the following map, where many of the cities do not seem to be around the correct general location relative to other cities:

enter image description here

What I don't understand is, dimensionality reduction methods are not supposed to have randomness associated with them, and thus should not give different results for different seeds. But it does here; so what does it mean?

The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is, whatever initialization of the centres we choose, shouldn't all of them lead to one unique result?

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

enter image description here

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

add a comment |

I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

enter image description here

import pandas as pd

import numpy as np

from sklearn import manifold

import matplotlib.pyplot as plt



data = pd.read_csv("european_city_distances.csv", index_col='Cities')



mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)

results = mds.fit(data.values)



cities = data.columns

coords = results.embedding_



fig = plt.figure(figsize=(12,10))



plt.subplots_adjust(bottom = 0.1)

plt.scatter(coords[:, 0], coords[:, 1])



for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):

    plt.annotate(

        label,

        xy = (x, y), 

        xytext = (-20, 20),

        textcoords = 'offset points'

    )

plt.show()

enter image description here

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

enter image description here

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

add a comment |

I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

enter image description here

import pandas as pd

import numpy as np

from sklearn import manifold

import matplotlib.pyplot as plt



data = pd.read_csv("european_city_distances.csv", index_col='Cities')



mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)

results = mds.fit(data.values)



cities = data.columns

coords = results.embedding_



fig = plt.figure(figsize=(12,10))



plt.subplots_adjust(bottom = 0.1)

plt.scatter(coords[:, 0], coords[:, 1])



for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):

    plt.annotate(

        label,

        xy = (x, y), 

        xytext = (-20, 20),

        textcoords = 'offset points'

    )

plt.show()

enter image description here

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

enter image description here

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

I took the data from here and wanted to play around with multidimensional scaling with this data. The data looks like this:

enter image description here

import pandas as pd

import numpy as np

from sklearn import manifold

import matplotlib.pyplot as plt



data = pd.read_csv("european_city_distances.csv", index_col='Cities')



mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)

results = mds.fit(data.values)



cities = data.columns

coords = results.embedding_



fig = plt.figure(figsize=(12,10))



plt.subplots_adjust(bottom = 0.1)

plt.scatter(coords[:, 0], coords[:, 1])



for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):

    plt.annotate(

        label,

        xy = (x, y), 

        xytext = (-20, 20),

        textcoords = 'offset points'

    )

plt.show()

enter image description here

I get a much more "accurate" map (to my eyes at least) by giving the following hyperparameter values:

mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)

enter image description here

python dimensionality-reduction geospatial

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

edited 13 hours ago

asked 14 hours ago

Kristada673

1615

asked 14 hours ago

Kristada673

1615

asked 14 hours ago

Kristada673

1615

add a comment |

0

active

oldest

votes

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "557"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f49316%2fmultidimensional-scaling-producing-different-results-for-different-seeds%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Data Science Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Htydjtk