Multidimensional scaling producing different results for different seeds
























$begingroup$


I took the data from here and wanted to experiment with multidimensional scaling (MDS) on it. The data looks like this:



[image: excerpt of the city distance table]



In particular, I want to plot the cities in a 2D space using only the information about how far they are from each other, without any explicit latitude and longitude information, and see how well the result matches their real locations on a geographic map. This is my code:



import pandas as pd
import numpy as np
from sklearn import manifold
import matplotlib.pyplot as plt

# Table of pairwise city distances, indexed by the 'Cities' column.
data = pd.read_csv("european_city_distances.csv", index_col='Cities')

# The table already holds distances, so pass it as a precomputed dissimilarity.
mds = manifold.MDS(n_components=2, dissimilarity="precomputed", random_state=6)
results = mds.fit(data.values)

cities = data.columns
coords = results.embedding_

fig = plt.figure(figsize=(12,10))

plt.subplots_adjust(bottom = 0.1)
plt.scatter(coords[:, 0], coords[:, 1])

# Label each point with its city name.
for label, x, y in zip(cities, coords[:, 0], coords[:, 1]):
    plt.annotate(
        label,
        xy = (x, y),
        xytext = (-20, 20),
        textcoords = 'offset points'
    )
plt.show()


[image: MDS map of the cities with random_state=6]



Most of the cities seem to be in roughly the correct location relative to each other, with a few exceptions: Dublin is too far from London, Istanbul is in the wrong place, etc. However, a different random_state value produces a different "map". For example, random_state=1 produces the following map, in which many of the cities are not in roughly the correct location relative to the others:



[image: MDS map of the cities with random_state=1]
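One caveat when comparing two such maps by eye: pairwise distances do not change when the whole configuration is rotated, mirrored, or shifted, so two MDS runs can look different while describing the same layout. A minimal sketch of comparing two embeddings after Procrustes alignment, assuming SciPy is installed; the random points are a stand-in for the city coordinates and are not taken from the dataset:

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)

# Hypothetical stand-in for an MDS embedding of 10 cities (not the real data).
coords_a = rng.normal(size=(10, 2))

# Build a second "map": rotate by 90 degrees, mirror the x axis, then shift.
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])
mirror = np.array([[-1.0, 0.0], [0.0, 1.0]])
coords_b = coords_a @ rotation @ mirror + np.array([5.0, -3.0])

# procrustes standardizes both inputs and finds the best rotation/reflection;
# disparity is the remaining sum of squared differences (near 0 = same shape).
_, _, disparity = procrustes(coords_a, coords_b)
print(f"disparity after alignment: {disparity:.2e}")
```

A disparity close to zero would mean the two maps differ only in orientation; a clearly larger value would mean the relative positions themselves differ.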



What I don't understand is this: I thought dimensionality reduction methods were not supposed to involve any randomness, and so should not give different results for different seeds. But this one does, so what does that mean?
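A quick way to check whether the seed matters numerically, and not just visually, is to compare the final stress (the objective that MDS minimizes, exposed by scikit-learn as `stress_`) across several values of `random_state`. A sketch on synthetic distances, assuming scikit-learn is installed; `n_init=1` is set so that each fit uses a single start, and the exact behaviour may vary between scikit-learn versions:

```python
import numpy as np
from sklearn import manifold

rng = np.random.default_rng(0)

# Synthetic stand-in for the city table: 15 random 2D points and their
# pairwise Euclidean distances (not the real dataset).
points = rng.uniform(0, 100, size=(15, 2))
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

stress_by_seed = {}
for seed in range(5):
    mds = manifold.MDS(
        n_components=2,
        dissimilarity="precomputed",
        n_init=1,  # a single start per fit, so runs are not averaged out
        random_state=seed,
    )
    mds.fit(distances)
    stress_by_seed[seed] = mds.stress_

for seed, stress in stress_by_seed.items():
    print(f"random_state={seed}: final stress = {stress:.3f}")
```

If the printed values differ noticeably, the fits ended in different solutions; if they agree, the maps may differ only by rotation or reflection.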



The documentation of the sklearn.manifold.MDS function states that random_state is "the generator used to initialize the centers". So, in particular, I guess what I'm asking is: whichever initialization of the centers we choose, shouldn't they all lead to one unique result?
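For context, scikit-learn fits MDS with the SMACOF algorithm, an iterative procedure that starts from an initial configuration and repeatedly lowers the stress. The following is a simplified NumPy sketch of that idea, not scikit-learn's implementation; the helper name `smacof_sketch` and the synthetic data are illustrative assumptions. Each update is guaranteed not to increase the stress, but because the stress is not convex, different starts are not guaranteed to end in the same configuration:

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def stress(X, target):
    """Raw stress: squared mismatch between embedded and target distances."""
    diff = pairwise_distances(X) - target
    return 0.5 * np.sum(diff ** 2)  # each pair appears twice, hence the 0.5


def smacof_sketch(target, seed, n_iter=300):
    """Simplified metric SMACOF from one random start (illustrative helper)."""
    n = target.shape[0]
    X = np.random.default_rng(seed).uniform(size=(n, 2))  # seed only sets the start
    initial_stress = stress(X, target)
    idx = np.arange(n)
    for _ in range(n_iter):
        d = pairwise_distances(X)
        d[d == 0] = 1e-12  # guard the division (the diagonal is always zero)
        B = -target / d
        B[idx, idx] = 0.0
        B[idx, idx] = -B.sum(axis=1)
        X = B @ X / n  # Guttman transform, which never increases the stress
    return X, initial_stress, stress(X, target)


# Synthetic stand-in for the city table (not the real data).
true_points = np.random.default_rng(42).uniform(0, 100, size=(12, 2))
target = pairwise_distances(true_points)

results = {seed: smacof_sketch(target, seed) for seed in range(5)}
for seed, (_, start, end) in results.items():
    print(f"seed={seed}: stress {start:.1f} -> {end:.4f}")
```

The seed enters only through the starting configuration; everything after that is deterministic, which is why fixing `random_state` makes a run reproducible without making the result unique.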





I get a much more "accurate" map (to my eyes at least) with the following hyperparameter values:



mds = manifold.MDS(n_components=2, dissimilarity="euclidean", n_init=100, max_iter=1000, random_state=1)


[image: MDS map of the cities with the settings above]
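One thing worth checking about this second call: as far as I understand scikit-learn's `MDS`, `dissimilarity="euclidean"` treats each row of the input as a feature vector and computes Euclidean distances between the rows, whereas `dissimilarity="precomputed"` uses the matrix as the distances themselves. If the same distance table is passed to `fit`, the two settings therefore embed different dissimilarities. A small NumPy sketch of that difference, on synthetic points and not the city data:

```python
import numpy as np


def pairwise_distances(rows):
    """Euclidean distance between every pair of rows."""
    return np.linalg.norm(rows[:, None, :] - rows[None, :, :], axis=-1)


rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(6, 2))  # synthetic 2D "cities"

# What dissimilarity="precomputed" would embed: the distances themselves.
precomputed = pairwise_distances(points)

# What dissimilarity="euclidean" would embed if given the same table:
# distances between the rows of the distance table.
rows_as_features = pairwise_distances(precomputed)

print(np.round(precomputed[:3, :3], 1))
print(np.round(rows_as_features[:3, :3], 1))
print("same matrix:", np.allclose(precomputed, rows_as_features))
```

So the two maps in this question are not directly comparable on the hyperparameters `n_init` and `max_iter` alone, since the input being embedded also changed.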





















$endgroup$
























Tags: python, dimensionality-reduction, geospatial














asked 14 hours ago by Kristada673 (edited 13 hours ago)





















