Find suitable locations using Machine Learning
























$begingroup$


Just for fun, I am trying to find suitable locations for new stores. So far, I have taken the sites of the existing stores and assigned surrounding variables to each of them. These features include, for example, point-of-interest density, population density, and region popularity. In total I have 9,000 points, each with 100 dimensions. 1,000 of these points already contain a store; the remaining 8,000 do not.



As a next step, I want to perform dimensionality reduction using PCA. However, I am not sure how to proceed afterwards. Should I try to cluster the points? Or how can I "predict" which of the points are suitable candidates for new stores? Maybe using some kind of skip-gram model?



Hoping to get some advice :)



Cheers,
Tom

















$endgroup$







































      machine-learning classification prediction















      asked 2 days ago









      Lossa























          1 Answer












          $begingroup$

          Are you sure PCA is the right way to go?
          This is an analytical problem, and being able to interpret the results is very important.



          How about looking at the correlation between the number of stores and the nearby features? Find out what makes a good location. What are the most important features? You could run forward or backward selection, for example, or use another model or feature selection technique.
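As a rough illustration of the correlation idea, the sketch below ranks features by their absolute Pearson correlation with a binary store indicator. It is only a sketch under stated assumptions: the data is synthetic, the helper name `rank_features_by_correlation` is made up for this example, and the choice of feature 3 as the informative one is arbitrary. The shapes (9,000 sites, 100 features, 1,000 stores) follow the question.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Rank features by |Pearson correlation| with the label y.

    Returns (order, corr): feature indices from strongest to weakest,
    and the signed correlation of every feature with y.
    """
    Xc = X - X.mean(axis=0)           # center each feature
    yc = y - y.mean()                 # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    order = np.argsort(-np.abs(corr))
    return order, corr

# Synthetic stand-in for the real data (assumption, not the asker's data).
rng = np.random.default_rng(0)
n_sites, n_features = 9000, 100
X = rng.normal(size=(n_sites, n_features))

# Hypothetical ground truth: feature 3 drives where stores are located.
score = 2.0 * X[:, 3] + rng.normal(size=n_sites)
y = np.zeros(n_sites)
y[np.argsort(-score)[:1000]] = 1.0    # the 1,000 sites that have a store

order, corr = rank_features_by_correlation(X, y)
print("Most correlated features:", order[:5])
```

A ranking like this only captures linear, one-feature-at-a-time relationships, so it is a starting point for exploration rather than a replacement for forward or backward selection.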



          This is not a pure machine learning case. It is a typical analytical data science problem.



          If you still want to do classification, just train a model. You have POI features and some others, and you know whether there is a store or not :) I might not fully understand the problem here. You train on a dataset in which 50% of the locations have a store and 50% do not. Train a classifier, then classify the other areas.
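The balanced-training idea could be sketched as follows. This is a minimal example under several assumptions: the data is synthetic, sites without a store are treated as negatives, and a plain NumPy logistic regression stands in for whatever classifier is actually chosen. The helper names (`balanced_indices`, `fit_logistic`, `predict_proba`) are hypothetical.

```python
import numpy as np

def balanced_indices(y, rng):
    """Keep all store sites and an equally sized random sample of non-store sites."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    neg_sample = rng.choice(neg, size=len(pos), replace=False)
    return np.concatenate([pos, neg_sample])

def predict_proba(X, w, b):
    """Predicted probability that a site has a store."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Fit logistic regression with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        grad = predict_proba(X, w, b) - y     # gradient of the log loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Synthetic stand-in for the real data (assumption, not the asker's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(9000, 100))
true_w = np.zeros(100)
true_w[:5] = 1.5                              # hypothetical informative features
score = X @ true_w + rng.normal(size=9000)
y = np.zeros(9000)
y[np.argsort(-score)[:1000]] = 1.0            # 1,000 sites with a store

# Train on a 50/50 subset, then score every site that has no store yet.
idx = balanced_indices(y, rng)
w, b = fit_logistic(X[idx], y[idx])
train_acc = ((predict_proba(X[idx], w, b) > 0.5) == y[idx]).mean()

candidates = np.flatnonzero(y == 0)
proba = predict_proba(X[candidates], w, b)
top_candidates = candidates[np.argsort(-proba)[:10]]
print("Training accuracy:", train_acc)
print("Top candidate sites:", top_candidates)
```

The sites with the highest predicted probability are the ones that look most like existing store locations. Whether that makes them good locations depends on how well the current stores were placed.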



          I would still start by visualizing and understanding the data, as I mentioned first. This step is much underrated, and it is the way to start solving most problems.



          Hope that gives you some hints,



          Cheers















          $endgroup$













          • $begingroup$
            Hi Carl, I mean I can still interpret PCA using a correlation analysis between the principal components and the original variables, right? This should help to get an idea of what the data looks like. Still, it would be a nice idea to use the analytical solution to validate the classification result. Thanks for your help!
            $endgroup$
            – Lossa
            2 days ago










          • $begingroup$
            You can look at how much each feature contributes to the principal components. I would not call that correlation analysis, but maybe it is something you can do. Be aware that these contributions are not always very interpretable.
            $endgroup$
            – Carl Rynegardh
            2 days ago
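The two comments above can be tied together with a small sketch. For standardized data, the loading of a feature on a principal component and the correlation between that feature and the component scores differ only by a scale factor. The example below computes both with NumPy's SVD; the synthetic data and the helper name `pca_loadings` are assumptions made for illustration.

```python
import numpy as np

def pca_loadings(X, n_components):
    """PCA on standardized data via SVD.

    Returns (components, explained, scores, loading_corr):
    components   - unit-length loading vectors, shape (k, d)
    explained    - fraction of variance explained by each component
    scores       - data projected onto the components, shape (n, k)
    loading_corr - correlation between each component and each feature
    """
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2 / (S ** 2).sum())[:n_components]
    scores = Xs @ components.T
    # For standardized features, corr(PC_k, x_j) = loading_kj * S_k / sqrt(n).
    loading_corr = components * (S[:n_components, None] / np.sqrt(n))
    return components, explained, scores, loading_corr

# Synthetic data: features 0-2 share one latent factor, features 3-5 are noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(500, 3)),
               rng.normal(size=(500, 3))])

components, explained, scores, loading_corr = pca_loadings(X, 2)
top = np.argsort(-np.abs(components[0]))[:3]
print("Features with the largest loadings on PC1:", sorted(top.tolist()))
```

With 100 real features the loadings are usually spread over many variables, which is why they are often harder to read than in this toy case.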


























          answered 2 days ago









          Carl Rynegardh
























