Find suitable locations using Machine Learning
$begingroup$
Just for fun, I am trying to find suitable locations for new stores. So far I have taken the sites of the current stores and assigned surrounding variables to each of them. These features include, for example, point-of-interest density, population density, and region popularity. In total I have 9000 points with 100 dimensions each. 1000 of these points already contain a store; the remaining 8000 do not.
In the next step I want to perform dimensionality reduction using PCA. However, I am not sure how to proceed afterwards. Should I try to cluster the points? Or how can I "predict" which of the points are suitable candidates for new stores? Maybe using some kind of skip-gram model?
Hoping to get some advice :)
Cheers,
Tom
machine-learning classification prediction
New contributor
$endgroup$
asked 2 days ago by Lossa
1 Answer
$begingroup$
Are you sure PCA is the right way to go?
This is an analytical problem, and being able to interpret the results is very important.
How about looking at the correlation between the number of stores and the nearby features? Find out what makes a good location and which features matter most. You could run forward or backward selection, for example, or use another model or feature-selection technique.
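As a rough illustration of the correlation idea, here is a minimal sketch that ranks features by their Pearson correlation with a 0/1 "has a store" flag. The data is synthetic, and the feature names, the coefficients used to generate it, and the helper `rank_features_by_correlation` are all made up for the example; it is not a substitute for proper forward or backward selection.

```python
import numpy as np

def rank_features_by_correlation(X, has_store, names):
    """Rank features by absolute Pearson correlation with the 0/1 store flag."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(has_store, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the store flag
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc * yc[:, None]).sum(axis=0) / denom
    order = np.argsort(-np.abs(corr))  # strongest association first
    return [(names[i], float(corr[i])) for i in order]

# Synthetic stand-in data (assumption: the real data has 100 features).
rng = np.random.default_rng(0)
n = 2000
names = ["poi_density", "population_density", "noise"]
X = rng.normal(size=(n, 3))
# Hypothetical generating process: stores are more likely where poi_density is high.
p = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.5)))
has_store = rng.random(n) < p

ranking = rank_features_by_correlation(X, has_store, names)
for name, r in ranking:
    print(f"{name:20s} {r:+.3f}")
```

A univariate ranking like this ignores interactions and redundancy between features, so it is best read as a first look at the data rather than a final selection.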
This is not a pure machine learning case. It's a typical analytical data science problem.
If you still want to do classification, just train a model. You have POI features and some others, and you know whether there is a store or not :) I might not fully understand the problem here. Build a training set in which 50% of the locations have a store and 50% do not, train a classifier on it, and then classify the other areas.
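The balanced-training idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, `true_w` and the way stores are assigned are invented, and the logistic regression is written out in plain NumPy so the steps are visible (a library classifier would normally be used instead).

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def predict_proba(X, w, b):
    """Predicted probability that a location looks like a store location."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

rng = np.random.default_rng(42)
# Synthetic stand-in: 9000 locations, 5 features (the question has 100).
n, d = 9000, 5
X = rng.normal(size=(n, d))
true_w = np.array([1.5, 1.0, 0.0, 0.0, -0.5])  # hypothetical effect sizes
noisy_score = X @ true_w + rng.normal(scale=1.0, size=n)
has_store = np.zeros(n, dtype=bool)
has_store[np.argsort(-noisy_score)[:1000]] = True  # 1000 locations with a store

# Balanced training set: all store locations plus an equal-sized random
# sample of locations without a store.
pos = np.flatnonzero(has_store)
neg = np.flatnonzero(~has_store)
neg_sample = rng.choice(neg, size=len(pos), replace=False)
train = np.concatenate([pos, neg_sample])

# Standardize with statistics from the training set only.
mean, std = X[train].mean(axis=0), X[train].std(axis=0)
w, b = fit_logistic((X[train] - mean) / std, has_store[train].astype(float))

# Score the remaining store-free locations and list the 10 most store-like.
candidates = np.setdiff1d(neg, neg_sample)
proba = predict_proba((X[candidates] - mean) / std, w, b)
top = candidates[np.argsort(-proba)[:10]]
print(top)
```

One caveat with this setup: a location without a store is not necessarily a bad location, it may simply not have been used yet. The "negative" half of the training set is therefore noisy, and the predicted scores are better treated as a ranking of candidates than as calibrated probabilities. The sketch also skips a held-out evaluation, which a real analysis would need.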
I'd still start by visualizing and understanding the data, as I mentioned first. This step is much underrated, and it is the way to start solving most problems.
Hope that gave you some hints,
Cheers
$endgroup$
$begingroup$
Hi Carl, I can still interpret PCA using a correlation analysis between the principal components and the original variables, right? This should help to get an idea of what the data looks like. Still, it would be a nice idea to use the analytical solution to validate the classification result. Thanks for your help!
$endgroup$
– Lossa
2 days ago
$begingroup$
You can look at how much each feature contributes to the principal components. I would not call that correlation analysis, but maybe it is something you can do. Be aware that these contributions are not always very interpretable.
$endgroup$
– Carl Rynegardh
2 days ago
answered 2 days ago by Carl Rynegardh