How to predict content-based demand
$begingroup$
This is my first post on Data Science Stack Exchange, so please be gentle and let me know if anything is unclear :)
I have many products (>1M), and I save every product purchase in a DB with a timestamp (the "purchases data").
Each product has 'content features' (e.g. product size, product safety rank, etc.).
The "purchases data" looks like this:
| time stamp | product id | content features 1 | ... | content features N |
Each row is a purchase of the product with the given id at the given timestamp.
My main goal is to find tomorrow's most wanted products.
I translate the problem into predicting demand for the next day (or classifying each product id and day as high demand or low demand).
I struggle with two main problems in this setting:
Generating demand data: I want to convert the "purchases data" into demand per day (the "demand data"), meaning that I group the data by product id and day, count the number of rows, and save the count as 'freq' (also removing the duplicated rows).
The problem is that the minimum product frequency per day would be 1 and not 0.
For example: if product #1 was purchased 3 times on Sunday and 2 times on Wednesday, the purchases and demand data would be:
"purchases data" (fi(product id) maps a product to its content feature i):
| time stamp | product id | content f 1 | ... | content f N |
| Sunday 05:20 | 1 | f1(1) | ... | fn(1) |
| Sunday 08:11 | 1 | f1(1) | ... | fn(1) |
| Sunday 10:25 | 1 | f1(1) | ... | fn(1) |
| Wednesday 08:10 | 1 | f1(1) | ... | fn(1) |
| Wednesday 16:20 | 1 | f1(1) | ... | fn(1) |
"demand data":
| day | product id | content f 1 | ... | content f N | freq |
| Sunday | 1 | f1(1) | ... | fn(1) | 3 |
| Wednesday | 1 | f1(1) | ... | fn(1) | 2 |
But if product #1 was not purchased on Monday, there wouldn't be any row, since there is no purchase data for this item at that timestamp.
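The aggregation described above can be sketched as follows. This is a minimal illustration using only the standard library; the `purchases` list and its dates are made up to match the Sunday/Wednesday example (content features are omitted for brevity), and it is not the asker's actual pipeline.

```python
from collections import Counter
from datetime import datetime

# Hypothetical purchase log: (timestamp, product_id).
# The dates are invented for illustration: 2019-02-17 is a Sunday
# and 2019-02-20 is a Wednesday.
purchases = [
    (datetime(2019, 2, 17, 5, 20), 1),
    (datetime(2019, 2, 17, 8, 11), 1),
    (datetime(2019, 2, 17, 10, 25), 1),
    (datetime(2019, 2, 20, 8, 10), 1),
    (datetime(2019, 2, 20, 16, 20), 1),
]

# Group by (day, product_id) and count rows: this count is the 'freq' column.
demand = Counter((ts.date(), product_id) for ts, product_id in purchases)

for (day, product_id), freq in sorted(demand.items()):
    print(day.strftime("%A"), product_id, freq)
```

Only the two days with purchases appear in `demand`; Monday has no key at all, which is exactly the missing-zero situation described. With pandas, the analogous step would be a `groupby` on product id and day followed by `size()`.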
Since there are over 1M products, I want to avoid creating rows with 0 frequency.
Is there a way to create (or design) demand data from purchases data for a huge number of items (products) without using 0-demand rows?
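One possible way to avoid storing zero rows, sketched here under assumptions, is to keep only the non-zero counts and treat a missing row as an implicit zero at read time. If a model needs explicit zero examples, a bounded random sample of zero-demand pairs can be drawn instead of materialising all of them. The `demand` mapping, the integer day indices, and both helper functions are hypothetical names introduced for illustration.

```python
import random
from collections import Counter

# Hypothetical sparse demand store: only (day_index, product_id) pairs
# that actually had purchases are present.
demand = Counter({(0, 1): 3, (3, 1): 2, (0, 7): 1})

def freq(day, product_id):
    """Return the demand, treating a missing row as an implicit zero."""
    return demand.get((day, product_id), 0)

def sample_zero_rows(product_ids, days, n, seed=0, max_tries=10_000):
    """Draw up to n distinct (day, product_id) pairs that have no purchases.

    Uses rejection sampling with a bounded number of attempts, so it may
    return fewer than n pairs if zero-demand pairs are rare.
    """
    rng = random.Random(seed)
    zeros = set()
    for _ in range(max_tries):
        if len(zeros) >= n:
            break
        pair = (rng.choice(days), rng.choice(product_ids))
        if pair not in demand:
            zeros.add(pair)
    return sorted(zeros)

zero_rows = sample_zero_rows([1, 7, 42], list(range(7)), n=5)
print(freq(0, 1), freq(1, 1), zero_rows)
```

The storage cost then scales with the number of purchases rather than with products times days, and the number of explicit zero rows is a parameter you choose.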
Content-based time series: After creating the "demand data", I want to use it as a time series.
My problem is that I would need to split the data into over 1M series, one for each product/item id, and these series would also be very sparse.
I want to find a way to use the "content features" as input alongside the time series, so that the model learns some kind of averaging of the time series across products with related content features.
What is the best way to model content-based time series prediction?
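As a rough illustration of "averaging based on related content features", the sketch below pools demand across products that share the same feature values and blends each product's own rate with its group's rate. It is a naive baseline under stated assumptions, not a recommended model: the `features` and `demand` data, the exact-match grouping, and the `prior_weight` parameter are all hypothetical.

```python
from collections import defaultdict

# Hypothetical content features per product: (size, safety rank).
features = {1: ("small", 5), 2: ("small", 5), 3: ("large", 2)}

# Hypothetical sparse demand: (day_index, product_id) -> freq; missing means 0.
demand = {(0, 1): 3, (3, 1): 2, (1, 2): 4, (2, 3): 1}
n_days = 4  # length of the observed history, in days

# Total demand per product and per feature group, plus group sizes.
product_total = defaultdict(int)
group_total = defaultdict(int)
group_size = defaultdict(int)
for product_id, feats in features.items():
    group_size[feats] += 1
for (day, product_id), freq in demand.items():
    product_total[product_id] += freq
    group_total[features[product_id]] += freq

def group_rate(product_id):
    """Mean demand per product per day over all products with the same features."""
    feats = features[product_id]
    return group_total[feats] / (group_size[feats] * n_days)

def blended_rate(product_id, prior_weight=2.0):
    """Shrink the product's own daily rate toward its group's rate.

    prior_weight acts like a number of pseudo-days observed at the group rate,
    so products with little history lean more on their group.
    """
    own = product_total[product_id]
    return (own + prior_weight * group_rate(product_id)) / (n_days + prior_weight)

# Rank products by blended rate as a naive "most wanted tomorrow" score.
ranking = sorted(features, key=blended_rate, reverse=True)
print(ranking)
```

Here a single set of group statistics replaces one model per product, which is the main point: the number of fitted quantities grows with the number of distinct feature combinations rather than with the number of products.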
python time-series machine-learning-model forecast forecasting
$endgroup$
asked 18 hours ago by Sharon
1 Answer
$begingroup$
Welcome to the site. I would encourage you to think about your problem in a different way. You are focused on "what sold today", whereas you should be focused on "who bought what over a historical timeline".
What you're looking for is known as a recommender system, and there are (generally speaking) two types:
- Content-based: what to recommend based on the attributes of products. The algorithm is basically saying, "You bought breakfast cereal, here are other products that might go with your cereal..."
- Community-based: what to recommend based on the attributes of the people who bought products. The algorithm is basically saying, "You are a female, under 30, with no kids. Other females under 30 with no kids also liked these products..."
I will assume that you don't have info on your customers, so let's focus on content-based recommenders. You are on the right track in thinking about the attributes of products, but you should think about them (1) over a longer timeline than just yesterday and (2) in terms of how the products and their attributes relate to each other. People who need attribute X might also need attribute Y; that relationship most likely spans multiple products and will generate higher demand for those products.
Start researching content-based recommender systems in your language/tool of choice and you should find a suitable algorithm. From there you can also think about collecting user data and then move to a community-based recommender over the long term.
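To make "how products and their attributes relate to each other" concrete, here is a minimal sketch of one common building block of content-based recommenders: cosine similarity between product feature vectors. The numeric `features` and the `most_similar` helper are assumptions made for illustration; real features would typically need encoding and scaling first.

```python
import math

# Hypothetical numeric content features per product: (size, safety rank).
features = {1: (1.0, 5.0), 2: (1.2, 4.8), 3: (9.0, 1.0)}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(product_id, k=1):
    """Return the k other products whose features are closest to product_id's."""
    others = [p for p in features if p != product_id]
    others.sort(key=lambda p: cosine(features[product_id], features[p]), reverse=True)
    return others[:k]

print(most_similar(1))
```

Comparing every pair of products this way does not scale to over 1M items, so in practice some form of approximate nearest-neighbour search or pre-grouping would be needed.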
$endgroup$
answered 16 hours ago by I_Play_With_Data