How to predict content-based demand
























$begingroup$


This is my first post on Data Science Stack Exchange, so please be gentle and let me know if anything is unclear :)



I have many products (>1M), and I save every product purchase in a DB with a timestamp (the "purchases data").

Each product has "content features" (e.g. product size, product safety rank, etc.).



The "purchases data" looks like this:




| time stamp | product id | content features 1 | ... | content features N |




Each row is a purchase of the product with the given id at the given timestamp.



My main goal is to find tomorrow's most wanted products.
I translate the problem into predicting demand for the next day (or classifying each product id and day as high demand or low demand).



I am struggling with two main problems in this setting:





  1. Generating demand data: I want to convert the "purchases data" into per-day demand ("demand data"), meaning that I group the data by product id and day,

    then count the number of rows in each group and save it as 'freq' (dropping the duplicated rows).

    The problem is that the minimum product frequency per day would be 1 and not 0.



For example, if product #1 was purchased 3 times on Sunday and 2 times on Wednesday, the purchases and demand data would be:




"purchases data" (fi(product id) denotes the mapping from a product id to its content feature i):





| time stamp      | product id | content f 1 | ... | content f N |
| Sunday 05:20    | 1          | f1(1)       | ... | fn(1)       |
| Sunday 08:11    | 1          | f1(1)       | ... | fn(1)       |
| Sunday 10:25    | 1          | f1(1)       | ... | fn(1)       |
| Wednesday 08:10 | 1          | f1(1)       | ... | fn(1)       |
| Wednesday 16:20 | 1          | f1(1)       | ... | fn(1)       |




"demand data":





| day       | product id | content f 1 | ... | content f N | freq |
| Sunday    | 1          | f1(1)       | ... | fn(1)       | 3    |
| Wednesday | 1          | f1(1)       | ... | fn(1)       | 2    |


But if product #1 was not purchased on Monday, there would be no row for it, since there is no purchase data for this item on that day.
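
To make the aggregation concrete, here is a minimal sketch of what I mean, using only the Python standard library. The dates, the `purchases` list and the `build_demand` helper are made up for illustration (content features are left out for brevity). Only the (product id, day) pairs that actually occur are stored, which is why Monday never shows up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical purchase log: (timestamp, product_id).
# The dates are made up; 2019-02-17 is a Sunday and 2019-02-20 a Wednesday.
purchases = [
    (datetime(2019, 2, 17, 5, 20), 1),
    (datetime(2019, 2, 17, 8, 11), 1),
    (datetime(2019, 2, 17, 10, 25), 1),
    (datetime(2019, 2, 20, 8, 10), 1),
    (datetime(2019, 2, 20, 16, 20), 1),
]


def build_demand(purchases):
    """Count purchases per (product_id, day); only observed pairs are stored."""
    return Counter((product_id, ts.date()) for ts, product_id in purchases)


demand = build_demand(purchases)

# Print the "demand data" rows: one row per observed (product_id, day) pair.
for (product_id, day), freq in sorted(demand.items()):
    print(day.strftime("%A"), product_id, freq)
```

One property of `Counter` worth noting: looking up a key that was never counted returns 0 without adding an entry, so days with no purchases do not take up any storage in this representation.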





Since there are over 1M products, I want to avoid creating rows with 0 frequency.



Is there a way to create (or design) demand data from purchases data for a huge number of items (products) without materializing 0-demand rows?





  2. Content-based time series: After creating the "demand data", I want to use it as a time series.

    My problem is that I would need to split the data into over 1M series, one for each product id, and these series would also be very sparse.
    I want to find a way to use the "content features" as input alongside the time series, so that the model learns some kind of averaging of time series across products with related content features.
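
To illustrate the kind of averaging I have in mind, here is a naive sketch that pools demand over products sharing exactly the same content features. It is only an illustration under simplifying assumptions: the `demand` and `features` dictionaries, their values, and the `pooled_mean_demand` helper are all hypothetical, and "related" is reduced to "identical feature tuple".

```python
from collections import defaultdict

# Hypothetical inputs: sparse daily demand keyed by (product_id, day) and a
# tuple of content features per product. All names and values are illustrative.
demand = {(1, "Sunday"): 3, (1, "Wednesday"): 2, (2, "Sunday"): 1}
features = {1: ("small", "safe"), 2: ("small", "safe"), 3: ("large", "safe")}


def pooled_mean_demand(demand, features):
    """Average demand per (feature tuple, day) over all products sharing the features."""
    # Number of products in each feature group.
    group_size = defaultdict(int)
    for feats in features.values():
        group_size[feats] += 1

    # Total observed demand per (feature tuple, day).
    totals = defaultdict(float)
    for (product_id, day), freq in demand.items():
        totals[(features[product_id], day)] += freq

    # Dividing by the group size treats products with no purchases as zeros,
    # without ever materializing their 0-demand rows.
    return {key: total / group_size[key[0]] for key, total in totals.items()}


pooled = pooled_mean_demand(demand, features)
for (feats, day), mean_freq in sorted(pooled.items()):
    print(feats, day, mean_freq)
```

This replaces over 1M sparse series with one denser series per feature group, but it ignores products whose features are similar without being identical, which is the part I would like a model to learn.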


What is the best way to model content-based time series prediction?

















$endgroup$











































      python time-series machine-learning-model forecast forecasting















      asked 18 hours ago









      Sharon























          1 Answer












          $begingroup$

          Welcome to the site. I would encourage you to think about your problem in a different way. You are focused on "what sold today" whereas you should be focused on "who bought what over a historical timeline".



          What you're looking for is known as a recommender system and there are (generally speaking) two types:




          1. Content-based - what should you recommend based on attributes of products. The algorithm is basically saying, "You bought breakfast cereal, here are other products that might go with your cereal . . ."

          2. Community-based - what should you recommend based on attributes of people who bought products. The algorithm is basically saying, "You are a female, under 30, with no kids. Other females, under 30 with no kids also liked these products . . ."


          I will assume that you don't have information on your customers, so let's focus on content-based recommenders. You are on the right track in thinking about the attributes of products, but you should consider (1) a longer timeline than just yesterday and (2) how the products and their attributes relate to each other. People who need attribute X might also need attribute Y; that pattern (most likely) spans multiple products and will generate higher demand for those products.



          Start researching content-based recommender systems in your language/tool of choice and you will end up with the desired algorithm. From there you can also think about user data collection and then move into a community-based recommender over the long term.
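
As a rough starting point, a content-based recommender can be as simple as ranking products by the similarity of their attribute vectors. The sketch below uses cosine similarity, which is one common choice and not the only one. The product names, the numeric feature vectors, and the `most_similar` helper are hypothetical and assume the attributes have already been encoded as numbers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(target_id, features, k=2):
    """Return the ids of the k products whose features are closest to the target's."""
    scores = [
        (cosine_similarity(features[target_id], vec), product_id)
        for product_id, vec in features.items()
        if product_id != target_id
    ]
    scores.sort(reverse=True)
    return [product_id for _, product_id in scores[:k]]


# Hypothetical, already-encoded product attributes.
features = {
    "cereal": [1.0, 0.0, 1.0],
    "granola": [1.0, 0.0, 0.9],
    "shampoo": [0.0, 1.0, 0.0],
}

print(most_similar("cereal", features, k=1))
```

With over 1M products, comparing every pair like this will not scale, so in practice you would look at an approximate nearest-neighbour index or a library built for the purpose.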















          $endgroup$








































































                answered 16 hours ago









                I_Play_With_Data






















