Is there any standard or normal range for the value of an LSTM loss function?
I am working on an LSTM network whose loss settles at around 4.7e-4. Adding more layers and increasing the number of epochs doesn't help to decrease it. I also use Dropout = 0.2 for each of my layers, and everything is implemented with the Keras library.



I would like to know more about this loss value: is it large, or is it OK? Is there any rule of thumb for loss values?



And why can't I decrease my loss any further? Is there a problem here?
  • 0.2 is very, very small. You are almost vanishing the signal. Try something like .85.
    – Media
    yesterday

  • @Media: You mean I must eliminate 85% of my hidden units during each iteration?
    – user145959
    yesterday

  • You should keep them. .85 means you keep 85 percent of them.
    – Media
    yesterday

  • Wow! I thought the opposite!
    – user145959
    yesterday

  • Can you give more details? What are your features like? Are you normalizing? BatchNorm? etc.
    – kylec123
    yesterday
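For what it's worth, in Keras the `rate` argument of `layers.Dropout` is the fraction of input units *dropped*, not kept: `Dropout(0.2)` zeroes 20% of activations, and `Dropout(0.85)` would zero 85%. A pure-Python sketch of inverted dropout (the scheme Keras uses at training time) makes the convention concrete; the helper name is mine:

```python
import random

def inverted_dropout(values, rate, training=True, seed=None):
    """Inverted dropout. `rate` is the fraction of units DROPPED
    (the Keras convention for layers.Dropout), not the fraction kept.
    Survivors are scaled by 1/(1 - rate) so the expected activation
    is unchanged and no rescaling is needed at inference time."""
    if not training or rate == 0.0:
        return list(values)
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [v * scale if rng.random() >= rate else 0.0 for v in values]

out = inverted_dropout([1.0] * 1000, rate=0.2, seed=0)
dropped = sum(1 for v in out if v == 0.0)
# With rate=0.2 roughly 200 of the 1000 units are zeroed: 20% are
# dropped and 80% kept, and the survivors carry the value 1.25.
```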
lstm loss-function
asked yesterday
user145959
edited 16 hours ago
1 Answer
Although your loss function is an indication of how well the model is training, one usually uses other, more intuitive metrics to assess how good the model is.



If you are looking at a classification problem, your loss function is most probably cross entropy. As far as the loss function is concerned, what matters is understanding its behaviour during training more than its absolute value.
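One reason the absolute value is hard to interpret: cross entropy depends on how confident the correct predictions are, so the same set of correct classifications can produce very different loss values. A pure-Python sketch (helper name is mine):

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; predictions are clipped away
    from 0 and 1 so the logarithm never blows up."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

# The same "all correct" classifier gives very different loss values
# depending only on how confident its predictions are:
confident = binary_cross_entropy([1, 0, 1], [0.99, 0.01, 0.99])  # ~0.01
hesitant  = binary_cross_entropy([1, 0, 1], [0.60, 0.40, 0.60])  # ~0.51
```

This is why there is no universal "good" loss number: it depends on the problem, the labels, and the model's calibration.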



A loss function that decreases during training is an indication that the model is training effectively. At some point the loss will stop decreasing, which means the model has arrived at a minimum. One also needs to understand the interaction between the loss on the training set and on the validation set, and how to detect things like overfitting. If you are not aware of that, there is plenty of literature on the topic.
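One simple way to operationalize that interaction is to watch the two curves together: if the training loss keeps falling while the validation loss rises for several consecutive epochs, the model is probably starting to overfit. A small illustrative helper over the per-epoch loss histories (names and thresholds are mine, not from any library):

```python
def diverging(train_loss, val_loss, patience=3):
    """Flag likely overfitting: validation loss has risen for
    `patience` consecutive epochs while training loss kept falling."""
    if len(val_loss) <= patience:
        return False
    recent_val = val_loss[-(patience + 1):]
    recent_train = train_loss[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling

train = [1.0, 0.6, 0.4, 0.3, 0.25, 0.2]
val   = [1.1, 0.7, 0.5, 0.55, 0.6, 0.7]
diverging(train, val)  # True: the classic overfitting pattern
```

In Keras the same idea is available out of the box via the `EarlyStopping` callback monitoring `val_loss`.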



To know how good a model is, I would use other metrics that give better indication and intuition. For example, in a classification problem one can look at Precision, Recall, Accuracy (if the classes are not very unbalanced), or even ROC AUC. If it is a regression problem, maybe you are more interested in the Mean or Median Absolute Percentage Error (MAPE or MdAPE).
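For concreteness, these metrics are easy to compute by hand (scikit-learn's `precision_score`, `recall_score`, and friends do the same); a minimal pure-Python sketch, with helper names of my own choosing:

```python
def precision_recall_accuracy(y_true, y_pred):
    """Binary classification metrics from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return tp / (tp + fp), tp / (tp + fn), accuracy

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (targets must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

precision, recall, accuracy = precision_recall_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# precision = 2/3, recall = 2/3, accuracy = 0.6
error = mape([100.0, 200.0], [110.0, 180.0])  # 10.0 (percent)
```

Unlike the raw loss, these numbers have a direct interpretation ("10% average error", "two thirds of flagged positives are real"), which makes them far easier to judge against a rule of thumb.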
        answered 12 hours ago









Escachator