Why split data into train and test in linear regression?












0












$begingroup$


I am wondering how train and test set works in linear regression.



If I train the data it will give me a line of best fit, say I for my train data I am using first 70% of dataset => first 70% of the line is from training set and final 30% is from unseen testing set?










share|improve this question







New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
    $endgroup$
    – Esmailian
    yesterday


















0












$begingroup$


I am wondering how train and test set works in linear regression.



If I train the data it will give me a line of best fit, say I for my train data I am using first 70% of dataset => first 70% of the line is from training set and final 30% is from unseen testing set?










share|improve this question







New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$












  • $begingroup$
    Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
    $endgroup$
    – Esmailian
    yesterday
















0












0








0





$begingroup$


I am wondering how train and test set works in linear regression.



If I train the data it will give me a line of best fit, say I for my train data I am using first 70% of dataset => first 70% of the line is from training set and final 30% is from unseen testing set?










share|improve this question







New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.







$endgroup$




I am wondering how train and test set works in linear regression.



If I train the data it will give me a line of best fit, say I for my train data I am using first 70% of dataset => first 70% of the line is from training set and final 30% is from unseen testing set?







machine-learning linear-regression






share|improve this question







New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question







New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question






New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked yesterday









h_muskh_musk

1




1




New contributor




h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






h_musk is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












  • $begingroup$
    Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
    $endgroup$
    – Esmailian
    yesterday




















  • $begingroup$
    Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
    $endgroup$
    – Esmailian
    yesterday


















$begingroup$
Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
$endgroup$
– Esmailian
yesterday






$begingroup$
Welcome to SE.DataScience! What do you mean by "first" in "first 70% of the line"?
$endgroup$
– Esmailian
yesterday












2 Answers
2






active

oldest

votes


















0












$begingroup$

This testing is a way to asses your model performance. You can check the evaluation metrics for regression, classification and clustering on this link to scikit-learn.



Separating the data enables you to evaluate your model generalization capabilities and have an idea of how it would perform on unseen data.



Also, you can create a validation dataset (a split from the train dataset) to tune hyper-parameters and threshold/bias.




  • The test information should never be seem by the training algorithm by any chance! This might occlude over-fitting and other many bad things you don't want to happen! Check this link on Data Leakage for more information.






share|improve this answer









$endgroup$





















    0












    $begingroup$

    Not just in linear regression, Train-test split is a practice that is followed in the model building and evaluation workflow. Testing your dataset on a testing data that is totally excluded from the training data helps us find whether the model is overfitting or underfitting atleast.




    And always keep in mind - Never train on test data.



                                                          
    - https://developers.google.com/machine-learning/crash-course




    - Referances:




    1. Train/Test Split and Cross Validation in Python

    2. Evaluate the Performance Of Deep Learning Models in Keras

    3. The 7 Steps of Machine Learning






    share|improve this answer









    $endgroup$














      Your Answer





      StackExchange.ifUsing("editor", function () {
      return StackExchange.using("mathjaxEditing", function () {
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      });
      });
      }, "mathjax-editing");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "557"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });






      h_musk is a new contributor. Be nice, and check out our Code of Conduct.










      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48780%2fwhy-split-data-into-train-and-test-in-linear-regression%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0












      $begingroup$

      This testing is a way to asses your model performance. You can check the evaluation metrics for regression, classification and clustering on this link to scikit-learn.



      Separating the data enables you to evaluate your model generalization capabilities and have an idea of how it would perform on unseen data.



      Also, you can create a validation dataset (a split from the train dataset) to tune hyper-parameters and threshold/bias.




      • The test information should never be seem by the training algorithm by any chance! This might occlude over-fitting and other many bad things you don't want to happen! Check this link on Data Leakage for more information.






      share|improve this answer









      $endgroup$


















        0












        $begingroup$

        This testing is a way to asses your model performance. You can check the evaluation metrics for regression, classification and clustering on this link to scikit-learn.



        Separating the data enables you to evaluate your model generalization capabilities and have an idea of how it would perform on unseen data.



        Also, you can create a validation dataset (a split from the train dataset) to tune hyper-parameters and threshold/bias.




        • The test information should never be seem by the training algorithm by any chance! This might occlude over-fitting and other many bad things you don't want to happen! Check this link on Data Leakage for more information.






        share|improve this answer









        $endgroup$
















          0












          0








          0





          $begingroup$

          This testing is a way to asses your model performance. You can check the evaluation metrics for regression, classification and clustering on this link to scikit-learn.



          Separating the data enables you to evaluate your model generalization capabilities and have an idea of how it would perform on unseen data.



          Also, you can create a validation dataset (a split from the train dataset) to tune hyper-parameters and threshold/bias.




          • The test information should never be seem by the training algorithm by any chance! This might occlude over-fitting and other many bad things you don't want to happen! Check this link on Data Leakage for more information.






          share|improve this answer









          $endgroup$



          This testing is a way to asses your model performance. You can check the evaluation metrics for regression, classification and clustering on this link to scikit-learn.



          Separating the data enables you to evaluate your model generalization capabilities and have an idea of how it would perform on unseen data.



          Also, you can create a validation dataset (a split from the train dataset) to tune hyper-parameters and threshold/bias.




          • The test information should never be seem by the training algorithm by any chance! This might occlude over-fitting and other many bad things you don't want to happen! Check this link on Data Leakage for more information.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered yesterday









          Pedro Henrique MonfortePedro Henrique Monforte

          3379




          3379























              0












              $begingroup$

              Not just in linear regression, Train-test split is a practice that is followed in the model building and evaluation workflow. Testing your dataset on a testing data that is totally excluded from the training data helps us find whether the model is overfitting or underfitting atleast.




              And always keep in mind - Never train on test data.



                                                                    
              - https://developers.google.com/machine-learning/crash-course




              - Referances:




              1. Train/Test Split and Cross Validation in Python

              2. Evaluate the Performance Of Deep Learning Models in Keras

              3. The 7 Steps of Machine Learning






              share|improve this answer









              $endgroup$


















                0












                $begingroup$

                Not just in linear regression, Train-test split is a practice that is followed in the model building and evaluation workflow. Testing your dataset on a testing data that is totally excluded from the training data helps us find whether the model is overfitting or underfitting atleast.




                And always keep in mind - Never train on test data.



                                                                      
                - https://developers.google.com/machine-learning/crash-course




                - Referances:




                1. Train/Test Split and Cross Validation in Python

                2. Evaluate the Performance Of Deep Learning Models in Keras

                3. The 7 Steps of Machine Learning






                share|improve this answer









                $endgroup$
















                  0












                  0








                  0





                  $begingroup$

                  Not just in linear regression, Train-test split is a practice that is followed in the model building and evaluation workflow. Testing your dataset on a testing data that is totally excluded from the training data helps us find whether the model is overfitting or underfitting atleast.




                  And always keep in mind - Never train on test data.



                                                                        
                  - https://developers.google.com/machine-learning/crash-course




                  - Referances:




                  1. Train/Test Split and Cross Validation in Python

                  2. Evaluate the Performance Of Deep Learning Models in Keras

                  3. The 7 Steps of Machine Learning






                  share|improve this answer









                  $endgroup$



                  Not just in linear regression, Train-test split is a practice that is followed in the model building and evaluation workflow. Testing your dataset on a testing data that is totally excluded from the training data helps us find whether the model is overfitting or underfitting atleast.




                  And always keep in mind - Never train on test data.



                                                                        
                  - https://developers.google.com/machine-learning/crash-course




                  - Referances:




                  1. Train/Test Split and Cross Validation in Python

                  2. Evaluate the Performance Of Deep Learning Models in Keras

                  3. The 7 Steps of Machine Learning







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered yesterday









                  thanatozthanatoz

                  569319




                  569319






















                      h_musk is a new contributor. Be nice, and check out our Code of Conduct.










                      draft saved

                      draft discarded


















                      h_musk is a new contributor. Be nice, and check out our Code of Conduct.













                      h_musk is a new contributor. Be nice, and check out our Code of Conduct.












                      h_musk is a new contributor. Be nice, and check out our Code of Conduct.
















                      Thanks for contributing an answer to Data Science Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      Use MathJax to format equations. MathJax reference.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdatascience.stackexchange.com%2fquestions%2f48780%2fwhy-split-data-into-train-and-test-in-linear-regression%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How to label and detect the document text images

                      Tabula Rosettana

                      Aureus (color)