Fully convolutional networks with partially segmented data














I am working on classifying satellite imagery using a fully convolutional approach. The images are roughly 8000x8000 pixels, the training data does not label every pixel, and the classes are imbalanced.



Currently, I break the 8000x8000 images into smaller tiles that fit in GPU memory (~600x600) and save the patches that contain training data to disk. If a given patch contains more than one class, I save it twice, each copy with a different segmentation map. At training time, I shuffle all of the saved patches and feed them into the model.
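
In case the tiling detail matters, it is roughly equivalent to the NumPy sketch below (a simplified version, not my exact code; the patch size and the "has any labels" check are the only criteria I use):

    import numpy as np

    def tile_image(image, mask, patch=600):
        """Split a large image and its partial label mask into patches,
        keeping only patches that contain at least one labeled pixel.
        Assumes the mask uses 0 for unlabeled pixels and 1..C for classes."""
        patches = []
        h, w = image.shape[:2]
        for y in range(0, h - patch + 1, patch):
            for x in range(0, w - patch + 1, patch):
                img_p = image[y:y + patch, x:x + patch]
                msk_p = mask[y:y + patch, x:x + patch]
                if np.any(msk_p > 0):  # patch contains some training data
                    patches.append((img_p, msk_p))
        return patches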



When training the model, I first load an image patch and its corresponding segmentation masks for all of the classes it contains. Each segmentation mask has a different number of labeled pixels, so I find the minimum number of labeled pixels across the masks currently in memory.



I then take a random sample of that minimum size from each segmentation mask, so every class contributes the same number of labeled pixels and the batch stays class-balanced.
The picture below may clear things up.



The top row is the fully segmented data. The bottom row is the random sample, with the number of examples per class set by the class with the fewest labeled pixels. Each column represents a different class; the fourth class (fourth column) had the minimum number of examples in this batch.
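
Concretely, the per-batch balancing step is something like this sketch (simplified; it assumes the mask encodes unlabeled pixels as 0 and classes as 1..C):

    import numpy as np

    def balanced_pixel_sample(mask, rng=None):
        """Keep, for each class present in the mask, a random subset of
        labeled pixels equal in size to the rarest class; everything else
        is set back to 0 (unlabeled) so it is ignored during training."""
        rng = rng or np.random.default_rng()
        classes = [c for c in np.unique(mask) if c != 0]
        n_min = min(int((mask == c).sum()) for c in classes)  # rarest class

        sampled = np.zeros_like(mask)
        for c in classes:
            ys, xs = np.nonzero(mask == c)
            keep = rng.choice(len(ys), size=n_min, replace=False)
            sampled[ys[keep], xs[keep]] = c
        return sampled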



After this random sampling, I feed the examples into the network one at a time, since only one example fits in GPU memory.
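
The per-example update only computes the loss over the sampled labeled pixels. A PyTorch-style sketch of one step (the model and optimizer are placeholders, the image is assumed to be H x W x C, and the labels 1..C are shifted down so that unlabeled pixels can be ignored):

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, image, sampled_mask, device="cuda"):
        """One example at a time: forward the patch and back-propagate
        cross-entropy only through the sampled labeled pixels."""
        model.train()
        x = torch.from_numpy(image).float().permute(2, 0, 1).unsqueeze(0).to(device)
        # Shift labels so classes become 0..C-1 and unlabeled (0) becomes -1.
        y = torch.from_numpy(sampled_mask).long().unsqueeze(0).to(device) - 1

        optimizer.zero_grad()
        logits = model(x)                                   # 1 x C x H x W
        loss = F.cross_entropy(logits, y, ignore_index=-1)  # unlabeled pixels ignored
        loss.backward()
        optimizer.step()
        return loss.item()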



From batch to batch, the number of training examples (labeled pixels) ranges from about 50 to 150,000, depending on how many labeled pixels the least-annotated mask contains.



I am wondering whether this extreme variation in the number of labeled examples per batch is problematic, and whether there is a better approach.










machine-learning convnet image-classification

asked yesterday by Tom C