What exactly is meant by a neural network that can take different types of input?














There is a scientific paper that implements a convolutional neural network which takes three different types of input data, but exactly how it handles them is unclear to me.



Here's the paper's explanation of the network architecture:




This section describes architecture of our neural net which is
depicted in Fig. 3.
Our network has three types of inputs: Screenshot (we use upper crop of the page with dimensions 1280 × 1280, however this net can
work with arbitrarily sized pages), TextMaps (tensor with dimensions
$128 \times 160 \times 160$) and Candidate boxes (list of box coordinates of
arbitrary length).



A screenshot is processed by three convolutional layers (the first two layers are initialized with pretrained weights from BVLC
AlexNet). TextMaps are processed with one convolutional layer with
kernel size $1 \times 1$ and thus its features capture various
combinations of words. These two layers are then concatenated and
processed by final convolutional layer.
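To make two claims in the quoted description concrete, here is a minimal NumPy sketch: a $1 \times 1$ convolution on the TextMaps is a per-position linear combination of the 128 word channels, and the two branches are fused by concatenating along the channel axis. The filter count F = 48, the 96 screenshot channels, and the assumption that the screenshot branch downsamples 1280 × 1280 to 160 × 160 (a factor of 8) so the spatial sizes match are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# TextMaps in channels-first layout, as "128 x 160 x 160" suggests
# (assumption: 128 word channels over a 160 x 160 grid).
text_maps = rng.random((128, 160, 160), dtype=np.float32)

# A 1x1 convolution with F filters is an F x 128 weight matrix (plus bias)
# applied independently at every spatial position. F = 48 is hypothetical.
F = 48
weights = rng.standard_normal((F, 128)).astype(np.float32)
bias = np.zeros(F, dtype=np.float32)

def conv1x1(x, w, b):
    """1x1 convolution for a channels-first tensor x of shape (C, H, W)."""
    return np.einsum("fc,chw->fhw", w, x) + b[:, None, None]

text_features = np.maximum(conv1x1(text_maps, weights, bias), 0.0)  # ReLU

# Sanity check: at any pixel, the output mixes only that pixel's 128 word channels.
h, w = 10, 20
manual = np.maximum(weights @ text_maps[:, h, w] + bias, 0.0)
assert np.allclose(text_features[:, h, w], manual, atol=1e-4)

# Hypothetical screenshot-branch output, assumed already downsampled by 8
# so that its spatial size (160 x 160) matches the TextMaps.
screen_features = rng.random((96, 160, 160), dtype=np.float32)

# "These two layers are then concatenated": stack along the channel axis.
fused = np.concatenate([screen_features, text_features], axis=0)
print(fused.shape)  # (144, 160, 160)
```

Concatenation only works because both feature maps share the same height and width; the final convolution then sees visual and textual channels side by side at every position.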



[Image: Fig. 3 from the paper, the network architecture]






What exactly is implied by "Our network has three types of inputs" above? Is it possible for a convolutional neural network to process different types of input differently?



As I understand it, the network for the Screenshot input would be created like this:



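import tensorflow as tf

def CNN(features, labels, mode):
    # Assumption: the screenshot tensor arrives under a hypothetical "screenshot" key
    # (the original referenced an undefined variable `image`).
    image = features["screenshot"]
    input_layer = tf.reshape(image, [-1, 1280, 1280, 1])

    # Conv+ReLU (tf.keras.layers replaces the deprecated tf.layers API)
    conv_relu_1 = tf.keras.layers.Conv2D(
        filters=96,
        kernel_size=[11, 11],
        padding="same",
        activation=tf.nn.relu)(input_layer)

    # MaxPool: its input is the previous layer's output, conv_relu_1
    pool1 = tf.keras.layers.MaxPooling2D(
        pool_size=[3, 3], strides=2)(conv_relu_1)

    # Conv + ReLU
    ...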


Say this is the first neural network. Should I then create another neural network for the TextMaps and concatenate the results, or does all the magic happen in a single neural network?



In short, can I create a neural network that takes different types of input individually, or do I use different neural networks for each of them and then group their outputs?



Thank you!










      python neural-network tensorflow convnet














      asked 6 hours ago by ShellRox (edited 6 hours ago)






















          1 Answer


          In short, can I create a neural network that takes different types of
          input individually, or do I use different neural networks for each of
          them and then group their outputs?




          Yes, you can. See the Keras Functional API documentation on how to define multi-input/multi-output networks. You can build a separate branch (or sub-model) to process each input and fuse them into a single multi-input model using the keras.models.Model() class.
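A minimal sketch of this idea applied to the architecture in the question, using `tf.keras` (Keras as shipped with TensorFlow 2). The input sizes, filter counts and the 4-class head are hypothetical and scaled down so it runs quickly, and the paper's candidate-boxes input is left out. Each branch is a chain of layers starting from its own `Input`, and `Model` ties both inputs to one output, so the result is one network rather than two.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Scaled-down, hypothetical sizes: a 128 x 128 RGB screenshot and a 16 x 16 grid
# with 128 word channels (the paper uses 1280 x 1280 and 128 x 160 x 160).
screenshot_in = layers.Input(shape=(128, 128, 3), name="screenshot")
textmaps_in = layers.Input(shape=(16, 16, 128), name="textmaps")

# Screenshot branch: three strided convolutions downsample 128 -> 16 (factor 8).
x = layers.Conv2D(32, 5, strides=2, padding="same", activation="relu")(screenshot_in)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)

# TextMaps branch: a single 1x1 convolution mixing the word channels.
t = layers.Conv2D(48, 1, activation="relu")(textmaps_in)

# Fuse the branches along the channel axis (spatial sizes must match: 16 x 16).
fused = layers.Concatenate(axis=-1)([x, t])
fused = layers.Conv2D(64, 3, padding="same", activation="relu")(fused)

# Hypothetical classification head with 4 classes.
pooled = layers.GlobalAveragePooling2D()(fused)
out = layers.Dense(4, activation="softmax", name="class_probs")(pooled)

model = Model(inputs=[screenshot_in, textmaps_in], outputs=out)

# One forward pass on random data; inputs are passed as a list in the same order.
batch = 2
preds = model.predict(
    [np.random.rand(batch, 128, 128, 3).astype("float32"),
     np.random.rand(batch, 16, 16, 128).astype("float32")],
    verbose=0,
)
print(preds.shape)  # (2, 4)
```

Because both branches live in one `Model`, compiling it and calling `model.fit([screenshots, textmaps], labels)` trains them jointly; there is no need to train two separate networks and join their outputs afterwards.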



          In the following example, main_input is processed differently from aux_input, and the two are then merged and propagated through the remaining layers of the network.



          [Image: Keras Functional API example model in which main_input and aux_input are processed separately and then merged]
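A sketch in the spirit of that main_input/aux_input example, written against `tf.keras`. The layer sizes (a 10,000-word vocabulary, 100-token sequences, 5 auxiliary features) are illustrative assumptions, not values read from the image. Only main_input goes through the Embedding and LSTM; aux_input joins at the concatenation.

```python
import numpy as np
from tensorflow.keras import layers, Model

# main_input: a sequence of 100 word indices (vocabulary of 10,000 assumed).
main_input = layers.Input(shape=(100,), dtype="int32", name="main_input")
x = layers.Embedding(input_dim=10000, output_dim=512)(main_input)
lstm_out = layers.LSTM(32)(x)

# An auxiliary output attached directly to the LSTM branch.
aux_output = layers.Dense(1, activation="sigmoid", name="aux_output")(lstm_out)

# aux_input: 5 extra features that skip the embedding/LSTM branch entirely.
aux_input = layers.Input(shape=(5,), name="aux_input")

# Merge the two paths and continue with shared dense layers.
x = layers.concatenate([lstm_out, aux_input])
x = layers.Dense(64, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
main_output = layers.Dense(1, activation="sigmoid", name="main_output")(x)

model = Model(inputs=[main_input, aux_input], outputs=[main_output, aux_output])

# Forward pass on random data: one array per input, one prediction per output.
main_pred, aux_pred = model.predict(
    [np.random.randint(0, 10000, size=(3, 100)).astype("int32"),
     np.random.rand(3, 5).astype("float32")],
    verbose=0,
)
print(main_pred.shape, aux_pred.shape)  # (3, 1) (3, 1)
```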






          answered 6 hours ago by pcko1 (edited 6 hours ago)












          • Hello, thank you for the answer. "Multi-input/output convolutional neural network" seems to be the key term here. Do you know how to build multi-input convolutional neural networks in raw TensorFlow, or is it better to use a high-level library for this purpose?
            – ShellRox
            6 hours ago






          • Hi, to be honest I mostly work at a higher level with Keras, and I strongly believe you will find it quite easy to implement this kind of architecture in Keras, similar to the attached image :)
            – pcko1
            6 hours ago
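On the raw TensorFlow question in the comments: the same two-branch pattern can be written with low-level ops only (`tf.nn.conv2d`, `tf.concat`) and explicit `tf.Variable` weights, without Keras. This is a minimal sketch assuming TensorFlow 2 eager execution; the shapes and filter counts are hypothetical. The key step is `tf.concat(..., axis=-1)`, which requires both branches to share the same batch size and spatial size.

```python
import tensorflow as tf

# Hypothetical, scaled-down shapes (NHWC): a 64 x 64 RGB screenshot and an
# 8 x 8 TextMap with 128 word channels.
def init(shape):
    return tf.Variable(tf.random.normal(shape, stddev=0.05))

params = {
    # Screenshot branch: three 3x3 convs with stride 2 (64 -> 32 -> 16 -> 8).
    "s1": init([3, 3, 3, 16]),
    "s2": init([3, 3, 16, 32]),
    "s3": init([3, 3, 32, 32]),
    # TextMaps branch: one 1x1 conv mixing the 128 word channels into 24 features.
    "t1": init([1, 1, 128, 24]),
    # Final conv over the concatenated 32 + 24 = 56 channels.
    "f1": init([3, 3, 56, 64]),
}

def forward(screenshot, textmaps):
    x = screenshot
    for name in ("s1", "s2", "s3"):
        x = tf.nn.relu(tf.nn.conv2d(x, params[name], strides=2, padding="SAME"))
    t = tf.nn.relu(tf.nn.conv2d(textmaps, params["t1"], strides=1, padding="SAME"))
    fused = tf.concat([x, t], axis=-1)  # channel-wise concatenation
    return tf.nn.relu(tf.nn.conv2d(fused, params["f1"], strides=1, padding="SAME"))

out = forward(tf.random.uniform([2, 64, 64, 3]), tf.random.uniform([2, 8, 8, 128]))
print(out.shape)  # (2, 8, 8, 64)
```

To train it, compute a loss on `out` inside a `tf.GradientTape` and apply the gradients to `params.values()` with an optimizer; Keras mostly saves you this bookkeeping.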

















