MSE loss different in Keras and PyTorch














My problem is that I cannot reproduce in PyTorch the MSE loss that I obtained in Keras.



I have trained the following model in Keras:



from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10))
model.add(Dense(1))
model.compile(optimizer = "adam", loss = "mean_squared_error")
model.fit(X_train, y_train,
          batch_size = 32,
          epochs = 200
          )


The shape of the training data is:



print(X_train.shape)
>>>(3550, 10)
print(y_train.shape)
>>>(3550,)


After training the MSE is ~0.15:



mse_train = model.evaluate(X_train, y_train)
>>>3550/3550 [==============================] - 0s 18us/step
print("Train MSE: ", mse_train)
>>>Train MSE: 0.1499910642017781


Then I initialize the same model in PyTorch:



import torch
import torch.nn as nn

class NN(nn.Module):
    def __init__(self):
        super(NN, self).__init__()
        self.dense1 = nn.Linear(10, 10)
        self.dense2 = nn.Linear(10, 1)

    def forward(self, x):
        out = self.dense1(x)
        out = self.dense2(out)
        return out

net = NN()
criterion = nn.MSELoss()


And assign the weights obtained from training in Keras:



from keras.models import load_model
keras_model = load_model(MODEL_PATH)

dense_weights = keras_model.layers[0].get_weights()
weights = torch.tensor(dense_weights[0].swapaxes(0,1))
bias = torch.tensor(dense_weights[1])
net.dense1.weight.data = weights
net.dense1.bias.data = bias

dense_weights = keras_model.layers[1].get_weights()
weights = torch.tensor(dense_weights[0].swapaxes(0,1))
bias = torch.tensor(dense_weights[1])
net.dense2.weight.data = weights
net.dense2.bias.data = bias
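
For reference, here is a minimal sketch of why the transpose (swapaxes) is needed when copying the weights. A Keras Dense kernel has shape (in_features, out_features), while nn.Linear stores its weight as (out_features, in_features). The arrays kernel, bias and x are hypothetical stand-ins filled with random numbers, not the trained weights, and copy_ under torch.no_grad() is used here as one alternative to assigning .data.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one Dense layer's get_weights() output:
# kernel has shape (in_features, out_features), bias has shape (out_features,).
kernel = rng.standard_normal((10, 1)).astype(np.float32)
bias = rng.standard_normal(1).astype(np.float32)

layer = nn.Linear(10, 1)
with torch.no_grad():
    # nn.Linear stores weight as (out_features, in_features), hence the transpose.
    layer.weight.copy_(torch.from_numpy(kernel.T))
    layer.bias.copy_(torch.from_numpy(bias))

# A small hypothetical input batch of 4 samples with 10 features each.
x = rng.standard_normal((4, 10)).astype(np.float32)

expected = x @ kernel + bias  # a Dense layer with linear activation
actual = layer(torch.from_numpy(x)).detach().numpy()

print("shapes:", expected.shape, actual.shape)
print("outputs match:", np.allclose(expected, actual, atol=1e-4))
```

If the two results agree, the weight transfer itself is probably not the source of the discrepancy.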


Now I try to calculate the MSE loss:



X_train_torch = torch.tensor(X_train, dtype=torch.float)
y_train_torch = torch.tensor(y_train, dtype=torch.float)

outputs = net(X_train_torch)
loss = criterion(outputs, y_train_torch)
print("Train loss: ", loss)
>>>Train loss: 0.338391376896338


The MSE is now ~0.34, more than twice the value calculated in Keras.



What could be the reason? Is there a bug in my calculation?










Tags: keras, loss-function, pytorch






Asked 2 days ago by Andy (edited 2 days ago)


























1 Answer


















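The problem was that outputs and y_train_torch had different shapes: outputs has shape (3550, 1), whereas y_train_torch has shape (3550,). With mismatched shapes, nn.MSELoss broadcasts both tensors to (3550, 3550) and averages the squared differences over all pairs of predictions and targets, which is not the per-sample MSE. Making the shapes agree, for example by passing outputs.squeeze(1) to the criterion, gives the element-wise MSE.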
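
The shape mismatch can be reproduced without the trained model. The following is only a sketch: preds and targets are hypothetical random stand-ins for outputs and y_train_torch with the same shapes as in the question, so the printed numbers will not match the 0.15 and 0.34 reported there. PyTorch normally emits a UserWarning about the size mismatch; it is silenced here only to keep the output short.

```python
import warnings

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins with the shapes from the question.
n = 3550
preds = torch.randn(n, 1)                          # like net(X_train_torch): (3550, 1)
targets = preds.squeeze(1) + 0.1 * torch.randn(n)  # like y_train_torch: (3550,)

criterion = nn.MSELoss()

# Mismatched shapes: (n, 1) and (n,) broadcast to (n, n), so every
# prediction is compared against every target.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the size-mismatch UserWarning
    loss_broadcast = criterion(preds, targets)

# Matching shapes: drop the trailing dimension of the predictions ...
loss_squeezed = criterion(preds.squeeze(1), targets)
# ... or, equivalently, add a trailing dimension to the targets.
loss_unsqueezed = criterion(preds, targets.unsqueeze(1))

# Reference value: plain element-wise mean of squared differences.
loss_manual = ((preds.squeeze(1) - targets) ** 2).mean()

print("broadcast :", loss_broadcast.item())
print("squeezed  :", loss_squeezed.item())
print("unsqueezed:", loss_unsqueezed.item())
print("manual    :", loss_manual.item())
```

In the setup from the question, the corresponding change would be criterion(outputs.squeeze(1), y_train_torch) or criterion(outputs, y_train_torch.unsqueeze(1)).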













Answered 1 hour ago by Andy



