Variational AutoEncoder giving negative loss














I'm learning about variational autoencoders and I've implemented a simple example in Keras (model summary below). I copied the loss function from one of Francois Chollet's blog posts, but the loss comes out extremely negative and keeps diverging with every epoch. What am I missing here?



    Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224)] 0
__________________________________________________________________________________________________
encoding_flatten (Flatten) (None, 224) 0 input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense) (None, 256) 57600 encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense) (None, 128) 32896 encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 (Dense) (None, 64) 8256 encoding_layer_3[0][0]
__________________________________________________________________________________________________
encoding_layer_5 (Dense) (None, 32) 2080 encoding_layer_4[0][0]
__________________________________________________________________________________________________
encoding_layer_6 (Dense) (None, 16) 528 encoding_layer_5[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
encoder_sigma (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 16) 0 encoder_mean[0][0]
encoder_sigma[0][0]
__________________________________________________________________________________________________
decoder_layer_1 (Dense) (None, 16) 272 lambda[0][0]
__________________________________________________________________________________________________
decoder_layer_2 (Dense) (None, 32) 544 decoder_layer_1[0][0]
__________________________________________________________________________________________________
decoder_layer_3 (Dense) (None, 64) 2112 decoder_layer_2[0][0]
__________________________________________________________________________________________________
decoder_layer_4 (Dense) (None, 128) 8320 decoder_layer_3[0][0]
__________________________________________________________________________________________________
decoder_layer_5 (Dense) (None, 256) 33024 decoder_layer_4[0][0]
__________________________________________________________________________________________________
decoder_mean (Dense) (None, 224) 57568 decoder_layer_5[0][0]
==================================================================================================
Total params: 203,744
Trainable params: 203,744
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 3974 samples, validate on 994 samples
Epoch 1/10
3974/3974 [==============================] - 3s 677us/sample - loss: -28.1519 - val_loss: -33.5864
Epoch 2/10
3974/3974 [==============================] - 1s 346us/sample - loss: -137258.8175 - val_loss: -3683802.1489
Epoch 3/10
3974/3974 [==============================] - 1s 344us/sample - loss: -14543022903.6056 - val_loss: -107811177469.9396
Epoch 4/10
3974/3974 [==============================] - 1s 363us/sample - loss: -3011718676570.7012 - val_loss: -13131454938476.6816
Epoch 5/10
3974/3974 [==============================] - 1s 350us/sample - loss: -101442605943572.4844 - val_loss: -322685056398605.9375
Epoch 6/10
3974/3974 [==============================] - 1s 344us/sample - loss: -1417424385529640.5000 - val_loss: -3687688508198145.5000
Epoch 7/10
3974/3974 [==============================] - 1s 358us/sample - loss: -11794297368126698.0000 - val_loss: -26632844827070784.0000
Epoch 8/10
3974/3974 [==============================] - 1s 339us/sample - loss: -69508229806130784.0000 - val_loss: -141312065640756336.0000
Epoch 9/10
3974/3974 [==============================] - 1s 345us/sample - loss: -319838384005810432.0000 - val_loss: -599553350073361152.0000
Epoch 10/10
3974/3974 [==============================] - 1s 342us/sample - loss: -1221653451351326464.0000 - val_loss: -2147128507956525312.0000


Latent sampling function:



    def sampling(self, args):
        """Reparameterization trick by sampling from an isotropic unit Gaussian.
        # Arguments
            args (tensor): mean and log of variance of Q(z|X)
        # Returns
            z (tensor): sampled latent vector
        """
        z_mean, z_log_var = args
        set = tf.shape(z_mean)[0]
        batch = tf.shape(z_mean)[1]
        dim = tf.shape(z_mean)[-1]
        # by default, random_normal has mean=0 and std=1.0
        epsilon = tf.random.normal(shape=(set, dim))  # tfp.distributions.Normal(mean=tf.zeros(shape=(batch, dim)), loc=tf.ones(shape=(batch, dim)))
        return z_mean + (z_log_var * epsilon)
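
For comparison, here is a minimal sketch (not the code from the question) of the conventional reparameterization, in which the second encoder head is treated as a log-variance and passed through exp(0.5 * z_log_var) to obtain a standard deviation. The function above instead multiplies epsilon by z_log_var directly, which can be negative and is not a standard deviation:

    import tensorflow as tf

    def sampling_logvar(args):
        """Sketch: sample z = mu + sigma * eps with sigma = exp(0.5 * log_var)."""
        z_mean, z_log_var = args
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[-1]
        epsilon = tf.random.normal(shape=(batch, dim))  # eps ~ N(0, I)
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon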


Loss function:



    def vae_loss(self, input, x_decoded_mean):
        xent_loss = tf.reduce_mean(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        kl_loss = -0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(self.encoded_sigma) - tf.math.log(tf.square(self.encoded_sigma)) - 1, -1)
        return xent_loss + kl_loss


Another vae_loss implementation:



    def vae_loss(self, input, x_decoded_mean):
        gen_loss = tf.reduce_sum(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        # gen_loss = tf.losses.mean_squared_error(input, x_decoded_mean)
        kl_loss = -0.5 * tf.reduce_sum(1 + self.encoded_sigma - tf.square(self.encoded_mean) - tf.exp(self.encoded_sigma), -1)
        return tf.reduce_mean(gen_loss + kl_loss)


log_sigma kl_loss:



    kl_loss = 0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(tf.exp(self.encoded_sigma)) - self.encoded_sigma - 1, axis=-1)
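
For reference, the closed-form KL divergence between a diagonal Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ and the standard normal prior, writing $\mu_i$ and $\sigma_i$ for the per-dimension encoder outputs, is

$$D_{\mathrm{KL}} = \frac{1}{2} \sum_i \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right) \ge 0.$$

With a log-variance output $\lambda_i = \log \sigma_i^2$ this becomes $-\frac{1}{2} \sum_i \left( 1 + \lambda_i - \mu_i^2 - e^{\lambda_i} \right)$, which is the form in the second implementation above (valid only if encoded_sigma is a log-variance); with a log-standard-deviation output $s_i = \log \sigma_i$ it becomes $\frac{1}{2} \sum_i \left( \mu_i^2 + e^{2 s_i} - 2 s_i - 1 \right)$. Note that the first implementation pairs $-0.5$ with the non-negated terms, i.e. it computes the negative of $D_{\mathrm{KL}}$, which is unbounded below and would explain runaway negative losses like those in the training log.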









Tags: python, keras, tensorflow, loss-function, autoencoder
Comments:

  • Welcome to StackExchange.DS! The KL loss must be minimized; I think it should be +0.5 so that the mean and std are pushed toward 0 and 1 respectively. Let me know if this was the problem. – Esmailian, 10 hours ago

  • @Esmailian Thanks for the suggestion. Unfortunately, that doesn't help. I think there is a problem with my implementation of the KL divergence and/or the sampling. There are a lot of implementations where one or both of the loss components is negative: blog.keras.io/building-autoencoders-in-keras.html, jmetzen.github.io/2015-11-27/vae.html, etc. I'm not sure where the difference lies, but I expect it comes down to slight variations in the overall implementation. – Jed, 8 hours ago

  • In Keras by Francois Chollet, the terms inside K.mean are the negative of yours; that's why -0.5 works for them. – Esmailian, 8 hours ago

  • Also, another trick is to let the network produce log(sigma) instead of sigma and then exponentiate it (the same as what Francois Chollet does) for stability. Take a look at the side notes of this answer. – Esmailian, 8 hours ago

  • Thanks. If you make the network generate log_sigma, does your loss function then work out to the log_sigma kl_loss above? – Jed, 8 hours ago
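
Following the suggestions in the comments (treat the second encoder head as a log-variance, exponentiate it inside the sampling step as in the sketch above, and keep the conventional sign on the KL term), a loss in that parameterization might look like the following minimal sketch. The names x, x_decoded_mean, z_mean and z_log_var are placeholders rather than the attributes used in the question, and binary cross-entropy only behaves as a proper reconstruction loss if the inputs are scaled to [0, 1] and the decoder output goes through a sigmoid:

    import tensorflow as tf

    def vae_loss(x, x_decoded_mean, z_mean, z_log_var):
        # Reconstruction term: sum the per-feature binary cross-entropy over the
        # 224 features, then average over the batch.
        xent = tf.reduce_sum(
            tf.keras.backend.binary_crossentropy(x, x_decoded_mean), axis=-1)
        # KL(N(mu, sigma^2) || N(0, 1)) with z_log_var = log(sigma^2); non-negative.
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(xent + kl)

With inputs in [0, 1] both terms are non-negative, so the total loss cannot go below zero, let alone diverge toward $-\infty$.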















