Variational AutoEncoder giving negative loss
I'm learning about variational autoencoders and I've implemented a simple example in Keras; the model summary is below. I copied the loss function from one of Francois Chollet's blog posts, but the loss comes out hugely negative and keeps diverging. What am I missing here?
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224)] 0
__________________________________________________________________________________________________
encoding_flatten (Flatten) (None, 224) 0 input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense) (None, 256) 57600 encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense) (None, 128) 32896 encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 (Dense) (None, 64) 8256 encoding_layer_3[0][0]
__________________________________________________________________________________________________
encoding_layer_5 (Dense) (None, 32) 2080 encoding_layer_4[0][0]
__________________________________________________________________________________________________
encoding_layer_6 (Dense) (None, 16) 528 encoding_layer_5[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
encoder_sigma (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 16) 0 encoder_mean[0][0]
encoder_sigma[0][0]
__________________________________________________________________________________________________
decoder_layer_1 (Dense) (None, 16) 272 lambda[0][0]
__________________________________________________________________________________________________
decoder_layer_2 (Dense) (None, 32) 544 decoder_layer_1[0][0]
__________________________________________________________________________________________________
decoder_layer_3 (Dense) (None, 64) 2112 decoder_layer_2[0][0]
__________________________________________________________________________________________________
decoder_layer_4 (Dense) (None, 128) 8320 decoder_layer_3[0][0]
__________________________________________________________________________________________________
decoder_layer_5 (Dense) (None, 256) 33024 decoder_layer_4[0][0]
__________________________________________________________________________________________________
decoder_mean (Dense) (None, 224) 57568 decoder_layer_5[0][0]
==================================================================================================
Total params: 203,744
Trainable params: 203,744
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 3974 samples, validate on 994 samples
Epoch 1/10
3974/3974 [==============================] - 3s 677us/sample - loss: -28.1519 - val_loss: -33.5864
Epoch 2/10
3974/3974 [==============================] - 1s 346us/sample - loss: -137258.8175 - val_loss: -3683802.1489
Epoch 3/10
3974/3974 [==============================] - 1s 344us/sample - loss: -14543022903.6056 - val_loss: -107811177469.9396
Epoch 4/10
3974/3974 [==============================] - 1s 363us/sample - loss: -3011718676570.7012 - val_loss: -13131454938476.6816
Epoch 5/10
3974/3974 [==============================] - 1s 350us/sample - loss: -101442605943572.4844 - val_loss: -322685056398605.9375
Epoch 6/10
3974/3974 [==============================] - 1s 344us/sample - loss: -1417424385529640.5000 - val_loss: -3687688508198145.5000
Epoch 7/10
3974/3974 [==============================] - 1s 358us/sample - loss: -11794297368126698.0000 - val_loss: -26632844827070784.0000
Epoch 8/10
3974/3974 [==============================] - 1s 339us/sample - loss: -69508229806130784.0000 - val_loss: -141312065640756336.0000
Epoch 9/10
3974/3974 [==============================] - 1s 345us/sample - loss: -319838384005810432.0000 - val_loss: -599553350073361152.0000
Epoch 10/10
3974/3974 [==============================] - 1s 342us/sample - loss: -1221653451351326464.0000 - val_loss: -2147128507956525312.0000
Latent sampling function:

def sampling(self, args):
    """Reparameterization trick by sampling from an isotropic unit Gaussian.

    # Arguments
        args (tensor): mean and log of variance of Q(z|X)

    # Returns
        z (tensor): sampled latent vector
    """
    z_mean, z_log_var = args
    set = tf.shape(z_mean)[0]
    batch = tf.shape(z_mean)[1]
    dim = tf.shape(z_mean)[-1]
    # by default, random_normal has mean=0 and std=1.0
    epsilon = tf.random.normal(shape=(set, dim))  # tfp.distributions.Normal(mean=tf.zeros(shape=(batch, dim)),loc=tf.ones(shape=(batch, dim)))
    return z_mean + (z_log_var * epsilon)
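For comparison, the reparameterization step is usually written so that the noise is scaled by a standard deviation recovered from the log-variance, rather than by the log-variance itself. A minimal standalone sketch (placeholder function, not the code above; it assumes the second encoder head is interpreted as log-variance):

import tensorflow as tf

def sampling_logvar(args):
    """Sketch: reparameterization when the encoder outputs log(variance)."""
    z_mean, z_log_var = args
    batch = tf.shape(z_mean)[0]
    dim = tf.shape(z_mean)[-1]
    epsilon = tf.random.normal(shape=(batch, dim))  # eps ~ N(0, I)
    sigma = tf.exp(0.5 * z_log_var)                 # log-variance -> std
    return z_mean + sigma * epsilon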
Loss function:

def vae_loss(self, input, x_decoded_mean):
    xent_loss = tf.reduce_mean(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
    kl_loss = -0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(self.encoded_sigma) - tf.math.log(tf.square(self.encoded_sigma)) - 1, -1)
    return xent_loss + kl_loss
Another vae_loss implementation:
def vae_loss(self, input, x_decoded_mean):
    gen_loss = tf.reduce_sum(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
    # gen_loss = tf.losses.mean_squared_error(input, x_decoded_mean)
    kl_loss = -0.5 * tf.reduce_sum(1 + self.encoded_sigma - tf.square(self.encoded_mean) - tf.exp(self.encoded_sigma), -1)
    return tf.reduce_mean(gen_loss + kl_loss)
KL loss if the network outputs log_sigma instead of sigma:

kl_loss = 0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(tf.exp(self.encoded_sigma)) - self.encoded_sigma - 1, axis=-1)
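For reference, the closed-form Gaussian KL divergence that all three kl_loss expressions above are trying to implement is

$$D_{KL}\big(\mathcal{N}(\mu,\sigma^2)\,\|\,\mathcal{N}(0,1)\big) = \tfrac{1}{2}\sum_i\left(\mu_i^2 + \sigma_i^2 - \log\sigma_i^2 - 1\right),$$

which is always non-negative and is added to the reconstruction term. With the log-variance parameterization $\lambda = \log\sigma^2$ it becomes $-\tfrac{1}{2}\sum_i\left(1 + \lambda_i - \mu_i^2 - e^{\lambda_i}\right)$, so whether the code carries a $+0.5$ or a $-0.5$ factor depends only on how the terms inside the sum are signed.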
Tags: python, keras, tensorflow, loss-function, autoencoder
Comments:

– Esmailian (10 hours ago): Welcome to StackExchange.DS! The KL loss must be minimized, so I think it should be +0.5, to drive the mean and std toward 0 and 1 respectively. Let me know if this was the problem.

– Jed (8 hours ago): @Esmailian Thanks for the suggestion. Unfortunately, no, that doesn't help. I think there is a problem with my implementation of the KL divergence and/or the sampling. There are a lot of implementations where one or both of the loss components are negative: blog.keras.io/building-autoencoders-in-keras.html, jmetzen.github.io/2015-11-27/vae.html, etc. I'm not really sure where the difference lies, but I expect it comes down to slight variations in the overall implementation.

– Esmailian (8 hours ago): In the Keras example by Francois Chollet, the terms inside K.mean are the negatives of yours; that's why -0.5 works for them.

– Esmailian (8 hours ago): Also, another trick is to let the network produce log(sigma) instead of sigma and then exponentiate it (the same as what Francois Chollet does) for stability. Take a look at the side notes of this answer. (A sketch of this formulation is shown after the comments.)

– Jed (8 hours ago): Thanks. If you make the network generate log_sigma, does the loss function then work out to the log_sigma kl_loss above?
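A minimal sketch of the formulation the last two Esmailian comments describe, assuming the second encoder head is read as log-variance as in the Keras blog example (the function name, arguments, and the original_dim scaling are placeholders, not the poster's code):

import tensorflow as tf

def vae_loss_logvar(x, x_decoded_mean, z_mean, z_log_var, original_dim=224):
    """Sketch: VAE loss with a log-variance head, Keras-blog style."""
    # reconstruction term, summed over the features of each sample
    xent = original_dim * tf.reduce_mean(
        tf.keras.backend.binary_crossentropy(x, x_decoded_mean), axis=-1)
    # closed-form KL(N(mu, sigma^2) || N(0, 1)) with z_log_var = log(sigma^2)
    kl = -0.5 * tf.reduce_sum(
        1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(xent + kl)

Note that the reconstruction term stays non-negative only if both the targets and the decoder output lie in [0, 1] (e.g. inputs scaled to that range and a sigmoid on decoder_mean); binary cross-entropy itself can go negative when the targets fall outside [0, 1].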