Variational AutoEncoder giving negative loss














I'm learning about variational autoencoders and I've implemented a simple example in Keras (model summary below). I copied the loss function from one of Francois Chollet's blog posts, but the loss comes out extremely negative and keeps diverging with every epoch. What am I missing here?



    Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224)] 0
__________________________________________________________________________________________________
encoding_flatten (Flatten) (None, 224) 0 input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense) (None, 256) 57600 encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense) (None, 128) 32896 encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 (Dense) (None, 64) 8256 encoding_layer_3[0][0]
__________________________________________________________________________________________________
encoding_layer_5 (Dense) (None, 32) 2080 encoding_layer_4[0][0]
__________________________________________________________________________________________________
encoding_layer_6 (Dense) (None, 16) 528 encoding_layer_5[0][0]
__________________________________________________________________________________________________
encoder_mean (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
encoder_sigma (Dense) (None, 16) 272 encoding_layer_6[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 16) 0 encoder_mean[0][0]
encoder_sigma[0][0]
__________________________________________________________________________________________________
decoder_layer_1 (Dense) (None, 16) 272 lambda[0][0]
__________________________________________________________________________________________________
decoder_layer_2 (Dense) (None, 32) 544 decoder_layer_1[0][0]
__________________________________________________________________________________________________
decoder_layer_3 (Dense) (None, 64) 2112 decoder_layer_2[0][0]
__________________________________________________________________________________________________
decoder_layer_4 (Dense) (None, 128) 8320 decoder_layer_3[0][0]
__________________________________________________________________________________________________
decoder_layer_5 (Dense) (None, 256) 33024 decoder_layer_4[0][0]
__________________________________________________________________________________________________
decoder_mean (Dense) (None, 224) 57568 decoder_layer_5[0][0]
==================================================================================================
Total params: 203,744
Trainable params: 203,744
Non-trainable params: 0
__________________________________________________________________________________________________
Train on 3974 samples, validate on 994 samples
Epoch 1/10
3974/3974 [==============================] - 3s 677us/sample - loss: -28.1519 - val_loss: -33.5864
Epoch 2/10
3974/3974 [==============================] - 1s 346us/sample - loss: -137258.8175 - val_loss: -3683802.1489
Epoch 3/10
3974/3974 [==============================] - 1s 344us/sample - loss: -14543022903.6056 - val_loss: -107811177469.9396
Epoch 4/10
3974/3974 [==============================] - 1s 363us/sample - loss: -3011718676570.7012 - val_loss: -13131454938476.6816
Epoch 5/10
3974/3974 [==============================] - 1s 350us/sample - loss: -101442605943572.4844 - val_loss: -322685056398605.9375
Epoch 6/10
3974/3974 [==============================] - 1s 344us/sample - loss: -1417424385529640.5000 - val_loss: -3687688508198145.5000
Epoch 7/10
3974/3974 [==============================] - 1s 358us/sample - loss: -11794297368126698.0000 - val_loss: -26632844827070784.0000
Epoch 8/10
3974/3974 [==============================] - 1s 339us/sample - loss: -69508229806130784.0000 - val_loss: -141312065640756336.0000
Epoch 9/10
3974/3974 [==============================] - 1s 345us/sample - loss: -319838384005810432.0000 - val_loss: -599553350073361152.0000
Epoch 10/10
3974/3974 [==============================] - 1s 342us/sample - loss: -1221653451351326464.0000 - val_loss: -2147128507956525312.0000


Latent sampling function:



    def sampling(self, args):
        """Reparameterization trick by sampling from an isotropic unit Gaussian.
        # Arguments
            args (tensor): mean and log of variance of Q(z|X)
        # Returns
            z (tensor): sampled latent vector
        """
        z_mean, z_log_var = args
        set = tf.shape(z_mean)[0]
        batch = tf.shape(z_mean)[1]
        dim = tf.shape(z_mean)[-1]
        # by default, random_normal has mean=0 and std=1.0
        epsilon = tf.random.normal(shape=(set, dim))  # tfp.distributions.Normal(mean=tf.zeros(shape=(batch, dim)), loc=tf.ones(shape=(batch, dim)))
        return z_mean + (z_log_var * epsilon)
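
For comparison, here is a minimal sketch (not the code from the question) of the conventional reparameterization, in which the second encoder head is treated as a log-variance and passed through exp(0.5 * z_log_var) to obtain a standard deviation. The function above instead multiplies epsilon by z_log_var directly, which can be negative and is not a standard deviation:

    import tensorflow as tf

    def sampling_logvar(args):
        """Sketch: sample z = mu + sigma * eps with sigma = exp(0.5 * log_var)."""
        z_mean, z_log_var = args
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[-1]
        epsilon = tf.random.normal(shape=(batch, dim))  # eps ~ N(0, I)
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon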


Loss function:



    def vae_loss(self, input, x_decoded_mean):
        xent_loss = tf.reduce_mean(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        kl_loss = -0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(self.encoded_sigma) - tf.math.log(tf.square(self.encoded_sigma)) - 1, -1)
        return xent_loss + kl_loss


Another vae_loss implementation:



    def vae_loss(self, input, x_decoded_mean):
        gen_loss = tf.reduce_sum(tf.keras.backend.binary_crossentropy(input, x_decoded_mean))
        # gen_loss = tf.losses.mean_squared_error(input, x_decoded_mean)
        kl_loss = -0.5 * tf.reduce_sum(1 + self.encoded_sigma - tf.square(self.encoded_mean) - tf.exp(self.encoded_sigma), -1)
        return tf.reduce_mean(gen_loss + kl_loss)


log_sigma kl_loss:



    kl_loss = 0.5 * tf.reduce_sum(tf.square(self.encoded_mean) + tf.square(tf.exp(self.encoded_sigma)) - self.encoded_sigma - 1, axis=-1)
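
For reference, the closed-form KL divergence between a diagonal Gaussian posterior $\mathcal{N}(\mu, \sigma^2)$ and the standard normal prior, writing $\mu_i$ and $\sigma_i$ for the per-dimension encoder outputs, is

$$D_{\mathrm{KL}} = \frac{1}{2} \sum_i \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right) \ge 0.$$

With a log-variance output $\lambda_i = \log \sigma_i^2$ this becomes $-\frac{1}{2} \sum_i \left( 1 + \lambda_i - \mu_i^2 - e^{\lambda_i} \right)$, which is the form in the second implementation above (valid only if encoded_sigma is a log-variance); with a log-standard-deviation output $s_i = \log \sigma_i$ it becomes $\frac{1}{2} \sum_i \left( \mu_i^2 + e^{2 s_i} - 2 s_i - 1 \right)$. Note that the first implementation pairs $-0.5$ with the non-negated terms, i.e. it computes the negative of $D_{\mathrm{KL}}$, which is unbounded below and would explain runaway negative losses like those in the training log.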









Tags: python, keras, tensorflow, loss-function, autoencoder
Comments:

  • Welcome to StackExchange.DS! The KL loss must be minimized; I think it should be +0.5 so that the mean and std are pushed toward 0 and 1 respectively. Let me know if this was the problem. – Esmailian, 10 hours ago

  • @Esmailian Thanks for the suggestion. Unfortunately, that doesn't help. I think there is a problem with my implementation of the KL divergence and/or the sampling. There are a lot of implementations where one or both of the loss components is negative: blog.keras.io/building-autoencoders-in-keras.html, jmetzen.github.io/2015-11-27/vae.html, etc. I'm not sure where the difference lies, but I expect it comes down to slight variations in the overall implementation. – Jed, 8 hours ago

  • In Keras by Francois Chollet, the terms inside K.mean are the negative of yours; that's why -0.5 works for them. – Esmailian, 8 hours ago

  • Also, another trick is to let the network produce log(sigma) instead of sigma and then exponentiate it (the same as what Francois Chollet does) for stability. Take a look at the side notes of this answer. – Esmailian, 8 hours ago

  • Thanks. If you make the network generate log_sigma, does your loss function then work out to the log_sigma kl_loss above? – Jed, 8 hours ago
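
Following the suggestions in the comments (treat the second encoder head as a log-variance, exponentiate it inside the sampling step as in the sketch above, and keep the conventional sign on the KL term), a loss in that parameterization might look like the following minimal sketch. The names x, x_decoded_mean, z_mean and z_log_var are placeholders rather than the attributes used in the question, and binary cross-entropy only behaves as a proper reconstruction loss if the inputs are scaled to [0, 1] and the decoder output goes through a sigmoid:

    import tensorflow as tf

    def vae_loss(x, x_decoded_mean, z_mean, z_log_var):
        # Reconstruction term: sum the per-feature binary cross-entropy over the
        # 224 features, then average over the batch.
        xent = tf.reduce_sum(
            tf.keras.backend.binary_crossentropy(x, x_decoded_mean), axis=-1)
        # KL(N(mu, sigma^2) || N(0, 1)) with z_log_var = log(sigma^2); non-negative.
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        return tf.reduce_mean(xent + kl)

With inputs in [0, 1] both terms are non-negative, so the total loss cannot go below zero, let alone diverge toward $-\infty$.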















