policy gradient loss [on hold]

I am confused about how the loss is calculated for a policy gradient. My code is below:

logits = policy.predictions(states)
negative_likelihoods = tf.nn.softmax_cross_entropy_with_logits(labels=actions, logits=logits)
weighted_negative_likelihoods = tf.multiply(negative_likelihoods, q_values)
loss = tf.reduce_mean(weighted_negative_likelihoods)
gradients = tf.gradients(loss, variables)

Here logits is the output of the policy network before the softmax is applied.

My question is:

What does actions mean? Is it the action that the agent actually executed at step t, or the action that it should have executed at step t?
Thanks
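
For reference, here is a minimal plain-Python sketch of the same computation, so the role of actions is easier to see. It assumes the usual REINFORCE-style setup, in which the labels are one-hot encodings of the actions the agent actually sampled and executed, and q_values are the return estimates for those same steps. The helper names and the toy batch are hypothetical and are not taken from the code above.

```python
import math

def softmax_cross_entropy(labels, logits):
    """Per-sample -sum(labels * log_softmax(logits))."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(y * (x - log_z) for y, x in zip(labels, logits))

def one_hot(index, depth):
    """One-hot vector for an action index (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

def policy_gradient_loss(logits_batch, taken_actions, q_values):
    """Mean over the batch of -log pi(a_t | s_t) * Q_t."""
    num_actions = len(logits_batch[0])
    weighted = []
    for logits, action, q in zip(logits_batch, taken_actions, q_values):
        # With a one-hot label, cross-entropy reduces to -log pi(action | state).
        nll = softmax_cross_entropy(one_hot(action, num_actions), logits)
        weighted.append(nll * q)
    return sum(weighted) / len(weighted)

# Toy batch (assumed values): 2 states, 3 possible actions.
logits_batch = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
taken_actions = [1, 0]   # indices of the actions the agent actually executed
q_values = [1.0, 0.5]    # return estimates for those (state, action) pairs
loss = policy_gradient_loss(logits_batch, taken_actions, q_values)
```

Under this reading, minimizing the loss raises the log-probability of executed actions in proportion to their q_values. No "correct" action is needed, which is the difference from supervised classification.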

edited yesterday

HFulcher

1228

asked yesterday

Kang_Kai

111

New contributor

put on hold as unclear what you're asking by Sean Owen♦ yesterday



python tensorflow loss-function policy-gradients
