policy gradient loss [on hold]
I am confused about how the loss is calculated. My code is below:
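import tensorflow as tf
# logits: unnormalized action scores from the policy network, shape [batch, num_actions]
logits = policy.predictions(states)
# per-sample cross-entropy; with one-hot rows in actions this is -log pi(a|s)
negative_likelihoods = tf.nn.softmax_cross_entropy_with_logits(labels=actions, logits=logits)
# weight each -log pi(a|s) by its Q-value / return estimate
weighted_negative_likelihoods = tf.multiply(negative_likelihoods, q_values)
loss = tf.reduce_mean(weighted_negative_likelihoods)
gradients = tf.gradients(loss, variables)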
logits is the raw output of the policy network, before the softmax.
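To make concrete what labels=actions contributes, here is a minimal plain-Python sketch (no TensorFlow; the helper names log_softmax, softmax_cross_entropy and negative_log_prob, and the toy logits, are hypothetical). It assumes each row of actions is a one-hot vector, in which case the per-sample softmax cross-entropy reduces to -log pi(a|s) for the single action a marked in that row.

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax: subtract the max before exponentiating
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def softmax_cross_entropy(label_dist, logits):
    # -sum_i label_i * log softmax(logits)_i, the per-row quantity
    # that a softmax cross-entropy with logits computes
    return -sum(p * lp for p, lp in zip(label_dist, log_softmax(logits)))

def negative_log_prob(action, logits):
    # -log pi(action | s) for an integer action index
    return -log_softmax(logits)[action]

logits = [2.0, 0.5, -1.0]   # hypothetical network output for one state
action = 0                  # hypothetical action index
one_hot = [1.0 if i == action else 0.0 for i in range(len(logits))]
print(softmax_cross_entropy(one_hot, logits), negative_log_prob(action, logits))
```

Both printed values agree, so multiplying this term by q_values weights the log-probability of the labelled action. If actions holds integer indices instead of one-hot rows, tf.nn.sparse_softmax_cross_entropy_with_logits accepts them directly.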
My question is: what does actions mean? Is it the action the agent actually executed at step t, or the action it should execute at step t?
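For reference, in a REINFORCE-style loss the label is the action that was actually sampled from the policy and executed at step t, recorded during the rollout; the weight is the return that followed it. The sketch below is a minimal plain-Python illustration under that assumption (q_values taken to be discounted Monte Carlo returns). The helper names, the fixed toy logits standing in for policy.predictions(states), and the rewards are all hypothetical. In the TensorFlow code, the recorded indices would be one-hot encoded (for example with tf.one_hot) to form actions.

```python
import math
import random

def log_softmax(logits):
    # numerically stable log-softmax
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sample_action(logits, rng):
    # draw an action index from pi(.|s) = softmax(logits)
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def discounted_returns(rewards, gamma):
    # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]

def reinforce_loss(logits_per_step, executed_actions, returns):
    # mean over t of -log pi(a_t | s_t) * G_t, where a_t is the action
    # that was actually sampled and executed at step t
    terms = [-log_softmax(l)[a] * g
             for l, a, g in zip(logits_per_step, executed_actions, returns)]
    return sum(terms) / len(terms)

# Hypothetical 3-step episode with fixed logits in place of network outputs.
rng = random.Random(0)
logits_per_step = [[1.0, 0.0], [0.2, 0.4], [-0.5, 0.5]]
executed_actions = [sample_action(l, rng) for l in logits_per_step]  # recorded during rollout
rewards = [1.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
loss = reinforce_loss(logits_per_step, executed_actions, returns)
print(executed_actions, returns, loss)
```

The key design point is that executed_actions is stored while acting and reused unchanged when building the loss; no "correct" action is ever supplied, since the return weighting is what pushes probability toward actions that turned out well.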
Thanks
python tensorflow loss-function policy-gradients
put on hold as unclear what you're asking by Sean Owen♦ yesterday
Please clarify your specific problem or add additional details to highlight exactly what you need. As it's currently written, it’s hard to tell exactly what you're asking. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
edited yesterday by HFulcher
asked yesterday by Kang_Kai