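In RL, a continuous policy pi(s) (differentiable, over a continuous action space) → the deterministic policy gradient is valid. A discontinuous policy (e.g. an argmax over discrete actions) → a softmax parameterization is needed.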
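The contrast above can be illustrated with a minimal sketch, assuming a single-state toy problem. Everything named here is hypothetical and not from the original query: the critic `q_continuous` (a quadratic peaking at a = 2), the two-action reward table `REWARDS`, and the helper names. The continuous case follows the chain rule d mu/d theta * dQ/da; the discrete case shows that an argmax policy has zero gradient almost everywhere, while a softmax policy yields a usable one.

```python
import math

# --- Continuous action: deterministic policy gradient ---
def q_continuous(a):
    # Hypothetical critic: value peaks at action a = 2.
    return -(a - 2.0) ** 2

def dq_da(a):
    # Derivative of the critic with respect to the action.
    return -2.0 * (a - 2.0)

def deterministic_policy_gradient(theta):
    # Policy mu_theta(s) = theta, so d mu / d theta = 1 (chain rule).
    return 1.0 * dq_da(theta)

def train_continuous(theta=0.0, lr=0.1, steps=200):
    # Plain gradient ascent on the policy parameter.
    for _ in range(steps):
        theta += lr * deterministic_policy_gradient(theta)
    return theta

# --- Discrete actions: argmax is discontinuous, softmax is smooth ---
REWARDS = [1.0, 0.0]  # assumed reward for each of two actions

def argmax_return(prefs):
    # Greedy policy: return jumps only when the preference order flips.
    best = max(range(len(prefs)), key=lambda i: prefs[i])
    return REWARDS[best]

def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_return(prefs):
    # Expected reward under the softmax policy.
    return sum(p * r for p, r in zip(softmax(prefs), REWARDS))

def softmax_gradient(prefs):
    # Analytic gradient: dJ/d pref_i = pi_i * (r_i - J).
    probs = softmax(prefs)
    j = softmax_return(prefs)
    return [p * (r - j) for p, r in zip(probs, REWARDS)]

def finite_difference(f, prefs, i, eps=1e-4):
    # Central difference of f with respect to prefs[i].
    up = list(prefs)
    up[i] += eps
    down = list(prefs)
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)

if __name__ == "__main__":
    print("continuous theta after training:", train_continuous())
    print("argmax gradient estimate:", finite_difference(argmax_return, [0.0, 1.0], 0))
    print("softmax gradient:", softmax_gradient([0.0, 1.0]))
```

In this sketch the argmax policy starts on the worse action and receives no learning signal, whereas the softmax gradient points toward raising the preference of the better action.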