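In RL, a continuous deterministic policy π(s) → the deterministic policy gradient works, because Q(s, π(s)) is differentiable in the policy parameters via the chain rule. A discontinuous policy (e.g. an argmax over discrete actions) → no such gradient exists, so it needs a stochastic parameterization such as softmax.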
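A minimal sketch of the two cases, under assumptions that are not in the note: a one-step toy problem, a known critic Q(s, a) = -(a - 2)^2 for the continuous case, fixed Q-values for three discrete actions, and hypothetical helper names (`dpg_grad`, `softmax_pg`, `train_continuous`, `train_discrete`). It only illustrates where the gradient comes from in each case.

```python
import math

# Continuous case (assumed toy setup): policy a = theta * s,
# critic Q(s, a) = -(a - 2)^2, which is differentiable in a.
def dpg_grad(theta, s):
    a = theta * s
    dq_da = -2.0 * (a - 2.0)    # gradient of the critic w.r.t. the action
    da_dtheta = s               # gradient of the policy w.r.t. its parameter
    return dq_da * da_dtheta    # chain rule: deterministic policy gradient

# Discrete case: an argmax over logits is piecewise constant, so its
# gradient is zero almost everywhere. A softmax policy is smooth instead.
def softmax(logits):
    m = max(logits)             # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def softmax_pg(logits, q):
    # Exact gradient of the expected value sum_i p_i * q_i w.r.t. the logits:
    # dJ/dlogit_i = p_i * (q_i - sum_j p_j * q_j)
    p = softmax(logits)
    baseline = sum(pi * qi for pi, qi in zip(p, q))
    return [pi * (qi - baseline) for pi, qi in zip(p, q)]

def train_continuous(steps=200, lr=0.05, s=1.0):
    theta = 0.0
    for _ in range(steps):
        theta += lr * dpg_grad(theta, s)    # gradient ascent on Q(s, pi(s))
    return theta

def train_discrete(steps=500, lr=0.5, q=(0.0, 1.0, 0.2)):
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        g = softmax_pg(logits, q)
        logits = [l + lr * gi for l, gi in zip(logits, g)]
    return softmax(logits)

if __name__ == "__main__":
    print("continuous theta:", train_continuous())    # approaches 2.0
    print("discrete probs:", train_discrete())         # mass shifts to action 1
```

In the continuous case the update uses only the critic's action gradient and the policy's parameter gradient. In the discrete case the softmax supplies the smoothness that an argmax lacks, so the probability of the best action can be increased gradually.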