# Policy Iteration

Policy iteration iteratively improves a policy, $\pi_t$, using approximations of its state-value function, $V^{\pi_t}$. At each iteration $t$, the value approximation from the previous step is used to improve ($\overset{I}{\rightarrow}$) the policy, typically by acting greedily with respect to it. The improved policy, $\pi_{t+1}$, is then evaluated ($\overset{E}{\rightarrow}$) to produce the state-value approximation for the next iteration, $V^{\pi_{t+1}}$. Policy iteration ends when an improvement step leaves the policy unchanged; this stable policy is denoted $\pi^*$. This is illustrated as follows:
$$V^{\pi_0} \overset{I}{\rightarrow} \pi_1 \overset{E}{\rightarrow} V^{\pi_1} \overset{I}{\rightarrow} \pi_2 \overset{E}{\rightarrow} \cdots \overset{I}{\rightarrow} \pi^*$$
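
The alternation of the improvement (I) and evaluation (E) steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the three-state MDP `P`, the discount `GAMMA`, and the helper names `q_value`, `evaluate`, `improve`, and `policy_iteration` are all hypothetical and chosen only for this example. Evaluation here is done by repeated Bellman expectation backups, which is one common choice among several.

```python
# Hypothetical 3-state MDP (an assumption for illustration only).
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing, zero reward
}
GAMMA = 0.9  # assumed discount factor


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate(P, policy, V, gamma, tol=1e-10):
    """E step: sweep Bellman expectation backups until V changes by < tol."""
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # in-place update
        if delta < tol:
            return V


def improve(P, V, gamma):
    """I step: pick the greedy action in every state (ties go to the first action)."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P, gamma):
    V = {s: 0.0 for s in P}  # initial value guess, playing the role of V^{pi_0}
    policy = None
    while True:
        new_policy = improve(P, V, gamma)       # V^{pi_t} --I--> pi_{t+1}
        if new_policy == policy:                # policy is stable: stop
            return policy, V
        policy = new_policy
        V = evaluate(P, policy, V, gamma)       # pi_{t+1} --E--> V^{pi_{t+1}}


policy, V = policy_iteration(P, GAMMA)
print(policy, V)
```

On this toy MDP the loop stops after the third improvement step, with the policy choosing action `1` in states `0` and `1`, which leads toward the single rewarded transition. Following the diagram above, the sketch starts from a value estimate rather than from an initial policy; starting from an arbitrary $\pi_0$ and evaluating it first is an equally common arrangement.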