Intuitively, you can do this by giving your agent a "momentum."
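Specifically, increase the size of your state space by a factor of four: in addition to the agent's position, keep track of whether it last moved up, right, left, or down. Scale up the costs in your network by some large factor and assign a small penalty to moves that change direction. Provided the scale factor is large enough that the total turn penalty along a path can never outweigh a difference in the original costs, the penalty only breaks ties between paths that were equally cheap to begin with, so you get a cheapest path with as few turns as possible.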
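Here is a minimal sketch of that idea in Python, assuming a 4-connected grid where `0` is a free cell, `1` is a wall, and every move originally costs 1. The function name `fewest_turn_shortest_path`, the grid encoding, and the choice of `rows * cols + 1` as the scale factor are all assumptions made for illustration, not anything from the original problem. It runs Dijkstra's algorithm over states of the form (row, column, last direction).

```python
import heapq

# Directions as (d_row, d_col): up, right, down, left.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def fewest_turn_shortest_path(grid, start, goal):
    """Return (steps, turns) for a shortest path that has the fewest turns
    among all shortest paths, or None if the goal is unreachable.

    grid: list of lists, 0 = free cell, 1 = wall (assumed encoding).
    start, goal: (row, col) tuples.
    """
    rows, cols = len(grid), len(grid[0])
    # Assumed scale factor: a shortest path visits each cell at most once,
    # so it has fewer than rows * cols turns. The total turn penalty can
    # therefore never add up to the cost of one extra step.
    scale = rows * cols + 1

    # A state is (row, col, last_direction); -1 means "has not moved yet".
    start_state = (start[0], start[1], -1)
    dist = {start_state: 0}
    heap = [(0, start_state)]

    while heap:
        cost, (r, c, d) = heapq.heappop(heap)
        if cost > dist[(r, c, d)]:
            continue  # stale heap entry
        if (r, c) == goal:
            # cost == steps * scale + turns, with turns < scale.
            return divmod(cost, scale)
        for nd, (dr, dc) in enumerate(DIRS):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            # Each step costs `scale`; changing direction adds a penalty of 1.
            turned = d != -1 and nd != d
            new_cost = cost + scale + (1 if turned else 0)
            state = (nr, nc, nd)
            if new_cost < dist.get(state, float("inf")):
                dist[state] = new_cost
                heapq.heappush(heap, (new_cost, state))
    return None
```

The first move carries no penalty because there is no previous direction to change from. If your network has non-uniform costs, you would multiply each edge cost by `scale` instead of using `scale` directly, and pick the scale factor with those costs in mind.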