I understand that epsilon marks the trade-off between exploration and exploitation. At the beginning, you want epsilon to be high so that you take big leaps and learn things.

Since Vishma Dias's answer described learning rate (decay), I would like to elaborate on the epsilon-greedy method, because I think the question implicitly refers to a decayed-epsilon-greedy method for exploration and exploitation.
One way to balance exploration and exploitation while training an RL policy is the epsilon-greedy method. For example, ε = 0.3 means that with probability 0.3 the action is sampled uniformly at random from the action space, and with probability 0.7 the action is chosen greedily as argmax_a Q(s, a).
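As a minimal sketch of that selection rule (assuming a NumPy Q-table indexed as Q[state, action]; the function name and table shape are placeholders, not from the original post):

```python
import numpy as np

def epsilon_greedy_action(Q, state, epsilon, rng):
    """With probability epsilon explore (random action), else exploit argmax Q(state, .)."""
    n_actions = Q.shape[1]
    if rng.random() < epsilon:           # explore
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))      # exploit

# Example: epsilon = 0.3 -> roughly 30% random actions, 70% greedy actions.
rng = np.random.default_rng(0)
Q = np.zeros((10, 4))                    # hypothetical table: 10 states, 4 actions
action = epsilon_greedy_action(Q, state=0, epsilon=0.3, rng=rng)
```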
An improved version of the epsilon-greedy method is the decayed-epsilon-greedy method. In this method, suppose we train a policy for a total of N epochs/episodes (which depends on the specific problem). The algorithm initially sets ε = ε_start (e.g., ε_start = 0.6) and then gradually decreases it to end at ε = ε_end (e.g., ε_end = 0.1) over the training epochs/episodes. In other words, early in training we give the model more freedom to explore with a high probability (e.g., ε = 0.6), and then gradually decrease ε with a rate r over the training epochs/episodes according to the following schedule:

ε_i = max(ε_end, ε_start − r·i), where r = (ε_start − ε_end) / N and i is the current epoch/episode.
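A small sketch of that linear schedule, reusing the names above (eps_start, eps_end, and n_episodes correspond to ε_start, ε_end, and N; the training loop body is omitted and only indicated by a comment):

```python
def decayed_epsilon(episode, eps_start=0.6, eps_end=0.1, n_episodes=1000):
    """Linearly decay epsilon from eps_start to eps_end over n_episodes."""
    r = (eps_start - eps_end) / n_episodes        # decay rate per episode
    return max(eps_end, eps_start - r * episode)

for episode in range(1000):
    epsilon = decayed_epsilon(episode)
    # ... run the episode, selecting actions with epsilon_greedy_action(Q, state, epsilon, rng) ...
```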
With this flexible choice of ending at a small exploration probability ε_end, the later part of training focuses mostly on exploitation (i.e., greedy actions) while still exploring with a small probability even once the policy has approximately converged.
You can see the advantage of the decayed-epsilon-greedy method in this post.