The difference between Q-learning and SARSA is that Q-learning's update bootstraps from the best possible action in the next state (off-policy), whereas SARSA's update bootstraps from the action actually taken in the next state by the current (e.g. epsilon-greedy) policy (on-policy).
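Here is a minimal sketch of the two tabular updates to make the difference concrete; the names (`Q`, `s`, `a`, `r`, `s2`, `a2`, `alpha`, `gamma`) are illustrative, not from any particular library:

```python
import numpy as np

# Q is a tabular action-value array of shape (n_states, n_actions),
# e.g. Q = np.zeros((n_states, n_actions)). All names here are illustrative.

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the *best* action in the next state s2,
    # regardless of which action the behavior policy will actually take.
    td_target = r + gamma * np.max(Q[s2])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action a2 the current policy
    # actually took in s2 (which may be an exploratory action).
    td_target = r + gamma * Q[s2, a2]
    Q[s, a] += alpha * (td_target - Q[s, a])
```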
If we used only the greedy policy there would be no exploration, so learning would fail. In the limiting case where epsilon decays to 0 (e.g. like 1/t), both SARSA and Q-learning converge to the optimal action-value function q*, and hence to the optimal policy. With epsilon held fixed, however, SARSA converges to the best epsilon-greedy policy, while Q-learning still converges to q*.
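As a sketch of that limiting case, here is one way to write epsilon-greedy action selection with a 1/t decay (the function and variable names are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, t, n_actions):
    # Decaying epsilon (here 1/t) explores a lot early on but becomes
    # greedy in the limit -- the condition under which both SARSA and
    # Q-learning reach q*. A fixed epsilon never stops exploring, which
    # is why SARSA then only finds the best epsilon-greedy policy.
    epsilon = 1.0 / t  # t = 1, 2, 3, ... step or episode counter
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    return int(np.argmax(Q[s]))              # exploit: greedy action
```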
I wrote a small note explaining the differences between the two; I hope it helps:
https://tcnguyen.github.io/reinforcement_learning/sarsa_vs_q_learning.html