Word error rate (WER) is a common metric of the performance of a speech recognition system. It can be computed as:
WER = (S + D + I) / N
where: