That is a historical definition; in modern usage "em" simply refers to the size of the font, and the link to the letter M no longer has any practical meaning. In fact, the same Wikipedia article expands on this evolution in usage and meaning in a later section:
One em was traditionally defined as the width of the capital "M" in the current typeface and point size, as the "M" was commonly cast the full-width of the square "blocks", or "em-quads" (also "mutton-quads"), which are used in printing presses. However, in modern typefaces, the character M is usually somewhat less than one em wide. Moreover, as the term has expanded to include a wider variety of languages and character sets, its meaning has evolved; this has allowed it to include those fonts, typefaces, and character sets which do not include a capital "M", such as Chinese and the Arabic alphabet. Thus, em generally means the point size of the font in question, which is the same as the height of the metal body a font was cast on.
In CSS in particular, an "em" doesn't refer to the width of the capital M in a given font; it's just a relative unit, equal to the font size of the element it's used on (or, for the font-size property itself, the font size of the parent).
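To make the "relative quantity" point concrete, here is a minimal sketch of the arithmetic in Python. The function names `em_to_px` and `nested_font_size` are hypothetical, made up for illustration, and the 16px root size is only an assumed example value. Note that the width of the letter M never appears anywhere in the calculation:

```python
from typing import Iterable


def em_to_px(em: float, font_size_px: float) -> float:
    """Convert a length in em to pixels, given the font size it is relative to."""
    return em * font_size_px


def nested_font_size(root_px: float, em_factors: Iterable[float]) -> float:
    """Font size after a chain of nested `font-size: <n>em` declarations.

    Each factor is applied relative to the parent's font size, so the
    factors multiply as you go down the tree.
    """
    size = root_px
    for factor in em_factors:
        size *= factor
    return size


# Assumed example: a 16px root font size.
padding = em_to_px(1.5, 16)               # 1.5em of padding on a 16px element
inner = nested_font_size(16, [1.5, 2.0])  # 1.5em inside the root, then 2em inside that
```

With these example values, `padding` is 24 pixels and `inner` is 48 pixels. Swapping in a different typeface changes neither result.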
If you're asking about the etymology of the word "em", the only source Wikipedia cites is the Adobe Glossary, which has little more to say about it:
A common unit of measurement in typography. Em is traditionally defined as the width of the uppercase M in the current face and point size. It is more properly defined as simply the current point size. For example, in 12-point type, em is a distance of 12 points.
No authoritative source explicitly states that "em" is a phonetic spelling of the capital letter M, but given that the unit was traditionally defined by that letter, I wouldn't rule out the possibility.