The FAQ of Google Wave Protocol says that [HTML] "does not have desirable properties" and that "HTML makes OT (Operational Transforms) difficult if not impossible" [1]. Why?
I'm assuming here that you understand the basics of OT. The principal problem with doing OT on HTML treated as plain text is merging HTML tags. As a simple example, say we had a document as follows:
Hello world
Alice then decides that 'world' should be in bold:
Hello <b>world</b>
This can be represented with a double insert operation in OT, schematically:
Edit A: Keep 6 : Insert "<b>" : Keep 5 : Insert "</b>"
If Bob decided that 'world' should be italic before he saw Alice's edit, he would submit the operation
Edit B: Keep 6 : Insert "<i>" : Keep 5 : Insert "</i>"
If the server received Bob's edit after Alice's, it would need to transform B against A to become B'.
The Keep statements are unchanged through transformation, but Insert "<i>" transformed over Insert "<b>" can become either Keep 3 : Insert "<i>" or Insert "<i>" : Keep 3. Usually the server will be configured to place the later edit after the earlier one. The same applies to the closing tags, except that the extra Keep is 4, because "</b>" is four characters long:
Edit B': Keep 6 : Keep 3 : Insert "<i>" : Keep 5 : Keep 4 : Insert "</i>"
Here the problem becomes obvious. Applying A then B' to the original string gives invalid HTML, with the tags closed in the wrong order:
Hello <b><i>world</b></i>
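To make the walkthrough concrete, here is a minimal Python sketch of it. The tuple encoding of operations and the helper names `apply`, `inserts` and `transform` are my own assumptions for illustration, not Wave's actual API, and the transform only handles Keep/Insert operations with the "later edit goes after" tie-break. Adjacent Keeps are merged, so B' comes out as Keep 9 : Insert : Keep 9 : Insert.

```python
def apply(doc, op):
    """Apply a list of ("keep", n) / ("insert", text) components to doc."""
    out, pos = [], 0
    for kind, arg in op:
        if kind == "keep":
            out.append(doc[pos:pos + arg])
            pos += arg
        else:
            out.append(arg)
    out.append(doc[pos:])  # implicit trailing keep
    return "".join(out)


def inserts(op):
    """List the (offset, text) inserts of op in original-document coordinates."""
    pos, found = 0, []
    for kind, arg in op:
        if kind == "keep":
            pos += arg
        else:
            found.append((pos, arg))
    return found


def transform(b, a):
    """Rewrite b so it applies after a; on a tie, b's insert lands after a's."""
    a_ins = inserts(a)
    out, cursor = [], 0
    for pos, text in inserts(b):
        # Shift past every insert of a at or before this offset.
        shifted = pos + sum(len(t) for p, t in a_ins if p <= pos)
        out.append(("keep", shifted - cursor))
        out.append(("insert", text))
        cursor = shifted
    return out


doc = "Hello world"
edit_a = [("keep", 6), ("insert", "<b>"), ("keep", 5), ("insert", "</b>")]
edit_b = [("keep", 6), ("insert", "<i>"), ("keep", 5), ("insert", "</i>")]

result = apply(apply(doc, edit_a), transform(edit_b, edit_a))
print(result)  # Hello <b><i>world</b></i>
```

Each insert is transformed on its own, with no knowledge that the two inserts of an edit form a matched pair of tags, which is exactly why the result is mis-nested.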
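Theoretically this could be solved by varying the tie-break between inserting before and inserting after (Bob's opening tag after Alice's, but his closing tag before hers), but this would get hairy for more complicated examples, potentially requiring a full document scan for every transformation.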
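As a rough illustration of what such a scan involves, here is a small stack-based nesting check in Python. The function name `is_well_nested` is hypothetical, and the sketch assumes bare tags only: it ignores attributes, void elements, comments and everything else a real HTML parser has to handle.

```python
import re

# Matches simple tags such as <b> or </b>; attributes are not supported.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>")


def is_well_nested(html):
    """Return True if every closing tag matches the most recent open tag."""
    stack = []
    for closing, name in TAG.findall(html):
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack  # leftover open tags also count as invalid


print(is_well_nested("Hello <b><i>world</b></i>"))  # False
print(is_well_nested("Hello <b><i>world</i></b>"))  # True
```

The check is linear in the document length, so running it after every transformed operation is the kind of cost hinted at above.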
As the other answer noted, this mess can be avoided by using out-of-band annotations plus plain text. Another approach, which I've only seen so far in academic papers, is to treat the XML structure as a tree, with OT operations for node addition and deletion, e.g.:
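For the annotation approach, a minimal Python sketch might look like the following. The `(start, end, tag)` range format and the `render` function are assumptions made for illustration, not Wave's actual annotation model, and the text is not HTML-escaped. The point is that the document stays plain text, the styles live beside it, and tags are only generated at render time, so they are always properly nested.

```python
from itertools import groupby


def render(text, annotations):
    """Render text plus (start, end, tag) half-open ranges as nested HTML."""
    # Work out which tags cover each character.
    styles = [
        frozenset(tag for start, end, tag in annotations if start <= i < end)
        for i in range(len(text))
    ]
    out = []
    # Emit one run per stretch of characters sharing the same tag set.
    for tags, run in groupby(zip(styles, text), key=lambda pair: pair[0]):
        chunk = "".join(ch for _, ch in run)
        ordered = sorted(tags)  # fixed order, so nesting is deterministic
        opening = "".join(f"<{t}>" for t in ordered)
        closing = "".join(f"</{t}>" for t in reversed(ordered))
        out.append(opening + chunk + closing)
    return "".join(out)


alice = (6, 11, "b")  # bold 'world'
bob = (6, 11, "i")    # italic 'world'

print(render("Hello world", [alice, bob]))  # Hello <b><i>world</i></b>
print(render("Hello world", [bob, alice]))  # Hello <b><i>world</i></b>
```

Alice's and Bob's annotations can arrive in either order and the rendered output is the same, because neither edit touches the character stream.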
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74