Does operational transformation work on structured documents such as HTML if simply treated as plain text?

温柔的废话 2021-02-04 03:18

The FAQ of the Google Wave Protocol says that [HTML] "does not have desirable properties" and that "HTML makes OT (Operational Transforms) difficult if not impossible" [1]. Why?

3 Answers
  • 2021-02-04 03:38

    I'm assuming here that you understand the basics of OT. The principal problem with doing OT on HTML as plain text is merging HTML tags. As a simple example, say we had the following document:

    Hello world
    

    Alice then decides that 'world' should be in bold:

    Hello <b>world</b>
    

    This can be represented with a double insert operation in OT, schematically:

    Edit A: Keep 6 : Insert "<b>" : Keep 5 : Insert "</b>"
    

    If Bob decided that 'world' should be italic before he saw Alice's edit, he would add the operation

    Edit B: Keep 6 : Insert "<i>" : Keep 5 : Insert "</i>"
    

    If the server received Bob's edit after Alice's, it would need to transform B against A to become B'.

    The Keep statements are unchanged through transformation, but Insert "<i>" transformed over Insert "<b>" can become either Keep 3 : Insert "<i>" or Insert "<i>" : Keep 3. Usually the server will be configured to place the later edit after the first edit, and the same applies to the closing tags, where the Keep is 4 characters long to skip "</b>".

    Edit B': Keep 6 : Keep 3 : Insert "<i>" : Keep 5 : Keep 4 : Insert "</i>"
    

    Here the problem becomes obvious. Applying A and then B' to the original string gives incorrectly nested, invalid HTML:

    Hello <b><i>world</b></i>
    

    In theory this could be solved by varying whether a transformed insert is placed before or after the concurrent one, but that gets hairy for more complicated examples and could require a full document scan for every transformation.
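
The walkthrough above can be reproduced with a minimal sketch. The names `apply`, `inserts`, and `transform` are hypothetical helpers written for this illustration, not the API of any OT library, and the transform only handles insert-only operations with the tie-break assumed above (the later edit goes after the first).

```python
# Sketch only: an operation is a list of ("keep", n) / ("insert", text) parts.

def apply(doc, op):
    """Apply an operation to a string; any unconsumed tail is kept."""
    out, pos = [], 0
    for kind, arg in op:
        if kind == "keep":
            out.append(doc[pos:pos + arg])
            pos += arg
        else:  # "insert"
            out.append(arg)
    out.append(doc[pos:])
    return "".join(out)


def inserts(op):
    """Return the (position, text) pairs of an op in input coordinates."""
    pos, found = 0, []
    for kind, arg in op:
        if kind == "keep":
            pos += arg
        else:
            found.append((pos, arg))
    return found


def transform(b, a):
    """Rewrite insert-only op b so it applies after a.

    Assumed tie-break: when both insert at the same position,
    b's text is placed after a's text.
    """
    a_inserts = inserts(a)
    result, cursor = [], 0
    for pos, text in inserts(b):
        # Everything a inserted at or before this position shifts b right.
        new_pos = pos + sum(len(t) for p, t in a_inserts if p <= pos)
        if new_pos > cursor:
            result.append(("keep", new_pos - cursor))
        result.append(("insert", text))
        cursor = new_pos
    return result


edit_a = [("keep", 6), ("insert", "<b>"), ("keep", 5), ("insert", "</b>")]
edit_b = [("keep", 6), ("insert", "<i>"), ("keep", 5), ("insert", "</i>")]

after_a = apply("Hello world", edit_a)
merged = apply(after_a, transform(edit_b, edit_a))
print(merged)  # Hello <b><i>world</b></i>
```

In this sketch consecutive keeps come out merged, so the transformed edit is Keep 9 : Insert "<i>" : Keep 9 : Insert "</i>". Each step is a valid text transformation, yet the merged string is badly nested, which is the point: a plain-text transform has no notion of tag pairing.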

    As the other answer noted, this mess can be avoided by using out-of-band annotations plus plain text. Another approach, which I have only seen in academic papers so far, is to treat the XML structure as a tree with OT operations for node addition and deletion, e.g.:

    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74

  • 2021-02-04 03:52

    I don't have a complete answer but I'm interested in seeing more work done on making existing open source operational transformation libraries work with rich text, so I'll contribute what I know.

    The important difference between HTML and the Wave schema seems to be the way text formatting is marked up: a hierarchy of nested tags for HTML vs. out-of-band annotations (in the footer of the document) with ranges for Wave XML. Out-of-band annotations are probably a more natural way to mark up text formatting, since they allow overlapping (non-nested) formats. They allow something like this (in pseudo-markup), which would not be valid XML in the nested representation:

    (b) This is bold (i) while this range is both bold and italic (/b) and this last bit is just italic (/i)
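
As a rough illustration of the idea, the formatting can be kept as ranges next to the plain text and turned into nested tags only at render time. The `(start, end, tag)` tuple layout and the `render` function below are assumptions made for this sketch; they are not the actual Wave XML schema.

```python
def render(text, annotations):
    """Render text plus (start, end, tag) ranges as well-nested HTML.

    Ranges may overlap freely; they are assumed to lie within the text.
    """
    # Every annotation boundary starts a new segment of the text.
    cuts = sorted({0, len(text)} | {p for s, e, _ in annotations for p in (s, e)})
    parts = []
    for start, end in zip(cuts, cuts[1:]):
        # Tags whose range covers this whole segment, in a fixed order.
        tags = sorted(t for s, e, t in annotations if s <= start and end <= e)
        opening = "".join(f"<{t}>" for t in tags)
        closing = "".join(f"</{t}>" for t in reversed(tags))
        parts.append(opening + text[start:end] + closing)
    return "".join(parts)


text = "bold both italic"
overlapping = [(0, 9, "b"), (5, 16, "i")]  # bold and italic overlap on "both"
print(render(text, overlapping))
# <b>bold </b><b><i>both</i></b><i> italic</i>
```

Because the tags are generated per segment, the output is always properly nested, and two users formatting the same word concurrently simply contribute two ranges, for example `render("Hello world", [(6, 11, "b"), (6, 11, "i")])`.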

    Related, here is the relevant issue in the ShareJS project. Perhaps they can implement rich text support by adopting part of the Wave XML schema.

  • 2021-02-04 03:55

    There are approaches in OT that support SGML (a superset of XML), although none of them has been implemented, so it is not impossible. That said, I agree that OT is not the best approach for XML, because OT was designed for linear data structures. HTML/XML is much more complex: it is built like a tree, and its elements have attributes. The tree structure is solvable, but the attributes, which are realized as an ordered associative array, are not, simply because OT does not support associative arrays (at the moment). The approach above actually recommends treating the attributes as a string, e.g. "id='myid' value='mystuff'". But the syntax of this attribute string is easy to break: if one user deletes all attributes while another inserts a " character directly after "mystuff", the result can be a div tag that looks like <div ">, which is not valid syntax.
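
The attribute example can be traced with a small sketch. The helper `transform_insert_against_delete` is hypothetical and uses one common convention, assumed here: an insert whose position falls inside a concurrently deleted range collapses to the start of that range.

```python
def transform_insert_against_delete(ins_pos, del_start, del_end):
    """Map an insert position across a concurrent delete of [del_start, del_end)."""
    if ins_pos <= del_start:
        return ins_pos                            # before the deleted range
    if ins_pos >= del_end:
        return ins_pos - (del_end - del_start)    # after it: shift left
    return del_start                              # inside it: collapse to the gap


doc = "<div id='myid' value='mystuff'>"
attrs = "id='myid' value='mystuff'"

# User 1 deletes the whole attribute string.
del_start = doc.index(attrs)
del_end = del_start + len(attrs)
after_delete = doc[:del_start] + doc[del_end:]

# User 2 concurrently inserts a double quote directly after "mystuff".
ins_pos = doc.index("mystuff") + len("mystuff")
new_pos = transform_insert_against_delete(ins_pos, del_start, del_end)

merged = after_delete[:new_pos] + '"' + after_delete[new_pos:]
print(merged)  # <div ">
```

Both edits were sensible in isolation; only the merged result is syntactically broken, and a string-level transform has no way to notice.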

    Maybe this interests you:

    CEFX is a project that aimed to support XML. It is dead to my knowledge, but it uses an OT approach. For some reason it cannot edit strings, only XML elements.

    Google's Drive SDK supports graph-like data structures. It is, however, proprietary and nobody knows how it works.

    I am developing a framework that supports arbitrary data structures. Currently, Text, Json, XML, and HTML are supported. It has a different approach: check it out: Yatta!

    BTW: What the Wave protocol and Eric Drechsel describe is known as annotations in OT. It is commonly used to support rich text.
