Suggestions on how to build an HTML diff tool?

灰色年华 2021-02-02 02:00

In this post I asked if there were any tools that compare the structure (not actual content) of 2 HTML pages. I ask because I receive HTML templates from our designers, and freq

16 answers
  • 2021-02-02 02:36

    The DOM is a data structure - it's a tree.

  • 2021-02-02 02:36

    See http://www.semdesigns.com/Products/SmartDifferencer/index.html for a tool that is parameterized by a language grammar and produces deltas in terms of language elements (identifiers, expressions, statements, blocks, methods, ...) that were inserted, deleted, moved, replaced, or had identifiers consistently substituted across them. The tool ignores whitespace reformatting (e.g., different line breaks or layouts) and semantically indistinguishable values (e.g., it knows that 0x0F and 15 are the same value). It can be applied to HTML by using an HTML parser.

    EDIT: 9/12/2009. We've built an experimental SmartDiff tool using an HTML editor.

  • 2021-02-02 02:37

    If I were to tackle this issue, I would do this:

    1. Plan some kind of DOM for HTML pages. Start lightweight and add more as needed. I would use the composite pattern for the data structure, i.e. every element has a children collection of the base class type.
    2. Create a parser to parse HTML pages.
    3. Use the parser to load the HTML elements into the DOM.
    4. Once the pages have been loaded into the DOM, you have a hierarchical snapshot of each page's structure.
    5. Iterate through every element on both sides until you reach the end of the DOM. You have found a structural difference whenever you hit a mismatch in element type.

    In your example, one side would have only a div element object loaded, while the other side would have a div element object with one child element of type paragraph. Fire up your iterator: on the first step you match the div elements, and on the second step you match the paragraph against nothing. That is your structural difference.
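
    The steps above can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library's html.parser, not a finished tool. The names Node, TreeBuilder, parse and structural_diff are made up for this example, and children are compared strictly by position, which is an assumption of the sketch rather than something the answer prescribes.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Composite node: every element holds a list of child Nodes."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Loads elements into a lightweight tree; text content is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def structural_diff(a, b, path="#document"):
    """Walk both trees in step and report (path, left_tag, right_tag) mismatches."""
    diffs = []
    for i in range(max(len(a.children), len(b.children))):
        left = a.children[i] if i < len(a.children) else None
        right = b.children[i] if i < len(b.children) else None
        left_tag = left.tag if left else None
        right_tag = right.tag if right else None
        where = f"{path}/{left_tag or right_tag}[{i}]"
        if left_tag != right_tag:
            diffs.append((where, left_tag, right_tag))
        else:
            diffs.extend(structural_diff(left, right, where))
    return diffs


print(structural_diff(parse("<div></div>"), parse("<div><p>text</p></div>")))
```

    For the div-versus-div-with-paragraph case this reports a single mismatch at the paragraph, with nothing on the left side. Because the comparison is positional, one inserted sibling makes every later sibling at that level look different; a tree edit distance or a sequence alignment over the children would handle that better.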

  • 2021-02-02 02:38

    You may also have to consider that the 'content' itself could contain additional mark-up so it's probably worth stripping out everything within certain elements (like <div>s with certain IDs or classes) before you do your comparison. For example:

    <div id="mainContent">
    <p>lorem ipsum etc..</p>
    </div>
    

    and

    <div id="mainContent">
    <p>Here is some real content<img class="someImage" src="someImage.jpg" /></p>
    <ul>
    <li>and</li>
    <li>some</li>
    <li>more..</li>
    </ul>
    </div>
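
    One way to do that stripping is sketched below in Python with the standard library's html.parser. It is only an illustration: the Skeleton class, the skeleton helper and the default content_ids value are hypothetical names chosen for this example. The idea is to record each element's depth and tag while ignoring everything nested inside the designated content containers, so the two snippets above reduce to the same outline.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Skeleton(HTMLParser):
    """Records (depth, tag) per element, skipping descendants of content containers."""

    def __init__(self, content_ids):
        super().__init__()
        self.content_ids = set(content_ids)
        self.open_tags = []    # stack of currently open tags
        self.skip_from = None  # stack depth at which the skipped container opened
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if self.skip_from is None:
            self.outline.append((len(self.open_tags), tag))
            if tag not in VOID and dict(attrs).get("id") in self.content_ids:
                self.skip_from = len(self.open_tags)
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag, if there is one.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break
        # Stop skipping once the content container itself has been closed.
        if self.skip_from is not None and len(self.open_tags) <= self.skip_from:
            self.skip_from = None


def skeleton(html, content_ids=("mainContent",)):
    parser = Skeleton(content_ids)
    parser.feed(html)
    parser.close()
    return parser.outline


template = '<div id="mainContent">\n<p>lorem ipsum etc..</p>\n</div>'
live = ('<div id="mainContent">\n<p>Here is some real content'
        '<img class="someImage" src="someImage.jpg" /></p>\n'
        '<ul>\n<li>and</li>\n<li>some</li>\n<li>more..</li>\n</ul>\n</div>')
print(skeleton(template) == skeleton(live))
```

    Matching on class names instead of IDs would work the same way; only the attribute check in handle_starttag changes.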
    