Suggestions on how to build an HTML diff tool?

灰色年华 2021-02-02 02:00

In this post I asked if there were any tools that compare the structure (not actual content) of 2 HTML pages. I ask because I receive HTML templates from our designers, and freq

16 answers
  • 2021-02-02 02:14

    This has been an excellent start. A few more clarifications/comments:

    • I probably don't care about IDs, since .NET will mangle them.
    • Some of the structure will be in a repeater or other such control, so I might end up with more or fewer repeating elements.

    Further thought: I think a good start would be to assume the HTML is XHTML-compliant. I could then infer a schema for each page (using the new .NET XmlSchemaInference class), then diff the two schemas. I can then look at the differences and decide whether or not they're significant.
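A rough Python sketch of the "infer, then diff" idea, for illustration only. It does not use XmlSchemaInference; as a stand-in it reduces each page to the set of distinct tag paths it contains, which ignores IDs and is unaffected by how many times a repeater emits the same element. The names `PathCollector`, `tag_paths`, and `diff_paths` are made up for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the set of distinct tag paths, e.g. 'div/ul/li'."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        # Attributes (including id) are ignored on purpose.
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def diff_paths(old_html, new_html):
    """Returns (paths only in old, paths only in new)."""
    old, new = tag_paths(old_html), tag_paths(new_html)
    return sorted(old - new), sorted(new - old)


template = "<div id='a'><ul><li>x</li></ul></div>"
rendered = "<div id='ctl00_a'><ul><li>1</li><li>2</li><li>3</li></ul></div>"
print(diff_paths(template, rendered))
```

Because the paths are stored in a set, three `<li>` elements and one `<li>` produce the same result, so the two pages above compare as structurally equal. The trade-off is that a change in element order or count is invisible to this check.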

  • 2021-02-02 02:16

    Run both files through the following Perl script, then use diff -iw to do a case-insensitive, whitespace-ignoring diff.

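    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    # Slurp all of STDIN at once instead of reading it line by line.
    undef $/;
    
    my $html = <STDIN>;
    
    while ($html =~ /\S/) {
      if ($html =~ s/^\s*<//) {
        # Consume up to the closing '>'; /s lets a tag span several lines.
        $html =~ s/^(.*?)>//s or die "malformed HTML";
        # Collapse whitespace so each tag prints on exactly one line.
        (my $tag = $1) =~ s/\s+/ /g;
        print "<$tag>\n";
      } else {
        # Replace any run of text content with a single placeholder line.
        $html =~ s/^([^<]+)//;
        print "(text)\n";
      }
    }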
    
  • 2021-02-02 02:18

    Take a look at Beyond Compare. It has an XML comparison feature that can help you out.

  • 2021-02-02 02:19

    I would use (or contribute to) html5lib and its SAX output. Just zip through the two SAX streams looking for mismatches, and highlight the whole corresponding subtree wherever they diverge.
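A minimal sketch of the "zip through two event streams" idea. To stay dependency-free it uses Python's standard `html.parser` as a stand-in for html5lib's SAX output, so the event tuples here are an assumption and not html5lib's actual format. It only reports the first mismatch and does not do the subtree highlighting; `EventRecorder`, `tag_events`, and `first_mismatch` are hypothetical names.

```python
from html.parser import HTMLParser
from itertools import zip_longest


class EventRecorder(HTMLParser):
    """Records start/end tag events and drops text, comments, and attributes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html):
    recorder = EventRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.events


def first_mismatch(html_a, html_b):
    """Returns (index, event_a, event_b) for the first differing event, or None."""
    pairs = zip_longest(tag_events(html_a), tag_events(html_b))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return index, a, b
    return None


print(first_mismatch("<div><p>x</p></div>", "<div><span>x</span></div>"))
```

When one stream ends before the other, `zip_longest` pads the shorter one with `None`, so a missing trailing element still shows up as a mismatch.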

  • 2021-02-02 02:22

    @Mike - that would compare everything, including the content of the page, which isn't what the original poster wanted.

    Assuming that you have access to the browser's DOM (by writing a Firefox/IE plugin or whatever), I would probably put all of the HTML elements into a tree, then compare the two trees. If the tag name is different, then the node is different. You might want to stop enumerating at a certain point (you probably don't care about span, bold, italic, etc. - maybe only worry about divs?), since some tags are really the content, rather than the structure, of the page.
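A small sketch of the tree comparison described above, written in Python with the standard `html.parser` instead of a browser DOM (that substitution is an assumption made for the sake of a runnable example). The `STRUCTURAL` set is a guess at which tags count as structure; anything outside it, such as span or b, is skipped. `Node`, `TreeBuilder`, `build_tree`, and `compare` are hypothetical names.

```python
from html.parser import HTMLParser
from itertools import zip_longest

# Assumption: which tags count as "structure" is a judgement call.
STRUCTURAL = {"html", "body", "div", "table", "tr", "td", "ul", "ol", "li", "form"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree that contains only the structural tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            node = Node(tag)
            self.stack[-1].children.append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in STRUCTURAL and any(n.tag == tag for n in self.stack[1:]):
            while self.stack.pop().tag != tag:
                pass


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def compare(a, b, path="/"):
    """Returns a list of differences between two trees, comparing children by position."""
    diffs = []
    for i, (x, y) in enumerate(zip_longest(a.children, b.children)):
        if x is None:
            diffs.append(f"{path}: child {i} <{y.tag}> only in second page")
        elif y is None:
            diffs.append(f"{path}: child {i} <{x.tag}> only in first page")
        elif x.tag != y.tag:
            diffs.append(f"{path}: child {i} is <{x.tag}> vs <{y.tag}>")
        else:
            diffs.extend(compare(x, y, f"{path}{x.tag}[{i}]/"))
    return diffs


page_a = "<div><span>hi</span><ul><li>a</li></ul></div>"
page_b = "<div><table><tr><td>x</td></tr></table></div>"
print(compare(build_tree(page_a), build_tree(page_b)))
```

Children are matched purely by position, so a repeater that emits a different number of rows would be reported as extra or missing children. Collapsing runs of identical siblings before comparing would be one way to avoid that.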

  • 2021-02-02 02:22

    Pretty Diff can do this. It compares only the code structure, regardless of differences in whitespace, comments, or even content. Just be sure to check the option "Normalize Content and String Literals".

    http://prettydiff.com/
