Suggestions on how to build an HTML diff tool?

In this post I asked if there were any tools that compare the structure (not actual content) of 2 HTML pages. I ask because I receive HTML templates from our designers, and freq

16 answers

南旧

2021-02-02 02:14
This has been an excellent start. A few more clarifications/comments:
- I probably don't care about IDs, since .NET will mangle them.
- Some of the structure will be in a repeater or other such control, so I might end up with more or fewer repeating elements.
Further thought: I think a good start would be to assume the HTML is XHTML-compliant. I could then infer the schema (using the new .NET XmlSchemaInference methods), then diff the schemata. I can then look at the differences and consider whether or not they're significant.
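A rough way to try the "infer a schema, then diff" idea without the .NET tooling: the Python sketch below is a stand-in, not the XmlSchemaInference API. It assumes the input is well-formed XHTML with no named entities such as `&nbsp;` (the standard-library parser rejects those), and the function names are made up for illustration. It reduces each page to the set of (parent tag, child tag) pairs, so attributes like IDs are ignored and repeated elements collapse into a single entry.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix such as the XHTML namespace."""
    return tag.rsplit("}", 1)[-1]


def structure_signature(xhtml: str) -> set:
    """Collect every (parent tag, child tag) pair that occurs in the document.

    Attributes and text are never looked at, and because the result is a set,
    a repeater that emits one or fifty items yields the same signature.
    """
    root = ET.fromstring(xhtml)
    pairs = set()
    for parent in root.iter():
        for child in parent:
            pairs.add((local_name(parent.tag), local_name(child.tag)))
    return pairs


# Hypothetical sample pages: the designer's template and the rendered page.
template = '<html><body><div id="list"><span>a</span></div></body></html>'
rendered = (
    '<html><body><div id="ctl00_list"><span>a</span><span>b</span></div>'
    "<p>extra</p></body></html>"
)

added = structure_signature(rendered) - structure_signature(template)
removed = structure_signature(template) - structure_signature(rendered)
print("only in rendered:", sorted(added))
print("only in template:", sorted(removed))
```

The mangled ID and the extra repeated span do not show up as differences; only the new parent/child relationship does. Whether that is significant is still a judgment call.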

一整个雨季

2021-02-02 02:16

Run both files through the following Perl script, then use diff -iw to do a case-insensitive, whitespace-ignoring diff.

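#!/usr/bin/perl
use strict;
use warnings;

# Slurp all of STDIN at once instead of reading line by line.
undef $/;
my $html = <STDIN>;

# Print one line per tag and a "(text)" placeholder per run of text.
while ($html =~ /\S/) {
  if ($html =~ s/^\s*<//) {
    # /s lets a tag span several lines.
    $html =~ s/^(.*?)>//s or die "malformed HTML";
    # Collapse whitespace so each tag stays on one output line.
    (my $tag = $1) =~ s/\s+/ /g;
    print "<$tag>\n";
  } else {
    $html =~ s/^([^<]+)//;
    print "(text)\n";
  }
}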
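
For anyone without Perl and diff at hand, the same two steps can be sketched in Python with only the standard library. This is not a port of the script above: it drops attributes entirely (the Perl version keeps them inside the printed tag), and the names `TagLines`, `tag_lines` and `structure_diff` are made up for this example.

```python
import difflib
from html.parser import HTMLParser


class TagLines(HTMLParser):
    """Collect one line per tag and a "(text)" placeholder per text run."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names; attributes are ignored.
        self.lines.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.lines.append(f"</{tag}>")

    def handle_data(self, data):
        # Skip whitespace-only runs and merge adjacent text chunks.
        if data.strip() and self.lines[-1:] != ["(text)"]:
            self.lines.append("(text)")


def tag_lines(html: str) -> list:
    parser = TagLines()
    parser.feed(html)
    parser.close()  # flush any text buffered after the last tag
    return parser.lines


def structure_diff(html_a: str, html_b: str) -> list:
    """Unified diff of the two tag skeletons; an empty list means no difference."""
    return list(
        difflib.unified_diff(
            tag_lines(html_a), tag_lines(html_b),
            fromfile="a", tofile="b", lineterm="",
        )
    )


if __name__ == "__main__":
    old = "<DIV><p>Hello</p></DIV>"
    new = "<div>\n  <span>Other text</span>\n</div>"
    print("\n".join(structure_diff(old, new)))
```

Case and whitespace differences disappear during parsing, so no equivalent of the -iw flags is needed on the diff side.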

遥遥无期

2021-02-02 02:18

Take a look at Beyond Compare. It has an XML comparison feature that can help you out.

梦如初夏

2021-02-02 02:19

I would use (or contribute to) html5lib and its SAX output. Just zip through the 2 SAX streams looking for mismatches and highlight the whole corresponding subtree.
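
A minimal sketch of the "zip through two streams" idea, with one loud assumption: it uses Python's built-in `html.parser` as a stand-in event source rather than html5lib's SAX output, so its error recovery on broken markup will differ. The helper names are hypothetical. It walks both event streams in lockstep and reports the first mismatch together with the path of open elements, which identifies the subtree to highlight.

```python
from html.parser import HTMLParser
from itertools import zip_longest


class EventCollector(HTMLParser):
    """Record ("start", tag) and ("end", tag) events; text is ignored."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def events(html: str) -> list:
    collector = EventCollector()
    collector.feed(html)
    collector.close()
    return collector.events


def first_mismatch(html_a: str, html_b: str):
    """Return (path, event_a, event_b) for the first differing event, else None.

    An event is None when one stream ends before the other.
    """
    path = []  # tags currently open at the point of comparison
    for ev_a, ev_b in zip_longest(events(html_a), events(html_b)):
        if ev_a != ev_b:
            return "/".join(path), ev_a, ev_b
        kind, tag = ev_a
        if kind == "start":
            path.append(tag)
        elif tag in path:
            # Pop back to the matching open tag; this also discards
            # void elements such as <br> that never get an end tag.
            while path.pop() != tag:
                pass
    return None


if __name__ == "__main__":
    print(first_mismatch("<div><ul><li>a</li></ul></div>",
                         "<div><ul><p>a</p></ul></div>"))
```

This stops at the first difference. Resynchronising the two streams afterwards, so that later mismatches can be reported too, is the harder part and is not attempted here.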

傲寒

2021-02-02 02:22

@Mike - that would compare everything, including the content of the page, which isn't what the original poster wanted.

Assuming that you have access to the browser's DOM (by writing a Firefox/IE plugin or whatever), I would probably put all of the HTML elements into a tree, then compare the two trees. If the tag name is different, then the node is different. You might want to stop enumerating at a certain point (you probably don't care about span, bold, italic, etc. - maybe only worry about divs?), since some tags are really the content, rather than the structure, of the page.
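
Outside a browser, the same tree comparison can be approximated offline. The sketch below is only an illustration under stated assumptions: it parses the markup with Python's `html.parser` instead of reading a live DOM, and `STRUCTURAL_TAGS` is a made-up whitelist of tags treated as structure; everything else (span, b, i and so on) is skipped as content.

```python
from html.parser import HTMLParser

# Assumption: which tags count as "structure" is a per-project choice.
STRUCTURAL_TAGS = {"html", "body", "div", "table", "tr", "td", "ul", "ol", "li", "form"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []


class StructureTreeBuilder(HTMLParser):
    """Build a tree that contains only the whitelisted structural tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL_TAGS:
            node = Node(tag)
            self.stack[-1].children.append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in STRUCTURAL_TAGS:
            # Close the nearest open element with this tag, if there is one.
            for i in range(len(self.stack) - 1, 0, -1):
                if self.stack[i].tag == tag:
                    del self.stack[i:]
                    break


def build_tree(html: str) -> Node:
    builder = StructureTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tree_diff(a: Node, b: Node, path: str = "") -> list:
    """List the differences between two trees; an empty list means they match."""
    if a.tag != b.tag:
        return [f"{path}: <{a.tag}> vs <{b.tag}>"]
    here = f"{path}/{a.tag}"
    diffs = []
    if len(a.children) != len(b.children):
        diffs.append(f"{here}: {len(a.children)} vs {len(b.children)} children")
    for child_a, child_b in zip(a.children, b.children):
        diffs.extend(tree_diff(child_a, child_b, here))
    return diffs


if __name__ == "__main__":
    page_a = "<div><span>x</span><div><b>y</b></div></div>"
    page_b = "<div><ul><li>1</li></ul></div>"
    print(tree_diff(build_tree(page_a), build_tree(page_b)))
```

Children are compared by position, so one inserted element makes every later sibling look different. A real tool would probably want a proper tree-matching step here.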

天涯浪人

2021-02-02 02:22

Pretty Diff can do this. It compares only the code structure, regardless of differences in whitespace, comments, or even content. Just be sure to check the option "Normalize Content and String Literals".

http://prettydiff.com/

