Suggestions on how to build an HTML diff tool?


In this post I asked if there were any tools that compare the structure (not actual content) of 2 HTML pages. I ask because I receive HTML templates from our designers, and freq

16 Answers

轮回少年

2021-02-02 02:23

http://www.mugo.ca/Products/Dom-Diff

Works with FF 3.5. I haven't tested FF 3.6 yet.

慢半拍i

2021-02-02 02:23

If I were to do this, first I would learn HTML. (^-^) Then I would build a tool that strips out all of the actual content and saves the result as a file, so it can be piped through WinDiff (or another merge tool).
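
A minimal sketch of such a content-stripping tool, assuming Python and its standard `html.parser` module. The names `SkeletonBuilder`, `html_skeleton` and `VOID_TAGS` are made up for illustration; attributes are dropped as well as text, which may or may not be what you want.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SkeletonBuilder(HTMLParser):
    """Collects one indented line per tag and ignores all text content."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Attributes are deliberately discarded; only the tag name is kept.
        self.lines.append("  " * self.depth + f"<{tag}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(0, self.depth - 1)
        self.lines.append("  " * self.depth + f"</{tag}>")


def html_skeleton(html):
    """Return the tag structure of `html`, one tag per line, without text."""
    builder = SkeletonBuilder()
    builder.feed(html)
    builder.close()
    return "\n".join(builder.lines)
```

Writing `html_skeleton(...)` for each page to its own file gives two plain-text files that any line-based diff or merge tool can compare.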

后悔当初

2021-02-02 02:26

My suggestion is just the basic way of doing it. Of course, to tackle the issue you mentioned, additional rules must be applied. In your case, once we have a matching div element, we would then apply attribute/property matching rules and so on.

To be honest, the comparison needs many complicated rules; it's not just a matter of matching one element to another. For example, what happens if you have duplicates, e.g. one div element on one side and two div elements on the other? How are you going to decide which div elements match up?

There are a lot of other complicated issues that you will find in the comparison world. I'm speaking from experience (part of my job is to maintain my company's text comparison engine).
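
One common way to approach the duplicate-element question is sequence alignment. The sketch below is not the comparison engine mentioned above; it only assumes Python's standard `difflib.SequenceMatcher`, and `align_tags` is a made-up helper name. It reports the runs of tags that could not be paired up.

```python
from difflib import SequenceMatcher


def align_tags(left, right):
    """Align two tag sequences and return only the non-matching operations.

    Each result item is (operation, tags_from_left, tags_from_right), where
    operation is "insert", "delete" or "replace".
    """
    # autojunk=False turns off the popularity heuristic, which could otherwise
    # ignore very frequent tags (such as div) in long sequences.
    matcher = SequenceMatcher(a=left, b=right, autojunk=False)
    return [
        (op, left[i1:i2], right[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]
```

For `["div", "p"]` against `["div", "div", "p"]`, the matcher pairs the longest common run and reports a single inserted `div`. Which of the two `div` elements is "the same one" is still a heuristic decision, so real tools usually add rules on attributes or position on top of this.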

北海茫月

2021-02-02 02:27
I think some of the suggestions above don't take into account that there are other tags in the HTML between two pages which would be textually different, but the resulting HTML markup is functionally equivalent. Danimal lists control IDs as an example.

The following two pieces of markup are functionally identical, but would show up as different if you simply compared tags:
```
<div id="ctl00_TopNavHome_DivHeader" class="header4">foo</div>
<div class="header4">foo</div>
```
I was going to suggest that Danimal write an HTML translation that parses the tags and converts both docs into simplified versions, omitting ID attributes and any other attributes or tags you designate as irrelevant. This would likely have to be a work in progress, as you ignore certain attributes/tags and then run into new ones which you also want to ignore.

However, I like the idea of using the XmlSchemaInterface to boil it down to the XML schema, then use a diff tool which understands XML rules.
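
The attribute-ignoring translation described above could be sketched like this, assuming Python's standard `html.parser`. `IGNORED_ATTRS`, `AttributeFilter` and `normalize` are hypothetical names, and the ignore set is expected to grow as new irrelevant attributes turn up.

```python
from html.parser import HTMLParser

# Hypothetical ignore list; extend it as new irrelevant attributes appear.
IGNORED_ATTRS = {"id"}


class AttributeFilter(HTMLParser):
    """Re-emits tags as strings, leaving out ignored attributes and all text."""

    def __init__(self, ignored):
        super().__init__()
        self.ignored = ignored
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # Sort by attribute name so that attribute order does not matter.
        kept = sorted(
            (pair for pair in attrs if pair[0] not in self.ignored),
            key=lambda pair: pair[0],
        )
        rendered = "".join(
            f' {name}="{value}"' if value is not None else f" {name}"
            for name, value in kept
        )
        self.tokens.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def normalize(html, ignored=IGNORED_ATTRS):
    """Return the list of tags in `html` with ignored attributes removed."""
    attribute_filter = AttributeFilter(ignored)
    attribute_filter.feed(html)
    attribute_filter.close()
    return attribute_filter.tokens
```

With `id` ignored, the two `div` lines from the example above normalize to the same token list, so a diff over the normalized output would no longer flag them.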
粉色の甜心

2021-02-02 02:31
I don't know any tool but I know there is a simple way to do this:
- First, use a regular expression tool to strip off all the text in your HTML file. You can search for the text with the regular expression `(?<=^|>)[^><]+?(?=<|$)` and replace each match with an empty string (""), i.e. delete all the text. After this step, you will have only the HTML markup tags. There are a lot of free regular expression tools out there.
- Then, you repeat the first step for the original HTML file.
- Last, you use a diff tool to compare the two sets of HTML markups. This will show what is missing between one set and the other.
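
The three steps above can be sketched in Python with the standard `re` and `difflib` modules. One adaptation is needed: Python's `re` only accepts fixed-width lookbehinds, so the `(?<=^|>)` part is rewritten as `(?:^|(?<=>))`, which should match the same positions. `strip_text` and `markup_diff` are made-up names, and putting each tag on its own line is an added assumption so that a line-based diff has something to compare.

```python
import re
from difflib import unified_diff

# Text that starts at the beginning of the input or right after ">" and
# runs up to the next "<" or the end of the input.
TEXT_BETWEEN_TAGS = re.compile(r"(?:^|(?<=>))[^<>]+(?=<|$)")


def strip_text(html):
    """Steps 1 and 2: delete all text, leaving only the markup tags."""
    return TEXT_BETWEEN_TAGS.sub("", html)


def markup_diff(old_html, new_html):
    """Step 3: diff the two tag-only documents, one tag per line."""
    old_lines = strip_text(old_html).replace("><", ">\n<").splitlines()
    new_lines = strip_text(new_html).replace("><", ">\n<").splitlines()
    return list(unified_diff(old_lines, new_lines, "old", "new", lineterm=""))
```

An empty list from `markup_diff` means the two pages have the same tags in the same order; otherwise the returned lines are in the usual unified diff format. Like any regex approach, this can be confused by `<` or `>` characters inside scripts, comments or attribute values.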
傲寒

2021-02-02 02:32

Open each page in the browser and save it as an .htm file. Then compare the two files using WinDiff.
