Building an HTML Diff/Patch Algorithm

日久生厌 2021-02-04 03:48

What I want to accomplish:

  • Take two HTML documents as input (handling N documents is not essential).
  • Standardize the HTML format.
  • Diff the two documents.
3 Answers
  •  长发绾君心
    2021-02-04 04:30

    You could start by using Beautiful Soup (bs4) to parse both documents.

    Then you have a choice:

    • Use prettify() to render both documents as more or less standardized HTML, then diff the rendered text.
    • Compare the parse trees directly.

    The latter lets you, for example, discard elements that affect only the presentation, not the content. The former is probably easier.
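
Both options can be sketched roughly as follows. This is a minimal illustration, not a complete diff/patch tool: it assumes the bs4 package is installed, uses the standard-library difflib for the actual diffing (my choice, not something the answer prescribes), and the PRESENTATIONAL tag set and all function names are hypothetical. The second option is simplified here to comparing a flattened token stream rather than doing a true tree diff.

```python
import difflib

from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical list of presentation-only tags to ignore; adjust as needed.
PRESENTATIONAL = ["b", "i", "u", "font", "span", "center"]


def normalize(html):
    """Option 1 helper: parse the HTML and return prettified lines."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.prettify().splitlines()


def diff_prettified(old_html, new_html):
    """Option 1: line-based unified diff of the prettified documents."""
    return list(
        difflib.unified_diff(
            normalize(old_html),
            normalize(new_html),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


def content_tokens(html):
    """Option 2 helper: flatten the parse tree into tokens,
    dropping presentation-only tags but keeping their text."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list, so unwrapping while looping over it is safe.
    for tag in soup.find_all(PRESENTATIONAL):
        tag.unwrap()  # replace the tag with its children
    tokens = []
    for node in soup.descendants:
        if isinstance(node, Tag):
            tokens.append("<%s>" % node.name)
        else:
            # Split text into words so that whitespace and the way text
            # is chunked into nodes do not show up as differences.
            tokens.extend(str(node).split())
    return tokens


def diff_content(old_html, new_html):
    """Option 2: report only the token runs that differ."""
    a = content_tokens(old_html)
    b = content_tokens(new_html)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    old = "<p>Hello <b>world</b></p>"
    new = "<p>Hello world</p>"
    print("\n".join(diff_prettified(old, new)))  # the markup change shows up
    print(diff_content(old, new))  # the <b> tag is ignored here
```

With these two inputs, the prettified diff reports the removed bold tag, while the token comparison treats the documents as equivalent because only presentation changed. Which tags count as presentation-only depends on your documents, so treat the list above as a starting point.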
