Where can I find a good MediaWiki Markup parser in PHP?

Submitted by 魔方 西西 on 2020-01-13 10:11:25

Question


I would try hacking MediaWiki's code a little, but I figure that would be unnecessary if I can get an independent parser.

Can anyone help me with this?

Thanks.


Answer 1:


Ben Hughes is right. It's very difficult to get right, especially if you want to parse real articles from big wikis like Wikipedia itself with 100% accuracy. It is discussed frequently on the wikitech mailing list, and no alternative parser has come up with the goods despite many attempts.

Firstly, MediaWiki's parser is not really a parser, in that it has no concept of an AST (abstract syntax tree). It's a converter that specifically converts wikitext to HTML.

Secondly, don't fall into the trap of thinking of wikitext as a markup language that can occasionally be extended with HTML. You must think of it as an extension to HTML. It is much easier to add wikitext support to an HTML parser than to add HTML support to a wikitext parser.

What this boils down to is that if you want any other format you will need to convert from HTML to that format.
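
As a rough illustration of that second step, here is a minimal Python sketch that turns HTML into plain text using only the standard library's `html.parser`. It assumes you already have rendered HTML in hand. The sample string, the `TextExtractor` class, and the choice of block-level tags are made up for the example and are not taken from MediaWiki's actual output.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, starting a new line at each block-level tag."""

    # Assumed set of block-level tags; extend as needed for real output.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br", "div"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Trim each line and drop the empty ones left by the tag breaks.
        lines = (line.strip() for line in "".join(self.parts).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


# Hypothetical sample of rendered output, not real MediaWiki HTML.
sample_html = (
    "<h2>History</h2>"
    "<p>PHP is a <b>scripting</b> language.</p>"
    "<ul><li>One</li><li>Two</li></ul>"
)
print(html_to_text(sample_html))
```

The same pattern (walk the HTML, emit the target format) applies to any other output format; only the handlers change.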

The stated position is basically that only MediaWiki can parse wikitext. And yes, the parser is tightly integrated with the rest of the code. Experienced MediaWiki hackers do not react well to questions about isolating the parser - I've tried (-:

But I've also gone ahead and isolated it anyway. It's not complete or ready to share with anybody yet. But basically you want to start with a copy of the MediaWiki source that is not installed or connected to a database or web server. Write a PHP stub program that includes the parser and calls an entry point. Check the error when it fails to run, and make a phony stub for whatever class, function, or global was accessed. Repeat until you have stubbed most of the places where the parser interacts with the rest of MediaWiki.
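
The run, fail, stub, repeat loop can be sketched in a language-neutral way. The snippet below is a Python analogy only, not my PHP code and not anything from MediaWiki: it runs a piece of source, catches the "name is not defined" error, installs a do-nothing stand-in for the missing name, and tries again. The `Stub` class, `discover_stubs`, and the names inside `legacy_code` are all hypothetical.

```python
import re


class Stub:
    """Phony stand-in: any call or attribute access just yields another Stub."""

    def __init__(self, name):
        self.name = name

    def __call__(self, *args, **kwargs):
        return Stub(f"{self.name}()")

    def __getattr__(self, attr):
        # Only reached for attributes that are not set on the instance.
        return Stub(f"{self.name}.{attr}")


def discover_stubs(source, max_rounds=50):
    """Run `source` repeatedly, stubbing one missing global name per round."""
    namespace = {}
    stubbed = []
    for _ in range(max_rounds):
        try:
            exec(source, namespace)
        except NameError as err:
            match = re.search(r"name '(\w+)' is not defined", str(err))
            if match is None:
                raise
            missing = match.group(1)
            namespace[missing] = Stub(missing)
            stubbed.append(missing)
        else:
            return stubbed  # ran to completion; report what had to be faked
    raise RuntimeError("gave up: still failing after max_rounds")


# Hypothetical code under isolation; none of these names are defined anywhere.
legacy_code = """
config = load_settings()
db = Database.connect(config)
result = render('hello')
"""

print(discover_stubs(legacy_code))  # ['load_settings', 'Database', 'render']
```

The list it returns is the useful part: it is the inventory of outside dependencies you would then replace with hand-written stubs that return something sensible.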

The problem then is keeping your hacked, stubbed variant in sync: the source tree changes quickly, the live wikis adopt parser changes very quickly, and your variant will have to keep up if it is to keep working in the future.

Check out my feature request: Bug 25984 - Isolate parser from database dependencies




Answer 2:


It's actually an incredibly difficult format to parse. You can try to separate out the parser component from MediaWiki (as it is also written in PHP), but it is a tangled mess. I've seen a few partial standalone ones that do a nearly reasonable job for a very limited subset of the markup.
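
To give a feel for what "a very limited subset" means, here is a toy Python sketch that handles only `== headings ==`, `'''bold'''`, and `''italic''`, one line at a time. It is not one of the standalone parsers mentioned above, and the function name is made up. It ignores templates, links, tables, lists, nested or unbalanced quotes, and embedded HTML, which is exactly where the real difficulty lies.

```python
import html
import re


def wikitext_subset_to_html(text):
    """Convert a tiny subset of wikitext to HTML, line by line."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Escape &, <, > but leave apostrophes alone: they are markup here.
        line = html.escape(line, quote=False)

        # == Heading == : the number of '=' signs gives the heading level.
        heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", line.strip())
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
            continue

        # Bold must be handled before italic, since ''' contains ''.
        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        out.append(f"<p>{line}</p>")
    return "\n".join(out)


sample = "== History ==\n'''PHP''' is a ''scripting'' language."
print(wikitext_subset_to_html(sample))
```

Anything beyond this, such as a template call or a table, passes through as literal text, which is why converters of this kind fall over on real articles.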

If you happen to implement one, or refactor the current Wikipedia one, let me know, as it could be quite useful.



Source: https://stackoverflow.com/questions/1029012/where-can-i-find-a-good-mediawiki-markup-parser-in-php
