Question
Any ideas for a nice parser with an easy-to-use, configurable API? I'm looking to feed it data such as http://wikitravel.org/wiki/en/api.php?format=xml&action=parse&prop=wikitext&page=San%20Francisco, choose the sections of data I want, and output custom HTML for each unique type of element. Java would be preferred, but a PHP/JS solution that is compatible with most (99%+) wikitext would be okay as well.
Answer 1:
Sweble is probably the best Java parser of wikitext. It claims to be 100% compliant with wikitext, but I seriously doubt that. It parses wikitext into an abstract syntax tree that you then have to do something with (like convert it to HTML).
There is a page on mediawiki.org that lists wikitext parsers in various programming languages. I don't think any of them handle 99%+ of wikitext, though. In general, parsing wikitext is a really complex problem: wikitext isn't even formally defined anywhere outside of the MediaWiki parser itself.
Answer 2:
This question was answered years ago, but I wanted to save future visitors the effort I had to take to figure out how to use Sweble.
You can try the documentation at their site, but I couldn't figure it out. Just look at the example source code. Download the source jar for swc-example-basic at https://repo1.maven.org/maven2/org/sweble/wikitext/swc-example-basic/2.0.0/swc-example-basic-2.0.0-sources.jar and look at App.java and TextConverter.java.
Basically, to parse a page and convert it to another form, you first add the following dependency to your project:
<dependency>
    <groupId>org.sweble.wikitext</groupId>
    <artifactId>swc-engine</artifactId>
    <version>2.0.0</version>
</dependency>
Then, do the following:
public String convertWikiText(String title, String wikiText, int maxLineLength) throws LinkTargetException, EngineException {
    // Set-up a simple wiki configuration
    WikiConfig config = DefaultConfigEnWp.generate();
    // Instantiate a compiler for wiki pages
    WtEngineImpl engine = new WtEngineImpl(config);
    // Retrieve a page
    PageTitle pageTitle = PageTitle.make(config, title);
    PageId pageId = new PageId(pageTitle, -1);
    // Compile the retrieved page
    EngProcessedPage cp = engine.postprocess(pageId, wikiText, null);
    TextConverter p = new TextConverter(config, maxLineLength);
    return (String) p.go(cp.getPage());
}
The TextConverter is a class you'll find in the examples I mentioned above. Customize it to do whatever you want. For example, the following makes sure all bold text is surrounded by "**":
public void visit(WtBold b)
{
    write("**");
    iterate(b);
    write("**");
}
There are a bunch of visit methods on that class for each type of element that you'll encounter.
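To illustrate the general idea of "one visit method per element type" without depending on Sweble, here is a minimal, self-contained sketch that walks a toy syntax tree and emits custom HTML for each node type. Every name in it (ToyAstToHtml, Text, Bold, Section, HtmlVisitor, render) is made up for illustration and is not part of Sweble's API. It also uses classic accept/visit double dispatch, which may differ from how Sweble dispatches its visit methods internally.

```java
import java.util.Arrays;
import java.util.List;

/** Toy AST plus visitor. All names are hypothetical; this is NOT Sweble's API. */
public class ToyAstToHtml {

    /** One visit method per node type, mirroring the visit(WtBold) idea above. */
    public interface Visitor {
        void visit(Text node);
        void visit(Bold node);
        void visit(Section node);
    }

    /** Base type for every node in the toy tree. */
    public interface Node {
        void accept(Visitor visitor);
    }

    /** Leaf node holding plain text. */
    public static final class Text implements Node {
        final String content;
        public Text(String content) { this.content = content; }
        public void accept(Visitor visitor) { visitor.visit(this); }
    }

    /** Bold span that wraps child nodes. */
    public static final class Bold implements Node {
        final List<Node> children;
        public Bold(Node... children) { this.children = Arrays.asList(children); }
        public void accept(Visitor visitor) { visitor.visit(this); }
    }

    /** Section with a heading and body nodes. */
    public static final class Section implements Node {
        final String heading;
        final List<Node> children;
        public Section(String heading, Node... children) {
            this.heading = heading;
            this.children = Arrays.asList(children);
        }
        public void accept(Visitor visitor) { visitor.visit(this); }
    }

    /** Emits a different HTML fragment for each node type. */
    public static final class HtmlVisitor implements Visitor {
        private final StringBuilder out = new StringBuilder();

        public void visit(Text node) {
            out.append(escape(node.content));
        }

        public void visit(Bold node) {
            out.append("<b>");
            iterate(node.children);
            out.append("</b>");
        }

        public void visit(Section node) {
            out.append("<h2>").append(escape(node.heading)).append("</h2>");
            out.append("<p>");
            iterate(node.children);
            out.append("</p>");
        }

        /** Visits each child in order, like iterate(b) in the snippet above. */
        private void iterate(List<Node> children) {
            for (Node child : children) {
                child.accept(this);
            }
        }

        /** Escapes the three characters that would break the HTML output. */
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }

        public String result() { return out.toString(); }
    }

    /** Renders a tree to HTML with a fresh visitor. */
    public static String render(Node root) {
        HtmlVisitor visitor = new HtmlVisitor();
        root.accept(visitor);
        return visitor.result();
    }

    public static void main(String[] args) {
        Node page = new Section("Get in",
                new Text("Fly into "), new Bold(new Text("SFO")), new Text(" & take BART."));
        // prints <h2>Get in</h2><p>Fly into <b>SFO</b> &amp; take BART.</p>
        System.out.println(render(page));
    }
}
```

Picking only the sections you want then comes down to doing nothing in the visit method for node types you want to skip, or checking the heading before iterating over a section's children.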
Answer 3:
I just had success with Bliki: https://bitbucket.org/axelclk/info.bliki.wiki/wiki/Mediawiki2HTML
Bliki is what is used by XWiki and usage is very easy:
String htmlText = WikiModel.toHtml("This is a simple [[Hello World]] wiki tag");
Here is a list of downloads: https://oss.sonatype.org/content/repositories/snapshots/info/bliki/wiki/bliki-core/
But it is much easier to use this with Maven.
Answer 4:
You could also use XWiki's rendering engine (http://rendering.xwiki.org). Here's an example of how you'd parse some mediawiki content:
// Initialize Rendering components and allow getting instances
EmbeddableComponentManager componentManager = new EmbeddableComponentManager();
componentManager.initialize(this.getClass().getClassLoader());
// Get the MediaWiki Parser
Parser parser = componentManager.getInstance(Parser.class, "mediawiki/1.0");
// Parse the content in mediawiki markup and generate an AST (it's also possible to use a streaming parser for large content)
XDOM xdom = parser.parse(new StringReader("... input here"));
// Perform any transformation you wish to the XDOM here
...
// Generate XHTML out of the modified XDOM
WikiPrinter printer = new DefaultWikiPrinter();
BlockRenderer renderer = componentManager.getInstance(BlockRenderer.class, "xhtml/1.0");
renderer.render(xdom, printer);
// The result is now in the printer object
String xhtml = printer.toString();
See more examples at http://rendering.xwiki.org/xwiki/bin/view/Main/GettingStarted
Hope it helps.
Source: https://stackoverflow.com/questions/11612118/java-wikitext-parser