Java Wikitext Parser

Posted by 廉价感情 on 2019-12-20 10:55:24

Question


Any ideas for a nice, configurable parser with an easy-to-use API? I'm looking to feed it data such as http://wikitravel.org/wiki/en/api.php?format=xml&action=parse&prop=wikitext&page=San%20Francisco, choose the sections of data I want, and output custom HTML for each unique type of element. Java would be preferred, but a PHP/JS solution that is compatible with most (99%+) wikitext would be okay as well.


Answer 1:


Sweble is probably the best Java parser of wikitext. It claims to be 100% compliant with wikitext, but I seriously doubt that. It parses wikitext into an abstract syntax tree that you then have to do something with (like convert it to HTML).

There is a page on mediawiki.org that lists wikitext parsers in various programming languages. I don't think any of them handle 99%+ of wikitext, though. In general, parsing wikitext is a really complex problem. Wikitext isn't even formally defined anywhere outside of the MediaWiki parser itself.
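
To give a feel for why this is hard, here is a small sketch of what goes wrong with the obvious shortcut. It is not taken from any of the parsers mentioned; `NaiveWikitextDemo` and `naiveToHtml` are made-up names, and the two regex passes are a deliberately naive approach. In wikitext, three apostrophes mark bold, two mark italic, and five mark bold italic, and the last case is enough to trip up independent regex replacements:

```java
public class NaiveWikitextDemo {

    // Deliberately naive: treat ''' (bold) and '' (italic) as two
    // independent regex passes, with no knowledge of nesting.
    static String naiveToHtml(String wikitext) {
        String html = wikitext.replaceAll("'''(.+?)'''", "<b>$1</b>");
        return html.replaceAll("''(.+?)''", "<i>$1</i>");
    }

    public static void main(String[] args) {
        // The simple case works.
        System.out.println(naiveToHtml("'''bold''' and ''italic''"));
        // prints: <b>bold</b> and <i>italic</i>

        // Five apostrophes mean bold italic; the naive passes
        // produce mis-nested tags.
        System.out.println(naiveToHtml("'''''both'''''"));
        // prints: <b><i>both</b></i>
    }
}
```

The second result closes `</b>` before `</i>`, which is not well-formed HTML. Templates, tables, and `<nowiki>` add many more cases like this, which is why a real parser that builds a syntax tree is the safer route.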




Answer 2:


This question was answered years ago, but I wanted to save future visitors the effort I had to take to figure out how to use Sweble.

You can try the documentation at their site, but I couldn't figure it out. Just look at the example source code. Download the source jar for swc-example-basic at https://repo1.maven.org/maven2/org/sweble/wikitext/swc-example-basic/2.0.0/swc-example-basic-2.0.0-sources.jar and look at App.java and TextConverter.java.

Basically, to parse a page and convert it to another form, you first add the following dependency to your project:

    <dependency>
        <groupId>org.sweble.wikitext</groupId>
        <artifactId>swc-engine</artifactId>
        <version>2.0.0</version>
    </dependency>

Then, do the following:

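import org.sweble.wikitext.engine.EngineException;
import org.sweble.wikitext.engine.PageId;
import org.sweble.wikitext.engine.PageTitle;
import org.sweble.wikitext.engine.WtEngineImpl;
import org.sweble.wikitext.engine.config.WikiConfig;
import org.sweble.wikitext.engine.nodes.EngProcessedPage;
import org.sweble.wikitext.engine.utils.DefaultConfigEnWp;
import org.sweble.wikitext.parser.parser.LinkTargetException;

public String convertWikiText(String title, String wikiText, int maxLineLength) throws LinkTargetException, EngineException {
    // Set up a simple wiki configuration (English Wikipedia defaults)
    WikiConfig config = DefaultConfigEnWp.generate();
    // Instantiate a compiler for wiki pages
    WtEngineImpl engine = new WtEngineImpl(config);
    // Build the page title and ID (-1 because no revision is known)
    PageTitle pageTitle = PageTitle.make(config, title);
    PageId pageId = new PageId(pageTitle, -1);
    // Parse and post-process the wikitext into an AST
    EngProcessedPage cp = engine.postprocess(pageId, wikiText, null);
    // Walk the AST with the TextConverter from the example sources
    TextConverter p = new TextConverter(config, maxLineLength);
    return (String) p.go(cp.getPage());
}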

The TextConverter is a class you'll find in the examples I mentioned above. Customize it to do whatever you want. For example, the following makes sure all bold text is surrounded by "**":

public void visit(WtBold b)
{
    write("**");
    iterate(b);
    write("**");
}

There are a bunch of visit methods on that class for each type of element that you'll encounter.
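
If the visitor idea is unfamiliar, here is a minimal, self-contained sketch of the pattern that runs without Sweble on the classpath. Everything in it (`VisitorSketch`, `Node`, `Text`, `Bold`, `Italic`, `Renderer`) is a hypothetical stand-in invented for illustration, not Sweble's API; only the `write`/`iterate` naming mirrors the snippet above.

```java
import java.util.Arrays;
import java.util.List;

public class VisitorSketch {

    // Hypothetical toy AST; these are NOT Sweble's node classes.
    interface Node {
        void accept(Renderer r);
    }

    static final class Text implements Node {
        final String value;
        Text(String value) { this.value = value; }
        public void accept(Renderer r) { r.visit(this); }
    }

    static final class Bold implements Node {
        final List<Node> children;
        Bold(Node... children) { this.children = Arrays.asList(children); }
        public void accept(Renderer r) { r.visit(this); }
    }

    static final class Italic implements Node {
        final List<Node> children;
        Italic(Node... children) { this.children = Arrays.asList(children); }
        public void accept(Renderer r) { r.visit(this); }
    }

    // One visit method per node type; each decides what to emit.
    static class Renderer {
        private final StringBuilder out = new StringBuilder();

        void write(String s) { out.append(s); }

        // Recurse into child nodes in document order.
        void iterate(List<Node> children) {
            for (Node child : children) {
                child.accept(this);
            }
        }

        void visit(Text t) { write(t.value); }

        void visit(Bold b) {
            write("**");
            iterate(b.children);
            write("**");
        }

        void visit(Italic i) {
            write("*");
            iterate(i.children);
            write("*");
        }

        String result() { return out.toString(); }
    }

    static String render(Node... nodes) {
        Renderer r = new Renderer();
        r.iterate(Arrays.asList(nodes));
        return r.result();
    }

    public static void main(String[] args) {
        System.out.println(render(
                new Text("Visit "),
                new Bold(new Text("San "), new Italic(new Text("Francisco")))));
        // prints: Visit **San *Francisco***
    }
}
```

To emit custom HTML instead, you would change only the strings written in each visit method (for example `<b>` and `</b>` in the bold case); the tree walk stays the same.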




Answer 3:


I just had success with Bliki: https://bitbucket.org/axelclk/info.bliki.wiki/wiki/Mediawiki2HTML

Bliki is what is used by XWiki and usage is very easy:

String htmlText = WikiModel.toHtml("This is a simple [[Hello World]] wiki tag");

Here is a list of downloads: https://oss.sonatype.org/content/repositories/snapshots/info/bliki/wiki/bliki-core/

But it is much easier to use this with Maven.




Answer 4:


You could also use XWiki's rendering engine (http://rendering.xwiki.org). Here's an example of how you'd parse some MediaWiki content:

// Initialize Rendering components and allow getting instances
EmbeddableComponentManager componentManager = new EmbeddableComponentManager();
componentManager.initialize(this.getClass().getClassLoader());

// Get the MediaWiki Parser
Parser parser = componentManager.getInstance(Parser.class, "mediawiki/1.0");

// Parse the content in mediawiki markup and generate an AST (it's also possible to use a streaming parser for large content)
XDOM xdom = parser.parse(new StringReader("... input here"));

// Perform any transformation you wish to the XDOM here
...

// Generate XHTML out of the modified XDOM
WikiPrinter printer = new DefaultWikiPrinter();
BlockRenderer renderer = componentManager.getInstance(BlockRenderer.class, "xhtml/1.0");
renderer.render(xdom, printer);

// The result is now in the printer object
printer.toString();

See more examples at http://rendering.xwiki.org/xwiki/bin/view/Main/GettingStarted

Hope it helps.



Source: https://stackoverflow.com/questions/11612118/java-wikitext-parser
