Question
Can a DTD be generated from an XML file using Python?
Answer 1:
The simple answer to the question you ask is "yes, a DTD can be generated from an XML document using Python".
Python is a Turing-complete language, and there are algorithms for generating a DTD from an arbitrary collection of XML or SGML documents. I believe the standard reference is Rick Kazman, "Structuring the text of the Oxford English Dictionary through finite state transduction," Centre for the New Oxford English Dictionary Tech. Report OED-86-20, Univ. of Waterloo (June 1986), 117 pp.
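To make the "yes" concrete, here is a minimal sketch of the crudest possible induction, written with the standard library's `xml.etree.ElementTree`. It is not Kazman's finite-state method or any published algorithm; the function name `infer_dtd` is hypothetical. It assumes no namespaces, and it records only which children, attributes and text each element was ever seen with, then emits the loosest DTD consistent with those observations (every attribute is declared `CDATA #IMPLIED`, and every element with children gets a repeatable choice).

```python
import xml.etree.ElementTree as ET


def infer_dtd(xml_texts):
    """Infer a very permissive DTD from a collection of XML documents (strings).

    Hypothetical helper for illustration; assumes no XML namespaces.
    """
    order = []        # element names in first-seen order
    children = {}     # element name -> child names, first-seen order, no duplicates
    attributes = {}   # element name -> set of attribute names
    has_text = {}     # element name -> True if non-whitespace text was ever seen

    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            name = elem.tag
            if name not in children:
                order.append(name)
                children[name] = []
                attributes[name] = set()
                has_text[name] = False
            attributes[name].update(elem.attrib)
            # Character data sits before the first child (elem.text)
            # or after each child (child.tail).
            pieces = [elem.text] + [child.tail for child in elem]
            if any(p and p.strip() for p in pieces):
                has_text[name] = True
            for child in elem:
                if child.tag not in children[name]:
                    children[name].append(child.tag)

    lines = []
    for name in order:
        kids = children[name]
        if has_text[name] and kids:
            model = "(#PCDATA|" + "|".join(kids) + ")*"   # mixed content
        elif has_text[name]:
            model = "(#PCDATA)"                           # text only
        elif kids:
            model = "(" + "|".join(kids) + ")*"           # any children, any order
        else:
            model = "EMPTY"                               # never had content
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attributes[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)


sample = ('<book id="b1"><title>Dune</title><author>Frank Herbert</author>'
          '<note>See <ref target="x"/> too</note></book>')
print(infer_dtd([sample]))
```

For the sample this prints `<!ELEMENT book (title|author|note)*>`, `<!ELEMENT note (#PCDATA|ref)*>`, `<!ELEMENT ref EMPTY>` and so on. The result is valid for the input but says nothing about order or cardinality, which is exactly why hand-written DTDs tend to be better.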
In the late 1980s, the library consortium OCLC developed a tool called Fred, which induced DTDs for bodies of SGML documents; I heard a lot about it informally but do not recall ever seeing published descriptions of its algorithms. However, a quick search of the Web for "OCLC Fred SGML DTD" produces a pointer to Keith E. Shafer, Fred: the SGML Grammar Builder (1996). (A quick glance showed a great deal of material, but I did not see any clear reference to a high-level description of the algorithms used.)
There is also a Norwegian thesis from 1994: Sunniva M. K. Solstrand, "Automatisk generering av DTD fra SGML-kodet materiale" [Automatic generation of DTDs from SGML-encoded material], Hovedfagsoppgave i informasjonsvitenskap, Universitetet i Bergen, 1994.
As may be seen, there are several computer scientists who do not agree with the commenters who have told you your question is pointless or wrong. It is true, of course, that the quality of document grammar achieved by automatic grammar induction tends to be lower than the quality of document grammar achieved by a human document analyst and DTD writer.
I suspect that the DTD generated would be more plausible if it restricted itself to the content-model patterns described in various articles by Fabio Vitali and his collaborators in Bologna. The initial paper was, I believe, Fabio Vitali, Angelo Di Iorio, and Daniele Gubellini, "Design patterns for descriptive document substructures", Extreme Markup Languages 2005, and later papers have elaborated and described applications. New work in Bologna by Francesco Poggi (not yet published) extends and deepens the analysis. A Web search for "XML design patterns" may provide other attempts at similar sets of grammatical patterns. From a grammar-induction point of view, the effect of such patterns is to reduce the complexity of the induction problem by targeting simpler grammars.
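As a rough illustration of "targeting simpler grammars", the sketch below forces every element into one of five fixed shapes: empty, text-only, mixed, a fixed-order sequence (record-like), or a repeatable choice (container-like). This shape list is my own simplification, loosely in the spirit of the pattern-based approach, and is not the catalogue from the Vitali, Di Iorio and Gubellini paper; the function names `classify` and `infer_patterned_dtd` are hypothetical, attributes are ignored, and namespaces are assumed absent.

```python
import xml.etree.ElementTree as ET


def classify(instances):
    """Choose one of five fixed content-model shapes for one element name.

    `instances` is a list of Elements sharing the same tag (hypothetical helper).
    """
    sequences = [[child.tag for child in e] for e in instances]
    has_text = any(
        (t or "").strip()
        for e in instances
        for t in [e.text] + [child.tail for child in e]
    )
    names = []  # distinct child names in first-seen order
    for seq in sequences:
        for n in seq:
            if n not in names:
                names.append(n)

    if not names:
        return "(#PCDATA)" if has_text else "EMPTY"
    if has_text:
        return "(#PCDATA|" + "|".join(names) + ")*"   # mixed / inline
    first = sequences[0]
    if len(set(first)) == len(first) and all(seq == first for seq in sequences):
        return "(" + ",".join(first) + ")"            # record: same order, no repeats
    return "(" + "|".join(names) + ")*"               # container: repeatable choice


def infer_patterned_dtd(xml_texts):
    """Group all elements by tag across documents, then classify each group."""
    by_tag = {}  # dicts keep insertion order, so output follows first appearance
    for text in xml_texts:
        for elem in ET.fromstring(text).iter():
            by_tag.setdefault(elem.tag, []).append(elem)
    return "\n".join(f"<!ELEMENT {tag} {classify(elems)}>"
                     for tag, elems in by_tag.items())


docs = [
    "<entry><head>A</head><body>text <em>x</em></body></entry>",
    "<entry><head>B</head><body>more</body></entry>",
    "<list><item>1</item><item>2</item></list>",
]
print(infer_patterned_dtd(docs))
```

Here `entry` always has `head` then `body`, so it becomes the sequence `(head,body)`, while `list` repeats `item` and falls back to `(item)*`. Restricting the search space this way trades expressiveness for content models a human analyst is more likely to accept.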
If you meant to ask the rather different question "Can anyone recommend a Python-based tool for generating a DTD from an XML document?", then I can't help you (and there are lots of Stack Overflow moderators who will close the question at once because questions asking for tool recommendations are frowned upon).
Source: https://stackoverflow.com/questions/28578448/how-to-generate-dtd-from-xml