Regex to Indent an XML File

攒了一身酷 2020-12-21 06:14

Is it possible to write a REGEX (search replace) that when run on an XML string will output that XML string indented nicely?

If so, what's the REGEX? :)

7 answers
  • 2020-12-21 06:34

    I don't know whether a regex, in isolation, could pretty-print arbitrary XML input. You would need a program applying a regex to find a tag, locate the matching closing tag (if the tag is not self-closing), and so on. Using regex for this problem really is using the wrong tool for the job. The simplest way to pretty-print XML is to use an XML parser: read the XML in, set appropriate serialization options, and then serialize it back out.

    Why do you want to use regex to solve this problem?
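    Here is a minimal sketch of that parse-and-reserialize route in Python (the question never names a language, so Python is an assumption here), using the standard library's xml.etree.ElementTree. ET.indent requires Python 3.9 or later, and pretty_print is just an illustrative name. Be aware that ElementTree's default parser drops comments and processing instructions, so this is not a lossless round trip.

```python
import xml.etree.ElementTree as ET


def pretty_print(xml_text: str, indent: str = "  ") -> str:
    """Parse the XML, add indentation whitespace to the tree, and serialize it back."""
    root = ET.fromstring(xml_text)
    # ET.indent (Python 3.9+) rewrites the .text/.tail whitespace of elements in place.
    ET.indent(root, space=indent)
    # encoding="unicode" makes tostring return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


print(pretty_print('<root><item id="1">a</item><item id="2"/></root>'))
# <root>
#   <item id="1">a</item>
#   <item id="2" />
# </root>
```

    The parser, not a pattern, works out which closing tag belongs to which opening tag, so nesting depth comes for free.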

  • 2020-12-21 06:34

    From this link:

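      using System;
      using System.Text;
      using System.Text.RegularExpressions;
      static class XmlIndenter {
        // Alternatives: leaf element with text only | comment, PI or <!...> markup | open, close or self-closing tag | text.
        private static readonly Regex indentingRegex = new Regex(
            @"\<\s*(?<tag>[\w\-.:]+)(\s+[\w\-.:]+\s*=\s*(""[^""]*""|'[^']*'))*\s*\>[^\<]*\<\s*/\s*\k<tag>\s*\>" +
            @"|\<[!\?]((?<=!)--((?!--\>).)*--\>|(""[^""]*""|'[^']*'|[^>])*\>)" +
            @"|\<\s*(?<closing>/)?\s*[\w\-.:]+(\s+[\w\-.:]+\s*=\s*(""[^""]*""|'[^']*'))*\s*((/\s*)|(?<opening>))\>" +
            @"|[^\<]*",
            RegexOptions.ExplicitCapture | RegexOptions.Singleline);
        public static string IndentXml(string xml) {
          var result = new StringBuilder(xml.Length * 2);
          int indent = 0;
          for (Match match = indentingRegex.Match(xml); match.Success; match = match.NextMatch()) {
            // Skip empty matches and whitespace-only text between tags.
            if (string.IsNullOrWhiteSpace(match.Value))
              continue;
            // Closing tags move back one level (clamped so unbalanced input cannot go negative).
            if (match.Groups["closing"].Success)
              indent = Math.Max(0, indent - 1);
            result.Append(' ', indent * 2).Append(match.Value.Trim()).Append("\r\n");
            if (match.Groups["opening"].Success && !match.Groups["closing"].Success)
              indent++;
          }
          return result.ToString();
        }
      }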
    
  • 2020-12-21 06:35

    Using a regex for this will be a nightmare. Keeping track of the indentation level based on the hierarchy of the nodes will be almost impossible. Perhaps Perl 5.10's regular expression engine might help, since it's now reentrant, but let's not go down that road... Besides, you will need to take into account CDATA sections, which can contain text that looks like markup (even an XML declaration); the indenter must ignore that text and preserve it intact.
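    To make the CDATA point concrete, here is a small Python illustration (Python is an assumption; the thread doesn't fix a language). NAIVE_TOKEN is a hypothetical stand-in for the kind of tag/text tokenizer a regex-based indenter relies on: it splits a CDATA section at the first '>' inside it, whereas an XML parser returns the section's content intact.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive tokenizer: a "tag" runs from '<' to the next '>',
# and everything else up to the next '<' is text.
NAIVE_TOKEN = re.compile(r"<[^>]*>|[^<]+")

doc = "<a><![CDATA[x > y]]></a>"

tokens = NAIVE_TOKEN.findall(doc)
# The '>' inside the CDATA section ends the "tag" early, so the section is split
# across two tokens, which an indenter would then print on separate lines.
print(tokens)  # ['<a>', '<![CDATA[x >', ' y]]>', '</a>']

# The parser knows where the CDATA section really ends and returns its content intact.
print(ET.fromstring(doc).text)  # x > y
```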

    Stick with a DOM. As suggested in another answer, some libraries already provide a function that will indent a DOM tree for you. If not, building one will be much simpler than creating and maintaining regexes that do the same task.
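    As a sketch of how little code a hand-written indenter over a parsed tree needs, here is a recursive version in Python on top of xml.etree.ElementTree. The function write_indented is hypothetical, and it deliberately ignores mixed content (text that follows a child element), comments, processing instructions and namespaces, so treat it as an outline rather than a complete serializer.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def write_indented(elem: ET.Element, depth: int = 0, step: str = "  ") -> list[str]:
    """Return one line per element, indented by its depth in the tree.

    Simplifications of this sketch: the .tail text after a child is ignored,
    element text is stripped, and namespaces/comments/PIs are not handled.
    """
    pad = step * depth
    attrs = "".join(f" {name}={quoteattr(value)}" for name, value in elem.attrib.items())
    text = (elem.text or "").strip()
    children = list(elem)
    if not children:
        # Leaf element: keep it on one line, self-closing if it has no text.
        if text:
            return [f"{pad}<{elem.tag}{attrs}>{escape(text)}</{elem.tag}>"]
        return [f"{pad}<{elem.tag}{attrs}/>"]
    lines = [f"{pad}<{elem.tag}{attrs}>"]
    if text:
        lines.append(f"{pad}{step}{escape(text)}")
    for child in children:
        # The depth of the recursion, not a regex, decides the indentation.
        lines.extend(write_indented(child, depth + 1, step))
    lines.append(f"{pad}</{elem.tag}>")
    return lines


root = ET.fromstring('<root><item id="1">a</item><item id="2"/></root>')
print("\n".join(write_indented(root)))
# <root>
#   <item id="1">a</item>
#   <item id="2"/>
# </root>
```

    Re-escaping text and attribute values on output (escape, quoteattr) matters here, because the parser has already turned entities like &lt; back into plain characters.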

  • 2020-12-21 06:37

    Is it possible to write a REGEX (search replace) that when run on an XML string [...anything]

    No.

    Use an XML parser to read the string, then an XML serialiser to write it back out in ‘pretty’ mode.

    Each XML processor has its own options so it depends on platform, but here is the somewhat long-winded way that works on DOM Level 3 LS-compliant implementations:

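    // `implementation` is assumed to be a DOMImplementation that supports the
    // LS feature (DOMImplementationLS); how you obtain it is platform-specific.
    input= implementation.createLSInput();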
    input.stringData= unprettyxml;
    parser= implementation.createLSParser(implementation.MODE_SYNCHRONOUS, null);
    document= parser.parse(input);
    serializer= implementation.createLSSerializer();
    serializer.domConfig.setParameter("format-pretty-print", true);
    prettyxml= serializer.writeToString(document);
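    A comparable parse-then-serialize sequence in Python's standard library (Python is an assumption here, and this is not the DOM Level 3 LS API) uses xml.dom.minidom: parse into a DOM Document, then serialize it with toprettyxml. pretty_print_dom is an illustrative name. If the input is already indented, minidom keeps the existing whitespace-only text nodes, so the output can contain extra blank lines.

```python
from xml.dom import minidom


def pretty_print_dom(unpretty_xml: str) -> str:
    # Parse the string into a W3C-style DOM Document.
    document = minidom.parseString(unpretty_xml)
    # Serialize with one node per line; `indent` is added per nesting level,
    # and an XML declaration line is written first.
    return document.toprettyxml(indent="  ")


print(pretty_print_dom('<root><item id="1">a</item><item id="2"/></root>'))
```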
    
  • 2020-12-21 06:48

    Doing this would be far, far simpler if you didn't use a regex. In fact I'm not even sure it's possible with regex.

    Most languages have XML libraries that would make this task very simple. What language are you using?

  • 2020-12-21 06:50

    This would only be achievable with multiple regexes that together act as a state machine.

    What you are looking for is far better suited to an off-the-cuff parser.
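    A minimal sketch of such an off-the-cuff parser in Python (language assumed): instead of one big regex, a hand-written scanner decides what kind of token starts at the current position (comment, CDATA section, processing instruction, tag, or text), finds where it ends with plain string searches, and keeps the nesting depth in a counter. The names indent_xml and _tag_end are hypothetical; the scanner assumes well-formed input and mishandles a DOCTYPE with an internal subset.

```python
def _tag_end(xml: str, start: int) -> int:
    """Return the index just past the '>' closing the tag at `start`, skipping quoted values."""
    quote = None
    for j in range(start + 1, len(xml)):
        ch = xml[j]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return j + 1
    raise ValueError("unterminated tag")


def indent_xml(xml: str, step: str = "  ") -> str:
    """Re-indent well-formed XML with a small hand-written scanner (no validation)."""
    lines: list[str] = []
    depth = 0
    i, n = 0, len(xml)
    while i < n:
        if xml.startswith("<!--", i):
            end = xml.index("-->", i) + 3      # comment runs to "-->"
        elif xml.startswith("<![CDATA[", i):
            end = xml.index("]]>", i) + 3      # CDATA runs to "]]>" and may contain '<' or '>'
        elif xml.startswith("<?", i):
            end = xml.index("?>", i) + 2       # XML declaration or processing instruction
        elif xml.startswith("<", i):
            end = _tag_end(xml, i)             # ordinary tag or other <!...> declaration
        else:
            end = xml.find("<", i)             # text runs to the next '<'
            end = n if end == -1 else end
        token = xml[i:end]
        i = end
        text = token.strip()
        if not text:
            continue                           # drop whitespace-only text between tags
        if token.startswith("</"):
            depth -= 1                         # closing tag: dedent before writing it
        lines.append(step * depth + text)
        opens = token.startswith("<") and not token.startswith(("</", "<!", "<?"))
        if opens and not token.endswith("/>"):
            depth += 1                         # opening (not self-closing) tag: indent what follows
    return "\n".join(lines)


print(indent_xml('<root><item id="1">a</item><!-- c > d --><data><![CDATA[x > y]]></data><e/></root>'))
```

    Because comments and CDATA sections are recognized before ordinary tags, a '>' inside them no longer ends the token early, which is exactly where a single regex tends to break.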
