How to pretty print XML from Java?

慢半拍i 2020-11-22 01:55

I have a Java String that contains XML, with no line feeds or indentations. I would like to turn it into a String with nicely formatted XML. How do I do this?



        
30 answers
  • 2020-11-22 02:53

    There is a very nice command-line XML utility called xmlstarlet (http://xmlstar.sourceforge.net/) that can do a lot of things and is widely used.

    You could execute this program programmatically using Runtime.exec and then read in the formatted output file. It has more options and better error reporting than a few lines of Java code can provide.

    Download xmlstarlet: http://sourceforge.net/project/showfiles.php?group_id=66612&package_id=64589
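
Running the external tool from Java could look roughly like the sketch below. It makes several assumptions: it uses ProcessBuilder rather than Runtime.exec, it expects the binary to be on the PATH under the name xmlstarlet (some installations call it xml), and it relies on the fo subcommand reading standard input when no file is given. ExternalXmlFormatter, pipeThrough and format are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

final class ExternalXmlFormatter {

    /** Pipes the input through an external command and returns what it wrote to stdout. */
    static String pipeThrough(List<String> command, String input)
            throws IOException, InterruptedException {
        // Feed the input from a temp file so this thread never blocks on the child's stdin.
        File in = File.createTempFile("xml-in", ".xml");
        try {
            Files.write(in.toPath(), input.getBytes(StandardCharsets.UTF_8));
            Process process = new ProcessBuilder(command)
                    .redirectInput(in)
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // let error messages through
                    .start();

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream stdout = process.getInputStream()) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = stdout.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            }
            int exit = process.waitFor();
            if (exit != 0) {
                throw new IOException("command exited with status " + exit);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            in.delete();
        }
    }

    /** Assumption: "xmlstarlet fo" formats the document it reads from stdin. */
    static String format(String xml) throws IOException, InterruptedException {
        return pipeThrough(Arrays.asList("xmlstarlet", "fo"), xml);
    }
}
```

A non-zero exit status is turned into an IOException, so malformed input surfaces as an error instead of an empty string.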

  • 2020-11-22 02:53

    The solutions I have found here for Java 1.6+ do not reformat the code if it is already formatted. The one that worked for me (and re-formatted already formatted code) was the following.

    import org.apache.xml.security.c14n.CanonicalizationException;
    import org.apache.xml.security.c14n.Canonicalizer;
    import org.apache.xml.security.c14n.InvalidCanonicalizerException;
    import org.w3c.dom.Element;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSSerializer;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.TransformerException;
    import java.io.IOException;
    import java.io.StringReader;
    
    public class XmlUtils {
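        public static String toCanonicalXml(String xml) throws InvalidCanonicalizerException, ParserConfigurationException, SAXException, CanonicalizationException, IOException {
            Canonicalizer canon = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_OMIT_COMMENTS);
            // Canonical XML is always UTF-8, so encode and decode with an explicit charset
            // instead of relying on the platform default.
            byte[] canonXmlBytes = canon.canonicalize(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));
            return new String(canonXmlBytes, java.nio.charset.StandardCharsets.UTF_8);
        }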
    
        public static String prettyFormat(String input) throws TransformerException, ParserConfigurationException, IOException, SAXException, InstantiationException, IllegalAccessException, ClassNotFoundException {
            InputSource src = new InputSource(new StringReader(input));
            Element document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            Boolean keepDeclaration = input.startsWith("<?xml");
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration);
            return writer.writeToString(document);
        }
    }
    

    It is a good tool to use in your unit tests for full-string XML comparison.

    private void assertXMLEqual(String expected, String actual) throws ParserConfigurationException, IOException, SAXException, CanonicalizationException, InvalidCanonicalizerException, TransformerException, IllegalAccessException, ClassNotFoundException, InstantiationException {
        String canonicalExpected = prettyFormat(toCanonicalXml(expected));
        String canonicalActual = prettyFormat(toCanonicalXml(actual));
        assertEquals(canonicalExpected, canonicalActual);
    }
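
If adding Apache Santuario is not an option, a similar re-formatting effect can probably be had with JDK classes alone by dropping whitespace-only text nodes before serializing. The sketch below is an assumption about one way to do that, not the method used above, and it does not canonicalize anything. ReformatXml and reformat are hypothetical names, and the exact indentation is whatever the JDK's LSSerializer produces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class ReformatXml {

    /** Re-indents XML, including input that already contains line breaks and indentation. */
    static String reformat(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Collect the text nodes that hold nothing but whitespace (the old indentation).
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        for (int i = 0; i < blanks.getLength(); i++) {
            Node blank = blanks.item(i);
            blank.getParentNode().removeChild(blank);
        }

        // Serialize with the DOM Load and Save API, as in prettyFormat above.
        DOMImplementationLS ls =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer writer = ls.createLSSerializer();
        writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        writer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return writer.writeToString(doc);
    }
}
```

Because the old indentation is removed from the tree first, the serializer starts from the same state whether or not the input was already formatted.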
    
  • 2020-11-22 02:54

    As an alternative to the answers from max, codeskraps, David Easley and milosmns, have a look at my lightweight, high-performance pretty-printer library: xml-formatter

    // construct lightweight, threadsafe, instance
    PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().build();
    
    StringBuilder buffer = new StringBuilder();
    String xml = ..; // also works with char[] or Reader
    
    if(prettyPrinter.process(xml, buffer)) {
         // valid XML, print buffer
    } else {
         // invalid XML, print xml
    }
    

    Sometimes, like when running mocked SOAP services directly from file, it is good to have a pretty-printer which also handles already pretty-printed XML:

    PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();
    

    As some have commented, pretty-printing is just a way of presenting XML in a more human-readable form - whitespace strictly does not belong in your XML data.

    The library is intended for pretty-printing for logging purposes, and also includes functions for filtering (subtree removal / anonymization) and pretty-printing of XML in CDATA and Text nodes.

  • 2020-11-22 02:55

    Regarding the comment that "you must first build a DOM tree": No, you need not and should not do that.

    Instead, create a StreamSource (new StreamSource(new StringReader(str))) and feed that to the identity transformer mentioned. That will use a SAX parser, and the result will be much faster. Building an intermediate tree is pure overhead for this case. Otherwise the top-ranked answer is good.
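
A minimal sketch of that stream-based approach, assuming the JDK's built-in TransformerFactory. StreamPrettyPrinter and prettyPrint are hypothetical names. The indent-amount property is a Xalan extension that the bundled transformer honours; other implementations may ignore it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StreamPrettyPrinter {

    /** Pretty-prints XML without building a DOM tree first. */
    static String prettyPrint(String xml, int indent) throws TransformerException {
        // newTransformer() with no stylesheet gives the identity transformation.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Implementation-specific: number of spaces per nesting level.
        transformer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));

        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Calling StreamPrettyPrinter.prettyPrint(xml, 2) returns the indented document as a String. Input that is not well-formed ends in a TransformerException.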

  • 2020-11-22 02:55

    A slightly improved version of the one from milosmns:

    public static String getPrettyXml(String xml) {
        if (xml == null || xml.trim().isEmpty()) return "";

        int stack = 0; // current nesting depth
        StringBuilder pretty = new StringBuilder();
        // Put every tag on its own line; plain replace() is enough, no regex needed.
        String[] rows = xml.trim().replace(">", ">\n").replace("<", "\n<").split("\n");

        for (String rawRow : rows) {
            String row = rawRow.trim();
            if (row.isEmpty()) continue;

            if (row.startsWith("<?")) {
                // XML declaration or processing instruction: no indentation
                pretty.append(row).append("\n");
            } else if (row.startsWith("</")) {
                // closing tag: step back one level before printing
                pretty.append(repeatString(--stack)).append(row).append("\n");
            } else if (row.startsWith("<!")) {
                // comment, DOCTYPE or CDATA section: print at the current level, depth unchanged
                pretty.append(repeatString(stack)).append(row).append("\n");
            } else if (row.startsWith("<") && !row.endsWith("/>")) {
                // opening tag: print, then go one level deeper
                pretty.append(repeatString(stack++)).append(row).append("\n");
            } else {
                // self-closing tag or text
                pretty.append(repeatString(stack)).append(row).append("\n");
            }
        }

        return pretty.toString().trim();
    }

    // Builds the indentation: one space per nesting level.
    private static String repeatString(int stack) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < stack; i++) {
            indent.append(" ");
        }
        return indent.toString();
    }
    
  • 2020-11-22 02:56

    Kevin Hakanson said: "However, if you know your XML string is valid, and you don't want to incur the memory overhead of parsing a string into a DOM, then running a transform over the DOM to get a string back - you could just do some old fashioned character by character parsing. Insert a newline and spaces after every </...> characters, keep an indent counter (to determine the number of spaces) that you increment for every <...> and decrement for every </...> you see."

    Agreed. Such an approach is much faster and has far fewer dependencies.

    Example solution:

    /**
     * XML utils, including formatting.
     */
    public class XmlUtils
    {
      private static XmlFormatter formatter = new XmlFormatter(2, 80);
    
      public static String formatXml(String s)
      {
        return formatter.format(s, 0);
      }
    
      public static String formatXml(String s, int initialIndent)
      {
        return formatter.format(s, initialIndent);
      }
    
      private static class XmlFormatter
      {
        private int indentNumChars;
        private int lineLength;
        private boolean singleLine;
    
        public XmlFormatter(int indentNumChars, int lineLength)
        {
          this.indentNumChars = indentNumChars;
          this.lineLength = lineLength;
        }
    
        public synchronized String format(String s, int initialIndent)
        {
          int indent = initialIndent;
          StringBuilder sb = new StringBuilder();
          for (int i = 0; i < s.length(); i++)
          {
            char currentChar = s.charAt(i);
            if (currentChar == '<')
            {
              char nextChar = s.charAt(i + 1);
              if (nextChar == '/')
                indent -= indentNumChars;
              if (!singleLine)   // Don't indent before closing element if we're creating opening and closing elements on a single line.
                sb.append(buildWhitespace(indent));
              if (nextChar != '?' && nextChar != '!' && nextChar != '/')
                indent += indentNumChars;
              singleLine = false;  // Reset flag.
            }
            sb.append(currentChar);
            if (currentChar == '>')
            {
              if (s.charAt(i - 1) == '/')
              {
                indent -= indentNumChars;
                sb.append("\n");
              }
              else
              {
                int nextStartElementPos = s.indexOf('<', i);
                if (nextStartElementPos > i + 1)
                {
                  String textBetweenElements = s.substring(i + 1, nextStartElementPos);
    
                  // If the space between elements is solely newlines, let them through to preserve additional newlines in source document.
                  if (textBetweenElements.replaceAll("\n", "").length() == 0)
                  {
                    sb.append(textBetweenElements + "\n");
                  }
                  // Put tags and text on a single line if the text is short.
                  else if (textBetweenElements.length() <= lineLength * 0.5)
                  {
                    sb.append(textBetweenElements);
                    singleLine = true;
                  }
                  // For larger amounts of text, wrap lines to a maximum line length.
                  else
                  {
                    sb.append("\n" + lineWrap(textBetweenElements, lineLength, indent, null) + "\n");
                  }
                  i = nextStartElementPos - 1;
                }
                else
                {
                  sb.append("\n");
                }
              }
            }
          }
          return sb.toString();
        }
      }
    
      private static String buildWhitespace(int numChars)
      {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numChars; i++)
          sb.append(" ");
        return sb.toString();
      }
    
      /**
       * Wraps the supplied text to the specified line length.
       * @param s the text to wrap.
       * @param lineLength the maximum length of each line in the returned string (not including indent if specified).
       * @param indent optional number of whitespace characters to prepend to each line before the text.
       * @param linePrefix optional string to append to the indent (before the text).
       * @return the supplied text wrapped so that no line exceeds the specified line length + indent, optionally with
       * indent and prefix applied to each line.
       */
      private static String lineWrap(String s, int lineLength, Integer indent, String linePrefix)
      {
        if (s == null)
          return null;
    
        StringBuilder sb = new StringBuilder();
        int lineStartPos = 0;
        int lineEndPos;
        boolean firstLine = true;
        while(lineStartPos < s.length())
        {
          if (!firstLine)
            sb.append("\n");
          else
            firstLine = false;
    
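          if (lineStartPos + lineLength > s.length())
            lineEndPos = s.length() - 1;
          else
          {
            lineEndPos = lineStartPos + lineLength - 1;
            while (lineEndPos > lineStartPos && (s.charAt(lineEndPos) != ' ' && s.charAt(lineEndPos) != '\t'))
              lineEndPos--;
            // No whitespace in this window: break hard at the line length instead of emitting one character per line.
            if (lineEndPos == lineStartPos)
              lineEndPos = lineStartPos + lineLength - 1;
          }
          if (indent != null)   // indent is documented as optional
            sb.append(buildWhitespace(indent));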
          if (linePrefix != null)
            sb.append(linePrefix);
    
          sb.append(s.substring(lineStartPos, lineEndPos + 1));
          lineStartPos = lineEndPos + 1;
        }
        return sb.toString();
      }
    
      // other utils removed for brevity
    }
    