How to Beautify XML Automatically
Beautifying (pretty printing) XML adds proper indentation and line breaks, making XML documents human-readable. XML is widely used for configuration files, data exchange between systems, document storage, and web services (SOAP, RSS, SVG). However, XML can quickly become unreadable when minified, machine-generated, or deeply nested. A beautified XML document with consistent indentation, line breaks, and spacing makes debugging faster and collaboration easier, and it reduces the likelihood of syntax errors in manual edits.
Why Beautify XML?
Raw or minified XML can be extremely difficult to read and debug. In a minified document with hundreds of elements nested dozens of levels deep, finding a specific element or attribute is nearly impossible without formatting. Beautifying XML provides several key benefits. It improves readability by converting a dense block of text into a structured, hierarchical view that mirrors the document's logical structure. It simplifies debugging by making misplaced elements and other structural problems visible at a glance; most formatters also parse the document first, so malformed input such as a missing closing tag is reported instead of silently formatted. It makes version control diffs more useful, because consistently formatted XML produces diffs that show actual content changes rather than noise from inconsistent formatting. It supports team coding standards by establishing a consistent format that all team members can follow. It also helps with automated validation: validators do not require pretty-printed input, but when each element sits on its own line, the line numbers in their error messages point to a specific element rather than to one enormous line.
Understanding XML Formatting Rules
Proper XML beautification follows specific formatting conventions, though some details vary by preference and tool. The standard rules include indenting nested elements with consistent whitespace (typically 2 or 4 spaces per level), placing each element on its own line, keeping opening and closing tags on separate lines when the element contains child elements, and placing inline content on the same line as the opening and closing tags for simple elements. Self-closing tags like <br/> should remain on a single line. Attributes can be kept on one line or split across multiple lines for documents with many attributes per element. Comments and processing instructions should maintain their position relative to surrounding elements while receiving appropriate indentation.
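As a rough illustration of these conventions, the sketch below uses Python's standard xml.etree.ElementTree module to indent a small document with two spaces per level. The sample document and the variable names are invented for this example, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Hypothetical minified input, invented for this example.
minified = '<config><server port="8080"><name>main</name><tls/></server></config>'

root = ET.fromstring(minified)
ET.indent(root, space="  ")  # two spaces per nesting level (Python 3.9+)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

In the printed result, the container elements config and server have their opening and closing tags on separate lines, name keeps its text on the same line as its tags, and the empty tls element stays on a single line.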
Using an Online XML Beautifier
The quickest way to beautify XML is with an online tool. The XML Formatter tool on Help2Code beautifies XML instantly with syntax highlighting and validation. You simply paste your XML into the input area, click a button, and receive perfectly formatted XML with collapsible sections, line numbers, and syntax coloring. Many online formatters also provide options to control indentation size, line width, attribute formatting, and character encoding. This is ideal for one-off formatting tasks, debugging XML snippets, or users who do not want to install software. Some tools also offer minification, validation against DTD or XSD schemas, and conversion between XML and JSON formats.
Beautifying XML in VS Code
Visual Studio Code provides excellent XML formatting capabilities, especially with the right extensions installed. The most popular XML extension for VS Code is Red Hat's XML extension, which provides comprehensive XML support including formatting, validation, schema-based autocompletion, and XPath evaluation:
- Install the XML extension by Red Hat from the VS Code marketplace
- Open your XML file
- Right-click inside the editor and select Format Document
- Or use the keyboard shortcut Shift+Alt+F
VS Code also supports formatting on save, which automatically beautifies your XML whenever you save the file. You can configure this by adding "editor.formatOnSave": true to your settings. The extension supports both .xml files and other XML-based formats like .xsd, .xslt, .rss, .svg, and .xhtml. You can customize formatting settings such as indentation size, split attributes, and maximum line width through VS Code settings.
Command Line with xmllint
For automation and scripting, command-line tools provide the most efficient XML beautification. xmllint is part of the libxml2 library and is available on Linux, macOS, and Windows (via WSL or Cygwin):
xmllint --format input.xml
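This command reads input.xml and writes the formatted version to stdout. To overwrite the original, pass --output input.xml rather than redirecting onto the input file, because the shell truncates the redirect target before xmllint reads it. To create a new file, redirect the output: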
xmllint --format input.xml > formatted.xml
xmllint offers additional useful options for XML processing. The --nsclean option removes redundant namespace declarations. The --valid option validates the document against its DTD during formatting. The --encode option specifies the output encoding. For large batches, you can combine xmllint with find to format all XML files in a directory at once:
find . -name "*.xml" -exec xmllint --format {} --output {} \;
Note that some versions of xmllint have issues with very large files or certain XML constructs. In those cases, xmlstarlet is a robust alternative that offers similar formatting capabilities along with XPath querying and XML editing.
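When a batch run needs to report which files failed, the same xmllint invocation can be driven from a script. The following is a minimal sketch, assuming xmllint is installed and on the PATH; the function names xmllint_command and format_tree are hypothetical helpers, and only the --format and --output options described above are used.

```python
import shutil
import subprocess
from pathlib import Path


def xmllint_command(path):
    """Build the argument list that formats one file in place."""
    return ["xmllint", "--format", "--output", str(path), str(path)]


def format_tree(root_dir):
    """Format every *.xml file under root_dir; return (path, error) pairs for failures."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint was not found on PATH")
    failed = []
    for path in sorted(Path(root_dir).rglob("*.xml")):
        result = subprocess.run(
            xmllint_command(path), capture_output=True, text=True
        )
        if result.returncode != 0:
            # xmllint reports parse errors on stderr.
            failed.append((path, result.stderr.strip()))
    return failed
```

Calling format_tree(".") does the same work as the find command, but it collects the error output of files that could not be formatted so a script can decide how to react.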
Beautifying XML in JavaScript
For web-based applications and build tools, JavaScript provides several ways to format XML. The XSLTProcessor approach uses XSLT stylesheets to perform the formatting:
function beautifyXml(xml) {
  // Identity transform: copies every node and attribute unchanged,
  // while indent="yes" asks the XSLT processor to pretty print the output.
  const xsltDoc = new DOMParser().parseFromString(
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:output method="xml" indent="yes"/><xsl:template match="@*|node()"><xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy></xsl:template></xsl:stylesheet>',
    'text/xml'
  );
  const xslt = new XSLTProcessor();
  xslt.importStylesheet(xsltDoc);
  // Parse the input, apply the stylesheet, and serialize the result back to a string.
  const result = xslt.transformToDocument(new DOMParser().parseFromString(xml, 'text/xml'));
  return new XMLSerializer().serializeToString(result);
}
For Node.js applications, libraries like pretty-data provide a simpler formatting function, and xml2js can re-serialize parsed XML with indentation through its Builder class:
const pd = require('pretty-data').pd;
const formattedXml = pd.xml('<root><child><grandchild>text</grandchild></child></root>');
console.log(formattedXml);
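In modern browsers, you can also parse and re-serialize XML with DOMParser and XMLSerializer without XSLT, but XMLSerializer does not add indentation on its own, so you have to insert the whitespace text nodes yourself. Support for indent="yes" in XSLTProcessor may also vary between browsers, so test the XSLT approach in the browsers you target.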
Beautifying XML in PHP
PHP's DOM extension provides built-in XML formatting capabilities that are simple yet powerful:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xmlString);
echo $dom->saveXML();
The preserveWhiteSpace property must be set to false before loading the XML to strip existing whitespace, and formatOutput must be true to enable pretty printing. This approach works with both XML strings and files:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('input.xml');
$dom->save('formatted.xml');
PHP's DOM extension also supports loading HTML, validating against DTD and schemas, XPath queries, and modifying the document structure before saving, making it suitable for comprehensive XML processing pipelines.
Beautifying XML in Python
Python developers have several options for XML formatting. The built-in xml.dom.minidom provides a simple approach:
import xml.dom.minidom

def beautify_xml(xml_string):
    # Parse the string into a DOM tree, then serialize it with two-space indentation.
    dom = xml.dom.minidom.parseString(xml_string)
    return dom.toprettyxml(indent='  ')
For more robust formatting, the lxml library offers better performance and more options:
from lxml import etree

def beautify_xml_lxml(xml_string):
    root = etree.fromstring(xml_string)
    return etree.tostring(root, pretty_print=True, encoding='unicode')
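The lxml library also supports XSLT transformations, schema validation, and efficient handling of large XML files. If the input already contains indentation, parse it with etree.XMLParser(remove_blank_text=True) so that pretty_print can re-indent it.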
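One caveat with the minidom approach is that toprettyxml keeps the whitespace-only text nodes of input that is already indented, which tends to produce extra blank lines. The sketch below is one possible workaround, not the only one: it removes those nodes before pretty printing. The helper names strip_blank_text and rebeautify_xml are hypothetical, and the stripping is deliberately naive, so it would also drop whitespace that is significant in mixed content.

```python
import xml.dom.minidom
from xml.dom import Node


def strip_blank_text(node):
    """Recursively remove text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)


def rebeautify_xml(xml_string, indent="  "):
    """Pretty print XML that may already contain indentation."""
    dom = xml.dom.minidom.parseString(xml_string)
    strip_blank_text(dom)
    return dom.toprettyxml(indent=indent)


print(rebeautify_xml("<root>\n      <a>1</a>\n\n</root>"))
```

CDATA sections are a separate node type in minidom, so this helper leaves them untouched.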
Integrating XML Beautification into Build Pipelines
For development teams, integrating XML formatting into the build process ensures consistency across the codebase. Pre-commit hooks can automatically format XML files before they are committed to version control. CI/CD pipelines can validate that all XML files are properly formatted as part of the build process. Build tools like Grunt, Gulp, and Webpack have plugins for XML formatting that run as part of the asset pipeline. For Java projects, Maven and Gradle offer XML formatting plugins that can be configured to run during the build lifecycle. This automation eliminates manual formatting inconsistencies and reduces code review friction related to formatting issues.
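A formatting check for a pre-commit hook or CI job can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the project's canonical format is taken to be whatever ET.indent produces with two-space indentation (Python 3.9 or later), and the function names format_xml, unformatted_files, and main are hypothetical. ElementTree drops the XML declaration, DOCTYPE, and comments when it re-serializes, so files that contain them would be flagged; a real pipeline would more likely compare against the output of the formatter the team actually uses.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def format_xml(text):
    """Re-serialize XML text with two-space indentation and a trailing newline."""
    root = ET.fromstring(text)
    ET.indent(root, space="  ")  # Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"


def unformatted_files(paths):
    """Return the paths whose content differs from format_xml's output."""
    bad = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        if text != format_xml(text):
            bad.append(path)
    return bad


def main(argv):
    """Print each unformatted file and return a process exit code."""
    bad = unformatted_files(argv)
    for path in bad:
        print(f"not formatted: {path}")
    return 1 if bad else 0
```

A hook script would pass the file names it receives to main and exit with the returned code, so that a non-zero result blocks the commit or fails the build.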
Common XML Formatting Issues
When beautifying XML, be aware of potential pitfalls. CDATA sections should be preserved as-is, because reformatting content inside CDATA blocks changes the data they carry. Mixed content (elements with both text and child elements) requires careful handling: indentation inserted between the text and the child elements becomes part of the text, so a formatter should leave such elements on one line or respect xml:space="preserve". Namespace declarations may shift position during formatting, though the semantics remain the same. Very long attribute values or deeply nested elements may need special handling to remain readable. Some formatters strip comments by default, so check your tool's settings to preserve them if needed. XML declarations and DOCTYPE declarations should remain at the top of the document and maintain their original formatting.
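The mixed-content pitfall is easy to reproduce. The sketch below, using only Python's standard xml.dom.minidom, pretty prints a made-up paragraph element and compares its character data before and after; the sample input and the helper name text_content are invented for illustration.

```python
import xml.dom.minidom

# Hypothetical mixed-content sample: text and a child element side by side.
MIXED = "<p>Hello <b>world</b>!</p>"


def text_content(xml_string):
    """Concatenate all character data in document order."""
    dom = xml.dom.minidom.parseString(xml_string)
    parts = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
                parts.append(child.data)
            else:
                walk(child)

    walk(dom)
    return "".join(parts)


pretty = xml.dom.minidom.parseString(MIXED).toprettyxml(indent="  ")
# The comparison is False because the added newlines and indentation
# became part of the element's text.
print(text_content(MIXED) == text_content(pretty))
```

Whether that change matters depends on the consumer: a configuration loader that trims whitespace may not care, while a document renderer probably will.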
Conclusion
Automated XML beautification is an essential practice for anyone working with XML documents. Whether you format XML occasionally for debugging or process thousands of files in an automated pipeline, having the right tool makes a significant difference in productivity. Online tools like the XML Formatter on Help2Code are perfect for quick formatting tasks, while command-line tools like xmllint and code libraries in JavaScript, PHP, and Python provide the flexibility needed for automated workflows. Choose the approach that best fits your workflow, and make XML formatting a regular part of your development process to keep your code clean, consistent, and maintainable.