How to Validate and Format XML Online
XML remains widely used for configuration files, data interchange, document storage, SOAP web services, and enterprise application integration. Despite the rise of JSON and YAML, XML continues to power critical infrastructure — from Microsoft Office file formats (DOCX, XLSX) to Android app layouts, SVG graphics, RSS feeds, and financial data exchange standards. Validating and formatting XML ensures correctness, consistency, and readability, which is essential for maintaining complex XML-based systems.
Understanding XML Validation
XML validation is the process of checking whether an XML document conforms to a predefined set of rules. These rules define the expected structure, element names, attribute names, data types, occurrence constraints, and allowed values. Validation can catch errors early in the development process, prevent data corruption, and ensure that different systems can exchange XML data reliably.
Types of Validation
There are two levels of XML validation:
- Well-formedness checking: Verifies that the XML document follows basic XML syntax rules: every opening tag has a matching closing tag, elements nest properly, attribute values are quoted, and there is exactly one root element. This is the minimal level of validation and is performed by any XML parser.
- Schema validation: Verifies that the document conforms to a specific schema, a separate document that defines the vocabulary and grammar for a class of XML documents. Schema validation is optional but highly recommended for production systems.
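The first level can be tried with nothing more than a standard library parser. The sketch below uses Python's `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for illustration, and the check covers well-formedness only, not schema validation.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<root><item>ok</item></root>"))  # properly nested
print(is_well_formed("<root><item>ok</root>"))         # <item> is never closed
print(is_well_formed("<a/><b/>"))                      # two root elements
```

Only the first document passes; the other two break the nesting rule and the single-root rule respectively.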
XML Validation Methods
| Method | Description | Use Case |
|---|---|---|
| DTD | Document Type Definition | Legacy systems, SGML compatibility |
| XSD 1.0 | XML Schema Definition | Most common, widely supported |
| XSD 1.1 | XML Schema 1.1 | Complex validation rules, assertions |
| RELAX NG | Compact or XML syntax | Simpler, more readable schemas |
| Schematron | Rule-based assertions | Business rules, cross-field validation |
DTD (Document Type Definition)
DTD is the oldest schema language for XML, inherited from SGML. It defines element types, attribute lists, entities, and notations. DTD syntax is compact but limited — it has no support for data types (everything is text), no namespace awareness, and limited content modeling. DTD uses notations like (#PCDATA) for text content, (child1, child2) for sequences, and (option1 | option2) for choices.
Despite its limitations, DTD is still used in legacy systems and some XML-based document formats. Its main advantage is broad support: virtually every validating XML parser can validate against a DTD, although non-validating parsers read the DTD without enforcing it.
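Whether a DTD is actually enforced depends on the parser and how it is invoked. As one data point, Python's standard `xml.etree.ElementTree` is non-validating: it reads an internal DTD subset but does not check the document against it. The sample document below is invented for illustration.

```python
import xml.etree.ElementTree as ET

# The internal DTD says <note> must contain exactly one <to> child,
# but the document body contains an undeclared <from> element instead.
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Alice</from></note>"""

# A non-validating parser accepts this: the document is well-formed,
# even though it is not valid against its own DTD.
root = ET.fromstring(DOC)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```

Catching this kind of violation needs a validating tool, such as the `xmllint --valid` command covered later in this article.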
XSD (XML Schema Definition)
XSD is the most widely used XML schema language. It was developed by the W3C to address DTD's limitations, adding data type support, namespace integration, and a richer set of constraints. XSD schemas are themselves XML documents, making them self-describing and processable by standard XML tools.
XSD 1.0 features include:
- Data types: string, integer, decimal, date, time, boolean, and many others
- Type derivation: Create complex types from simple types or other complex types
- Occurrence constraints: minOccurs and maxOccurs for elements
- Facets: Pattern (regex), enumeration, min/max values, length constraints
- Uniqueness and key constraints: Enforce unique values across elements
- Namespace support: Associate schemas with specific XML namespaces
XSD 1.1 added:
- Assertions: XPath-based conditions that must be true
- Conditional type assignment: Change type based on attribute values
- Open content: Allow unexpected elements in specific positions
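To make the constraint types concrete, here is a plain-Python sketch that mirrors a few of them by hand: a pattern facet, an enumeration, a value range, and an occurrence constraint. It is not XSD validation and not a substitute for a schema processor; the `order`/`item` vocabulary, the rules, and the `check_order` helper are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules, loosely mirroring XSD facets.
SKU_PATTERN = re.compile(r"[A-Z]{3}-\d{4}")   # like a pattern facet
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}    # like an enumeration facet

def check_order(xml_text):
    """Return a list of violations; an empty list means all checks passed."""
    root = ET.fromstring(xml_text)
    errors = []
    items = root.findall("item")
    if not 1 <= len(items) <= 3:              # like minOccurs=1, maxOccurs=3
        errors.append(f"expected 1-3 <item> elements, found {len(items)}")
    for item in items:
        sku = item.get("sku", "")
        if not SKU_PATTERN.fullmatch(sku):
            errors.append(f"sku {sku!r} does not match the pattern")
        if item.get("currency") not in ALLOWED_CURRENCIES:
            errors.append(f"currency {item.get('currency')!r} is not allowed")
        try:
            price = float(item.findtext("price", default=""))
        except ValueError:
            errors.append("price is missing or not a number")
            continue
        if not 0 < price <= 10000:            # like minExclusive / maxInclusive
            errors.append(f"price {price} is out of range")
    return errors

good = '<order><item sku="ABC-1234" currency="USD"><price>29.99</price></item></order>'
bad = '<order><item sku="abc" currency="JPY"><price>-5</price></item></order>'
print(check_order(good))
print(check_order(bad))
```

A real schema expresses the same rules declaratively, which keeps them out of application code and lets any conforming validator enforce them.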
RELAX NG
RELAX NG is a simpler, more readable alternative to XSD. It uses a concise syntax (either compact .rnc or XML .rng) that is easier for humans to write and understand. RELAX NG has full namespace support, non-deterministic content models, and clean interleaving support. It defines only two built-in data types (string and token) and relies on an external library, usually the XSD datatypes, for anything richer, so its strength is structure validation rather than data typing.
Schematron
Schematron is a rule-based validation language that uses XPath expressions to define assertions about an XML document. It is ideal for expressing business rules that cannot be captured by XSD or RELAX NG alone. For example, "if the type attribute is 'shipping', then the address element must contain a country child" — a cross-field constraint that Schematron handles naturally.
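The logic of that example rule can be sketched in plain Python to show what is being checked. This is not Schematron itself, only an equivalent hand-written check; it assumes the `type` attribute sits on the `address` element, and the helper name `check_shipping_addresses` is hypothetical.

```python
import xml.etree.ElementTree as ET

def check_shipping_addresses(xml_text):
    """Report every shipping address that lacks a <country> child."""
    root = ET.fromstring(xml_text)
    violations = []
    # iter("address") visits every <address> element in document order.
    for position, address in enumerate(root.iter("address"), start=1):
        if address.get("type") == "shipping" and address.find("country") is None:
            violations.append(f"address #{position}: shipping address has no <country>")
    return violations

ok_doc = (
    '<order>'
    '<address type="shipping"><country>DE</country></address>'
    '<address type="billing"><city>Berlin</city></address>'
    '</order>'
)
bad_doc = '<order><address type="shipping"><city>Berlin</city></address></order>'

print(check_shipping_addresses(ok_doc))
print(check_shipping_addresses(bad_doc))
```

In Schematron the same condition would be written as an XPath assertion inside a rule, so it can live next to the schema instead of in application code.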
XML Validation Rules Comparison
| Rule | DTD | XSD | Description |
|---|---|---|---|
| Required elements | Yes | Yes | Element must be present |
| Data types | No | Yes | String, number, date, boolean, etc. |
| Value ranges | No | Yes | min/max inclusive/exclusive constraints |
| Patterns | No | Yes | Regular expression patterns |
| Enumeration | Attributes only | Yes | List of allowed values |
| Occurrence | Limited (?, *, +) | Yes | minOccurs/maxOccurs in XSD |
| Uniqueness | ID attributes only | Yes | Unique constraints across elements |
| Namespaces | No | Full | Schema association with namespaces |
| Conditional rules | No | Yes (XSD 1.1) | Assertions and conditional types |
| Cross-field validation | No | Partial | Schematron for complex rules |
XML Formatting
Formatting (also called pretty printing or beautifying) adds proper indentation and line breaks to make XML documents human-readable. This is essential for debugging, code reviews, documentation, and any scenario where people need to inspect XML data.
<!-- Before (minified) -->
<root><item id="1"><name>Test</name><price>29.99</price></item><item id="2"><name>Sample</name><price>49.99</price></item></root>
<!-- After (formatted with 2-space indentation) -->
<root>
  <item id="1">
    <name>Test</name>
    <price>29.99</price>
  </item>
  <item id="2">
    <name>Sample</name>
    <price>49.99</price>
  </item>
</root>
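The same transformation can be reproduced with Python's standard library. This is a minimal sketch: `pretty_print` is a hypothetical helper name, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

MINIFIED = (
    '<root><item id="1"><name>Test</name><price>29.99</price></item>'
    '<item id="2"><name>Sample</name><price>49.99</price></item></root>'
)

def pretty_print(xml_text, indent="  "):
    """Re-serialize xml_text with one level of `indent` per nesting depth."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=indent)   # inserts whitespace between elements in place
    return ET.tostring(root, encoding="unicode")

formatted = pretty_print(MINIFIED)
print(formatted)
```

Passing `indent="    "` or `indent="\t"` switches to 4-space or tab indentation. Note that this approach rewrites whitespace between elements, so it is best kept away from documents where that whitespace is significant.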
Formatting Options
Most XML formatters offer these configurable options:
- Indentation size: 2 spaces, 4 spaces, or tabs
- Attribute ordering: Sort attributes alphabetically or preserve original order
- Newline style: LF (Unix), CRLF (Windows), or CR (classic Mac)
- Empty element formatting: <element/> vs <element></element>
- CDATA preservation: Keep or expand CDATA sections
- Encoding declaration: Add or preserve <?xml version="1.0" encoding="UTF-8"?>
Tool Comparison
| Tool | Format | Validate | XSD Check | Schema Support | Batch | Platform |
|---|---|---|---|---|---|---|
| Help2Code XML Formatter | Yes | Yes | Yes | XSD 1.0 | No | Online |
| xmllint | Yes | Yes | Yes | XSD, DTD, RELAX NG | Yes | CLI |
| XMLStarlet | Yes | Yes | Yes | XSD, DTD, RELAX NG | Yes | CLI |
| VS Code XML extension | Yes | Yes | Yes | XSD, DTD | No | Editor |
| Oxygen XML Editor | Yes | Yes | Yes | XSD, DTD, RELAX NG, Schematron | Yes | Desktop |
| Notepad++ XML Tools | Yes | Yes | No | DTD only | No | Desktop |
| IntelliJ IDEA | Yes | Yes | Yes | XSD, DTD | No | IDE |
Using xmllint (Command Line)
xmllint is a powerful command-line tool for XML processing, available on most Unix-like systems:
# Check well-formedness (also prints the parsed document; add --noout to suppress it)
xmllint document.xml
# Validate against the DTD declared in the document
xmllint --valid document.xml
# Validate against XSD schema
xmllint --schema schema.xsd document.xml
# Format XML
xmllint --format document.xml
# Format and save to file
xmllint --format document.xml > formatted.xml
# Validate against XSD without echoing the document (prints only the result and any errors)
xmllint --noout --schema schema.xsd document.xml
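On machines where xmllint is not installed, the well-formedness check can be approximated for a whole directory with Python's standard library. This is a rough sketch covering well-formedness only, not DTD or XSD validation; `check_directory` is a hypothetical helper name.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def check_directory(directory):
    """Map each *.xml file under directory to None (OK) or a parser error message."""
    results = {}
    for path in sorted(Path(directory).rglob("*.xml")):
        try:
            ET.parse(path)
            results[path] = None
        except ET.ParseError as exc:
            results[path] = str(exc)
    return results

def print_report(results):
    """Print one line per file, similar in spirit to a batch xmllint run."""
    for path, error in results.items():
        print(f"{path}: {'OK' if error is None else error}")
```

Calling `print_report(check_directory("."))` scans the current directory recursively and lists every file with either OK or the parser's error message.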
Using XMLStarlet
XMLStarlet is another versatile command-line tool for XML manipulation:
# Validate against XSD
xmlstarlet val --xsd schema.xsd document.xml
# Format XML
xmlstarlet fo document.xml
# Format with 4-space indentation (use -t instead to indent with tabs)
xmlstarlet fo -s 4 document.xml
# Check well-formedness
xmlstarlet val document.xml
Common XML Validation Errors
Understanding common validation errors helps you fix issues faster:
| Error | Cause | Fix |
|---|---|---|
| "Opening and ending tag mismatch" | Element not properly closed | Add closing tag or fix nesting |
| "Document is empty" | Input is empty or contains no root element | Make sure the file has content and a single root element |
| "Attribute value must be quoted" | Attribute value missing quotes | Add double or single quotes around values |
| "Invalid character found" | Control characters or invalid Unicode | Remove non-XML characters |
| "Element type must be followed by attribute specifications" | Attribute syntax error | Fix attribute list for the element |
| "Content is not allowed in prolog" | Text or a stray byte order mark (BOM) before the XML declaration | Remove everything before <?xml ...?> |
| "Schema validation error: Element '...' is not valid" | Value or structure violates schema | Check schema constraints on the element |
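Exact wording differs between parsers, but most report a line and column along with the message, which is usually the fastest way to find the problem. The sketch below uses the `position` attribute of `ParseError` in Python's `xml.etree.ElementTree`; the `describe_error` helper is hypothetical, and the messages it prints come from the expat parser rather than from the tools in the table above.

```python
import xml.etree.ElementTree as ET

def describe_error(xml_text):
    """Return None if xml_text parses, else the error plus the offending line."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        line, column = exc.position           # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        source_line = lines[line - 1] if 0 < line <= len(lines) else ""
        # Show the message, the source line, and a caret under the reported column.
        return f"{exc}\n{source_line}\n{' ' * column}^"

broken = "<root>\n  <item>\n</root>"
print(describe_error(broken))
```

Here the parser reports the mismatch at the closing `</root>` tag on line 3, which is where it first notices that `<item>` was never closed.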
Free Online Tools
The Help2Code XML Formatter tool provides a comprehensive set of XML processing features in your browser:
- Format XML: Add proper indentation and line breaks for readability
- Validate XML: Check well-formedness and identify syntax errors with line-specific error messages
- Validate against XSD: Upload an XSD schema and validate your XML against it
- Minify XML: Remove all whitespace to produce compact output for transmission
- Tree view: Navigate complex XML documents with an interactive collapsible tree
- Download: Save the formatted or validated output as a file
All processing happens client-side in your browser, meaning no XML data is sent to any server. This is important for sensitive XML documents containing proprietary or personal information.
Best Practices for XML Management
- Always validate XML against a schema in production systems. Schema validation catches structural errors that can cause silent data corruption.
- Version your schemas. As your XML format evolves, maintain backward compatibility and document changes.
- Use XML namespaces to avoid element name conflicts when combining data from multiple sources.
- Keep XML well-structured — avoid deep nesting (more than 5-6 levels), which makes documents hard to read and process.
- Use consistent formatting across your team. Agree on indentation style, attribute ordering, and line width.
- Avoid mixed content (text mixed with child elements) unless absolutely necessary — it complicates processing.
- Consider CDATA for large text blocks containing special characters like < and & that would otherwise need escaping.
- Use entity references for common special characters: &amp; for &, &lt; for <, &gt; for >, &quot; for ", &apos; for '.
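When XML is generated from code, escaping is best left to a library rather than done by hand. A short sketch with Python's standard library follows; the sample string is invented for illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

text = 'if a < b && name == "x"'

# escape() replaces &, < and > with entity references for use in element text.
escaped = escape(text)
print(escaped)

# quoteattr() escapes the value and adds surrounding quotes for use as an attribute.
attribute = quoteattr(text)
print(attribute)

# ElementTree applies the escaping automatically when serializing.
elem = ET.Element("code")
elem.text = text
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)
```

Parsing the serialized element back returns the original string, so the escaping is invisible to consumers of the data.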
Conclusion
XML validation and formatting are essential skills for working with XML-based systems. Whether you are debugging a configuration file, developing a SOAP API, processing RSS feeds, or creating document templates, understanding how to validate against schemas and format for readability will save time and prevent errors. Use the Help2Code XML Formatter tool for quick online validation and formatting, and keep command-line tools like xmllint in your toolbox for automated and batch processing tasks.