XML Escape Characters: Complete Reference
XML uses specific characters as markup delimiters. If your text contains these characters, you must escape them to prevent the XML parser from misinterpreting them as markup. Unlike HTML, which has hundreds of named entities, XML defines only five predefined entities. Any other character can be written directly if the document encoding supports it, or as a numeric character reference.
The Five Predefined XML Entities
| Character | Entity | Numeric | Description |
|---|---|---|---|
| < | &lt; | &#60; | Less than (starts a tag) |
| > | &gt; | &#62; | Greater than (ends a tag) |
| & | &amp; | &#38; | Ampersand (starts an entity) |
| ' | &apos; | &#39; | Apostrophe / single quote |
| " | &quot; | &#34; | Double quote |
These five entities cover the characters that have special meaning in XML markup. Unlike HTML 4, XML also defines &apos; as a standard entity (HTML5 later added it). Every XML parser must recognise these five entities.
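As a quick check of the five entities, the following Python sketch parses a fragment containing all of them with the standard library's xml.etree.ElementTree and compares the result with the numeric forms. The element name t is only a placeholder chosen for illustration.

```python
import xml.etree.ElementTree as ET

# All five predefined entities in one text node; no DTD is required.
fragment = "<t>&lt; &gt; &amp; &apos; &quot;</t>"
decoded = ET.fromstring(fragment).text
print(decoded)

# The numeric references for the same characters decode identically.
numeric = ET.fromstring("<t>&#60; &#62; &#38; &#39; &#34;</t>").text
print(decoded == numeric)
```

Both forms produce the same text, so which one you emit is a matter of readability.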
When Each Character Must Be Escaped
The < character must always be escaped in text content and attribute values. The parser treats any < as the start of a new tag.
The & character must always be escaped unless it starts a valid entity reference. Unescaped & characters cause XML parsing errors.
The > character technically only needs escaping when it appears in the sequence ]]> in ordinary text content, because that sequence is reserved for closing a CDATA section. However, many developers escape it everywhere for consistency.
The " character must be escaped in double-quoted attribute values. In single-quoted attributes, ' must be escaped instead.
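The attribute rules above can be sketched in Python with xml.sax.saxutils.quoteattr, which escapes the value and picks the surrounding quote style for you. The helper book_tag and the book/title names are hypothetical, used here only to show the idea.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def book_tag(title):
    # quoteattr escapes &, < and > and returns the value already wrapped
    # in quotes: double quotes by default, single quotes when the value
    # contains only double quotes, and &quot; when it contains both.
    return "<book title=%s/>" % quoteattr(title)

titles = ['AT&T <special>', 'say "hi"', "it's", 'both " and \'']
for title in titles:
    tag = book_tag(title)
    # Round-trip through a parser to confirm the attribute survives intact.
    assert ET.fromstring(tag).get("title") == title
    print(tag)
```

Letting the library choose the quote style avoids having to track which quote character is currently the delimiter.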
CDATA Sections
When your text contains many characters that would need escaping, use a CDATA section instead. CDATA tells the parser to treat the content as character data, not markup.
<example>
  <![CDATA[
    if (x < y && y > z) {
      console.log("x < y && y > z");
    }
  ]]>
</example>
Everything between <![CDATA[ and ]]> is treated as literal text. The only sequence not allowed inside CDATA is ]]> itself. CDATA is ideal for embedding code samples, JSON strings, or any text with many special characters.
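Because ]]> cannot appear inside a CDATA section, text that contains it has to be split across two sections. The sketch below shows one common way to do that in Python; to_cdata is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    # Close the section after "]]" and reopen it before ">" so the
    # forbidden sequence never appears inside a single section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

code = "if (a[b[0]]> 1 && x < y) { }"
document = "<example>" + to_cdata(code) + "</example>"
print(document)

# The parser joins the adjacent sections back into the original text.
assert ET.fromstring(document).text == code
```

If your text never contains ]]>, the helper reduces to simply wrapping it in a single section.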
Numeric Character References
For characters beyond the five predefined entities, use numeric references. Decimal format uses &# followed by the Unicode code point in decimal and a semicolon. Hexadecimal format uses &#x followed by the code point in hexadecimal and a semicolon.
Decimal: © → &#169;
Hex: © → &#xA9;
Decimal: € → &#8364;
Hex: € → &#x20AC;
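The two formats can be generated from a character's code point. Here is a minimal Python sketch; numeric_refs is a hypothetical helper name, and the sample string is made up for illustration.

```python
def numeric_refs(ch):
    # ord() gives the Unicode code point; format it in decimal and in hex.
    code_point = ord(ch)
    return "&#%d;" % code_point, "&#x%X;" % code_point

for ch in "©€":
    print(ch, *numeric_refs(ch))

# The built-in 'xmlcharrefreplace' error handler produces decimal
# references for every character the target encoding cannot represent.
ascii_safe = "café €5".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_safe)
```

The encode approach is handy when the output must be pure ASCII but the input may contain any Unicode text.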
Escaping in Code
PHP
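// Using DOMDocument: text nodes are escaped automatically on output
$doc = new DOMDocument();
$element = $doc->createElement('message');
$element->appendChild($doc->createTextNode('AT&T <special>'));
$doc->appendChild($element);
echo $doc->saveXML();
// Output (after the XML declaration): <message>AT&amp;T &lt;special&gt;</message>
// Using htmlspecialchars (ENT_XML1 selects XML entities; ENT_QUOTES also escapes both quote characters)
$text = 'AT&T <special>';
echo htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
// Output: AT&amp;T &lt;special&gt;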
Python
import xml.sax.saxutils as saxutils
text = 'AT&T <special>'
escaped = saxutils.escape(text)
print(escaped)
# Output: AT&amp;T &lt;special&gt;
# With additional entities (escape() leaves quotes alone by default)
escaped = saxutils.escape(text, {'"': '&quot;', "'": '&apos;'})
JavaScript (Node.js)
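function escapeXml(str) {
  // Replace & first so the ampersands in the other entities are not escaped again
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}
console.log(escapeXml('AT&T <special>'));
// Output: AT&amp;T &lt;special&gt;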
Java
import org.apache.commons.text.StringEscapeUtils;
String text = "AT&T <special>";
String escaped = StringEscapeUtils.escapeXml11(text);
System.out.println(escaped);
// Output: AT&amp;T &lt;special&gt;
XML vs HTML Escaping
XML is stricter than HTML. In HTML, certain entities like &copy; are recognised by all browsers. In XML, only the five predefined entities are guaranteed. All other named entities must be declared in a DTD (Document Type Definition) or replaced with numeric references.
HTML parsers are also more forgiving. An unescaped & in HTML might render correctly if it is followed by something that does not look like an entity. XML parsers must reject malformed input with an error.
Common Mistakes
Forgetting to escape &. This is the most common XML escaping error. If you write <text>AT&T</text>, the parser reads &T as the start of an entity reference, finds no closing semicolon, and reports a well-formedness error. Write <text>AT&amp;T</text> instead.
Using HTML entities in XML. Entities like &copy;, &mdash;, and &nbsp; are not defined in standard XML. Use their numeric equivalents &#169;, &#8212;, and &#160; instead.
Over-escaping. If you escape content that is already inside a CDATA section, the entities appear literally. CDATA content should not be escaped.
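Each of these mistakes is easy to reproduce. The Python sketch below uses the standard library's xml.etree.ElementTree; the parses helper and the sample fragments are hypothetical, written only to illustrate the three cases.

```python
import xml.etree.ElementTree as ET

def parses(fragment):
    """Return True if the fragment is well-formed XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

print(parses("<text>AT&T</text>"))         # unescaped ampersand
print(parses("<text>AT&amp;T</text>"))     # escaped ampersand
print(parses("<text>&copy; 2024</text>"))  # HTML entity with no DTD
print(parses("<text>&#169; 2024</text>"))  # numeric reference

# Escaping inside CDATA is not undone by the parser.
over_escaped = ET.fromstring("<text><![CDATA[a &lt; b]]></text>").text
print(over_escaped)
```

The first and third fragments are rejected, while the over-escaped CDATA content comes back with the entity text still in place.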
Online Tool
The XML Escape & Unescape tool on Help2Code converts text to XML-safe format and back. It handles the five predefined entities and numeric character references.
Conclusion
XML escaping is straightforward once you remember the five predefined entities and understand when to use CDATA sections. Use your language's XML library for automatic escaping in production code, and use the XML Escape & Unescape tool for quick conversions.