How to Encode and Decode HTML Entities
Encoding and decoding HTML entities is a routine task for any web developer. You encode when you need to display special characters safely in HTML, and you decode when you need to turn HTML content back into plain text. Most languages cover both operations in their standard library or a widely used package, but the exact function names and behaviours vary.
When to Encode vs Decode
Encoding converts special characters to their entity references. You encode data before inserting it into HTML to prevent XSS attacks and ensure correct rendering. Decoding converts entity references back to characters. You decode when extracting text from HTML content, processing RSS feeds, or reading HTML email bodies.
The critical rule: encode when outputting to HTML, never when storing in the database. Store raw text, encode at render time.
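As a minimal sketch of that rule, assume a hypothetical in-memory list `comments` standing in for a database table and two hypothetical helpers, `save_comment` and `render_comment`. The raw text is stored untouched and encoded only at the moment the HTML is built:

```python
import html

# Hypothetical stand-in for a database table: holds raw, unencoded text.
comments = []

def save_comment(text):
    """Store the user's text exactly as submitted (no encoding)."""
    comments.append(text)

def render_comment(text):
    """Encode at render time, just before the text is placed into HTML."""
    return "<p>" + html.escape(text, quote=True) + "</p>"

save_comment('AT&T "special" <offer>')
print(comments[0])                  # AT&T "special" <offer>
print(render_comment(comments[0]))  # <p>AT&amp;T &quot;special&quot; &lt;offer&gt;</p>
```

Because the stored value stays raw, the same record can later be rendered into a different context (JSON, plain-text email) without first undoing an HTML encoding.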
PHP
PHP provides htmlspecialchars and htmlentities for encoding, plus htmlspecialchars_decode and html_entity_decode for the reverse direction.
// Encode special characters (HTML5)
$text = 'AT&T "special" <offer>';
echo htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
// Encode ALL entities (including accented characters)
echo htmlentities($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
// Output is identical for this string; a value such as 'café' would become caf&eacute;
// Decode entities back to characters
$encoded = 'AT&amp;T &quot;special&quot;';
echo htmlspecialchars_decode($encoded, ENT_QUOTES);
// Output: AT&T "special"
echo html_entity_decode('&amp; &lt; &gt; &quot;', ENT_QUOTES, 'UTF-8');
// Output: & < > "
Use htmlspecialchars for most cases — it handles the five essential characters. Use htmlentities only when you need to encode all characters with HTML entity equivalents, such as accented letters.
JavaScript (Browser)
In the browser, the simplest approach is to use the DOM API.
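// Encode
function encodeHtml(str) {
  const div = document.createElement('div');
  div.appendChild(document.createTextNode(str));
  return div.innerHTML;
}
console.log(encodeHtml('AT&T "special" <offer>'));
// Output: AT&amp;T "special" &lt;offer&gt;
// Note: text-node serialization escapes &, <, and > but not quotes,
// so the result is safe for element content only, not for attribute values.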
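// Decode
function decodeHtml(str) {
  // DOMParser builds an inert document, so markup inside str cannot run scripts or event handlers
  const doc = new DOMParser().parseFromString(str, 'text/html');
  return doc.documentElement.textContent;
}
console.log(decodeHtml('AT&amp;T &quot;special&quot;'));
// Output: AT&T "special"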
JavaScript (Node.js)
Node.js does not have a DOM, so you need to use the he package or the entities package.
const he = require('he');
// Encode
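console.log(he.encode('AT&T "special" <offer>', { useNamedReferences: true }));
// Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
// (without useNamedReferences, he.encode emits hexadecimal references such as &#x26;)
// Decode
console.log(he.decode('AT&amp;T &quot;special&quot;'));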
// Output: AT&T "special"
Python
Python's html module handles both encoding and decoding.
import html
# Encode
text = 'AT&T "special" <offer>'
safe = html.escape(text, quote=True)
print(safe)
# Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
# Decode
encoded = 'AT&amp;T &quot;special&quot;'
decoded = html.unescape(encoded)
print(decoded)
# Output: AT&T "special"
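The behaviour of the `quote` parameter and the kinds of references `html.unescape` accepts are easy to misremember. This short sketch uses only the standard `html` module, and the sample strings are illustrative:

```python
import html

sample = "it's \"quoted\""

# quote=False leaves both quote characters alone (acceptable for text nodes only).
print(html.escape(sample, quote=False))  # it's "quoted"

# quote=True is the default and also encodes " and '.
print(html.escape(sample))               # it&#x27;s &quot;quoted&quot;

# unescape accepts named, decimal, and hexadecimal references alike.
print(html.unescape("&eacute; &#233; &#xE9;"))  # é é é
```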
Ruby
Ruby's CGI module and ERB::Util provide encoding functions.
require 'cgi'
require 'erb'
# Encode with CGI
encoded = CGI.escapeHTML('AT&T "special" <offer>')
puts encoded
# Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
# Encode with ERB::Util
encoded = ERB::Util.html_escape('AT&T "special" <offer>')
puts encoded
# Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
# Decode
decoded = CGI.unescapeHTML('AT&amp;T &quot;special&quot;')
puts decoded
# Output: AT&T "special"
Java
The Java standard library has no HTML escaping utility, so the usual choice is StringEscapeUtils from the Apache Commons Text library.
import org.apache.commons.text.StringEscapeUtils;

public class HtmlEntities {
    public static void main(String[] args) {
        String text = "AT&T \"special\" <offer>";
        // Encode
        String encoded = StringEscapeUtils.escapeHtml4(text);
        System.out.println(encoded);
        // Output: AT&amp;T &quot;special&quot; &lt;offer&gt;
        // Decode
        String decoded = StringEscapeUtils.unescapeHtml4(encoded);
        System.out.println(decoded);
        // Output: AT&T "special" <offer>
    }
}
Online Tool
For quick one-off conversions, use the HTML Entity Encoder & Decoder tool. Paste your text, choose encode or decode, and copy the result. It handles all named and numeric entities.
Common Pitfalls
Double encoding occurs when you encode already-encoded text. If &amp; is encoded again, it becomes &amp;amp;, and the browser then displays the literal text &amp; instead of &. Always check whether your framework automatically encodes output before adding manual encoding.
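The effect is easy to reproduce. This sketch uses Python's `html` module, and the variable names are illustrative:

```python
import html

raw = "AT&T"
once = html.escape(raw)     # correct: encoded exactly once
twice = html.escape(once)   # the mistake: encoding already-encoded text

print(once)    # AT&amp;T
print(twice)   # AT&amp;amp;T

# A browser decodes only one level, so the user would see "AT&amp;T".
print(html.unescape(twice))  # AT&amp;T
```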
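Missing the quote flag in PHP. The ENT_QUOTES flag encodes both single and double quotes. Before PHP 8.1 the default was ENT_COMPAT, which leaves single quotes unencoded and lets a value break out of a single-quoted attribute. Pass ENT_QUOTES explicitly so the behaviour does not depend on the PHP version.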
Encoding for the wrong context. HTML entity encoding is correct for HTML body and attribute contexts but wrong for URLs, JavaScript, and CSS. Use URL encoding, JavaScript escaping, or CSS escaping respectively.
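To make the difference between contexts concrete, this sketch encodes one value three ways using only the Python standard library. The sample value and the example.com URL are placeholders. CSS escaping is left out because the standard library has no helper for it:

```python
import html
import json
from urllib.parse import quote

value = 'Tom & "Jerry" <3'

# HTML body or attribute value: entity encoding.
print(html.escape(value))
# Tom &amp; &quot;Jerry&quot; &lt;3

# URL query parameter: percent-encoding (safe="" also encodes "/").
print("https://example.com/search?q=" + quote(value, safe=""))
# https://example.com/search?q=Tom%20%26%20%22Jerry%22%20%3C3

# JavaScript string literal: JSON encoding produces a valid quoted string.
print("const name = " + json.dumps(value) + ";")
# const name = "Tom & \"Jerry\" <3";
```

Note that the JSON form is only a sketch of string-literal escaping. A value placed inside an inline script element needs further handling for sequences such as `</script>`.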
Conclusion
HTML entity encoding and decoding are well supported across all major programming languages. Use your language's standard library functions, or an established library where none exist, for production code and an online tool for quick tasks. Always encode at render time, never at storage time.