public abstract class HtmlUtils
extends java.lang.Object
Escapes and unescapes based on the W3C HTML 4.01 recommendation, handling character entity references.
Reference: http://www.w3.org/TR/html4/charset.html
For a comprehensive set of String escaping utilities, consider Apache Commons Text and its StringEscapeUtils class.
We do not use that class here in order to avoid a runtime dependency on Commons Text just for HTML escaping.
Furthermore, Spring's HTML escaping is more flexible and 100% HTML 4.0 compliant.
Modifier and Type | Field and Description |
---|---|
private static HtmlCharacterEntityReferences | characterEntityReferences: Shared instance of pre-parsed HTML character entity references. |
Constructor and Description |
---|
HtmlUtils() |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | htmlEscape(java.lang.String input): Turn special characters into HTML character references. |
static java.lang.String | htmlEscape(java.lang.String input, java.lang.String encoding): Turn special characters into HTML character references. |
static java.lang.String | htmlEscapeDecimal(java.lang.String input): Turn special characters into HTML character references. |
static java.lang.String | htmlEscapeDecimal(java.lang.String input, java.lang.String encoding): Turn special characters into HTML character references. |
static java.lang.String | htmlEscapeHex(java.lang.String input): Turn special characters into HTML character references. |
static java.lang.String | htmlEscapeHex(java.lang.String input, java.lang.String encoding): Turn special characters into HTML character references. |
static java.lang.String | htmlUnescape(java.lang.String input): Turn HTML character references into their plain text UNICODE equivalent. |
private static final HtmlCharacterEntityReferences characterEntityReferences
public static java.lang.String htmlEscape(java.lang.String input)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding
entity reference (e.g. &lt;).
input
- the (unescaped) input string
public static java.lang.String htmlEscape(java.lang.String input, java.lang.String encoding)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding
entity reference (e.g. &lt;), at least as required by the
specified encoding. In other words, if a special character does
not have to be escaped for the given encoding, it may be left unescaped.
input
- the (unescaped) input string
encoding
- the name of a supported charset
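To illustrate the encoding-aware entity escaping described for the two htmlEscape variants, here is a minimal sketch in plain Java. It is not Spring's implementation. The class name EntityEscapeSketch, the deliberately tiny entity tables, and the rule that a charset whose name starts with "UTF-" needs only markup characters escaped are all assumptions made for this example.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of entity-reference escaping; not Spring's HtmlUtils. */
class EntityEscapeSketch {

    // Characters that are significant in HTML markup and are always escaped.
    private static final Map<Character, String> MARKUP = Map.of(
            '<', "lt", '>', "gt", '&', "amp", '"', "quot");

    // A tiny sample of HTML 4.01 named entities for non-ASCII characters.
    private static final Map<Character, String> NAMED = Map.of(
            '\u00E9', "eacute", '\u00A9', "copy", '\u00E5', "aring");

    static String escape(String input, String encoding) {
        // Assumption: UTF-* charsets can carry non-ASCII characters as is.
        boolean unicodeCapable =
                encoding.toUpperCase(Locale.ROOT).startsWith("UTF-");
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String name = MARKUP.get(c);
            if (name == null && !unicodeCapable) {
                name = NAMED.get(c);
            }
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, escaping the text caf\u00E9 for "UTF-8" leaves the accented character untouched, while escaping it for "US-ASCII" yields caf&eacute;. Markup characters such as < are escaped in both cases.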
public static java.lang.String htmlEscapeDecimal(java.lang.String input)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding numeric reference in decimal format (&#Decimal;).
input
- the (unescaped) input string
public static java.lang.String htmlEscapeDecimal(java.lang.String input, java.lang.String encoding)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding numeric reference in decimal format (&#Decimal;), at least as required by the specified encoding. In other words, if a special character does not have to be escaped for the given encoding, it may be left unescaped.
input
- the (unescaped) input string
encoding
- the name of a supported charset
public static java.lang.String htmlEscapeHex(java.lang.String input)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding numeric reference in hex format (&#xHex;).
input
- the (unescaped) input string
public static java.lang.String htmlEscapeHex(java.lang.String input, java.lang.String encoding)
Handles complete character set defined in HTML 4.01 recommendation.
Escapes all special characters to their corresponding numeric reference in hex format (&#xHex;), at least as required by the specified encoding. In other words, if a special character does not have to be escaped for the given encoding, it may be left unescaped.
input
- the (unescaped) input string
encoding
- the name of a supported charset
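The decimal and hex variants differ only in how the character's code point is written. The following is a minimal sketch in plain Java, not Spring's implementation. The class name NumericEscapeSketch and the simplified rule for deciding what to escape (the four markup characters plus anything outside ASCII) are assumptions for illustration. The real class consults its table of HTML 4.01 entity references instead.

```java
/** Hypothetical sketch of numeric character references; not Spring's HtmlUtils. */
class NumericEscapeSketch {

    static String escapeDecimal(String input) {
        return escape(input, false);
    }

    static String escapeHex(String input) {
        return escape(input, true);
    }

    private static String escape(String input, boolean hex) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        // Iterate by code point so surrogate pairs become a single reference.
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (needsEscape(cp)) {
                out.append(hex ? "&#x" : "&#")
                   .append(hex ? Integer.toHexString(cp) : Integer.toString(cp))
                   .append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Simplified assumption: markup characters and all non-ASCII code points.
    private static boolean needsEscape(int cp) {
        return cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp > 127;
    }
}
```

For the input a<b, escapeDecimal produces a&#60;b and escapeHex produces a&#x3c;b, since < is code point 60 (hex 3c).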
public static java.lang.String htmlUnescape(java.lang.String input)
Handles complete character set defined in HTML 4.01 recommendation and all reference types (decimal, hex, and entity).
Correctly converts the following formats:
&Entity; - (Example: &amp;) case sensitive
&#Decimal; - (Example: &#68;)
&#xHex; - (Example: &#xE5;) case insensitive
Gracefully handles malformed character references by copying original characters as is when encountered.
input
- the (escaped) input string
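The unescaping rules above (three reference formats, case-sensitive entity names, malformed references copied through unchanged) can be sketched in plain Java as follows. This is an illustrative approximation, not Spring's implementation. The class name UnescapeSketch and its five-entry entity table are assumptions, and the numeric parsing is more lenient than a strict decoder would be.

```java
import java.util.Map;

/** Hypothetical sketch of reference unescaping; not Spring's HtmlUtils. */
class UnescapeSketch {

    // Tiny sample of named entities; lookups are case sensitive.
    private static final Map<String, Character> NAMED = Map.of(
            "lt", '<', "gt", '>', "amp", '&', "quot", '"', "aring", '\u00E5');

    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (pos < input.length()) {
            int amp = input.indexOf('&', pos);
            if (amp < 0) {
                out.append(input, pos, input.length());
                break;
            }
            out.append(input, pos, amp);
            int semi = input.indexOf(';', amp + 1);
            String decoded =
                    semi < 0 ? null : decode(input.substring(amp + 1, semi));
            if (decoded != null) {
                out.append(decoded);
                pos = semi + 1;
            } else {
                // Malformed or unknown reference: copy the '&' and move on.
                out.append('&');
                pos = amp + 1;
            }
        }
        return out.toString();
    }

    // Returns the decoded text, or null if the body is not a valid reference.
    private static String decode(String body) {
        try {
            if (body.startsWith("#x") || body.startsWith("#X")) {
                int cp = Integer.parseInt(body.substring(2), 16);
                return new String(Character.toChars(cp));
            }
            if (body.startsWith("#")) {
                int cp = Integer.parseInt(body.substring(1));
                return new String(Character.toChars(cp));
            }
        } catch (IllegalArgumentException ex) {
            // Covers NumberFormatException and invalid code points.
            return null;
        }
        Character named = NAMED.get(body);
        return named == null ? null : named.toString();
    }
}
```

In this sketch, &lt;, &#68; and &#xE5; decode to <, D and \u00E5 respectively, whereas inputs such as AT&T, &bogus; or &#xZZ; pass through unchanged.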