Installation
Quick Take
API — rEntDecode()
The main function `rEntDecode()` is imported like this:
It’s a function which takes two input arguments:
Input argument | Type | Obligatory | Description |
---|---|---|---|
input | String | yes | HTML source |
opts | Plain object | no | The Optional Options Object |
The Optional Options Object has the following shape:
The Optional Options Object completely matches the `he.js` options as of `v1.1.1`:
Key | Type | Default | Description |
---|---|---|---|
isAttributeValue | Boolean | false | If on, entities will be decoded as if they were in attribute values. If off (default), entities will be decoded as if they were in HTML text. See the `he.js` documentation for details. |
strict | Boolean | false | If on, entities that can cause parsing errors will cause the function to throw. See the `he.js` documentation for details. |
Here are all defaults in one place for copying:
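Going only by the options table above, the defaults would look roughly like this. It is a sketch reconstructed from the table, not a copy of the library's source:

```javascript
// Default option values as listed in the options table above.
const defaults = {
  isAttributeValue: false, // decode as HTML text, not as an attribute value
  strict: false, // do not throw on entities that can cause parsing errors
};
```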
The function returns ranges: either `null` or an array of one or more range arrays.
API — defaults
You can import `defaults`:
It's a plain object:
The main function calculates the options to be used by merging the options you passed with these defaults.
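The merge described above can be pictured with an object spread. This is a minimal sketch: `resolveOpts` is a hypothetical helper name, and the library's actual merging code may differ.

```javascript
// Defaults as documented in the options table.
const defaults = { isAttributeValue: false, strict: false };

// Hypothetical helper: keys passed by the caller override the defaults,
// and keys the caller omits keep their default values.
function resolveOpts(opts) {
  return { ...defaults, ...opts };
}

const resolved = resolveOpts({ strict: true });
// resolved.strict is true, resolved.isAttributeValue stays false
```

Spreading into a fresh object also leaves `defaults` itself unmodified, so repeated calls cannot leak one caller's options into another's.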
API — version
You can import `version`:
More on the algorithm
The hardest part to implement, and the main selling point of this library, is decoding recursively and returning the result as ranges.
By recursively, we mean that the input string is decoded over and over until a decoding pass no longer changes the result. In practice, this lets us handle the unlikely but possible cases of double- and triple-encoded strings. For example, `&amp;mdash;` is a double-encoded string. The original m-dash was turned into `&mdash;` in the first encoding round; then, in the second round, its ampersand was turned into `&amp;`, which led to `&amp;mdash;`.
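The "decode until nothing changes" idea can be sketched as a fixed-point loop. In the sketch below, `decodeOnce` is a tiny stand-in that knows only three entities (a real implementation would call a full decoder such as `he.js`), and `decodeRecursively` is a hypothetical name. It returns a string for simplicity, whereas the library returns ranges.

```javascript
// Stand-in entity table; a real decoder knows the full HTML entity list.
const ENTITIES = { "&amp;": "&", "&mdash;": "\u2014", "&lt;": "<" };

// One decoding pass over the string.
function decodeOnce(str) {
  return str.replace(/&(?:amp|mdash|lt);/g, (match) => ENTITIES[match]);
}

// Repeat decoding passes until a pass leaves the string unchanged.
function decodeRecursively(str) {
  let previous = str;
  let current = decodeOnce(previous);
  while (current !== previous) {
    previous = current;
    current = decodeOnce(previous);
  }
  return current;
}

// Double-encoded input: "&amp;mdash;" becomes "&mdash;", then the m-dash itself.
const decoded = decodeRecursively("&amp;mdash;");
```

The loop always terminates because every pass that changes the string also makes it shorter.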
By ranges, we mean that the result is not a decoded string but a set of instructions: what to change in the string in order to decode it. In practice, this means we decode without losing the original character indexes. In turn, this means we can gather more “instructions” (ranges) and join them later.
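To make the idea of ranges as instructions concrete, here is a minimal sketch of applying them to a string. It assumes each range has the shape `[from, to, replacement]`, which this section does not spell out, and `applyRanges` is a hypothetical helper, not part of this package.

```javascript
// Apply ranges of the assumed shape [from, to, replacement] to a string.
// Ranges are applied from the last to the first, so the indexes of
// earlier ranges are still valid when their turn comes.
function applyRanges(str, ranges) {
  if (ranges === null) return str; // null means "nothing to change"
  const sorted = [...ranges].sort((a, b) => b[0] - a[0]);
  let result = str;
  for (const [from, to, replacement = ""] of sorted) {
    result = result.slice(0, from) + replacement + result.slice(to);
  }
  return result;
}

// Two instructions against the original indexes of the source string.
const source = "a &amp; b &lt; c";
const output = applyRanges(source, [
  [2, 7, "&"],
  [10, 14, "<"],
]);
// output is "a & b < c"
```

Because every range refers to indexes in the original string, ranges gathered by different tools can be combined first and applied in a single step. The sketch assumes the ranges do not overlap.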
Where’s encode?
If you wonder where `encode()` is in ranges: we don't need it! When you traverse the string and gather ranges, you can pass each grapheme (where an emoji of length six should be counted as “one”) through the `he.js` encode, compare “before” and “after”, and, if the two differ, create a new range for it.
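The traversal described above might look like the sketch below. Several things here are assumptions: `encodeChar` is a stand-in that handles only four characters (a real implementation would call the `he.js` encoder), `encodeToRanges` is a hypothetical name, ranges are assumed to have the shape `[from, to, replacement]`, and the loop walks code points rather than full graphemes. A grapheme-aware variant could swap in `Intl.Segmenter`.

```javascript
// Stand-in encoder table; a real encoder covers far more characters.
const NAMED = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\u2014": "&mdash;" };

// Encode one unit of text, returning it unchanged if no entity applies.
function encodeChar(ch) {
  return NAMED[ch] ?? ch;
}

// Walk the string, compare "before" and "after", and record a range
// for every unit that the encoder changed.
function encodeToRanges(str) {
  const ranges = [];
  let index = 0; // position in UTF-16 code units, as used by String#slice
  for (const ch of str) {
    // for...of yields whole code points, so astral characters are not split
    const encoded = encodeChar(ch);
    if (encoded !== ch) {
      ranges.push([index, index + ch.length, encoded]);
    }
    index += ch.length;
  }
  return ranges.length ? ranges : null;
}

const found = encodeToRanges("a < b");
// found is [[2, 3, "&lt;"]]
```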
`decode()` is not that simple: the input string has to be processed as a whole, because you can't iterate grapheme-by-grapheme (or character-by-character, if you don't care about Unicode's astral characters).