ranges-ent-decode open source npm package

Installation

Install the package from npm under the name `ranges-ent-decode`.

Quick Take

API — `rEntDecode()`

The main function `rEntDecode()` is imported by name: `import { rEntDecode } from "ranges-ent-decode";`

It’s a function which takes two input arguments:

| Input argument | Type | Obligatory | Description |
| --- | --- | --- | --- |
| `input` | String | yes | HTML source |
| `opts` | Plain object | no | The Optional Options Object |

The Optional Options Object completely matches the he.js options as of v1.1.1 and has the following keys:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `isAttributeValue` | Boolean | `false` | If on, entities will be decoded as if they were in attribute values. If off (default), entities will be decoded as if they were in HTML text. Read more in the he.js documentation. |
| `strict` | Boolean | `false` | If on, entities that can cause parsing errors will cause `throw`s. Read more in the he.js documentation. |

Here are all defaults in one place for copying: `{ isAttributeValue: false, strict: false }`

The function returns ranges: either `null` or an array of one or more range arrays.
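To make the return value more concrete, here is a minimal sketch of how a list of ranges could be applied to a string. It assumes each range is a `[from, to, replacement]` triplet of indexes into the original string. `applyRanges` is a hypothetical helper written for this illustration, not part of this package, and the ranges below are written by hand rather than produced by `rEntDecode()`.

```javascript
// Hypothetical helper: applies [from, to, replacement] ranges to a string.
// Assumes the ranges are sorted by position and do not overlap.
function applyRanges(str, ranges) {
  if (ranges === null) {
    return str; // null means "nothing to change"
  }
  let result = "";
  let lastEnd = 0;
  for (const [from, to, replacement = ""] of ranges) {
    // copy the untouched part, then the replacement
    result += str.slice(lastEnd, from) + replacement;
    lastEnd = to;
  }
  return result + str.slice(lastEnd);
}

// Hand-written ranges for illustration: each one covers an encoded entity.
const source = "a &gt; b &amp; c";
const ranges = [
  [2, 6, ">"],
  [9, 14, "&"],
];
const decoded = applyRanges(source, ranges);
console.log(decoded); // "a > b & c"
```

The source string is never mutated while ranges are being gathered, which is why the indexes in every range stay valid until the moment they are applied.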

API — `defaults`

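You can import `defaults` by name: `import { defaults } from "ranges-ent-decode";`

It's a plain object: `{ isAttributeValue: false, strict: false }`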

The main function calculates the options to be used by merging the options you passed with these defaults.
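The merge described here can be sketched with object spread. The default values are copied from the options table; `defaultOpts` and `resolveOpts` are hypothetical names used only for this illustration, and the package's internal merging may differ in detail.

```javascript
// Default values as listed in the options table.
const defaultOpts = {
  isAttributeValue: false,
  strict: false,
};

// Hypothetical helper: keys passed by the caller override the defaults,
// keys that were not passed fall back to the defaults.
function resolveOpts(userOpts) {
  return { ...defaultOpts, ...userOpts };
}

const resolved = resolveOpts({ strict: true });
console.log(resolved); // { isAttributeValue: false, strict: true }
```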

API — `version`

You can import `version`, the package's version string: `import { version } from "ranges-ent-decode";`

More on the algorithm

The biggest pain to code and the main USP of this library is being able to recursively decode and give the result as ranges.

By recursively, we mean, the input string is decoded over and over until there’s no difference in the result between the previous and the last decoding. Practically, this means we can tackle the unlikely, but possible cases of double- and triple-encoded strings. For example, this is a double-encoded string: `&amp;mdash;`. The original m-dash was turned into `&mdash;` on the first encoding round; then, during the second round, its ampersand got turned into `&amp;`, which led to `&amp;mdash;`.
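The "decode until nothing changes" idea can be sketched as follows. This is only an illustration under stated assumptions: the three-entry `knownEntities` map is a stand-in for a real decoder such as he.js, and `decodeOnce` and `decodeRecursively` are hypothetical names, not functions of this package.

```javascript
// Tiny stand-in for a real decoder such as he.js; covers three entities only.
const knownEntities = {
  "&amp;": "&",
  "&mdash;": "\u2014",
  "&gt;": ">",
};

// One decoding pass: replaces every known entity with its character.
// Text produced by a replacement is not rescanned within the same pass.
function decodeOnce(str) {
  return str.replace(/&[a-z]+;/g, (match) => knownEntities[match] ?? match);
}

// Repeat passes until the output stops changing.
function decodeRecursively(str) {
  let previous = str;
  let current = decodeOnce(previous);
  while (current !== previous) {
    previous = current;
    current = decodeOnce(previous);
  }
  return current;
}

const doubleEncoded = "&amp;mdash;";
console.log(decodeOnce(doubleEncoded)); // "&mdash;" (still encoded once)
console.log(decodeRecursively(doubleEncoded)); // the m-dash character itself
```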

By ranges we mean, the result is not a decoded string, but instructions — what to change in that string in order for the string to be decoded. Practically, this means, we decode and don’t lose the original character indexes. In turn, this means, we can gather more “instructions” (ranges) and join them later.
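Below is a rough sketch of both points: decoding into ranges that refer to the original indexes, and joining those ranges with ranges from an unrelated step. It assumes `[from, to, replacement]` triplets; `entityMap`, `decodeChunk` and `decodeToRanges` are hypothetical stand-ins written for this example and do not reflect how the package is implemented.

```javascript
// Tiny stand-in entity map; a real implementation would rely on he.js.
const entityMap = { "&amp;": "&", "&mdash;": "\u2014", "&lt;": "<" };

// Decodes one matched chunk repeatedly until it stops changing.
function decodeChunk(chunk) {
  let out = chunk;
  let prev;
  do {
    prev = out;
    out = prev.replace(/&[a-z]+;/g, (m) => entityMap[m] ?? m);
  } while (out !== prev);
  return out;
}

// Returns [from, to, replacement] ranges measured against the ORIGINAL
// string, or null when there is nothing to decode.
function decodeToRanges(str) {
  const found = [];
  // "(?:amp;)*" lets a multiply-encoded entity be captured as one chunk.
  for (const match of str.matchAll(/&(?:amp;)*[a-z]+;/g)) {
    const replacement = decodeChunk(match[0]);
    if (replacement !== match[0]) {
      found.push([match.index, match.index + match[0].length, replacement]);
    }
  }
  return found.length ? found : null;
}

const html = "  x &amp;mdash; y &lt; z  ";
const decodeRanges = decodeToRanges(html) || [];

// Ranges from an unrelated step: delete leading and trailing whitespace.
const trimRanges = [
  [0, 2, ""],
  [html.length - 2, html.length, ""],
];

// Every range refers to original indexes, so the two sets can be joined.
const allRanges = [...decodeRanges, ...trimRanges].sort((a, b) => a[0] - b[0]);

// Apply right to left so that earlier indexes stay valid.
let cleaned = html;
for (const [from, to, replacement] of allRanges.slice().reverse()) {
  cleaned = cleaned.slice(0, from) + replacement + cleaned.slice(to);
}
console.log(cleaned); // "x", m-dash, "y < z", separated by single spaces
```

Applying everything in a single pass at the end is what makes it possible to chain several independent operations without re-scanning intermediate strings.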

Where’s encode?

If you're wondering where `encode()` is in ranges: we don’t need it! When you traverse the string and gather ranges, you can pass each ~~code point~~ grapheme (where an emoji of length six should be counted as “one”) through he.js `encode`, compare “before” and “after”, and if the two are different, create a new range for it.
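A minimal sketch of that approach is shown below. To stay self-contained it makes two simplifying assumptions: `encodeSymbol` is a hypothetical stand-in for the he.js encoder, and the loop walks the string by code point rather than by grapheme (true grapheme splitting would need something like `Intl.Segmenter`).

```javascript
// Hypothetical stand-in for the he.js encoder: named entities for a few
// characters, hexadecimal references for other non-ASCII code points.
function encodeSymbol(symbol) {
  const named = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\u2014": "&mdash;" };
  if (named[symbol]) {
    return named[symbol];
  }
  const codePoint = symbol.codePointAt(0);
  return codePoint > 127
    ? `&#x${codePoint.toString(16).toUpperCase()};`
    : symbol;
}

// Compares "before" and "after" for each symbol and records a range
// whenever the two differ.
function encodeToRanges(str) {
  const out = [];
  let index = 0;
  // for...of walks a string by code point, so astral characters stay whole.
  for (const symbol of str) {
    const encoded = encodeSymbol(symbol);
    if (encoded !== symbol) {
      out.push([index, index + symbol.length, encoded]);
    }
    index += symbol.length; // astral symbols take two UTF-16 code units
  }
  return out.length ? out : null;
}

console.log(encodeToRanges("a < b \u2014 \u{1F600}"));
// [[2, 3, "&lt;"], [6, 7, "&mdash;"], [8, 10, "&#x1F600;"]]
```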

`decode()` is not that simple: an encoded entity spans several characters, so the input string has to be processed as a whole and you can’t iterate grapheme-by-grapheme (or character-by-character, if you don’t care about Unicode’s astral characters).

Changelog

Open Changelog

ranges-ent-decode 6.0.3
