detergent open source npm package

Installation

Install the package from npm: `npm i detergent`.

Quick Take

Purpose

detergent prepares text for pasting into HTML, especially email-template HTML. It:

deletes invisible Unicode characters (like ETX)
collapses whitespace
trims
prevents widow words
recursively decodes entities and encodes them back, preferring named HTML entities over numeric ones and switching to numeric for entities which don’t render correctly across common email clients
optionally strips HTML (with optional granular control over which tags exactly)
improves English typographic style: converts M- and N-dashes, apostrophes and curly quotes
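
To make one of these steps concrete, widow-word prevention can be pictured as swapping the last space of a paragraph for a non-breaking space. The sketch below is a deliberately naive illustration and not detergent's actual algorithm; the function name `preventWidow` is made up for this example.

```js
// Naive illustration only: detergent's real widow handling is more involved.
// `preventWidow` is a hypothetical helper, not part of the detergent API.
function preventWidow(paragraph) {
  const lastSpace = paragraph.lastIndexOf(" ");
  // Nothing to do for single-word (or empty) paragraphs
  if (lastSpace === -1) return paragraph;
  // Replace only the final space so the last two words stay on one line
  return (
    paragraph.slice(0, lastSpace) + "&nbsp;" + paragraph.slice(lastSpace + 1)
  );
}

console.log(preventWidow("clean this text"));
// => "clean this&nbsp;text"
```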

Extra features are:

You can skip the HTML encoding of non-Latin language letters. Useful when you are deploying Japanese or Chinese emails because otherwise, everything would be HTML-encoded.
Detergent is both XHTML- and HTML-friendly. You can choose how `<br>` tags are rendered: with a closing slash, `<br/>` (XHTML), or without, `<br>` (HTML). This reduces code validator errors.

API — `det()`

The main function `det()` is a named import: `import { det } from "detergent";`

It takes two input arguments:

Input argument	Type	Obligatory	Description
`str`	String	yes	The string to clean.
`opts`	Object	no	Optional Options Object.

The Optional Options Object has the following shape:

Key	Type	Default	Description
`fixBrokenEntities`	Boolean	`true`	Try to fix broken named HTML entities such as `&nsp;` (“b” missing).
`removeWidows`	Boolean	`true`	Replace the last space in a paragraph with a non-breaking space.
`convertEntities`	Boolean	`true`	Encode all non-ASCII characters.
`convertDashes`	Boolean	`true`	Typographically correct the n- and m-dashes.
`convertApostrophes`	Boolean	`true`	Typographically correct the apostrophes.
`replaceLineBreaks`	Boolean	`true`	Replace all line breaks with `br` tags.
`removeLineBreaks`	Boolean	`false`	Put everything on one line (removes any line breaks, inserting a space where necessary).
`useXHTML`	Boolean	`true`	Add closing slashes to `br` tags.
`dontEncodeNonLatin`	Boolean	`true`	Skip encoding of non-Latin characters (for example, CJK, Alefbet Ivri or Arabic abjad).
`addMissingSpaces`	Boolean	`true`	Add missing spaces after dots, colons and semicolons, unless they are part of a URL.
`convertDotsToEllipsis`	Boolean	`true`	Convert three dots into `…`, the ellipsis character. When set to `false`, all encoded ellipses are converted to three dots.
`stripHtml`	Boolean	`true`	By default, all HTML tags are stripped, except those listed in `opts.stripHtmlButIgnoreTags` (`b`, `strong` and other tags). Set this to `false` to turn off HTML tag removal completely.
`stripHtmlButIgnoreTags`	Array	`["b", "strong", "i", "em", "br", "sup"]`	Zero or more strings, each a tag name that should not be stripped. For example, `["a", "sup"]`.
`stripHtmlAddNewLine`	Array	`["li", "/ul"]`	Zero or more tag names which, if stripped, are replaced with a line break. Closing tags must start with a slash.
`cb`	something falsy or a function	`null`	Callback function to additionally process characters between tags (for example, turning letters uppercase).

Here are all defaults in one place for copying:
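
Collected from the options table above, the defaults can be written out as one plain object. This is assembled from the documented values rather than copied from the package source, and the constant name `documentedDefaults` is only an illustrative label.

```js
// Defaults as listed in the options table of this document.
// `documentedDefaults` is an illustrative name, not an export of the package.
const documentedDefaults = {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"],
  cb: null,
};
```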

The default set is a wise choice for the most common scenario — preparing text to be pasted into HTML.

You can also set the options to numeric `0` or `1`, which is shorter than Boolean `true` or `false`.

Returns

The function returns a plain object (type `Res`):

Key	Type	Description
`res`	String	The cleaned string.
`applicableOpts`	Plain Object	Copy of the options object without the keys that have array values. Each remaining key is set to a boolean reporting whether that option is applicable to the given input.

API — defaults

You can import the defaults, exported as `opts`: `import { opts } from "detergent";`

It's a plain object holding the default value of each option.

The main function calculates the options to be used by merging the options you passed with these defaults.
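
The merge described above can be pictured with object spread. This is a minimal sketch of the general technique, not the package's internal code; `sampleDefaults` is a trimmed, hypothetical stand-in for the real defaults, and `resolveOpts` is a made-up helper name.

```js
// Trimmed, hypothetical stand-in for the package defaults
const sampleDefaults = {
  removeWidows: true,
  convertEntities: true,
  useXHTML: true,
  removeLineBreaks: false,
};

// Keys passed by the caller win; everything else falls back to the defaults.
function resolveOpts(userOpts = {}) {
  return { ...sampleDefaults, ...userOpts };
}

const resolved = resolveOpts({ useXHTML: false });
console.log(resolved.useXHTML); // => false (overridden by the caller)
console.log(resolved.removeWidows); // => true (taken from the defaults)
```

Spreading into a fresh object leaves the defaults themselves untouched, so repeated calls start from the same baseline.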

We also use these for testing purposes, in test-mixer, to generate all possible combinations of this options object.

API — `version`

You can import `version`: `import { version } from "detergent";`

`applicableOpts`

Next-generation web applications are designed to show only the options that are applicable to the given input. This saves the user’s time and also conserves mental resources: you don’t even need to read the labels of options that are not applicable.

detergent currently has 14 option keys, 12 of them boolean. That’s not a lot, but if you use the tool every day, every optimisation counts.

We got the inspiration for this feature while visiting a competitor application, typograf. It has 110 checkboxes grouped into 12 groups, and the options are hidden twice: first, the sidebar is hidden when you visit the page; second, the option groups are collapsed.

Another example of an overwhelming options set is the Kangax minifier, html-minifier, which has 26 options with lots of descriptions.

Detergent tackles this challenge differently. While it processes the given input, it notes whether each option is applicable or not. This is done independently of the actual option settings.

For example, detergent’s output might look like this, with no options applicable because there’s nothing to do on “abc”:

```js
{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}
```

Now the UI, driven by detergent, could grey out those toggles.

The option keys whose values are arrays (for example, `stripHtmlButIgnoreTags` and `stripHtmlAddNewLine`) are omitted from the `applicableOpts` report.
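
A UI could turn such a report into toggle states along these lines. This is a sketch under stated assumptions: `sampleReport` is a hypothetical, shortened `applicableOpts` object, and `toToggleStates` is a made-up helper, not part of detergent.

```js
// Hypothetical, shortened report shaped like `applicableOpts`
const sampleReport = {
  removeWidows: true,
  convertEntities: true,
  convertDashes: false,
  useXHTML: false,
};

// Build one view-model entry per option; a toggle is disabled
// (greyed out) when the option is not applicable to the input.
function toToggleStates(applicableOpts) {
  return Object.entries(applicableOpts).map(([key, applicable]) => ({
    key,
    disabled: !applicable,
  }));
}

const toggles = toToggleStates(sampleReport);
const greyedOut = toggles.filter((t) => t.disabled).map((t) => t.key);
console.log(greyedOut);
// => [ "convertDashes", "useXHTML" ]
```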

`opts.cb`

One of the unique (and complex) features of this program is the HTML tag recognition. We process only the text and don’t touch the tags and their attributes. For example, widow word removal won’t add non-breaking spaces within your tags if you choose not to strip the HTML.

`opts.cb` lets you perform additional operations on all the string characters outside any HTML tags. For example, here we upper-case all non-HTML-tag characters:

```js
import { det } from "detergent";
const { res } = det(`aAa\n\nbBb\n\ncCc`, {
  cb: (str) => str.toUpperCase(),
});
console.log(res);
// => "AAA<br/>\n<br/>\nBBB<br/>\n<br/>\nCCC"
```
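
The idea of touching only the text between tags can be illustrated without the library. The sketch below is a simplified stand-in and not how detergent recognises tags: it relies on a naive regular expression that would misfire on a stray `<` or on `>` inside attribute values, and `mapTextOutsideTags` is a hypothetical helper name.

```js
// Simplified illustration; detergent's own tag recognition is far more robust.
// `mapTextOutsideTags` is a hypothetical helper, not part of the detergent API.
function mapTextOutsideTags(html, cb) {
  return html
    // The capture group makes split() keep the tags in the result:
    // even indexes hold text, odd indexes hold tags.
    .split(/(<[^>]*>)/)
    .map((chunk, i) => (i % 2 === 1 ? chunk : cb(chunk)))
    .join("");
}

const shouted = mapTextOutsideTags('aaa <b class="x">bbb</b>', (str) =>
  str.toUpperCase()
);
console.log(shouted);
// => 'AAA <b class="x">BBB</b>'
```

The tag name and its attributes stay in their original case because the callback never sees them.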

Changelog

Open Changelog
