Installation
Quick Take
Purpose
detergent prepares text for pasting into HTML, especially email-template HTML:
- deletes invisible Unicode characters (like ETX)
- collapses whitespace
- trims
- prevents widow words
- recursively decodes entities and encodes them back, preferring named HTML entities over numeric ones and switching to numeric for entities which don’t render correctly across common email clients
- optionally strips HTML (with optional granular control over which tags exactly)
- improves English typographic style: converts m- and n-dashes, apostrophes and curly quotes
Extra features are:
- You can skip the HTML encoding of non-Latin language letters. Useful when you are deploying Japanese or Chinese emails because otherwise, everything would be HTML-encoded.
- Detergent is both XHTML- and HTML-friendly. You can set which way you want your <br>’s to appear: with a closing slash, <br/> (XHTML), or without, <br> (HTML). That’s to reduce code validator errors.
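The difference between the two <br> styles can be sketched as below. This is only an illustration: lineBreaksToBr is a hypothetical helper, not a function exported by detergent, and it handles nothing but line breaks.

```javascript
// Hypothetical helper, not part of detergent's API.
function lineBreaksToBr(str, useXHTML = true) {
  const br = useXHTML ? "<br/>" : "<br>";
  // Put the tag in front of every line break, keeping the break itself.
  return str.replace(/\r?\n/g, (eol) => `${br}${eol}`);
}

const xhtml = lineBreaksToBr("a\nb"); // "a<br/>\nb"
const html = lineBreaksToBr("a\nb", false); // "a<br>\nb"
console.log(xhtml, html);
```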
API — det()
The main function det() is imported like this:
import { det } from "detergent";
It takes two input arguments:
Input argument | Type | Obligatory | Description |
---|---|---|---|
str | String | yes | The string to clean. |
opts | Object | no | Optional Options Object. |
The Optional Options Object has the following shape:
Key | Type | Default | Description |
---|---|---|---|
fixBrokenEntities | Boolean | True | Try to fix broken named HTML entities like &nsp; (“b” missing). |
removeWidows | Boolean | True | Replace the last space in a paragraph with a non-breaking space. |
convertEntities | Boolean | True | Encode all non-ASCII characters. |
convertDashes | Boolean | True | Typographically correct the n- and m-dashes. |
convertApostrophes | Boolean | True | Typographically correct the apostrophes. |
replaceLineBreaks | Boolean | True | Replace all line breaks with br’s. |
removeLineBreaks | Boolean | False | Put everything on one line (removes any line breaks, inserting a space where necessary). |
useXHTML | Boolean | True | Add closing slashes on br’s. |
dontEncodeNonLatin | Boolean | True | Skip encoding of non-Latin characters (for example, CJK, Alefbet Ivri or Arabic abjad). |
addMissingSpaces | Boolean | True | Add missing spaces after dots, colons and semicolons, unless they are part of a URL. |
convertDotsToEllipsis | Boolean | True | Convert three dots into … (the ellipsis character). When set to false, all encoded ellipses are converted to three dots. |
stripHtml | Boolean | True | By default, all HTML tags are stripped (with the exception of opts.keepBoldEtc, the option to ignore b, strong and other tags). You can turn off HTML tag removal completely here. |
stripHtmlButIgnoreTags | Array | ["b", "strong", "i", "em", "br", "sup"] | List of zero or more strings, each a tag name that should not be stripped. For example, ["a", "sup"]. |
stripHtmlAddNewLine | Array | ["li", "/ul"] | List of zero or more tag names which, if stripped, are replaced with a line break. Closing tags must start with a slash. |
cb | something falsy or a function | null | Callback function to additionally process characters between tags (like turning letters uppercase). |
Here are all defaults in one place for copying:
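The object below is reconstructed from the options table above, not copied from the package source, so treat it as a sketch and check it against the version you have installed. The variable name defaults is an assumption made for illustration.

```javascript
// Reconstructed from the options table; verify against the installed version.
const defaults = {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"],
  cb: null,
};
```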
The default set is a wise choice for the most common scenario — preparing text to be pasted into HTML.
You can also set the options to numeric 0 or 1, which is shorter than Boolean false or true.
Returns
The function returns a plain object (marked type Res above):
Key | Type | Description |
---|---|---|
res | String | The cleaned string. |
applicableOpts | Plain Object | Copy of the options object without the keys that have array values. Each remaining key is set to a Boolean which says whether that option is applicable to the given input. |
API — defaults
You can import the defaults object, opts:
It's a plain object:
The main function calculates the options to be used by merging the options you passed with these defaults.
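The merge described above can be sketched with object spread. The helper name resolveOpts and the three-key defaults subset are assumptions for illustration. The real function may validate or normalise the values as well.

```javascript
// A subset of the defaults from the options table, for brevity.
const defaults = { removeWidows: true, useXHTML: true, removeLineBreaks: false };

// Hypothetical helper: keys the caller passes override the defaults.
function resolveOpts(userOpts = {}) {
  return { ...defaults, ...userOpts };
}

const resolved = resolveOpts({ useXHTML: 0 });
// resolved.useXHTML is 0 (falsy); the other keys keep their defaults.
console.log(resolved);
```

In this sketch a numeric 0 or 1 passes straight through the merge, which is enough as long as the value is only tested for truthiness afterwards.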
We also use these for testing purposes, in test-mixer, to generate all possible combinations of this options object.
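To give a feel for what "all possible combinations" means, here is a minimal sketch that enumerates every on/off combination of a list of Boolean keys. allBooleanCombinations is a hypothetical function. test-mixer’s actual API and behaviour may differ.

```javascript
// Hypothetical generator; test-mixer's real API may differ.
function allBooleanCombinations(keys) {
  const total = 2 ** keys.length;
  const combos = [];
  for (let mask = 0; mask < total; mask++) {
    const combo = {};
    keys.forEach((key, i) => {
      // Bit i of the mask decides whether this key is on.
      combo[key] = Boolean(mask & (1 << i));
    });
    combos.push(combo);
  }
  return combos;
}

const combos = allBooleanCombinations(["removeWidows", "useXHTML", "stripHtml"]);
console.log(combos.length); // 8
```

With 12 Boolean keys the same approach yields 2^12 = 4096 option objects.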
API — version
You can import version:
applicableOpts
Next-generation web applications are designed to show only the options that are applicable to the given input. This saves the user’s time and also conserves mental resources: you don’t even need to read the labels of options that are not applicable.
detergent currently has 14 option keys, 12 of them Boolean. That’s not a lot, but if you use the tool every day, every optimisation counts.
The inspiration for this feature came from visiting a competing application, typograf. It has 110 checkboxes grouped into 12 groups, and the options are hidden twice: first, the sidebar is hidden when you visit the page; second, the option groups are collapsed.
Another example of an overwhelming set of options is Kangax’s html-minifier, which has 26 options with lots of descriptions.
Detergent tackles this challenge differently. While it processes the given input, it notes whether each option is applicable or not. This is done independently of the actual option settings.
For example, detergent’s output might look like this, with no options applicable because there’s nothing to do on “abc”:
{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false
  }
}
Now the UI, driven by detergent, could grey out those toggles.
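A UI layer might turn the report into toggle states along these lines. This is a sketch only: toToggleStates and the shape of its return value are assumptions, and the result object is shortened to three keys.

```javascript
// Result object in the shape shown above (only a few keys, for brevity).
const result = {
  res: "abc",
  applicableOpts: { removeWidows: false, convertDashes: false, stripHtml: false },
};

// Hypothetical view-model builder: one entry per toggle.
function toToggleStates(applicableOpts) {
  return Object.entries(applicableOpts).map(([name, applicable]) => ({
    name,
    disabled: !applicable, // grey out toggles that would change nothing
  }));
}

const toggles = toToggleStates(result.applicableOpts);
console.log(toggles);
```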
The option keys whose values are arrays (for example, stripHtmlButIgnoreTags and stripHtmlAddNewLine) are omitted from the applicableOpts report.
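The idea of recording applicability independently of the settings can be shown with a miniature, single-option processing step. convertDots is a hypothetical function written for this illustration, not detergent’s source, and it ignores everything except literal three-dot runs.

```javascript
// Hypothetical miniature of one processing step; not detergent's source.
function convertDots(str, opts = { convertDotsToEllipsis: true }) {
  const applicableOpts = { convertDotsToEllipsis: false };
  const res = str.replace(/\.{3}/g, (match) => {
    // Record applicability whether or not the option is switched on.
    applicableOpts.convertDotsToEllipsis = true;
    return opts.convertDotsToEllipsis ? "\u2026" : match;
  });
  return { res, applicableOpts };
}

const off = convertDots("wait...", { convertDotsToEllipsis: false });
// off.res is unchanged, yet off.applicableOpts.convertDotsToEllipsis is true
const none = convertDots("abc");
// none.applicableOpts.convertDotsToEllipsis is false
console.log(off, none);
```

Because the flag is set when the pattern is found rather than when the replacement is made, a UI can offer the toggle even while it is switched off.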
opts.cb
One of the unique (and complex) features of this program is the HTML tag recognition. We process only the text and don’t touch the tags and their attributes. For example, widow word removal won’t add non-breaking spaces within your tags if you choose not to strip the HTML.
opts.cb lets you perform additional operations on all the string characters outside any HTML tags. For example, here we upper-case all non-HTML-tag characters:
import { det } from "detergent";
const { res } = det(`aAa\n\nbBb\n\ncCc`, {
cb: (str) => str.toUpperCase(),
});
console.log(res);
// => "AAA<br/>\n<br/>\nBBB<br/>\n<br/>\nCCC"
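To illustrate what "outside any HTML tags" means, the sketch below applies a callback only to the text between tags. applyOutsideTags is a hypothetical helper, and its naive regular expression is an assumption standing in for detergent’s much more thorough tag recognition.

```javascript
// Hypothetical helper; a naive regex stands in for real tag recognition.
function applyOutsideTags(str, cb) {
  // The capture group keeps the tags in the split result at odd indexes.
  return str
    .split(/(<[^>]*>)/)
    .map((chunk, i) => (i % 2 === 1 ? chunk : cb(chunk)))
    .join("");
}

const out = applyOutsideTags('aaa <b class="x">bbb</b> ccc', (s) => s.toUpperCase());
// out === 'AAA <b class="x">BBB</b> CCC'
console.log(out);
```

The tag name and its attribute stay lower-case while the surrounding text is transformed.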