Quick Take

import { strict as assert } from "assert";
import { det, opts, version } from "detergent";

// on default setting, widow removal and encoding are enabled:
assert.equal(
  det("clean this text £").res,
  "clean this text&nbsp;&pound;"
);


Detergent prepares text for placing into HTML, especially email-template HTML.

Extra features are:

  • You can skip the HTML encoding of non-Latin letters. This is useful when you are deploying Japanese or Chinese emails because otherwise every character would be HTML-encoded.
  • Detergent is both XHTML- and HTML-friendly. You can choose how line break tags are rendered: with a closing slash, <br/> (XHTML), or without, <br> (HTML). This helps reduce code validator errors.
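
As a rough illustration of the XHTML/HTML choice, the sketch below turns line breaks into either <br/> or <br>. The helper lineBreaksToBr is hypothetical and is not part of Detergent; it only mimics the shape of the output shown in this README.

```javascript
// Hypothetical helper, not Detergent's implementation: it only illustrates
// the difference between the XHTML and HTML styles of line break tags.
function lineBreaksToBr(str, useXHTML = true) {
  const br = useXHTML ? "<br/>" : "<br>";
  // put the tag in front of each line break so the source stays readable
  return str.replace(/\r?\n/g, `${br}\n`);
}

console.log(JSON.stringify(lineBreaksToBr("a\nb"))); // "a<br/>\nb"
console.log(JSON.stringify(lineBreaksToBr("a\nb", false))); // "a<br>\nb"
```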


The main function is exported in a plain object under the key det:

const { det } = require("detergent");
// or request everything:
const { det, opts, version } = require("detergent");
// this gives you an extra plain object, `opts`, with the default options.
// Handy when developing front-ends that consume Detergent.

det is the main function. See its API below.

opts is the default options object. You pass it (or a tweaked copy of it) to det.

version is the value of the same-named key in package.json, that is, the version of the particular copy of Detergent you've got.
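
Tweaking the defaults usually means copying opts and overriding a few keys. The sketch below uses a local stand-in object, defaultOpts, holding three of the documented defaults so that it runs without the package installed; with Detergent installed you would presumably spread the imported opts instead.

```javascript
// Stand-in for the `opts` object exported by Detergent (assumption: only
// three of the documented defaults are copied here to keep the sketch short).
const defaultOpts = {
  removeWidows: true,
  convertEntities: true,
  useXHTML: true,
};

// Copy the defaults and override one key; the original object is not mutated.
const myOpts = { ...defaultOpts, useXHTML: false };

console.log(myOpts.useXHTML); // false
console.log(defaultOpts.useXHTML); // true
// With the real package, the tweaked copy would be passed as the second
// argument: det("some text", myOpts)
```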

API - det() Input

The det above is a function. You pass two input arguments to it:

Input argument | Type   | Obligatory? | Description
input          | String | yes         | The string you want to clean.
options        | Object | no          | Options object. See its key arrangement below.

API - det() second input argument, the options object

Options object's key | Type of its value | Default | Description
fixBrokenEntities | Boolean | true | Try to fix any broken named HTML entities, like &nsp; ("b" missing).
removeWidows | Boolean | true | Replace the last space in a paragraph with a non-breaking space.
convertEntities | Boolean | true | Encode all non-ASCII characters.
convertDashes | Boolean | true | Typographically correct the n/m-dashes.
convertApostrophes | Boolean | true | Typographically correct the apostrophes.
replaceLineBreaks | Boolean | true | Replace all line breaks with br's.
removeLineBreaks | Boolean | false | Put everything on one line (removes any line breaks, inserting a space where necessary).
useXHTML | Boolean | true | Add closing slashes on br's.
dontEncodeNonLatin | Boolean | true | Skip non-Latin character encoding (for example, CJK, Alefbet Ivri or Arabic abjad).
addMissingSpaces | Boolean | true | Add missing spaces after dots/colons/semicolons, unless it's a URL.
convertDotsToEllipsis | Boolean | true | Convert three dots into &hellip;, the ellipsis character. When set to false, all encoded ellipses will be converted to three dots.
stripHtml | Boolean | true | By default, all HTML tags are stripped, except those listed in opts.stripHtmlButIgnoreTags (b, strong and other tags). Set this to false to turn off HTML tag removal completely.
stripHtmlButIgnoreTags | Array | ["b", "strong", "i", "em", "br", "sup"] | List of zero or more strings, each a tag name that should not be stripped. For example, ["a", "sup"].
stripHtmlAddNewLine | Array | ["li", "/ul"] | List of zero or more tag names which, if stripped, are replaced with a line break. Closing tags must start with a slash.
cb | something falsy or a function | null | Callback function to additionally process characters between tags (like turning letters uppercase).

Here it is in one place:

det("text to clean", {
  fixBrokenEntities: true,
  removeWidows: true,
  convertEntities: true,
  convertDashes: true,
  convertApostrophes: true,
  replaceLineBreaks: true,
  removeLineBreaks: false,
  useXHTML: true,
  dontEncodeNonLatin: true,
  addMissingSpaces: true,
  convertDotsToEllipsis: true,
  stripHtml: true,
  stripHtmlButIgnoreTags: ["b", "strong", "i", "em", "br", "sup"],
  stripHtmlAddNewLine: ["li", "/ul"],
  cb: null,
});

The default set is a wise choice for the most common scenario - preparing text to be pasted into HTML.

You can also set the Boolean options to numeric 0 or 1, which is shorter than false or true.

API - det() output - an object

Output object's key | Type of its value | Description
res | String | The cleaned string.
applicableOpts | Plain Object | Copy of the options object without the keys that have array values. Each remaining key is set to a Boolean that tells whether that option is applicable to the given input.

Function det returns a plain object, for example:

{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false,
  },
}


Next-generation web applications are designed to show only the options that are applicable to the given input. This saves the user's time and also conserves mental resources: you don't even need to read the labels of options that are not applicable.
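
A front end could use applicableOpts to decide which controls to render. The sketch below is one possible approach; visibleOptionKeys is a hypothetical helper, and the sample object is made up for illustration rather than taken from real Detergent output.

```javascript
// Hypothetical helper: list the option keys worth showing in a UI.
function visibleOptionKeys(applicableOpts) {
  return Object.keys(applicableOpts).filter((key) => applicableOpts[key]);
}

// Made-up sample in the shape of det()'s `applicableOpts` (shortened).
const sample = {
  removeWidows: true,
  convertEntities: true,
  convertDashes: false,
  stripHtml: false,
};

console.log(visibleOptionKeys(sample)); // [ 'removeWidows', 'convertEntities' ]
```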

Detergent currently has 15 option keys, 12 of them Boolean. That's not a lot, but if you use the tool every day, every optimisation counts.

We got the inspiration for this feature while visiting a competitor application, typograf. It has 110 checkboxes in 12 groups, and the options are hidden twice: first, the sidebar is hidden when you visit the page; second, the option groups are collapsed.

Another example of an overwhelming option set is Kangax's minifier, html-minifier, which has 26 options with heavy descriptions.

Detergent tackles this problem by changing its algorithm: while processing the input, it notes for each option whether that option is applicable, regardless of whether the option is enabled. Then, only if the option is enabled, it changes the result value.

For example, Detergent's output might look like this, with no options applicable because there's nothing to do on "abc":

{
  res: "abc",
  applicableOpts: {
    fixBrokenEntities: false,
    removeWidows: false,
    convertEntities: false,
    convertDashes: false,
    convertApostrophes: false,
    replaceLineBreaks: false,
    removeLineBreaks: false,
    useXHTML: false,
    dontEncodeNonLatin: false,
    addMissingSpaces: false,
    convertDotsToEllipsis: false,
    stripHtml: false,
  },
}

Option keys whose values are arrays (stripHtmlButIgnoreTags and stripHtmlAddNewLine) are omitted from the applicableOpts report.
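
The idea of noting applicability separately from applying a change can be sketched with a toy cleaner that has a single step. This is not Detergent's code: toyClean and its internals are hypothetical and only borrow the convertDotsToEllipsis option name to show the pattern.

```javascript
// Toy cleaner with a single step, "three dots to ellipsis". It is NOT
// Detergent's code; it only shows the note-then-apply pattern.
function toyClean(input, opts = {}) {
  const settings = { convertDotsToEllipsis: true, ...opts };
  const applicableOpts = { convertDotsToEllipsis: false };
  let res = input;

  // 1. Note whether the step has anything to do, regardless of the setting.
  if (input.includes("...")) {
    applicableOpts.convertDotsToEllipsis = true;
    // 2. Change the result only when the option is enabled.
    if (settings.convertDotsToEllipsis) {
      res = input.replace(/\.{3}/g, "&hellip;");
    }
  }
  return { res, applicableOpts };
}

const off = toyClean("wait...", { convertDotsToEllipsis: false });
console.log(off.res); // wait...
console.log(off.applicableOpts.convertDotsToEllipsis); // true

const on = toyClean("wait...");
console.log(on.res); // wait&hellip;
```

Even with the option switched off, the report still says the option is applicable, which is what lets a UI offer it to the user.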


A custom settings object with one custom setting, convertEntities (the others are left at their defaults):

const { det } = require("detergent");
const { res } = det("clean this text £", {
  convertEntities: 0, // <--- zero is like "false", turns off the feature
});
console.log(res);
// > 'clean this text £'


One of the unique (and complex) features of this program is HTML tag recognition. We process only the text and don't touch the tags. For example, widow word removal won't add non-breaking spaces within your tags if you choose not to strip the HTML.

opts.cb lets you perform additional operations on all the string characters outside any HTML tags. For example, the uppercase-lowercase functionality on detergent.io relies on opts.cb.

Here's an example where letters are turned uppercase while HTML tags are skipped:

const { det } = require("detergent");
const { res } = det(`aAa\n\nbBb\n\ncCc`, {
  cb: (str) => str.toUpperCase(),
});
console.log(res);
// => "AAA<br/>\n<br/>\nBBB<br/>\n<br/>\nCCC"
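
To make "outside any HTML tags" concrete, here is a simplified sketch of applying a callback only to the text between tags. The helper applyOutsideTags is hypothetical; Detergent's own tag recognition is more involved, and this regex-based stand-in assumes well-formed tags.

```javascript
// Simplified sketch, not Detergent's tag recognition: a regex split is
// enough for well-formed tags but will misfire on edge cases such as a ">"
// inside an attribute value.
function applyOutsideTags(str, cb) {
  return str
    .split(/(<[^>]*>)/) // the capturing group keeps the tags in the result
    .map((chunk, i) => (i % 2 === 1 ? chunk : cb(chunk))) // odd index = tag
    .join("");
}

console.log(applyOutsideTags("a <b>bold</b> word", (s) => s.toUpperCase()));
// A <b>BOLD</b> WORD
```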


See it in the monorepo, on GitHub.


To report bugs or request features or assistance, raise an issue on GitHub.

Any code contributions are welcome! All Pull Requests will be dealt with promptly.


MIT

Copyright © 2010–2021 Roy Revelt and other contributors

Related packages:

📦 html-entities-not-email-friendly 0.6.1
All HTML entities which are not email template friendly
📦 string-apostrophes 2.0.1
Comprehensive, HTML-entities-aware tool to typographically-correct the apostrophes and single/double quotes
📦 string-collapse-white-space 10.0.1
Replace chunks of whitespace with a single space
📦 string-fix-broken-named-entities 6.0.1
Finds and fixes common and not so common broken named HTML entities, returns ranges array of fixes
📦 string-left-right 5.0.1
Looks up the first non-whitespace character to the left/right of a given index
📦 string-remove-widows 3.0.1
Helps to prevent widow words in a text
📦 string-strip-html 9.0.1
Strips HTML tags from strings. No parser, accepts mixed sources