Idea
This library takes a string and removes widow words by replacing the last space in the paragraph with a non-breaking space.
- Can not only add widow word prevention measures but also remove them
- Tackles both paragraphs and single lines
- Recognises existing measures and, if found, skips the operation
- Option to encode the non-breaking space for HTML, CSS or JS strings, or to insert a raw one
- Does not mangle the line endings (Mac LF, old-style CR or Windows-style CRLF)
- A customisable minimum number of words per line/paragraph to trigger widow word removal
- Can be used in different stages of the workflow: before HTML/CSS/JS-encoding or after
- Optionally replaces spaces with non-breaking spaces in front of all kinds of dashes
- Optionally replaces spaces with non-breaking spaces within UK postcodes
- Optionally skips content between templating tags, for example, Nunjucks {{ and }}; presets are given for Jinja, Nunjucks, Liquid, Hexo and Hugo
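The core idea can be illustrated with a naive sketch. It is not this library's implementation: the helper name naivePreventWidows is hypothetical, it treats every line as its own paragraph (an assumption made to keep the example short), and its thresholds simply mirror the minWordCount and minCharCount defaults.

```javascript
// Hypothetical helper: a naive sketch of widow word prevention,
// NOT string-remove-widows' actual algorithm.
function naivePreventWidows(str, { minWordCount = 4, minCharCount = 20, nbsp = "&nbsp;" } = {}) {
  // Split on line breaks but keep them (capturing group), so LF, CR and CRLF survive untouched.
  const parts = str.split(/(\r\n|\r|\n)/);
  return parts
    .map((part, i) => {
      // Odd indices are the captured line breaks: return them as-is.
      if (i % 2 === 1) return part;
      const words = part.trim().split(/[ \t]+/).filter(Boolean);
      const charCount = part.replace(/\s/g, "").length;
      // Too short to bother: leave the line alone.
      if (words.length < minWordCount || charCount < minCharCount) return part;
      // Replace the last run of spaces/tabs that sits between two words.
      return part.replace(/(\S)[ \t]+(\S+\s*)$/, `$1${nbsp}$2`);
    })
    .join("");
}

console.log(naivePreventWidows("Some raw text in a very long line."));
// => Some raw text in a very long&nbsp;line.
```

Keeping the captured line breaks in the split result is what lets this sketch leave LF, CR and CRLF endings exactly as they were.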
API — removeWidows()
The main function removeWidows() is imported like this:
import { removeWidows } from "string-remove-widows";
It’s a function which takes two input arguments:
Input argument | Type | Obligatory | Description |
---|---|---|---|
str | String | yes | String which we will process |
opts | Plain object | no | Put options here |
The Optional Options Object has the following shape:
Key | Type | Default | Description |
---|---|---|---|
removeWidowPreventionMeasures | boolean | false | If true, replaces all widow word nbsp locations with a single space |
convertEntities | boolean | true | If false, a raw non-breaking space is inserted; if true, it is encoded for the target language (default HTML) |
targetLanguage | string | html | One of html, css or js; non-breaking spaces will be encoded for this language |
UKPostcodes | boolean | false | If enabled, the whitespace between the two parts of every UK postcode is replaced with a non-breaking space |
hyphens | boolean | true | Whitespace in front of hyphens (-), n-dashes (–) or m-dashes (—) is replaced with a non-breaking space |
minWordCount | natural number; 0 or any falsy value disables the feature | 4 | Minimum word count of a paragraph to trigger widow removal |
minCharCount | natural number; 0 or any falsy value disables the feature | 20 | Minimum non-whitespace character count of a paragraph to trigger widow removal |
ignore | array of zero or more strings OR string | [] | Templating languages whose heads/tails will be recognised and skipped |
reportProgressFunc | function or null | null | If a function is given, it is called with a natural number, the percentage done, as its first input argument |
reportProgressFuncFrom | natural number or 0 | 0 | reportProgressFunc() normally reports percentages starting from zero; set a custom starting value here |
reportProgressFuncTo | natural number | 100 | reportProgressFunc() normally reports percentages up to 100; set a custom end value here |
tagRanges | array of zero or more arrays | [] | If you know where the HTML tags are, provide their string index ranges here |
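The effect of reportProgressFuncFrom and reportProgressFuncTo can be pictured with a hypothetical sketch: makeProgressReporter is our own name, and the linear mapping from work done to the reported number is an assumption, not taken from the library's source. A custom range is handy when this step is only one slice of a larger job's progress bar.

```javascript
// Hypothetical sketch (not the library's source): map the fraction of work
// done (0..1) onto [from, to] and ping the callback only when the integer
// value changes.
function makeProgressReporter(reportProgressFunc, from = 0, to = 100) {
  let last = null;
  return function report(fractionDone) {
    if (typeof reportProgressFunc !== "function") return; // null disables reporting
    const value = Math.floor(from + (to - from) * fractionDone);
    if (value !== last) {
      last = value;
      reportProgressFunc(value);
    }
  };
}

// Usage: this step owns the 21..86 slice of an overall progress bar.
const seen = [];
const report = makeProgressReporter((perc) => seen.push(perc), 21, 86);
const total = 1000; // e.g. characters in the input string
for (let i = 0; i <= total; i++) {
  report(i / total);
}
console.log(seen[0], seen[seen.length - 1], seen.length);
// => 21 86 66
```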
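Here are all defaults in one place for copying:
{
  removeWidowPreventionMeasures: false,
  convertEntities: true,
  targetLanguage: "html",
  UKPostcodes: false,
  hyphens: true,
  minWordCount: 4,
  minCharCount: 20,
  ignore: [],
  reportProgressFunc: null,
  reportProgressFuncFrom: 0,
  reportProgressFuncTo: 100,
  tagRanges: [],
}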
The function returns a plain object with the following keys:
Key in the returned object | Type | Description |
---|---|---|
res | String | Processed string |
ranges | Null or Array of one or more Ranges (arrays) | The Ranges used to produce res |
log | Plain object | See its format in the example below |
whatWasDone | Plain object | Tells whether widow removal or just decoding was performed |
For example, here’s what the output could look like:
{
res: "Lorem ipsum dolor sit amet",
ranges: [
[21, 27, " "]
],
log: {
timeTakenInMilliseconds: 42
},
whatWasDone: {
removeWidows: true,
convertEntities: false
}
}
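Assuming the usual [from, to, insert] convention for ranges, the characters from index from up to (but not including) to are replaced with insert. The applyRanges helper below is a hypothetical illustration, not an export of this library; it assumes the ranges are sorted, do not overlap and index into the input string. The example output above would be consistent with an input such as Lorem ipsum dolor sit&nbsp;amet:

```javascript
// Hypothetical helper (not part of string-remove-widows): apply
// [from, to, insert] ranges to a string. Assumes ranges are sorted by
// "from", do not overlap, and index into the original input.
function applyRanges(str, ranges) {
  if (!Array.isArray(ranges) || ranges.length === 0) return str;
  let result = "";
  let cursor = 0;
  for (const [from, to, insert] of ranges) {
    // keep the untouched chunk, then put the replacement (if any)
    result += str.slice(cursor, from) + (insert ?? "");
    cursor = to;
  }
  return result + str.slice(cursor);
}

console.log(applyRanges("Lorem ipsum dolor sit&nbsp;amet", [[21, 27, " "]]));
// => Lorem ipsum dolor sit amet
```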
API — defaults
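You can import defaults:
import { defaults } from "string-remove-widows";
It's a plain object holding the default values listed in the options table above. The main function calculates the options to use by merging the options you pass with these defaults.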
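That merge can be sketched as a shallow object spread, where keys you pass override the defaults. This is an assumption for illustration (the library may also validate or normalise options), resolveOpts is a hypothetical name, and only a few of the default keys are repeated here.

```javascript
// A few of the defaults from the options table (the rest are omitted for brevity).
const defaults = {
  convertEntities: true,
  targetLanguage: "html",
  hyphens: true,
  minWordCount: 4,
  minCharCount: 20,
  tagRanges: [],
};

// Hypothetical helper: a shallow merge where keys you pass win over defaults.
function resolveOpts(opts = {}) {
  return { ...defaults, ...opts };
}

const resolved = resolveOpts({ targetLanguage: "css", minWordCount: 0 });
console.log(resolved.targetLanguage, resolved.minWordCount, resolved.convertEntities);
// => css 0 true
```

Because the merge is shallow, passing a key overrides its default entirely; for example, a tagRanges array you supply replaces the default empty array rather than being combined with it.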
API — version
You can import version:
import { version } from "string-remove-widows";
opts.targetLanguage
Not all text ends up in HTML. Content can be injected via the content property of CSS pseudo-elements, and text might also be prepared for pasting into JSON.
This program lets you choose the language the non-breaking space is encoded for: html, css or js.
Here’s HTML with an HTML-encoded non-breaking space:
Some raw text in a very long&nbsp;line.
Here’s the CSS analogue:
span:before {
content: "Some raw text in a very long\00A0line.";
}
Here’s the JavaScript analogue:
alert("Some raw text in a very long\u00A0line.");
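The three variants differ only in how U+00A0 is escaped. A lookup such as the hypothetical encodeNbsp below captures the mapping from the examples above; the function name and its error on unknown languages are our own choices, not this library's API.

```javascript
// Hypothetical lookup: how a non-breaking space (U+00A0) is written
// for each target language shown above.
const NBSP_ENCODINGS = {
  html: "&nbsp;", // HTML named entity
  css: "\\00A0", // CSS escape, e.g. inside content: "..."
  js: "\\u00A0", // JavaScript string escape
};

function encodeNbsp(targetLanguage, convertEntities = true) {
  // Raw U+00A0 when encoding is switched off (like opts.convertEntities: false).
  if (!convertEntities) return "\u00A0";
  const encoded = NBSP_ENCODINGS[targetLanguage];
  if (encoded === undefined) {
    throw new Error(`Unsupported target language: ${targetLanguage}`);
  }
  return encoded;
}

for (const lang of ["html", "css", "js"]) {
  console.log(`Some raw text in a very long${encodeNbsp(lang)}line.`);
}
// => Some raw text in a very long&nbsp;line.
// => Some raw text in a very long\00A0line.
// => Some raw text in a very long\u00A0line.
```

One caveat for the CSS form: a CSS escape consumes up to six hex digits, so if the next character is itself a hex digit (for example a, b or 1), it should be separated with a single space or the six-digit form \0000A0 should be used. This sketch does not handle that case.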
For example, a minimal application would look like this:
import { removeWidows } from "string-remove-widows";
// the second input argument is a plain object, the Optional Options Object:
const result = removeWidows("Some raw text in a very long line.", {
  targetLanguage: "css",
});
// the widow word is now prevented with a CSS-encoded non-breaking space,
// ready to be used in a CSS content property:
console.log(result.res);
// => Some raw text in a very long\00A0line.
opts.ignore
Very often text already contains templating language literals.
For example, this Nunjucks snippet:
Hi{% if data.firstName %} {{ data.firstName }}{% endif %}!
We intend to say either Hi John! to customer John, or just Hi! if we don’t know the customer’s name.
But if we run widow word removal on this piece of text, we don’t want &nbsp; inserted into the middle of the endif tag:
Hi{% if data.firstName %} {{ data.firstName }}{%&nbsp;endif %}!
                                                ^^^^^^
That’s where opts.ignore comes in. You can list heads/tails (chunks where ignoring starts and where it stops) manually:
import { removeWidows } from "string-remove-widows";
const result = removeWidows("Here is a very long line of text", {
targetLanguage: "html",
ignore: [
{
heads: "{{",
tails: "}}",
},
{
heads: ["{% if", "{%- if"],
tails: ["{% endif", "{%- endif"],
},
],
});
Or you can just pick a preset:
- all
- jinja
- nunjucks
- liquid
- hugo
- hexo
For example:
import { removeWidows } from "string-remove-widows";
const result = removeWidows("Here is a very long line of text", {
targetLanguage: "html",
ignore: "jinja",
});
For the widest support of literals (all languages at once), use “all”.
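To picture what heads and tails do, here is a simplified scanner. findIgnoredRanges is a hypothetical helper, not this library's code; it assumes no nesting and closes each head at the nearest matching tail, returning index ranges (end exclusive) where no non-breaking space should be inserted.

```javascript
// Hypothetical, simplified scanner (not string-remove-widows' code).
// Returns [from, to] ranges (to is exclusive) that widow removal should skip.
// Assumptions: no nesting; each head closes at the nearest matching tail.
function findIgnoredRanges(str, pairs) {
  // heads/tails may be a string or an array of strings, like in opts.ignore
  const norm = pairs.map(({ heads, tails }) => ({
    heads: [].concat(heads),
    tails: [].concat(tails),
  }));
  const ranges = [];
  let i = 0;
  while (i < str.length) {
    // find a pair whose head starts exactly at index i
    let found = null;
    for (const pair of norm) {
      const head = pair.heads.find((h) => str.startsWith(h, i));
      if (head !== undefined) {
        found = { head, tails: pair.tails };
        break;
      }
    }
    if (found === null) {
      i += 1;
      continue;
    }
    // pick the nearest tail after the head
    let bestStart = -1;
    let bestEnd = -1;
    for (const tail of found.tails) {
      const idx = str.indexOf(tail, i + found.head.length);
      if (idx !== -1 && (bestStart === -1 || idx < bestStart)) {
        bestStart = idx;
        bestEnd = idx + tail.length;
      }
    }
    if (bestEnd === -1) break; // unclosed head: stop scanning (a simplification)
    ranges.push([i, bestEnd]);
    i = bestEnd;
  }
  return ranges;
}

const snippet = "Hi{% if data.firstName %} {{ data.firstName }}{% endif %}!";
const pairs = [
  { heads: "{{", tails: "}}" },
  { heads: "{%", tails: "%}" },
];
console.log(JSON.stringify(findIgnoredRanges(snippet, pairs)));
// => [[2,25],[26,46],[46,57]]
```

With that snippet, the space inside {% endif %} falls within the last range, so a widow removal pass that respects these ranges would leave it alone.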
opts.tagRanges
Sometimes the input string contains HTML tags. We didn’t go as far as coding up full HTML tag recognition, especially since that would duplicate an existing library, string-strip-html.
opts.tagRanges accepts known HTML tag ranges (or, in fact, any “black spots” to skip):
import { stripHtml } from "string-strip-html";
import { removeWidows } from "string-remove-widows";
const input = `something in front here <a style="display: block;">x</a> <b style="display: block;">y</b>`;
// first, gung-ho approach - no tag locations provided:
const res1 = removeWidows(input).res;
console.log(res1);
// => something in front here <a style="display: block;">x</a> <b style="display:&nbsp;block;">y</b>
//                                                                               ^^^^^^
// notice how the non-breaking space is wrongly put inside the tag
//
// but if you provide the tag ranges, the program works correctly:
const tagRanges = stripHtml(input).filteredTagLocations;
console.log(JSON.stringify(tagRanges));
// => [[24,51],[52,56],[57,84],[85,89]]
// now, plug the tag ranges into opts.tagRanges:
const res2 = removeWidows(input, { tagRanges }).res;
console.log(res2);
// => something in front here <a style="display: block;">x</a>&nbsp;<b style="display: block;">y</b>
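Why the ranges help can be shown with a hypothetical helper that finds the last space or tab not covered by any known range. It is not this library's logic; it assumes [from, to] pairs with an exclusive to, matching the ranges printed above.

```javascript
// Hypothetical helper: index of the last space/tab that is not inside any
// [from, to] range (to exclusive), or -1 if there is none.
function lastSpaceOutside(str, ranges = []) {
  const covered = (idx) => ranges.some(([from, to]) => idx >= from && idx < to);
  for (let i = str.length - 1; i >= 0; i--) {
    if ((str[i] === " " || str[i] === "\t") && !covered(i)) return i;
  }
  return -1;
}

const input = `something in front here <a style="display: block;">x</a> <b style="display: block;">y</b>`;
const tagRanges = [[24, 51], [52, 56], [57, 84], [85, 89]];

console.log(lastSpaceOutside(input)); // => 75, inside the <b> tag
console.log(lastSpaceOutside(input, tagRanges)); // => 56, between </a> and <b>
```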
Compared to competition
Feature | This program, string-remove-widows | widow-js | @simmo/widower |
---|---|---|---|
Can both add and remove nbsps | ✅ | ❌ | ❌ |
Option to choose between raw, HTML, CSS or JS-encoded nbsps | ✅ | ❌ | ❌ |
Can replace spaces in front of hyphens, n- and m-dashes | ✅ | ❌ | ❌ |
Can prepare UK postcodes | ✅ | ❌ | ❌ |
Does not mangle different types of line endings (LF, CRLF, CR) | ✅ | ✅ | ✅ |
Customisable minimal word count threshold | ✅ | ✅ | ❌ |
Customisable minimal character count threshold | ✅ | ❌ | ❌ |
Progress reporting function for web worker web apps | ✅ | ❌ | ❌ |
Reports string index ranges of what was done | ✅ | ❌ | ❌ |
Whitespace at a non-breaking space location does not have to be a single space | ✅ | ❌ | ❌ |
Presets for Jinja, Nunjucks, Liquid, Hugo and Hexo templating languages | ✅ | ❌ | ❌ |
Decoupled API^ | ✅ | ❌ | ✅ |
CommonJS build | ✅ | ❌ | ✅ |
ES Modules build | ✅ | ❌ | ❌ |
UMD build for browser | ✅ | ✅ | ❌ |
Can process live DOM of a web page | ❌ | ✅ | ❌ |
Licence | MIT | ISC | MIT |
^ A decoupled API means that at its core, the program is a “string-in, string-out” function and is not coupled with DOM, file I/O, network or other unrelated operations. Such an API is easier to test, and many different applications can be built on top of it.
For example, our competitor widow.js has two coupled parts: 1. an API which does string-in, string-out, and 2. DOM processing functions. These could have been two separate npm libraries; as it is, people who don’t need DOM operations can’t use it.
One decoupled, “string-in, string-out” library like string-remove-widows might power all of these at once:
- a web page DOM-manipulation library
- a CLI application to process files or piped streams
- an Express REST endpoint on a server
- a serverless lambda on AWS
- an Electron desktop program