Changelog

All notable changes to this project will be documented in this file.
See Conventional Commits for commit guidelines.

8.1.0

12 Aug 2022

Features

8.0.17

18 Apr 2022

🔧 Fixed

8.0.0

9 Sept 2021

Features

💥 BREAKING CHANGES

  • programs are now ES Modules and won’t work with CommonJS require()

7.1.0

24 May 2021

Features

  • config file based major bump blacklisting (e15f9bb)

7.0.15

11 Apr 2021

Reverts

  • Revert “chore: setup refresh” (23cf206)

7.0.1

28 Jan 2021

🔧 Fixed

  • add testStats to npmignore (f3c84e9)

7.0.0

23 Jan 2021

Features

💥 BREAKING CHANGES

  • there should not be any, but bumping anyway as the source was rewritten in TypeScript

6.1.0

6 Dec 2020

Features

  • separate mixer test util into standalone lib and tap it, some more tweaks (3021a6f)

6.0.0

28 Nov 2020

💥 BREAKING CHANGES

  • when opts.removeWidows is off, it’s interpreted as an explicit request to turn &nbsp; entities and raw non-breaking space characters into normal spaces
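
The behaviour described above can be sketched roughly as follows. This is an illustration only, not Detergent's actual implementation: the helper name nbspToSpaces is hypothetical, and it assumes that only the &nbsp; named entity and the raw U+00A0 character need handling.

```javascript
// Hypothetical helper, not part of Detergent's API.
// Turns "&nbsp;" entities and raw U+00A0 characters into normal spaces.
function nbspToSpaces(str) {
  return str.replace(/&nbsp;|\u00A0/g, " ");
}

console.log(nbspToSpaces("some&nbsp;text\u00A0here")); // prints "some text here"
```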

5.11.0

16 Aug 2020

Features

  • update to the latest string-strip-html v.5 (230e35b)

5.10.0

6 May 2020

Features

  • update to the latest deps (bfd94e8)

5.9.0

26 Apr 2020

Features

  • harden the eslint rules set and rebase a little (5f5e0a8)

5.8.1

27 Nov 2019

🔧 Fixed

  • apostrophes algorithm improvements (3415b17)
  • characters at index zero were not included in the clause (6ad8f5f)
  • remove cb from reported applicable opts (f79c469)

5.8.0

21 Nov 2019

Features

5.7.1

11 Nov 2019

🔧 Fixed

  • fix the html tag skipping during widow removal (2fd262b)

5.7.0

5 Nov 2019

Features

  • make known html tag ranges to be ignored during the widow removal (f7992a7)

5.6.0

4 Nov 2019

Features

  • add few previously missing invisible space-like characters (33ee0ed)

  • Besides U+200A HAIR SPACE, now we detect and turn into normal space all other fancy space characters:

    • U+1680 OGHAM SPACE MARK
    • U+2000 EN QUAD
    • U+2001 EM QUAD
    • U+2002 EN SPACE
    • U+2003 EM SPACE
    • U+2004 THREE-PER-EM SPACE
    • U+2005 FOUR-PER-EM SPACE
    • U+2006 SIX-PER-EM SPACE
    • U+2007 FIGURE SPACE
    • U+2008 PUNCTUATION SPACE
    • U+2009 THIN SPACE
    • U+202F NARROW NO-BREAK SPACE
    • U+205F MEDIUM MATHEMATICAL SPACE
    • U+3000 IDEOGRAPHIC SPACE

Reference: https://www.fileformat.info/info/unicode/category/Zs/list.htm
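
A minimal sketch of how the characters listed above could be normalised with a single character class. The names FANCY_SPACES and normaliseSpaces are hypothetical, and the real implementation may work differently. U+00A0 NO-BREAK SPACE is deliberately left out here, on the assumption that non-breaking spaces are handled separately.

```javascript
// Hypothetical helper, not Detergent's actual code.
// Matches U+1680, U+2000 to U+200A (the range ends with HAIR SPACE),
// U+202F, U+205F and U+3000.
const FANCY_SPACES = /[\u1680\u2000-\u200A\u202F\u205F\u3000]/g;

function normaliseSpaces(str) {
  return str.replace(FANCY_SPACES, " ");
}

console.log(normaliseSpaces("thin\u2009space")); // prints "thin space"
```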

5.5.0

27 Oct 2019

Features

  • remove few dependencies to reduce the build size, add more unit tests (5e836c8)

5.4.0

26 Oct 2019

🔧 Fixed

Features

  • algorithm improvements around empty whitespace blocks (0dcb178)

5.3.0

25 Oct 2019
  • converts n-dash found in pattern “space — n-dash — space” into an m-dash
  • fixed bug within the ESP templating tag recognition — algorithm exited later than expected
  • corrected some dash applicable opts reporting and added many dash-related tests
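
The first item above, the “space, n-dash, space” pattern, could be sketched like this. The helper name is hypothetical and the real algorithm likely considers more context than this one-line replacement does.

```javascript
// Hypothetical helper: converts an n-dash (U+2013) surrounded by single
// spaces into an m-dash (U+2014). N-dashes without surrounding spaces,
// such as in number ranges, are left untouched.
function nDashToMDash(str) {
  return str.replace(/ \u2013 /g, " \u2014 ");
}
```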

5.2.0

21 Oct 2019

Features

5.1.0

2 Oct 2019

Features

5.0.0

17 Sept 2019

Let’s migrate Detergent onto a monorepo, split some of its functionality into standalone packages (html-entities-not-email-friendly (npm, gitlab monorepo), string-apostrophes (npm, gitlab monorepo) and string-remove-widows (npm, gitlab monorepo), for starters).

💥 BREAKING CHANGES

We’re renaming the main function detergent to det.

Before:

const { detergent, opts, version } = require("detergent");

Now:

const { det, opts, version } = require("detergent");

That’s necessary because of the UMD build — if you were to tap detergent on a web page, you’d call the script:

<script src="https://cdn.jsdelivr.net/npm/detergent/dist/detergent.umd.js"></script>

Then you’d get a global variable “detergent” which you consume like this:

const { det, opts, version } = detergent;

Notice the difference between detergent, the global exported object, and det, the function it contains. Both can’t be named “detergent”.

Here are other notable features:

Separating functionality into standalone packages

  • Widow word removal is now a separate package, string-remove-widows (npm, gitlab monorepo)
  • Typographically correct apostrophes and quotes conversion is now a separate package, string-apostrophes (npm, gitlab monorepo)
  • Broken named HTML entity fixing library is now a separate package, string-fix-broken-named-entities (npm, gitlab monorepo)
  • The list of named HTML entities which are not email-friendly is now a separate package, html-entities-not-email-friendly (npm, gitlab monorepo)

Detergent is now HTML-aware

Now you have full control over HTML-stripping, thanks to string-strip-html (npm, gitlab monorepo).

It’s controlled by two new options keys, opts.stripHtml and opts.stripHtmlButIgnoreTags.

If you give Detergent a piece of HTML and disable HTML stripping, it will detect the tags and process the text between the tags. Furthermore, it should recognise some common templating languages.

Apostrophes and quotes processing overhaul

All apostrophe and quote processing is done “in-house” now, without relying on 3rd party libraries.

  • We removed curl-quotes from dependencies. Now we do more: we pass its unit tests (which it wasn’t passing itself) and have added even more unit tests for quotes and apostrophes.
  • Now correctly setting ’tis, ’twas, ’t. Previously, the buggy curl-quotes.js was using a left single curly quote even though its own (failing) unit tests required a right single curly quote. For the first time, its unit tests are being satisfied by a library (this one).
  • Sets two Hawaiian words with okinas correctly: Hawai’i and O’ahu, as left single curly quotes. See https://practicaltypography.com/apostrophes.html

Dashes overhaul

All dashes processing is done “in-house” now, without relying on 3rd party libraries.

Other changes

  • Removing opts.keepBoldEtc in favor of the more universal opts.stripHtmlButIgnoreTags, which has the same default tag set as opts.keepBoldEtc had. This also allowed us to put two internal functions, encryptBoldItalic() and decryptBoldItalic(), out to pasture. No more mutating the string in order to hide tags from the removal algorithm!
  • Removing opts.removeSoftHyphens; it’s now permanently on. We had to do it because each option doubled the automated test count and this one was not worth keeping: soft hyphens are very rare and the option was too granular.
  • o.removeLineBreaks now correctly accounts for Windows-style \r\n line breaks.
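
To illustrate the last point, here is a rough sketch of line break removal that treats \r\n as a single break. The function name lineBreaksToSpaces is hypothetical, and replacing each break with one space is an assumption; it is not Detergent's internal code.

```javascript
// Hypothetical helper, for illustration only.
function lineBreaksToSpaces(str) {
  // "\r\n" comes first in the alternation so that a Windows-style line
  // break becomes one space, not two.
  return str.replace(/\r\n|\r|\n/g, " ");
}

console.log(lineBreaksToSpaces("one\r\ntwo\nthree")); // prints "one two three"
```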

4.0.0

17 May 2018

There are no API changes but I removed default from main export so bumping major just in case it breaks some code.

🏗️ Improvements

  • Set up Prettier on a custom ESLint rule set.
  • Removed package.lock and .editorconfig
  • Wired Rollup to remove comments from non-dev builds. This means we can now leave the console.logs in the source code — there’s no need to comment-out console.log statements or care about them not spilling into production. Now it’s done automatically.
  • Unit tests are pointing at ES modules build, which means that code coverage is correct now, without Babel functions being missed. This is important because now code coverage is real again and now there are no excuses not to perfect it.

3.7.0

27 Apr 2018

🏗️ Improvements

  • Implemented throw error pinning on all unit tests.
  • Moved object-boolean-combinations to devdeps. I don’t know how it got into deps but it was not right.

3.6.0

23 Apr 2018

🏗️ Improvements

  • Removed airbnb-base ESLint preset and set up Prettier on recommended rules.
  • Removed package-lock.json
  • One unit test.
  • Set up two different Rollup builds — dev, which keeps console.logs and prod, which strips them.

Practically, this means source code can keep console.logs and there’s no need to do anything about them when building the production build. The npm test task, which is run before committing, will not call the dev Rollup build, and therefore rollup-plugin-strip will kick in and strip all console.logs.

This is a huge boost for my productivity.

3.5.0

1 Jan 2018

Features

3.4.0

31 Dec 2017

🏗️ Improvements

  • Dependency update
  • More unit tests to increase the code coverage on util.js
  • Setup tweaks and some rebasing

3.3.0

27 Nov 2017

Features

  • Switched to a custom HTML stripping library tailored specifically for Detergent. Now legitimate brackets are recognised and not removed. For example, in “Equations: a < b and c > d are important”, the span from < b to > is no longer treated as a tag.
  • Recognises improvised arrows comprising of 4 and more dashes, like ->, -->, ---> and so on.
  • Contributors list to readme.
  • Closes #19 — now recognises left-to-right and right-to-left marks.

🔧 Fixed

  • Removed dependency string.js, it was causing security alerts.

3.2.0

25 Sept 2017

Features

  • The main source now is in ES2015 modules with import/export.
  • Implemented Rollup to generate 3 flavours of this package: CommonJS, UMD and ESM module with import/export.

3.1.6

19 Sept 2017

✈️ Changes

  • Small rebase: tapped the line trimming function in string-collapse-white-space, which rendered the current string.js-based function redundant. All functionality stays the same; it’s a rebase.

3.1.0

13 Sept 2017

Features

  • Widow removal now detects Jinja/Nunjucks code. For example, if the input string starts with { and ends with }, it will automatically deactivate.

3.0.0

13 Sept 2017

Three Things Changed

  • The main export of the module is no longer the detergent() function itself, but an object which contains the detergent() function and the default options object as two separate keys:
module.exports = {
  detergent: detergent,
  opts: defaultsObj,
};

This means, from now on, import Detergent like this:

const detergent = require("detergent").detergent;

I’m building a new front-end for detergent.io and I want to automate the options list, that’s why I need the opts exported.

  • ✨💥✨ The result of the main function detergent() is now not a string but an object. Result is now placed under key res. This is done so I can place additional info in the future, what was added or removed exactly, what kinds of invisible characters were encountered and so on.
  • Removed JS Standard and switched to raw ESLint with the airbnb-base config preset and 2 overrides: 1. no semicolons. 2. allowing plus-plus in for loops. For posterity: JS Standard is using a half-year-old version of ESLint and its config is too relaxed; it’s ignoring many good practice rules.

2.32.0

7 Sept 2017

The previous algorithm was not optimised for anything specific; the goal was code that is easy to read and develop, and the rest was secondary (correctness aside, of course). In this rebase, the main aim is efficiency (besides correctness): both of the code as run by the JS engine and of the algorithm in general.

I implemented JS optimisations, like running for loops backward (an optimisation for the JS engine), and general ones, like cutting down on operations and performing them only when it’s the best time to do so. I reviewed every place where each function is called and weighed whether it was necessary at all (or could be replaced by something more efficient).

I separated all the operations performed on the input into three stages. The first stage is blanket operations to prepare the text, like decoding and broken code patching. The second stage is new: we traverse the string character by character and perform all the operations that can be performed at that level. The third stage is the rest, a set of consecutive functions mutating the result one after another until it’s done.

This second stage relieved us of roughly half of the blanket functions that previously mutated the string again and again. Now, all deletion/insertion procedures are recorded during a single traversal in stage 2; then the string is crunched in one go. It’s done using a combination of string-slices-array-push and string-replace-slices-array.
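
The “record during one traversal, then crunch in one go” idea can be sketched as below. This does not use the two packages named above and makes no claim about their APIs: applyRanges is a hypothetical stand-in, and the [from, to, replacement] range shape is an assumption chosen for illustration.

```javascript
// Hypothetical helper, not the API of string-slices-array-push or
// string-replace-slices-array.
// Each range is [from, to, replacement]: characters from index `from` up to
// (but excluding) `to` are replaced. Ranges are assumed not to overlap.
function applyRanges(str, ranges) {
  const sorted = [...ranges].sort((a, b) => a[0] - b[0]);
  let result = "";
  let cursor = 0;
  for (const [from, to, replacement = ""] of sorted) {
    result += str.slice(cursor, from) + replacement;
    cursor = to;
  }
  return result + str.slice(cursor);
}

// Single traversal: record what to change without touching the string yet.
const input = "a\u200Bb  c";
const ranges = [];
for (let i = 0; i < input.length; i++) {
  if (input[i] === "\u200B") {
    ranges.push([i, i + 1]); // delete a zero-width space
  } else if (input[i] === " " && input[i + 1] === " ") {
    ranges.push([i, i + 1]); // delete the first of two consecutive spaces
  }
}

// All recorded edits are applied in one pass.
console.log(applyRanges(input, ranges)); // prints "ab c"
```

The benefit of this design is that indexes recorded during the traversal stay valid, because the string itself is not modified until the very end.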

Features

  • Horizontal ellipsis is converted only when there are exactly three dots in one lump, no more, and the setting is on. Gung-ho regex replacements would not do this correctly, by the way.
  • Horizontal ellipsis switch makes the journey strictly either way: either all kinds of what could be interpreted as ellipsis are converted to fancy … (or unencoded character if the encoding is turned off) OR those above are converted to dot dot dot. There are no gray cases. Unlike before.
  • Script tags are now stripped together with their contents. Solves #15, thanks @nacimgoura
  • More tests to thoroughly prove that single quotes in any format (') are not encoded. Ever. They can be converted to fancy single quote, but in a single straight shape, they should always stay the same.
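
As an illustration of the “three dots in one lump, not more” rule from the first item above, here is a sketch using lookarounds. Detergent does this during its character-by-character traversal rather than with a regex, so this is a stand-in technique; the helper name is hypothetical, and lookbehind support (ES2018) is assumed.

```javascript
// Hypothetical helper. Converts exactly three consecutive dots into
// U+2026 HORIZONTAL ELLIPSIS; runs of two, four or more dots are left alone.
function dotsToEllipsis(str) {
  return str.replace(/(?<!\.)\.{3}(?!\.)/g, "\u2026");
}
```

A plain /\.{3}/g would also match the first three dots of a longer run, which is the kind of mistake a naive replacement makes.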

🔧 Fixed

  • 💥 upper-case dependency. It was buggy, by the way, reporting '1' as uppercase. For those concerned, that didn’t affect Detergent’s correctness.
  • 💥 lower-case dependency. It was buggy as well. Same thing.

2.31.0

28 Aug 2017

Features

  • opts.convertDotsToEllipsis: now you can choose whether three dots are converted to a horizontal ellipsis, &hellip;, or not.
  • Tapped check-types-mini to enforce peace and order within the options object. Now unrecognised keys in the options object will throw as well.

🔧 Fixed

  • 💥 Dependency lodash.clonedeep — the Object.assign against an empty object does the same job — it does not mutate the input arguments.
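
A small sketch of the pattern described above. The option names are taken from this changelog, but the values are assumptions for illustration. Note that Object.assign makes a shallow copy, so this only replaces a deep clone when the options object is flat, which is assumed here.

```javascript
// Illustrative values; not Detergent's actual defaults object.
const defaults = { removeWidows: true, convertDotsToEllipsis: true };
const userOpts = { removeWidows: false };

// Merging into a fresh empty object leaves both inputs untouched.
const merged = Object.assign({}, defaults, userOpts);

console.log(merged.removeWidows); // prints false
console.log(defaults.removeWidows); // prints true
```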

2.30.0

20 Jul 2017

Features

  • Bunch of new badges to readme.
  • .npmignore and added /media/ to it, along with all dotfiles. This will reduce your npm installation footprint.

2.29.0

20 Jul 2017

Features

  • Feature for issue #14: Detergent strips all HTML code (except bold/italic/strong/em), but in the process, some content might be misformatted. For example, the content in unordered lists would get bunched up together without spaces. Now that’s fixed. By default, every <li> will be put onto a new line, as well as the closing </ul>. If you want everything on one line, set opts.removeLineBreaks to true.

🔧 Fixed

  • 💥 Some Lodash dependencies, replacing them with native ES6-ones.

2.28.0

8 Jul 2017

🔧 Fixed

  • 💥 As the features grew, the “Builds” time on Travis grew too. Currently Travis fails in around 50% of cases because it hits the 50-minute mark while running the end-to-end unit tests. Therefore, I’m removing Travis for good. It makes no sense anyway, as there are no “Builds” for this library, only unit tests, which can be run locally.

2.27.0

8 Jul 2017

🔧 Fixed

  • Code refresh: updated all deps, generated up-to-date package-lock and did some small code rebasing related to all this.

2.26.0

12 Apr 2017

Features

  • Options key o.addMissingSpaces now lets you control whether missing spaces are added after full stops/colons/semicolons. This does not break the API as the new default setting matches the previously non-customisable behaviour.

2.25.0

7 Apr 2017

🏗️ Improvements

  • Tiny rebasing: separated all functions into util.js, added some measures to protect against options object settings of the wrong type (values of a type other than Boolean).

2.24.0

5 Apr 2017

🏗️ Improvements

  • Widows won’t be added if there’s a closing slash following the space. Also, they won’t be added if there’s hr or br preceding the space. This is necessary to cater for cases when Detergent is run on code which has concealed HTML tags, where brackets are swapped with custom strings. For example, cases like aaaaaaaaaaa%%%1br /%%%2aaaaaaaaaaa should get identified as concealed HTML and widow removal should not be triggered.

🔧 Fixed

  • 💥 strip-bom library dependency was redundant; '\uFEFF' was already in the invisible character list and removed along with all other invisibles.

2.23.0

24 Mar 2017

🏗️ Improvements

  • Swooping in on full stop + letter fixes. I found that file names where the extension is mentioned get separated into two parts. I came up with the idea: two errors rarely happen in one place. “string1.string2” is a double error because the space after the full stop is missing and the letter that follows is a capital. This leads to the algorithm:

    If there is no space after a full stop and the letter that follows is uppercase, add a space. If a lowercase letter follows the full stop, leave it as it is.

    Additionally, the algorithm now checks whether any of the known extensions follows the full stop (in any case). If so, a space between the full stop and the extension is not added. This should cover all false positives where file names are involved.
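
The rule above can be sketched roughly as follows. This is a simplified illustration, not the actual algorithm: the function name is hypothetical and the extension list is a short assumed sample, not the list Detergent uses.

```javascript
// Hypothetical sketch; the extension list is an illustrative assumption.
const KNOWN_EXTENSIONS = ["html", "css", "js", "json", "png", "jpg", "pdf"];

function addMissingSpaceAfterFullStop(str) {
  // Match a full stop followed directly by a word that starts with a letter.
  return str.replace(/\.([A-Za-z][A-Za-z0-9]*)/g, (match, word) => {
    // Known extension in any letter case: treat as a file name, leave as is.
    if (KNOWN_EXTENSIONS.includes(word.toLowerCase())) return match;
    // Uppercase letter after the full stop: a space is missing, add it.
    const first = word[0];
    const isUpper = first !== first.toLowerCase();
    return isUpper ? ". " + word : match;
  });
}

console.log(addMissingSpaceAfterFullStop("First sentence.Second one."));
// prints "First sentence. Second one."
```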

2.22.0

22 Mar 2017

🏗️ Improvements

  • Now correctly recognises and ignores legitimate minus signs, such as -20°C, when they come after a space. If the algorithm detects a number or currency symbol after a dash, it will not add a space after it or turn it into an m-dash. It no longer matters whether a space character precedes all that or not.
  • Updated Husky to latest.

Features

  • More tests.

✈️ Changes

  • 🔧 Now consuming JS Standard linter in normal fashion, not “any latest”, but within the current major range.

2.21.0

9 Mar 2017

Features

2.20.0

22 Feb 2017

Features

  • Widow removal now identifies UK postcodes and replaces the space with non-breaking space.
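
A rough sketch of the postcode handling described above. The pattern is a simplified assumption that covers common postcode shapes only, the helper name is hypothetical, and inserting a raw U+00A0 (rather than an encoded entity) is also an assumption.

```javascript
// Hypothetical helper with a simplified postcode pattern (assumption):
// outward code (e.g. "SW1A", "M1"), one space, inward code (e.g. "1AA").
const UK_POSTCODE = /\b([A-Z]{1,2}\d[A-Z\d]?) (\d[A-Z]{2})\b/g;

function protectPostcodes(str) {
  // Replace the space inside the postcode with a non-breaking space
  // so the two halves never wrap onto separate lines.
  return str.replace(UK_POSTCODE, "$1\u00A0$2");
}
```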

2.19.0

4 Jan 2017

Features

  • URL recognition: now Detergent won’t add spaces within a URL.
  • New tests — to maintain the coverage and prove the surrounded text is cleaned correctly as before.

2.18.0

23 Dec 2016

Features

  • JS Standard on a precommit hook to enforce an order everywhere
  • Tweaks for BitHound to ignore the fact that we are going to use the latest versions of AVA, Coveralls and Standard no matter what, to reduce maintenance time spent on all my libraries.
  • Some tweaks to completely pass JS Standard (there were redundant regex escapes for example)

2.17.0

21 Dec 2016

Features

  • Test coverage and a badge
  • Changelog
  • Tweaked travis and bithound setup files
  • Hardened the .gitignore
  • Consolidated Readme badge links to SVGs and URLs in the footer

🔧 Fixed

  • 🔧 Renamed some tests to match better what’s inside
  • 🔧 The latest AVA (*) is requested with an ignore on the BitHound