Changelog

All notable changes to this project will be documented in this file.
See Conventional Commits for commit guidelines.

8.4.0

22 Dec 2022

✨ Features

Back-ported the latest v13.0.3 to CommonJS and released it as non-pure ESM (no "type": "module" in package.json).

13.0.0

1 Dec 2022

💥 BREAKING CHANGES

Minimum supported Node version is v14.18; we’re dropping v12 support

12.0.0

14 Nov 2022

🔧 Fixed

improve thrown error messages and rebase a few tests (c60cbf5)

✨ Features

return both string and ranges (3932619)

💥 BREAKING CHANGES

main function returns both string and ranges; output is now a plain object; no more opts.returnRangesOnly

11.6.13

13 Oct 2022

🔧 Fixed

tweak the broken-code algorithm so that results align whether dumpLinkHrefsNearby is enabled or disabled (234faa1)

11.6.10

5 Oct 2022

🔧 Fixed

respect stripTogetherWithTheirContents when dumpLinkHrefsNearby is on (43209e7), closes #54

11.6.0

31 Aug 2022

✨ Features

treat HTML-encoded Combining Grapheme Joiner (U+034F) character as whitespace (5a0d7ec)

11.5.0

31 Aug 2022

✨ Features

add opts.reportProgressFunc (4045496)

11.4.0

31 Aug 2022

✨ Features

remove indentations in front of text too (6527eb8)

11.3.0

18 Aug 2022

✨ Features

make all opts.dumpLinkHrefsNearby sub-keys optional too (90810d8)

11.2.0

12 Aug 2022

✨ Features

export types (11b5fb9)

11.1.0

1 Aug 2022

✨ Features

improve opts.dumpLinkHrefsNearby handling around punctuation (daab255)

11.0.1

26 Jul 2022

🔧 Fixed

add more precautions when assuming that string methods exist (00804b6)
align edge whitespace processing with and without opts.cb (43ee6d5)
fix opts.stripRecognisedHTMLOnly so that, when enabled, it strips single-letter tags (1b7ff49)

11.0.0

16 Jul 2022

✨ Features

add inline tag recognition: x<b>y</b>z strips to xyz because b is an inline element, but x<div>y</div>z strips to x y z because div is not an inline element (cbac254), closes #49
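To make the inline-versus-block distinction concrete, here is a toy sketch. It is not this library's algorithm: the toyStrip function and the INLINE_TAGS list are hypothetical, and the list is deliberately partial. Inline tags are deleted outright, any other tag is replaced with a space, and runs of whitespace are then collapsed.

```
// Hypothetical toy stripper, NOT string-strip-html's implementation.
// Assumed, deliberately partial list of inline elements.
const INLINE_TAGS = new Set(["a", "b", "i", "em", "strong", "span", "code"]);

function toyStrip(html) {
  // Inline tags vanish; every other tag becomes a single space.
  const out = html.replace(/<\/?([a-z][a-z0-9]*)\b[^>]*>/gi, (_tag, name) =>
    INLINE_TAGS.has(name.toLowerCase()) ? "" : " "
  );
  // Collapse whitespace runs and trim the edges.
  return out.replace(/\s+/g, " ").trim();
}

console.log(toyStrip("x<b>y</b>z")); // xyz
console.log(toyStrip("x<div>y</div>z")); // x y z
```

A real stripper also has to cope with attributes containing ">", comments, broken tags and so on, which this regex-based toy ignores.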

💥 BREAKING CHANGES

The major version is bumped only as a formality; there are no API changes. Inline tags now get the correct surrounding whitespace.

10.1.0

13 Jul 2022

✨ Features

improve whitespace control around punctuation (d8318a7), closes #49

10.0.0

6 Jul 2022

✨ Features

Efficiency improvements: any tags within <script> are now skipped. This comes at the expense of handling the odd case where two paired tags overlap:
```
<script>
  a
  <style>
    b
  </script>
  c
</style>
```
That’s a strange broken-code case, but it still warrants a major semver bump. Our performance measurements don’t cover this particular branch of the algorithm, so the records show no perf difference.

🔧 Fixed

DOCTYPE attribute values are now pinged to the callback correctly (name only, no value)
Fixed a rare case where the program could enter an infinite loop when it encountered templating literals that resemble (but differ from) Nunjucks/Jinja ones. We added a hard check to prevent the index from moving backwards.

💥 BREAKING CHANGES

The major version is bumped as a formality: DOCTYPE tag attributes are now pinged to the callback differently than before (correctly now, but differently nonetheless).

9.1.7

21 Mar 2022

🔧 Fixed

correct the types (7ec82ab)

9.1.0

22 Nov 2021

✨ Features

opts.ignoreTagsWithTheirContents (39dad96)
opts.stripRecognisedHTMLOnly (50010a8)

9.0.0

9 Sept 2021

✨ Features

migrate to ES Modules (8c9d95d)

💥 BREAKING CHANGES

programs are now ES Modules and won’t work with CommonJS require()

8.3.0

24 May 2021

🔧 Fixed

skip Jinja/Nunjucks tags to run faster (307a578)

✨ Features

config file based major bump blacklisting (e15f9bb)

8.2.12

11 Apr 2021

⏪ Reverts

Revert “chore: setup refresh” (23cf206)

8.2.0

7 Feb 2021

✨ Features

better recognition for Rails or Phoenix templating tags (9aeddc3), closes #2

8.1.0

28 Jan 2021

✨ Features

extend ESP tag recognition to all <%… tags (d552f86)

8.0.1

28 Jan 2021

🔧 Fixed

add testStats to npmignore (f3c84e9)

8.0.0

23 Jan 2021

✨ Features

rewrite in TS, start using named exports (e6fe544)

💥 BREAKING CHANGES

previously you’d consume it like this: import stripHtml from …; now: import { stripHtml } from …

7.0.0

28 Nov 2020

Accidental version bump during migration to SourceHut. Sorry about that.

6.3.0

10 Nov 2020

✨ Features

algorithm improvements (5c2a45f)

6.2.0

26 Oct 2020

✨ Features

better recognise some JSON patterns (450d30a)

6.1.0

13 Oct 2020

🔧 Fixed

fix filteredTagLocations closing location on paired tags (43ce393)

✨ Features

wildcard ALL option for opts.stripTogetherWithTheirContents (d2031ab)

6.0.0

15 Sept 2020

🔧 Fixed

correct filteredTagLocations for pair tags which are stripped with content (6bd6f4c)

💥 BREAKING CHANGES

filteredTagLocations now shows only one range for paired tags that are stripped together with their contents

5.0.0

16 Aug 2020

Why change what’s returned at a user’s request when we can return everything and let the user pick?

💥 BREAKING CHANGES

That’s why we removed opts.returnRangesOnly.

The function’s output is now a plain object containing:

cleaned string (considering opts.ignoreTags and opts.onlyStripTags)
gathered ranges, used to produce cleaned string (considering opts.ignoreTags and opts.onlyStripTags)
tag locations of all spotted HTML tags IGNORING the whitelist/blacklist opts.ignoreTags and opts.onlyStripTags
locations of filtered HTML tags (considering opts.ignoreTags and opts.onlyStripTags)
plus, some statistics goodies

```
stripHtml("abc<a>click me</a>def");
// => {
//      log: {
//        timeTakenInMilliseconds: 6
//      },
//      result: "abc click me def",
//      ranges: [
//        [3, 6, " "],
//        [14, 18, " "],
//      ],
//      allTagLocations: [
//        [3, 6],
//        [14, 18],
//      ],
//      filteredTagLocations: [
//        [3, 6],
//        [14, 18],
//      ],
//    }
```

allTagLocations can be used for syntax highlighting, for example.

Migration instructions:

Previously, with default options, the function returned the result string. Now that string is under the result key of the output plain object. Previously, you could request ranges output via opts.returnRangesOnly; now ranges are always present under the ranges key.

Some people mistakenly took the ranges output for exact tag locations. Now exact tag locations are under the allTagLocations key.

That’s different from the ranges output, because ranges are instructions (what to add, what to replace); they can be merged, and the character indexes they cover include whitespace management.

allTagLocations, on the other hand, are exact tag locations. If you slice the input with them using String.slice(), you’ll get the string from bracket to bracket, like <a>.
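The difference between the two outputs can be sketched with the example result shown earlier. The applyRanges helper below is hypothetical, a simplified stand-in rather than the ranges-apply package, and it assumes the ranges are sorted and non-overlapping. Ranges are applied as edit instructions, while tag locations are only sliced out of the input.

```
// Hypothetical helper, NOT the ranges-apply package. Each range is
// [from, to, insert?]: replace input.slice(from, to) with insert.
// Assumes ranges are sorted by "from" and do not overlap.
function applyRanges(str, ranges) {
  let result = "";
  let cursor = 0;
  for (const [from, to, insert = ""] of ranges) {
    result += str.slice(cursor, from) + insert;
    cursor = to;
  }
  return result + str.slice(cursor);
}

const input = "abc<a>click me</a>def";
// Values copied from the example stripHtml() output in this entry.
const ranges = [[3, 6, " "], [14, 18, " "]];
const allTagLocations = [[3, 6], [14, 18]];

// Ranges are instructions: applying them yields the cleaned string.
console.log(applyRanges(input, ranges)); // abc click me def
// Tag locations are exact: slicing them yields the tags themselves.
console.log(allTagLocations.map(([from, to]) => input.slice(from, to))); // [ '<a>', '</a>' ]
```

Here the two arrays happen to cover the same indexes; on inputs where whitespace around a tag is also cleaned, a range can extend beyond the tag’s brackets, which is why ranges shouldn’t be read as tag locations.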

4.4.0

26 Apr 2020

✨ Features

harden the linting rules and make them all pass (812d17e)

4.3.0

23 Sept 2019

✨ Features

respect double line breaks (2c09d59), closes #15

4.2.0

4 Sept 2019

✨ Features

add previously missing tag.lastClosingBracketAt on ignored tags (f35e595)
make the callback (opts.cb) ping the ignored tags too (d9302e7)
report tag.slashPresent as index of the slash, not as a boolean (96ce6c8)

4.1.0

24 Aug 2019

✨ Features

implement callback interface, opts.cb (79bc8dc)

3.5.0

20 Jan 2019

Various documentation and setup tweaks after we migrated to a monorepo
Setup refresh: updated dependencies and all config files using automated tools

3.3.0

26 Dec 2018

🔧 Fixed

🐛 Throwing case when a tag is the last one in the string and its closing bracket is missing (ef44f63)

✨ Features

Algorithm improvements (8a82b8e)
Delete trailing whitespace after dirty code chunk: tag + missing opening bracket tag (71f720c)
Improvements to exclamation mark punctuation (e31fd3b)
opts.dumpLinkHrefsNearby and algorithm improvements (777407e)
Add opts.onlyStripTags (7bb49c8)
Add opts.trimOnlySpaces (b8c6f29)

3.2.0

22 Jul 2018

Fixed opts.returnRangesOnly: when there are no HTML tags in the input and the option is on, an empty array is now returned (previously, the input string was incorrectly returned). Sorry about that.

3.1.0

17 Jul 2018

Added opts.onlyStripTags

3.0.0

3 Jul 2018

Breaking changes: opts.dumpLinkHrefsNearby was previously a Boolean. Now it’s a plain object, and its enabled key (opts.dumpLinkHrefsNearby.enabled) does what opts.dumpLinkHrefsNearby did before v3.

This makes it easier for us to keep all the new opts.dumpLinkHrefsNearby settings in one place:

```
{
  ignoreTags: [],
  stripTogetherWithTheirContents: stripTogetherWithTheirContentsDefaults,
  skipHtmlDecoding: false,
  returnRangesOnly: false,
  trimOnlySpaces: false,
  dumpLinkHrefsNearby: { // <------ CHANGED!
    enabled: false, // <-------- 💥 NEW!
    putOnNewLine: false, // <--- 💥 NEW!
    wrapHeads: "", // <--------- 💥 NEW!
    wrapTails: "" // <---------- 💥 NEW!
  }
}
```

Also, the input string is now returned with leading and trailing whitespace trimmed.

2.4.0

20 Jun 2018

Two range- dependencies have been renamed, namely to ranges-push and ranges-apply. We switched to the new names.

2.3.0

8 Jun 2018

Improvements to dirty code recognition algorithm

2.2.0

2 Jun 2018

opts.dumpLinkHrefsNearby: handy when producing plain-text versions of emails
Improved the algorithm to understand HTML code that has been abruptly cut off. If you select a chunk of HTML whose beginning is valid but whose end falls somewhere in the middle of a tag, style or whatnot, that tag will now be removed.
Improved the algorithm to detect and clean tags that lack a closing bracket when a new tag follows, with or without whitespace in between.

64 unit tests, 451 assertions, 2226 lines of unit tests at 90% line coverage.

2.1.0

31 May 2018

opts.trimOnlySpaces: up until now, by default, the outer edges of the string were trimmed using String.trim(), which erased:
- non-breaking spaces (in combination with recursive entity decoding, this means HTML-encoded non-breaking spaces will also be erased)
- tabs
- line breaks (\n), carriage returns (\r) and combination thereof (\r\n)
- some other less common but space-like characters.
This becomes a challenge in automated environments where data is considered to be clean and multiple pieces of data can be parts of a larger whole. For example, we might be cleaning JSON fields whose value is “sandwiched” out of three fields: “Hi ”, “%%-firstname-%%”, “, welcome to special club!”. To improve formatting, some outer spaces, like the one after “Hi”, can be replaced with a non-breaking space so that the words never wrap there. However, if all the fields were cleaned by a tool that used this HTML-stripping function, the outer non-breaking spaces would get deleted and the result would end up as “HiJohn, welcome to special club!”. This option makes trimming stricter: only spaces are deleted during string trimming.
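The scenario above can be reproduced in a few lines. This is a sketch, not the library’s code: trimSpacesOnly is a hypothetical helper that approximates what opts.trimOnlySpaces is described as doing (removing only U+0020 spaces from the edges), and “John” stands in for the already-substituted %%-firstname-%% field.

```
// Hypothetical helper (not part of this library): trims only plain spaces
// (U+0020) from both ends, leaving tabs, line breaks and NBSPs alone.
function trimSpacesOnly(str) {
  return str.replace(/^ +| +$/g, "");
}

const NBSP = "\u00A0";
// Three fields concatenated later; the space after "Hi" was deliberately
// replaced with a non-breaking space to prevent wrapping.
const fields = ["Hi" + NBSP, "John", ", welcome to special club!"];

// String.prototype.trim() treats NBSP as whitespace and removes it.
const withTrim = fields.map((f) => f.trim()).join("");
// Spaces-only trimming keeps the NBSP between "Hi" and the name.
const withSpacesOnly = fields.map(trimSpacesOnly).join("");

console.log(withTrim); // HiJohn, welcome to special club!
console.log(withSpacesOnly.includes(NBSP)); // true
```

The regex anchors (^ and $) keep interior spaces untouched, so only the outer edges of each field are affected.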

2.0.0

30 May 2018

One day I noticed that my Nunjucks code (just a greater-than comparison against a number) was being falsely interpreted as HTML by this library, so I went on to rewrite the whole thing from scratch. Now it’s leaner and cleaner, with all the previous unit tests kept and many more added.

✨ Features

An even smarter algorithm, now able to detect, for example, a missing opening bracket on a tag. Even the latest Chrome v66 can’t do that.
Increased the unit test assertion count from 148 to 370, covering even more legit and stinky code cases.
opts.returnRangesOnly

1.4.0

11 May 2018

Set up Prettier
Removed package.lock and .editorconfig
Wired Rollup to remove comments from non-dev builds. This means we can now leave console.log calls in the source code; Rollup will remove them from production code.
Unit tests now point at the ES modules build, so code coverage is now correct, without Babel functions being missed

1.3.0

19 Feb 2018

Now strips HTML comments too.

1.2.0

31 Dec 2017

Improvements to opts.stripTogetherWithTheirContents, plus a lot of rebasing.

1.1.0

7 Dec 2017

Add opts.stripTogetherWithTheirContents

1.0.0

27 Nov 2017

First public release