string-unfancy 4.0.14

Replace all n/m dashes and curly quotes with their simpler equivalents

§ Quick Take

import { strict as assert } from "assert";
import { unfancy } from "string-unfancy";

// U+2019
// https://www.fileformat.info/info/unicode/char/2019/index.htm
// https://mothereff.in/js-escapes
const rightSingleQuote = "\u2019";

assert.equal(
  unfancy(`someone${rightSingleQuote}s`),
  "someone's"
);

// works with encoded HTML:
assert.equal(unfancy("someone&rsquo;s"), "someone's");

§ Idea

Convert typographically-correct characters (like curly quotes or m-dashes) to their basic counterparts (like apostrophes or hyphens).

It's the opposite of detergent and string-apostrophes.

It's used in ASCII-restricted places where encoding is too unwieldy, for example, image alt attribute values in email templates. Another use is stripping down a formatted markdown value, removing backticks and so on.
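To make the idea concrete, here is a minimal sketch of this kind of replacement. The `simpleEquivalents` table and the `unfancySketch` function are hypothetical names used for illustration, and they cover only a handful of characters; this is not the package's actual source or its full character list.

```javascript
// Hypothetical lookup table: a few fancy characters and their plain replacements.
const simpleEquivalents = {
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
  "\u2013": "-", // en dash
  "\u2014": "-", // em dash
};

// Illustrative only: swaps each listed character and leaves everything else alone.
function unfancySketch(str) {
  if (typeof str !== "string") {
    throw new TypeError("unfancySketch: input must be a string");
  }
  return str.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014]/g,
    (ch) => simpleEquivalents[ch]
  );
}

console.log(unfancySketch("\u201Cit\u2019s 5\u20136 pm\u201D")); // "it's 5-6 pm"
```

A lookup table keeps the mapping easy to audit, which matters when the output has to stay within plain ASCII.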

§ API

unfancy(str)

Caveat: if the input is not a string, it will throw.

The function returns a string.
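Because non-string input throws, code that handles mixed data may want a guard in front of the call. The sketch below assumes a hypothetical helper, `unfancyIfString`, which takes the converter as a parameter (in real use, the imported `unfancy`). A stand-in converter is used here so the snippet runs without the package installed.

```javascript
// Hypothetical helper: `convert` is expected to be the imported unfancy().
// Non-string values are returned untouched instead of triggering a throw.
function unfancyIfString(value, convert) {
  return typeof value === "string" ? convert(value) : value;
}

// Stand-in converter for demonstration only; it handles a single character.
const standInConvert = (s) => s.replace(/\u2019/g, "'");

console.log(unfancyIfString("it\u2019s", standInConvert)); // it's
console.log(unfancyIfString(null, standInConvert)); // null
```

Whether to skip, coerce or reject non-strings depends on the caller; skipping is only one option.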

§ Example - Gulp streams

If you are using Gulp to build email templates, you can tap the stream, apply a function to it, then within that function, replace all instances of alt="..." with their unfancied versions.

First, you need to require gulp-tap and string-unfancy:

const tap = require("gulp-tap");
const { unfancy } = require("string-unfancy");

Then, tap your main build task's stream, probably towards the end of the pipeline:

...
.pipe(tap((file) => {
  // run the helper declared below over the whole file's contents
  file.contents = Buffer.from(unfancyAlts(file.contents.toString()));
}))
.pipe(gulp.dest("dist")) // that's the final write happening, yours might be different
...

Then, declare a function somewhere within your gulpfile.js:

// named unfancyAlts so that it doesn't shadow the imported unfancy()
function unfancyAlts(input) {
  // only the matched alt="..." chunks are passed through unfancy()
  return input.replace(/alt="[^"]*"/g, (el) => unfancy(el));
}

As you see above, we're running an inline function on each regex match.

And that's it! All image alt attributes will lose their HTML encoding and will have their fancy special characters converted to simple ASCII equivalents.
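The alt-only replacement can be tried outside Gulp. The sketch below uses a hypothetical `convertAltsOnly` function that receives the converter as an argument (in a real gulpfile, the imported `unfancy`), plus a stand-in converter so that it runs without the package installed. It shows that text outside alt attributes is left untouched.

```javascript
// Hypothetical helper: applies `convert` only to alt="..." chunks.
function convertAltsOnly(input, convert) {
  return input.replace(/alt="[^"]*"/g, (el) => convert(el));
}

// Stand-in for unfancy(), handling just two characters for the demo.
const fakeUnfancy = (s) => s.replace(/\u2019/g, "'").replace(/\u2014/g, "-");

const html =
  '<p>Tom\u2019s</p><img src="a.png" alt="Tom\u2019s cat \u2014 asleep">';

console.log(convertAltsOnly(html, fakeUnfancy));
// <p>Tom’s</p><img src="a.png" alt="Tom's cat - asleep">
```

Note that the pattern only matches double-quoted attribute values; single-quoted or unquoted alt values would need a different pattern.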

§ Can we use lodash.deburr instead?

No. It won't even convert a single m-dash! It's a different tool for a different purpose.

§ Changelog

See it in the monorepo, on GitHub.

§ Contributing

To report bugs or request features or assistance, raise an issue on GitHub.

Any code contributions are welcome! All Pull Requests will be dealt with promptly.

§ Licence

MIT

Copyright © 2010–2021 Roy Revelt and other contributors

Related packages:

📦 detergent 7.0.14
Extracts, cleans and encodes text
📦 html-img-alt 2.0.14
Adds missing alt attributes to img tags. Non-parsing
📦 string-remove-widows 2.0.14
Helps to prevent widow words in a text
📦 string-left-right 4.0.14
Looks up the first non-whitespace character to the left/right of a given index
📦 string-match-left-right 7.0.8
Match substrings on the left or right of a given index, ignoring whitespace
📦 string-collapse-leading-whitespace 5.0.14
Collapse the leading and trailing whitespace of a string
📦 string-uglify 1.4.14
Shorten sets of strings deterministically, to be git-friendly