string-unfancy 5.0.1

Replace all n/m dashes and curly quotes with their simpler equivalents

Quick Take

import { strict as assert } from "assert";
import { unfancy } from "string-unfancy";

// U+2019
// https://www.fileformat.info/info/unicode/char/2019/index.htm
// https://mothereff.in/js-escapes
const rightSingleQuote = "\u2019";

assert.equal(
  unfancy(`someone${rightSingleQuote}s`),
  "someone's"
);

// works with encoded HTML:
assert.equal(unfancy("someone&rsquo;s"), "someone's");

Idea

Convert typographically-correct characters (like curly quotes or m-dashes) to their basic counterparts (like apostrophes or hyphens).

It's the opposite of detergent and string-apostrophes.

It's useful in ASCII-restricted places where encoding is too unwieldy, for example, image alt attribute values in email templates. It also helps when stripping down a formatted Markdown value, removing backticks and so on.
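To illustrate the idea, here is a minimal sketch of this kind of character mapping. It is not the library's implementation: the `simpleUnfancy` name and the small character table are assumptions made for illustration, and the real package likely covers more characters and, as the Quick Take shows, also handles encoded HTML.

```javascript
// Hypothetical, simplified mapping; not the package's actual table.
const PLAIN_EQUIVALENTS = {
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
  "\u2013": "-", // n-dash
  "\u2014": "-", // m-dash
};

function simpleUnfancy(str) {
  if (typeof str !== "string") {
    throw new TypeError("simpleUnfancy: input must be a string");
  }
  // Swap every character found in the table; leave everything else as is.
  return str.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014]/g,
    (ch) => PLAIN_EQUIVALENTS[ch]
  );
}

console.log(simpleUnfancy("\u201CIt\u2019s here\u201D \u2014 2010\u20132021"));
// "It's here" - 2010-2021
```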

API

unfancy(str)

Caveat: if the input is not a string, it will throw.

The function returns a string.
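Because non-string input throws, callers that handle mixed values may want a guard. The sketch below assumes a hypothetical wrapper, `unfancyIfString`, which takes the conversion function as an argument so it can be tried without installing the package. `stubUnfancy` is only a stand-in that mimics the throwing behaviour described above.

```javascript
// Hypothetical helper: applies `fn` only to strings, passes other values through.
function unfancyIfString(value, fn) {
  return typeof value === "string" ? fn(value) : value;
}

// Stand-in for the real unfancy(): throws on non-strings, handles one character.
function stubUnfancy(str) {
  if (typeof str !== "string") {
    throw new TypeError("input must be a string");
  }
  return str.replace(/\u2019/g, "'");
}

console.log(unfancyIfString("someone\u2019s", stubUnfancy)); // someone's
console.log(unfancyIfString(null, stubUnfancy)); // null
```

In real use, the imported `unfancy` would be passed in place of the stub.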

Example - Gulp streams

If you are using Gulp to build email templates, you can tap the stream, apply a function to it, then within that function, replace all instances of alt="..." with their unfancied versions.

First, you need to require gulp-tap and string-unfancy:

const tap = require("gulp-tap");
// unfancy is a named export, as in the Quick Take above
const { unfancy } = require("string-unfancy");

Then, tap your main build task's stream, probably towards the end of the pipeline:

...
.pipe(tap((file) => {
  // run only the alt attribute values through unfancy
  file.contents = Buffer.from(unfancyAlts(file.contents.toString()));
}))
.pipe(gulp.dest("dist")) // that's the final write happening, yours might be different
...

Then, declare a function somewhere within your gulpfile.js:

// named differently from the imported unfancy() so it doesn't call itself
function unfancyAlts(input) {
  return input.replace(/alt="[^"]*"/g, (el) => unfancy(el));
}

As you see above, we're running an inline function upon every regex-matched alt attribute.

And that's it! All image alt attributes will lose their HTML encoding and will have their fancy special characters converted to simple ASCII equivalents.
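To see which parts of a template the regex touches, the replacement step can be tried on its own. In this sketch, `stubUnfancy` is a simplified stand-in for the real `unfancy` (an assumption, so the snippet runs without dependencies), and `unfancyAlts` is an illustrative name for the helper.

```javascript
// Simplified stand-in for the real unfancy(); handles two characters only.
function stubUnfancy(str) {
  return str.replace(/\u2019/g, "'").replace(/\u2014/g, "-");
}

// Rewrites only alt="..." attributes, leaving the rest of the markup alone.
function unfancyAlts(html) {
  return html.replace(/alt="[^"]*"/g, (attr) => stubUnfancy(attr));
}

const source = '<img src="a.png" alt="Roy\u2019s logo"><p>Roy\u2019s text</p>';
console.log(unfancyAlts(source));
// <img src="a.png" alt="Roy's logo"><p>Roy’s text</p>
```

Text outside the alt attributes keeps its curly quote, so the visible copy stays typographically correct.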

Can we use lodash.deburr instead?

No. It won't even convert a single m-dash! It's a different tool for a different purpose.
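The difference is easier to see side by side: deburring targets accented letters, while dashes and quotes are punctuation. The sketch below approximates accent stripping with `String.prototype.normalize` rather than calling lodash (an assumption, to keep it dependency-free), and `stripDiacritics` is an illustrative name.

```javascript
// Approximation of accent stripping: decompose, then drop combining marks.
function stripDiacritics(str) {
  return str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(stripDiacritics("d\u00E9j\u00E0 vu")); // deja vu
console.log(stripDiacritics("a\u2014b") === "a\u2014b"); // true, the m-dash is untouched
```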

Changelog

See it in the monorepo, on GitHub.

Contributing

To report bugs or request features or assistance, raise an issue on GitHub.

Any code contributions are welcome! All Pull Requests will be dealt with promptly.

Licence

MIT

Copyright © 2010–2021 Roy Revelt and other contributors

Related packages:

📦 detergent 8.0.1
Extracts, cleans and encodes text
📦 html-img-alt 3.0.1
Adds missing alt attributes to img tags. Non-parsing
📦 string-convert-indexes 5.0.1
Convert between native JS string character indexes and grapheme-count-based indexes
📦 string-find-heads-tails 5.0.1
Finds where arbitrary templating marker heads and tails are located
📦 string-overlap-one-on-another 3.0.1
Lay one string on top of another, with an optional offset
📦 string-split-by-whitespace 3.0.1
Split string into array by chunks of whitespace
📦 string-fix-broken-named-entities 6.0.1
Finds and fixes common and not so common broken named HTML entities, returns ranges array of fixes