string-convert-indexes 3.0.1
§ Quick Take
import { strict as assert } from "assert";
import {
  nativeToUnicode,
  unicodeToNative,
} from "string-convert-indexes";

// CONVERTING NATIVE JS INDEXES TO UNICODE-CHAR-COUNT-BASED
// 𝌆 - \uD834\uDF06
// at index 1, we have the low surrogate, that's still grapheme index zero
assert.equal(
  nativeToUnicode("\uD834\uDF06aa", "1"),
  "0"
);
// notice it's retained as a string. The same type as the input is retained!
// at native index 3, we have the second letter a - that's index 2, counting graphemes
assert.equal(nativeToUnicode("\uD834\uDF06aa", 3), 2);
// convert many indexes at once - any nested data structure is fine:
assert.deepEqual(
  nativeToUnicode("\uD834\uDF06aa", [1, 0, 2, 3]),
  [0, 0, 1, 2]
);
// numbers from an AST-like complex structure are still picked out and converted:
assert.deepEqual(
  nativeToUnicode("\uD834\uDF06aa", [
    1,
    "0",
    [[[2]]],
    3,
  ]),
  [
    0, // notice matching type is retained
    "0", // notice matching type is retained
    [[[1]]],
    2,
  ]
);

// CONVERTING UNICODE-CHAR-COUNT-BASED TO NATIVE JS INDEXES
// 𝌆 - \uD834\uDF06
assert.deepEqual(
  unicodeToNative("\uD834\uDF06aa", [0, 1, 2]),
  [0, 2, 3]
);
assert.deepEqual(
  unicodeToNative("\uD834\uDF06aa", [1, 0, 2]),
  [2, 0, 3]
);
assert.throws(() =>
  unicodeToNative("\uD834\uDF06aa", [1, 0, 2, 3])
);
// throws an error!
// that's because there's no character (counting Unicode characters) with index 3
// we have only three Unicode characters, so indexes go only up to 2
§ Idea
The native JS string index system is not based on grapheme count. While the length of "a" is one, the emoji "🧢" has a length of two, because it is stored as two UTF-16 code units, \uD83E and \uDDE2.
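To make the difference concrete, here is a small sketch that uses only built-in string methods (nothing from this package). The variable names are ours, picked for illustration.

```javascript
// Native JS length counts UTF-16 code units, not user-perceived characters.
const letter = "a";
const cap = "\uD83E\uDDE2"; // the cap emoji, U+1F9E2, written as its surrogate pair

const letterLength = letter.length; // 1
const capLength = cap.length; // 2

// Read the two code units of the emoji as uppercase hex strings.
const capUnits = [cap.charCodeAt(0), cap.charCodeAt(1)].map((unit) =>
  unit.toString(16).toUpperCase()
); // ["D83E", "DDE2"]

// Iterating a string walks code points, so the surrogate pair counts once.
const capCodePoints = Array.from(cap).length; // 1

console.log(letterLength, capLength, capUnits, capCodePoints);
```

Note that counting code points, as `Array.from` does here, is still not the same as counting graphemes: a grapheme can consist of several code points.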
In an ideal world, the JS string index system would count an emoji as one character long. That's the so-called grapheme-based index system. The letter "a" and the cap emoji "🧢" are both graphemes.
This program converts indexes between the two systems. It's based on grapheme-splitter.
§ API
This program exports two functions:
nativeToUnicode(str, indexes)
It converts JS native indexes (the ones used in, let's say, String.slice()) to indexes based on grapheme count.
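The direction of this conversion can be sketched for a single index as follows. This is not the package's implementation (which is based on grapheme-splitter); it is a minimal illustration that assumes a runtime providing the built-in `Intl.Segmenter`, and `nativeToGraphemeIndex` is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of string-convert-indexes.
// Assumes the runtime supports Intl.Segmenter.
function nativeToGraphemeIndex(str, nativeIndex) {
  if (!Number.isInteger(nativeIndex) || nativeIndex < 0) {
    throw new RangeError(`Invalid native index: ${nativeIndex}`);
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let graphemeIndex = 0;
  for (const { segment, index } of segmenter.segment(str)) {
    // `index` is the native (UTF-16) offset where this grapheme starts,
    // `segment.length` is how many code units it occupies.
    if (nativeIndex < index + segment.length) {
      return graphemeIndex;
    }
    graphemeIndex += 1;
  }
  throw new RangeError(`Native index ${nativeIndex} is outside the string`);
}

const source = "\uD834\uDF06aa"; // same sample string as in Quick Take
const converted = [0, 1, 2, 3].map((i) => nativeToGraphemeIndex(source, i));
console.log(converted); // [0, 0, 1, 2]
```

Both native indexes 0 and 1 land inside the same surrogate pair, so they map to the same grapheme index.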
... and ...
unicodeToNative(str, indexes)
It converts grapheme count-based indexes to JS native indexes.
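The reverse direction can be sketched the same way. Again, this is only an illustration under the assumption that `Intl.Segmenter` is available; `graphemeToNativeIndex` is a hypothetical name, and the real package may behave differently in edge cases.

```javascript
// Hypothetical helper, not part of string-convert-indexes.
// Assumes the runtime supports Intl.Segmenter.
function graphemeToNativeIndex(str, graphemeIndex) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let current = 0;
  for (const { index } of segmenter.segment(str)) {
    // `index` is the native (UTF-16) offset where this grapheme starts.
    if (current === graphemeIndex) {
      return index;
    }
    current += 1;
  }
  // No grapheme with that index exists in the string.
  throw new RangeError(`Grapheme index ${graphemeIndex} is outside the string`);
}

const source = "\uD834\uDF06aa"; // same sample string as in Quick Take
const converted = [0, 1, 2].map((i) => graphemeToNativeIndex(source, i));
console.log(converted); // [0, 2, 3]

// The string has only three graphemes, so index 3 is rejected.
let outOfRange = false;
try {
  graphemeToNativeIndex(source, 3);
} catch (error) {
  outOfRange = error instanceof RangeError;
}
console.log(outOfRange); // true
```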
§ API - Input
The API for both functions, nativeToUnicode() and unicodeToNative(), is the same:
Input argument | Type | Obligatory? | Description |
---|---|---|---|
str | String | yes | The string the indexes refer to |
indexes | Whatever | yes | Normally a natural number or zero, but it can also be a numeric string or a nested AST thereof. |
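How a nested input could be traversed while keeping each value's type is sketched below. `mapIndexes` and `shiftByOne` are hypothetical names for this illustration, and the handling of plain objects is our assumption. The package's actual traversal rules may differ.

```javascript
// Hypothetical helper: walks any nested value and applies `convert`
// to every index it finds, handing back the same type it received.
function mapIndexes(value, convert) {
  if (typeof value === "number") {
    return convert(value);
  }
  if (typeof value === "string" && /^\d+$/.test(value)) {
    // Numeric string: convert it as a number, return it as a string.
    return String(convert(Number(value)));
  }
  if (Array.isArray(value)) {
    return value.map((item) => mapIndexes(item, convert));
  }
  if (value !== null && typeof value === "object") {
    // Assumption: plain objects are walked key by key.
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, mapIndexes(item, convert)])
    );
  }
  return value; // anything else is left untouched
}

// Stand-in converter for the demo only: shifts every index by one.
const shiftByOne = (index) => index + 1;

const result = mapIndexes([1, "0", [[[2]]], { start: 3 }], shiftByOne);
console.log(JSON.stringify(result)); // [2,"1",[[[3]]],{"start":4}]
```

In real use, the converter passed in would be a single-index conversion between native and grapheme-based indexes rather than the stand-in shown here.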
§ Changelog
See it in the monorepo, on Sourcehut.
§ Licence
Copyright © 2010–2020 Roy Revelt and other contributors