string-convert-indexes 2.0.2

Convert between native JS string character indexes and grapheme-count-based indexes

§ Quick Take

import { strict as assert } from "assert";
import {
  nativeToUnicode,
  unicodeToNative,
} from "string-convert-indexes";

// CONVERTING NATIVE JS INDEXES TO UNICODE-CHAR-COUNT-BASED
// 𝌆 - \uD834\uDF06

// at index 1, we have the low surrogate, which still belongs to grapheme index zero
assert.equal(
  nativeToUnicode("\uD834\uDF06aa", "1"),
  "0"
);
// notice it's retained as string. The same type as input is retained!

// at index 3, we have the second letter a - that's index 2, counting graphemes
assert.equal(nativeToUnicode("\uD834\uDF06aa", 3), 2);

// convert many indexes at once - any nested data structure is fine:
assert.deepEqual(
  nativeToUnicode("\uD834\uDF06aa", [1, 0, 2, 3]),
  [0, 0, 1, 2]
);

// numbers from an AST-like complex structure are still picked out and converted:
assert.deepEqual(
  nativeToUnicode("\uD834\uDF06aa", [
    1,
    "0",
    [[[2]]],
    3,
  ]),
  [
    0, // notice matching type is retained
    "0", // notice matching type is retained
    [[[1]]],
    2,
  ]
);

// CONVERTING UNICODE-CHAR-COUNT-BASED TO NATIVE JS INDEXES
// 𝌆 - \uD834\uDF06

assert.deepEqual(
  unicodeToNative("\uD834\uDF06aa", [0, 1, 2]),
  [0, 2, 3]
);

assert.deepEqual(
  unicodeToNative("\uD834\uDF06aa", [1, 0, 2]),
  [2, 0, 3]
);

assert.throws(() =>
  unicodeToNative("\uD834\uDF06aa", [1, 0, 2, 3])
);
// throws an error!
// that's because there's no character (counting Unicode characters) with index 3
// we have only three Unicode characters, so indexes go only up until 2

§ Idea

The native JS string index system is not based on grapheme count: while the length of "a" is one, the emoji "🧢" has a length of two, because it is actually made of two UTF-16 code units (a surrogate pair), \uD83E and \uDDE2.

In an ideal world, the JS string index system would count an emoji as one character long. That's the so-called grapheme-based index system. The letter "a" and the cap emoji "🧢" are both graphemes.

This program converts between the two systems. It's based on grapheme-splitter.
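To make the difference concrete, here is a minimal sketch of how a native index could be mapped to a grapheme index. It assumes a runtime with the built-in Intl.Segmenter (an assumption; this package itself is based on grapheme-splitter), and the helper name nativeToGraphemeSketch is hypothetical, not part of this package's API.

```javascript
// Hypothetical helper for illustration only, NOT the package's implementation.
// Assumes Intl.Segmenter is available (recent Node.js versions and browsers).
function nativeToGraphemeSketch(str, nativeIdx) {
  if (!Number.isInteger(nativeIdx) || nativeIdx < 0) {
    throw new TypeError("nativeIdx must be a natural number or zero");
  }
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let graphemeIdx = 0;
  for (const { segment, index } of segmenter.segment(str)) {
    // "index" is the native (UTF-16) offset at which this grapheme starts;
    // "segment.length" is how many native indexes the grapheme occupies
    if (nativeIdx < index + segment.length) {
      return graphemeIdx;
    }
    graphemeIdx++;
  }
  throw new RangeError(`Index ${nativeIdx} is outside of the string`);
}

const cap = "\uD83E\uDDE2"; // 🧢
console.log("a".length); // 1
console.log(cap.length); // 2
console.log(nativeToGraphemeSketch(cap + "aa", 1)); // 0, still inside the emoji
console.log(nativeToGraphemeSketch(cap + "aa", 2)); // 1, the first "a"
```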

§ API

This program exports two functions:

nativeToUnicode(str, indexes)

It converts JS native indexes (the ones used in, say, String.slice()) to indexes based on grapheme count.

... and ...

unicodeToNative(str, indexes)

It converts grapheme count-based indexes to JS native indexes.
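The reverse direction matters when a grapheme-based index has to be fed into a native method such as String.slice(). The following is a rough sketch of that idea, again assuming Intl.Segmenter is available; graphemeToNativeSketch is a hypothetical stand-in written for this example, not the package's unicodeToNative().

```javascript
// Hypothetical helper for illustration only, NOT the package's implementation.
// Returns the native (UTF-16) index at which the given grapheme starts.
function graphemeToNativeSketch(str, graphemeIdx) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let count = 0;
  for (const { index } of segmenter.segment(str)) {
    if (count === graphemeIdx) {
      return index;
    }
    count++;
  }
  // like the Quick Take example, refuse indexes beyond the last grapheme
  throw new RangeError(`No grapheme at index ${graphemeIdx}`);
}

const str = "\uD834\uDF06aa"; // 𝌆aa
// slice from the second grapheme onwards, without cutting a surrogate pair
const from = graphemeToNativeSketch(str, 1);
console.log(from); // 2
console.log(str.slice(from)); // "aa"
```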

§ API - Input

The API for both functions, nativeToUnicode() and unicodeToNative(), is the same:

| Input argument | Type | Obligatory? | Description |
| --- | --- | --- | --- |
| str | String | yes | The string in which you want to perform a search |
| indexes | Whatever | yes | Normally a natural number or zero, but it can also be a numeric string or a nested AST-like structure thereof. |
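How nested input with mixed types could be handled can be sketched as a recursive walk that converts every index it finds and returns each value in the type it arrived in. The helper mapIndexesSketch and the lookup-table converter below are assumptions made for this example; in particular, whether the package descends into plain objects the way this sketch does is not stated above.

```javascript
// Hypothetical helper for illustration only, NOT the package's implementation.
// Applies convertFn to every index in a nested structure, retaining the
// type (number or numeric string) of each converted value.
function mapIndexesSketch(input, convertFn) {
  if (Array.isArray(input)) {
    return input.map((item) => mapIndexesSketch(item, convertFn));
  }
  if (input !== null && typeof input === "object") {
    return Object.fromEntries(
      Object.entries(input).map(([key, val]) => [
        key,
        mapIndexesSketch(val, convertFn),
      ])
    );
  }
  if (typeof input === "number" && Number.isInteger(input) && input >= 0) {
    return convertFn(input);
  }
  if (typeof input === "string" && /^\d+$/.test(input)) {
    // numeric string in, numeric string out
    return String(convertFn(Number(input)));
  }
  return input; // anything else is left untouched
}

// Stand-in converter for "\uD834\uDF06aa": native index -> grapheme index
const lookup = [0, 0, 1, 2];
const result = mapIndexesSketch([1, "0", [[[2]]], 3], (idx) => lookup[idx]);
console.log(JSON.stringify(result)); // [0,"0",[[[1]]],2]
```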

§ Licence

MIT

Copyright © 2010–2020 Roy Revelt and other contributors

Related packages:

📦 ast-monkey-traverse 1.12.20
Utility library to traverse AST
📦 string-trim-spaces-only 2.8.23
Like String.trim() but you can choose granularly what to trim
📦 string-split-by-whitespace 1.6.72
Split string into array by chunks of whitespace
📦 string-apostrophes 1.2.30
Comprehensive, HTML-entities-aware tool to typographically-correct the apostrophes and single/double quotes
📦 string-extract-class-names 5.9.32
Extract class (or id) name from a string
📦 string-find-heads-tails 3.16.16
Finds where are arbitrary templating marker heads and tails located