§ Quick Take

import { strict as assert } from "assert";
import { splitByW } from "string-split-by-whitespace";

// Splitting by whitespace is easy: use the native String.prototype.split()
assert.deepEqual("abc  def ghi".split(/\s+/), [
  "abc",
  "def",
  "ghi",
]);

const source = `\n     \n    a\t \nb    \n      \t`;

// this program is nearly equivalent to regex-based split:
assert.deepEqual(source.split(/\s+/), [
  "",
  "a",
  "b",
  "",
]);
assert.deepEqual(splitByW(source), ["a", "b"]);
// a regex-based split needs extra filtering of empty strings, but it's a native solution

// ADDITIONALLY...

// this program can also exclude certain index ranges:
assert.deepEqual(
  splitByW("a b c d e", {
    ignoreRanges: [[0, 2]], // that's "a" and the space after it
  }),
  ["b", "c", "d", "e"]
);

§ Purpose

When String.prototype.split(/\s+/) is not enough, for example when you need to exclude certain substrings, this program will help.

It splits the string by whitespace, "whitespace" being defined as "anything that trims to zero length": tabs, line breaks (CR and LF), the space character and the raw non-breaking space. There are quite a few more such characters across the whole Unicode range.
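The "trims to zero length" definition can be expressed as a one-line predicate. This is a minimal sketch of the definition, not code taken from the library; isWhitespace is a hypothetical name, and it assumes a single character is passed in (an empty string would also return true).

```javascript
// Hypothetical helper illustrating the definition above:
// a character counts as whitespace if String.prototype.trim() reduces it to "".
function isWhitespace(char) {
  return char.trim() === "";
}

console.log(isWhitespace(" ")); // true
console.log(isWhitespace("\t")); // true
console.log(isWhitespace("\r")); // true
console.log(isWhitespace("\u00A0")); // true (raw non-breaking space)
console.log(isWhitespace("\u2003")); // true (em space, one of the other Unicode spaces)
console.log(isWhitespace("a")); // false
```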

§ API

splitByW(str, [opts])

In other words, it's a function that takes two input arguments, the second one being optional (marked by square brackets).

§ API - Input

| Input argument | Type | Obligatory? | Description |
|---|---|---|---|
| str | String | yes | Source string upon which to perform the operation |
| opts | Plain object | no | Optional Options Object, see below for its API |

§ An Optional Options Object

| Optional Options Object's key | Type of its value | Default | Description |
|---|---|---|---|
| ignoreRanges | Array of zero or more range arrays | [] | Feed zero or more string slice ranges, arrays of two non-negative integer indexes, like [[1, 5], [6, 10]]. The algorithm will not include these string index ranges in the results. |

opts.ignoreRanges can be an empty array, but if it contains anything other than arrays, an error will be thrown.
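The rule above can be sketched as a small validator. checkIgnoreRanges is a hypothetical helper written for illustration, not an export of string-split-by-whitespace; the library's actual checks and error messages may differ, and rejecting a non-array opts.ignoreRanges outright is an assumption of this sketch.

```javascript
// Hypothetical helper, NOT part of string-split-by-whitespace. It mirrors the
// documented rule: an empty array is fine, but every element must be an array.
function checkIgnoreRanges(ignoreRanges) {
  // Assumption: a value that is not an array at all is rejected too.
  if (!Array.isArray(ignoreRanges)) {
    throw new TypeError("opts.ignoreRanges must be an array");
  }
  ignoreRanges.forEach((range, idx) => {
    if (!Array.isArray(range)) {
      throw new TypeError(
        `opts.ignoreRanges[${idx}] must be an array, got ${typeof range}`
      );
    }
  });
  return true;
}

checkIgnoreRanges([]); // passes
checkIgnoreRanges([[1, 5], [6, 10]]); // passes
try {
  checkIgnoreRanges([[0, 2], "3-5"]);
} catch (err) {
  console.log(err.message); // opts.ignoreRanges[1] must be an array, got string
}
```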

§ API - Output

The program returns an array of zero or more strings. An empty string yields an empty array.
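To make the input/output contract concrete, here is a minimal sketch of the documented behaviour. It is not the library's source code: splitByWSketch is a hypothetical name, and it assumes that characters inside an ignored range act like whitespace (they end the current chunk and never appear in the output), with ranges read as end-exclusive slice ranges.

```javascript
// A minimal sketch of the documented behaviour, NOT the library's implementation.
// Assumption: characters inside an ignored [start, end) range act like whitespace.
function splitByWSketch(str, opts = {}) {
  const ignoreRanges = opts.ignoreRanges || [];
  const isIgnored = (i) =>
    ignoreRanges.some(([from, to]) => i >= from && i < to);

  const result = [];
  let chunkStart = null; // index where the current non-whitespace chunk began

  for (let i = 0; i <= str.length; i++) {
    // Reaching the end of the string counts as a separator, flushing the last chunk.
    const isSeparator =
      i === str.length || str[i].trim() === "" || isIgnored(i);
    if (isSeparator) {
      if (chunkStart !== null) {
        result.push(str.slice(chunkStart, i));
        chunkStart = null;
      }
    } else if (chunkStart === null) {
      chunkStart = i;
    }
  }
  return result;
}

console.log(splitByWSketch("")); // []
console.log(splitByWSketch("a b c d e", { ignoreRanges: [[0, 2]] })); // [ 'b', 'c', 'd', 'e' ]
```

Walking the string once and slicing out chunks avoids the empty strings that a regex-based split leaves at the edges, so no extra filtering step is needed.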

§ opts.ignoreRanges

Some basics first. When we say "heads" or "tails", we mean the templating literals that wrap a value. The "heads" are the opening part, for example {{ below, and the "tails" are the closing part, for example }} below:

Hi {{ firstName }}!

Now imagine that we have extracted the heads and tails and know their ranges: [[3, 5], [16, 18]]. (If you count character indexes from the start of the string, beginning at zero, you'll see that {{ occupies indexes 3 and 4 and }} occupies 16 and 17; like String.prototype.slice ranges, each range's end index is exclusive.)
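The ranges for this single-variable example can be double-checked with plain string methods. This sketch locates the markers with indexOf, which is only adequate for one hand-picked variable (a real template needs a proper finder such as string-find-heads-tails), and confirms that slice returns exactly the markers.

```javascript
const template = "Hi {{ firstName }}!";

// Locate the head and the tail by hand; each range is [start, end), end-exclusive.
const headStart = template.indexOf("{{");
const tailStart = template.indexOf("}}", headStart);
const ranges = [
  [headStart, headStart + 2],
  [tailStart, tailStart + 2],
];

console.log(ranges); // [ [ 3, 5 ], [ 16, 18 ] ]
// slice() with the same numbers returns exactly the markers
console.log(ranges.map(([start, end]) => template.slice(start, end))); // [ '{{', '}}' ]
```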

Now, imagine we want to split Hi {{ firstName }}! into the array ["Hi", "firstName", "!"].

For that, we need to skip two ranges: those of the head and the tail.

That's where opts.ignoreRanges comes in handy.
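For the Hi {{ firstName }}! example, a different technique reaches the same result and shows what skipping the ranges means: blank out the ignored ranges, then fall back to the native regex split from the Quick Take, filtering out the empty strings. This is a swapped-in approach for illustration (splitMasked is a hypothetical name), not how string-split-by-whitespace works internally.

```javascript
// Swapped-in technique for illustration, NOT the library's implementation:
// overwrite every ignored [start, end) range with spaces, then split natively.
function splitMasked(str, ignoreRanges = []) {
  let masked = str;
  for (const [start, end] of ignoreRanges) {
    masked =
      masked.slice(0, start) + " ".repeat(end - start) + masked.slice(end);
  }
  // The regex split leaves "" at the edges, so drop the empty strings.
  return masked.split(/\s+/).filter((chunk) => chunk !== "");
}

console.log(splitMasked("Hi {{ firstName }}!", [[3, 5], [16, 18]]));
// [ 'Hi', 'firstName', '!' ]
```

Masking replaces each ignored character with exactly one space, so every index stays in place and the remaining ranges remain valid while the string is being rewritten.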

In the example below, we use the library string-find-heads-tails to extract the ranges of the variables' heads and tails in a string, then split by whitespace:

import { splitByW } from "string-split-by-whitespace";
import { strFindHeadsTails } from "string-find-heads-tails";

const input = "some interesting {{text}} {% and %} {{ some more }} text.";
// collect head ranges and tail ranges as separate [start, end] pairs
const headsAndTails = strFindHeadsTails(
  input,
  ["{{", "{%"],
  ["}}", "%}"]
).reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.headsEndAt]);
  acc.push([curr.tailsStartAt, curr.tailsEndAt]);
  return acc;
}, []);
const res1 = splitByW(input, {
  ignoreRanges: headsAndTails,
});
console.log(`res1 = ${JSON.stringify(res1, null, 4)}`);
// => ['some', 'interesting', 'text', 'and', 'some', 'more', 'text.']

You can also ignore whole variables, from heads to tails, including the variables' names:

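import { splitByW } from "string-split-by-whitespace";
import { strFindHeadsTails } from "string-find-heads-tails";

const input = "some interesting {{text}} {% and %} {{ some more }} text.";
// collect one range per variable, from the start of its head to the end of its tail
const wholeVariables = strFindHeadsTails(
  input,
  ["{{", "{%"],
  ["}}", "%}"]
).reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.tailsEndAt]);
  return acc;
}, []);
const res2 = splitByW(input, {
  ignoreRanges: wholeVariables,
});
console.log(`res2 = ${JSON.stringify(res2, null, 4)}`);
// => ['some', 'interesting', 'text.']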

We need to perform the array.reduce to adapt the string-find-heads-tails output, which is in the following format (each ... stands for an index number):

[
  {
    headsStartAt: ...,
    headsEndAt: ...,
    tailsStartAt: ...,
    tailsEndAt: ...,
  },
  ...
]

and, with the help of array.reduce, we turn it into the format that opts.ignoreRanges expects:

(first example with res1)

[
  [headsStartAt, headsEndAt],
  [tailsStartAt, tailsEndAt],
  ...
]

(second example with res2)

[
  [headsStartAt, tailsEndAt],
  ...
]
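The same conversion can also be written with Array.prototype.flatMap and Array.prototype.map instead of reduce. In this sketch the input array is a hand-written stand-in that follows the string-find-heads-tails output format described above, with index values counted by hand for "Hi {{ firstName }}!"; it does not call the library.

```javascript
// Hand-written stand-in for string-find-heads-tails output on "Hi {{ firstName }}!".
const found = [
  { headsStartAt: 3, headsEndAt: 5, tailsStartAt: 16, tailsEndAt: 18 },
];

// res1 style: every head and every tail becomes its own range
const headsAndTails = found.flatMap((v) => [
  [v.headsStartAt, v.headsEndAt],
  [v.tailsStartAt, v.tailsEndAt],
]);

// res2 style: one range per whole variable, from head start to tail end
const wholeVariables = found.map((v) => [v.headsStartAt, v.tailsEndAt]);

console.log(headsAndTails); // [ [ 3, 5 ], [ 16, 18 ] ]
console.log(wholeVariables); // [ [ 3, 18 ] ]
```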

§ Changelog

See it in the monorepo, on GitHub.

§ Contributing

To report bugs or request features or assistance, raise an issue on GitHub.

Any code contributions are welcome! All Pull Requests will be dealt with promptly.

§ Licence

MIT

Copyright © 2010–2021 Roy Revelt and other contributors

Related packages:

📦 detergent 7.0.14
Extracts, cleans and encodes text
📦 string-collapse-white-space 9.0.14
Replace chunks of whitespace with a single space
📦 string-character-is-astral-surrogate 1.12.14
Tells whether a given character is part of an astral character, specifically, a high or low surrogate
📦 string-match-left-right 7.0.8
Match substrings on the left or right of a given index, ignoring whitespace
📦 string-process-comma-separated 2.0.14
Extracts chunks from a possibly comma- or whatever-separated string
📦 string-find-heads-tails 4.0.14
Finds where arbitrary templating marker heads and tails are located
📦 string-remove-duplicate-heads-tails 5.0.14
Detect and (recursively) remove head and tail wrappings around the input string