string-split-by-whitespace npm package

Installation

Install the package from npm: `npm i string-split-by-whitespace`

Quick Take

Purpose

When `String.split(/\s+/)` is not enough, for example when you need to exclude certain substrings, this program will help.

It splits the string by whitespace, where “whitespace” means “anything that trims to zero length”: tabs, line breaks (CR and LF), the space character and the raw non-breaking space. Quite a few such characters are scattered across the Unicode range.
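The “trims to zero length” rule can be checked directly in JavaScript. The sketch below uses a hypothetical helper, `isWhitespace`, which is not part of this package, to apply the definition to a handful of characters:

```javascript
// Hypothetical helper (not exported by the package): a character counts as
// whitespace when String.prototype.trim() reduces it to an empty string.
const isWhitespace = (char) => char.trim().length === 0;

const samples = {
  tab: "\t",
  lineFeed: "\n",
  carriageReturn: "\r",
  space: " ",
  nonBreakingSpace: "\u00A0",
  letter: "a",
};

// Map every sample name to true (whitespace) or false (not whitespace).
const report = Object.fromEntries(
  Object.entries(samples).map(([name, char]) => [name, isWhitespace(char)])
);
console.log(report);
```

Every entry except `letter` comes out as `true`.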

API — `splitByW()`

The main function `splitByW()` is imported like this: `import { splitByW } from "string-split-by-whitespace";`

It’s a function which takes two input arguments:

| Input argument | Type | Obligatory | Description |
| --- | --- | --- | --- |
| `str` | String | yes | Source string. |
| `opts` | Plain object | no | Optional Options Object. |

The Optional Options Object has the following shape:

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `ignoreRanges` | Array of zero or more range arrays | `[]` | Feed zero or more string slice ranges, arrays of two natural number indexes, like `[[1, 5], [6, 10]]`. Algorithm will not include these string index ranges in the results. |

Here are all defaults in one place for copying: `{ ignoreRanges: [] }`

The function returns an array of zero or more strings. An empty string yields an empty array.
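To make the return contract concrete, here is a minimal sketch that approximates it with built-ins only. `splitWithBuiltIns` is a hypothetical name, and the sketch ignores `opts` entirely; it only shows how the documented result differs from a bare `split(/\s+/)`:

```javascript
// Sketch only, not the package's code: split on whitespace runs, then drop
// the empty strings that a bare split() leaves at the edges.
function splitWithBuiltIns(str) {
  return str.split(/\s+/).filter((chunk) => chunk.length > 0);
}

const bareEmpty = "".split(/\s+/);
const sketchEmpty = splitWithBuiltIns("");
const sketchPadded = splitWithBuiltIns("  a\tb\n c ");

console.log(bareEmpty); // => [""]
console.log(sketchEmpty); // => []
console.log(sketchPadded); // => ["a", "b", "c"]
```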

`opts.ignoreRanges`

Some basics first. By “heads” and “tails” we mean the templating literals that wrap a value. “Heads” is the opening part, for example `{{` below, and “tails” is the closing part, for example `}}` below:

Hi {{ firstName }}!

Now imagine that we extracted the heads and tails and know their ranges: `[[3, 5], [16, 18]]`. (Counting from zero at the “H” of “Hi”, `{{` sits at indexes 3 and 4 and `}}` at indexes 16 and 17, so each range runs from its start index up to, but not including, its end index.)
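A quick way to confirm those numbers is to slice the string with them. This uses only the built-in `String.prototype.slice`, whose end index is exclusive:

```javascript
const source = "Hi {{ firstName }}!";
const ranges = [
  [3, 5],
  [16, 18],
];

// Cut out each range; the end index is not included in the slice.
const extracted = ranges.map(([from, to]) => source.slice(from, to));
console.log(extracted); // => ["{{", "}}"]
```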

Now imagine we want to split `Hi {{ firstName }}!` into the array `["Hi", "firstName", "!"]`.

For that we need to skip two ranges, those of a head and tail.

That’s where `opts.ignoreRanges` comes in handy.
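The idea can be illustrated without the package. The following is a rough re-implementation for demonstration, not the library's source; `splitSketch` is a hypothetical name, and it assumes range ends are exclusive and that an ignored range breaks a chunk the same way whitespace does:

```javascript
// Illustrative sketch, NOT the package's source code.
// Assumption: each range is [from, to] with "to" exclusive, as in slice().
function splitSketch(str, opts = {}) {
  const { ignoreRanges = [] } = opts;
  const isIgnored = (idx) =>
    ignoreRanges.some(([from, to]) => idx >= from && idx < to);

  const result = [];
  let chunk = "";
  for (let i = 0; i < str.length; i++) {
    // Both whitespace and ignored characters end the current chunk.
    if (isIgnored(i) || str[i].trim().length === 0) {
      if (chunk) result.push(chunk);
      chunk = "";
    } else {
      chunk += str[i];
    }
  }
  if (chunk) result.push(chunk);
  return result;
}

const withRanges = splitSketch("Hi {{ firstName }}!", {
  ignoreRanges: [
    [3, 5],
    [16, 18],
  ],
});
const withoutRanges = splitSketch("Hi {{ firstName }}!");

console.log(withRanges); // => ["Hi", "firstName", "!"]
console.log(withoutRanges); // => ["Hi", "{{", "firstName", "}}!"]
```

Without the ranges, the tail `}}` stays glued to the exclamation mark, which is the case a plain whitespace split cannot handle.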

In the example below, we use the library `string-find-heads-tails` to extract the ranges of the variables’ heads and tails in a string, then split by whitespace:

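// Named exports are assumed for both packages
import { splitByW } from "string-split-by-whitespace";
import { strFindHeadsTails } from "string-find-heads-tails";

const input = "some interesting {{text}} {% and %} {{ some more }} text.";
// Collect the index ranges of every head and every tail
const headsAndTails = strFindHeadsTails(
  input,
  ["{{", "{%"],
  ["}}", "%}"]
).reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.headsEndAt]);
  acc.push([curr.tailsStartAt, curr.tailsEndAt]);
  return acc;
}, []);
// Split by whitespace, skipping the heads and tails
const res1 = splitByW(input, {
  ignoreRanges: headsAndTails,
});
console.log(`res1 = ${JSON.stringify(res1, null, 4)}`);
// => ['some', 'interesting', 'text', 'and', 'some', 'more', 'text.']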

You can also ignore whole variables, from heads to tails, including the variables’ names:

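// Named exports are assumed for both packages
import { splitByW } from "string-split-by-whitespace";
import { strFindHeadsTails } from "string-find-heads-tails";

const input = "some interesting {{text}} {% and %} {{ some more }} text.";
// One range per variable, from the start of its head to the end of its tail
const wholeVariables = strFindHeadsTails(
  input,
  ["{{", "{%"],
  ["}}", "%}"]
).reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.tailsEndAt]);
  return acc;
}, []);
const res2 = splitByW(input, {
  ignoreRanges: wholeVariables,
});
// => ['some', 'interesting', 'text.']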

We need `array.reduce` to adapt the output of `string-find-heads-tails`, which comes in the following format (index values are elided here):

[
  {
    headsStartAt: ...,
    headsEndAt: ...,
    tailsStartAt: ...,
    tailsEndAt: ...,
  },
  ...
]

and with the help of array.reduce we turn it into our format:

(first example with res1)

[
  [headsStartAt, headsEndAt],
  [tailsStartAt, tailsEndAt],
  ...
]

(second example with res2)

[
  [headsStartAt, tailsEndAt],
  ...
]
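Both conversions can be tried on a concrete value. The array below is a hand-written stand-in for the finder's output on `Hi {{ firstName }}!`, with indexes counted manually rather than produced by `string-find-heads-tails`:

```javascript
// Hand-written stand-in for the finder's output; the indexes were counted
// manually for the string "Hi {{ firstName }}!".
const found = [
  { headsStartAt: 3, headsEndAt: 5, tailsStartAt: 16, tailsEndAt: 18 },
];

// Shape used for res1: heads and tails become separate ranges.
const headsAndTails = found.reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.headsEndAt]);
  acc.push([curr.tailsStartAt, curr.tailsEndAt]);
  return acc;
}, []);

// Shape used for res2: one range per whole variable.
const wholeVariables = found.reduce((acc, curr) => {
  acc.push([curr.headsStartAt, curr.tailsEndAt]);
  return acc;
}, []);

console.log(headsAndTails); // => [[3, 5], [16, 18]]
console.log(wholeVariables); // => [[3, 18]]
```

Since the second shape maps each object to exactly one range, `Array.prototype.map` would work there as well; `reduce` is kept to match the examples above.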

API — `defaults`

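You can import `defaults`, for example `import { defaults } from "string-split-by-whitespace";`

It's a plain object with one key per option listed in the options table above.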

The main function calculates the options to be used by merging the options you passed with these defaults.
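The source does not say how the merge is done internally. One common way, shown here as a sketch, is a shallow object spread; the `defaults` value is reconstructed from the options table, and `resolveOpts` is a hypothetical helper name:

```javascript
// Assumed shape, based on the options table: a single key, ignoreRanges.
const defaults = { ignoreRanges: [] };

// Hypothetical helper: later spreads win, so keys passed by the caller
// override the defaults, and missing keys fall back to them.
const resolveOpts = (userOpts) => ({ ...defaults, ...userOpts });

const noOpts = resolveOpts();
const customOpts = resolveOpts({ ignoreRanges: [[1, 5]] });

console.log(noOpts); // => { ignoreRanges: [] }
console.log(customOpts); // => { ignoreRanges: [ [ 1, 5 ] ] }
```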

API — `version`

You can import `version`, for example `import { version } from "string-split-by-whitespace";`

Changelog

Open Changelog

string-split-by-whitespace 4.0.3
