string-process-comma-separated 1.2.14

Extracts chunks from a string that may be separated by commas or by any other separator

§ Quick Take

import { strict as assert } from "assert";
import processCommaSeparated from "string-process-comma-separated";

const gatheredChunks = [];
const gatheredErrors = [];
const rawnbsp = "\u00a0";

// it's a callback-interface:
processCommaSeparated(
  `<FRAMESET rows=" ,,\t50% ,${rawnbsp} 50% ,\t\t,">`,
  {
    from: 16, // <- beginning of the attribute's value
    to: 35, // <- ending of the attribute's value
    separator: ",",
    cb: (idxFrom, idxTo) => {
      gatheredChunks.push([idxFrom, idxTo]);
    },
    errCb: (ranges, message) => {
      gatheredErrors.push({ ranges, message });
    },
  }
);

assert.deepEqual(gatheredChunks, [
  [20, 23],
  [27, 30],
]);

assert.deepEqual(gatheredErrors, [
  {
    ranges: [[16, 17]],
    message: "Remove whitespace.",
  },
  { ranges: [[17, 18]], message: "Remove separator." },
  { ranges: [[18, 19]], message: "Remove separator." },
  {
    ranges: [[19, 20]],
    message: "Remove whitespace.",
  },
  {
    ranges: [[23, 24]],
    message: "Remove whitespace.",
  },
  {
    ranges: [[25, 27]],
    message: "Remove whitespace.",
  },
  {
    ranges: [[30, 31]],
    message: "Remove whitespace.",
  },
  {
    ranges: [[32, 34]],
    message: "Remove whitespace.",
  },
  { ranges: [[31, 32]], message: "Remove separator." },
  { ranges: [[34, 35]], message: "Remove separator." },
]);

§ Purpose

Imagine you need to extract and validate both 50% values from an HTML attribute: <FRAMESET rows="50%, 50%">.

The first algorithm idea seems simple:

str
  .split(",")
  .forEach((oneOfValues) => {
    ...
  });

But in real life, the proper extraction is quite complex and you need to cover all error cases:

  • There might be surrounding whitespace: <FRAMESET rows=" 50%, 50% ">
  • There might be spaces after the comma, which might be OK or not: <FRAMESET rows=" 50%, 50% ">
  • There might be plain errors like leading commas: <FRAMESET rows=" ,, 50%, 50% ">
  • There might be non-space characters that look like a space, such as NBSP
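
To make the problem concrete, here is a minimal sketch of what the naive split produces on the leading-comma example from the list above. It uses only built-in string methods and does not call this library; the variable names are illustrative.

```javascript
// The attribute value from the "leading comma" case above
const attrValue = " ,, 50%, 50% ";

// Naive approach: split on the separator
const naive = attrValue.split(",");
console.log(naive);
// => [" ", "", " 50%", " 50% "]

// Trimming and dropping empty strings recovers the values...
const cleaned = naive.map((chunk) => chunk.trim()).filter(Boolean);
console.log(cleaned);
// => ["50%", "50%"]

// ...but the information about WHERE the rogue commas and spaces were
// is gone, so nothing can be reported back or fixed in the source string.
```

The values can be salvaged this way, but a linter or fixer needs index ranges for each problem, which is what the callback interface provides.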

This program helps to extract chunks from a potentially comma-separated list within a string (it might also be a single value, without commas).

The separator is configurable via opts.separator, so it doesn't have to be a comma.

Errors are pinged to a separate callback function.

§ Usage

As with Array.forEach, the program uses callbacks, which lets you tailor what happens with the values it gives you.

Here is quite a contrived example, too crazy to be real, but it shows the capabilities of the algorithm:

Instead of the expected,

<frameset rows="50%,50%"></frameset>

we have:

<frameset rows=" ,,\t50% ,    50% ,\t\t,"></frameset>

The program above extracts both values 50% (string index ranges are fed to the callback, [20, 23] and [27, 30]) and reports all rogue spaces, tabs, non-breaking space and commas.
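
To turn those index ranges back into values, String.prototype.slice is enough. The sketch below hard-codes the two ranges quoted above instead of calling the library, so the variable names are illustrative only.

```javascript
const rawnbsp = "\u00a0";
const str = `<FRAMESET rows=" ,,\t50% ,${rawnbsp} 50% ,\t\t,">`;

// Ranges as they would arrive in the cb callback (copied from the Quick Take)
const chunkRanges = [
  [20, 23],
  [27, 30],
];

// Each range follows String.slice semantics: start inclusive, end exclusive
const values = chunkRanges.map(([idxFrom, idxTo]) => str.slice(idxFrom, idxTo));
console.log(values);
// => ["50%", "50%"]
```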

This program saves you from having to tackle all those possible error cases yourself: rogue separators, consecutive separators and spaces.

§ API

processCommaSeparated(str, [opts])

In other words, it's a function which takes two input arguments, the second one being optional (marked by square brackets).

§ API - Input

Input argument | Key value's type | Obligatory? | Description
input | String | yes | Input string
opts | Plain object | yes | Options Object. See below for its API.

If the supplied input arguments are of any other types, an error will be thrown. An empty string or a missing options object (thus no callbacks) is fine; the program will exit early.

§ Options Object

Most importantly, you must pass the callbacks, cb and errCb, in the options object:

An Options Object's key | Type of its value | Default | Description
from | Integer or falsy | 0 | Where in the string does the comma-separated chunk start
to | Integer or falsy | str.length | Where in the string does the comma-separated chunk end
offset | Integer or falsy | 0 | Handy when you've been given a cropped string and want to report real indexes. Offset adds that number to each reported index.
leadingWhitespaceOK | Boolean | false | Is whitespace at the beginning of the range OK?
trailingWhitespaceOK | Boolean | false | Is whitespace at the end of the range OK?
oneSpaceAfterCommaOK | Boolean | false | Can values have a space after the comma?
innerWhitespaceAllowed | Boolean | false | After we split into chunks, can those chunks have whitespace?
separator | String, non-whitespace | "," | What is the separator character?
cb | Function | null | Function to ping the extracted value ranges to
errCb | Function | null | Function to ping the errors to

Here is the default options object in one place:

{
  from: 0,
  to: str.length,
  offset: 0,
  leadingWhitespaceOK: false,
  trailingWhitespaceOK: false,
  oneSpaceAfterCommaOK: false,
  innerWhitespaceAllowed: false,
  separator: ",",
  cb: null,
  errCb: null
}
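
The offset option is easiest to see as plain index arithmetic. The sketch below does not call the library; it assumes only what the options table states, namely that offset is added to each reported index, and the names cropStart, localRange and realRange are made up for illustration.

```javascript
const full = `<FRAMESET rows="50%, 50%">`;

// Suppose only the attribute's value was handed over, cropped from the full string
const cropStart = 16;
const cropped = full.slice(cropStart, 24);
console.log(cropped);
// => "50%, 50%"

// The space after the comma, measured against the cropped string
const localRange = [4, 5];

// Adding the crop position (what you would pass as opts.offset)
// gives the index range in the full string
const realRange = localRange.map((idx) => idx + cropStart);
console.log(realRange);
// => [20, 21]

// Both ranges point at the same character
console.log(full.slice(...realRange) === cropped.slice(...localRange));
// => true
```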

§ API - Function's Output

The function does not return anything (it returns undefined to be precise) — you extract the values via callbacks.

§ API - opts.cb - INPUT

opts is a plain object. The value of its cb key must be a function.

As in the example above, processCommaSeparated is a function and its second argument is the options object. Below, we set an arrow function as the cb value (you could pass a "normal", declared function as well).

const gatheredChunks = [];
...
processCommaSeparated(
  `<FRAMESET...`,
  {
    ...
    cb: (idxFrom, idxTo) => {
      gatheredChunks.push([idxFrom, idxTo]);
    },
    ...
  }
);

The program will pass two arguments to the callback function you pass:

Passed argument at position | We call it | Type | Description
1 | idxFrom | Integer | Where does the extracted value start
2 | idxTo | Integer | Where does the extracted value end

For example, suppose you pass the whole string abc,def (we assume it's the whole value of an HTML attribute, already extracted) and don't set opts.from and opts.to, so the program traverses the whole string. It would then ping your callback function with two ranges: [0, 3] and [4, 7]. Full code:

const processCommaSeparated = require("string-process-comma-separated");
const gatheredChunks = [];
processCommaSeparated("abc,def", {
  // called once per extracted chunk
  cb: (idxFrom, idxTo) => {
    gatheredChunks.push([idxFrom, idxTo]);
  },
});
console.log(JSON.stringify(gatheredChunks, null, 4));
// => [
//   [0, 3],
//   [4, 7]
// ]

We omitted the error callback for brevity (opts.errCb, see its API below); there would be no errors anyway.

§ API - opts.cb - OUTPUT

Strictly speaking, whatever the function you pass as opts.cb returns is ignored. It's like the callback in Array.forEach(key => {}): you don't expect that arrow function to return something, as in:

["abc", "def"].forEach((key) => {
  return "whatever"; // <-- that returned value will be lost
});

Above, return does not matter; you grab key value and do things with it instead.

Same way with our program's callbacks.

§ API - opts.errCb - INPUT

Similar to opts.cb, two arguments are passed into the callback function, only this time the first one is ranges and the second one is a message string.

const processCommaSeparated = require("string-process-comma-separated");
const gatheredChunks = [];
const gatheredErrors = [];
processCommaSeparated(`<FRAMESET rows="50%, 50%">`, {
  from: 16, // <- beginning of the attribute's value
  to: 24, // <- ending of the attribute's value
  cb: (idxFrom, idxTo) => {
    gatheredChunks.push([idxFrom, idxTo]);
  },
  errCb: (ranges, message) => {
    gatheredErrors.push({ ranges, message });
  },
});
console.log(JSON.stringify(gatheredChunks, null, 4));
// => [
//   [16, 19],
//   [21, 24]
// ]
console.log(JSON.stringify(gatheredErrors, null, 4));
// => [
//   {
//     ranges: [[20, 21]],
//     message: "Remove whitespace."
//   }
// ]

Passed argument at position | We call it | Type | Description
1 | ranges | Array of zero or more arrays | Ranges which indicate the "fix" recipe.
2 | message | String | Message about the error.

A quick primer on ranges — each range is an array of two or three elements. First two match String.slice indexes. If an optional third is present, it means what to add instead. Two element range array — only deletion. Three element range array — replacement.
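
As a rough illustration of how such ranges can be turned into a fix, here is a minimal sketch. applyRanges is a hypothetical helper written for this example, not part of this package or of the range-processing libraries mentioned here, and it assumes the ranges do not overlap.

```javascript
// Hypothetical helper: applies [from, to] (deletion) and
// [from, to, whatToAdd] (replacement) ranges to a string.
// Assumes ranges do not overlap.
function applyRanges(str, ranges) {
  return [...ranges]
    // work from the end of the string so that earlier indexes stay valid
    .sort((a, b) => b[0] - a[0])
    .reduce(
      (acc, [from, to, whatToAdd]) =>
        acc.slice(0, from) + (whatToAdd || "") + acc.slice(to),
      str
    );
}

const source = `<FRAMESET rows="50%, 50%">`;

// Two-element range: delete the space at index 20
const afterDeletion = applyRanges(source, [[20, 21]]);
console.log(afterDeletion);
// => <FRAMESET rows="50%,50%">

// Three-element range: replace ", " (indexes 19 to 21) with ";"
const afterReplacement = applyRanges(source, [[19, 21, ";"]]);
console.log(afterReplacement);
// => <FRAMESET rows="50%;50%">
```

Sorting in descending order is a design choice that avoids recalculating indexes after each edit; a dedicated range-processing library would also handle overlapping ranges and merging.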

We have made more range processing libraries.

§ API - opts.errCb - OUTPUT

As with opts.cb, whatever your callback function returns does not matter. You take the values that are passed into the function's arguments and do things with them. You don't return anything from the callback function.

["abc", "def"].forEach((key) => {
  return "whatever"; // <-- that returned value will be lost
});

This returned string "whatever" will be discarded. It's not Array.map. Same with this program.

§ opts.innerWhitespaceAllowed

Sometimes comma-separated values are keywords — then we don't want to allow any whitespace between characters:

<input accept=".jpg,.g if,.png">
                      ^

But sometimes it's fine, like in media queries:

<link rel="stylesheet" media="screen and (max-width: 100px)" href="zzz.css" />
                                                    ^

By default, opts.innerWhitespaceAllowed is false, so whitespace inside a split chunk is not allowed. Set it to true to allow inner whitespace.

§ Licence

MIT

Copyright © 2010–2020 Roy Revelt and other contributors
