string-find-malformed open source npm package

Installation

Quick Take

The Purpose

We need a program that helps find malformed instances of a string.

For example, consider the opening HTML comment tag, <!--.

There can be many things wrong with it:

Missing characters from the set, for example, <-- or <!-
Rogue characters present between characters in the set, for example: <!-.- or <z!--
Also rogue whitespace characters: <! -- or <!- -

Basically, we are after something very similar to what we are looking for, but not exactly the same.

Idea

Levenshtein distance is a number that signifies how many single-character changes (insertions, deletions or substitutions) are needed to turn one string into another.

For example, the distance between dog and dot is 1 (“g” needs to be changed into “t”).

In technical terms, we would look for chunks of characters within Levenshtein distance 1 of the reference string, disregarding the whitespace.
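The distance described above can be computed with the textbook dynamic-programming approach. The sketch below is only an illustration of the metric; the function name levenshtein is hypothetical and this is not necessarily how the package calculates it internally.

```javascript
// Textbook Levenshtein distance, keeping only two rows of the DP table.
function levenshtein(a, b) {
  // prev[j] = distance between the processed prefix of `a` and the first j chars of `b`
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a character from `a`
        curr[j - 1] + 1, // insert a character into `a`
        prev[j - 1] + cost // substitute (or keep a matching character)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("dog", "dot")); // => 1
console.log(levenshtein("<!-", "<!--")); // => 1 (missing character)
console.log(levenshtein("<z!--", "<!--")); // => 1 (rogue character)
```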

Another thing: not all characters are equal (sorry for the pun). Whitespace should be disregarded, because five spaces are not the same as any five other characters. <!     -- (with five spaces) is definitely an instance of a malformed <!--, but <!<a id-- is very weird, even though both are at Levenshtein distance 5.

Takeaway: the program will aggressively chomp the whitespace, but it will be sensitive to all other characters.
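To illustrate why whitespace is treated differently, the sketch below strips whitespace from a candidate before measuring the distance. Both editDistance and distanceIgnoringWhitespace are hypothetical helper names, and stripping up front is an assumption made for illustration rather than the package's confirmed approach.

```javascript
// Plain Levenshtein distance (two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: remove all whitespace from the candidate, then compare.
function distanceIgnoringWhitespace(candidate, reference) {
  return editDistance(candidate.replace(/\s+/g, ""), reference);
}

const ref = "<!--";

// five rogue spaces: far away by plain distance, identical once whitespace is chomped
console.log(editDistance("<!     --", ref)); // => 5
console.log(distanceIgnoringWhitespace("<!     --", ref)); // => 0

// five rogue characters, one of them a space: still far away after chomping
console.log(editDistance("<!<a id--", ref)); // => 5
console.log(distanceIgnoringWhitespace("<!<a id--", ref)); // => 4
```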

API — `findMalformed()`

The main function findMalformed() is a named export, imported with import { findMalformed } from "string-find-malformed".

It takes three obligatory input arguments and an optional fourth:

Input argument	Type	Obligatory?	Description
`str`	String	yes	The string in which you want to perform a search
`refStr`	String	yes	What to look for
`cb`	Function	yes	You supply a callback function. It will be called on each finding. See its API below.
`opts`	Plain object	no	Optional Options Object.

None of the input arguments will be mutated by this program; we have unit tests to prove that.

The Optional Options Object has the following shape:

Key	Type	Default	Description
`stringOffset`	Natural number or zero	`0`	Every index fed to the callback will be incremented by this much.
`maxDistance`	Natural number or zero	`1`	The maximum Levenshtein distance: how many characters can differ before a chunk is no longer reported as a result.
`ignoreWhitespace`	Boolean	`true`	Whitespace (characters that trim to zero length) is skipped by default.

Here are all defaults in one place for copying:
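The object below is reconstructed from the options table above, so the keys and values follow that table; treat it as a convenience copy rather than the package's exact source.

```javascript
// Default options, as listed in the options table.
const defaults = {
  stringOffset: 0, // added to every index passed to the callback
  maxDistance: 1, // maximum Levenshtein distance still reported
  ignoreWhitespace: true, // whitespace is skipped when comparing
};

console.log(defaults);
```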

The function returns undefined because it has a callback-style API, like Array.prototype.forEach().

API — a callback input argument

The third input argument is a callback function that you supply. Each time a result is found, this function is called with a plain object as its first argument.

For example:

import { findMalformed } from "string-find-malformed";
// we create an empty array to dump the results into
const gathered = [];
// we call the function
findMalformed(
  // first input argument: source
  "abcdef",
  // second input argument: what to look for but mangled
  "bde",
  // callback function:
  (obj) => {
    gathered.push(obj);
  },
  // empty options object:
  {}
);
console.log(gathered);
// => [
//      {
//        idxFrom: 1,
//        idxTo: 5
//      }
//    ]

// you can double-check with String.prototype.slice():
console.log("abcdef".slice(1, 5));
// => "bcde"
// it's mangled because rogue letter "c" is between the "good" letters.

The result above means that a mangled bde is present in abcdef in the index range from 1 to 5. The indexes follow the same principles as String.prototype.slice(): idxFrom is inclusive and idxTo is exclusive.
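If you prefer a return value over a callback, a small wrapper can gather the findings into an array. This is a sketch only: collectFindings is a hypothetical helper, and fakeFinder is a stand-in that mimics the call shape and the result shown above so that the snippet runs without the package installed. In real use you would pass findMalformed instead.

```javascript
// Hypothetical helper: adapts any callback-style finder to return an array.
function collectFindings(finder, str, refStr, opts = {}) {
  const gathered = [];
  finder(str, refStr, (finding) => gathered.push(finding), opts);
  return gathered;
}

// Stand-in for findMalformed(), hard-coded to report the finding from the example above.
function fakeFinder(str, refStr, cb) {
  cb({ idxFrom: 1, idxTo: 5 });
}

const source = "abcdef";
const findings = collectFindings(fakeFinder, source, "bde");

// Turn index ranges back into the matched chunks.
const chunks = findings.map(({ idxFrom, idxTo }) => source.slice(idxFrom, idxTo));
console.log(chunks); // => [ 'bcde' ]
```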

API — `defaults`

You can import defaults alongside the main function. It's a plain object holding the default values listed in the options table above.

The main function calculates the options to be used by merging the options you passed with these defaults.
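The merge described above might look like the following sketch. It assumes a shallow merge via object spread, which this document does not confirm; resolveOptions is a hypothetical name and the defaults object is copied from the options table.

```javascript
// Defaults as documented in the options table.
const defaults = { stringOffset: 0, maxDistance: 1, ignoreWhitespace: true };

// Hypothetical helper: user-supplied keys override defaults, the rest fall through.
function resolveOptions(opts) {
  return { ...defaults, ...opts };
}

const resolved = resolveOptions({ maxDistance: 2 });
console.log(resolved);
// => { stringOffset: 0, maxDistance: 2, ignoreWhitespace: true }

// The defaults object itself is left untouched.
console.log(defaults.maxDistance); // => 1
```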

API — `version`

You can import version alongside the main function.

Further Ideas

Nobody would mistype “owned” as “ewned”: “fat finger” errors happen on neighbouring keys. In this case, “o” can be mistyped as “i” or “p” because those keys are adjacent. The “e” key is far away, so that typo is unrealistic.

In this light, Levenshtein distance is not ideally suited to the purpose. An alternative version of it could be written, where the algorithm considers both the distance and the neighbouring keys, and evaluates accordingly.
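One way to explore this idea is a Levenshtein variant in which substituting a neighbouring key costs less than substituting a distant one. The sketch below is purely illustrative and not part of the package: the neighbours map covers only a handful of QWERTY keys, and keyboardDistance, areNeighbours and the 0.5 cost are all assumptions.

```javascript
// Partial QWERTY adjacency map, filled in only for the keys used in this example.
const neighbours = {
  o: "ipkl",
  i: "uojk",
  p: "ol",
  e: "wrsd",
};

// Two keys count as neighbours if either one lists the other.
function areNeighbours(x, y) {
  return (neighbours[x] || "").includes(y) || (neighbours[y] || "").includes(x);
}

// Levenshtein variant: a substitution between neighbouring keys costs `nearCost`.
function keyboardDistance(a, b, nearCost = 0.5) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const x = a[i - 1];
      const y = b[j - 1];
      let cost = 1; // distant keys: full cost
      if (x === y) cost = 0; // same character: free
      else if (areNeighbours(x, y)) cost = nearCost; // plausible fat-finger slip
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(keyboardDistance("iwned", "owned")); // => 0.5 ("i" sits next to "o")
console.log(keyboardDistance("ewned", "owned")); // => 1 ("e" is far from "o")
```

With a threshold between the two costs, “iwned” would be flagged as a likely typo of “owned” while “ewned” would not.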

Changelog

string-find-malformed 4.0.3