Purpose
This program tells whether a given string is a valid language tag.
It is based on RFC 5646, “Tags for Identifying Languages”, which was released in 2009 and, as of January 2020, is still current.
Language tags are used in many places, for example, in the HTML attribute hreflang:
<link rel="alternate" href="http://example.com" hreflang="es-es" />
It’s impossible to properly match the spec using regex only: you can validate that allowed characters are in allowed places, but you can’t validate the meaning those characters have. The position and arrangement of subtags matter. This program also explains why it deemed the input not to be a language tag.
For example, de-419-DE is wrong because it contains two region tags, 419 and DE.
Existing regex-based solutions, such as ietf-language-tag-regex, have little logic besides enforcing subtag order and subtag length. For example, that package reports any string longer than two characters as a valid language tag. We, on the other hand, validate each value against known IANA-registered names.
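The difference between checking shape and checking meaning can be sketched as follows. This is a minimal illustration, not this program's implementation: the SHAPE pattern, the two small subtag sets and the findProblem() helper are all assumptions made up for the example, and the helper only understands a language subtag followed by region subtags.

```javascript
// Loose shape check: 1-8 alphanumeric characters per subtag, joined by hyphens.
// Illustrative assumption only, not the regex used by any real library.
const SHAPE = /^[a-z0-9]{1,8}(-[a-z0-9]{1,8})*$/i;

// Tiny hand-picked samples of registered subtags; the real IANA registry is far larger.
const LANGUAGES = new Set(["de", "en", "es"]);
const REGIONS = new Set(["de", "es", "us", "419"]);

// Hypothetical helper: returns null when no problem is found, otherwise a reason string.
function findProblem(tag) {
  if (typeof tag !== "string" || !SHAPE.test(tag)) return "bad shape";
  const [language, ...rest] = tag.toLowerCase().split("-");
  if (!LANGUAGES.has(language)) return `unknown language "${language}"`;
  let regionsSeen = 0;
  for (const subtag of rest) {
    // Simplification: every subtag after the language is treated as a region.
    if (!REGIONS.has(subtag)) return `unknown subtag "${subtag}"`;
    regionsSeen += 1;
    if (regionsSeen > 1) return `second region "${subtag}"`;
  }
  return null;
}

console.log(SHAPE.test("de-419-DE")); // the shape check alone accepts it
console.log(findProblem("de-419-DE")); // the lookup notices the second region
console.log(findProblem("es-es")); // null, no problem found
```

Both de-419-DE and en-US-POSIX pass the shape check here; only the lookup against known names can reject them.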
API — isLangCode()
The main function isLangCode() is imported like this:
It’s a function which takes one input argument:
Theoretically, the input string is optional: if the input is not a string, or is an empty string, the returned res will be false. The program is liberal and never throws errors.
The function returns a plain object:
{
res: boolean,
message: null | string
}
Key’s name | Type | Description |
---|---|---|
res | boolean | Answers whether this is a valid language code |
message | null or string | Explains what’s wrong if the answer is negative |
For example,
{
res: false,
message: `Unrecognised language subtag, "posix".`
}
or
{
res: true,
message: null
}
Non-string or empty-string inputs always yield false; the program does not throw.
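Because the result is always a plain object with res and message, calling code can branch on res without a try/catch. A minimal sketch of consuming such a result is shown here; describeResult() is a hypothetical helper, the two result objects are copied from the examples above rather than produced by a live call, and pairing the accepted result with "es-es" is an assumption for illustration.

```javascript
// Hypothetical helper that turns a { res, message } result into a log line.
function describeResult(input, result) {
  // When res is true, message is null, so there is nothing more to report.
  if (result.res) return `"${input}" is a valid language tag`;
  return `"${input}" was rejected: ${result.message}`;
}

// Result objects copied from the examples above, not produced by a live call.
const rejected = { res: false, message: `Unrecognised language subtag, "posix".` };
const accepted = { res: true, message: null };

console.log(describeResult("en-US-POSIX", rejected));
console.log(describeResult("es-es", accepted));
```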
Language tags are not case-sensitive (there are conventions for the capitalization of some subtags, but they don’t carry meaning). For performance reasons, all references to the input use lowercase, even if you entered it in uppercase. For example, en-US-POSIX would be reported with a lowercase “posix”:
{
res: false,
message: `Unrecognised language subtag, "posix".`
}
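One way such behaviour can arise is to lowercase the whole tag once and work with the lowercased subtags from then on. The sketch below assumes that approach; toSubtags() and unrecognised() are hypothetical names, and the program's real internals may differ.

```javascript
// Hypothetical helper: lowercase the whole tag once, then split it into subtags.
function toSubtags(tag) {
  return tag.toLowerCase().split("-");
}

// Hypothetical message builder that quotes the already-lowercased subtag.
function unrecognised(subtag) {
  return `Unrecognised language subtag, "${subtag}".`;
}

const subtags = toSubtags("en-US-POSIX"); // ["en", "us", "posix"]
console.log(unrecognised(subtags[2])); // Unrecognised language subtag, "posix".
```

Since later comparisons only ever see the lowercased copy, the original capitalization is not available when the message is built.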
API — version
You can import version:
By the way
Back in 1989, the code iw was replaced with he, so we don’t include iw. Similarly, ji and in are not included.
The following codes have been added in 1989 (nothing later): ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi (Yiddish, replacing ji), and id (Indonesian, replacing in). — https://www.ietf.org/rfc/rfc1766.txt
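Callers who still receive the withdrawn codes could map them to their replacements before validating. The sketch below is not part of this program; modernise() is a hypothetical helper, and the three pairs come from the RFC 1766 passage quoted above.

```javascript
// Replacements taken from the RFC 1766 passage quoted above.
const REPLACED = new Map([
  ["iw", "he"], // Hebrew
  ["ji", "yi"], // Yiddish
  ["in", "id"], // Indonesian
]);

// Hypothetical helper: swaps a withdrawn primary language subtag for its successor.
// The tag is lowercased first, so the output is lowercase as well.
function modernise(tag) {
  const [language, ...rest] = tag.toLowerCase().split("-");
  const current = REPLACED.get(language) || language;
  return [current, ...rest].join("-");
}

console.log(modernise("iw")); // he
console.log(modernise("in-ID")); // id-id
console.log(modernise("en-US")); // en-us
```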