Quick Take

import { strict as assert } from "assert";
import { isLangCode } from "is-language-code";

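// invalid: a tag may carry only one region subtag
assert.deepEqual(isLangCode("de-419-DE"), {
  res: false,
  message: 'Two region subtags, "419" and "de".',
});

// valid: language "sr" followed by script "Latn"
assert.deepEqual(isLangCode("sr-Latn"), {
  res: true,
  message: null,
});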


This program tells whether a given string is a valid language tag.

It is based on RFC 5646, "Tags for Identifying Languages", which was released in 2009 and, as of January 2020, is still current.

Language tags are used in many places, for example, in HTML attribute hreflang:

<link rel="alternate" href="http://example.com" hreflang="es-es" />

It's impossible to properly match the spec using regex alone: you can validate that allowed characters are in allowed places, but you can't validate the meaning those characters have. The position and arrangement of the subtags matter. In addition, this program explains why it deemed the input not to be a language tag.

For example, de-419-DE is wrong because it contains two region subtags, 419 and DE.

Existing regex-based solutions like ietf-language-tag-regex don't have much logic besides enforcing subtag order and subtag length; for example, it reports any string longer than two characters as a valid language tag. We, on the other hand, validate each value against known IANA-registered names.
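The gap between checking shape and checking meaning can be sketched roughly as follows. Both SHAPE_ONLY and findRegionLikeSubtags are hypothetical names made up for this illustration; neither is this package's source nor the pattern from ietf-language-tag-regex. The region test (two letters or three digits anywhere after the first subtag) is a loose approximation that ignores extensions and private-use subtags.

```javascript
// Hypothetical, simplified shape check: a 2-3 letter language subtag
// followed by any number of 2-8 character alphanumeric subtags.
const SHAPE_ONLY = /^[a-z]{2,3}(-[a-z0-9]{2,8})*$/i;

// Hypothetical helper: collects every subtag after the first one that
// looks like a region (two letters or three digits).
function findRegionLikeSubtags(tag) {
  return tag
    .toLowerCase()
    .split("-")
    .slice(1)
    .filter((subtag) => /^([a-z]{2}|[0-9]{3})$/.test(subtag));
}

const sample = "de-419-DE";
console.log(SHAPE_ONLY.test(sample)); // true
console.log(findRegionLikeSubtags(sample)); // [ '419', 'de' ]
```

The shape check is satisfied, yet counting region-like subtags exposes the problem; a real validator additionally has to look each subtag up in the IANA registry.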



In other words, isLangCode is a function which takes a string.

Theoretically, the input string is optional: if the input is not a string or is an empty string, a negative answer will be returned. The program is liberal and doesn't throw errors.

Returns a plain object:

| Key's name | Key value's type | Description |
| --- | --- | --- |
| res | boolean | Whether the input is a valid language code |
| message | null or string | Explains what's wrong if the answer is negative |

For example,

{
  res: false,
  message: `Unrecognised language subtag, "posix".`
}

or

{
  res: true,
  message: null
}

Non-string or empty-string inputs always yield a negative result (res is false); the program does not throw.
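Because the result is a plain object with res and message keys, a caller might branch on it along these lines. describeResult is a hypothetical helper, not part of the package, and the objects passed in are hand-written copies of the two example results rather than live output.

```javascript
// Hypothetical helper: turns a { res, message } result into a log line.
function describeResult(input, result) {
  if (result.res) {
    return `"${input}" is a valid language tag`;
  }
  // Per the table, message explains the problem when res is false.
  return `"${input}" rejected: ${result.message}`;
}

console.log(
  describeResult("en-US-POSIX", {
    res: false,
    message: 'Unrecognised language subtag, "posix".',
  })
);
console.log(describeResult("sr-Latn", { res: true, message: null }));
```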

Language tags are not case-sensitive (there are conventions for the capitalization of some of the subtags, but they don't carry meaning). For performance reasons, all references to the input use lowercase, even if you entered it in uppercase. For example, en-US-POSIX would get reported as lowercase "posix":

{
  res: false,
  message: `Unrecognised language subtag, "posix".`
}
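The capitalization conventions mentioned above are described in RFC 5646, section 2.1.1: everything is lowercase, except that two-letter subtags are uppercased and four-letter subtags are title-cased, unless they start the tag or come after a single-character subtag. A minimal sketch of that rule, where toConventionalCase is a hypothetical name and not something this package exports:

```javascript
// Hypothetical helper: re-applies the RFC 5646 casing convention.
// It does not validate anything; it only changes letter case.
function toConventionalCase(tag) {
  let seenSingleton = false;
  return tag
    .toLowerCase()
    .split("-")
    .map((subtag, index) => {
      // Everything from the first singleton (e.g. "x" or "u") onwards stays lowercase.
      if (subtag.length === 1) seenSingleton = true;
      if (index === 0 || seenSingleton) return subtag;
      if (subtag.length === 2) return subtag.toUpperCase();
      if (subtag.length === 4) return subtag[0].toUpperCase() + subtag.slice(1);
      return subtag;
    })
    .join("-");
}

console.log(toConventionalCase("SR-LATN")); // sr-Latn
console.log(toConventionalCase("en-us-posix")); // en-US-posix
console.log(toConventionalCase("EN-ca-X-CA")); // en-CA-x-ca
```

Since casing carries no meaning, comparing lowercased subtags, as the program does, loses nothing.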

By the way

Back in 1989, the code iw was replaced with he, so we don't include iw. Similarly, ji and in are not included.

The following codes have been added in 1989 (nothing later): ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi (Yiddish, replacing ji), and id (Indonesian, replacing in). — https://www.ietf.org/rfc/rfc1766.txt
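If legacy input has to be accepted, one option is to swap the retired codes for their replacements before validating. This is only a sketch: LEGACY_LANGUAGE_CODES and upgradeLegacyLanguage are hypothetical names, the package does not do this for you, and the mapping contains only the three pairs named in the quote above.

```javascript
// Replacements taken from the RFC 1766 quote: iw -> he, ji -> yi, in -> id.
const LEGACY_LANGUAGE_CODES = new Map([
  ["iw", "he"],
  ["ji", "yi"],
  ["in", "id"],
]);

// Hypothetical helper: replaces a retired primary language subtag.
// The returned tag is lowercased, since case carries no meaning.
function upgradeLegacyLanguage(tag) {
  const [language, ...rest] = tag.toLowerCase().split("-");
  const upgraded = LEGACY_LANGUAGE_CODES.has(language)
    ? LEGACY_LANGUAGE_CODES.get(language)
    : language;
  return [upgraded, ...rest].join("-");
}

console.log(upgradeLegacyLanguage("iw-IL")); // he-il
console.log(upgradeLegacyLanguage("sr-Latn")); // sr-latn
```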


See it in the monorepo, on GitHub.


To report bugs or request features or assistance, raise an issue on GitHub.

Any code contributions are welcome! All pull requests will be dealt with promptly.


MIT

Copyright © 2010–2021 Roy Revelt and other contributors

Related packages:

📦 ietf-language-tag-regex
Regular expressions for matching IETF language tags (BCP 47)
📦 emlint 5.0.1
Pluggable email template code linter
📦 html-crush 5.0.1
Minifies HTML/CSS: valid or broken, pure or mixed with other languages
📦 stristri 4.0.1
Extracts or deletes HTML, CSS, text and/or templating tags from string
📦 string-strip-html 9.0.1
Strips HTML tags from strings. No parser, accepts mixed sources
📦 detect-is-it-html-or-xhtml 5.0.1
Answers, is the string input string more an HTML or XHTML (or neither)
📦 detect-templating-language 3.0.1
Detects various templating languages present in string