`codsen-parser` vs. `hyntax`

by Roy Revelt · posted on Wednesday, 03 March 2021 · js

Imagine the HTML: <a></b>.

Parser-wise, how would you design the AST architecture, considering cases like that?

hyntax HTML parser, for example, interprets it like this:

{
  "nodeType": "document",
  "content": {
    "children": [
      {
        "nodeType": "tag",
        "parentRef": "[Circular ~]",
        "content": {
          "openStart": {
            "type": "token:open-tag-start",
            "content": "<a",
            "startPosition": 0,
            "endPosition": 1
          },
          "name": "a",
          "openEnd": {
            "type": "token:open-tag-end",
            "content": ">",
            "startPosition": 2,
            "endPosition": 2
          },
          "selfClosing": false
        }
      }
    ]
  }
}

Notice the </b> is not even mentioned here!

In contrast, codsen-parser:

[
  {
    "type": "tag",
    "start": 0,
    "end": 3,
    "value": "<a>",
    "tagNameStartsAt": 1,
    "tagNameEndsAt": 2,
    "tagName": "a",
    "recognised": true,
    "closing": false,
    "void": false,
    "pureHTML": true,
    "kind": "inline",
    "attribs": [],
    "children": []
  },
  {
    "type": "tag",
    "start": 3,
    "end": 7,
    "value": "</b>",
    "tagNameStartsAt": 5,
    "tagNameEndsAt": 6,
    "tagName": "b",
    "recognised": true,
    "closing": true,
    "void": false,
    "pureHTML": true,
    "kind": "inline",
    "attribs": [],
    "children": []
  }
]

The takeaway is that “normal” parsers like hyntax are not aimed at tackling broken code. That’s why I’m working on codsen-parser and codsen-tokenizer. The tokenizer builds tokens, plain objects, and the parser consumes tokenizer, taking those tokens and nesting them into an object tree.

I decided to ship the parser and tokenizer as separate programs because somebody might not need the AST, the plain object tree.

codsen-parser vs. hyntax

`codsen-parser` vs. `hyntax`