§ Quick Take

import { strict as assert } from "assert";
import { comb, defaults, version } from "email-comb";

// aptly named classes:
const source = `<head>
<style type="text/css">
.unused1[z] {a:1;}
.used[z] {a:2;}
</style>
</head>
<body class="  used  "><a class="used unused3">z</a>
</body>
`;

const intended = `<head>
<style type="text/css">
.used[z] {a:2;}
</style>
</head>
<body class="used"><a class="used">z</a>
</body>
`;

assert.equal(comb(source).result, intended);

§ Idea

This library removes unused CSS from HTML without parsing it.

STRENGTHS:

  • Aimed at email development but works anywhere the CSS is contained within the same HTML file (no external stylesheets)
  • Accepts HTML mixed with other templating/programming languages
  • Works on broken or incomplete or invalid HTML/XHTML code
  • Works on both classes and id's
  • Optionally uglifies the class or id names
  • The algorithm will cope with style tags inside the body tag or multiple style tags
  • Can strip CSS and HTML comments; recognises Outlook conditional comments (both "if-Outlook" and "if-not-Outlook")
  • Has email-specific features like removing empty Outlook conditional comments
  • Attempts to fix some code issues, for example, remove space in < body (which would otherwise break in Chrome)
  • API contains no file I/O operations or anything front-end-related — it's "string-in, string-out"
  • All dependencies are either our own or Lodash's or Mr Sorhus'
  • CommonJS, ES Modules and UMD builds available, published to npm and available to consume via CDN's like jsdelivr.com
  • Complete console logging set and retained in the source (which is automatically removed from builds)
  • Modern setup: node-tap tests pointing at ES Modules build, Rollup bundling the builds, coverage high, prettier and ESLint in place
  • It's not opinionated - it won't W3C-validate, enforce DOCTYPE's or add any new code to your code. Some parsers, for example, just can't stand an HTML without a DOCTYPE.
  • It's quite fast. We measure performance on a file with 2,000 redundant CSS styles and it takes less than a second.

WEAKNESSES:

  • Being a non-parsing program, EmailComb will never break on broken code. Parser-based programs normally do break on it, and that's how you find out something's wrong with your code. With EmailComb, you have to find other means (like linters) to detect whether your code is broken. This might be a strength or a weakness, depending on how you look at it.
  • Does not support external stylesheets or JS injecting more classes (because it's an email development-oriented tool)

COMPETITORS:

We believe that, as an email-oriented tool for email templates, EmailComb is superior to all the web-development-oriented unused CSS removal tools out there.

But try them yourself.

§ API

This package exports a plain object: { comb, defaults, version }:

  • Key comb has a value which is the main function; you call it like this: comb()
  • Key defaults has a value, a plain object, which holds the defaults of the main function
  • Key version is a string, for example, "2.0.12", and mirrors the same key in package.json

comb(str, [options]);

§ API - Input

The main function comb which you require/import

import { comb } from "email-comb";

takes two input arguments:

Input argument | Type         | Obligatory? | Description
str            | String       | yes         | Your HTML file contents, as string
options        | Plain object | no          | Any options, as a plain object, see below

For example,

// Require it first. You get a function which you can feed with strings:
const { comb } = require("email-comb");
// Let's define a string to work upon:
const html = '<html>zzz</html><body class="class-1">zzz</body>';
// Assign a new string to the output of this library:
const { result } = comb(html, {
  whitelist: [".class-1", "#id-1", ".module-*"],
});
// Log its result:
console.log("result = " + JSON.stringify(result, null, 4));

§ API - Optional Options Object

You can pass the Optional Options Object as the second argument:

Options object's key | Type | Default | Example | Description
whitelist | Array | [] | [".class-1", "#id-1", ".module-*"] | List all classes or id's you want this library to ignore. You can use all matcher patterns.
backend | Array | [] | [{ heads: "{{", tails: "}}" }, { heads: "{%", tails: "%}" }] | If your code has back-end code within class or id values, for example, class="{{ red }} main-box", you can stop {{, red and }} from being treated as class names
uglify | Boolean | false | n/a | Renames all class and id names to be a few characters long. This might reduce your file size by another kilobyte.
removeHTMLComments | Boolean | true | n/a | When enabled, all HTML comments (<!-- to -->) will be removed
removeCSSComments | Boolean | true | n/a | When enabled, all CSS comments (/* to */) will be removed
doNotRemoveHTMLCommentsWhoseOpeningTagContains | Array of zero or more case-insensitive strings | ["[if", "[endif"] | n/a | Email code often contains Outlook or IE conditional comments which you probably don't want to remove. If a comment's opening tag contains any of the strings listed here, that comment will not be removed.
reportProgressFunc | Function or something falsy | null | n/a | If supplied, the library will call this function, passing the current percentage done (a natural number) as an input argument
reportProgressFuncFrom | Natural number | 0 | n/a | By default, percentages are reported from 0 to 100. This value overrides the starting percentage value.
reportProgressFuncTo | Natural number | 100 | n/a | By default, percentages are reported from 0 to 100. This value overrides the ending percentage value.
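
One way to picture reportProgressFuncFrom and reportProgressFuncTo is as a rescaling of the reported percentage, which is handy when this library is only one stage of a longer job. The sketch below is an assumption about how such a mapping could look: scaleProgress is a hypothetical helper, not part of email-comb, and the library's actual rounding and call frequency may differ.

```javascript
// Hypothetical helper, not part of email-comb: rescales a raw completion
// fraction (0..1) into the [from, to] range of natural numbers.
function scaleProgress(fraction, from = 0, to = 100) {
  return from + Math.floor(fraction * (to - from));
}

// Example: this stage owns the 20..60 slice of a larger progress bar.
const seen = [];
const reportProgressFunc = (percentage) => seen.push(percentage);

for (const fraction of [0, 0.25, 0.5, 1]) {
  reportProgressFunc(scaleProgress(fraction, 20, 60));
}

console.log(seen); // [ 20, 30, 40, 60 ]
```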

Here are all options in one place in case you need to copy the whole thing:

{
  whitelist: [],
  backend: [],
  uglify: false,
  removeHTMLComments: true,
  removeCSSComments: true,
  doNotRemoveHTMLCommentsWhoseOpeningTagContains: ["[if", "[endif"],
  reportProgressFunc: null,
  reportProgressFuncFrom: 0,
  reportProgressFuncTo: 100,
}
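
To see which settings end up in effect when only some keys are passed, a shallow merge over the defaults is a reasonable mental model. This is a sketch only: resolveOptions and documentedDefaults are hypothetical names, the defaults are copied from the listing above so the snippet runs on its own, and the library's own merging (and any validation it performs) is not shown here.

```javascript
// Copy of the documented defaults, kept local so the sketch is self-contained.
// With the package installed you would use its exported `defaults` instead.
const documentedDefaults = {
  whitelist: [],
  backend: [],
  uglify: false,
  removeHTMLComments: true,
  removeCSSComments: true,
  doNotRemoveHTMLCommentsWhoseOpeningTagContains: ["[if", "[endif"],
  reportProgressFunc: null,
  reportProgressFuncFrom: 0,
  reportProgressFuncTo: 100,
};

// Hypothetical helper: shallow-merge the overrides on top of the defaults.
function resolveOptions(overrides = {}) {
  return { ...documentedDefaults, ...overrides };
}

const opts = resolveOptions({ uglify: true, whitelist: [".module-*"] });
console.log(opts.uglify); // true (overridden)
console.log(opts.removeHTMLComments); // true (default kept)
```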

§ API - Output

For example, output could look like this:

{
  log: {
    timeTakenInMiliseconds: 55,
    traversedTotalCharacters: 504,
    traversedTimesInputLength: 4.24,
    originalLength: 118,
    cleanedLength: 87,
    bytesSaved: 32,
    percentageReducedOfOriginal: 27,
    nonIndentationsWhitespaceLength: 9,
    nonIndentationsTakeUpPercentageOfOriginal: 8,
    commentsLength: 10,
    commentsTakeUpPercentageOfOriginal: 1,
  },
  result: "<html>...",
  countAfterCleaning: 3,
  countBeforeCleaning: 15,
  allInHead: allClassesAndIdsWithinHead,
  allInBody: allClassesAndIdsWithinBody,
  deletedFromHead: [".unused1", ".unused2"],
  deletedFromBody: [".unused1", ".unused1", "#unused1"],
}

So a plain object is returned. It will have the following keys:

Key | Its value's type | Description
log | Plain object | Various information about performed operations
result | String | A string containing cleaned HTML
countBeforeCleaning | Number | How many unique classes and id's were in total before cleaning
countAfterCleaning | Number | How many unique classes and id's were in total after cleaning
allInHead | Array | Deduped and sorted array of all classes and id's between <head> tags
allInBody | Array | Deduped and sorted array of all classes and id's between <body> tags
deletedFromHead | Array | Array of classes/id's that were deleted inside <head> at least once^
deletedFromBody | Array | Array of classes/id's that were deleted inside <body> at least once^

^ To be precise, if a class or id name was deleted at least once, it gets into this list. Mind you, some used classes or id's can be sandwiched with unused ones (.used.unused), get removed in some instances and end up reported here; that does not mean they were removed everywhere.
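
As an illustration of consuming this object, here is a small sketch that turns it into a one-line report. The summarise helper is hypothetical, and the sample object only mimics the documented shape with the values from the example above; it is not produced by running the library.

```javascript
// Sample shaped like the documented output (values from the example above;
// the `result` string is abbreviated and most `log` keys are omitted).
const output = {
  log: { originalLength: 118, cleanedLength: 87, percentageReducedOfOriginal: 27 },
  result: "<html>...",
  countBeforeCleaning: 15,
  countAfterCleaning: 3,
  deletedFromHead: [".unused1", ".unused2"],
  deletedFromBody: [".unused1", ".unused1", "#unused1"],
};

// Hypothetical helper: builds a one-line report from the output object.
function summarise(out) {
  // A selector can be listed more than once, so dedupe before reporting.
  const deleted = [...new Set([...out.deletedFromHead, ...out.deletedFromBody])].sort();
  return (
    `${out.log.originalLength} -> ${out.log.cleanedLength} chars ` +
    `(${out.log.percentageReducedOfOriginal}% smaller), ` +
    `${out.countBeforeCleaning} -> ${out.countAfterCleaning} classes/ids, ` +
    `deleted at least once: ${deleted.join(" ")}`
  );
}

console.log(summarise(output));
// 118 -> 87 chars (27% smaller), 15 -> 3 classes/ids, deleted at least once: #unused1 .unused1 .unused2
```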

§ opts.whitelist

Since the main purpose of this library is to clean email HTML, it needs to cater for email code specifics. One of them is that CSS styles will contain fix or hack styles, meant for email software. For example, here are a few of them:

#outlook a { padding:0; }
.ReadMsgBody { width:100%; }

.ExternalClass fixes are not needed any more in email templates, see email-bugs/issues/4

You will not be using these classes within the <body> of your HTML code, so they would get removed as "unused" because they are present in <head> only. To avoid that, pass the classes and id's in the whitelist key's value, as an array. For example:

var html = "<!DOCTYPE html>...";
comb(html, {
  whitelist: ["#outlook", ".ExternalClass", ".ReadMsgBody"],
});

You can also use a wildcard, for example in order to whitelist classes module-1, module-2 ... module-99, module-100, you can simply whitelist them as module-*:

var html = "<!DOCTYPE html>...";
comb(html, {
  whitelist: [".module-*"],
});
// => all class names that begin with ".module-" will not be touched by this library.
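
To make the wildcard behaviour concrete, here is a simplified stand-in for the matching step. It is not the matcher package's implementation: wildcardToRegExp and isWhitelisted are hypothetical helpers that support only the * wildcard and compare case-sensitively, which may differ from what the library does.

```javascript
// Simplified stand-in for wildcard matching; NOT the matcher package's
// implementation. Only `*` (any run of characters) is supported.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((chunk) => chunk.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical helper: is this class/id selector covered by the whitelist?
function isWhitelisted(selector, whitelist) {
  return whitelist.some((pattern) => wildcardToRegExp(pattern).test(selector));
}

const whitelist = ["#outlook", ".ReadMsgBody", ".module-*"];
console.log(isWhitelisted(".module-42", whitelist)); // true
console.log(isWhitelisted("#outlook", whitelist)); // true
console.log(isWhitelisted(".modular", whitelist)); // false
```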

§ opts.backend

This library, unlike its competition, aims to support code which contains back-end code: other programming languages (Java JSP's), other templating languages (like Nunjucks) and/or proprietary ESP templating languages.

All of these languages can be present in the input source, and the library won't care, EXCEPT when they are in class or id names. For example, <td class="mt10 {{ module.on }} module-box blackbg">. Notice how {{ module.on }} sits in the middle: it's a variable from a different programming language. Eventually, it will be rendered into the string on or off, but at this stage it is a raw, unrendered template and we want to remove all unused CSS from it.

It's possible to clean this too.

If you let this library know how your back-end language's variables are marked, for example, that "heads" are {{ and "tails" are }} (as in Hi {{data.firstname}}), the algorithm will ignore all variables within class or id names.

If you don't put templating variables into classes or id's, don't use this feature, because performing those checks still costs computing resources.

Here's an example:

// Require it first. You get a function which you can feed with strings.
// "comb" is a named export, so destructure it under that name.
const { comb } = require("email-comb");

// Run the function on some HTML and grab the cleaned string:
const res = comb(
  `<!doctype html>
<html>
<head>
<style>
.aaa {
color: black;
}
</style></head>
<body class="{% var1 %}">
<div class="{{ var2 }}">
</div>
</body>
</html>
`,
  {
    // <------------ Optional Options Object - second input argument of our function, comb()
    backend: [
      {
        heads: "{{", // define heads and tails in pairs
        tails: "}}",
      },
      {
        heads: "{%", // second pair
        tails: "%}",
      },
    ],
  }
).result; // <------ output of this library is a plain object. String result is in a key "result". We grab it here.

// Log the result:
console.log("res =\n" + res);
// res =
// <!doctype html>
// <html>
// <head>
// </head>
// <body class="{% var1 %}">
// <div class="{{ var2 }}">
// </div>
// </body>
// </html>
//

In templating languages, it's also possible to have IF-ELSE clauses. For example, in Nunjucks, you can have:

<td class="db{% if module_on || oodles %}on{% else %}off{% endif %} pt10"></td>

db and pt10 are normal CSS class names, but everything else between {% and %} is Nunjucks code.

Now, in those cases, notice that the Nunjucks tags wrap only the conditions; the class names on and off sit outside them. Even if you set heads to {% and tails to %}, classes on and off will not get ignored and theoretically can get removed!

The solution is to ensure that all back-end class names are contained within back-end tags. With Nunjucks, it is easily done by performing calculations outside class= declarations, then assigning the calculation's result to a variable and using the variable instead.

For example, let's rewrite the same snippet used above:

{% set switch = 'off' %}
{% if module_on || oodles %}
  {% set switch = 'on' %}
{% endif %}
<td class="db {{ switch }} pt10"></td>

Now, set heads to {{ and tails to }} and switch will be ignored completely.
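
The difference between the two snippets can be illustrated with a small sketch. It assumes that ignoring back-end code amounts to skipping everything between each heads/tails pair before reading the class names; classNamesOutsideBackendCode is a hypothetical helper written for this illustration, not email-comb's implementation.

```javascript
// Hypothetical helper, not email-comb's implementation: returns the class
// names in an attribute value, skipping anything between heads and tails.
function classNamesOutsideBackendCode(value, backend) {
  let rest = value;
  for (const { heads, tails } of backend) {
    let start = rest.indexOf(heads);
    while (start !== -1) {
      const end = rest.indexOf(tails, start + heads.length);
      if (end === -1) break; // unclosed pair: leave the remainder untouched
      // Replace the whole heads...tails span with a single space.
      rest = rest.slice(0, start) + " " + rest.slice(end + tails.length);
      start = rest.indexOf(heads, start + 1);
    }
  }
  return rest.split(/\s+/).filter(Boolean);
}

const backend = [
  { heads: "{{", tails: "}}" },
  { heads: "{%", tails: "%}" },
];

// Variable fully wrapped in {{ }}: only the real class names are left.
console.log(classNamesOutsideBackendCode("db {{ switch }} pt10", backend));
// [ 'db', 'pt10' ]

// IF-ELSE inside the attribute: "on" and "off" sit outside the tags,
// so they are still seen as class names.
console.log(
  classNamesOutsideBackendCode(
    "db{% if module_on || oodles %}on{% else %}off{% endif %} pt10",
    backend
  )
);
// [ 'db', 'on', 'off', 'pt10' ]
```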

§ Tapping the stream in Gulp

In Gulp, everything flows as vinyl Buffer streams. You could tap the stream, convert it to string, perform the operations (like removing unused CSS), then convert it back to Buffer and place it back into the stream. We wanted to come up with a visual analogy example using waste pipes but thought we'd rather not.

Code-wise, here's the idea:

const gulp = require("gulp");
const tap = require("gulp-tap");
const { comb } = require("email-comb");
const util = require("gulp-util");
const whitelist = [
  ".External*",
  ".ReadMsgBody",
  ".yshortcuts",
  ".Mso*",
  "#outlook",
  ".module*",
];

gulp.task("build", () => {
  return gulp.src("emails/*.html").pipe(
    tap((file) => {
      // Buffer -> string, then clean
      const cleanedHtmlResult = comb(file.contents.toString(), {
        whitelist,
      });
      util.log(
        util.colors.green(
          `\nremoved ${cleanedHtmlResult.deletedFromHead.length} from head: ${cleanedHtmlResult.deletedFromHead.join(" ")}`
        )
      );
      util.log(
        util.colors.green(
          `\nremoved ${cleanedHtmlResult.deletedFromBody.length} from body: ${cleanedHtmlResult.deletedFromBody.join(" ")}`
        )
      );
      // string -> Buffer, placed back onto the file
      file.contents = Buffer.from(cleanedHtmlResult.result);
    })
  );
});
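
The Buffer-to-string-and-back round trip at the heart of that task can be tried without Gulp. In this sketch, transformContents is a hypothetical helper, the file object only imitates the contents property of a vinyl file, and fakeComb is a stand-in that returns an object with a result key the way comb() does.

```javascript
// Hypothetical helper: applies a string-in, string-out function to anything
// shaped like a vinyl file, i.e. an object with a Buffer in `contents`.
function transformContents(file, fn) {
  const input = file.contents.toString(); // Buffer -> string
  file.contents = Buffer.from(fn(input)); // string -> Buffer
  return file;
}

// Stand-in for comb(): any function that returns an object with `result`.
const fakeComb = (str) => ({ result: str.replace(' class="unused"', "") });

const file = { contents: Buffer.from('<td class="unused">z</td>') };
transformContents(file, (str) => fakeComb(str).result);

console.log(file.contents.toString()); // <td>z</td>
```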

§ Extreme example of unused CSS

This piece of HTML doesn't even have a <head>, and the <style> CSS is at the very bottom, within <body>. Our application still cleans it all right:

<html>
<body id="unused-1">
<table class="unused-2 unused-3">
<tr>
<td class="unused-4 unused-5">text</td>
</tr>
</table>

<style>
.unused-6 {
display: block;
}
#unused-7 {
height: auto;
}
</style>
</body>
</html>

Cleaned result:

<html>
<body>
<table>
<tr>
<td>text</td>
</tr>
</table>
</body>
</html>
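
Why everything gets removed here can be checked with a deliberately naive sketch: collect the selectors defined in <style>, collect the classes and id's used in attributes, and look for overlap. The regexes below are an assumption made for illustration and only cope with simple input like this; email-comb's real algorithm is more involved and is not reproduced here.

```javascript
// Deliberately naive, regex-based illustration; not email-comb's algorithm.
const html = `<html>
<body id="unused-1">
<table class="unused-2 unused-3">
<tr>
<td class="unused-4 unused-5">text</td>
</tr>
</table>
<style>
.unused-6 { display: block; }
#unused-7 { height: auto; }
</style>
</body>
</html>`;

// Selectors defined inside <style> blocks, wherever those blocks sit.
const css = [...html.matchAll(/<style[^>]*>([\s\S]*?)<\/style>/g)]
  .map((m) => m[1])
  .join("\n");
const defined = new Set(css.match(/[.#][\w-]+(?=[^{}]*\{)/g) || []);

// Classes and ids referenced by attributes in the markup.
const used = new Set();
for (const [, attr, value] of html.matchAll(/\s(class|id)="([^"]*)"/g)) {
  for (const name of value.split(/\s+/).filter(Boolean)) {
    used.add((attr === "class" ? "." : "#") + name);
  }
}

// Nothing is both defined in CSS and used in the markup,
// so every class, id and style rule is redundant.
const usedAndDefined = [...defined].filter((sel) => used.has(sel));
console.log([...defined]); // [ '.unused-6', '#unused-7' ]
console.log(usedAndDefined); // []
```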

§ Removing unused CSS from web pages

This library is meant to be used on any HTML where there are no external CSS stylesheets. It's quite rare to find a web page which would not have any external stylesheets.

§ Processing campaigns' HTML

Email templates, the HTML files, are coded in two stages: 1) design file to static HTML; 2) static HTML to "campaign" - HTML with all templating.

For example, Price is {% if data.purchasePrice > 100 %}...{% endif %} is HTML mixed with Nunjucks/Jinja - that greater-than bracket is not an HTML bracket.

email-comb will work fine on both static HTML and wired-up campaign HTML. As a non-parsing tool, it skips the code it "doesn't understand".

§ Licence

MIT

Copyright © 2015–2020 Roy Revelt and other contributors

Related packages:

📦 html-crush 2.0.8
Minifies HTML/CSS: valid or broken, pure or mixed with other languages
📦 email-homey 2.7.71
Generate homepage in the Browsersync root with links/screenshots to all your email templates
📦 email-all-chars-within-ascii-cli 1.10.78
Command line app to scan email templates, are all their characters within ASCII range
📦 email-all-chars-within-ascii 2.9.71
Scans all characters within a string and checks are they within ASCII range