The algorithm will cope with style tags inside the body tag or multiple style tags
Can strip CSS and HTML comments; recognises Outlook conditional comments (both "if-Outlook" and "if-not-Outlook")
Has email-specific features like removing empty Outlook conditional comments
Attempts to fix some code issues, for example, removing the space in < body (which would otherwise break in Chrome)
API contains no file I/O operations or anything front-end-related — it's "string-in, string-out"
All dependencies are either our own or Lodash's or Mr Sorhus'
CommonJS, ES Modules and UMD builds are available, published to npm and consumable via CDNs like jsdelivr.com
A complete set of console logging statements is retained in the source (and automatically removed from builds)
Modern setup: node-tap tests pointing at ES Modules build, Rollup bundling the builds, coverage high, prettier and ESLint in place
It's not opinionated: it won't W3C-validate, enforce DOCTYPEs or add any new code to your code. Some parsers, for example, just can't stand an HTML file without a DOCTYPE.
It's quite fast. We measure performance on a file with 2,000 redundant CSS styles, and it takes less than a second.
WEAKNESSES:
Broken code causes parsers to throw errors; that's how you find out something's wrong with your code in parser-based programs. But EmailComb, being a non-parsing program, will never throw! That means you have to find other means (like linters) to detect whether your code is broken. This might be a strength or a weakness, depending on how you look at it.
Does not support external stylesheets or JS injecting more classes (because it's an email development-oriented tool)
COMPETITORS:
We believe that being an email-oriented tool, for email templates, EmailComb is superior to all web-development-oriented unused CSS removal tools out there:
If your code has back-end code within class or id values, for example, class="{{ red }} main-box", you can stop {{, red and }} from being treated as class names
| uglify | Boolean | false | n/a | Will rename all class and id names to be a few characters long. This might reduce your file size by another kilobyte. |
| removeHTMLComments | Boolean | true | n/a | When enabled, all HTML comments (<!-- to -->) will be removed |
| removeCSSComments | Boolean | true | n/a | When enabled, all CSS comments (/* to */) will be removed |
| doNotRemoveHTMLCommentsWhoseOpeningTagContains | Array of zero or more case-insensitive strings | ["[if", "[endif"] | n/a | Email code often contains Outlook or IE conditional comments which you probably don't want to remove. If a comment's opening tag contains any of the strings listed here, that comment will not be removed. |
| reportProgressFunc | Function or something falsy | null | n/a | If supplied, the function you assign will be called with the current percentage done (a natural number) as its input argument |
| reportProgressFuncFrom | Natural number | 0 | n/a | By default, percentages are reported from 0 to 100. This value overrides the starting percentage. |
| reportProgressFuncTo | Natural number | 100 | n/a | By default, percentages are reported from 0 to 100. This value overrides the ending percentage. |
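To illustrate how reportProgressFuncFrom and reportProgressFuncTo could interact, here is a minimal sketch that rescales a raw 0-100 percentage into a custom range. The helper name scaleProgress is hypothetical and the rounding is an assumption; the library's internal arithmetic may differ.

```javascript
// Hypothetical helper, not part of email-comb: maps a raw percentage (0..100)
// onto the range [from, to] and rounds down to a natural number.
function scaleProgress(rawPercentage, from = 0, to = 100) {
  return Math.floor(from + ((to - from) * rawPercentage) / 100);
}

// Collect what a progress callback would receive for a few raw values.
const received = [];
const reportProgressFunc = (percentage) => received.push(percentage);

[0, 25, 50, 100].forEach((raw) => {
  reportProgressFunc(scaleProgress(raw, 20, 60));
});

console.log(received); // [ 20, 30, 40, 60 ]
```

A narrowed range like this is handy when cleaning is only one stage of a longer pipeline and a shared progress bar should advance through, say, the 20 to 60 percent segment while this stage runs.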
Here are all options in one place in case you need to copy the whole thing:
So a plain object is returned. It will have the following keys:
| Key | Its value's type | Description |
| --- | --- | --- |
| log | Plain object | Various information about performed operations |
| result | String | A string containing cleaned HTML |
| countBeforeCleaning | Number | How many unique classes and id's were in total before cleaning |
| countAfterCleaning | Number | How many unique classes and id's were in total after cleaning |
| allInHead | Array | Deduped and sorted array of all classes and id's between <head> tags |
| allInBody | Array | Deduped and sorted array of all classes and id's between <body> tags |
| deletedFromHead | Array | Array of classes/id's that were deleted inside <head> at least once^ |
| deletedFromBody | Array | Array of classes/id's that were deleted inside <body> at least once^ |
^ To be very precise, if a class or id name was deleted at least once, it gets into this list. Mind you, some used classes or id's can be sandwiched with unused ones (.used.unused), end up removed in some instances and get reported here, but that does not mean they were removed completely.
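As a small sketch of how the returned plain object might be consumed, the snippet below builds a one-line summary from the keys described above. The sampleOutput object is made-up illustrative data that only mimics the documented shape, and summarise is a hypothetical helper, not part of the library.

```javascript
// Made-up sample mimicking the documented shape of the returned object;
// real values come from calling comb() on your own HTML.
const sampleOutput = {
  log: {},
  result: "<html>...</html>",
  countBeforeCleaning: 3,
  countAfterCleaning: 1,
  allInHead: [".aaa", ".bbb"],
  allInBody: [".aaa", ".ccc"],
  deletedFromHead: [".bbb"],
  deletedFromBody: [".ccc"],
};

// Hypothetical helper: turns the counters and deletion lists into a report line.
function summarise(output) {
  const removed = output.countBeforeCleaning - output.countAfterCleaning;
  const head = output.deletedFromHead.join(", ") || "none";
  const body = output.deletedFromBody.join(", ") || "none";
  return `${removed} of ${output.countBeforeCleaning} unique classes/ids removed (head: ${head}; body: ${body})`;
}

console.log(summarise(sampleOutput));
```

Because deletedFromHead and deletedFromBody list names removed at least once, a name appearing there can still survive elsewhere in the cleaned result.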
Since the main purpose of this library is to clean email HTML, it needs to cater for email code specifics. One of them is that CSS styles will contain fix or hack styles, meant for email software. For example, here are a few of them:
#outlook a { padding:0; } .ReadMsgBody { width:100%; }
.ExternalClass fixes are not needed any more in email templates, see email-bugs/issues/4
You will not be using these classes within the <body> of your HTML code, so they would get removed as "unused" because they are present in <head> only. To avoid that, pass the classes and id's as an array in the whitelist key's value. For example:
var html = "<!DOCTYPE html>...";
comb(html, {
  whitelist: ["#outlook", ".ExternalClass", ".ReadMsgBody"],
});
You can also use a wildcard, for example in order to whitelist classes module-1, module-2 ... module-99, module-100, you can simply whitelist them as module-*:
var html = "<!DOCTYPE html>...";
comb(html, {
  whitelist: [".module-*"],
});
// => all class names that begin with ".module-" will not be touched by this library.
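To make the wildcard behaviour concrete, here is a minimal sketch of how a whitelist entry such as ".module-*" could be matched against selector names. This is an illustration only: wildcardToRegExp and isWhitelisted are hypothetical helpers, case-sensitive matching is an assumption, and the library's own matching logic may differ.

```javascript
// Hypothetical helper, not the library's own matcher: converts a whitelist
// entry such as ".module-*" into an anchored regular expression.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    // Escape regex metacharacters in the literal chunks between wildcards.
    .map((chunk) => chunk.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

// True if the selector matches at least one whitelist entry.
function isWhitelisted(selector, whitelist) {
  return whitelist.some((pattern) => wildcardToRegExp(pattern).test(selector));
}

const whitelist = ["#outlook", ".ExternalClass", ".module-*"];
console.log(isWhitelisted(".module-42", whitelist)); // true
console.log(isWhitelisted(".modules", whitelist)); // false
console.log(isWhitelisted("#outlook", whitelist)); // true
```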
Unlike the competition, this library aims to support code which contains back-end code: other programming languages (Java JSPs), other templating languages (like Nunjucks) and/or proprietary ESP templating languages.
All different languages can be present in the input source, and the algorithm won't care, EXCEPT when they are in class or id names. For example, <td class="mt10 {{ module.on }} module-box blackbg">. Notice how {{ module.on }} sits in the middle: it's a variable from a different programming language. Eventually, it will be rendered into the string on or off, but at this stage this is a raw, unrendered template and we want to remove all unused CSS from it.
It's possible to clean this too.
If you let this library know how your back-end language's variables are marked, for example, that "heads" are {{ and "tails" are }} (as in Hi {{data.firstname}}), the algorithm will ignore all variables within class or id names.
If you don't put templating variables into classes or id's, don't use the feature because it still costs computing resources to perform those checks.
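The idea of skipping everything between "heads" and "tails" can be sketched as follows. This is not email-comb's implementation; extractClassNames is a hypothetical helper that only shows which names would remain candidates for cleaning once the templating variables are ignored.

```javascript
// Hypothetical helper, not email-comb's implementation: returns the class
// names found in a class attribute's value, skipping every chunk that sits
// between a "heads" marker and its matching "tails" marker.
function extractClassNames(attrValue, backend = []) {
  let remaining = attrValue;
  let cleaned = "";
  while (remaining.length) {
    // Find the earliest "heads" marker among all configured pairs.
    let earliest = null;
    for (const pair of backend) {
      const at = remaining.indexOf(pair.heads);
      if (at !== -1 && (earliest === null || at < earliest.at)) {
        earliest = { at, pair };
      }
    }
    if (earliest === null) {
      cleaned += remaining;
      break;
    }
    // Keep the text before the variable, then jump past its "tails".
    cleaned += remaining.slice(0, earliest.at) + " ";
    const tailsAt = remaining.indexOf(
      earliest.pair.tails,
      earliest.at + earliest.pair.heads.length
    );
    // An unclosed variable swallows the rest of the value.
    if (tailsAt === -1) break;
    remaining = remaining.slice(tailsAt + earliest.pair.tails.length);
  }
  return cleaned.split(/\s+/).filter(Boolean);
}

const backend = [
  { heads: "{{", tails: "}}" },
  { heads: "{%", tails: "%}" },
];
console.log(extractClassNames("mt10 {{ module.on }} module-box blackbg", backend));
// [ 'mt10', 'module-box', 'blackbg' ]
```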
Here's an example:
// Require it first. You get a function which you can feed with strings.
const { comb } = require("email-comb");

// Let's define a string equal to some processed HTML:
const res = comb(
  `<!doctype html>
<html>
<head>
<style>
  .aaa { color: black; }
</style></head>
<body class="{% var1 %}">
<div class="{{ var2 }}">
</div>
</body>
</html>
`,
  {
    // Optional Options Object: the second input argument of comb()
    backend: [
      {
        heads: "{{", // define heads and tails in pairs
        tails: "}}",
      },
      {
        heads: "{%", // second pair
        tails: "%}",
      },
    ],
  }
).result; // The output of this library is a plain object. The cleaned string is under the key "result", so we grab it here.
db and pt10 are normal CSS class names, but everything else between {% and %} is Nunjucks code.
Now, in those cases, notice that the Nunjucks code is only wrapping the variables. Even if you set heads to {% and tails to %}, the classes on and off will not get ignored and, theoretically, can get removed!
The solution is to ensure that all back-end class names are contained within back-end tags. With Nunjucks, it is easily done by performing calculations outside class= declarations, then assigning the calculation's result to a variable and using the variable instead.
For example, let's rewrite the same snippet used above:
{% set switch = 'off' %}
{% if module_on || oodles %}
  {% set switch = 'on' %}
{% endif %}
<td class="db {{ switch }} pt10"></td>
Now, set heads to {{ and tails to }} and switch will be ignored completely.
In Gulp, everything flows as vinyl Buffer streams. You could tap the stream, convert it to a string, perform the operations (like removing unused CSS), then convert it back to a Buffer and place it back into the stream. We wanted to come up with a visual analogy example using waste pipes but thought we'd rather not.
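The Buffer-to-string-and-back round trip described above can be sketched with Node's built-in stream module alone. This is a sketch under stated assumptions: plain objects with a contents Buffer stand in for vinyl files, processString is a hypothetical placeholder for a cleaning call such as comb(html).result, and bufferTap is a made-up name, not a Gulp plugin.

```javascript
const { Transform } = require("stream");

// Hypothetical stand-in for a cleaning function such as comb(html).result;
// here it only trims whitespace so the sketch runs without dependencies.
function processString(str) {
  return str.trim();
}

// Buffer -> string -> processed string -> Buffer.
function processBuffer(buf, fn) {
  return Buffer.from(fn(buf.toString("utf8")), "utf8");
}

// An object-mode Transform that rewrites the `contents` Buffer of each
// vinyl-like file object passing through (plain objects are used here).
function bufferTap(fn) {
  return new Transform({
    objectMode: true,
    transform(file, _encoding, callback) {
      if (Buffer.isBuffer(file.contents)) {
        file.contents = processBuffer(file.contents, fn);
      }
      callback(null, file);
    },
  });
}

// Push one fake file through the tap and print its rewritten contents.
const tap = bufferTap(processString);
tap.on("data", (file) => {
  console.log(JSON.stringify(file.contents.toString("utf8")));
});
tap.end({ path: "template.html", contents: Buffer.from("  <html></html>\n") });
```

In a real Gulpfile, a transform like this would presumably sit in a .pipe() call between gulp.src and gulp.dest, with the placeholder swapped for the actual cleaning function.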
This library is meant to be used on any HTML where there are no external CSS stylesheets. It's quite rare to find a web page which would not have any external stylesheets.
Email templates, the HTML files, are coded in two stages: 1) design file to static HTML; 2) static HTML to "campaign" - HTML with all templating.
For example, Price is {% if data.purchasePrice > 100 %}...{% endif %} is HTML mixed with Nunjucks/Jinja - that greater-than bracket is not an HTML bracket.
email-comb will work fine on both static HTML and wired-up campaign HTML. As a non-parsing tool, it skips the code it "doesn't understand".
As our npm package count grows, README automation becomes more and more of an issue. Installation instructions, badges and contribution guidelines can be automatically generated, but many other chapters can't.