Bramus

Can you Syntax Highlight a code snippet on the web without overloading the DOM with a ton of `<span>` elements wrapped around the tokens?

Thanks to the Custom Highlight API, you can!

codepen.io/bramus/full/VwRqGVo

Bramus

As a first step you need to define the various highlight styles in your CSS using `::highlight(x)` and also register them in the registry using `CSS.highlights.set(x, new Highlight())`

(x being the types of tokens: comment, property, boolean, class-name, etc.)

[Image: CSS code with all highlights]
[Image: JavaScript code that registers all the highlights]
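Here is a minimal sketch of this registration step. It assumes a hand-picked list of token types; the list below is illustrative and is not the one in the demo. Each name registered in `CSS.highlights` pairs with a `::highlight(name)` rule in the stylesheet, for example `::highlight(comment) { color: slategray; }`. The `registry` and `HighlightClass` parameters default to the browser's `CSS.highlights` and `Highlight`. They are parameters only so the function can be exercised outside a browser.

```javascript
// Illustrative token types; a real list would mirror what the tokenizer emits.
const TOKEN_TYPES = ["comment", "property", "boolean", "class-name", "keyword", "string"];

// Registers one empty Highlight per token type. In a browser the defaults resolve
// to CSS.highlights (a map-like HighlightRegistry) and the global Highlight class.
function registerHighlights(
  types,
  registry = CSS.highlights,
  HighlightClass = globalThis.Highlight
) {
  for (const type of types) {
    // Only names with a matching ::highlight(<name>) rule in the CSS get styled.
    registry.set(type, new HighlightClass());
  }
  return registry;
}
```

In a page, calling `registerHighlights(TOKEN_TYPES)` once, before any ranges are added, is all this step needs.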
Bramus

With that in place, and after tokenizing code snippets (e.g. using @prismjs), it’s only a matter of assigning the tokens to the corresponding Highlight.

`CSS.highlights.get(token.type).add(range)`

[Image: JavaScript that tokenizes a code snippet (using Prism) and then uses the registered Highlights to apply syntax highlighting]
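The step above depends on turning Prism's nested token stream into character offsets. What follows is a minimal sketch. It assumes the stream shape documented for `Prism.tokenize`: an array of plain strings and `Token` objects, where a token's `content` may itself be a string, a token, or an array. The helper names `flattenTokens` and `textLength` are hypothetical and are not Prism API. The sketch also ignores Prism's `alias` field.

```javascript
// Total number of characters covered by a token's content, which Prism types as
// string | Token | Array<string | Token>.
function textLength(content) {
  if (typeof content === "string") return content.length;
  if (Array.isArray(content)) return content.reduce((sum, part) => sum + textLength(part), 0);
  return textLength(content.content); // a single nested Token
}

// Walks a token stream and returns flat { type, start, end } spans. Offsets are
// character positions in the original source, and `end` is exclusive, matching
// Range.setEnd(). Plain strings are untokenized text and only advance the offset.
function flattenTokens(stream, offset = 0, spans = []) {
  for (const token of stream) {
    const length = textLength(typeof token === "string" ? token : token.content);
    if (typeof token !== "string") {
      spans.push({ type: token.type, start: offset, end: offset + length });
      // Tokens can nest (e.g. punctuation inside a string); emit those as well.
      if (typeof token.content !== "string") {
        const inner = Array.isArray(token.content) ? token.content : [token.content];
        flattenTokens(inner, offset, spans);
      }
    }
    offset += length;
  }
  return spans;
}
```

Nested tokens produce overlapping spans. With one Highlight per type, the inner and outer ranges are styled independently.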
Bramus

The Custom Highlight API is supported in Chrome 105+ and Safari 17.2+. Firefox has experimental support.

Mayank

@bramus oh dang, didn't realize firefox is working on it 👀

Bramus

@hi_mayank Got the info from MDN. Not sure how up-to-date it is.

Mayank

@bramus ah nvm then, i thought there was some new development. there's so much in nightly/canary/TP builds that takes a while to show up because it isn't actively being worked on or just isn't ready

(popover api has been in firefox nightly for 9 months now)

Eric A. Meyer

@bramus @hi_mayank The Pen worked for me in Firefox Nightly, so it appears MDN is accurate on this.

Also, I don’t understand how it works, because your script and style blocks don’t have all the `span`s around highlighted stuff, but the Prism home page’s code blocks do have all those `span`s. So how Prism helps here is completely opaque.

Mayank

@Meyerweb @bramus it looks like prism is being used for tokenizing the code before creating ranges.

also damn it's an impressive demo, with the literal `<style>`/`<script>` tags being made visible and highlighted

		let tokens = Prism.tokenize(
			codeBlock.innerText,
			codeBlock.tagName == 'STYLE' ? Prism.languages.css : Prism.languages.javascript
		);
Bramus

@hi_mayank @Meyerweb I have prism set up in manual mode, meaning it doesn't automatically kick in.

I then manually call it to tokenize the code. This gives me a bunch of numbers about which token is where and what type it is.

This info is then used to populate the Custom Highlight API.
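Here is a sketch of that last step. It assumes the code block's text sits in a single Text node, so character offsets map straight onto it. It also assumes spans arrive as flat `{ type, start, end }` objects with an exclusive `end`. The function name `applySpans` is hypothetical. `Range.setStart`/`setEnd` and `Highlight.add` are the standard DOM and CSS Custom Highlight API calls.

```javascript
// Turns { type, start, end } spans into Range objects and adds each one to the
// Highlight registered under its type. Assumes all of the code block's text lives
// in one Text node, so span offsets are offsets into that node.
function applySpans(
  textNode,
  spans,
  registry = CSS.highlights,
  createRange = () => new Range()
) {
  for (const { type, start, end } of spans) {
    const highlight = registry.get(type);
    if (!highlight) continue; // no Highlight registered for this token type
    const range = createRange();
    range.setStart(textNode, start);
    range.setEnd(textNode, end);
    highlight.add(range);
  }
}
```

On a page this would run as `applySpans(codeBlock.firstChild, spans)` for each code block. Token types without a registered Highlight are skipped. By contrast, a bare `CSS.highlights.get(token.type).add(range)` would throw on such a type, because `get` returns `undefined`.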

Nathan Knowler

@bramus @hi_mayank @Meyerweb I think a next step towards a declarative API—one that doesn’t require JS—would be some way of teaching the browser grammars using something like PEG. Then you could set an attribute on a `<code>` element to specify which grammar you want to be used for it to auto-tokenize and highlight that code block.

Bramus

@knowler @hi_mayank @Meyerweb I don’t think that would work.

Which languages do you include? Which versions of those languages? When do these definition files get updated? Would you be able to load your own? …

Reminds me of authors requesting to put jQuery in the browser. Same questions arose.

(We actually got that last thing … not by including jQuery in browsers but by having better JavaScript/DOM APIs nowadays)

Mayank replied to Bramus

@bramus @knowler i think it could still work for the languages of the web - HTML, CSS, JS. the browser already understands these and even has syntax highlighting implemented inside devtools

Nathan Knowler replied to Bramus

@bramus @hi_mayank @Meyerweb That’s kinda why I’m suggesting supporting grammars provided by the author instead of the browser supporting a set of languages out of the box. It’d be language/dialect/version agnostic. The browser would use it to generate parsers to use internally for tokenizing blocks of text. The author could link to a grammar, give it a name, then when they want to use it for a code block just tell it to use that named grammar.

```html
<link rel=grammar type=text/peg href=/css.peg name=css-2025>
<code grammar=css-2025>
@scope { /* some code */ }
</code>
```

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb This is a very cool API, and I can definitely see value in removing the JS dependency.

I wonder if a slightly more feasible approach might be to define a syntax for declaring token ranges. Then Prism could emit these spans and styling. Something like:

```
<code highlights="keyword 0 4 identifier 6 8 comment 12 18">
const foo = // ...
</code>
```

(Range 0-4 is a `keyword`, range 6-8 is an `identifier`, etc. All subject to `::highlight` styles of those names.)

This feels like a lot less complexity in the browser, but the downside is that each `<code>` block needs its own token ranges. I think Prism could still do this, but the advantage of @knowler's approach is using a single grammar for all `<code>` blocks in that language.
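Doug's `highlights` attribute is only a proposal, so no browser reads it. A small script could still act as a polyfill. Here is a minimal sketch of the parsing half. It assumes whitespace-separated `type start end` triples with start-inclusive, end-exclusive offsets, which is the convention `Range` uses; the proposal itself does not pin this down. `parseHighlightsAttr` is a hypothetical name.

```javascript
// Parses the proposed highlights="type start end ..." syntax into spans.
// Assumption: offsets are start-inclusive, end-exclusive, like Range offsets.
function parseHighlightsAttr(value) {
  const text = value.trim();
  if (text === "") return [];
  const parts = text.split(/\s+/);
  if (parts.length % 3 !== 0) {
    throw new SyntaxError(`Expected "type start end" triples, got ${parts.length} items`);
  }
  const spans = [];
  for (let i = 0; i < parts.length; i += 3) {
    const type = parts[i];
    const start = Number(parts[i + 1]);
    const end = Number(parts[i + 2]);
    // Reject non-integers, negative offsets and reversed ranges.
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
      throw new SyntaxError(`Invalid offsets for "${type}": ${parts[i + 1]} ${parts[i + 2]}`);
    }
    spans.push({ type, start, end });
  }
  return spans;
}
```

A polyfill would then turn these spans into `Range`s on each `<code>` element's text node and add them to the matching `CSS.highlights` entries. The offsets themselves would come from a tool, as the thread concludes further down.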

Nathan Knowler replied to Doug

@develwithoutacause @bramus @hi_mayank @Meyerweb Ya, that sort of API is kinda what got me thinking about using grammars. I do think that would still be a really nice low cost API though. Scales down alright, even though you’d probably need a tool to manage it (otherwise… lots of counting and attention to whitespace).

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb Yeah, definitely would need to be generated by a tool.

Eric A. Meyer

@bramus @hi_mayank I think I grasp what you’re saying, but is there an article or something that breaks this sort of thing down in detail?

Bramus

@Meyerweb @hi_mayank Sorry, not yet.

My typical flow is hack it together, put it on socials, and then maybe later write about it.

What could help you better understand is to console.log(tokens) right after Prism has done its thing.

MDN might also have some good info (I'm afk right now, so can't check)

Mia replied to Bramus

@bramus @Meyerweb @hi_mayank I'm curious too, need to do some sleuthing.

This clearly wants to be a custom element, right? No shadow dom required, and the progressive enhancement story is "just add color". A perfect use-case.

Mayank replied to Mia

@mia @bramus @Meyerweb
these docs might help you understand the tokens: prismjs.com/docs/Prism.html#.t
prismjs.com/docs/Token.html

and then you plug the token positions into ranges (and the token types and ranges into highlights): developer.mozilla.org/en-US/do

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb is Prism essential in producing these tokens or can other JS highlighters (highlight.js etc.) produce these too?

Mayank replied to Jon

@scrwd @mia @bramus @Meyerweb you can use anything you want. the custom highlight api doesn't care

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb just trying to get my head around it - looks like the key JS requirement is creating and registering the text ranges - and to do this requires a start and an end "position" - so presumably you could "tokenise" somewhere other than the client as long as you shipped this big list of number pairs and types with each code example? Would maybe assume that with a large number of code examples it quickly becomes fewer bytes to do it all client-side though - just thinking aloud…
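Jon's idea can be sketched as follows. The sketch assumes tokenizing happens at build time or on the server, and the client only receives offsets. The payload format and the names `packSpans` and `unpackSpans` are hypothetical. Whether this beats shipping a tokenizer depends on how much code a page contains, which is exactly the trade-off raised above.

```javascript
// Packs { type, start, end } spans into a compact, JSON-friendly payload. Each
// distinct type name is stored once, and every span becomes three integers:
// [typeIndex, startDelta, length], where startDelta is relative to the previous
// span's start. Deltas stay small when spans are sorted by start.
function packSpans(spans) {
  const types = [];
  const index = new Map();
  const data = [];
  let prevStart = 0;
  for (const { type, start, end } of spans) {
    if (!index.has(type)) {
      index.set(type, types.length);
      types.push(type);
    }
    data.push(index.get(type), start - prevStart, end - start);
    prevStart = start;
  }
  return { types, data };
}

// Inverse of packSpans, run on the client before creating the Ranges.
function unpackSpans({ types, data }) {
  const spans = [];
  let start = 0;
  for (let i = 0; i < data.length; i += 3) {
    start += data[i + 1];
    spans.push({ type: types[data[i]], start, end: start + data[i + 2] });
  }
  return spans;
}
```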

Bramus replied to Mia

@mia @Meyerweb @hi_mayank Ooh, good idea. Would be possible indeed :)

Bundyo

@bramus @hi_mayank Seems to be working in Dev Edition with the about:config option turned on.

Bramus

If you want to know the details: I did a full write-up on this one: bram.us/2024/02/18/custom-high

Also comes with an extra demo that syntax highlights the code in a [contenteditable] as you type.
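The as-you-type variant boils down to discarding the old ranges and rebuilding them on every `input` event. Here is a sketch under several assumptions. `tokenize(text)` is a placeholder for whatever returns flat `{ type, start, end }` spans. Every Highlight is registered up front. The editable element holds only text; some browsers insert `<br>` or `<div>` elements on Enter, and this sketch does not handle that. The element is the only highlighted one on the page, because the sketch clears every registered Highlight. `rehighlightOnInput` is a hypothetical name. `Highlight` is set-like, so `clear()` empties it, and `HighlightRegistry` is map-like, so `values()` iterates it.

```javascript
// Rebuilds all highlight ranges for an editable element on every edit.
// Assumes the element contains only text and is the only highlighted element.
function rehighlightOnInput(
  element,
  tokenize,
  registry = CSS.highlights,
  createRange = () => new Range()
) {
  const update = () => {
    // Offsets from the previous pass no longer match the edited text: start over.
    for (const highlight of registry.values()) highlight.clear();
    element.normalize(); // merge adjacent Text nodes created while typing
    const textNode = element.firstChild;
    if (!textNode) return; // element is empty
    for (const { type, start, end } of tokenize(textNode.textContent)) {
      const highlight = registry.get(type);
      if (!highlight) continue;
      const range = createRange();
      range.setStart(textNode, start);
      range.setEnd(textNode, end);
      highlight.add(range);
    }
  };
  element.addEventListener("input", update);
  update(); // highlight the initial content
  return update;
}
```

If re-tokenizing large snippets on every keystroke proves too slow, the `input` handler could be debounced.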

Axel Rauschmayer

@bramus It’s interesting that this new approach prevents you from using many CSS features.

I’m using Highlight.js to syntax-highlight code for LaTeX (to produce PDFs from Markdown). And I had to look up sequences of CSS class names in CSS files to get color, text weight, etc. That’s what you need to do here, too.

It’s a shame there is no declarative (non-JS) version of this API.

Bramus

@rauschma The styling of these highlights works similarly to highlights that occur when selecting text.

Would be weird if layout started to jump as you select text, so it makes sense to limit what aspect of styling you can change.

Mia

@bramus This is cool. Do you have a demo, or code available somewhere to play with?

Mia

@bramus haha, on my phone it looked like a link to the spec. ok, thanks!

tbeseda

@bramus Slick! but if there's no markup it's not really a semantic document anymore, right? not standalone anyway.
To me, “overloading” the DOM with meaningful elements to denote information about a piece of text is the least concerning thing when it comes to creating elements these days.
Also, was there consideration from ECMA about using `Range`? seems like that might be more useful as a non-browser primitive.

Bramus

@tbeseda The way these highlighters typically work is by wrapping things in spans with a bunch of classes. These spans with classes add no semantics at all.

Also, sometimes - e.g. on large files with many tokens - they can cause performance issues because of the larger DOM tree.

Don't have info on how this API came to be, so don't know if and when TC39 was consulted.

tbeseda

@bramus good point about performance.
I have written a couple highlighters and one tried to be semantic about the added elements. A lot of work for minimal pay off.
I’m not opposed to the new feature, just wish there was a way without the script.
Cool to see the progress.

Chee Aun 🤔

@bramus curious to know, would this work for text inside <textarea>?

Bramus

@cheeaun Doesn't work on inputs/textareas (for some reason I haven't found yet)

Chee Aun 🤔

@bramus that's actually the exact thing that I want to implement (for some time). I also couldn't find any reasons except some comments confirming that inputs/textareas don't work 😢

Ьλ∂λ

@cheeaun @bramus I was thinking of hacking something for textarea based on github.com/kueblc/LDT

(which overlays a transparent <textarea> on top of a styled <pre>)

Not sure the highlight API has been thought out for dynamic content... I don't know if you can change the bounds of a range after it's been registered and have the output updated... We'll see 🙂

Ьλ∂λ

@bramus @cheeaun

That makes LDT redundant...

Prism is ~10 KB larger, but its parsing abilities are leaps and bounds better than what LDT offers with just regexps.

Prism also supports styling the content of script and style tags in HTML, out of the box...

Ьλ∂λ

@bramus @cheeaun FWIW, you may want to add `spellcheck="false"` on the contenteditable elements; otherwise "Bramus" is styled as a spelling mistake on click.

Somehow "const" and "EOF" are deemed acceptable.

Roma Komarov

@bramus @cheeaun Yeah, I also wanted to use them for inputs, as on paper it sounds perfect for them: no actual elements added, but the API does not allow for that (as the ranges require actual text content, not the input/textarea values).

Roma Komarov

@bramus Some random thoughts:

- Making the `<style>` contentEditable in this example is fun :)
- I wonder if it is worth it to simplify registering the tokens: when we iterate through them, we already know their types, so we could potentially register any new types as they go (though, I imagine, this might be slightly less performant with extra checks; but less code, and no need to maintain the list of the tokens to highlight).
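Roma's second point can be sketched as a get-or-create helper: the first time a token type appears, a Highlight is created and registered for it. This is an illustration of that idea only, and `highlightFor` is a hypothetical name. It costs one registry lookup per token, which is the extra check Roma mentions, in exchange for not maintaining a separate list of types.

```javascript
// Returns the Highlight registered under `type`, creating and registering it on
// first use, so no hand-maintained list of token types is needed. Types without
// a matching ::highlight() rule in the CSS simply render unstyled.
function highlightFor(
  type,
  registry = CSS.highlights,
  HighlightClass = globalThis.Highlight
) {
  let highlight = registry.get(type);
  if (!highlight) {
    highlight = new HighlightClass();
    registry.set(type, highlight);
  }
  return highlight;
}
```

Inside the loop over tokens, `highlightFor(type).add(range)` would then replace the lookup in a pre-registered set of Highlights.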

Paweł Grzybek

@bramus This demo is incredibly cool.

I see a huge value in this API. I am wondering more about the trade-off: more spans vs more tokenization logic. What if code blocks can be rendered on the server? Is it worth the effort? I have so many questions about this…

Thanks for sharing this snippet dude!

Dave Rupert

@bramus Do CSS Highlights support bold/italic? I saw those in your code but had a play and couldn't get it to work. MDN lists a stricter set of allowed properties. Is that changing?

developer.mozilla.org/en-US/do

Patrick Brosset

@davatron5000 @bramus I originally wrote this MDN page, and my recollection is that, no, `::highlight()` doesn't support anything that would have an impact on layout. It can only be used to alter the painting of the range.

Bramus

@patrickbrosset @davatron5000 Correct. Only a limited set of styles are allowed. No changing the font-weight or the like.

Dave Rupert

@bramus @patrickbrosset Aw, that's too bad. I guess I understand the reasoning tho.
