Can you Syntax Highlight a code snippet on the web...

Can you Syntax Highlight a code snippet on the web without overloading the DOM with a ton of `<span>` elements wrapped around the tokens?

Thanks to the Custom Highlight API, you can!

https://codepen.io/bramus/full/VwRqGVo

15 February at 23:04 | Open on front-end.social

50 comments

Bramus

As a first step you need to define the various highlight styles in your CSS using `::highlight(x)` and also register them in the registry using `CSS.highlights.set(x, new Highlight())`

(x being the types of tokens: comment, property, boolean, class-name, etc.)

JavaScript Code that registers all the highlights
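A minimal sketch of that registration step, not the demo's actual code. The token list is illustrative, and `registerHighlights` and its `createHighlight` parameter are hypothetical names introduced so the function can also run outside a browser. Each registered name still needs a matching CSS rule such as `::highlight(comment) { color: gray; }`.

```js
// Token types to support; this list is illustrative, not the demo's full list.
const TOKEN_TYPES = ['comment', 'property', 'boolean', 'class-name'];

// Registers one empty highlight per token type.
// `registry` is anything Map-like (CSS.highlights in a browser).
// `createHighlight` is a factory (() => new Highlight() in a browser).
function registerHighlights(types, registry, createHighlight) {
  for (const type of types) {
    if (!registry.has(type)) {
      registry.set(type, createHighlight());
    }
  }
  return registry;
}

// Only touch the real registry where the Custom Highlight API exists.
if (typeof CSS !== 'undefined' && CSS.highlights && typeof Highlight !== 'undefined') {
  registerHighlights(TOKEN_TYPES, CSS.highlights, () => new Highlight());
}
```

The feature check keeps the snippet harmless in browsers without support, where the code simply stays unhighlighted.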

15 February at 23:05 | Open on front-end.social

Bramus

With that in place, and after tokenizing code snippets (e.g. using @prismjs), it’s only a matter of assigning the tokens to the corresponding Highlight.

`CSS.highlights.get(token.type).add(range)`

JavaScript that tokenizes a code snippet (using Prism) and then uses the registered Highlights to apply syntax highlighting
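A rough sketch of the assignment step, under a few assumptions: the code block contains a single text node, only top-level tokens are highlighted (nested tokens are ignored), and the helper names `tokensToSpans` and `applySpans` are made up for illustration. It relies on Prism's token stream being an array of plain strings and token objects that carry `type` and `length`.

```js
// Converts a Prism-style token stream into { type, start, end } spans.
// Plain strings are unhighlighted text; token objects carry `type` and `length`.
// `end` is exclusive, matching Range offsets.
function tokensToSpans(tokens) {
  const spans = [];
  let pos = 0;
  for (const token of tokens) {
    if (typeof token !== 'string') {
      spans.push({ type: token.type, start: pos, end: pos + token.length });
    }
    pos += token.length; // strings and tokens both advance the offset
  }
  return spans;
}

// Browser-only: builds one Range per span on a single text node and adds it
// to the highlight registered under the span's type.
function applySpans(textNode, spans, registry) {
  for (const { type, start, end } of spans) {
    const highlight = registry.get(type);
    if (!highlight) continue; // no highlight registered for this type
    const range = new Range();
    range.setStart(textNode, start);
    range.setEnd(textNode, end);
    highlight.add(range);
  }
}
```

In a page this could be called as `applySpans(codeBlock.firstChild, tokensToSpans(tokens), CSS.highlights)`, where `tokens` is the result of `Prism.tokenize(...)`.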

15 February at 23:06 | Open on front-end.social

Bramus

The Custom Highlight API is supported in Chrome 105+ and Safari 17.2+. Firefox has experimental support.

15 February at 23:06 | Open on front-end.social

Mayank

@bramus oh dang, didn't realize firefox is working on it 👀

15 February at 23:17 | Open on hachyderm.io

Bramus

@hi_mayank Got the info from MDN. Not sure how up-to-date it is.

15 February at 23:57 | Open on front-end.social

Mayank

@bramus ah nvm then, i thought there was some new development. there's so much in nightly/canary/TP builds that takes a while to show up because it isn't actively being worked on or just isn't ready

(popover api has been in firefox nightly for 9 months now)

16 February at 0:10 | Open on hachyderm.io

Eric A. Meyer

@bramus @hi_mayank The Pen worked for me in Firefox Nightly, so it appears MDN is accurate on this.

Also, I don’t understand how it works, because your script and style blocks don’t have all the `span`s around highlighted stuff, but the Prism home page’s code blocks do have all those `span`s. So how Prism helps here is completely opaque.

16 February at 1:16 | Open on mastodon.social

Mayank

@Meyerweb @bramus it looks like prism is being used for tokenizing the code before creating ranges.

also damn it's an impressive demo, with the literal `<style>`/`<script>` tags being made visible and highlighted

```js
let tokens = Prism.tokenize(
  codeBlock.innerText,
  codeBlock.tagName == 'STYLE' ? Prism.languages.css : Prism.languages.javascript
);
```

16 February at 1:21 | Open on hachyderm.io

Bramus

@hi_mayank @Meyerweb I have prism set up in manual mode, meaning it doesn't automatically kick in.

I then manually call it to tokenize the code. This gives me a bunch of numbers about which token is where and what type it is.

This info is then used to populate the Custom Highlight API.

16 February at 1:28 | Open on front-end.social

Nathan Knowler

@bramus @hi_mayank @Meyerweb I think a next step towards a declarative API—one that doesn’t require JS—would be some way of teaching the browser grammars using something like PEG. Then you could set an attribute on a `<code>` element to specify which grammar you want to be used for it to auto-tokenize and highlight that code block.

16 February at 2:00 | Open on sunny.garden

Bramus

@knowler @hi_mayank @Meyerweb I don’t think that would work.

Which languages do you include? Which versions of those languages? When do these definition files get updated? Would you be able to load your own? …

Reminds me of authors requesting to put jQuery in the browser. Same questions arose.

(We actually got that last thing … not by including jQuery in browsers but by having better JavaScript/DOM APIs nowadays)

16 February at 2:26 | Open on front-end.social

Mayank replied to Bramus

@bramus @knowler i think it could still work for the languages of the web - HTML, CSS, JS. the browser already understands these and even has syntax highlighting implemented inside devtools

16 February at 2:32 | Open on hachyderm.io

Nathan Knowler replied to Bramus

@bramus @hi_mayank @Meyerweb That’s kinda why I’m suggesting supporting grammars provided by the author instead of the browser supporting a set of languages out of the box. It’d be language/dialect/version agnostic. The browser would use it to generate parsers to use internally for tokenizing blocks of text. The author could link to a grammar, give it a name, then when they want to use it for a code block just tell it to use that named grammar.

```html
<link rel=grammar type=text/peg href=/css.peg name=css-2025>
<code grammar=css-2025>
@scope { /* some code */ }
</code>
```

16 February at 2:44 | Open on sunny.garden

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb This is a very cool API, and I can definitely see value in removing the JS dependency.

I wonder if a slightly more feasible approach might be to define a syntax for declaring token ranges. Then Prism could emit these spans and styling. Something like:

```html
<code highlights="keyword 0 4 identifier 6 8 comment 12 18">
const foo = // ...
</code>
```

(Range 0-4 is a `keyword`, range 6-8 is an `identifier`, etc. All subject to `::highlight` styles of those names.)

This feels like a lot less complexity in the browser, but the downside is that each `<code>` block needs its own token ranges. I think Prism could still do this, but the advantage of @knowler's approach is using a single grammar for all `<code>` blocks in that language.
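To make the idea concrete, here is a small sketch of how such an attribute value could be parsed. The `highlights` attribute is the hypothetical syntax proposed in this post, not an existing browser feature, and `parseHighlightsAttribute` is an invented helper name. The post does not say whether the end offset is inclusive, so the sketch only extracts the numbers and leaves that interpretation open.

```js
// Parses the hypothetical `highlights` attribute value: whitespace-separated
// "<type> <start> <end>" triples, e.g. "keyword 0 4 identifier 6 8".
function parseHighlightsAttribute(value) {
  const trimmed = value.trim();
  if (trimmed === '') return [];

  const parts = trimmed.split(/\s+/);
  if (parts.length % 3 !== 0) {
    throw new Error('Expected "<type> <start> <end>" triples');
  }

  const spans = [];
  for (let i = 0; i < parts.length; i += 3) {
    const start = Number(parts[i + 1]);
    const end = Number(parts[i + 2]);
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || start > end) {
      throw new Error(`Invalid range for "${parts[i]}"`);
    }
    spans.push({ type: parts[i], start, end });
  }
  return spans;
}
```

A tool-generated attribute would keep this parsing trivial, which is the appeal of the approach compared with shipping a grammar.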

16 February at 4:28 | Open on techhub.social

Nathan Knowler replied to Doug

@develwithoutacause @bramus @hi_mayank @Meyerweb Ya, that sort of API is kinda what got me thinking about using grammars. I do think that would still be a really nice low cost API though. Scales down alright, even though you’d probably need a tool to manage it (otherwise… lots of counting and attention to whitespace).

16 February at 5:02 | Open on sunny.garden

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb Yeah, definitely would need to be generated by a tool.

16 February at 5:05 | Open on techhub.social

Eric A. Meyer

@bramus @hi_mayank I think I grasp what you’re saying, but is there an article or something that breaks this sort of thing down in detail?

16 February at 3:26 | Open on mastodon.social

Bramus

@Meyerweb @hi_mayank Sorry, not yet.

My typical flow is hack it together, put it on socials, and then maybe later write about it.

What could help you better understand is to console.log(tokens) right after Prism has done its thing.

MDN might also have some good info (I'm afk right now, so can't check)

16 February at 3:31 | Open on front-end.social

Mia (web luddite) replied to Bramus

@bramus @Meyerweb @hi_mayank I'm curious too, need to do some sleuthing.

This clearly wants to be a custom element, right? No shadow dom required, and the progressive enhancement story is "just add color". A perfect use-case.

16 February at 3:35 | Open on front-end.social

Mayank replied to Mia (web luddite)

@mia @bramus @Meyerweb
these docs might help you understand the tokens: https://prismjs.com/docs/Prism.html#.tokenize
https://prismjs.com/docs/Token.html

and then you plug the token positions into ranges (and the token types and ranges into highlights): https://developer.mozilla.org/en-US/docs/Web/API/CSS_Custom_Highlight_API#create_ranges

16 February at 3:42 | Open on hachyderm.io

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb is Prism essential in producing these tokens or can other JS highlighters (highlight.js etc.) produce these too?

17 February at 20:09 | Open on mastodon.social

Mayank replied to Jon

@scrwd @mia @bramus @Meyerweb you can use anything you want. the custom highlight api doesn't care

17 February at 20:10 | Open on hachyderm.io

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb just trying to get my head around it - looks like the key JS requirement is creating and registering the text ranges - and to do this requires a start and end "position" - so presumably you could "tokenise" somewhere other than the client as long as you shipped this big list of number pairs and types with each code example? Would maybe assume that with a large number of code examples it is very quickly fewer bytes to do it all clientside though - just thinking aloud…

17 February at 21:50 | Open on mastodon.social

Bramus replied to Mia (web luddite)

@mia @Meyerweb @hi_mayank Ooh, good idea. Would be possible indeed :)

16 February at 4:02 | Open on front-end.social

Bundyo

@bramus @hi_mayank Seems to be working in Dev Edition with the about:config option turned on.

21 February at 7:47 | Open on bundyo.com

Bramus

If you want to know the details: did a full write up on this one: https://www.bram.us/2024/02/18/custom-highlight-api-for-syntax-highlighting/

Also comes with an extra demo that syntax highlights the code in a [contenteditable] as you type.

21 February at 7:50 | Open on front-end.social

Axel Rauschmayer

@bramus It’s interesting that this new approach prevents you from using many CSS features.

I’m using Highlight.js to syntax-highlight code for LaTeX (to produce PDFs from Markdown). And I had to look up sequences of CSS class names in CSS files to get color, text weight, etc. That’s what you need to do here, too.

It’s a shame there is no declarative (non-JS) version of this API.

25 February at 10:34 | Open on fosstodon.org

Bramus

@rauschma The styling of these highlights works similarly to highlights that occur when selecting text.

Would be weird if layout started to jump as you select text, so it makes sense to limit what aspect of styling you can change.

25 February at 14:58 | Open on front-end.social

Mia (web luddite)

@bramus This is cool. Do you have a demo, or code available somewhere to play with?

15 February at 23:11 | Open on front-end.social

Bramus

@mia First post has the link ;)

15 February at 23:56 | Open on front-end.social

Mia (web luddite)

@bramus haha, on my phone it looked like a link to the spec. ok, thanks!

16 February at 0:38 | Open on front-end.social

tbeseda

@bramus Slick! but if there's no markup it's not really a semantic document anymore, right? not standalone anyway.
To me, “overloading” the DOM with meaningful elements to denote information about a piece of text is the least concerning thing when it comes to creating elements these days.
Also, was there consideration from ECMA about using `Range`? seems like that might be more useful as a non-browser primitive.

15 February at 23:58 | Open on indieweb.social

Bramus

@tbeseda The way these highlighters typically work is by wrapping things in spans with a bunch of classes. These spans with classes add no semantics at all.

Also, sometimes - e.g. on large files with many tokens - they can cause performance issues because of the larger DOM tree.

Don't have info on how this API came to be, so don't know if and when TC39 was consulted.

16 February at 0:04 | Open on front-end.social

tbeseda

@bramus good point about performance.
I have written a couple highlighters and one tried to be semantic about the added elements. A lot of work for minimal pay off.
I’m not opposed to the new feature, just wish there was a way without the script.
Cool to see the progress.

16 February at 0:09 | Open on indieweb.social

Chee Aun 🤔

@bramus curious to know, would this work for text inside `<textarea>`?

16 February at 5:23 | Open on mastodon.social

Bramus

@cheeaun Doesn't work on inputs/textareas (for some reason I haven't found yet)

16 February at 5:37 | Open on front-end.social

Chee Aun 🤔

@bramus that's actually the exact thing that I want to implement (for some time). I also couldn't find any reasons except some comments confirming that inputs/textareas don't work 😢

16 February at 5:51 | Open on mastodon.social

Ьλ∂λ

@cheeaun @bramus I was thinking of hacking something for textarea based on https://github.com/kueblc/LDT

(which overlays a transparent `<textarea>` on top of a styled `<pre>`)

Not sure the highlight API has been thought out for dynamic content... I don't know if you can change the bounds of a range after it's been registered and have the output updated... We'll see 🙂

25 February at 15:42 | Open on mamot.fr

Bramus

@pygy @cheeaun Perfectly possible.

Here's a demo that does on-the-fly highlighting: https://www.bram.us/2024/02/18/custom-highlight-api-for-syntax-highlighting/#highlighting-contenteditable
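One simple way to handle edits, sketched here as an assumption rather than as what the linked demo does, is to throw away the old ranges and rebuild them after each change. `rehighlight` and its `makeRange` parameter are hypothetical names. Note that clearing every highlight in a shared registry affects all code blocks on the page.

```js
// Clears every highlight in the registry and re-adds ranges for fresh spans.
// `registry` is Map-like (CSS.highlights in a browser); each value must offer
// clear() and add(), as Highlight objects do.
// `makeRange(start, end)` builds the range object for a span.
function rehighlight(registry, spans, makeRange) {
  for (const highlight of registry.values()) {
    highlight.clear();
  }
  for (const { type, start, end } of spans) {
    const highlight = registry.get(type);
    if (highlight) {
      highlight.add(makeRange(start, end));
    }
  }
}
```

In a page this would typically run from an `input` event listener on the editable element, after re-tokenizing its current text.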

25 February at 15:45 | Open on front-end.social

Ьλ∂λ

@bramus @cheeaun

That makes LDT redundant...

Prism is ~10Kb larger, but its parsing abilities are leaps and bounds better than what LDT offers with just regexps.

Prism also supports styling the content of script and style tags in HTML, out of the box...

25 February at 17:48 | Open on mamot.fr

Ьλ∂λ

@bramus @cheeaun FWIW, you may want to add

spellcheck="false"

On the contenteditable elements, otherwise "Bramus" is styled as a spelling mistake on click.

Somehow "const" and "EOF" are deemed acceptable.

25 February at 17:55 | Open on mamot.fr

Roma Komarov

@bramus @cheeaun Yeah, I also wanted to use them for inputs, as on paper it sounds perfect for them: no actual elements added, but the API does not allow for that (as the ranges require actual text content, not the input/textarea values).

16 February at 9:30 | Open on front-end.social

Roma Komarov

@bramus Some random thoughts:

- Making the `<style>` contentEditable in this example is fun :)
- I wonder if it is worth it to simplify registering the tokens: when we iterate through them, we already know their types, so we could potentially register any new types as they go (though, I imagine, this might be slightly less performant with extra checks; but less code, and no need to maintain the list of the tokens to highlight).
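A sketch of that lazy registration, with `getOrCreateHighlight` and `createHighlight` as hypothetical names. The `::highlight(...)` rules still have to exist in the CSS for each type; a type without a rule would be registered but not visibly styled.

```js
// Returns the highlight registered for `type`, creating and registering it
// on first use. `registry` is Map-like (CSS.highlights in a browser) and
// `createHighlight` is a factory (() => new Highlight() in a browser).
function getOrCreateHighlight(registry, type, createHighlight) {
  let highlight = registry.get(type);
  if (!highlight) {
    highlight = createHighlight();
    registry.set(type, highlight);
  }
  return highlight;
}
```

Inside the token loop, `CSS.highlights.get(token.type).add(range)` would then become `getOrCreateHighlight(CSS.highlights, token.type, () => new Highlight()).add(range)`.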

16 February at 9:59 | Open on front-end.social

Paweł Grzybek

@bramus This demo is incredibly cool.

I see a huge value in this API. I am wondering more about the trade-off: more spans vs more tokenization logic. What if code blocks can be rendered on the server? Is it worth the effort? I have so many questions about this…

Thanks for sharing this snippet dude!

16 February at 10:52 | Open on mastodon.social

Dave Rupert

@bramus Do CSS Highlights support bold/italic? I saw those in your code but had a play and couldn't get it to work. MDN lists a stricter set of allowed properties. Is that changing?

https://developer.mozilla.org/en-US/docs/Web/CSS/::highlight#allowable_properties

16 February at 15:17 | Open on mastodon.social

Patrick Brosset

@davatron5000 @bramus I originally wrote this MDN page, and my recollection is that, no, `::highlight()` doesn't support anything that would have an impact on layout. It can only be used to alter the painting of the range.

16 February at 15:35 | Open on mas.to

Bramus

@patrickbrosset @davatron5000 Correct. Only a limited set of styles are allowed. No changing the font-weight or the like.
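As an illustration of that restriction, here is a small checker that sorts declarations into those a highlight can paint and those expected to be ignored. The property list is an assumption based on the MDN "allowable properties" section linked above and should be verified there; `splitHighlightDeclarations` is an invented helper name.

```js
// Properties assumed to be honored inside ::highlight(), following the MDN
// "allowable properties" list; verify against MDN before relying on it.
const ALLOWED_HIGHLIGHT_PROPERTIES = new Set([
  'color',
  'background-color',
  'text-decoration',
  'text-shadow',
  '-webkit-text-stroke-color',
  '-webkit-text-fill-color',
  '-webkit-text-stroke-width',
]);

// Splits a { property: value } object into declarations a highlight can
// paint and declarations the browser is expected to ignore.
function splitHighlightDeclarations(declarations) {
  const applied = {};
  const ignored = {};
  for (const [property, value] of Object.entries(declarations)) {
    const target = ALLOWED_HIGHLIGHT_PROPERTIES.has(property) ? applied : ignored;
    target[property] = value;
  }
  return { applied, ignored };
}
```

Under that assumed list, `font-weight` and `font-style` land in `ignored`, which matches the bold/italic behavior described in this exchange.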

16 February at 19:11 | Open on front-end.social

Dave Rupert

@bramus @patrickbrosset Aw, that's too bad. I guess I understand the reasoning tho.

16 February at 19:37 | Open on mastodon.social

Terence Eden

@bramus that's brilliant!

18 February at 22:17 | Open on mastodon.social

Bramus

@Edent Thanks!

18 February at 22:21 | Open on front-end.social
