Bramus

@hi_mayank @Meyerweb I have Prism set up in manual mode, meaning it doesn't automatically kick in.

I then manually call it to tokenize the code. This gives me a bunch of numbers about which token is where and what type it is.

This info is then used to populate the Custom Highlight API.
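
The flow described here (tokenize manually, collect positions and types, feed the Custom Highlight API) could be sketched roughly as follows. This is not Bramus's actual code: `tokensToRanges` and `applyHighlights` are hypothetical helper names, and the sketch assumes the `<code>` element holds a single text node.

```javascript
// Flatten a Prism-style token stream into { type, start, end } character offsets.
// Assumption: each entry is a plain string or an object with `type` and `content`,
// where `content` is a string, a single token, or a nested array of entries.
function tokensToRanges(tokens) {
  const ranges = [];
  let offset = 0;
  const walk = (entries) => {
    for (const entry of entries) {
      if (typeof entry === "string") {
        offset += entry.length; // plain text: advance, nothing to highlight
        continue;
      }
      const start = offset;
      const content = entry.content;
      if (typeof content === "string") {
        offset += content.length;
      } else {
        walk(Array.isArray(content) ? content : [content]); // nested tokens
      }
      ranges.push({ type: entry.type, start, end: offset });
    }
  };
  walk(tokens);
  return ranges;
}

// Browser-only: add one Range per token to a Highlight named after the token type.
function applyHighlights(codeElement, ranges) {
  const textNode = codeElement.firstChild; // assumed to be the only text node
  for (const { type, start, end } of ranges) {
    const range = new Range();
    range.setStart(textNode, start);
    range.setEnd(textNode, end);
    let highlight = CSS.highlights.get(type);
    if (!highlight) {
      highlight = new Highlight();
      CSS.highlights.set(type, highlight);
    }
    highlight.add(range);
  }
}

// Possible usage in a page where Prism is loaded in manual mode:
//   const tokens = Prism.tokenize(code.textContent, Prism.languages.css);
//   applyHighlights(code, tokensToRanges(tokens));
```

Each token type then gets its colours from a matching `::highlight(<type>)` rule in CSS, with no extra `<span>` elements in the DOM.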

15 comments
Nathan Knowler

@bramus @hi_mayank @Meyerweb I think a next step towards a declarative API—one that doesn’t require JS—would be some way of teaching the browser grammars using something like PEG. Then you could set an attribute on a `<code>` element to specify which grammar you want to be used for it to auto-tokenize and highlight that code block.

Bramus

@knowler @hi_mayank @Meyerweb I don’t think that would work.

Which languages do you include? Which versions of those languages? When do these definition files get updated? Would you be able to load your own? …

Reminds me of authors requesting to put jQuery in the browser. Same questions arose.

(We actually got that last thing … not by including jQuery in browsers but by having better JavaScript/DOM APIs nowadays)

Mayank replied to Bramus

@bramus @knowler i think it could still work for the languages of the web - HTML, CSS, JS. the browser already understands these and even has syntax highlighting implemented inside devtools

Nathan Knowler replied to Bramus

@bramus @hi_mayank @Meyerweb That’s kinda why I’m suggesting supporting grammars provided by the author instead of the browser supporting a set of languages out of the box. It’d be language/dialect/version agnostic. The browser would use it to generate parsers to use internally for tokenizing blocks of text. The author could link to a grammar, give it a name, then when they want to use it for a code block just tell it to use that named grammar.

```html
<link rel=grammar type=text/peg href=/css.peg name=css-2025>
<code grammar=css-2025>
@scope { /* some code */ }
</code>
```
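
To make the idea concrete, here is a rough sketch of what a browser might do internally with an author-supplied grammar. It is not PEG: as a stand-in, the grammar is a hypothetical ordered list of `[type, pattern]` rules, and the first rule that matches at the current position wins (similar in spirit to PEG's ordered choice). `cssLikeGrammar` and `tokenize` are made-up names.

```javascript
// Hypothetical stand-in for an author-supplied grammar: ordered [type, pattern] rules.
// The `y` (sticky) flag makes each pattern match only at `lastIndex`.
const cssLikeGrammar = [
  ["comment", /\/\*[\s\S]*?\*\//y],
  ["atrule", /@[\w-]+/y],
  ["punctuation", /[{}:;]/y],
];

// Walk the text once, trying the rules in order at each position.
function tokenize(text, grammar) {
  const ranges = [];
  let pos = 0;
  while (pos < text.length) {
    let matched = false;
    for (const [type, pattern] of grammar) {
      pattern.lastIndex = pos;
      const match = pattern.exec(text);
      if (match && match[0].length > 0) {
        ranges.push({ type, start: pos, end: pos + match[0].length });
        pos += match[0].length;
        matched = true;
        break; // first matching rule wins
      }
    }
    if (!matched) pos += 1; // unrecognised character: leave it unstyled
  }
  return ranges;
}
```

The output is the same kind of `{ type, start, end }` list that a JS highlighter produces today, so the styling side (`::highlight()` rules) would not need to change.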

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb This is a very cool API, and I can definitely see value in removing the JS dependency.

I wonder if a slightly more feasible approach might be to define a syntax for declaring token ranges. Then Prism could emit these spans and styling. Something like:

```html
<code highlights="keyword 0 4 identifier 6 8 comment 12 18">
const foo = // ...
</code>
```

(Range 0-4 is a `keyword`, range 6-8 is an `identifier`, etc. All subject to `::highlight` styles of those names.)

This feels like a lot less complexity in the browser, but the downside is that each `<code>` block needs its own token ranges. I think Prism could still do this, but the advantage of @knowler's approach is using a single grammar for all `<code>` blocks in that language.
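
For illustration, parsing such an attribute would be simple. The sketch below assumes the format is exactly repeated `name start end` triples separated by whitespace; the `highlights` attribute is the proposal above, not an existing feature, and `parseHighlights` is a hypothetical name. Whether `end` is inclusive or exclusive is left open in the proposal, so the parser only checks that it is not before `start`.

```javascript
// Parse "keyword 0 4 identifier 6 8 ..." into [{ type, start, end }, ...].
function parseHighlights(value) {
  const trimmed = value.trim();
  if (trimmed === "") return [];
  const parts = trimmed.split(/\s+/);
  if (parts.length % 3 !== 0) {
    throw new SyntaxError("expected repeated '<name> <start> <end>' triples");
  }
  const ranges = [];
  for (let i = 0; i < parts.length; i += 3) {
    const start = Number(parts[i + 1]);
    const end = Number(parts[i + 2]);
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
      throw new RangeError(`invalid range for "${parts[i]}"`);
    }
    ranges.push({ type: parts[i], start, end });
  }
  return ranges;
}
```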

Nathan Knowler replied to Doug

@develwithoutacause @bramus @hi_mayank @Meyerweb Ya, that sort of API is kinda what got me thinking about using grammars. I do think that would still be a really nice low cost API though. Scales down alright, even though you’d probably need a tool to manage it (otherwise… lots of counting and attention to whitespace).

Doug Parker replied to Nathan

@knowler @bramus @hi_mayank @Meyerweb Yeah, definitely would need to be generated by a tool.

Eric A. Meyer

@bramus @hi_mayank I think I grasp what you’re saying, but is there an article or something that breaks this sort of thing down in detail?

Bramus

@Meyerweb @hi_mayank Sorry, not yet.

My typical flow is hack it together, put it on socials, and then maybe later write about it.

What could help you better understand is to console.log(tokens) right after Prism has done its thing.

MDN might also have some good info (I'm afk right now, so can't check)

Mia (web luddite) replied to Bramus

@bramus @Meyerweb @hi_mayank I'm curious too, need to do some sleuthing.

This clearly wants to be a custom element, right? No shadow dom required, and the progressive enhancement story is "just add color". A perfect use-case.
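
A minimal sketch of that custom element idea, under a few assumptions: `defineCodeHighlight` and the `code-highlight` tag name are hypothetical, `tokenize` is any function that returns `[{ type, start, end }]` for a string, and the element contains a single text node. No shadow DOM is used, and without support the text simply stays uncoloured.

```javascript
// Define a light-DOM element that highlights its own text via the Custom Highlight API.
// Returns false (and does nothing) where custom elements or CSS.highlights are missing.
function defineCodeHighlight(tokenize, tagName = "code-highlight") {
  const supported =
    typeof customElements !== "undefined" &&
    typeof CSS !== "undefined" &&
    "highlights" in CSS;
  if (!supported) return false;

  class CodeHighlight extends HTMLElement {
    connectedCallback() {
      // Assumes the definition runs after the content is parsed (e.g. a deferred script),
      // otherwise the element may not have its text child yet.
      const textNode = this.firstChild;
      if (!textNode || textNode.nodeType !== Node.TEXT_NODE) return;
      for (const { type, start, end } of tokenize(textNode.data)) {
        const range = new Range();
        range.setStart(textNode, start);
        range.setEnd(textNode, end);
        let highlight = CSS.highlights.get(type);
        if (!highlight) {
          highlight = new Highlight();
          CSS.highlights.set(type, highlight);
        }
        highlight.add(range);
      }
    }
  }

  customElements.define(tagName, CodeHighlight);
  return true;
}
```

Since the ranges point into the element's existing text node, the markup stays the same with or without JS, which is the "just add color" enhancement story.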

Mayank replied to Mia (web luddite)

@mia @bramus @Meyerweb
these docs might help you understand the tokens: prismjs.com/docs/Prism.html#.t
prismjs.com/docs/Token.html

and then you plug the token positions into ranges (and the token types and ranges into highlights): developer.mozilla.org/en-US/do

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb is Prism essential in producing these tokens or can other JS highlighters (highlight.js etc.) produce these too?

Mayank replied to Jon

@scrwd @mia @bramus @Meyerweb you can use anything you want. the custom highlight api doesn't care

Jon replied to Mayank

@hi_mayank @mia @bramus @Meyerweb just trying to get my head around it - looks like the key JS requirement is creating and registering the text ranges - and doing this requires a start and an end "position" - so presumably you could "tokenise" somewhere other than the client as long as you shipped this big list of number pairs and types with each code example? I'd assume that with a large number of code examples it very quickly becomes fewer bytes to do it all client-side though - just thinking aloud…
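
One way to explore that trade-off is to pack the pre-computed ranges compactly and measure the payload. The sketch below is only an illustration: `packRanges`, `unpackRanges`, and `payloadBytes` are hypothetical helpers, and the packed shape (one flat array of start/end pairs per token type) is an assumed format, not something any highlighter emits.

```javascript
// Build step (server): group ranges by type into flat [start, end, start, end, ...] arrays.
function packRanges(ranges) {
  const packed = {};
  for (const { type, start, end } of ranges) {
    if (!packed[type]) packed[type] = [];
    packed[type].push(start, end);
  }
  return packed;
}

// Client: expand the packed form back into { type, start, end } objects.
function unpackRanges(packed) {
  const ranges = [];
  for (const [type, pairs] of Object.entries(packed)) {
    for (let i = 0; i < pairs.length; i += 2) {
      ranges.push({ type, start: pairs[i], end: pairs[i + 1] });
    }
  }
  return ranges;
}

// Size of the JSON payload in bytes (before compression), for comparing against
// the cost of shipping a tokenizer to the client instead.
function payloadBytes(packed) {
  return new TextEncoder().encode(JSON.stringify(packed)).length;
}
```

The client would still need a few lines of JS to turn these numbers into `Range` objects and register them, so this removes the tokenizer from the page, not the script itself.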

Bramus replied to Mia (web luddite)

@mia @Meyerweb @hi_mayank Ooh, good idea. Would be possible indeed :)
