The big news (fresh as of this morning) is the 0.1 release of the Vello crate. We consider this alpha quality, as there are some known rough edges, but getting it out there is a big step.
There's lots more, including the fact that Xilem is now running on winit. Thanks to everyone in the community who helped make this happen!
My RustLab 2023 talk video is now posted: https://youtu.be/mmW_RbTyj8c. I give an overview of Vello, the high performance GPU 2D graphics renderer I've been working on.
A major reason to attend was to meet people; there were quite a few I've interacted with online but never met in person. And there are a few people I had intended to talk to but missed. Hopefully I'll get another opportunity before long, especially RustNL where I'll be speaking.
This one is largely planning and getting things ready (no screenshots this time). I'm very hopeful about joining forces with winit, as I think we'll get a lot more collaboration with the rest of the Rust UI ecosystem.
An update on my exploration into GPU sorting, particularly a fast algorithm that can be ported to #WebGPU. There's a lot of really cool work going on in the space. I've created a page on the Linebender wiki that has a survey of papers, resources, and implementations: https://linebender.org/wiki/gpu/sorting/
I think it's a reasonably good snapshot of the current state, and hope to update it as things progress. Also contributors welcome! Should be useful to people doing Gaussian splatting in particular.
Some of the proposals there may be controversial, but I'm hoping that stirs passionate discussion. In any case, I'm excited about what this will bring.
In related news, we're now seriously exploring joining forces with winit for window creation. You'll hear more about that when we have something to show.
@raph Cool! Currently my biggest gripe about winit is that they require you to run the render loop on the main thread on macOS/iOS (this might be a limitation of the underlying objc crates, though). I have some uses for running the render loop off the main thread!
Among other creative outlets, I am a semi-pro voice actor. Last week, I filled in at the last minute for a spot that Google Maps needed, and you can now see it here: https://www.youtube.com/shorts/s-p-BIAT-qE
One of my life ambitions is to do voice for a video game. The window for that may be closing thanks to AI, but I also know I have lots of gamedev followers here. Get in touch!
@raph Not sure that’s true. One of the things that sets Baldur’s Gate 3 apart is the absolutely stellar voice acting. It injects so much emotion into the story, and that’s just not something current-generation AI can do.
(They did a lot of motion capture, though, so maybe voice acting and “classic” acting are blending together, but I suppose that separation was more imposed by technology in the first place.)
Some extremely exciting news: Google Fonts is funding four developers from the open source community to work on Rust UI in 2024. Goals include demonstrating extremely high performance, using Vello for GPU-accelerated 2D rendering, and a port to Android.
All the work is in the open, and we look forward to working with others in the Rust UI ecosystem.
I just found out that Prof Wirth has died. He was a major inspiration for me as a kid. I eagerly read his book on Pascal, at the time not appreciating how unusual it was for its elegance and simplicity. I also followed with interest his development of the Oberon language and Lilith workstation. When I was 13, he gave a talk not too far away, I think it might have been Johns Hopkins, and my dad took me to it. It was a wonderful experience, he was very kind and encouraging, as the photo shows.
@raph It is so important that we really take time to listen to kids especially when they admire you. These memories are what produce the future professionals and leaders. Thanks for sharing!
@raph Our big assignment at uni in 1988–1989 was to write a dating app in Pascal, including personality, preferences and compatibility heuristics to suggest matches. Mine seemed to work pretty well… It was fun to write and I should have kept going with it, ready to beat Zuck by a few years!
@raph At my first student job, I got the task to implement a parser. My boss gave me Wirth's book Compilerbau and said, read the first four chapters, more than that will only confuse you. I did as he said and could write the parser.
Some 30 years later, I tried to read the other chapters, past chapter four, and was confused. In between I had studied computer science and gained a lot of professional programming experience, but for chapters five and on, I was still too stupid.
Are you interested in working on parallel sorting algorithms in #WebGPU? If so, get in touch. I'm especially interested in segmented merge sort (https://moderngpu.github.io/segsort.html), but other classic parallel sort algorithms including bitonic and radix are on the table.
I would find this fascinating to work on myself (and that still might happen), but my time is pretty packed. Even so, I'm excited about the prospect that I might be able to fund some work out of my own pocket. Let's discuss.
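Of the classic algorithms mentioned, bitonic sort is the simplest to show. Here is a minimal CPU sketch in Rust (not WGSL, and not taken from any of the linked projects), assuming a power-of-two input length. Each pass over `i` is a set of independent compare-and-swap operations, which is why the network maps naturally onto one GPU thread per element; segmented merge sort and radix sort are more involved and are not shown.

```rust
/// Sorts `data` in place with a bitonic sorting network.
/// Assumes `data.len()` is a power of two, the usual restriction of the basic network.
fn bitonic_sort(data: &mut [u32]) {
    let n = data.len();
    if n <= 1 {
        return;
    }
    assert!(n.is_power_of_two(), "length must be a power of two");
    // k is the size of the bitonic sequences being merged in this stage.
    let mut k = 2;
    while k <= n {
        // j is the distance between compared elements within this stage.
        let mut j = k / 2;
        while j > 0 {
            // Every iteration of this loop is independent: on a GPU, one thread each.
            for i in 0..n {
                let partner = i ^ j;
                if partner > i {
                    // Direction alternates between blocks of size k.
                    let ascending = (i & k) == 0;
                    if (data[i] > data[partner]) == ascending {
                        data.swap(i, partner);
                    }
                }
            }
            j /= 2;
        }
        k *= 2;
    }
}

fn main() {
    let mut v = vec![7, 3, 9, 1, 8, 2, 6, 4];
    bitonic_sort(&mut v);
    println!("{:?}", v); // prints [1, 2, 3, 4, 6, 7, 8, 9]
}
```

The number of compare-and-swap passes is fixed by the input size alone, independent of the data, which makes the network easy to schedule on a GPU but does more total work than radix or merge sort on large inputs.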
@raph it would be awesome to have a bunch of mostly adaptable demos of key algorithms.
Matrix multiply seems important.
And performance tuning across devices.
There is a nifty little project that does a few things like this for Vulkan compute.
I got a bunch of responses to this, some private. Thanks for those!
I also did a bit of an exploration myself, including a bit of a fusion of FidelityFX sort and Onesweep ported to WebGPU, with the warp-local multi-split adapted to use shared memory instead of subgroups (warp operations). Details are in this Zulip thread: https://xi.zulipchat.com/#narrow/stream/197075-gpu/topic/Sorting.20revisited
This is a preliminary investigation, which shows that performant WebGPU sorting is likely feasible. I hope it gets that conversation started.
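The core of an LSD radix pass, which both FidelityFX sort and Onesweep build on, is a histogram of digit values, an exclusive prefix sum over it, and a stable scatter. Below is a serial CPU sketch of that in Rust, assuming 4-bit digits (a choice made here for illustration). It leaves out everything that makes the GPU versions fast (workgroup-local ranking, the shared-memory multi-split described above, Onesweep's decoupled lookback), so treat it as a reference for what each pass computes, not how.

```rust
/// Bits consumed per pass (4 bits -> 16 buckets); an illustrative choice.
const RADIX_BITS: u32 = 4;
const BUCKETS: usize = 1 << RADIX_BITS;

/// One stable LSD radix pass over `keys`, keyed on the digit starting at bit `shift`.
fn radix_pass(keys: &[u32], shift: u32) -> Vec<u32> {
    let digit = |k: u32| ((k >> shift) as usize) & (BUCKETS - 1);

    // 1. Histogram: how many keys fall into each bucket.
    let mut hist = [0usize; BUCKETS];
    for &k in keys {
        hist[digit(k)] += 1;
    }

    // 2. Exclusive prefix sum: offsets[d] = number of keys with a smaller digit.
    let mut offsets = [0usize; BUCKETS];
    let mut sum = 0;
    for d in 0..BUCKETS {
        offsets[d] = sum;
        sum += hist[d];
    }

    // 3. Stable scatter: keys with equal digits keep their relative order.
    let mut out = vec![0u32; keys.len()];
    for &k in keys {
        let d = digit(k);
        out[offsets[d]] = k;
        offsets[d] += 1;
    }
    out
}

/// Full sort of 32-bit keys: eight 4-bit passes, least significant digit first.
fn radix_sort(keys: &[u32]) -> Vec<u32> {
    let mut cur = keys.to_vec();
    let mut shift = 0;
    while shift < 32 {
        cur = radix_pass(&cur, shift);
        shift += RADIX_BITS;
    }
    cur
}

fn main() {
    let sorted = radix_sort(&[170, 45, 75, 90, 802, 24, 2, 66]);
    println!("{:?}", sorted); // prints [2, 24, 45, 66, 75, 90, 170, 802]
}
```

Stability of each pass is what makes the LSD approach correct: a later pass on a higher digit must not reorder keys that earlier passes already ordered by their lower digits.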
@raph I’ve been working a bit on similar things, have a radix sort working, and have most things set for a merge sort (was planning bitonic for raked workgroups, then merge for the rest, but I’m seeing comments suggesting that bitonic approaches work for the whole thing)
A few years ago, I went on a listening kick of Jacob Collier songs. I found them technically amazing, but emotionally... a bit empty.
Then I learned he has a new song out, Little Blue. The official version is good, but the acoustic version with a live choir (https://www.youtube.com/watch?v=IQvzX0Z3HE4) just opened me up - I burst out in tears in the middle of it, I'm still not sure exactly why.
I'm on my way to Florence for #RustLab, where I'll be talking about Vello, high performance 2D graphics rendering in Rust. I'm excited about the conference, looking forward to meeting people there. Then I'll enjoy a week of vacation.
I've been on the quiet side, but Vello has seen significant progress in the past few months. GPU-side stroke expansion, with the full set of styles and correct handling of transforms, is in flight and close to landing. The near future looks bright.
I just merged multisampled path rendering into Vello main. That's been a journey - I developed the original prototype in a week back in February, in a lakefront cabin.
There are bugs and more performance work, but I think I have a handle on that. The last few weeks have been mostly numerical robustness, which is pretty grueling. Next up is GPU-side stroke expansion, based on Euler spirals, which I'm hoping is more fun.
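To make "based on Euler spirals" a bit more concrete: an Euler spiral is a curve whose curvature changes linearly with arc length, so its tangent angle is a quadratic in arc length. The sketch below evaluates a point on such a segment by numerically integrating the tangent direction with Simpson's rule. It is illustrative only; the `EulerSegment` type and the integration method are assumptions made here, not Vello's or kurbo's implementation.

```rust
/// Hypothetical Euler spiral segment starting at the origin with heading `theta0`.
/// Curvature varies linearly with arc length: kappa(s) = k0 + k1 * s.
struct EulerSegment {
    theta0: f64,
    k0: f64,
    k1: f64,
}

impl EulerSegment {
    /// Tangent angle at arc length s (the integral of curvature).
    fn theta(&self, s: f64) -> f64 {
        self.theta0 + self.k0 * s + 0.5 * self.k1 * s * s
    }

    /// Point at arc length s: integrate (cos theta, sin theta) with composite
    /// Simpson's rule using `n` subintervals (`n` must be even).
    fn point(&self, s: f64, n: usize) -> (f64, f64) {
        assert!(n >= 2 && n % 2 == 0, "n must be even and at least 2");
        let h = s / n as f64;
        let (mut x, mut y) = (0.0, 0.0);
        for i in 0..=n {
            // Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1.
            let w = if i == 0 || i == n {
                1.0
            } else if i % 2 == 1 {
                4.0
            } else {
                2.0
            };
            let th = self.theta(i as f64 * h);
            x += w * th.cos();
            y += w * th.sin();
        }
        (x * h / 3.0, y * h / 3.0)
    }
}

fn main() {
    // Constant curvature 1 traces a unit circle; after s = pi the point is near (0, 2).
    let circle = EulerSegment { theta0: 0.0, k0: 1.0, k1: 0.0 };
    let (x, y) = circle.point(std::f64::consts::PI, 64);
    println!("({x:.6}, {y:.6})");

    // A true Euler spiral segment: curvature ramps from 0 to 1 over unit length.
    let spiral = EulerSegment { theta0: 0.0, k0: 0.0, k1: 1.0 };
    println!("{:?}", spiral.point(1.0, 64));
}
```

Setting `k1 = 0` reduces the segment to a circular arc and `k0 = k1 = 0` to a straight line, which makes the family convenient for approximating both curves and their offsets.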
@raph Glad to hear the numerical robustness is nearing completion! A monumental hurdle. By GPU-side stroke expansion, can I expect shader support for more line-cap types? ;)
Whew, I just sent a PR to fix the numerical robustness issues in stroke expansion in kurbo (https://github.com/linebender/kurbo/pull/304). That was a long journey, but better late than never.
I enjoy the challenge of making code numerically robust, but it is hard work, and I was running low on motivation near the end. It feels good sometimes to just keep pushing through.
I've been thinking quite a bit about a "Good Parallel Computer" which would overcome the worst limitations of GPUs, which I find increasingly frustrating. Basically, instead of launching compute shaders in an (x, y, z) cube, you have a programmable controller which launches workgroups (multiple kinds) when inputs are ready, enabling queues and other things.
I know of the GRAMPS paper, Vortex, and Tenstorrent. What other things are out there? Who should I be talking to?
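To make the "launch workgroups when inputs are ready" idea concrete, here is a toy CPU simulation in Rust. It is not modeled on GRAMPS, Vortex, Tenstorrent, or any real hardware; the `Dispatch` type and `schedule` function are hypothetical. The controller keeps a count of unfinished producers for each dispatch and launches a dispatch once that count reaches zero, which amounts to Kahn's topological sort over the dependency graph.

```rust
use std::collections::VecDeque;

/// Hypothetical unit of work (one kind of workgroup) plus the indices of the
/// dispatches that consume its output.
struct Dispatch {
    name: &'static str,
    consumers: Vec<usize>,
}

/// Launches each dispatch as soon as all of its producers have completed,
/// instead of launching everything as one fixed (x, y, z) grid.
/// Returns the launch order.
fn schedule(dispatches: &[Dispatch]) -> Vec<&'static str> {
    // How many producers each dispatch is still waiting on.
    let mut pending = vec![0usize; dispatches.len()];
    for d in dispatches {
        for &c in &d.consumers {
            pending[c] += 1;
        }
    }

    // Dispatches with no inputs are ready immediately.
    let mut ready: VecDeque<usize> =
        (0..dispatches.len()).filter(|&i| pending[i] == 0).collect();

    let mut order = Vec::new();
    while let Some(i) = ready.pop_front() {
        order.push(dispatches[i].name);
        // Treat dispatch i as complete: its output is now available to consumers.
        for &c in &dispatches[i].consumers {
            pending[c] -= 1;
            if pending[c] == 0 {
                ready.push_back(c);
            }
        }
    }
    order
}

fn main() {
    // Two producers feed a combine step, which feeds a final step.
    let graph = vec![
        Dispatch { name: "produce_a", consumers: vec![2] },
        Dispatch { name: "produce_b", consumers: vec![2] },
        Dispatch { name: "combine", consumers: vec![3] },
        Dispatch { name: "finish", consumers: vec![] },
    ];
    println!("{:?}", schedule(&graph)); // prints ["produce_a", "produce_b", "combine", "finish"]
}
```

A real controller would also stream many items through each queue and respect resource limits; this only shows the readiness rule.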
@raph How close to theoretical peak do you want to get? What shape of problem are you trying to optimize for?
Have you heard of Epic's Verse language project?
A little update, as I haven't been posting much. For work, I've been plugging away at numerical robustness of my multisampled GPU path renderer. That's been an arduous journey, but I think I'm about at the end of it, and I'll have a writeup of the techniques (inspired by hyperreal numbers) soonish.
For play, I've been noodling at getting DVI output from an RP2040 in Rust. That's coming along well. I'm interested in commissioning or collaborating to make tilemap based pixel art.
@raph Did you have to overclock the RP2040 or use a dedicated chip for DVI? I'd love to replace the VGA cable on my RP2040 with an HDMI cable at some point.
Lazy Mastodon: is it valid to output an HDMI audio packet every other scan line? I'm looking at https://github.com/Wren6991/PicoDVI/pull/45 and it seems to be at variance with the letter of the spec in a few ways, but I'm not sure whether it's a real problem or just a technicality. I'm sure most implementations these days have fairly deep buffers for sample rate conversion, etc.
This is an extremely niche question. If you're also interested in driving HDMI from RP2040, maybe we should talk.
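A back-of-envelope bandwidth check for the "every other scan line" scheme, written as a small Rust function. All numbers are assumptions, not taken from the PR: 640x480@60 with the standard 800-pixel total line width and a 25.175 MHz pixel clock, 48 kHz stereo audio, and at most four 2-channel samples per HDMI Audio Sample Packet. The `samples_per_packet` helper is hypothetical.

```rust
/// Average number of audio samples each packet must carry if one packet is
/// sent every `lines_per_packet` scan lines.
fn samples_per_packet(pixel_clock_hz: f64, h_total: f64, lines_per_packet: f64, sample_rate_hz: f64) -> f64 {
    let line_rate = pixel_clock_hz / h_total; // scan lines per second
    let packet_rate = line_rate / lines_per_packet; // audio packets per second
    sample_rate_hz / packet_rate
}

fn main() {
    // Assumed: 25.175 MHz pixel clock, 800 pixels per line, one packet per 2 lines, 48 kHz.
    let avg = samples_per_packet(25.175e6, 800.0, 2.0, 48_000.0);
    println!("average samples per packet: {avg:.3}"); // prints "average samples per packet: 3.051"
    // Assumed limit: up to four 2-channel samples per Audio Sample Packet.
    assert!(avg <= 4.0, "would need more than 4 samples per packet");
}
```

Under these assumptions the average fits, with packets alternating between 3 and 4 samples. This only addresses raw bandwidth; it says nothing about whether the spec's packet placement rules permit the scheme, which is the actual question.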
As I discuss in the post, in the long term I'm hopeful for a future with robust GPU compute infrastructure, based on well-documented standards and rigorous testing so that we can be confident that GPU hardware will actually run our code (including sophisticated lock-free algorithms). We have a long way to go, and I think #WGSL is a better path toward that future than proprietary GPU languages.
@raph If only there was some kind of standard, like a royalty-free high level syntax independent IR that could be used on all vendors without being tied to a specific hardware or software vendor ...
@raph The Metal approach to "standard" features reminds me of the first OSX versions, where pthread.h was mostly made up of `#define standard_pthread_api() (void)`