@BartWronski Wish they had less Python so that they were easier to productionise, though :(
@philpax @BartWronski I don't know if this will solve the problem for you, but here's a Rust version of Stable Diffusion: https://github.com/LaurentMazare/diffusers-rs It uses tch-rs, which uses "Py"Torch, but only the C++ part of it (no Python is involved). I've shipped binaries using tch-rs to other machines by just copying around a few `.so` files (the ones in the `/lib` of a PyTorch tarball), but not to consumers, so I can't speak to what pitfalls there might be for that.

@gregmorenz @BartWronski Yeah, really excited about that! It's a huge step forward (no more Python in the deployment stack!). Unfortunately, as you mention, it still requires you to ship Torch dependencies and/or ensure that the user has the correct version of CUDA. Server-side deployment should be a lot simpler, but client-side deployment is still problematic :(

@philpax @gregmorenz Stable Diffusion is IMO a research library/project, not a product. People are making fantastic simpler wrappers, but I still don't consider it a commercial product. Any productization will require wrapping it up properly and packaging it, like with anything. I also don't think one can hope to get any good performance without CUDA (but it can be bundled, as with other software).

@BartWronski @gregmorenz Mm, perhaps - I suppose it really depends on your definition of "product" :) It's already quite usable by end-users and the rate of development on it is out of this world, but by that same logic, it's quite hard to package it up as a library. For my purposes I'm just running a SD web UI with an API locally, but at some point I would like a library-like solution that can run on arbitrary GPUs. wonnx is the closest I've seen so far.

@philpax @BartWronski @gregmorenz So in the course of trying to do research on GPU rendering of 2D graphics, I've inadvertently done a lot of research into portable GPU infrastructure. I believe it's possible, and not a massive amount of work, to build compute infrastructure that would run workloads like SD on Metal + DX12 + Vulkan, with something like 1 MB of binary and no additional complex runtime requirements. For some reason I haven't been able to figure out, nobody really seems to care.

@philpax @BartWronski @gregmorenz So many of the pieces are in place, and I think it will happen, first by running on WebGPU, then optimizing from there. As @philpax says, wonnx looks pretty good, but being restricted to WGSL leaves a *lot* of performance on the table compared with what GPUs can do.

@raph @BartWronski @gregmorenz Yeah, there's definitely a lowest-common-denominator problem with wgpu, but I imagine it'll be "good enough" for the short to medium term. In the future, I hope that an actual standard for this kind of ML acceleration is formulated, but it's not really in Team Green's best interests to facilitate that...

@philpax @BartWronski @gregmorenz I agree. And more to the point, once you actually get it running, the open source community can incrementally optimize pieces of it until it runs pretty well. The missing piece (which seems to have very little community interest) is ahead-of-time compiled shaders, which would also let you do custom WGSL extensions.

@raph @philpax @BartWronski @gregmorenz My experience is that this work needs commercial sponsorship of a particular shape. Research framework users generally will only use something free and open source, yet adequate support for a given piece of hardware is way beyond student or hobbyist work. I started a company that built a performance-portable deep learning framework and learned this and many other lessons slowly.
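For concreteness, a minimal sketch of the deployment path gregmorenz describes: loading a TorchScript-exported model with tch-rs, shipping only libtorch's shared libraries next to the binary, with no Python at inference time. The model path, input shape, and output handling are hypothetical placeholders, not anything from diffusers-rs itself.

```rust
// Minimal sketch, assuming a model already exported from Python with
// torch.jit.trace/torch.jit.script and saved as "model.pt" (hypothetical
// path). At runtime only the libtorch shared libraries are required.
use tch::{CModule, Device, Kind, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Uses a CUDA device when libtorch finds one, otherwise the CPU.
    let device = Device::cuda_if_available();
    let model = CModule::load_on_device("model.pt", device)?;

    // Dummy input; the shape depends entirely on the exported model.
    let input = Tensor::zeros(&[1, 3, 224, 224], (Kind::Float, device));
    let output = tch::no_grad(|| model.forward_ts(&[input]))?;
    println!("{:?}", output);
    Ok(())
}
```

This is exactly the trade-off philpax points out: the binary is Python-free, but it still drags the libtorch `.so` files, and for GPU inference a matching CUDA version, along with it.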
@philpax Without Python, they would not have taken off so easily, and there would have been no ML revolution, IMO.
And IMO Python is never the bottleneck in the productization of ML models.
If you really want to deploy something, especially optimally or on lower-end devices, you'd typically need to rewrite the inference code anyway (for example, in OpenCL or CUDA).
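And for the portable route discussed upthread, as an alternative to a per-vendor OpenCL or CUDA rewrite, here is a sketch of what inference looks like through wonnx, the library philpax mentions. This assumes wonnx's `Session` API; "model.onnx" and the input name "x" are hypothetical placeholders. The ONNX graph is compiled to WGSL compute shaders and executed through wgpu, so the same binary targets Vulkan, Metal, and DX12, with no CUDA or Python anywhere in the stack.

```rust
// Minimal sketch, assuming wonnx's Session API. "model.onnx" and the
// input name "x" are hypothetical; both depend on the exported model.
use std::collections::HashMap;

async fn run() -> Result<(), Box<dyn std::error::Error>> {
    let session = wonnx::Session::from_path("model.onnx").await?;

    // Dummy input tensor, passed by name as the ONNX graph expects.
    let data = vec![0.0f32; 3 * 224 * 224];
    let mut inputs = HashMap::new();
    inputs.insert("x".to_string(), data.as_slice().into());

    let outputs = session.run(&inputs).await?;
    println!("{:?}", outputs.keys());
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // wonnx is async; pollster::block_on drives the future on this thread.
    pollster::block_on(run())
}
```

As raph notes, being limited to WGSL leaves performance on the table relative to hand-tuned CUDA, but it sidesteps the per-vendor rewrite entirely.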