Zero-Copy GPU Inference from WebAssembly on Apple Silicon

(abacusnoir.com)

86 points | by agambrahma 11 hours ago

8 comments

itamos 47 minutes ago
On one hand, exploiting shared memory to speed up inference sounds promising. On the other hand, the well-established inference engines are probably already optimized to overlap compute and communication efficiently, in which case host-device copies are unlikely to be a problem worth tackling.
fulafel 3 hours ago
> Apple Silicon changes the physics. The CPU and GPU share the same physical memory (Apple's Unified Memory Architecture) ... no bus!
Beware the reality distortion field: This is of course how it's worked on most x86 machines for a long time. And also on most Macs when they were using Intel chips.
- littlecranky67 3 hours ago
  Why did all my x86 onboard iGPUs reserve a fixed amount of RAM on boot, inaccessible to the OS? Why do dGPUs bring their own VRAM, and how do you directly manipulate it from the CPU without copying?
  - ben-schaaf 41 minutes ago
    Correct me if I'm wrong, but that reserved memory is for the framebuffer? The iBoot bootloader also reserves some memory for the framebuffer.
    dGPUs bring their own VRAM because it's a different type of memory, allowing them to get higher performance than they could with DDR. The M4 Max requires 128GB of LPDDR5X to reach its ~500GB/s bandwidth. The RX Vega 64 had that same bandwidth in 2017 with just 8GB of HBM2.
  - fulafel 3 hours ago
    To the first question: blame Windows I guess. But even on older chips, GPU code could access memory allocated on the CPU side so this didn't cap the amount of data your GPGPU code could crunch.
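A note on the bandwidth comparison in this subthread: peak memory bandwidth follows from bus width and per-pin data rate, not from capacity. The sketch below works through that arithmetic. The bus widths and data rates are assumptions taken from public spec sheets, not figures stated by any commenter, and the function name is made up for illustration.

```python
# Peak DRAM bandwidth = (bus width in bytes) x (transfers per second per pin).
# Bus widths and data rates below are ASSUMED from public spec sheets;
# they do not appear in the thread.

def peak_bandwidth_gb_s(bus_width_bits: int, gigatransfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_width_bits / 8 * gigatransfers_per_s

# Assumed: top M4 Max configuration, 512-bit LPDDR5X at 8.533 GT/s.
m4_max = peak_bandwidth_gb_s(512, 8.533)

# Assumed: RX Vega 64, 2048-bit HBM2 at 1.89 GT/s.
vega_64 = peak_bandwidth_gb_s(2048, 1.89)

print(f"M4 Max:  {m4_max:.0f} GB/s")
print(f"Vega 64: {vega_64:.0f} GB/s")
```

Under these assumptions the two land in the same ballpark: a narrow, fast LPDDR5X bus and a wide, slow HBM2 bus reach similar totals, regardless of how many gigabytes sit behind them.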
nl 3 hours ago
I'm pretty sure this is just "yes (parts of), memory control in WASM works"[1].
The whole Apple Silicon thing is (in this case) just added details that don't actually matter.
[1] https://github.com/WebAssembly/memory-control/blob/main/prop...
- eis 2 hours ago
  Apple Silicon uses unified memory where the CPU and GPU use the exact same memory and no copies from RAM to VRAM are needed. The article opens with mentioning just that and indeed it is the whole point of the article.
  - fho 46 minutes ago
    I am always a bit baffled why Apple gets credited with this. Unified memory has been a thing for decades. I can still load the biggest models on my 10th gen Intel Core CPU and the integrated GPU can run inference.
    The difference being that modern integrated GPU are just that much faster and can run inference at tolerable speeds.
    (Plus NPUs being a thing now, but that also started much earlier. The 10th gen Intel Core architecture already had instructions to deal with "AI" workloads... just very preliminary)
    - mirekrusin 11 minutes ago
      That’s shared, not unified: it’s partitioned, and the CPU and GPU copies are managed by the driver. Lunar Lake (2024) is getting closer but is still not as tightly integrated as Apple’s design and is capped at 32GB (Apple goes up to 512GB). AMD Ryzen AI Max is closer to Apple but its memory is still 3 times slower.
saagarjha 8 hours ago
I'm curious what this offers over just building the host side code to be native?
- swiftcoder 1 hour ago
  For one thing, it's a lot easier to distribute a webpage than a native app
  - saagarjha 20 minutes ago
    This doesn't work with webpages though
- jsomedon 6 hours ago
  My quick guess is that this approach offers near zero overhead for gpu to access data inside sandbox with all the security/privacy benefit of sandbox.
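To make the "near zero overhead" idea in this subthread concrete, here is a minimal sketch of the difference between handing a consumer a copy of a buffer and handing it a view onto the same storage. It is only an analogy: a Python `bytearray` stands in for a Wasm module's linear memory, and no Wasm runtime or GPU is involved. All names are hypothetical.

```python
# A bytearray stands in for a Wasm module's linear memory.
# ASSUMPTION: this is an analogy only; no Wasm runtime or GPU API is used.
WASM_PAGE = 64 * 1024  # Wasm linear memory grows in 64 KiB pages
linear_memory = bytearray(WASM_PAGE)

# Copy path: the consumer receives its own snapshot of the bytes.
copied = bytes(linear_memory[:16])

# Zero-copy path: the consumer receives a view onto the same storage.
shared = memoryview(linear_memory)[:16]

# The "guest" writes into its memory after both handles were created.
linear_memory[0] = 0x2A

print(copied[0])  # the snapshot does not see the write
print(shared[0])  # the view does
```

The view costs nothing to keep in sync, but it also means the consumer sees every later write by the guest, which is where the sandboxing questions raised elsewhere in the thread come from.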
trueno 9 hours ago
> on Apple Silicon, a WebAssembly module's linear memory can be shared directly with the GPU: no copies, no serialization, no intermediate buffers
enhance
> no copies, no serialization, no intermediate buffers
would it kill people to write their own stuff why are we doing this. out of all the things people immediately cede to AI they cede their human ability to communicate and convey/share ideas. this timeline is bonkers.
- Aurornis 8 hours ago
  I’ve become overly sensitive to it as well because it’s such a reliable indicator that there are other problems in the work.
  I’ve wasted so much time looking at interesting repos this year before discovering that one of the main claims was a hallucination, or that when I got to the specific part of the codebase it just had a big note from the LLM that it’s a placeholder until it can figure out how to do the requested thing.
  The people who have AI write their articles don’t care if it works or if it’s correct. They’re trying to get jobs and want something quick and interesting that will appeal to a lazy hiring manager. We’re just taking the bait too.
  - trueno 5 hours ago
    > The people who have AI write their articles don’t care if it works or if it’s correct.
    I'd build on this: The people who have AI write their articles very likely don't know how their thing works or whether it is correct. High chance they'll stumble when they are expected to speak about whatever it is they are presenting with some authority and demonstration of knowledge. Human to human, not being able to do that = obliterates trust. Places it somewhere near the realm of misinformation, which everyone universally has no interest in consuming.
    Good luck to people who want to fluff expertise and present as more-capable for job prospects, the world is shit and I know there's more people who need income than there are jobs that provide for our basic human needs, but this level of AI crutching is just going to bode poorly for those who think this is going to get them where they need to go.
- rvz 9 hours ago
  This sort of obvious pattern is an instant AI dead give-away that I keep on seeing in hundreds of blogs and code posted on this site:
```
   "Here is X - it makes Y"

   "That's not X, it's Y."

   "...no this, no that, no X, no Y."
```
  Another way of telling via code is by deducing the experience of the author if they became an expert of a different language since...yesterday.
  There will come a time when this becomes a problem for those who over-rely on AI and then struggle in on-site interviews with whiteboard tests.
  - bensyverson 9 hours ago
    I think the days of on-site interviews with whiteboard tests may be drawing to a close faster than you suspect
    - JSR_FDED 8 hours ago
      Huh, I’m 100% going to interview this way the next time I have to hire an engineer. I can’t think of a better way to get a sense of how a candidate reasons about things, and of their values - do they have a sense of responsibility, conscientiousness, team fit.
      All other things that could be LLM-mediated have no more signal.
      - andsoitis 7 hours ago
        > I can’t think of a better way to get a sense of how a candidate reasons about things
        Some ideas to help you: ask the candidate something underspecified and watch what they do first. Do they ask clarifying questions, make their assumptions explicit? After they answer ask what would change their mind, where does that break down? Pick a topic they know and ask them to explain it to a smart non-engineer. Make them estimate something they can’t look up (forces them to decompose, bound, and calibrate). Once they’ve proposed a solution to a question, change the constraints to see if they can adapt or whether they’re stuck.
        What you want to evaluate is dynamic reasoning, adaptability.
    - z0r 7 hours ago
      Is this implying that you don't believe people will hire programmers anymore?
    - m00dy 9 hours ago
      I also think we will never go back to good old days.
      - dylan604 6 hours ago
        It'll put the "everything old becomes new again" idea to the test.
- notepad0x90 7 hours ago
  I don't know, to me your sentiment sounds a lot like how back in the day they used to say "you can't just use a calculator all the time, use your brain and show the work on pen and paper".
  humans have been using tools to communicate since pre-history. language itself is one tool of communication invented to supersede body-language and grunting and noises. the thought and idea is theirs, it was communicated. Would it be that much different if they used a spellchecker extensively to edit their work?
  I get why you're annoyed but is it really such a big deal? random people aren't to blame for whatever other annoyances "AI slop" has created.
  - trueno 5 hours ago
    Calculators have never been the medium in which we communicate our human experience and knowledge transfer. Calculators aren't part of the social fabric or culture. Very 2D extrapolation that somehow resulted in an alleged parallel. Language is woven deeply into civilization and our histories and has been a part of our species' literal survival against the most unforgiving odds/environments. Using what is effectively a ghost writer nukes trust. You cannot ascertain anything about the person behind the blog if it's clear they used AI to write it. And without that there's no way to infer expertise, rule out hallucinations, falsehoods presented as matter of fact, and the whole broad set of things LLMs get wrong because of their limitations as a technology. I have literally nothing to go off of that would prove this person knows what they are talking about. Why would anyone want to consume that?
    Would it kill anyone at all to add a preamble that is forthcoming about using AI to write something? A chance to say these are my ideas and I've used claude to help me state it eloquently because <english is not my first language / i dont write well / claude said it better than i ever could> etc ? Not doing that, presenting as more capable/knowing than one probably is, is what destroys trust immediately the moment it's sniffed out that AI was used to write something.
    It's irresponsible, a self-nerf, and it's annoying. Venn diagram there is basically a circle. We're all familiar with how vibe coding appears to weaken your ability to write code, like skipping the gym and expecting good muscle density. All I'm saying is people shouldn't be skipping the gym for literally communicating with each other because there's gonna be a lot of times in life where you're not gonna be able to whip out chat jippity to continue a real conversation with another person. Ceding that turf means you're willingly trading your ability to deal with real life scenarios with other human beings for short term gain. It's funny how the universe tends to find balance. Yeah, being well read and expressing ideas well is a skill, it takes work.
    - porridgeraisin 2 hours ago
      It's not that deep man, it's just a blog post about some software library. There's no civilisational communication going on here, relax. This whole thing will become irrelevant in a few decades before the end of our lifespans. It's just never that deep.
      Why does it matter if it's their thought or not? If you currently care about GPU inference from WebAssembly on Apple Silicon, you can use this article. That's really about it.
      Now if you care about GPU inference from wasm on Apple Silicon, and you found problems with this article's content, then great, comment about it. If you say that the problem with the content is due to the usual surface-level slop LLMs belt out, then great, complain about LLMs. But your comment didn't say anything about GPU inference from wasm on Apple Silicon.
  - ben-schaaf 6 hours ago
    > the thought and idea is theirs, it was communicated
    Are they? I don't know how much they used AI, the entire article could be written from a one sentence prompt and so I'd argue that the thoughts and ideas are not their own.
    This isn't like using a spell checker, it's like using a ghost writer.
  - rdedev 6 hours ago
    > language itself is one tool of communication invented to supersede body-language and grunting and noises
    That's a pretty utilitarian view of language. How would it feel if everyone spoke and wrote like a PR representative? This is what an article written by an LLM is starting to sound like.
    I'm even willing to argue that the way in which you convey your ideas is as important as the idea itself. Like we could all be eating soylent for our daily nutritional requirements but we don't. The taste of the food we eat is important. It's the same with writing for me.
wmf 9 hours ago
This works in wasmtime, not browsers.
- thrill 9 hours ago
  Why would it not work in a browser?
  - m00dy 9 hours ago
    it would be hard to share the same memory location with gpu, right?
    - junon 9 hours ago
      If the browser supported it, it could expose it via a buffer view or something, but that'd be quite the security surface area, one would think.
pjmlp 5 hours ago
Goodbye WebAssembly "security".
Also, these folks should be amazed by 8 and 16 bit games development, or games consoles in general.