Darkbloom – Private inference on idle Macs

(darkbloom.dev)

132 points | by twapi 2 hours ago

24 comments

tgma 1 hour ago
I installed this so you don't have to. It did feel a bit quirky and not super polished. Fails to download the image model. The audio/tts model fails to load.
In 15 minutes of serving Gemma, I got precisely zero actual inference requests, and a bunch of health checks and two attestations.
At the moment they don't have enough sustained demand to justify the earning estimates.
- thatxliner 12 minutes ago
  and I don't think they ever will unless they're highly competitive (hopefully that price they have stays? at least for users)
  I was thinking of building this exact thing a year ago but my main stopper was economics: it would never make sense for someone to use the API, thus nobody can make money off of zero demand.
  I guess we just have to look at how Uber and Airbnb bootstrapped themselves. Another issue with my original idea was that it was for compute in general, when the main, best use-case, is long(er)-running software like AI training (but I guess inference is long running enough).
  But software that lets you rent out your GPU already exists, so...
  - starkeeper 3 minutes ago
    What's a good place to do this?
  - tgma 9 minutes ago
    People underestimate how efficient cost/token is for beefy GPUs if you are able to batch.
kennywinker 2 hours ago
I have a hard time believing their numbers. If you can pay off a mac mini in 2-4 months, and make $1-2k profit every month after that, why wouldn’t their business model just be buying mac minis?
- chaoz_ 1 hour ago
  Solid q. I think part of it is that it’s really easy to attract some “mass” (capital) of users, as there are definitely quite a few idle Macs in the world.
  Non-VC play (not required until you can raise on your own terms!) and clear differentiation.
  If you want to go full-business-evaluation, I would be more worried about someone else implementing the same thing with more commission (imo 95% and first to market is good enough).
- znnajdla 55 minutes ago
  The numbers are obviously high, because if this takes off then the price for inference will also drop. But I still think it’s a solid economic model that benefits low income countries the most. In Ukraine, for example, I know people who live on $200/month. A couple Mac Minis could feed a family in many places.
  As a business owner, I can think of multiple reasons why a decentralized network is better for me as a business than relying on a hyperscaler inference provider. 1. No dependency on a BigTech provider who can cut me off or change prices at any time. I’m willing to pay a premium for that. 2. I get a residential IP proxy network built-in. AI scrapers pay big money for that. 3. No censorship. 4. Lower latency if inference nodes are located close to me.
  - kennywinker 41 minutes ago
    How many of those people who could live off $200USD/month can afford or already have a mac mini in the house?
- dnnddidiej 33 minutes ago
  It is too good to be true. When you see it is making more than a claude code subscription for fuck all work per day.
  Prolly gonna make $50 a year tops.
- thih9 1 hour ago
  > These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.
  Others are reporting low demand, e.g.: https://news.ycombinator.com/item?id=47789171
- gleenn 1 hour ago
  Power and racking are difficult and expensive?
  - kennywinker 1 hour ago
    How difficult? Is running 1000 minis worth $1,000,000/month of effort? I feel like it is.
    - runako 1 hour ago
      There are many people who do not have ready access to a million dollars to purchase said Mac minis, much less the operating capital to rack & operate them.
      Very smart play to build a platform, get scale, and prove out the software. Then either add a small network fee (this could be on money movement on/off platform), add a higher tier of service for money, and/or just use the proof points to go get access to capital and become an operator in your own pool.
      - nxpnsv 1 hour ago
        If those numbers are true, they could start with one Mac and double every few months. But, I guess there are also many people who do not have ready access to whatever a Mac mini costs either...
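        A rough sketch of that compounding, assuming (hypothetically) that every Mac pays for one more Mac after each payback period and that all earnings are reinvested. The function name and the discrete-doubling model are illustrative simplifications, not anything from the site:

```python
def fleet_size_after(months: int, payback_months: int, start: int = 1) -> int:
    """Number of Macs owned if the fleet doubles once per payback period.

    Assumption: each Mac earns exactly its own purchase price every
    `payback_months`, and that money immediately buys another Mac.
    """
    doublings = months // payback_months  # completed payback periods
    return start * 2 ** doublings


# The 2-4 month payback range mentioned upthread, over one year.
for payback in (2, 3, 4):
    print(payback, "month payback ->", fleet_size_after(12, payback), "Macs after a year")
```

        Under those assumptions one Mac becomes 8 to 64 Macs within a year, which is why the claimed payback period looks so suspicious.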
    - ffsm8 1 hour ago
      And at that scale (1k) it ain't even that hard, a single room could be enough to haphazardly drop them on shelves with a big fan to draw out the heat
- foota 2 hours ago
  Capital and availability?
  - kennywinker 1 hour ago
    I guess if it only works at scale capital is maybe the answer. Like enough cash to buy 5 or 10 or even 100 minis seems doable - but if the idea only works well when you have 10,000 running - that makes some sense.
nl 2 hours ago
They use the TEE to check that the model and code are untampered with. That's a good, valid approach and should work (I've done similar things on AWS with their TEE)
The key question here is how they avoid the outside computer being able to view the memory of the internal process:
> An in-process inference design that embeds the inference engine directly in a hardened process, eliminating all inter-process communication channels that could be observed, with optional hypervisor memory isolation that extends protection from software-enforced to hardware-enforced via ARM Stage 2 page tables at zero performance cost.[1]
I was under the impression this wasn't possible if you are using the GPU. I could be misled on this though.
[1] https://github.com/Layr-Labs/d-inference/blob/master/papers/...
- flockonus 1 hour ago
  While they do make this argument, realistically anyone sending their prompt/data to an external server should assume there will be some level of retention.
  And more so in particular, anyone using Darkbloom with commercial intent should only really send non-sensitive data (no tokens, customer data, ...). I'd say only classification tasks, image generation, etc.
- ramoz 1 hour ago
  Macs do not have an accessible hardware TEE.
  Macs have secure enclaves.
  - nl 1 hour ago
    Good point!
    But they argue that:
    > PT_DENY_ATTACH (ptrace constant 31): Invoked at process startup before any sensitive data is loaded. Instructs the macOS kernel to permanently deny all ptrace requests against this process, including from root. This blocks lldb, dtrace, and Instruments.
    > Hardened Runtime: The binary is code-signed with hardened runtime options and explicitly without the com.apple.security.get-task-allow entitlement. The kernel denies task_for_pid() and mach_vm_read() from any external process.
    > System Integrity Protection (SIP): Enforces both of the above at the kernel level. With SIP enabled, root cannot circumvent Hardened Runtime protections, load unsigned kernel extensions, or modify protected system binaries. Section 5.1 proves that SIP, once verified, is immutable for the process lifetime.
    gives them memory protection.
    To me that is surprising.
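    For anyone who hasn't seen it, the first of those mechanisms is a single call. A minimal sketch via ctypes, purely illustrative and not Darkbloom's actual code: the function name `deny_debugger_attach` is made up here, and on anything other than macOS it deliberately does nothing.

```python
import ctypes
import ctypes.util
import sys

PT_DENY_ATTACH = 31  # ptrace request number, as quoted from the paper


def deny_debugger_attach() -> bool:
    """Ask the macOS kernel to refuse later ptrace attaches to this process.

    Returns True if the request succeeded, and False on non-macOS systems
    or if the call reported an error.
    """
    if sys.platform != "darwin":
        return False  # PT_DENY_ATTACH is macOS-specific; do nothing elsewhere
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    # C prototype: int ptrace(int request, pid_t pid, caddr_t addr, int data)
    return libc.ptrace(PT_DENY_ATTACH, 0, None, 0) == 0


if __name__ == "__main__":
    print("attach denied:", deny_debugger_attach())
```

    Note that this is a request to the running kernel, so it is only as trustworthy as the kernel and the SIP state it is running under.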
    - dinobones 1 hour ago
      Couldn't someone just uhh... patch their macOS/kernel, mock these things out, then behold, you can now access all the data?
      If it's not running fully end to end in some secure enclave, then it's always just a best effort thing. Good marketing though.
    - ramoz 1 hour ago
      I'm not arguing anything. This is how it works. There is no but.
      Protection here is conditional, best-effort. There are no true guarantees, nor actual verifiability.
pants2 1 hour ago
Cool idea. Just some back-of-the-envelope math here (not trusting what's on their site):
My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B. Darkbloom's pricing is $0.20 per Mtok output.
That's about $2.24/day or $67/mo revenue if it's fully utilized 24/7.
Now assuming 50W sustained load, that's about 36 kWh/mo, at ~$.25/kWh approx. $9/mo in costs.
Could be good for lunch money every once in a while! Around $700/yr.
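For anyone who wants to plug in their own numbers, a minimal sketch of the arithmetic above. It assumes a 30-day month and 100% utilization unless told otherwise; the function names are just illustrative:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: a flat 30-day month


def monthly_revenue_usd(tokens_per_second: float,
                        price_per_mtok_usd: float,
                        utilization: float = 1.0) -> float:
    """Revenue per month for a given output rate and price per million tokens."""
    tokens_per_day = tokens_per_second * SECONDS_PER_DAY * utilization
    return tokens_per_day / 1_000_000 * price_per_mtok_usd * DAYS_PER_MONTH


def monthly_power_cost_usd(watts: float, price_per_kwh_usd: float) -> float:
    """Electricity cost per month for a constant load in watts."""
    kwh_per_month = watts / 1000 * 24 * DAYS_PER_MONTH
    return kwh_per_month * price_per_kwh_usd


revenue = monthly_revenue_usd(130, 0.20)   # 130 tok/s at $0.20 per Mtok
power = monthly_power_cost_usd(50, 0.25)   # 50 W at $0.25 per kWh
net_per_year = (revenue - power) * 12

print(f"revenue ${revenue:.2f}/mo, power ${power:.2f}/mo, net ${net_per_year:.0f}/yr")
```

Passing a lower `utilization` shows how quickly the figure shrinks if the machine is not fully loaded around the clock.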
- kennywinker 1 hour ago
  Their example big earner models are FLUX.2 Klein 4B and FLUX.2 Klein 9B, which i imagine could generate a lot more tokens/s than a 26B model on your machine.
  For Gemma 4 26B their math is:
  single_tok/s = (307 GB/s / 4 GB) * 0.60 = 46.0 tok/s
  batched_tok/s = 46.0 * 10 * 0.9 = 414.4 tok/s
  tok/hr = 414.4 * 3600 = 1,492,020
  revenue/hr = (1,492,020 / 1M) * $0.200000 = $0.2984
  I have no idea if that is a good estimate of how much an M5 Pro can generate - but that’s what it says on their site.
  They do a bit of a sneaky thing with power calculation: they subtract 12 W of idle power, because they are assuming your machine is idling 24/7, so the only cost is the extra 18W they estimate you’ll use doing inference. Idk about you, but i do turn my machine off when i am not using it.
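  The site's formula can be reproduced in a few lines. This is a sketch only: the site gives just the numbers, so the parameter names (`active_weights_gb`, `efficiency`, `batch_size`, `batch_efficiency`) are my guesses at what 4 GB, 0.60, 10 and 0.9 stand for.

```python
def site_estimate(bandwidth_gb_s: float,
                  active_weights_gb: float,
                  efficiency: float = 0.60,
                  batch_size: int = 10,
                  batch_efficiency: float = 0.9,
                  price_per_mtok_usd: float = 0.20):
    """Recompute the quoted estimate step by step, without rounding in between."""
    single_tok_s = bandwidth_gb_s / active_weights_gb * efficiency
    batched_tok_s = single_tok_s * batch_size * batch_efficiency
    tokens_per_hour = batched_tok_s * 3600
    revenue_per_hour = tokens_per_hour / 1_000_000 * price_per_mtok_usd
    return single_tok_s, batched_tok_s, tokens_per_hour, revenue_per_hour


single, batched, per_hour, usd_per_hour = site_estimate(307, 4)
print(f"{single:.2f} tok/s single, {batched:.2f} tok/s batched, "
      f"{per_hour:,.0f} tok/hr, ${usd_per_hour:.4f}/hr")
```

  The quoted lines show rounded intermediates (46.0 and 414.4), but the 1,492,020 tok/hr figure only works out with the unrounded 46.05 and 414.45.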
- mavamaarten 1 hour ago
  Well. Running your machine to do inference will utilize more than 50W sustained load, I'd say more than double that. Plus electricity is more expensive here (but granted, I do have solar panels). Plus don't forget to factor in that your hardware will age faster.
  I'd say it's not worth it. But the idea is cool.
  - kennywinker 1 hour ago
    Their estimate is based on significantly lower consumption when under load. E.g. 25W for an M4 Pro mac mini. I have no idea if that’s realistic - but the m4s are supposedly pretty efficient (https://www.jeffgeerling.com/blog/2024/m4-mac-minis-efficien...)
- nnx 45 minutes ago
  > My M5 Pro can generate 130 tok/s (4 streams) on Gemma 4 26B.
  This seems high. At which quantization? Using LM Studio or something else?
  Note: Darkbloom seems to run everything on Q8 MLX.
- todotask2 1 hour ago
  OpenAI has only about 5% paying customers, how does it generate revenue?
  I don’t think this is a sustainable business model. For example, Cubbit tried to build decentralised storage, but I backed out because better alternatives now exist, and hardware continues to improve and become cheaper over time.
  Your electricity and hardware are going to get a lower return, and it does not actually reduce CO2.
- znnajdla 1 hour ago
  Maybe lunch money for you, but there are people in some parts of the world who live on $200/month. Like Ukraine.
  - sethherr 1 hour ago
    But they probably don’t have M5 MacBook pros idling
    - tonyedgecombe 18 minutes ago
      Or reliable energy or internet.
- xendo 1 hour ago
  Any idea what makes for such a diff between your numbers and theirs? Batching? Or could they be doing some crazy prefix caching across all nodes to reduce the actual processing?
- chaoz_ 1 hour ago
  Genuinely curious, is there any way to estimate amortization of Mac?
  I’d imagine 1 year of heavy usage would somehow affect its quality.
  - pants2 1 hour ago
    Yeah, only way to get there is assuming they're not giving prompt caching discounts while my laptop is getting prompt caching benefits, with very many large prompts. So yes I am skeptical of their numbers.
- MrDrMcCoy 1 hour ago
  Don't forget to factor in cooling costs.
  - pants2 1 hour ago
    Or saved heating costs in the winter!
ramoz 1 hour ago
Unfortunately, verifiable privacy is not physically possible on MacBooks of today. Don't let a nice presentation fool you.
Apple Silicon has a Secure Enclave, but not a public SGX/TDX/SEV-style enclave for arbitrary code, so these claims are about OS hardening, not verifiable confidential execution.
It would be nice if it were possible. There's a lot of cool innovations possible beyond privacy.
- znnajdla 52 minutes ago
  As if you get privacy with the inference providers available today? I have more trust in a randomly selected machine on a decentralized network not being compromised than in a centralized provider like OpenAI pinky promising not to read your chats.
  - ramoz 49 minutes ago
    Inference providers don't claim private inference. However, they must uphold certain security and legal compliances.
    You have no guarantees over any random laptop connected across the world.
- geon 1 hour ago
  Every hardware key will be broken if there is enough incentive to do so. Their claims read like pure hubris.
  - znnajdla 51 minutes ago
    Who cares about AI privacy? Most people don’t. If you do, run locally.
amdivia 9 minutes ago
Until we have breakthroughs in homomorphic encryption compute, I won't trust such privacy claims
0xbadcafebee 37 minutes ago
I'm not sure how the economics works out. Pricing for AI inference is based on supply/demand/scarcity. If your hardware is scarce, that means low supply; combine with high demand, it's now valuable. But what happens if you enable every spare Mac on the planet to join the game? Now your supply is high, which means now it's less valuable. So if this becomes really popular, you don't make much money. But if it doesn't become somewhat popular, you don't get any requests, and don't make money. The only way they could ensure a good return would be to first make it popular, then artificially lower the number of hosts.
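A toy version of that squeeze, assuming (purely for illustration) that total demand and price are fixed and that requests are spread evenly across hosts. The demand figure below is hypothetical:

```python
def monthly_earnings_per_host(demand_mtok_per_month: float,
                              price_per_mtok_usd: float,
                              hosts: int) -> float:
    """Evenly split a fixed pool of paid tokens across all hosts."""
    return demand_mtok_per_month * price_per_mtok_usd / hosts


DEMAND_MTOK = 1_000_000  # hypothetical network-wide demand, in millions of tokens
PRICE = 0.20             # $ per Mtok, the Gemma price quoted elsewhere in the thread

for hosts in (1_000, 10_000, 100_000):
    print(hosts, "hosts ->", monthly_earnings_per_host(DEMAND_MTOK, PRICE, hosts), "$/host/month")
```

With demand held constant, every tenfold increase in hosts cuts per-host earnings tenfold, and in practice the price would probably fall as well.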
utkarsh_apoorva 29 minutes ago
Like the concept. This is not a business - should be an open source GitHub repo maybe.
They lost me with just one microcopy - “start earning”. Huge red signal.
- Hamuko 13 minutes ago
  But why would I donate my Mac Studio's idle time if I couldn't "start earning"?
TuringNYC 2 hours ago
I'd love a way to do this locally -- pool all the PCs in our own office for in-office pools of compute. Any suggestions from anyone? We currently run ollama but manually manage the pools
- damezumari 1 hour ago
  https://github.com/exo-explore/exo
stuxnet79 1 hour ago
So basically ... Pied Piper.
- JaggerJo 42 minutes ago
  finally!
pants2 1 hour ago
You might not even know it as a user but the payment/distribution here is all built on crypto+stablecoins. This is a great use case for it.
- rvz 1 hour ago
  Good. Another great non-speculative use-case for crypto and stablecoins.
  - kennywinker 1 hour ago
    Amazing! Let me see, doing the math r/n… carry the one, yup that makes the total number of non-speculative uses for crypto and stablecoin: 1
    ;P
dr_kiszonka 1 hour ago
"These are estimates only. We do not guarantee any specific utilization or earnings. Actual earnings depend on network demand, model popularity, your provider reputation score, and how many other providers are serving the same model.
When your Mac is idle (no inference requests), it consumes minimal power — you don't lose significant money waiting for requests. The electricity costs shown only apply during active inference.
Text models typically see the highest and most consistent demand. Image generation and transcription requests are bursty — high volume during peaks, quiet otherwise."
gndp 49 minutes ago
They are almost claiming FHE. Isn't it just a matter of creating the right tool to get the generated tokens from RAM before they get encrypted for transfer? How is it fundamentally different from chutes?
BingBingBap 1 hour ago
Generate images requested by randoms on the internet on your hardware.
What could possibly go wrong?
jboggan 55 minutes ago
Is this named after the 2011 split album with Grimes and d'Eon?
resonanormal 48 minutes ago
I could imagine this working for the openclaw community if the price is right
koliber 56 minutes ago
Apple should build this, and start giving away free Macs subsidized by idle usage.
chaoz_ 2 hours ago
That solution actually makes great sense. So Apple won in some strange way again?
Guess there are limitations on the size of the models, but if top-tier models get democratized I don’t see a reason not to use this API. The only thing that comes to me is data privacy concerns.
I think batch-evals for non-sensitive data has great PMF here.
- 59nadir 9 minutes ago
  > So Apple won in some strange way again?
  Heh, what did they win exactly? This is just a way for another company to extract value out of the single region of the world where Apple is a relevant vendor, and it happens to be the one where it's the easiest to pull people into schemes.
- rvz 2 hours ago
  Yes. They never needed to participate in the AI race to zero.
  Because they were already at the finish line with Apple Silicon.
  > I don’t see a reason not to use this API. The only thing that comes to me is data privacy concerns.
  The whole inference is end-to-end encrypted so none of the nodes can see the prompts or the messages.
  - chaoz_ 1 hour ago
    Fun question: can some (part of it) be a crypto token that I can buy? :))
    That would finally be a crypto thing which is backed by value I believe in.
bentt 2 hours ago
I thought this was Apple’s plan all along. How is this not already their thing?
jaylane 22 minutes ago
latest (v0.3.8) tar doesn't contain image-bank or gRPCServerCLI dependencies so installer fails.
DeathArrow 2 hours ago
Why only Macs? If we think of all PCs and mobile phones running idle, the potential is much larger.
- btown 2 hours ago
  From the paper: https://github.com/Layr-Labs/d-inference/blob/master/papers/...
  > Apple’s attestation servers will only generate the FreshnessCode for a genuine device that checks in via APNs. A software-only adversary cannot forge the MDA certificate chain (Assumption 3). Combined with SIP enforcement (preventing binary replacement) and Secure Boot (preventing bootloader tampering), this provides strong evidence that the signing key resides in genuine Apple hardware.
- nl 2 hours ago
  They use the Apple TEE which they claim also protects GPU memory (I wasn't aware of this).
  NVidia data center GPUs have a similar path, but not their consumer ones. Not sure about the NVidia Spark.
  It's possible AMD Strix Halo can do this, but unlikely for any other PC based GPU environments.
  - MrDrMcCoy 1 hour ago
    Epyc has that VM encrypted memory thing, which comes pretty close. It does raise an interesting question, though: would a PCIe card passed through to a VM be able to DMA access the memory of neighboring devices?
- stryakr 2 hours ago
  simple first target, PCs have more variability
dcreater 1 hour ago
I can't buy credits - says page could not load
rvz 2 hours ago
Should have called it “Inferanet” with this idea.
Anyway, this looks like a great idea and might have a chance at solving the economic issue with running nodes for cheap inference and getting paid for it.
0xelpabl0 46 minutes ago
[dead]