I always wonder how many of the system crashes we blame on the software or the OS are actually just suboptimal components. Computers are so complex and so fast that just a little bit of instability can plausibly lead to data corruption.
The optimist in me hopes that the bullwhip effect will lead to cheap RAM in a few years, and that the glut allows for wider adoption and support of ECC memory.
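And for anyone who does end up with ECC, it's worth actually watching the error counters instead of assuming silence means health. A minimal sketch for Linux, assuming your platform's EDAC driver is loaded and exposing the usual sysfs counters:

    from pathlib import Path

    # Corrected (ce_count) and uncorrected (ue_count) error totals per memory
    # controller, as reported by the Linux EDAC subsystem. The directory is
    # only populated on ECC-capable platforms with an EDAC driver loaded.
    EDAC_MC = Path("/sys/devices/system/edac/mc")

    controllers = sorted(EDAC_MC.glob("mc*"))
    if not controllers:
        print("No EDAC memory controllers found (no ECC, or driver not loaded)")
    for mc in controllers:
        ce = (mc / "ce_count").read_text().strip()
        ue = (mc / "ue_count").read_text().strip()
        print(f"{mc.name}: corrected={ce} uncorrected={ue}")

A slowly climbing corrected count is exactly the kind of "suboptimal component" signal that never shows up as an outright crash.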
I’d just like to see a repeat of the glut of HBM-backed processors like the Xeon Max 9480, which dipped as low as $900/CPU, roughly $2,000 all in, with bandwidth that compares favorably to a 3090.
Not just wholesale crashes, but all sorts of misbehavior. For example, cheap WiFi/BT/ethernet can wreak havoc on your connectivity and out of spec USB peripherals can cause all sorts of problems. Both can bring sleep/power saving problems.
Most people using computers aren't technical enough to be able to discern these things, however, and many buy the cheapest thing on the shelf and so these subpar components persist.
Yup. When building "upcycled" PCs out of used second-to-last-gen components, I learned very quickly to only ever use brand-new, high-quality PSUs ... the alternative is insanity.
It has been 25 years, but back in college I had a job refurbishing and repairing PCs. Most problems were caused by cheap no-name hardware. The quality hardware rarely had problems.
Maybe when quality hardware has problems, the owner knows how to deal with it, but when no-name hardware has problems, the owner has no clue how to build a computer.
Maybe. But then again, as someone who dual boots, I see one OS crashing and giving an all-around worse experience on the exact same hardware, while the other just chugs along.
Now, I'm not someone good at maths or physics, so maybe, somehow, it's actually more likely than not that the worse OS gets to run when there's worse solar activity going on, or whatever else has an effect on my hardware, which also doesn't seem to affect memtest for some reason. But the likelihood can't be that high. Can it?
It could easily be flaky hardware and different drivers. Not necessarily better or worse, but one driver causes the hardware to occasionally fail in exciting ways, like DMAing to the wrong address if just the right access patterns happen.
If you've got an IO-MMU and everything aligns properly, devices can't DMA to the wrong place anymore, which might make it easier to track things down.
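A quick way to check whether you're actually getting that protection on Linux is to see if any IOMMU groups were created at boot; a minimal sketch, assuming the usual /sys/kernel/iommu_groups layout:

    from pathlib import Path

    # /sys/kernel/iommu_groups/<group>/devices/<pci-address> entries only exist
    # when an IOMMU (Intel VT-d / AMD-Vi) is present and enabled; if the listing
    # is empty, devices can still DMA wherever they like.
    devices = sorted(Path("/sys/kernel/iommu_groups").glob("*/devices/*"),
                     key=lambda p: (int(p.parts[-3]), p.name))

    if not devices:
        print("No IOMMU groups found - IOMMU disabled or unsupported")
    for dev in devices:
        print(f"group {dev.parts[-3]}: {dev.name}")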
Sometimes it's both. I had some crazy data corruption problems that turned out to be a one-two punch of a buggy anti-cheat driver from a game I was playing and a defective M.2 SSD slot on my motherboard. Without the combination of both factors everything was fine, but when I played the game with that slot populated, the disk in that slot started getting corrupted and failing to respond to requests from the OS (eventually hanging the system).
Wild troubleshooting adventure.
Maybe they just don't really use anything else, but I love that the most reliable memory is plain Kingston ValueRAM. No fancy heat spreader or packaging, not even a black PCB, just chips on a classic green circuit board.
Workstations/servers have forced-air cooling that drives a significant amount of airflow over the RAM sticks. Gaming PCs don't. I don't think you can assume that heat spreaders/sinks on RAM don't help in them.
I thought the gaming PC airflow was front fans => CPU cooler => rear (and top) exhaust fan(s), which puts the RAM sticks smack in the middle of the airflow.
That Kingston RAM is DDR5-5600, comes in smaller capacities, and has a longer warranty. This suggests the product is binned memory from a relatively mature line (and by extension one with low failure rates).
And, because it's clocked lower, it runs cooler, which reduces failure rate.
On top of that, server memory is usually binned more strictly. And, it usually has bits missing for ECC, custom controller firmware, and cutting edge processes for packing more memory into the form factor.
Now as a consumer you may think an LED is "riced" out, but I think custom firmware on your ram built for your application is way more "riced".
> None of that stuff actually helps
It probably actually does, especially for "high end" RAM, i.e. stuff running at much higher transfer rates. Heat and voltage are the enemies of stability here. A heat sink/heat spreader is certainly going to help (in theory, anyway).
On top of that, server RAM is designed with higher expectations of cooling quality than consumer RAM, where anything goes.
Finding consumer hardware that isn't riced to the max is getting hard. I wish PCPartPicker had a checkbox to filter out anything with RGB lights. Or one to filter out things marketed towards 13-year-old boys - but that might be harder.
Preempting the inevitable comment: "just turn it off". That doesn't always work. I bought a mouse once, I think it was Razer, that required their Electron slop-ware to control the lights. And if you didn't keep the software running, the lights would default to on. I had to take it apart to desolder the LEDs and throw them in the trash. And of course, like all mice I've seen, the screws were under the Teflon feet, so I had to mangle them slightly to get in there. It was a decent mouse otherwise, but screw that nonsense.
Ironically, for gaming usage that Electron slopware can get you VAC banned (I think you need to have used certain features, not merely have it installed). I should probably find the OSS alternative that allows me to turn it off.
It's also getting to the point where I wind up paying a price premium for non-capital-G gamer hardware. Fortunately opaque cases are still a thing and can hide some of it.
The closest I've seen to this are the ASUS ProArt cases/components, which lean toward a modernized, stealthy workstation vibe, as well as some cases from boutique Chinese manufacturers like Streacom and Jonsbo/Jonsplus, which also go for a sleek but more professional and subdued aesthetic.
I'm now at the point where I research parts to see where the LED control is stored.
My keyboard's LEDs are controlled internally without software. My mouse requires software to set, but there is open-source RGB control software that was trivial to install, set once, uninstall, and forget.
The only one I got wrong was my GPU, which apparently isn't RGB but just has a strip of coloured light beaming at all times.
Thankfully my case isn't all mesh, so most of the light stays inside.
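A minimal one-shot sketch of that workflow, assuming (on my part) that the tool is OpenRGB with its SDK server enabled and the third-party openrgb-python client installed:

    from openrgb import OpenRGBClient
    from openrgb.utils import RGBColor

    # Connects to a locally running OpenRGB SDK server (default 127.0.0.1:6742)
    # and blanks every RGB device it detects. Run once and then uninstall;
    # whether the setting survives a reboot depends on the device.
    client = OpenRGBClient()

    for device in client.devices:
        print(f"blanking: {device.name}")
        device.set_color(RGBColor(0, 0, 0))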
Just get the competing Logitech instead. I've had multiple G-series mice; they all have on-board memory. I have a Cooler Master that behaves the same.
You can program them from a VM, then toss that away and the mouse remembers its settings, even multiple "profiles". You don't have to put up with electron slop-ware or whatever the crap dev platform du jour is. They just work.
This is honestly why, if it didn't start life as an OEM (Dell, HP, Lenovo) part, I just buy Supermicro workstation boards: they tend to use the better workstation chipsets, don't have any of the silly shit like RGB, and in my experience are just more durable and better built.
I had a Logitech that did the same thing. When I found out I had to use the Windows bloatware to turn the LEDs off, I became so enraged that I opened the mouse with my bare hands and twisted and ripped the LEDs off the board.
My main home PC is a Puget Serenity workstation from 2017. It has been rock solid and outperforms much newer laptops. And it has almost zero fan noise, which is a priority for me. Unfortunately it looks like they may have discontinued the Serenity model; at least I don't see it on the website anymore.
Puget still sells a Serenity workstation[1], but it's essentially an off-the-shelf AMD Ryzen-based system installed into a Fractal Design Define 7 Mini case, with a Noctua tower air cooler and case fans replacing the stock cooling. They have a variety of photos showing their customized fan setup in various configurations.[2]
It's a reasonably well-built system, but $3,500 USD is hard to justify for a basic system with an 8-core CPU, 32 GB of RAM, and no discrete GPU, especially given that it's using parts that you can just purchase and assemble yourself.
I know that prices of some components have increased significantly, but not by THAT much.
[1] https://www.pugetsystems.com/solutions/more-workstations/qui...
[2] https://www.pugetsystems.com/parts/photography/Additional-Co...
I've been using Puget workstations for like 10 years now and the builds are really reliable. The one time I had issues with a build (not their fault - defective parts), they went the extra mile and rebuilt it after normal troubleshooting failed.
They do a lot of careful thermal testing, and for the inside of their builds they often cut custom acrylic dividers, flow guides, supports, etc. to manage airflow and make sure nothing heavy like a GPU comes loose.
Either them or a Falcon Northwest. What other builders exist at this level of premium quality?
I am not surprised at all to see the W-series Xeons with very high reliability. I know they tend to be pricier than AMD, and maybe not as fast, but I can't recall the last time I managed to kill an E3/E5/W-series Xeon in the last 15 years, no matter how hard they're worked. Intel pooched it with the i-series Core parts, but the workstation Xeons have always been really reliable and more thrifty with power than AMD, especially at idle.
Can at least vouch for the (Sapphire Rapids) Xeons. With the right cooling, you can throw absurd TDP-generating loads at them and they just keep on chugging along.
With octa-channel memory, an 8480 can be nearly indistinguishable from an older GPU if used that way.
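Rough napkin math on that, using theoretical peaks from the published memory specs (measured STREAM-style numbers land lower):

    # Theoretical peak DRAM bandwidth = channels * transfer rate * 8 bytes/transfer.
    def ddr_peak_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
        return channels * mts * 1e6 * bus_bytes / 1e9

    # Xeon Platinum 8480+: 8 channels of DDR5-4800 per socket.
    print(f"8-ch DDR5-4800: {ddr_peak_gbs(8, 4800):.0f} GB/s")   # ~307 GB/s
    # Typical dual-channel desktop DDR5-5600 for scale.
    print(f"2-ch DDR5-5600: {ddr_peak_gbs(2, 5600):.0f} GB/s")   # ~90 GB/s

    # Older GPU memory specs for comparison (spec-sheet figures, not measured):
    #   GTX 1080: 256-bit GDDR5X @ 10 Gbps   -> ~320 GB/s
    #   RTX 3090: 384-bit GDDR6X @ 19.5 Gbps -> ~936 GB/s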
The difference in performance between "good" and "bad" DDR5 can be very large.
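If you want a crude way to see that on a given box, a single-threaded NumPy copy gives a lower-bound figure that still separates well-configured memory from badly-configured memory; a minimal sketch:

    import time
    import numpy as np

    # Streams ~512 MB through memory a few times and reports the best effective
    # copy bandwidth (bytes read + bytes written). Single-threaded, so treat it
    # as a lower bound rather than the platform peak.
    a = np.ones(512 * 1024 * 1024 // 8)   # ~512 MB of float64

    best = float("inf")
    for _ in range(5):
        start = time.perf_counter()
        b = a.copy()
        best = min(best, time.perf_counter() - start)

    print(f"~{2 * a.nbytes / best / 1e9:.1f} GB/s effective copy bandwidth")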
Thankfully, industrial motherboards exist, though they're not always cheap or simple to obtain. Examples:
https://www.asrockind.com/en-gb/industrial-motherboards
https://www.advantech.com/en-us/products/microatx-motherboar...
The downside is that they're not cheap.
https://www.asus.com/Microsite/CSM/