
Instead of forcing iOS onto laptops, they locked down MacOS.

For decades now, we've had to deal with articles like this one. People who know just enough to sound credible mislead those who know even less into mutilating their systems in the name of "optimization". This genre is a menace.

Much harm has arisen out of the superstitious fear of 100% CPU use. Why wouldn't you want a compute bound task to use all available compute? It'll finish faster that way. We keep the system responsive with priorities and interactivity-aware thresholds, not by making a scary-looking but innocuous number go down in an ultimately counterproductive way.

The article's naive treatment of memory is also telling. The "Memory" column in the task manager is RSS. It counts shared memory multiple times, once for each process. You can't say the 5MB "adds up": RSS is simply not amenable to addition in a way that produces a physically meaningful result. It is absolute nonsense, and when you make optimization decisions based on garbage input, you produce garbage output.
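You can see the double-counting with a quick shell sketch (BSD-style `ps` syntax, works on Linux and macOS): summing every process's RSS counts each shared library once per process that maps it, so the naive total routinely exceeds physical RAM.

```shell
# Naive "memory used" arithmetic: sum the RSS of every process.
# Shared pages are counted once per process that maps them, so this
# total can exceed the machine's physical RAM -- RSS is not additive.
total_kb=$(ps axo rss= | awk '{s += $1} END {print s}')
echo "naive RSS sum: ${total_kb} KB"
```

Compare that sum to your installed RAM on a busy desktop and the over-counting is usually obvious.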

It's hard to blame Apple for locking down the OS core like this. People try to "optimize" Windows all the time by disabling load-bearing services that cost almost nothing just so "number go down" and they get that fuzzy feeling they've optimized their computer. Then the rest of the world has to deal with bug reports in which some API mysteriously doesn't work because the user broke his own system but blames you anyway.


> Much harm has arisen out of the superstitious fear of 100% CPU use. Why wouldn't you want a compute bound task to use all available compute? It'll finish faster that way.

Because it hurts the speed/responsiveness of stuff you actually care about. It also has other negative side effects like fan noise and temperature, which, with the poor thermal insulation in a MacBook, can even get hot enough to burn you. Pretty obvious stuff if you don't dismiss the issues as superstition.

> It'll finish faster that way.

The usefulness of which might be none: some background maintenance process finishes in 5 seconds, which I don't notice, vs in 1 second while turning the fans on or making my app slower.

> We keep the system responsive with priorities and interactivity-aware thresholds,

Only in your fantasy; in reality you fail at that, which is why the "superstitions" arise.

> It's hard to blame Apple for locking down the OS core like this.

Of course, if you ignore real issues with bloat, and only notice the mistakes, but that's a self-inflicted perspective

> by disabling load-bearing services

The article mentions that there isn't even basic information on what the services do, and it's similar on Windows. So maybe the proper way out is to teach people, and also to debloat the OS proactively to give them less incentive to do it themselves?


The right way to make the system stick to thermal constraints is to modulate clock speed and cooling, not to randomly throttle workloads so some task manager reports they're running inefficiently.

The right way is also not to bloat your OS, but again, we live in the reality we live in, where people also go left!

> The "Memory" column in the task manager is RSS. It counts shared memory multiple times, once for each process.

It’s “footprint”, and no, it does not do that.


Perhaps it did a while ago. Now, https://www.bazhenov.me/posts/activity-monitor-anatomy/ is a good read. Thanks. It's much better than RSS, although I'm still not sure I like the inclusion of private compressed memory. In any case, thanks for the correction.

One of the ways both macOS and iOS get good battery life is burst-y CPU loads to return the CPU to idle as quickly as possible. They also both run background tasks like Spotlight on the e-cores whenever possible. So some process maxing out an e-core is using a lot less power than one maxing out a p-core. Background processes maxing out a core occasionally is not as much of a problem as a lot of people seem to assume.

You're not wrong. Let's hope that articles, like the OP's post, shed light on further optimizations that Apple is now fully in charge of making.

I see nothing in the post that convinces me Apple ought to change a single thing.

To be fair, satoshi stepped back too.

Back when PoD t-shirts were more of a thing..

https://www.wired.com/2015/05/techs-failures-live-t-shirts/



We don't have a lot of GPUs available right now, but it's not crazy hard to get it running on our MI300x. Depending on your quant, you probably want a 4x.

ssh admin.hotaisle.app

Yes, this should be made easier to just get a VM with it pre-installed. Working on that.


Unless you're using Docker, if vLLM isn't provided pre-built against ROCm dependencies, it's going to be time-consuming.

It took me quite some time to figure out the magic combination of versions and commits, and to build each dependency successfully to run on an MI325x.


Agreed, the OOB experience kind of sucks.

Here is the magic (assuming a 4x)...

  # Start the ROCm vLLM dev container with full GPU access
  docker run -it --rm \
    --pull=always \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add render \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v /home/hotaisle:/mnt/data \
    -v /root/.cache:/mnt/model \
    rocm/vllm-dev:nightly
  
  # Inside the container: point the HF cache at the mounted model volume
  mv /root/.cache /root/.cache.foo
  ln -s /mnt/model /root/.cache
  
  # Serve GLM-4.7-FP8 across all 4 GPUs with AITER kernels enabled
  VLLM_ROCM_USE_AITER=1 vllm serve zai-org/GLM-4.7-FP8 \
    --tensor-parallel-size 4 \
    --kv-cache-dtype fp8 \
    --quantization fp8 \
    --enable-auto-tool-choice \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --load-format fastsafetensors \
    --enable-expert-parallel \
    --allowed-local-media-path / \
    --speculative-config.method mtp \
    --speculative-config.num_speculative_tokens 1 \
    --mm-encoder-tp-mode data
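Once it's up, you can sanity-check it against vLLM's OpenAI-compatible endpoint (assuming the default port 8000; the model name must match whatever `vllm serve` loaded):

```shell
# Minimal smoke test of the running vLLM server
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "zai-org/GLM-4.7-FP8",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 32
      }'
```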

Speculative decoding isn’t needed at all, right? Why include the final bits about it?


GLM 4.7 supports it, and in my experience with Claude Code an 80+ percent hit rate on speculative decoding is reasonable. So it is a significant speed-up.


I find it hard to trust post-training quantizations. Why don't they run benchmarks to see the degradation in performance? It sketches me out, because it should be the easiest thing to automatically run a suite of benchmarks.

Unsloth doesn't seem to do this for every new model, but they did publish a report on their quant methods and the performance loss it causes.

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

It isn't much until you get down to very small quants.


Yes I usually run Unsloth models, however you are linking to the big model now (355B-A32B), which I can't run on my consumer hardware.

The flash model in this thread is more than 10x smaller (30B).


When the Unsloth quant of the flash model does appear, it should show up as unsloth/... on this page:

https://huggingface.co/models?other=base_model:quantized:zai...

Probably as:

https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF


It's a new architecture. Not yet implemented in llama.cpp.

issue to follow: https://github.com/ggml-org/llama.cpp/issues/18931


One thing to consider is that this version is a new architecture, so it’ll take time for llama.cpp to get updated. Similar to how it was with Qwen Next.

Apparently it is the same as the DeepseekV3 architecture and already supported by llama.cpp once the new name is added. Here's the PR: https://github.com/ggml-org/llama.cpp/pull/18936

has been merged

There are a bunch of 4-bit quants in the GGUF link, and 0xSero has some smaller stuff too. Might still be too big, and you'll need to un-GPU-poor yourself.

Yeah, there's no way to run 4.7 on 32GB of VRAM. This Flash model is something I'm also waiting to try later tonight.

Why not? Run it with the latest vLLM and enable 4-bit quantization with bnb, and it will quantize the original safetensors on the fly and fit your VRAM.
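For instance, something along these lines (a sketch, not verified against this exact model; the `zai-org/GLM-4.7-Flash` repo id and the context length are assumptions, and flag spellings vary by vLLM version, so check the docs for yours):

```shell
# On-the-fly 4-bit quantization of full-precision safetensors via bitsandbytes
vllm serve zai-org/GLM-4.7-Flash \
  --quantization bitsandbytes \
  --load-format bitsandbytes \
  --max-model-len 32768
```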


Except this is GLM 4.7 Flash, which has 32B total params, 3B active. It should fit with a decent context window of 40k or so in 20GB of VRAM at 4-bit weight quantization, and you can save even more by quantizing the activations and KV cache to 8-bit.
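The back-of-envelope arithmetic (4-bit means half a byte per parameter; the 20 GB card is the assumption from above):

```shell
PARAMS_B=32                       # total parameters, in billions
WEIGHTS_GB=$(( PARAMS_B / 2 ))    # 4-bit quant: 0.5 bytes/param -> ~16 GB
echo "~${WEIGHTS_GB} GB for weights, leaving ~4 GB of a 20 GB card for KV cache and activations"
```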

Yes, but the parent link was to the big GLM 4.7, which had a bunch of GGUFs; the new one didn't at the time of posting, nor does it now. I'm waiting on the Unsloth guys for 4.7 Flash.

I also wouldn't trust it. Zero real oversight on food quality in VN.

Having power when your entire neighborhood is off, priceless.

[edit: yes, I assume you also get batteries, I know that solar alone doesn't magically power your house.]


Outside transfer switch and a 10-20kW portable generator is like $4-5k. It requires manual switching, but it works for us in our hurricane-prone region. It helped with last year's 1-in-100-year winter storm in our southern region.

Battery/solar doesn’t make sense in my opinion. Too many years to break even, like the parent comment said, and by the time you break even at 10 years, your system is either too inefficient or needs replacing. At least with the portable generator, you can move it with you to a new home and use it for other things like camping or RVing.


Context: I’m in the Netherlands. With taxes, power is around 25 cents/kWh for me. For reference: Amsterdam is at a latitude of around 52N, which is north enough that it only hits Alaska, not the US mainland.

I installed 2800Wp solar for about €2800 ($3000, payback in: 4-5 years), and a 5kWh battery for €1200 ($1300) all in. The battery has an expected payback time of just over 5 years, and I have some backup power if I need it.
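Those numbers roughly check out, assuming a solar yield of around 900 kWh per kWp per year at that latitude (my assumption, not from the comment):

```shell
YIELD_KWH=$(( 2800 * 900 / 1000 ))     # ~2520 kWh/year from 2800 Wp
SAVINGS_CT=$(( YIELD_KWH * 25 ))       # at 25 cents/kWh -> cents saved per year
echo "~${YIELD_KWH} kWh/yr, ~${SAVINGS_CT} ct/yr, payback ~$(( 280000 / SAVINGS_CT )) years"
```

That lands around 4 years for the €2800 system, consistent with the 4-5 year claim, before accounting for self-consumption vs feed-in rates.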

I’m pretty sure about the battery payback, because I have a few years of per-second consumption data in ClickHouse and (very conservatively) simulated the battery. A few years ago any business case on storage was completely impossible, and now suddenly we’re here.

I could totally see this happen for the US as prices improve further, even if it’s not feasible today.


Is it priceless? I literally wouldn't pay more than $200 to have electricity for a day while the whole neighborhood doesn't. Anything more and I'd prefer to just keep the money in my pocket to be honest.

In my country I've never had to deal with more than 15 minutes, twice in my life. In other countries it's sometimes been a day, but really I just go on with my life.


What's funny about that is you assume that's the case, but a lot of solar isn't installed to be backup power. With storage, yes, but straight-up solar, no.

It's not the default but you can get it installed that way or get it adapted later (less than ideal if you end up having to replace the inverter).

Yea, that costs extra. My dad went for the natural gas generator.

Well there are other, far cheaper ways to get that.

99% of systems are grid tie, so unless you’re spending another $7k for an ATS and associated infrastructure or you’re 100% off grid, your power still goes off.

For others who aren't up on the lingo:

"An ATS (Automatic Transfer Switch) for solar is a crucial device that seamlessly switches your home's power between the utility grid, your solar panels/battery bank...


And I should clarify that you technically can get away with a less expensive interlock system, but you're still paying a few thousand dollars to have your panel replaced (unless you feel comfortable doing that sort of electrical work yourself).

Making a system non-grid-tie is comparatively expensive, that's why grid tie is so common. People think you add solar + batteries and you're ready for doomsday - not quite.


A few of the alternatives have been reviewed by Will Prowse as well. His YT is a treasure of information. https://www.youtube.com/@WillProwse

16 years ago, I wrote a Java client that is still in use today in quite a number of products. It wasn't that bad.

https://github.com/lookfirst/sardine


Thank you! I also use it.

