Hacker News | new | past | comments | ask | show | jobs | submit | gnulinux's comments

Maybe. Ever since I graduated from college I've learned again and again that pretty much anything worth thinking about in life boils down to math for me. I'd maybe/probably study CS as a minor or double major, but Pure/Applied Math programs can be more intellectually enriching in this day and age. This is a completely personal analysis; it'll differ for everyone.

My first impressions: not impressed at all. I tried using this for my daily tasks today, and for writing it was very poor; o3 was much better at this task. I'm not planning on using this model in the upcoming days. I'll keep using Gemini 2.5 Pro, Claude Sonnet, and o3.

Imho chatterbox is the current open-weight SOTA model in terms of quality: https://huggingface.co/ResembleAI/chatterbox

Thank you, I hadn't heard of it. Will have a look! The samples sound excellent indeed.

Name recognition? Advertisement? Federal grant to beat Chinese competition?

There could be many legitimate reasons, but yeah, I'm very surprised by this too. Some companies take it a bit too seriously and go above and beyond, too. At this point, unless you need the absolute SOTA models because you're throwing an LLM at an extremely hard problem, there is very little utility in using the larger providers. Via OpenRouter, or by renting your own GPU, you can run on-par models for much cheaper.


Not even that: even if o3 being marginally better matters for your task (let's say), why would anyone use o4-mini? It's almost 10x the price for the same (maybe even worse) performance: https://openrouter.ai/openai/o4-mini

Probably because they are going to announce GPT-5 imminently.

Wow, that's significantly cheaper than o4-mini, which seems to be on par with gpt-oss-120b. At $1.10/M input tokens and $4.40/M output tokens, o4-mini is almost 10x the price.

LLMs are getting cheaper much faster than I anticipated. I'm curious whether it's still the hype cycle and Groq/Fireworks/Cerebras are taking a loss here, or whether things are actually getting cheaper. At this rate we'll be able to run Qwen3-32B-level models on phones and embedded devices soon.


It's funny because I was thinking the opposite: the pricing seems way too high for a model with only ~5B active parameters.

Sure, you're right, but if I can squeeze o4-mini-level utility out of it at less than a quarter of the price, does it really matter?


Are the prices staying aligned to the fundamentals (hardware, energy), or is this a VC-funded land grab pushing prices to the bottom?

It's averaging $0.3/1M input tokens and $1.2/1M output tokens. That's mind-blowingly cheap for a model of its caliber. Gemini 2.5 Pro is more than 10x that price.

At $2/1Mt it's cheaper than e.g. Gemini 2.5 Pro ($1.25/1Mt input and $10/1Mt output). When I code with Aider my requests average something like 5000 tokens input and 800 tokens output. At this rate, Gemini 2.5 Pro is about $0.01425 per single Aider request and Cerebras Qwen3 Coder is $0.0116 per request. Not a huge difference, but I think sufficiently cheaper to be competitive, especially given Qwen3-coder is on par with Gemini/Claude/o3; it even surpasses them in some tests.
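The per-request arithmetic above can be sketched as follows (token counts are my rough Aider averages; prices are the per-1M-token figures quoted, and a flat $2/1Mt for both directions on Cerebras):

```python
# Rough per-request cost comparison. Prices are in $ per 1M tokens.
def request_cost(in_tok, out_tok, in_price, out_price):
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Typical Aider request: ~5000 input tokens, ~800 output tokens.
gemini = request_cost(5000, 800, 1.25, 10.0)  # Gemini 2.5 Pro
qwen = request_cost(5000, 800, 2.0, 2.0)      # Cerebras Qwen3 Coder, flat $2/1Mt

print(f"Gemini 2.5 Pro: ${gemini:.5f}")  # $0.00625 + $0.00800 = $0.01425
print(f"Qwen3 Coder:    ${qwen:.5f}")    # $0.01000 + $0.00160 = $0.01160
```

Note how the comparison flips depending on the input/output mix: Gemini's cheap input but expensive output means output-heavy workloads favor the flat-priced model even more.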

NOTE: Currently on OpenRouter, Qwen3-Coder requests are averaging $0.3/1M input tokens and $1.2/1M output tokens. That's so significantly cheaper that I wouldn't be surprised if open-weight models start eating Google/Anthropic/OpenAI's lunch. https://openrouter.ai/qwen/qwen3-coder


Do you have any experience with how Qwen3-coder compares to Claude 4 Sonnet?

No, unfortunately, I haven't used Qwen3-coder yet. I do like Claude 4 Sonnet, but my favorite programming LLM at the moment is Gemini 2.5 Pro; I think it's the smartest model (though Claude and o3 do print better code).

I have experience using the base Qwen3-32B model and it's extremely good for its size, especially at solving undergrad/grad-level math problems. So my guess would be that Qwen3-coder should be competitive, but this is just speculation.


Qwen3 is the open-weight state of the art at the moment. Qwen3-Embedding-8B and Qwen3-Reranker-8B are surprisingly good (according to some benchmarks, better than Gemini 2.5 embedding). The 4B is also nearly as good, so you might as well use that unless 8B benefits your use case. If you don't need a SOTA-precise embedding model because you'll run a more powerful reranker afterwards, you could run Qwen3-Embedding-4B at Q4, which is only about 2GB and will process extremely fast on most hardware. A weaker but close choice is `Qwen3-Embedding-0.6B` at Q8, which is about 600MB and will run just fine on most reasonably powerful CPUs. So if that does the job for you, you may not even need a GPU; just grab an instance with 16 vCPUs. That'll give you plenty of throughput, probably more than you need until your RAG has thousands of active users.

Tool calling complements RAG. You build a full-scale RAG pipeline (embed, rerank, create the prompt, get output from the LLM) and hook it up as a tool another agent can see. That combines the power of both.
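A minimal sketch of that wiring, with the retrieval and LLM steps stubbed out so it's self-contained (the keyword-overlap "retrieval" is a hypothetical stand-in for a real embedding + reranker stage, and the tool schema follows the common OpenAI-style function-calling shape):

```python
# Sketch: expose a RAG pipeline as a tool another agent can call.

def retrieve(query, corpus, top_k=3):
    # Stand-in for embedding search + reranking: naive keyword overlap.
    words = query.lower().split()
    scored = sorted(corpus, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:top_k]

def rag_answer(query, corpus):
    """Full RAG pass: retrieve, build the prompt, call the LLM."""
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in practice: return llm(prompt)

# Tool schema the orchestrating agent sees; it never touches the
# embedding model or reranker directly, only this one function.
rag_tool = {
    "type": "function",
    "function": {
        "name": "rag_answer",
        "description": "Answer a question using the document knowledge base.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}
```

The point of the indirection is that the agent only reasons about when to call the tool; the retrieval quality lives entirely inside `rag_answer`, so you can swap embedding models or rerankers without touching the agent.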
