People should be demanding consistency and traceability from model vendors, checked by some tool, perhaps like this one. It may tell you when the vendor changed something, but is there otherwise any recourse?
Agreed! FWIW I am attempting to create an open-source wiki/watchdog eval platform -- weval.org -- so we can all keep an eye on LLMs, their biases, and their general competencies without relying on the AI providers marking their own homework. I really believe this needs to exist to express our needs and hold model creators to account, especially as model drift and manipulation become a risk.
I have a lot of respect for organizations that get a lot done with Microsoft technologies. I think your perspective could be thought of as the benefits of vertical integration and vendor lock in. These do help people get things done!
In the academic and open source world those things are fought against because you don't want to be at the mercy of the software developer in the context of certain rights.
I think for every negative you mention on either side, a corresponding positive could be found. And like many things on the net, you're not wrong, but you're not necessarily talking about the same kinds of things.
My remaining complaints about Microsoft are the inflexibility of their solutions, which impose abstractions that just don't work for many organizations, and the generally viral nature of software sales, of which they are one of many vendors with similar issues -- though Oracle is the worst, of course.
Perfectly valid points. I've worked in academia, and their insistence on non-Microsoft technologies was helpful in certain fields where openness and long-term reproducibility are critical.
The downside is that this produces a microcosm of obscure technologies that can have... strange effects on industry. Some FAANG-like companies have a habit of hiring only recent graduates, so their entire staff is convinced that what they saw at their University is how everybody else does things.
It leads to a Silicon Valley clique that has a fantastically distorted perspective of the rest of the world.
Some comments I've seen here on HN are downright hilarious to anyone from the "rest of the world", such as:
"Does anyone still use Windows Server!?" -- yes, at least 60% of all deployed servers world wide, and over 80% in many industries.
"Supports all popular directory servers such as OpenLDAP, ApacheDS, Kopano, ..."
-- hello!? Active Directory! Have you heard of it!? It's something like 95% of all LDAP deployments no matter how you count it! The other 5% is Oracle Directory and/or Novell eDirectory, and then all of the rest put together is a rounding error.
I thought this was a very good read about many of the issues that are faced without having any ground truth to reason against. It is interesting how many different ways people have developed to work around missing information, and the marginal improvements they make in some benchmarks.
This also assumes that non-human-written code will be of any use to humans, and no one has shown that to be possible; so far it is all humans patching it up.
This has been the problem with higher-level natural language programming for years. I really wonder what people are doing if they don't see this core issue that precludes its use.
It makes me wonder if some people writing code just cannot think in terms of code?
I imagine it is very slow if you always have to think in a human language and then translate each step into a programming language.
When people describe being in flow state, I think what is happening is they are more or less thinking directly in the programming language they are writing. No translation step, just writing code
LLM workflows completely remove the ability to achieve that imo
CSS or Tailwind has always been a tough one for me. I have banks of flashcards to help me remember stuff (align-items, justify-content, grid-template-columns, etc.). Even with all that effort and many projects of practice, though, I've never had things click.
LLM assisted programming, however? – instant flow state. Instead of thinking in code I can think in product, and I can go straight from a pencil sketch to describing it as a set of constraints, and then say, "make sure it's ARIA compliant and responsive", and 95% of the work is done.
I feel similarly about configuration heavy files like Nginx or something. I really don't care to spend my time reading documentation, I'd rather copy paste the entire docs into the context window and then describe what I want in English.
Also good for SQL. And library code for a one off tool or API. And Bash scripting.
A lot of people hedge this kind of sober insight against their personal economic goals, making all manner of unfalsifiable claims of adequacy in some context. It is refreshing to try to deal with the issues separately, and I think a lot of people miss how insufficient this is compared to traditional methods in every case I've heard of so far.
There are definitely dumb errors that are hard for human reviewers to find because nobody expects them.
One concrete example is confusing value and pointer types in C. I've seen people try to cast a `uuid` variable into a `char` buffer to, for example, memset it, by doing `(char *)&uuid`. It turned out, however, that `uuid` was not a value type but rather a pointer, and so this ended up just blasting the stack: instead of taking the address of the uuid storage, it takes the address of the pointer to the storage. If you're hundreds of lines deep and are looking for more complex functional issues, it's very easy to overlook.