Hacker News — voidUpdate's comments

I think the tech crowd appreciates how hard it is to lock down access to tech, since they were the kids bypassing the restrictions

I enjoy how the website has overridden my browser's scroll bar with its own significantly lower-contrast, less visible one, making it much harder to tell where in the article I am. My browser already has a good dark-mode scroll bar...

I would not want to be the supervisor that has to review any CSAM positives to check for false positives

I started watching this and was genuinely interested, but I kinda got tired of all the drama around the stuff I actually found interesting. I know that for a general audience, you need to pad technical stuff with scenes of the tech screwing the business guy, but I just wanted the computers!

What you do is actually read the comments, think about how you can improve the code, and then improve it, whether by telling the agent to do that or doing it yourself

This is the consequence of "I don't want to write this function myself, I'll get the plagiarism machine to do it for me"

And what's wrong with not wanting to write functions yourself? It is a perfectly reasonable thing, and in some cases (e.g. crypto), rolling your own is strongly discouraged. That's the reason why libraries exist; you don't want to implement your own associative array every time your work needs it, do you?

As for plagiarism, it is not something to even consider when writing code, unless your code is an art project. If someone else's code does the job better than yours, that's the code you should use; you are not trying to be original, you are trying to make a working product. There is the problem of intellectual property laws, but it is narrower than plagiarism. For instance, writing an open-source drop-in replacement for some proprietary software is common practice, and it is legal and often celebrated as long as it doesn't contain the original software's code. In art, it would be plagiarism.

Copyright laundering is a problem though, and AI is very resource intensive for a result of dubious quality sometimes. But that just shows that it is not a good enough "plagiarism machine", not that using a "plagiarism machine" is wrong.


If I use a package for crypto stuff, it will generally be listed as part of the project, in an include or similar, so you can see who actually wrote the code. If you get an LLM to create it, it will write some "new original code" for you, with no ability to tell you any of the names of people whose code went into that, and who did not give their consent for it to be mangled into the algorithm.

If I copy work from someone else, whether that be a paragraph of writing, a code block or art, and do not credit them, passing it off as my own creation, that's plagiarism. If the plagiarism machine can give proper attribution and context, it's not a plagiarism machine anymore, but given the incredibly lossy nature of LLMs, I don't foresee that happening. A search engine is different, as it provides attribution for the content it's giving you (ignoring the "AI summary" that is often included now). If you go to my website and copy code from me, you know where the code came from, because you got it from my website.


Why is "plagiarism" "bad"?

Modern society seems to assume any work by a person is due to that person alone, and credits that person only. But we know that is not the case. Any work by an author is the culmination of a series of contributions, perhaps not to the work directly, but often to the author, giving them the proper background and environment to do the work. The author is simply one that built upon the aggregate knowledge in the world and added a small bit of their own ideas.

I think it is bad taste to pass another's work as your own, and I believe people should be economically compensated for creating art and generating ideas, but I do not believe people are entitled to claim any "ownership" of ideas. IMHO, it is grossly egoistic.


Sure, you can't claim ownership of ideas, but if you verbatim repeat other people's content as if it is your own, and are unable to attribute it to its original creator, is that not a bit shitty? That's what LLMs are doing

If a human learns to code by reading other people's code, and then writes their own new code, should they have to attribute all the code they ever read?

Plagiarism is a concept from academia because in academia you rise through the ranks by publishing papers and getting citations. Using someone else's work but not citing them breaks that system.

The real world doesn't work like that: your value to the world is how much you improve it. It would not help the world if everyone were forced to account for all the shoulders they have stood on like academics do. Rather, it's sufficient to merely attribute your most substantial influences and leave it at that.


If a human copies someone else's code verbatim, they should attribute the source, yes. If they learn from it and write original code, no, they don't have to cite every single piece of code they've ever read

Yes, you've stated the current social and legal rule we have to follow.

But I don't think you've given any moral justification for the rule, and in particular, why LLMs (who are not humans and have no legal rights or obligations) have to follow it.


Is "taking credit for something someone else did is not very nice" not enough moral justification for you?

But some company owns the LLM, and they have legal rights and obligations. You don't get to use AI to launder breaking the law.

I honestly think it's not that simple.

The ones who spend billions on integrating public cloud LLM services are not the ones writing that function. They are managers who, based on data pulled out of thin air, say "your goal for this year is to increase productivity by X% with AI, while staffing goes slightly down".

I have to watch AI-generated avatars on the most boring topics imaginable, because the only "documentation" and link to an actual answer is in the form of a fake person talking. And this is encouraged!

Then the only measure of success is either AI services adoption (team count), or sales data.

That is the real tragedy and the real scale - big companies pushing (external!) AI services without even proof that it justifies the cost alone. Smooth talking around any other metric (or the lack of it).


In my experience LLMs mimic human thought, so they don't "copy" but they do write from "experience" -- and they know more than any single developer can.

So I'm getting tired of the argument that LLMs are "plagiarism machines" -- yes, they can be coaxed into repeating training material verbatim, but no, they don't do that unless you try.

Opus 4.6's C compiler? I've not looked at it, but I would bet it does not resemble GCC -- maybe some corners, but overall it must be new, and if the prompting was specific enough as to architecture and design then it might not resemble GCC or any other C compiler much at all.

Not only do LLMs mimic human thinking, but also they mimic human faults. Obviously one way in which they mimic human faults is that there are mistakes in the LLMs' training materials, so they will evince some imperfections, and even contradictions (since there will be contradictions in their training materials). Another way is that their context windows are limited, just like ours. I liken their hallucinations to crappy code written by a tired human at 3AM after a 20 hour day.

If they are so human-like, we really cannot ascribe their output to plagiarism except when prompted so as to plagiarize.


LLMs just predict the next token. They mimic humans because they were trained on terabytes of human-created data (with no credit given to the authors of the training data). They don't mimic human thinking. If they did, you would be able to train them by themselves, but if you do that you get Model Collapse
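"Predict the next token" can be illustrated with a toy bigram model: count which token follows which in the training data, then emit the most frequent continuation. This is a deliberately crude sketch; real LLMs learn distributions with transformers over embeddings, not raw counts, but the training-data dependence is the same.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training data."""
    following = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        following[cur][nxt] += 1
    return following

def predict_next(model, token):
    """Return the most frequent continuation seen in training, or None."""
    if token not in model:
        return None
    return model[token].most_common(1)[0][0]

corpus = "the cat sat on the mat because the cat was tired".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once -> cat
```

Note that the model can only ever recombine what was in its training data; fed its own output as training data, the counts degenerate, which is the intuition behind model collapse.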

Wouldn't "Serverless OCR" mean something like running tesseract locally on your computer, rather than creating an AI framework and running it on a server?

Serverless means spinning compute resources up on demand in the cloud vs. running a server permanently.

~99.995% of the computing resources used on this are from somebody else's servers, running the LLM model.

> Serverless means spinning compute resources up on demand in the cloud vs. running a server permanently.

Not quite. Serverless means you can run a server permanently, but you pay someone else to manage the infrastructure for you.


You might be conflating "cloud" with serverless. Serverless is where developers can focus on code, with little care of the infrastructure it runs on, and is pay-as-you-go.

> You might be conflating "cloud" with serverless. Serverless is where developers can focus on code, with little care of the infrastructure it runs on, and is pay-as-you-go.

That's not what serverless means at all. Most function-as-a-service offerings require developers to bother about infrastructure aspects, such as runtimes and even underlying OS.

They just don't bother about managing it. They deploy their code on their choice of infrastructure, and go on with their lives.


A runtime is notably NOT infrastructure; had you said "instruction set" you might have landed closer to a compelling argument. The whole point is that AWS (and other providers) abstract away the underlying infrastructure and allow developers, as I said, to have "little care of the infrastructure it runs on". There is often advanced networking that CAN be configured, as well as other infrastructure components developers can choose to configure.

Close. It means there's no persistent infra charge and you're charged on use. You don't run anything permanently.

It still doesn't capture the concept because, say, both AWS Lambda and EC2 can be run just for 5 minutes and only one of them is called serverless.

Unless the engineer takes steps to spin down EC2 infrastructure after execution, it is absolutely persistent compute that you're billed for whether you are doing actual processing or not, whereas Lambda and other serverless services are billed only for execution time.
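The billing distinction is easiest to see from the developer's side: with a function-as-a-service model you ship only a handler and are billed per invocation, never for idle time. A minimal sketch, using the conventional AWS Lambda Python entry point `lambda_handler(event, context)` (the event fields here are made up for illustration):

```python
import json

def lambda_handler(event, context):
    """Entry point the platform invokes per request. You are billed for the
    milliseconds this function spends running, not for an always-on instance."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }

# Locally you can just call it; in Lambda, the platform supplies event/context.
print(lambda_handler({"name": "HN"}, None))
```

An EC2 instance serving the same function would accrue charges from boot to shutdown regardless of whether any request ever arrived.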

Depends if you mean "server" as in piece of metal (or vm), or as in "a daemon"

Thanks for noting this - for a moment I was excited.

You can still be excited! Recently, GLM-OCR was released, which is a relatively small OCR model (2.5 GB unquantized) that can run on CPU with good quality. I've been using it to digitize various hand-written notes and all my shopping receipts this week.

https://github.com/zai-org/GLM-OCR

(Shameless plug: I also maintain a simplified version of GLM-OCR without dependency on the transformers library, which makes it much easier to install: https://github.com/99991/Simple-GLM-OCR/)


When people mention the number of lines of code, I've started to become suspicious. More often than not it's X lines of code calling a massive library that loads a large model, either locally or remotely. We're just waiting for spinning up your entire company infrastructure in two lines of code, only to be presented with a wrapper around a Terraform shell script.

I do agree with the use of serverless though. I feel like we agreed long ago that serverless just means you're not spinning up a physical or virtual server yourself, but simply asking some cloud infrastructure to run your code, without having to care about how it's run.


> When people mention the number of lines of code, I've started to become suspicious.

Low LoC count is a telltale sign that the project adds little to no value. It's a claim that the project integrates third party services and/or modules, and does a little plumbing to tie things together.


>implement RSA with this one simple line of python!

No, that would be "Running OCR locally..."

'Serverless' has become a term of art: https://en.wikipedia.org/wiki/Serverless_computing


It's good they note explicitly:

> Serverless is a misnomer


Running it locally would typically be called “client(-)side”.

But this caught me for a bit as well. :-)


That's the beauty of such stupid terms.

I use carless transportation (taxis).


taxis are cars, aren't they?

Precisely. And serverless uses servers.

Yep. That fraudulent term finally got me this time. Totally serverless except for that remote 3rd party server. Sigh.

This reminds me a lot of a show I'm currently watching called Pantheon, where a company has been able to scan the entirety of someone's brain (killing them in the process) and fully emulate it via computer. There is a decent amount of "Is an uploaded intelligence the same as the original person?" and "Is it moral to do this?" in the show, and I've been finding it very interesting. Would recommend. Though the hacking scenes are half "oh, that's clever" and half "what were you smoking when you wrote this?"

It was a little jarring when Sam Altman recommended this on X a while back.

https://xcancel.com/sama/status/1952070519018373197?lang=en


Oh blegh... I have no association with him, I just think it was an interesting show that I think I came across on instagram

Reminds me of a story about a probe I saw a while back (sorry, I have no idea which one) which took photos of the body it was orbiting. To get an accurate surface map, the engineers projected the images back onto a sphere of the same apparent size as the body; they could then take photos all around it, because the photo distortion was cancelled out by the projection distortion, and looking down at the surface gave an accurate view.
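The cancellation can be sketched numerically. In a toy orthographic model (ignoring perspective and lighting), a surface point at angle theta from the sub-camera point lands at image coordinate x = sin(theta) in the photo; re-projecting the flat photo onto a sphere of the same apparent size sends x back to asin(x) = theta, so the two distortions undo each other exactly:

```python
import math

# Surface angles (radians) across the visible hemisphere, short of the limb.
angles = [i * 0.35 for i in range(-4, 5)]  # -1.4 ... 1.4, within +/- pi/2

for theta in angles:
    x = math.sin(theta)        # photographing: orthographic foreshortening
    recovered = math.asin(x)   # re-projecting onto a same-sized sphere
    assert abs(recovered - theta) < 1e-12  # the two distortions cancel

print("distortions cancel for all test angles")
```

Near the limb (theta approaching pi/2) the photo carries almost no surface detail per pixel, which is why real mapping still needs images from multiple vantage points.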

I know that the Rectified Lunar Atlas [0] was done this way, but it used normal telescope images of the moon from earth.

[0] - https://sic.lpl.arizona.edu/collection/rectified-lunar-atlas


I may have been misremembering that. After posting my comment, I searched for what I meant for a while and came up with nothing

Now imagine all the power that "AI" datacenters use, and for what purpose? Generating code you could have written yourself? Generating pornographic images of people without their consent?
