Hacker News | James_K's comments

I think it depends on what you mean by language. There is a kind of symbolic logic that happens in the brain, and as a programmer I might liken it to a programming language, but the biological term is defined differently. Language, as far as it is unique to humans, is the serialisation of those internal logical structures, in the same way a text file is the serialisation of the logical objects within a programming language. What throws most people here is that the internal structures can develop in response to language and mirror it in some ways. As a concrete example, there is certainly a part of my brain that has developed to process algebraic equations. I can clearly see this as distinct from the part that would serialise them and allow me to write out the equation stored internally. In that way, the language of mathematics has precipitated the creation of an internal pattern of thought which one could easily confuse for its serialisation. It seems reasonable to assume that natural language could have similar interactions with the logical parts of the mind. Constructs such as “if/then” and “before/after” may be acquired through language, but exist separately from it.

Language is, therefore, instrumental to human thought as distinct from animal thought because it allows us to more easily acquire and develop new patterns of thinking.


Would such a license fall under the definition of free software? Difficult to say. Counter-proposition: a license which permits training if the model is fully open.

My next project will be released under a GPL-like license with exactly this condition added: if you train a model on this code, the model must be open source & open weights.

In light of the fact that the courts have found training an AI model to be fair use under US copyright law, it seems unlikely this condition will have any actual relevance to anyone. You're probably going to need to not publicly distribute your software at all, and make such a condition a term of the initial sale. Even there, it's probably going to be a long haul to get that to stick.

Not sure why the FSF or any other organization didn't release a license like this years ago.

Because it would violate freedom zero. Adding such terms to the GNU GPL would also be ineffective: they would be considered "further restrictions" and can be removed (see section 7 of the GNU GPL version 3).

Freedom 0 is not violated. GPL includes restrictions for how you can use the software, yet it's still open source.

You can do whatever you want with the software, BUT you must do a few things. For GPL it's keeping the license, distributing the source, etc. Why can't we have a different license with the same kind of restrictions, but also "Models trained on this licensed work must be open source".

Edit: Plus the license would not be "GPL+restriction" but a new license altogether, which includes the requirements for models to be open.


That is not really correct: the GNU GPL doesn't impose any terms whatsoever on how you can use or modify the program. You're free to make a GNU GPL program do anything (i.e., use it however you wish).

I suggest a careful reading of the GNU GPL, or the definition of Free Software, where this is carefully explained.


> You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:

"A work based on the program" can be defined to include AI models (just define it, it's your contract). "All of these conditions" can include conveying the AI model in an open source license.

I'm not restricting your ability to use the program/code to train an AI. I'm imposing conditions (the same as the GPL does for code) onto the AI model that is derivative of the licensed code.

Edit: I know it may not be the best section (the one after regarding non-source forms could be better) but in spirit, it's exactly the same imo as GPL forcing you to keep the GPL license on the work


I think maybe you're mixing up distribution and running a program, at least taking your initial comment into account, "if you train/run/use a model, it must be open source".

I should have been more precise: "If you train and distribute an AI model on this work, it must use the same license as the work".

Using AGPL as the base instead of GPL (where network access is distribution), any user of the software will have the rights to the source code of the AI model and weights.

My goal is not to impose more restrictions to the AI maker, but to guarantee rights to the user of software that was trained on my open source code.


It isn't that difficult: a license that restricts how the program is used is a non-free software license.

"The freedom to run the program as you wish, for any purpose (freedom 0)."


Yet the GPL imposes requirements for me and we consider it free software.

You are still free to train on the licensed work, BUT you must meet the requirements (just like the GPL), which would include making the model open source/weight.


Running the program and analyzing the source code are two different things...?

In the context of Free Software, yes. Freedom one is about the right to study a program.

But training an AI on a text is not running it.

And distributing an AI model trained on that text is neither distributing the work nor a modification of the work, so the GPL (or other) license terms don't apply. As it stands, the courts have found training an AI model to be a sufficiently transformative action and fair use which means the resulting output of that training is not a "copy" for the terms of copyright law.

> And distributing an AI model trained on that text is neither distributing the work nor a modification of the work, so the GPL (or other) license terms don't apply.

If I print a Harry Potter book in red ink, then I won't have any copyright issues?

I don't think changing how the information is stored removes copyright.


If it is sufficiently transformative, yes it does. That’s why “information” per se is not eligible for copyright, no matter what the NFL wants you to think. No, printing the entire text of a Harry Potter book in red ink is not likely to be viewed as sufficiently transformative. But if you take the entirety of that book and publish a list of every word and its frequency, it’s extremely unlikely to be found a violation of copyright. If you publish a count of every word with the frequency weighted by what word came before it, you’re also very likely not to be found to have violated copyright. If you distribute the MD5 sum of the file that is a Harry Potter book, you’re also not likely to be found to have violated copyright. All of these are “changing how the information is stored”.
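The "count of every word weighted by what word came before it" is essentially a first-order Markov table. A toy Python sketch of the two transformations described (the sample text here is a made-up stand-in, not actual book text):

```python
from collections import Counter, defaultdict

def word_frequencies(text):
    """Count how often each word appears (the first example above)."""
    return Counter(text.lower().split())

def bigram_frequencies(text):
    """Count each word's frequency keyed by the word that came before it
    (the second example above): a first-order Markov table."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, cur in zip(words, words[1:]):
        table[prev][cur] += 1
    return table

text = "the boy who lived the boy who waved"
print(word_frequencies(text)["the"])    # 2
print(bigram_frequencies(text)["boy"])  # Counter({'who': 2})
```

Neither output preserves the order of the original text, which is the intuition behind calling such derived statistics transformative.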

Model weights, source, and output.

It does not sound as such if you read the content of the post. It's hardly a misconception to suggest that gay men do not generally fall in love with women.

> It does not sound as such if you read the content of the post. It's hardly a misconception to suggest that gay men do not generally fall in love with women.

Aside from the fact that this premise is incorrect, it's also inapplicable, because, as far as the essay mentions, the father never said he was gay.


How do you know the father never said he was gay? He may well have said that to the man he enjoyed dating as opposed to the wife whom he clearly didn't like very much.

> How do you know the father never said he was gay?

I don't, which is why I didn't say that. I said:

> as far as the essay mentions, the father never said he was gay.


The essay mentions the father being gay, so it's reasonable to assume he said it. Unless you think he was bisexual, but I'm willing to bet you don't actually think that; this is just you being silly, and you do actually think he's gay.

Gross.

Hmm, idk man, you shouldn't say such things.

The author has suffered enough and you call xir that on top ... be more considerate of xir situation.


Personally, I learned WebAssembly by reading through the spec. Can't recommend it enough. It's extremely well written.

I agree, it really is quite approachable.

Whatever higher-minded cause a company might claim, the real reason is profit. A model which appears to advocate a view will not be tolerable to half the population, even if said view is objectively correct. Better to create an even-handed model which is broadly agreeable than one which critiques the user honestly.

For those interested, this is a general expression of Modern Monetary Theory (MMT).

The essential conclusion is that most places with hyperinflation (Weimar Germany, Zimbabwe, etc.) were really suffering supply shocks (reparations, farming collapse), and so you actually can just print money, as long as you're using it to get people working and those working people produce greater value through their work than they are paid in printed money.


Whoda thought that agreeing to build $300 billion of infrastructure for a company with $20 billion in revenue and zero profit was a bad idea?


I use XSLT because I want my website to work for users with JavaScript disabled and I want to present my Atom feed link as an HTML document on a statically hosted site without breaking standards compliance. Hope this helps.


Yeah, but WHY? If they are on the website, why would they want to look at the feed for the website, on the website, in the browser instead of just looking at the website? If the feed is so amazing, why have the website in the first place? Oh yeah, you need something to make the feed off :D


I don't want the feed to look amazing. I just don't want to present a wall of XML text to non-technical users who don't know what an RSS feed is!


Could you run XSLT as part of your build process, and serve the generated HTML?


XML source + XSLT can be considerably more compact than the resulting transformation, saving on hosting and bandwidth.


The Internet saves a lot more on storage and bandwidth costs by not shipping an XSLT implementation with every browser than it does by allowing Joe's Blog to present XML as an index.


You redownload your browser every request‽


I have Arduinos with sensors providing their measurements as XML, with an external XSLT stylesheet to make them user-friendly. The Arduinos have 2KB of RAM and 16 MIPS.

Which build process are you talking about? Which XSLT library would you recommend for running on microcontrollers?


> Which build process are you talking about?

The one in the comment I replied to.


Fair, but that shows the issue at hand, doesn't it? XSLT is a general solution, while most alternatives are relatively specific solutions.

(Though I've written repeatedly about my preferred alternative to XSLT)


> (Though I've written repeatedly about my preferred alternative to XSLT)

Link to example?


I've previously suggested the XML stylesheet tag should allow

    <?xml-stylesheet type="application/javascript" href="https://example.org/script.js"?>
which would then allow the script to use the service-worker APIs to intercept and transform the request.


Oh yes sorry I thought you meant you had a blog post or something on it.


That is not the point: I already have the blog's HTML pages. I want the RSS feed to be an RSS feed, not another version of the HTML.

The XSLT view of the RSS feed exists so that people (especially newcomers) aren't met with a wall of XML text. It should still be a valid XML feed.

Plus it needs to work with static site generators.


No because then it would not be an Atom feed. Atom is a syndication format, the successor to RSS. I must provide users with a link to a valid Atom XML document, and I want them to see a web page when this link is clicked.

This is why so many people find this objectionable. If you want to have a basic blog, you need some HTML documents and an RSS/Atom feed. The technologies required to do this are HTML for the documents and XSLT to format the feed. Google is now removing one of those technologies, which makes it essentially impossible to serve a truly static website.


> Google is now removing one of those technologies, which makes it essentially impossible to serve a truly static website.

How so? You're just generating static pages. Generate ones that work.


You cannot generate a valid RSS/Atom document which also renders as HTML.


So put them on separate pages because they are separate protocols (HTML for the browser and XML for a feed reader), with a link on the HTML page to be copied and pasted into a feed reader.

It really feels like the developer has over-constrained the problem to work with browsers as they are right now in this context.


> So put them on separate pages because they are separate protocols

Would you also suggest I use separate URLs for HTTP/2 and HTTP/1.1? Maybe for a gzipped response vs a raw response?

It's the same content, just supplied in a different format. It should be the same URL.


There are separate URLs for "https:" vs "http:", although they usually serve the same content when both are available (I have seen some where it isn't), while compression (and some other stuff) is decided by headers. However, it might make sense to include some of these things optionally within the URL (within the authority section and/or scheme section somehow): compression, version of the internet, version of the protocol, certificate pinning, etc., delimited in such a way that a program which understands this convention can ignore them. However, that might make a mess.

I had also defined a "hashed:" scheme for specifying the hash of the file that is referenced by the URL, and this is a scheme that includes another URL. (The "jar:" scheme is another one that also includes other URL, and is used for referencing files within a ZIP archive.)


> Would you also suggest I use separate URLs for HTTP/2 and HTTP/1.1? Maybe for a gzipped response vs a raw response?

The difference between HTTP/2 and HTTP/1.1 is exactly like the difference between plugging your PC in with a green cable or a red cable. The client neither knows nor cares.

> It's the same content, just supplied in a different format. It should be the same URL.

So what do I put as the URL of an MP3 and an Ogg of the same song? It's the same content, just supplied in a different format.


> The difference between HTTP/2 and HTTP/1.1 is exactly like the difference between plugging your PC in with a green cable or a red cable. The client neither knows nor cares.

Just like protocol negotiation, HTTP has content negotiation and XML postprocessing for exactly the same reason.

> So what do I put as the URL of an MP3 and an Ogg of the same song? It's the same content, just supplied in a different format

Whatever you want? If I access example.org/example.png, most websites will return a webp or avif instead if my browser supports it.

Similarly, it makes sense to return an XML with XSLT for most browsers and a degraded experience with just a simple text file for legacy browsers such as NCSA Mosaic or 2027's Google Chrome.


> Whatever you want? If I access example.org/example.png, most websites will return a webp or avif instead if my browser supports it.

So, you need a lot of cleverness on the server to detect which format the client needs, and return the correct thing?

Kind of not the same situation as emitting an XML file and a chunk of XSLT with it, really.

If you're going to make the server clever, why not just make the server clever enough to return either an RSS feed or an HTML page depending on what it guesses the client wants?


> If you're going to make the server clever, why not just make the server clever enough to return either an RSS feed or an HTML page depending on what it guesses the client wants?

There's no cleverness involved; this is an inherent part of the HTTP protocol. But Chrome still advertises full support for XHTML and XML:

    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
But importantly, for audio/video files, that's still just serving static files, which is very different from having to dynamically generate different files.


Then the server should supply the right format based on the `Accept` header, be it `application/rss+xml` or `application/atom+xml` or `text/xml` or `text/html`.

Even cheaper than shipping the client an XML and an XSLT is just shipping them the HTML the XSLT would output in the first place.


That's not exactly cheap on an Arduino Uno 3 with 2KB of RAM.

But regardless, someone suggested just including a script tag with xmlns of xhtml as alternative, which should work well enough (though not ideal).


How many people out of the world's nearly eight billion population, would you estimate, are attempting to host their blog including HTML posts and RSS feeds on an Arduino?


A lot of IoT devices use this strategy, actually. A lot. Significantly more than are using e.g. WebUSB.

Nonetheless, by that same argument you could just kill HN off. A lot of projects have a benefit that far outweighs their raw usage numbers.


I guess that tracks for Internet of Shitty Insecure Badly-Designed Things.

Come up with the worst possible way to present information over a web page.

What device with 2kB of RAM is going to generate any kind of useful RSS feed? Why would you not use something more capable, which is not only going to have more memory but also a lower power consumption?


> What device with 2kB of RAM is going to generate any kind of useful RSS feed?

Such devices usually don't generate RSS feeds, but e.g. sensor measurements as XML (which can be processed directly, or opened in a browser with XSLT to generate a website and an SVG chart from it)

> Why would you not use something more capable, which is not only going to have more memory but also a lower power consumption?

Because anything else will have >100× more power consumption?


Compare the power consumption of the atmega328p on an Arduino Uno 3 mentioned further up this thread, with the power consumption of literally any ARM chip smaller than the sort of thing you'd use in a laptop.

So in other words, no more static sites?


A static site can inspect headers. Static sites still have a web server.

A static site cannot inspect headers. There is no HTML, or even JavaScript function you can put in a file to inspect the headers before the file is sent to the client.

A static site is a collection of static files. It doesn't need a server, you could just open it locally (in browsers that don't block file:// URI schemes). If you need some special configuration of the server, it is no longer a static site. The server is dynamically selecting which content is served.


Oh, difference in definitions. You mean "non-configurable web server." Because you could definitely use a static site generator to create multiple versions of the site data and then configure your web server to select which data is emitted.

But agreed; if your web server is just reflecting the filesystem, add this to the pile of "things that are hard with that kind of web server." But perhaps worth noting: even Apache (and Python's http.server, with a small handler subclass) can select the file to emit based on the Accept header.
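As a sketch of that last point, here's roughly how one might subclass Python's http.server to pick a static file based on the Accept header. The file names (feed.html, feed.xml), the /feed path, and the bare substring matching are illustrative assumptions, not a recommended negotiation algorithm:

```python
from http.server import SimpleHTTPRequestHandler, HTTPServer

def pick_variant(accept_header):
    """Naive negotiation: serve HTML if the client asks for it (browsers
    send text/html in Accept), otherwise fall back to the Atom XML."""
    if "text/html" in (accept_header or ""):
        return "feed.html", "text/html"
    return "feed.xml", "application/atom+xml"

class FeedHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/feed":
            chosen, ctype = pick_variant(self.headers.get("Accept"))
            # Rewrite the path and let the base class serve the chosen file.
            # (A fuller handler would also force Content-Type to ctype and
            # emit a Vary: Accept header so caches behave.)
            self.path = "/" + chosen
        return super().do_GET()

# Usage sketch: HTTPServer(("", 8000), FeedHandler).serve_forever()
```

Of course, whether this still counts as a "static site" is exactly the definitional disagreement in this thread.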


A static site is one that you can serve through static hosting, where you have no control over the web server or its configuration. There is not some extra thing which is a static site with dynamic content. “Static” means “doesn't change.” The document served doesn't change subject to the person receiving it. You are talking about a solution that is dynamic. That does change based on who is making the request.

>you could definitely use a static site generator to create multiple versions of the site data and then configure your web server to select which data is emitted

And this web-server configuration would not exist within the static site. The static site generator could not output it, therefore it is not a part of the static site. It is not contained within the files output by the static site generator. It is additional dynamic content added by the web server.

It breaks the fundamental aspect of a static site, that it can be deployed simply to any service without change to the content. Just upload a zip file, and you are done.


Like I said, difference in definitions. https://www.google.com/search?q=static+site+serving+with+apa...

I get your meaning; I've just heard "static site" used to refer to a site where the content isn't dynamically computed at runtime, not a site where the server is doing a near-direct-mapping from the filesystem to the HTTP output.

> Just upload a zip file, and you are done.

This is actually how I serve my static sites via Dreamhost. The zipfile includes the content negotiation rules in the `.htaccess` file.

(Perhaps worth remembering: even the rule "the HTTP responses are generated by looking up a file matching the path in the URL and echoing that file as the body of the GET response" is still a per-server rule; there's no aspect of the HTTP spec that declares "The filesystem is directly mirrored to web access" is a thing. It's rather a protocol used by many simple web servers, and most of them allow overrides to do something slightly more complicated while being one step away from "this is just the identity function on whatever is in your filesystem, well, not technically the identity function because unless someone did something very naughty, I don't serve anything for http://example.com/../uhoh-now-i-am-in-your-user-directory").


>I must provide users with a link to a valid Atom XML document, and I want them to see a web page when this link is clicked.

Do RSS readers and browsers send the same Accept header?


> Security? MUCH worse.

This is patently false. It is much better for security if you use one of the many memory-safe implementations of it. This is like saying “SSL is insecure because I use an implementation with bugs”. No, the technology is fine. It's your buggy implementation that's the problem.


XSLT used as a pre-processor is obviously also a fundamentally better model for security because... it's used as a preprocessor. It cannot spy on you and exfiltrate information after page load because it's not running anymore (so you can't do voyeuristic stuff like capture user mouse movements or watch where they scroll on the page). It also doesn't really have the massive surface Javascript does for extracting information from the user's computer. It wasn't designed for that; it was designed to transform documents.

