Hacker News | hallman76's comments

I inherited an S3 bucket where hundreds of thousands of files were written to the bucket root. Every filename was just a UUID. ls might work after waiting to page through to get every file. To grep you would need to download 5 TB.
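For anyone stuck with a similar bucket, here is a rough sketch of what "paging through" looks like. S3 returns at most 1000 keys per list request, so a full listing means following continuation tokens. The field names (`Contents`, `IsTruncated`, `NextContinuationToken`) are the ones the ListObjectsV2 API uses; `iter_keys`, `FakeS3` and the bucket name are made up for illustration, and the fake client only exists so the loop can run without credentials.

```python
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str) -> Iterator[str]:
    """Yield every key in a bucket, following continuation tokens page by page."""
    kwargs = {"Bucket": bucket}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3:
    """Hypothetical in-memory stand-in for an S3 client, used only to exercise the loop."""

    def __init__(self, keys, page_size=2):
        self.keys = keys
        self.page_size = page_size

    def list_objects_v2(self, Bucket, ContinuationToken="0"):
        start = int(ContinuationToken)
        end = start + self.page_size
        page = {
            "Contents": [{"Key": k} for k in self.keys[start:end]],
            "IsTruncated": end < len(self.keys),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page


keys = list(iter_keys(FakeS3(["a", "b", "c", "d", "e"]), "example-bucket"))
print(keys)
```

With a real client (for example one created by boto3) the same loop should apply, but with hundreds of thousands of keys it still means hundreds of sequential requests.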


There is a nascent movement of families bringing back landlines for exactly this reason.


We will never get back the collective man-decades of time that have been burned by this format. When will the madness stop?


PDF is effectively digital paper, and it works really well for this. When I made PDFs 20 years ago, I knew they would always look the same on every device, including on paper, and they did, and they still do. In addition, a document is a single file, reasonably compact, looks good on any resolution, and is generally searchable. Even if not ideal, it can also support scans of paper documents in a way that can be sent to a printer on the other side of the planet and you will get the same result as if you had used a copier.

Data extraction is hard, but that's not what it is designed for; it is for people to read, like paper documents.

Far from being "mad", it is remarkably stable. It has some crazy features, and it is not designed for data extraction (but doesn't actively prevent it!). But look at the alternatives. Word documents? HTML? SVG? One of the zillion XML-based document formats? Markdown? Is any one of these suitable for writing, say, a scientific paper (with maths, tables, graphics...) in a way that is readable by a human on a computer or in print, will still be readable in decades, and is easier for a machine to process than a PDF?


When we get an alternative that can:

(1) be stored in a single file

(2) Allow tables, images and anything else that can be shown on a piece of paper

(3) Won't have animation, fold-out text, or anything that cannot be shown on a piece of paper

(4) won't require Javascript or access to external sites

That means never. We got lucky that we at least got PDF before "web designers" made (3) impossible and marketers made (4) impossible.


> (3) Won't have animation, fold-out text, or anything that cannot be shown on a piece of paper

> (4) won't require Javascript or access to external sites

So about that... https://opensource.adobe.com/dc-acrobat-sdk-docs/library/jsa...


That's the power of legacy. Adobe may think they can add junk to PDF, like JavaScript support or the "3D PDF" in lmz's link below, but since PDF viewers form a diverse ecosystem, those features won't see much adoption.

And this is actually pretty great, maybe even the best part of PDFs! Companies _know_ that publishing PDFs that require 3D graphics or JavaScript means many people won't be able to see them, so they publish good, static PDFs, maintaining a virtuous cycle.



Did you miss the meaning of the word "require"?


(-1) be a vector format that never gets pixelated

(0) that reproduce everywhere on any OS perfectly

(0.5) that supports everything any typographic engineer has ever wanted, past and future

Bitmap formats are ruled out by clause -1, Office file formats are disqualified by clause 0, and Markdown doesn't satisfy clause 0.5. Otherwise a Word .doc format covers most of clauses 1-4.


> (0) that reproduce everywhere on any OS perfectly

Can somebody explain why this isn't the case for HTML? I'm frequently in a situation where a website that mimics printed pages fails to render the same between Firefox and Chrome. I wish to understand the primary culprit here. I thought all of the CSS units were completely defined?


I think this is the result of 1) it being a moving target and 2) HTML and CSS being a de facto standard rather than de jure, where the (differing) implementations define at least part of the spec.

You also can't really embed fonts in an HTML file; you rely on linking instead, and those links can rot. Apparently there has been some work towards it (base64 encoded), but support may vary. And you need to embed the whole font; I don't think you can do character subsets easily.
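For reference, the base64 approach mentioned above can be sketched roughly like this: the font bytes are inlined into an `@font-face` rule as a `data:` URL, so the page no longer depends on an external link. This is only a sketch under assumptions. `font_face_css` and the `ExampleSerif` family name are hypothetical, the bytes are a placeholder rather than a real WOFF2 file, and any subsetting would have to happen before encoding.

```python
import base64


def font_face_css(font_bytes: bytes, family: str) -> str:
    """Return an @font-face rule with the font inlined as a base64 data URL."""
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url(data:font/woff2;base64,{encoded}) format('woff2');\n"
        "}"
    )


# Placeholder bytes stand in for a real WOFF2 file read from disk.
css = font_face_css(b"not a real font", "ExampleSerif")
html = (
    "<!doctype html><html><head>"
    f"<style>{css}</style>"
    "</head><body>...</body></html>"
)
print(html)
```

The obvious cost is size: base64 inflates the font by about a third, and the whole thing is repeated in every document that embeds it.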


Probably due to different font rendering in the OS.


Behold a Bitmap.

But for real, that's a pretty easy set of hurdles. Really the barrier is the psychological fallacy that PDFs are immutable.


Should have added "looks good on screen and on paper", "stores text compactly" and "multiple pages supported" :) And yes, that's a pretty easy set of hurdles. I wish we'd standardized on DjVu instead.

Re "PDFs are immutable." - that's not a psychological fallacy, that's a primary advantage of PDFs. If I wanted a mutable format, I'd take an odt (or rtf or a doc). An "output only" format lets one use the very latest version of an editor app while the result still works even in ancient readers, something very desirable in many contexts.


PDFs are not really immutable. I use Okular all the time to write my "notes" (it's just text that you can place anywhere) on top of a PDF form and then print out a new completely filled out PDF. The only thing I do by hand is sign the physical paper.


Your understanding of immutability feels skewed here. Every time you annotate the PDF, it creates a new version. Even when you overwrite the same file, the structure of the original document changes, therefore creating a new document, ultimately making it "the ship of Theseus.pdf"

Sure, someone may try applying the same argument to .doc and .txt documents, yet there is a general consensus that PDFs were designed to "resist change". You can illustrate the point yourself by making changes to a .txt document and then removing them: the MD5 of the file would remain the same.
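A minimal sketch of that .txt experiment, using in-memory bytes instead of a real file (the contents are made up for illustration): appending a line changes the MD5, and removing exactly that line restores it.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of some file contents."""
    return hashlib.md5(data).hexdigest()


original = b"hello, world\n"
edited = original + b"a temporary note\n"   # make a change
reverted = edited[: len(original)]           # remove exactly that change

print(md5_hex(original) == md5_hex(edited))    # False
print(md5_hex(original) == md5_hex(reverted))  # True
```

This only shows the plain-text half of the argument; whether a given PDF editor writes back byte-identical output after an undo depends on the editor.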


Have you ever used Acrobat? Not "Acrobat Reader", but regular Acrobat, the most popular PDF editor. It's from Adobe, and it definitely does not "resist" edits.


I got what you're saying the first time, and you still seem to be entirely missing the point. Immutability means that an object cannot be modified after it's created, and any changes result in a new object rather than altering the original.

You're saying "well, look, I can modify this pdf and I can even undo my changes...", what I'm saying is that whenever you modify a PDF, you're essentially creating a new file rather than truly "undoing" changes in the original. PDFs have complex internal structures with metadata, object references, and possibly compression that make bit-perfect restoration challenging.

Unlike plain text files where changes can be precisely tracked and reversed at the character level, PDFs don't easily support this kind of granular reversibility. Even "undoing" in PDF editors often means generating yet another variant rather than returning to the exact binary state of the original.

Take a look at how Git stores PDFs - when the delta approach doesn't work efficiently since even small logical changes can result in significantly different binary files with completely different checksums, it stores EVERY version of the same document in a separate blob object.

When you annotate a PDF and then later change your mind, undo all the annotations, and save it, it may look the same as the original to your eyes, but in digital reality it will be a different file.


The people who act as if PDFs are legally immutable are not performing MD5 comparisons.

Also, that isn't even an intention of the file format as far as I can see; it's mostly a byproduct of cruft and backwards compatibility.

No one would call .doc immutable just because it's very difficult to move an image and then restore that image to the original location.

In this context, people will save something out as PDF and store it because they think it cannot be modified.

But as has been rightly pointed out, that's not the case.


I feel like I'm talking to a toddler, sigh. Let me try again.

Immutability doesn't mean that an "object cannot be modified"; it means that in order to modify an object, you must create a new (clone) object. That's all I meant to say. Sure, you can get pedantic or otherwise and say "yes, PDFs are immutable" or "no, PDFs aren't immutable in some contexts", etc., and depending on the point of view, both of these claims could be correct. I'm not arguing about the specifics.

I'm just saying that your explanation of why you think pdfs are not immutable hinges on an incorrect idea of what immutability actually is.

There's a rigorous definition of "immutability" in computer science. For example, strings in many programming languages are immutable, but that doesn't mean you can't manipulate them; it just means that operations that appear to modify strings actually create new string objects.
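To make the string case concrete, here is a small Python sketch (the variable names are just for illustration): a method that looks like it modifies the string returns a new one, and a true in-place change is rejected.

```python
# Python strings are immutable: "modifying" one produces a new object.
name = "report.pdf"
upper = name.upper()      # returns a new string; `name` is untouched

print(name)   # report.pdf
print(upper)  # REPORT.PDF

# In-place modification is rejected outright.
try:
    name[0] = "R"
    mutated = True
except TypeError:
    mutated = False

print(mutated)  # False
```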

The best illustration of immutability is found in programming languages that are immutable by default, e.g., Clojure. Once someone groks the basics, it becomes really clear what that thing is about.


> I feel like I'm talking to a toddler, sigh.

Me too, but I'm done. Have fun!


> I got what you're saying the first time,

That wasn't me. Multiple people were taking the time to help you understand.


What's immutable, without tools to decompress and possibly perform further de-obfuscation of text streams, is the typical way publishing software encodes text into streams inside PDFs.

It remains possible to have a pdf with text that is easily mutable with any text editor.

Even if text inside a pdf is annoyingly encoded, you can always just replace the appropriate object/text streams... if you can identify the right one(s). You can extract and edit and re-insert, or simply replace, embedded images as well.

I don't think "this format promotes, as the norm, so much obfuscation of basic text objects that it becomes impractical to edit them in situ without wholesale replacement" is the win you think it is.

"Looks good on paper" has to do with the rendering engine (largely high-DPI and good font handling/spacing/kerning), not PDF as a content layout/presentation format. A high-quality software rasterizer (for postscript or PDF, often embedded in the printer)—not the PDF file format—has been the magic sauce.

Today, some large portion of end-user interaction with PDFs is via rendering into a web browser DOM via javascript. Text in PDFs is rendered as text in the browser. Perhaps nothing else demonstrates more clearly that the "PDF is superior" argument is invalid.


> you can always just replace the appropriate object/text streams

Or right-click and select Edit. Works in several PDF editors, on both text and image content.


Word can edit pretty much any PDF these days; the issue is that it will often garble the result.


PDFs are not immutable.


Why can't this be done with epub? Single file, all files are packed within the zip, no javascript needed but can be included. Allows for markup and forms, just like pdf.


EPUB is, like HTML, reformattable by the reader, so documents aren't fixed in the way PDFs are.


A subset of HTML and CSS surely does that to a large degree. Data urls solve the single file problem.


Postscript fits that bill better.



> For a DVI file to be printed or even properly previewed, the fonts it references must be already installed.

If you want alternatives, I'd choose DjVu. But it's too late now, everyone has converged on PDFs, and the alternatives are not good enough to warrant the switch.


DVI isn’t suitable as you’d still have to intuit where the paragraph- and even word-breaks are; what’s body text vs. headers/footers, sidebars, captions, etc; never mind what math expression a particular jumble of characters and rules came from.


The solution has always been in plain sight: just make an XML-based format. Nobody liked it, except OpenDocument and eventually Microsoft. Though these formats serve a different purpose, a new, similar one could be created with picture-perfect features.


TIL that there is a W3C WebMonetization spec - thanks for sharing.

Taking a look, it seems quite underwhelming[0] :( Lack of monetization on the web gave us the ad-driven content model that LLMs are now hoovering up.

Have there been any other proposals for monetization?

[0] https://webmonetization.org/specification/


For desktop folks, I use the Stylebot Chrome Extension https://chromewebstore.google.com/detail/stylebot/oiaejidbmk...

I crank up the font size and set a serif font on the index page to make it easier to scan.


I had the same issue. Was able to use it to join via the discord app ("add a server").


> because say the CEO of a company is friendly to the President

OP didn't say politics had anything to do with it. Let them nerd up if they want to.

Centralization around specific platforms has plusses and minuses. Having alternatives drives innovation.


He's flooding the zone and media is once again failing us.


I really like this idea! Could you share more about your letters?

What sorts of things do you include? I assume different topics for different people. Do the loved ones know that these letters exist?


I highly recommend it!

I left instructions to my children in a safe they know about. I also keep digital copies on a flash drive in the safe.

Each letter is printed, signed and sealed in an envelope. Each has a name, title, and sometimes directions on where to send. In some cases, I don't know the current address of the recipient so I just put my best guess and hope my children can find them.

Generally I polish them and add new content based on personally meaningful changes in my life that year, or new interactions I had with the person.

In rare cases, such as the one to my ex-wife, I have rewritten the entire thing. It grows shorter each year, but I remain polite.

I also plan to record a short video for my children each year they can watch or share. Obviously I can't cover everything in it but it will be short, sweet, and probably funny. I will just keep each one year over year so they have more material to work with. I will store copies on two or three USB flash drives.

My idea for this was inspired ~10 years ago by a co-worker whose friend--a father and husband--was killed by a piece of metal debris that flew into his car windshield while he was driving on the freeway.

This past weekend I found a memorial bench dedicated to a man who died of a quick late-stage cancer, not much older than me. The motto on the plaque read: DON'T WAIT


I'm pretty sure that the owners of Belmont Books in Belmont, MA also opened their store out of a love of bookstores & the community surrounding it. Best of luck to you! https://www.belmontbooks.com/ (no affiliation)


This is my hometown, fun to see this here. I've never been, but it sounds great.

