No they are not. I've asked that question to quite a few well-published economists studying innovation and the off-the-record answer is always:
> Patents are an extremely noisy and biased measure of innovation; they conflate innovation with a million other variables, etc. But we have all the patents and patent applications, available for free since the beginning of time, so that's what we'll use. If you find another good measure that's as comprehensive, let us know and we'll use that instead; meanwhile <shrug emoji>
TBF the College Park location is by far the most important one, the most visited one, and the one that holds the most interesting stuff. I've never met anyone who went to the Kansas one or whatever.
I have about 500,000 news articles I am parsing. OpenAI models work well, but I found Gemini made fewer mistakes.
The problem is that they give me a terrible 10k RPD (requests per day) limit. To move up to the next tier, they require a minimum amount of spending, but I can't reach that amount even when maxing out the RPD limit for multiple days in a row.
I emailed them twice and completed their forms, but everyone knows how this works. So now I'm back at OpenAI, with a model that makes a few more mistakes but won't 403 me after half an hour of use because of rate limits.
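For scale: assuming one request per article, 500,000 articles at a 10k RPD cap is at least 50 days of requests. Below is a minimal Python sketch of that arithmetic plus a hypothetical client-side guard (`DailyQuota` is made up for illustration, not part of any Gemini or OpenAI SDK) that stops sending once the daily cap is reached instead of waiting for the server to reject requests. It assumes a rolling 24-hour window; the provider's actual reset rule may differ.

```python
import math
import time
from collections import deque


def days_needed(total_requests: int, requests_per_day: int) -> int:
    """Whole days required to send total_requests under a fixed daily cap."""
    return math.ceil(total_requests / requests_per_day)


class DailyQuota:
    """Hypothetical client-side guard for a per-day request cap.

    Assumes a rolling 24-hour window. The provider still enforces its own
    limits server-side; this only avoids sending requests that would fail.
    """

    WINDOW_SECONDS = 24 * 60 * 60

    def __init__(self, max_per_day: int, clock=time.monotonic):
        self.max_per_day = max_per_day
        self.clock = clock      # injectable so the window can be tested
        self._sent = deque()    # timestamps of requests still inside the window

    def _evict_old(self, now: float) -> None:
        # Drop timestamps that have aged out of the 24-hour window.
        while self._sent and now - self._sent[0] >= self.WINDOW_SECONDS:
            self._sent.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits under the cap."""
        now = self.clock()
        self._evict_old(now)
        if len(self._sent) >= self.max_per_day:
            return False
        self._sent.append(now)
        return True

    def remaining(self) -> int:
        """Requests still allowed in the current window."""
        self._evict_old(self.clock())
        return self.max_per_day - len(self._sent)
```

Usage: call `quota.try_acquire()` before each API call and pause the batch job when it returns `False`; `days_needed(500_000, 10_000)` gives the 50-day lower bound.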
The rate limits apply only to the Gemini API. There is also Vertex AI on GCP, which offers the same models (and even more, such as Claude) at the same pricing, but with much higher rate limits (basically none, as long as they don't need to cut anyone off with provisioned throughput, iiuc) and with a process to get guaranteed throughput.
I did some very broad testing of several PDF text extraction tools recently, and PDF.js was one of the slowest.
My use case was specifically testing their performance as command-line tools, so that will skew the results to an extent. For example, PDFBox was very slow because you pay the JVM startup cost with each invocation.
Poppler's pdftotext utility and pdfminer.six were generally the fastest. Both produced serviceable plain-text versions of the PDFs, with minor differences in where they placed paragraph breaks.
I also wrote a small program that extracted text using Chrome's PDFium; it performed well too, but building that project can be a nightmare unless you're Google. IBM's Docling project, which uses ML models, produced by far the best formatting, preserving much of the document's original structure, though it was, of course, enormously slower and more energy-hungry.
Disclaimer: I was testing specific PDF files that are representative of the kind of documents my software produces.
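A minimal Python sketch of the kind of command-line timing described above, assuming each tool is run as a fresh process so startup cost (e.g. the JVM for PDFBox) lands in every measurement. The helper names `time_command` and `compare` are hypothetical, and the demo uses placeholder Python commands; real invocations would look like `["pdftotext", "sample.pdf", "-"]` (Poppler, `-` writes the text to stdout) or `["pdf2txt.py", "sample.pdf"]` (pdfminer.six), where `sample.pdf` is a stand-in file name.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], runs: int = 5) -> float:
    """Median wall-clock seconds for running cmd as a fresh process.

    Each run starts a new process, so interpreter/JVM startup is
    included in every measurement, just as it is for a CLI tool.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(commands: dict[str, list[str]], runs: int = 5) -> list[tuple[str, float]]:
    """Time each named command and return (name, seconds), fastest first."""
    results = [(name, time_command(cmd, runs)) for name, cmd in commands.items()]
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder commands; swap in the real extractor invocations.
    demo = {
        "python-startup": [sys.executable, "-c", "pass"],
        "python-sleep": [sys.executable, "-c", "import time; time.sleep(0.2)"],
    }
    for name, seconds in compare(demo, runs=3):
        print(f"{name}: {seconds:.3f}s")
```

Using the median rather than the mean keeps one slow cold-cache run from dominating the result; calling a library in-process (e.g. pdfminer.six from Python) would remove the startup cost entirely and give different rankings.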
You really don't need many users to flag a post. Get five users constantly flagging anything that makes Trump look bad (and a complicit mod that doesn't undo this) and that's all you need.
It would be really interesting to see the list of people who flag these stories. At this point, I mostly browse HN by clicking on COMMENTS so I can see the actual stories that got flagged (which always have to do with democracy being eroded by Trump).
Figure out your residency (digital nomad visa, non-lucrative visa based on passive income, etc.). Find where you want to live, then find a property within that area. Use word of mouth: find someone the locals trust, have them help you with maintenance, and pay them for that help.
For banking, I use both Wise and Santander.
Idealista is by far the biggest real estate website in Spain; use it to start your search. Some properties are held by local agencies, and you'll have to talk to folks on the ground to see those listings (a wildly different market from Zillow and Redfin in the States).
There is an annual wealth tax on globally held assets >€2M; if this might apply to you, seek help from a Spanish tax advisor.
I strongly advise buying in person rather than over the Internet to avoid scams and fraud. Use a legal advisor to facilitate your purchase, as you would an attorney in the States.