According to LinkedIn, the author worked on Apache Arrow, so I suspect they knew about it. It feels like this is intended as a much more lightweight option, but I'd be interested to hear their take.
That's right, it's intended to be more lightweight since it's built from the ground up in pure Rust. Apache Drill also only focuses on serving SQL as the user interface, while ROAPI wants to provide a pluggable interface to support all use-cases. For example, we can plan GraphQL and REST API calls into query plans and execute them efficiently using DataFusion.
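To give a flavour of the engine underneath, here's a rough standalone DataFusion query in Rust (API names per a recent datafusion release, assuming tokio; this is not ROAPI's actual code, and the table/file names are made up):

    // Rough sketch of running a SQL query through DataFusion, the engine ROAPI builds on.
    use datafusion::prelude::*;

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        let ctx = SessionContext::new();
        // Register a CSV file as a queryable table.
        ctx.register_csv("cities", "cities.csv", CsvReadOptions::new()).await?;
        // A GraphQL or REST call would get planned into an equivalent query like this one.
        let df = ctx.sql("SELECT name, population FROM cities WHERE population > 1000000").await?;
        df.show().await?;
        Ok(())
    }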
This is a great article, and one I wish I'd read a long time ago, before a lot of inefficient learning.
One thing I'm still keeping an eye out for is how to best learn when time is limited.
As I get older, free time is at a premium, with job/kids/life/etc. meaning that some days even 20 minutes of practice is a luxury.
So I've always been curious how to make the most of that and unexpected slots of free time.
Should I just work on technical exercises/scales? Have a set of things written down to work on in case free time appears? Follow the advice in the article? Or accept that nothing worthwhile can be done in such short time and noodle around?
I am in the same situation as you. What keeps me going is practicing songs that I enjoy. If I am working on a song that I enjoy, I am excited to pick up the guitar and am more or less able to find time every day. I also see progress because I am learning what I enjoy.
I think they mean that they add "reddit" to the end of the search strings they submit to Google.
If you're looking for discussion on a certain topic, it's often a reasonable starting point.
Curious if there are any chorded keyboards that use tilt sensors to make character input depend on button plus position, rather than button combinations alone.
Another memorisation aid I accidentally observed was using different pens/inks for note taking.
I started taking a minor interest in different pens at some point, and have about a dozen pens I switch between (a mix of cheap fountain pens and rollerballs, mostly).
I found that I started remembering which pen or which colour ink I wrote certain things with; it seemed to add yet another memory link to whatever I was writing.
A company I worked for ~7 years ago ran its own focused web crawler (fetching ~10-100m pages per month, targeting certain sections of the web).
There were a surprising number of sites out there that explicitly blocked access to anyone but Google/Bing at the time.
We'd also get a dozen or so complaints a month from sites we'd crawled, mostly upset about us using up their bandwidth and telling us that only Google was allowed to crawl them (despite having no robots.txt configured to say so).
Isn't that the website owner's right though? I'm not sure I understand the problem here.
If Google is taking traffic and reducing revenue, a company can disallow it in robots.txt. Google will actually follow those rules - unlike most of the others that are supposedly in this second class.
Yup, no problem here, was just making an observation about how common such blocking was (and about the fact that some people were upset at being crawled by someone other than Google, despite not blocking them).
The company did respect robots.txt, though it was initially a bit of a struggle to convince certain project managers to do so.
No. The internet is public. Publishers shouldn't get any say in who accesses their content or how they do it. As far as I'm concerned, the fact that they do is a bug.
I usually recommend setting only Google/Bing/Yandex/Baidu etc to Allow and everything else to Disallow.
Yes, the bad bots don't give a fuck, but even the non-malicious bots (ahrefs, moz, some university's search engine, etc.) don't bring any value to me as a site owner; they take up bandwidth and resources and fill up logs. If you can remove them with three lines in your robots.txt, that's less noise. Universities especially, in my opinion, often behave badly and are uncooperative when you point out that their throttling does not work and they're hammering your server. Giving them a "Go Away, You Are Not Wanted Here" in robots.txt works for most, and the rest just get blocked.
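For reference, an allowlist-style robots.txt along those lines looks roughly like this (user-agent tokens from memory - double-check each crawler's docs for the exact names):

    # Allow the big search engines, ask everyone else to stay out.
    User-agent: Googlebot
    User-agent: Bingbot
    User-agent: YandexBot
    User-agent: Baiduspider
    Allow: /

    User-agent: *
    Disallow: /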
From some I could, but why would I? If they're not adding value and they don't want to behave, I don't see a reason to spend money to adapt my systems to be "inclusive" towards their usage patterns.
In context, you're justifying blocking all automated traffic, even that which does behave, by pointing out that some of it doesn't. That attitude seems lazy at best, malicious at worst.
Now that's a really good point. I wonder why there isn't a standard protocol for signalling upstream that a particular connection is abusive, and asking them to rate limit the path at the source on your behalf? It would certainly add complexity, but the current situation is hardly better.
I was thinking along similar lines the other day. One of the things I realised was that imagination played a much bigger part in my enjoyment of games than it does now (as it did with playing with toy cars, lego etc. back at that age).
I vividly remember playing one of the Shinobi games on the Sega Master System, and spending a huge amount of time wandering back and forth in the levels thinking about the townsfolk who occupied various buildings and their work days.
There was also a mountain range in one of the backgrounds on one level, and I recall spending time planning to explore it later, even pausing the game to draw a map of an imagined village there.
Of course now the vocabulary of games (and my understanding of them) has changed a lot, and it's far more obvious what the limits of a game and its interactive areas are - to me they're now throwaway pieces of entertainment, no longer worlds to inhabit and explore.
> I vividly remember playing one of the Shinobi games on the Sega Master System, and spending a huge amount of time wandering back and forth in the levels thinking about the townsfolk who occupied various buildings and their work days.
I remember Choplifter for the Sega Master System. Half the fun was the story in my head regarding the hostages, the behind-the-scenes operation of the rescue attempt, etc. And the tragedy of it all going wrong when you'd crash with people on board...
I'd be interested in hearing more about some of these.
We're currently using GitLab's CI/CD for ~50 private repositories, covering ~7 different languages, without any issues. That includes testing, Docker builds and documentation generation for most projects.
We only use private runners though, so I can't say whether any of the issues are specific to GitLab's own runners.
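For illustration, a stripped-down pipeline for one of those repos might look something like this (job names, images and commands are made up for the example, not our actual config):

    stages:
      - test
      - build
      - docs

    test:
      stage: test
      image: rust:latest
      script:
        - cargo test

    docker-build:
      stage: build
      image: docker:latest
      services:
        - docker:dind
      script:
        - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .

    docs:
      stage: docs
      image: rust:latest
      script:
        - cargo doc --no-deps
      artifacts:
        paths:
          - target/doc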
So runners are a good example:
The runner config is stored in a config.toml on the runner. If you want autoscaling runners, it gets very complex, so you might want to version control it or back it up - but you have to store your AWS secrets in this config.toml. It also contains hashes that have to be matched with settings in the UI, and of course tags, which need to be tied into your .gitlab-ci.yml. UI configuration changes take place instantly and there's no undo. So you have configuration in three places, only one of which is version controlled. And let's say you have some new runners with new tags -- you aren't going to be able to run an old pipeline on the new runner.
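For anyone who hasn't run into it, a docker+machine autoscaling config.toml looks roughly like this (keys from memory of the gitlab-runner docs, values are placeholders):

    concurrent = 4

    [[runners]]
      name = "autoscale-runner"
      url = "https://gitlab.example.com/"
      token = "TOKEN_ISSUED_AT_REGISTRATION"   # has to line up with what the UI knows about
      executor = "docker+machine"
      [runners.docker]
        image = "alpine:latest"
      [runners.machine]
        IdleCount = 2
        IdleTime = 1800
        MachineDriver = "amazonec2"
        MachineName = "ci-runner-%s"
        MachineOptions = [
          "amazonec2-access-key=YOUR_AWS_ACCESS_KEY",   # AWS secrets sit here in plain text
          "amazonec2-secret-key=YOUR_AWS_SECRET_KEY",
          "amazonec2-region=eu-west-1",
        ]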