Hacker News | Rietty's comments

I work in a Data Engineering/Operations role which focuses heavily on financial datasets. Everything is within AWS and Snowflake, and each table can easily have >100M records of any type of random data (there is a lot of breadth). General day to day is creating jobs that will process large amounts of input data and storing them into Snowflake, sending out tons of automated reports and emails to decision makers, as well as gathering more data from the web.

All of this is done in a Python environment, with Rust used to speed up critical code/computations. (The Rust code is delivered as Python modules.)

The work is interesting and different challenges arise when having to process and compute datasets that are updated with 10s of TBs of fresh data daily.


Hello fellow data engineer! I feel like I don't see a lot of us around / don't see many popular submissions dealing with data engineering. I also work with financial datasets (think aggregated consumer transaction data) for use by investors and corporate clients


Many of my datasets are similar!


> General day to day is creating jobs that will process large amounts of input data and storing them into Snowflake

About how long do these typically take to execute? Minutes, tens of minutes, hours?

My work is very iterative, with a feedback loop that is only a few minutes long.


Depends on the dataset: anywhere from seconds to tens of minutes, depending on the preprocessing needed.


Some of the largest are a few billion rows, so we sample randomly when developing code and then execute it on the full dataset.
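A rough sketch of that sample-then-run-on-everything workflow, in plain Python. The `sample_rows` and `transform` helpers and the toy rows are made up for illustration; a real job would more likely push the sampling down into the warehouse query rather than do it in memory.

```python
import random

def sample_rows(rows, fraction, seed=0):
    """Keep each row independently with probability `fraction`.

    A fixed seed makes the development sample reproducible between runs.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]

def transform(row):
    # Hypothetical per-row logic standing in for the real computation.
    return {"id": row["id"], "amount_cents": round(row["amount"] * 100)}

# Toy stand-in for a large table.
rows = [{"id": i, "amount": i * 1.5} for i in range(100_000)]

# Iterate quickly on roughly 1% of the rows while developing...
dev_rows = sample_rows(rows, fraction=0.01)
dev_out = [transform(r) for r in dev_rows]

# ...then run the exact same function over everything.
full_out = [transform(r) for r in rows]
```

The point of keeping `transform` identical in both paths is that whatever was debugged on the sample is the code that runs on the full data.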


According to a friend I know who lives in the tri-state area that is what happens to them, but they max out 401K, have insurance etc.


But that's not $140k->$70k after tax, that's $140k->$70k after tossing $24k into retirement investment savings, another $5k into healthcare savings, possibly another $1,500 towards healthcare premiums (huge amount of variability there), and then finally taxes.
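Spelling out the arithmetic with the figures above, as a quick sketch. It assumes all three deductions come out before tax and that the $70k is the final take-home; the variable names are just for illustration.

```python
gross = 140_000
retirement = 24_000   # retirement investment savings
hsa = 5_000           # healthcare savings
premiums = 1_500      # healthcare premiums (highly variable)
take_home = 70_000

deductions = retirement + hsa + premiums        # 30,500
after_deductions = gross - deductions           # 109,500
implied_taxes = after_deductions - take_home    # 39,500

print(f"deductions:       ${deductions:,}")
print(f"after deductions: ${after_deductions:,}")
print(f"implied taxes:    ${implied_taxes:,}")
```

So of the $70k gap, $30.5k is savings and premiums rather than tax.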


"lives in the tri-state area"

Do you know how many of those are in the US?


Well there's 50 states, so 50 × 49 × 48 is just 117,600.


I bet it's just outside of Springfield.


Which Springfield? :-)


> but they max out 401K

So then it’s not what happens to them is it?


What if the LLM gives them bad information and they don't know it? I personally would also rather ask in a thread than risk the LLM info.


> Yet the super fast probe with apparently made it not an issue?

Could you explain how that makes it a non-issue? It seems counter-intuitive to me that it solves the problem by just probing faster?


> we had a hack to drop all messages in a process's mailbox through the introspection facilities and sometimes we automated that with cron...

What happens to the messages? Do they get processed at a slower rate, or on a subsystem that works in the background without more messages constantly being added? Or do you just nuke them from orbit and not care? That doesn't seem like a good idea to me, since it means loss of information. Would love to know more about this!


Nuked; it's the only way to be sure. It's not that we didn't care about the messages in the queue, it's just there's too many of them, they can't be processed, and so into the bin they go. This strategy is more viable for reads and less viable for writes, and you shouldn't nuke the mnesia processes' queues, even when they're very backlogged ... you've got to find a way to put backpressure on those things --- maybe a flag to error out on writes before they're sent into the overlarge queue.

Mostly this is happening in the context of request/response. If you're a client and connect to the frontend you send an auth blob, and the frontend sends it to the auth daemon to check it out. If the auth daemon can't respond to the frontend in a reasonable time, the frontend will drop the client; so there's no point in the auth daemon looking at old messages. If it's developed a backlog so high it can't get it back, we failed and clients are having trouble connecting, but the fastest path to recovery is dropping all the current requests in progress and starting fresh.

In some scenarios even if the process knew it was backlogged and wanted to just accept messages one at a time and drop them, that's not fast enough to catch up to the backlog. The longer you're in unrecoverable backlog, the worse the backlog gets, because in addition to the regular load from clients waking up, you've also got all those clients that tried and failed going to retry. If the outage is long enough, you do get a bit of a drop off, because clients that can't connect don't send messages that require waking up other clients, but that effect isn't so big when you've only got a large backlog on a few shards.
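To make the two ideas concrete (drop the whole backlog in one shot, and error out on writes before they enter an overlarge queue), here is a toy Python analogy. This is not Erlang and not how the real system worked; the `Mailbox` class, its method names, and the `max_backlog` threshold are all invented for illustration.

```python
from collections import deque

class Mailbox:
    """Toy stand-in for a process mailbox (hypothetical, not an Erlang API)."""

    def __init__(self, max_backlog):
        self.queue = deque()
        self.max_backlog = max_backlog

    def send_read(self, msg):
        # Reads are always accepted; a stale read can be dropped later,
        # because the requester will have timed out and retried anyway.
        self.queue.append(msg)

    def send_write(self, msg):
        # Backpressure: refuse the write up front instead of queueing it
        # behind a backlog that will never be worked off.
        if len(self.queue) >= self.max_backlog:
            raise OverflowError("mailbox backlogged; retry later")
        self.queue.append(msg)

    def flush(self):
        # Drop everything at once rather than dequeuing one message at a
        # time, which may be too slow to ever catch up.
        dropped = len(self.queue)
        self.queue.clear()
        return dropped

mailbox = Mailbox(max_backlog=3)
for request_id in range(5):
    mailbox.send_read(request_id)

print("dropped", mailbox.flush(), "stale requests")
```

The sender of a refused write finds out immediately and can retry, which is the difference from silently binning it later.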


If the user client is well implemented either it or the user notices that an action didn't take effect and tries again, similar to what you would do if a phone call was disconnected unexpectedly or what most people would do if a clicked button didn't have the desired effect, i.e. click it repeatedly.

In many cases it's not a big problem if some traffic is wasted, compared to desperately trying to process exactly all of it in the correct order, which at times might degrade service for every user or bring the system down entirely.


Does striking mean you are unemployed, and if so, why? I mean, I understand that you're not gonna get paid if you strike, but it feels weird to also have other things affected. Then again, I don't know how the US works. (Genuine question)


Minecraft is not an online game, it has a multi-player option sure.. but it doesn't strictly need to be online. There is a single-player mode since basically forever..


I'm not arguing your point, but this is the general state of gaming today. A significant portion of "single player only" games require you to be online, contain "online" content, and access to the game can/will be effectively removed when servers/services go offline.


It's a sad state of affairs that you can buy an offline game, and then the developer/publisher can push out an update that turns it into an online game and then disables your access to it


It's sad, but people keep buying these games, so there is no incentive for the game companies to stop.


It's one of the wealthier neighbourhoods afaik, so apparently it's pretty nice. I think 'low 80s' means when they started renting there.


"Early 80s" would refer to the year. "Lower 80s" in Manhattan means they live somewhere in the area bounded to the south by 80th Street and to the north by 85th Street--which includes part of the Upper East Side.


I understand the first two, but I could never wrap my head around the "having an SO" is a scandal. Why is that?


Because they are sold as an innocent sexual icon that is unclaimed and "maybe you the consumer can claim them if you spend enough money on products they support".

It's this weird toxic aspect of idol culture that sees the "idol" this way whether they are a man or a woman, and whether they are an actual idol, musician, actor, voice actor, etc. If they are young and are a public face (other than politician but even then...) then odds are they are affected by this culture.


A disgustingly horrific perversion of the human spirit. We'll look back on these perverted practices as we look back on the flagellants of the 14th century, and the executives responsible should be prosecuted.


Because brand celebrities' commercial value is in the parasocial relationship that viewers have with them, allowing people to imagine romantic pairings with a person they see on TV or the internet every day while still being at a psychologically safe distance. If the celebrity has a real life boy/girlfriend, then the romantic fantasy is over and so, probably, the consumer brand attachment.


Interesting. Is that why Americans freaked out over that transsexual influencer from the Budweiser commercial? They were having confusing feelings?


To a large extent. Americans have a lot of opinions about what opinions others should have.


So doesn't having an AI spokesperson completely kill the chance for romance, thus making it a fairly useless mascot?


There's a class of entertainers in Japan and Korea (and to some extent other Asian countries) called "Idols", for whom the appeal is being a simulated SO. The breaking of the illusion when they are revealed to have been "cheating" on you is a cardinal sin in this industry.


Mostly for boy/girl-band types, where a major draw for people is "he/she is my future husband/wife, if only we could meet once". So having an SO destroys that connection.


> major draw for people is "he/she is my future husband/wife, if only we could meet once"

Even if they are OK with their celebrity getting an SO, they might take issue with who the SO is.

Either way, it’s a loss.



You don't AFAIK. Mine just asked for a routing and account number.

