jp57's comments | Hacker News

“Bridging the gap between PhD and SWE” would be a good subtitle for my career.

I started out writing software for scientists (psychologists), first at a university, then at a small company. After eight years of that I went to grad school and got a PhD in CS (ML/AI), and did a postdoc before going into industry, eventually landing a role in what was then called “data mining”, later “data science”, then “machine learning engineering”. In the beginning, when the team was small, we were all generalists, doing both the science work and the engineering. As we grew, specialized roles developed, but I was able to chart a course somewhere between a SWE and a scientist, doing a lot of knowledge work, experiments, measurement, and presentation, but also building common tools that the rest of the team could use.

I’ve been out of the job market for 15 years now, but I think any company that does science and builds software would value your skillset. In fact, when I was shifting from academia to industry, I started out determined to be a “scientist”. After all, what was my PhD for, anyway? But my SWE chops were pretty evident on my resume, and I had a hard time getting traction. Then a company that had a team of scientists and a team of engineers brought me in for a split interview with both teams. It was clear by the end that they wanted me as an engineer, but I was insistent on wanting to be a scientist. They didn’t offer me a job, and I was disappointed. The disappointment was educational for me: I rewrote my resume to put more emphasis on my SWE skills, and that made it easier to find a role that fit me.


> I’ve been out of the job market for 15 years now

Wow - that's a long time at one company, or being without a job. Could you share more on that? Simple curiosity, thanks.


I've been at the same company, and in the same team/dept for all that time. When I started I was in my early 40s with a family and had moved three times in the previous four years, and I was certainly ready to stay put if I liked the job. It turned out that the job was great. We were a small and scrappy team, fighting for recognition in a big company, and we got the recognition and grew explosively. The comp and benefits were good, and the management humane. The growth meant I had opportunities to do new things. I became a tech lead and then a manager. As a manager, I got to see how comp, and promotion, and hiring, and firing worked. And I got a lot more empathy in general for the work that management does that ICs generally never see.

After a few years I got tired and somewhat bored with being a manager, and asked my director to move back to a senior IC role and he facilitated that for me.

TBH, I have always had my doubts about the narrative that short tenures are the norm in tech. It has always sounded to me like a misreading of the statistical distribution: if you were to histogram the length of tenure of every job (person+company) in tech over some period, of course there would be a big hump at the left end. That's natural: short stints pile up quickly, so they dominate the count of jobs even when most of a career's years are spent in the long ones. I myself have had three jobs of less than two years and one each of six, eight, and 15 years (if you count grad school as a job). So that's 12 years in the four shorter stints and 23 in the two longer ones.


I went through something similar. Ended up under-emphasizing my science background because I noticed that it turned people off.

You don't say what it means for you to have under-emphasized it, or what the consequences were, but my changes mostly consisted of rewriting the preamble of my resume to make it clearer that I was willing to take dev jobs, though I was really only likely to apply for "sciency" roles.

When I said I was disappointed when I didn't get the job in the story above, what I meant was that I was disappointed that they didn't offer me the SWE job, and I kicked myself for telling them I didn't want it. But really I only knew that I wanted it after I didn't get it.


...and just like that, the reproducibility crisis is forgotten.

Seriously, it's amazing how fast we can go from "man, scientific research sure is a mess, wtf are all these people doing anyway?" to "How dare you mess with the status quo?!"

It's worth remembering that American academic science has for years been training far more grad students than it could ever hope to give tenure to, or even to place in tenure-track jobs (where many are denied tenure at the last step anyway). Instead, PhD graduates spend years working in the precariat of "soft funding". The result is a desperate publish-or-perish culture that leads to all the ills we see so often on the HN front page: unreproducible results, p-hacking, etc.

This entire toxic environment is created and sustained by universities that demand that their faculty have independently funded research programs, and that then divert a third or more of those grant funds into the university general fund via indirect fees.

This is the status quo that is being disrupted. It is pretty reasonable to assume that the majority of young researchers whose careers are getting derailed were not going to make tenure or publish anything anyway, and that they have in fact been done a favor.

The counterargument to this is that we should deliberately fund many researchers who we know will never actually produce anything useful, because that is how we find the few actual geniuses who will produce useful things. There is something to this argument, but then we should be clear with the students up front about their true prospects.


Usually, when something is broken the correct course of action is to fix it, not demolish it utterly.

Academia is tough, and things are bad enough to be worth complaining about.

However, you have (understandably) fallen into a trap of rationalization. This is not an earnest effort to improve things. As it stands now, the damage from the conservative rage will be measured in the decades needed for repair. As in: the intended effect.

I have linked it a few times, but I am happy to do it once more, because I can surely understand the genuine confusion people have about these things:

https://www.arte.tv/en/videos/103517-001-A/capitalism-in-ame...


If the problem is, as I posit that it is, that universities cynically exploit cheap labor in the form of grad students and postdocs in order to keep indirect funds flowing into the universities' accounts, then many earnest efforts to improve things would necessarily involve putting a lot of researchers out of work, and that improvement would be a good thing.

My issue is with the uncritical defense of the status quo in both the article and most of the comments. Though I suppose I can understand the impulse for scientists to say that the field's problems are internal, to be dealt with internally, and that the government needs to just give them the money they ask for and not make any effort to see or change how the sausage is made.


The status quo is not what is in question here, let alone something I would defend. Your concerns about the status quo are really valid imho; how they should be dealt with would be an interesting separate subject. But they are not a concern for the conservative movement, nor is there any sign that one could expect even unintended good consequences. As such, however well-intentioned you might be, it only adds to the confusion.

The bad consequences are, from a historical perspective, the least surprising part.


"This programme is not available in your country." (i.e. USA) Oh the irony. You'll have to make the argument yourself, I guess.

I think the broadcast license is restricted to the EU area. Proton VPN is free, though. I recommend taking the trouble; it is a great historical documentary in three parts.

"git doesn't really work ... because docx is a binary blob."

Well, yes, but the binary blob is a zip archive of a directory of text XML files, and one could imagine tooling that wraps the git interaction in an unzip/zip bracket.
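
A rough sketch of that bracket, assuming nothing more than stock zip/unzip and a repo already initialized (and hand-waving past details like zip entry ordering and timestamps):

    # explode the docx into its XML parts so git sees plain text
    unzip -o contract.docx -d contract_src
    git add contract_src && git commit -m "track exploded XML"
    # ... edit, diff, and merge via git as usual, then repack:
    (cd contract_src && zip -r ../contract.docx .)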

The real problem is that lawyers, like basically all other non-programmers, neither know nor care about the sequence of bytes that makes up a file in the minds of programmers. In their minds the file IS what they see when they open it in Word: a sequence of white rectangles with text laid out on them in specific ways, including tables with borders, etc. The fact that a lot of really complicated stuff goes on inside the file to produce the WYSIWYG rendering is not only irrelevant to them, it's unknown.

Maybe the answer here will be along the lines of Karpathy's musings about making LLMs work directly with pixels (images of text) instead of encoded text and tokenizers [1]. An AI tool would take the document in its visually standard legal form, read it, and produce output with edits, redlines, etc., as directed by the user.

[1] https://x.com/karpathy/status/1980397031542989305


Diffing the XML is a complete nonstarter. I've spent years working with the OpenXML format and can assure you it is very complex even for a professional software engineer with 10 years of experience.

The diff of the document (referred to as a "redline") is what lawyers send to the client and their counterparties. It's essential that the redline is legible for all parties and reflects their professionalism.

Moreover, it is not enough to see the structural changes between the versions. A lawyer needs to see the formatting changes between the versions as well, which cannot be accomplished by diffing XML files.


And, importantly, there already is an official diff tool: the "Compare" button.


Correct. Solely relying on the built in Word Compare tool results in a whole host of version control issues, however, which I outline in detail in my post "On Building Git for Lawyers."

https://theredline.versionstory.com/p/on-building-git-for-la...


Git supports registering custom diff tools for specific file types. [1]

Wouldn't the obvious solution then be to take the tool they already use for redlining (e.g. Word's compare function) and integrate it into a git workflow?

[1] https://stackoverflow.com/questions/12356917/how-to-set-diff...
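
For read-only diffs, at least, the setup is small. Here's a sketch using pandoc as the text converter (Word's own Compare would need a difftool wrapper instead, since it's GUI-only):

    # .gitattributes
    *.docx diff=docx

    # tell git how to render docx blobs as text for `git diff`
    git config diff.docx.textconv "pandoc --from=docx --to=markdown"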


Pardon me, but is there any way that openxml can be converted to a format similar to https://www.gnu.org/software/recutils/ ?

Perhaps openxml could be converted to CSV or similar, which could then be converted to recutils.

Recutils supports converting both mdb (Microsoft Access database) and csv files to and from its own format.

I saw this project in a recent Hacker News comment, and IIRC there were some comments there about how it does/can work decently with git features (https://news.ycombinator.com/item?id=46265811).

I am interested to hear your thoughts on recutils, and whether we could perhaps have a Microsoft Word (or similar) to git+recutils workflow.

I thought about it, and a tarred/zipped git folder, which could also contain images and other content referenced from the recutils data instead of an openxml/word document, does feel like an interesting idea to me.

I am not sure, but I think openxml directly embeds data like pictures, which can definitely make it hard for git software to work well. Basically, I am interested in what you think about this; any feedback is welcome.


You don't seem to be aware of any of the work I'm doing on CSTML (built to replace HTML and XML, and yes, built to be useful for legal documents (even though IANAL)). If you're interested in collaborating to go after the law market, let's talk! You're trying to sneak in a side door. I'm planning to smash down the main gates, the ones you say are impregnable. My investigation says they're not unbreakable, but instead strong and brittle. Many attacks will bounce off, yes, but brittleness means that these are defenses that will shatter before they bend.

Something I've started doing in my workflow is using Pandoc to convert between Markdown and DOCX when authoring long documents. This lets me put the Markdown into Git and apply the Gemini CLI to it. When referencing other documents, I'll also convert them to MD and drop them into a folder so I can tell the AI to read them and cross-reference things.

At the start of the project the Markdown is authoritative, and the DOCX is just for previewing the styling. (Pandoc can insert the text into a layout template with placeholders.)

Towards the end of a project I'll start treating the DOCX as authoritative but continue generating Markdown from it, so I can run the AI over it as a final proof-read or whatever.

This is similar to what people used to do with DocBook, but with a more friendly text format and a more AI-friendly "modern" workflow with Git, etc...
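
Concretely, the round trip is just two pandoc invocations (a sketch; template.docx is a hypothetical reference document holding the styles):

    # Markdown -> DOCX, pulling styles from a reference document
    pandoc notes.md --reference-doc=template.docx -o notes.docx

    # DOCX -> Markdown for the AI proof-reading pass
    pandoc notes.docx -t markdown -o notes.md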


I do this with asciidoc instead. Same advantages with git and LLMs, but you get tremendously more styling and functionality.

If you only stay in hotels alone, it probably doesn’t matter that much to you. Quite apart from questions of dignity, when sharing a hotel room there are practical conveniences: it’s nice to keep odors contained, and to be able to turn on the bathroom light at night without waking anyone up.


Wish they'd give a bulk delete interface that lets me choose which chats to keep and which to delete. (i.e. not "Delete All" scorched earth).


I, too, experience some synesthesia with letters and numbers (two is green, three is yellow, four is blue), but when I look at a field of clover, I don't experience a field of numbers representing the number of leaves on the individual clover stalks. In fact that seems like a weird way of perceiving nature. When I see a bird flying with its two wings outstretched, I'm not experiencing the number 2, and thus I get no sense of green.


The main protagonist in my novel experiences synesthesia. She talks about numbers having colours. To help ensure consistency throughout the novel, I developed a text editor (KeenWrite) that allows me to refer to externally defined variables within the prose, such as:

      syn_1: black
      syn_2: purple
      syn_3: red
      syn_4: gold
      syn_5: blue
      syn_6: silver
      syn_7: yellow
      syn_8: brown
      syn_11: teal
      syn_16: orange
It's tempting to change the colour map based on your abilities. How far does your colour mapping go and what other "columbers" (that's what the protagonist calls them) do you perceive?


Respectfully, I don't think synesthesia is behind OP's purported skill.


Hi, synesthesia researcher here! (1)

Here are a few relevant things we know:

- Synesthesia is not rare. You probably know someone who has synesthesia, even if they haven't mentioned it.

- There are many forms of synesthesia: many documented forms that we know of, and very probably a bunch we haven't documented yet.

- There are tasks where we are able to measure enhanced performance by synesthetes. (2)

- While some synesthetes do have a single form of synesthesia, it is common for synesthetes to experience multiple forms. We've found cluster groups where subjects with a given form are more likely to have another form within the same cluster.

From the other writings on the OP's site, we can see that they report to have at least two forms of Colored Sequence Synesthesia: Grapheme -> Color, and Day of the Week -> Color.

Their report of their experience in the linked article sounds like possibly Shape -> Motion. This is a form they could have, and it's plausible that someone already known to be a multiple synesthete might also experience this.

It is also plausible that someone with a Shape -> X type of synesthesia would be able to use that to spot the odd shape out faster than others.

------

(1) I maintained the online synesthesia battery for a number of years while working in the Eagleman Neuroscience Lab at Baylor College of Medicine

(2) Some of these are ones that allowed us to study synesthesia on a larger scale by testing online! Among those, one particularly notable form of test is Stroop Interference. Genuine synesthetes are able to respond much faster and more accurately, and we get a good clear separation between them and controls.


That all sounds very interesting. As someone who has synesthesia, I’d be interested to know whether you still maintain those tests you refer to?


I'm not currently active with it myself, but the site is still here:

https://synesthete.org

Back when I was handling it, we were still using Flash for most of the interactive tests, because that was how you had to do it when it was first built circa 2007. Obviously those would have had to be redone in HTML5 since then to keep it working on modern browsers.


The article is hugged to death. Maybe it wasn't hosted in the cloud?


Right, because it’s not possible for cloud services to get hugged to death.


Loads just fine?


Why do you think that the taste in your mouth is waste draining from your brain and not the result of some metabolic changes in your body from the fast? Ketosis is known to cause a metallic taste in the mouth, for example.


Nim has a python-like syntax, but I wish they'd gone farther, using `def` instead of `proc` and a `print` function instead of the `echo` statement. Though even if they did those things, I'm not sure it would really feel like programming Python.

As a long-time Python programmer, I was drawn to trying the language partly because of the syntax, but as soon as I tried to write something substantial, Nim's heritage in languages like Pascal, Modula, and Ada started to show. Syntax notwithstanding, programming in it really felt more like programming in Pascal/Modula.

I in fact did not know anything about Nim's history or design choices when I started using it, but I'm old enough to have written a fair amount of Pascal, and I was not long into using Nim when I started thinking, "this feels weirdly familiar." `type` and `var` blocks, ordinal types, array indexing with enums, etc.
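
A few lines are enough to show the flavor (a made-up example, not from any real program):

    type
      Suit = enum               # ordinal type, straight out of Pascal
        clubs, diamonds, hearts, spades
      Rank = range[2..14]       # subrange type, very Pascal/Ada

    var
      counts: array[Suit, int]  # array indexed by an enum

    counts[hearts] = 13
    echo counts[hearts]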


From https://nim-lang.org/faq.html :

Why is it named proc?

Procedure used to be the common term as opposed to a function which is a mathematical entity that has no side effects. And indeed in Nim func is syntactic sugar for proc {.noSideEffect.}. Naming it def would not make sense because Nim also provides an iterator and a method keyword, whereas def stands for define.


Actually, echo is not a statement. Nim's syntax is just much more flexible than Python's, so what looks like a statement in Python is actually just a UFCS/command-syntax "call" (of a macro/template/generic/procedure, aka "routine"). It is super easy to roll your own print function [1], and there is no penalty for doing so, except that the std lib does not provide a "common parlance". So, that wheel might get reinvented a lot.

There are a lot of things like this in cligen, because it is a leaf dependency (the literally 1..3 identifier CLI "api"), and so many "drive by" PLang tester-outers might want to roll a little CLI around some procs they're working on.

Also, beyond echo x, y being the same as echo(x, y) or x.echo(y) or x.echo y, the amount of syntax flexibility is dramatically greater than Python's. You can have user-defined operators like `>>>` or `!!!` or `.*`. There are also some experimental (and probably buggy) compiler features to do "term re-writing macros", so that your matrix/bignum library could in theory re-write a bz*ax+y expression into a more efficient one-pass loop (or maybe conditionally, depending upon problem scale).

I sometimes summarize this as "Nim Is Choice". Some people don't like to have to/get to choose. To others it seems critical.

Someone even did some library to make `def` act like `proc`, but I forget its name. Nim has a lot more routine styles than Python, including a special iterator syntax whose "call" is a for-construct.
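
For flavor, here is a toy version (a sketch, not cligen's print) showing a hand-rolled print, the three call styles, and a user-defined operator:

    proc print(args: varargs[string, `$`]) =
      for a in args: stdout.write(a, " ")
      stdout.write('\n')

    print "hello", 42     # command syntax
    print("hello", 42)    # regular call
    "hello".print(42)     # UFCS / method-call style

    proc `>>>`(a, b: int): int = a shr b  # user-defined operator
    echo 16 >>> 2                         # prints 4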

[1] https://github.com/c-blake/cligen/blob/master/cligen/print.n...


I have been meaning to explore Nim for a while because it feels like "golang, but python syntax and dev experience." I vibe coded a simple tool, tt, that allows me to track time to a central log from all my devices. Realllly simple:

    $ tt stats
    Time Tracking Stats
      Total entries: 39
      First entry:   Oct 21, 2025 23:04
      Last entry:    Oct 30, 2025 18:29
      Tracking since: 228h 34m
      Days tracked:  5

    $ tt "working on xyz today"
     Logged at 11:38:44

    $ tt today
    Today (1 entries)
    11:38:44 working on xyz today
The code is pretty damn ugly, though; I feel like I am working with Perl:

    proc groupIntoThreads(entries: seq[Entry], threshold: Duration): seq[seq[Entry]] =
      if entries.len == 0:
        return @[]

      var sorted = entries
      sorted.sort(proc (a, b: Entry): int =
        if a.timestamp < b.timestamp: -1
        elif a.timestamp > b.timestamp: 1
        else: 0
      )

      result = @[]
      var currentThread = @[sorted[0]]

      for i in 1..<sorted.len:
        let gap = sorted[i].timestamp - sorted[i-1].timestamp
        if gap > threshold:
          result.add(currentThread)
          currentThread = @[sorted[i]]
        else:
          currentThread.add(sorted[i])

      if currentThread.len > 0:
        result.add(currentThread)


What are the `@` characters for? Are they what makes it feel like Perl?

Because other than them I don’t think the equivalent Python code would look much different. Maybe more concise, e.g. you could replace the second section with something like `sorted_entries = sorted(entries, key=lambda e: e.timestamp)`.


There are shorter options in Nim too, depending on your stylistic preferences

    let sorted = entries.sorted(proc (a, b: Entry): int = cmp(a.timestamp, b.timestamp))
    let sorted = entries.sorted((a, b) => cmp(a.timestamp, b.timestamp))
    let sorted = entries.sortedByIt(it.timestamp)
I suppose you could change the whole proc to something like

    proc groupIntoThreads(entries: seq[Entry], threshold: Duration): seq[seq[Entry]] =
      let sorted = entries.sortedByIt(it.timestamp)

      for i, entry in sorted:
        if i == 0 or entry.timestamp - sorted[i - 1].timestamp > threshold:
          result.add(@[sorted[i]])
        else:
          result[^1].add(sorted[i])


`@` makes the array (stack allocated) into a sequence (heap allocated).

Edit: Just read the second half of your post—

> I don’t think the equivalent Python code would look much different. Maybe more concise

He could be leveraging [std/sugar](https://nim-lang.org/docs/sugar.html) to make this look cleaner.


@[] is syntax for a "seq", which is similar to a C++ vector, ArrayList in Java/C#, or a python list. It's a heap-allocated array that automatically resizes.

In contrast with [], which is mostly identical to a C array: fixed size and lives on the stack.
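
A minimal illustration of the difference:

    let a = [1, 2, 3]       # array[0..2, int]: fixed size, on the stack
    var s = @[1, 2, 3]      # seq[int]: resizable, on the heap
    s.add(4)                # fine; seqs grow
    # a.add(4)              # compile error; arrays cannot
    echo a.len, " ", s.len  # 3 4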


Yeah, the code doesn't seem very Perl-ish to me.


If you really want to use the keyword def instead of proc, you can do that with sed.

In all seriousness, don't do that. I've used Python a lot, but Nim is a different language. Writing the proc keyword helps condition your brain to realize you are writing Nim, not Python.


Nim is indeed a different language, which was the point of my comment, for those who got past the first sentence. However, if folks are going to tout its “python-like” syntax as a selling point, it’s not really fair to then turn around and say, “no, it’s a different language”, when a Python programmer points out that it’s not really all that python-like after all, and maybe it could be more so.

If one is going to take pains to point out that there are good reasons why it is different from Python, then we can carry that as far as we like. There’s no particular reason to use indentation to denote blocks. BEGIN and END worked just fine, after all, and would be more true to Nim’s intellectual heritage. Or maybe just END, and continue to open the block with a colon.


AI won't replace you ... if you are the Magnus Carlsen of your field.

How you will get experience to grow into the Magnus Carlsen of your field is an open question, however.

