bjar2's comments

On the ingestion engine: it is indeed a cron job that runs once a day to fetch the latest podcast episodes. Yes, it scrapes the web for episodes and then populates the database. And yes, I transcribe the audio to text, then run the text through embedding models to get the embeddings. The secret sauce is using language models to find promising snippets within each episode by running a sliding window over the transcript. So I actually produce different types of embeddings, one kind for highlights and another for whole episodes. I also use the metadata in podcast episodes to enhance recommendations, mainly by deriving the strength of the source that made the content.
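
To make the sliding-window idea concrete, here is a minimal sketch, not the actual pipeline. It assumes the transcript is plain text split on whitespace, and the window size, step, and the `keyword_score` function are hypothetical placeholders; in the real system the scoring would presumably be done by a language model.

```python
from typing import Callable, Iterator, List, Tuple


def sliding_windows(words: List[str], size: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, text) for overlapping windows of `size` words."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if not words:
        return
    last_start = max(len(words) - size, 0)
    starts = list(range(0, last_start + 1, step))
    # Add a final window so the tail of the transcript is always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    for start in starts:
        yield start, " ".join(words[start:start + size])


def top_snippets(
    transcript: str,
    score: Callable[[str], float],
    size: int = 60,
    step: int = 20,
    k: int = 3,
) -> List[Tuple[float, int, str]]:
    """Score every window and return the k best as (score, start, text)."""
    words = transcript.split()
    scored = [(score(text), start, text) for start, text in sliding_windows(words, size, step)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def keyword_score(text: str) -> float:
    """Hypothetical stand-in for a language-model score: counts cue words."""
    cues = {"key", "insight"}
    return float(sum(1 for word in text.lower().split() if word in cues))
```

Calling `top_snippets(transcript, keyword_score)` returns the highest-scoring windows; swapping `keyword_score` for a model-backed scorer leaves the windowing untouched.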

You are spot on, I use Celery for tasks, many different kinds of tasks actually. It is a super handy tool to have, and it truly extends what I am able to do on Heroku. My devops life becomes much more comfortable.


Hi, and thanks for reaching out @jamescridland. I sent a mail with more information :)


For the past 18 months, I’ve been building a podcast discovery app because I feel that existing platforms don’t make it easy to find new episodes. Most recommendation systems focus on entire podcast series, which can be limiting: what if the best content for you is hidden in an episode of a show you’ve never heard of? I wanted to create something that surfaces great episodes, not just popular shows.

To do this, I built a system that streams, downloads, transcribes, and analyzes a huge number of podcast episodes. Instead of relying on metadata or user behavior alone, it evaluates episodes individually based on content, merit, and inspiration. The recommendation engine is designed to balance relevance with diversity, avoiding echo chambers while still keeping suggestions engaging.
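
One common way to balance relevance with diversity is Maximal Marginal Relevance (MMR) re-ranking. The sketch below is only an illustration of that general technique, not a description of how this engine works; the function names, the `lam` trade-off parameter, and the toy vectors are all assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rerank(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick k ids, trading relevance to the query (weight lam)
    against similarity to items already picked (weight 1 - lam)."""
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:

        def mmr_score(item_id: str) -> float:
            relevance = cosine(query, remaining[item_id])
            redundancy = max(
                (cosine(remaining[item_id], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings: "b" is a near-duplicate of "a", "c" is a different topic.
episodes = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.6, 0.8]}
print(mmr_rerank([1.0, 0.0], episodes, k=2, lam=0.3))
```

With `lam=1.0` the ranking is by relevance alone, so near-duplicates can crowd the top of the list; lowering `lam` pushes the second pick toward something less similar to the first.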

On the technical side, I’m running a Django backend with a PostgreSQL database, supported by two NVIDIA GPU-based HyperStack servers that handle Whisper-based transcription and deeper semantic analysis. The model doesn’t just surface what’s already popular—it actively works to highlight lesser-known but high-quality episodes that might otherwise go unnoticed.

I’d love to hear your thoughts. What frustrates you most about podcast discovery? What would make this useful for you?

