We have traditionally used Django in all our projects. We believe it is one of the most underrated, beautifully designed, rock-solid frameworks out there.
However, if we are being honest, the history of async usage in Django hasn't been very impressive. You could argue that for most products, you don't really need async. It was just an extra layer of complexity without any significant practical benefit.
Over the last couple of years, AI use-cases have changed that perception. For many AI products, the bottleneck is calling external APIs over the network, which makes the complexity of async Python worth considering. FastAPI, with its intuitive async usage and simplicity, has risen to become the default API/web layer for AI projects. I wrote about using async Django in a relatively complex AI open source project here: https://jonathanadly.com/is-async-django-ready-for-prime-tim...
tldr: Async Django is ready! There are a couple of gotchas here and there, but there should be no performance loss when using async Django instead of FastAPI for the same tasks. Django's built-in features greatly simplify and enhance the developer experience.
So - go ahead and use async Django in your next project. It should be a lot smoother than it was a year or even six months ago.
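To give a concrete sense of what that looks like (and the kind of gotchas being alluded to), here is a minimal sketch of an async Django view calling an LLM API. It assumes Django 4.1+ and httpx; the app, model, endpoint, and response shape are hypothetical and only for illustration.

```python
# Minimal sketch of an async Django view; not from the article, just illustrative.
import httpx
from django.http import JsonResponse

from myapp.models import Document  # hypothetical app/model


async def summarize(request, doc_id: int):
    # Gotcha: the classic Document.objects.get() raises SynchronousOnlyOperation
    # inside an async view; use the async ORM accessors (aget, acount, async for).
    doc = await Document.objects.aget(pk=doc_id)

    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://llm.example.com/v1/completions",  # hypothetical endpoint
            json={"prompt": f"Summarize: {doc.text}"},
            timeout=30.0,
        )
    resp.raise_for_status()
    return JsonResponse({"summary": resp.json()["text"]})  # assumed response shape
```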
(Disclaimer: I haven't used Django in a long time)
Can you expand more on why these AI cases make the complexity tradeoff different?
I'd think waiting on a 3rd-party LLM API call would be computationally very inexpensive compared to what's going on at the business end of that API call. Further lowering the cost, Django is usually configured to use multiple threads and/or processes, so that this blocking call won't keep a CPU idle, no?
> Can you expand more on why these AI cases make the complexity tradeoff different?
They’re very slow. Like, several seconds to get a response slow. If you’re serving a very large number of very fast requests, you can argue that the simplicity of the sync model makes it worth it to just scale up the number of processes required to serve that many requests. But LLM calls are slow enough that keeping the sync model means dramatically scaling up the number of serving processes, mostly so that CPUs can sit around idle waiting for the LLM to come back. The async model can also let you parallelize calls to the LLMs if you’re making multiple independent calls within the same request - this can cut multiple seconds off your response time.
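To illustrate that last point, here's a minimal sketch of fanning out independent LLM calls with asyncio.gather, assuming httpx; the endpoint and response shape are made up for illustration.

```python
# Sketch: three independent ~2s LLM calls overlap instead of adding up (~2s, not ~6s).
import asyncio
import httpx


async def call_llm(client: httpx.AsyncClient, prompt: str) -> str:
    resp = await client.post(
        "https://llm.example.com/v1/completions",  # hypothetical endpoint
        json={"prompt": prompt},
        timeout=30.0,
    )
    resp.raise_for_status()
    return resp.json()["text"]  # assumed response shape


async def handle_request(prompts: list[str]) -> list[str]:
    async with httpx.AsyncClient() as client:
        # All calls are awaited concurrently on the same event loop.
        return list(await asyncio.gather(*(call_llm(client, p) for p in prompts)))
```

Inside an async view you would simply `await handle_request([...])`; the same pattern works in FastAPI or async Django.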
Yeah, the article doesn't even mention "what about more threads". Responses to your comment suggest there's broadly a lack of clarity around what async gets you, especially vs. threads.
If the alternative to async is more processes rather than threads, a clear benefit is reduced memory usage and reduced process startup time.
> a clear benefit is reduced memory usage and reduced process startup time
Not necessarily true. Many process-parallel Python environments support using fork(2) for parallelism (multiprocessing, gunicorn, celery).
For similar processes (e.g. parallel waiting on RPCs) that removes the memory overhead. It also largely mitigates startup time costs (especially if forks are reused for multiple requests, which they are in most forking contexts).
While there is debate and grumbling in the Python community about fork(2)’s rough edges re: signals/threads/macOS, these issues are usually handled inside parallelism-management library code and rarely concern application-level developers.
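For illustration of the fork-based route, here is a small multiprocessing sketch (one of the libraries mentioned above); on Linux the fork start method shares the parent's memory copy-on-write, and workers are reused across tasks. The 2-second sleep is just a stand-in for a slow RPC.

```python
# Sketch: fork-based process pool parked on blocking waits (Linux/Unix only).
import multiprocessing as mp
import time


def blocking_rpc(i: int) -> int:
    time.sleep(2)  # stands in for waiting on a slow upstream call
    return i


if __name__ == "__main__":
    ctx = mp.get_context("fork")          # explicit fork start method (default on Linux)
    with ctx.Pool(processes=8) as pool:   # forked workers are reused for many tasks
        results = pool.map(blocking_rpc, range(8))
    print(results)                        # ~2s total, not ~16s
```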
The explanation is in the article. Tldr: with sync functions, the CPU is blocked; with async functions, once the await statement is reached, other stuff can be handled in between.
Indeed it says "It enhances performance in areas where tasks are waiting for IO to complete by allowing the CPU to handle other tasks in the meantime".
To restate my comment: I argued (1) this CPU cost would be very marginal compared to the LLM API compute cost, and (2) the CPU blocking claim doesn't really hold, due to the wonders of threads and processes.
There are two ways to call out to an externally hosted LLM via an HTTP API:
1. A blocking call, which can take 3-10 seconds.
2. A streaming call, which can also take 3-10 seconds but where the content is streaming directly to you as it is generated (and you may be proxying it through to your user).
In both cases you risk blocking a thread or process for several seconds. That's the problem asyncio solves for you - it means you could have hundreds (or even thousands) of users all waiting for the response to that LLM call without needing hundreds or thousands of blocked threads/processes.
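To make the streaming case concrete, here's a hedged sketch of an async Django view proxying an upstream streaming LLM response without holding a thread or process per waiting user. It assumes Django 4.2+ (async iterators in StreamingHttpResponse under ASGI) and httpx; the upstream URL and payload are placeholders.

```python
# Sketch: proxy a streaming LLM response; while awaiting upstream bytes,
# the event loop is free to serve other requests.
import httpx
from django.http import StreamingHttpResponse


async def stream_completion(request):
    async def event_stream():
        async with httpx.AsyncClient(timeout=None) as client:
            async with client.stream(
                "POST",
                "https://llm.example.com/v1/completions",  # hypothetical endpoint
                json={"prompt": request.GET.get("q", ""), "stream": True},
            ) as upstream:
                async for chunk in upstream.aiter_bytes():
                    yield chunk  # forward chunks to the user as they arrive

    return StreamingHttpResponse(event_stream(), content_type="text/event-stream")
```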
Reading this, my first reaction was that my question still holds - unless the slowness of the LLMs is of such a magnitude that having a thread or process waiting on the API call would substantially increase its cost, which I guess would mean the LLM server endpoint is doing very heavy queuing and/or multitasking instead of utilizing a powerful compute element for 90% of the call duration.
Or maybe the disconnect is that I'm taking for granted that the “cost of a parked thread” is the same thing as worrying about the number of parked threads? Maybe everyone uses Django setups where it's nontrivial to add memory, increase serverless platform limits, etc. if you get the happy problem of 10k concurrent users? Or maybe people don't know that the number of threads/processes you can have on Linux is much more than hundreds or thousands. Or maybe there are some Python- or Django-specific limits to this.
Maybe I need to update my mental model of how many threads is too many on Linux (and benchmark the impact on the GIL here, which should at least be released during network I/O waits).
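A rough sketch of that benchmark, just to put a number on the memory side of parked threads (illustrative only; the sleep stands in for a slow upstream call, and the numbers vary by platform):

```python
# Sketch: spawn N threads that just block, then check peak RSS (Linux: ru_maxrss is in KiB).
import resource
import threading
import time

N = 5_000  # number of parked threads to simulate


def parked():
    time.sleep(30)  # stands in for waiting on a slow LLM call (GIL released while sleeping)


threads = [threading.Thread(target=parked) for _ in range(N)]
for t in threads:
    t.start()

rss_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
print(f"{N} parked threads, peak RSS ~{rss_mib:.0f} MiB")

for t in threads:
    t.join()
```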
On my Linux desktop the default cap seems[1] to be ~32k.
See e.g. https://www.baeldung.com/linux/max-threads-per-process for some runtime sysctl knobs if you want to go higher than the default limits. (Though section 6 there seems out of date and written with a 32-bit OS in mind.)