It is no secret that nginx is faster and consumes less memory than Apache. If you have always wondered why nginx’s architecture makes it faster, this post explains it. I also put nginx’s greater efficiency in perspective and show that, in many cases, the difference doesn’t matter as much as many people believe. What matters is delivering a project early and economically. If you are more comfortable with Apache than with nginx, then rather than investing in learning nginx it might be better to pay for a slightly larger virtual machine or to use a content delivery network (and very often neither of these is necessary).
The multitasking paradigm
The big difference between the two servers is that Apache uses one thread per request, whereas nginx is event-driven.
What does event-driven mean?
Here is a silly program that first opens a file and then performs a multiplication:
    f = open(filename)
    p = a * b

In the first line, you ask to open(filename). Python will ask the operating system to do it. The operating system will ask the disk to move its head. This may take several milliseconds, which is an eternity in processor time. So your program will actually stop running at that point. In operating system terminology, we say that the process is blocked. While the process is blocked, the operating system will use the CPU for another process; and if there is no other process that wants to run (e.g. because all processes are blocked), then the CPU will be idle. When the disk gets to the file, it will notify the operating system, and the operating system will unblock the process and put it to run. Only then will the program proceed to perform the multiplication.

So, whenever Apache asks the operating system to do something (most notably, to read from or write to the network), it gets blocked. But Apache runs in threads. If there are 100 concurrent requests, there are 100 Apache threads serving them. Most of the time, these threads are blocked. Some data arrives, the operating system unblocks the thread responsible, the thread does some processing, then it asks to access a file or to write something to the network, so it gets blocked again. Meanwhile another thread will get unblocked, and so on.
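To make the thread-per-request model concrete, here is a toy sketch in Python. It is only an illustration and not how Apache is actually implemented: handle_request and serve_with_threads are hypothetical names, and time.sleep stands in for a blocking disk or network call.

```python
import threading
import time


def handle_request(request_id, results):
    # time.sleep stands in for a blocking call (disk or network);
    # the operating system blocks this thread until the call returns.
    time.sleep(0.05)
    results[request_id] = f"response {request_id}"


def serve_with_threads(n_requests):
    # One thread per request: each thread spends most of its life blocked.
    results = {}
    threads = [
        threading.Thread(target=handle_request, args=(i, results))
        for i in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    start = time.perf_counter()
    results = serve_with_threads(100)
    elapsed = time.perf_counter() - start
    print(f"{len(results)} responses in {elapsed:.2f} s using 100 threads")
```

The 100 requests finish in roughly the time of one, because the threads wait in parallel, but the process had to create and schedule 100 threads to get there.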
If the above program were event-driven, it would not get blocked when opening the file. It would tell the operating system “go open that file, and let me know when you’re done”. It would then continue its processing. That’s how nginx works. It asks the operating system to do 100 things, and meanwhile it does whatever processing is required. Whenever the operating system finishes something and delivers it to nginx, nginx puts it in the queue of stuff it has to process. If you want to get an idea of how this programming technique works, read An intro to asyncio. (Nginx and Apache are written in C, not Python, but the programming technique is the same regardless of the language.)
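Here is a minimal sketch of the event-driven style using Python’s asyncio. It is a toy and says nothing about nginx’s internals: handle_request and serve_with_events are hypothetical names, and asyncio.sleep stands in for waiting on the network.

```python
import asyncio


async def handle_request(request_id):
    # asyncio.sleep stands in for waiting on the network; instead of
    # blocking the thread, it hands control back to the event loop.
    await asyncio.sleep(0.05)
    return f"response {request_id}"


async def serve_with_events(n_requests):
    # All requests are in flight at once, on a single thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


if __name__ == "__main__":
    responses = asyncio.run(serve_with_events(100))
    print(f"{len(responses)} responses from a single thread")
```

While one request is waiting, the event loop simply moves on to whichever request has something ready, so no extra threads are needed.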
Apache consumes more memory, because each thread consumes a little bit of memory, so if you have 100 threads it will add up.
Besides memory, there is CPU time. Whenever the operating system stops a thread from running on the CPU and puts another thread to run in its place (which is called context switching), it needs a little time. If there are 100 threads on four cores, the kernel might need to switch the running threads several hundred or even several thousand times per second, and this can add up to considerable time. Nginx needs far less context switching, since a single thread can serve all requests (in practice we typically configure nginx to run as many processes as there are CPU cores). This is the main reason why nginx is faster, meaning it can serve more requests per second than Apache on the same hardware.
How much faster?
If you look at test results, you’ll see that, on the same hardware, nginx might be able to serve 15,000 small static files per second, whereas Apache may be serving only 3,000. However, we need to put that in perspective. On that same hardware, your Django app might be able to serve only 50 requests per second.
So, if it takes Apache 0.3 ms to proxy a request to your Django app, and your Django app needs 20 ms to respond to the request, you need a total of 20.3 ms, whereas with nginx you might need, say, 20.06 ms. Nginx may be five times faster, but in the end it doesn’t make much difference.
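The arithmetic above can be checked in a few lines of Python. The figures are the illustrative ones from this section, not measurements, and total_latency_ms is a hypothetical helper.

```python
def total_latency_ms(proxy_ms, app_ms):
    # Time the web server spends proxying plus time the app spends responding.
    return proxy_ms + app_ms


apache = total_latency_ms(0.3, 20)
nginx = total_latency_ms(0.06, 20)
saving = (apache - nginx) / apache

print(f"Apache: {apache:.2f} ms, nginx: {nginx:.2f} ms, saving: {saving:.1%}")
# prints: Apache: 20.30 ms, nginx: 20.06 ms, saving: 1.2%
```

A web server that is five times faster shortens the whole request by only about 1%.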
Although dynamic page requests need way more CPU time than static page requests, the static page requests are typically more numerous (and the responses are usually larger). Each page visit is typically one dynamic page request plus a number of static file requests (CSS, images, and JavaScript). If for each dynamic page request there is an average of 5 static file requests, then, if you get 50 visits per second, the total is 300 requests per second, of which 50 are for Django and 250 for static files. Those 250 static file requests are a very small load for your web server, whether that is Apache or nginx. It’s the 50 requests for Django that push your server’s CPU to the limit. So, for small deployments, it doesn’t really matter, and many small web sites actually get fewer than 5, let alone 50, requests per second.
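The same numbers as a rough back-of-the-envelope calculation in Python. The 3,000 and 15,000 figures are the illustrative static-file throughputs mentioned earlier, request_mix is a hypothetical helper, and comparing the static load to benchmark throughput is a simplification.

```python
def request_mix(visits_per_second, static_per_visit):
    # Each visit is one dynamic request plus some static file requests.
    dynamic = visits_per_second
    static = visits_per_second * static_per_visit
    return dynamic, static, dynamic + static


dynamic, static, total = request_mix(50, 5)
print(f"{dynamic} dynamic + {static} static = {total} requests per second")
# prints: 50 dynamic + 250 static = 300 requests per second

# Static load as a fraction of what each server could serve per second.
apache_share = static / 3000
nginx_share = static / 15000
print(f"static load: Apache {apache_share:.1%}, nginx {nginx_share:.1%}")
# prints: static load: Apache 8.3%, nginx 1.7%
```

Either way the static files use a small fraction of the web server’s capacity, while the 50 dynamic requests already saturate the Django app.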
Memory-wise the difference is more significant. On one small server where I run Apache, it consumes about 100 MB of RAM, with approximately 25 threads. If something happens and the server gets, say, 50 concurrent requests, Apache will create about 25 new threads. This can easily raise RAM usage to 150–200 MB. On another small server I run nginx, and it consumes about 10 MB of RAM. That’s it. If the server gets a sudden flood of requests, nginx will stay at 10 MB. Both servers are small virtual machines with 1 GB of RAM, and 100 MB more or less makes a difference.
If you use a content delivery network like Cloudflare, then your static files are being taken care of elsewhere, and your web server receives only dynamic page requests. In that case, the difference between nginx and Apache is less important.
Apache benefits
If nginx is faster and uses so much less memory, why would someone use Apache? For one thing, I think that its documentation is better organized and more complete (but, on the other hand, I agree that nginx’s configuration language is nicer than Apache’s). There are also many more modules for it. Some time ago I needed a module for Shibboleth (an authentication system similar to OpenID, widely used in universities), and it exists only for Apache. I switched from nginx to Apache on that occasion. Apache also has support for content negotiation and for logging through syslog. Nginx’s support for these is much weaker, and you may have to patch and recompile it. I believe that Apache is still more popular for PHP programs (like WordPress) and CGI programs (like MapServer), as it is easier to set them up with it.
The verdict
If you have no prior experience, and the support you can get (from friends or colleagues, for example) makes no difference, and you have no other reason to use Apache, then use nginx. Besides being more efficient, it seems to be more popular for Django, so you will probably find support more easily. But if you or the guy next door already know Apache, there is no reason to go to the trouble of learning nginx if you are going to deploy a small Django site.
Also note that most people seem to think that they should deploy either with Apache+mod_wsgi or with nginx+gunicorn; but there is absolutely no reason why you can’t use Apache+gunicorn, and, in fact, this is the setup I mostly use.
Update: Denial-of-service attacks
2016-11-30
After I published this article and slept on it, it occurred to me that I could write a program that opens a connection to the web server, begins a request, but never finishes it, leaving it hanging. This would keep an Apache thread occupied. Do that many times, and you have all available threads occupied.
Indeed, I wrote this program, barely 10 lines of Python, and I managed to attack some of my servers. A server running Apache with mod_php, using the prefork MPM (that is, using processes instead of threads, because PHP can’t use threads), became unresponsive after about 200 requests. Another server, running Apache without mod_php and with threads, managed 300 requests. On a server with nginx I opened 800 connections and the server’s performance wasn’t affected in the least.
After searching the web I discovered that this type of attack has a name, Slowloris, and I was surprised to learn that it was discovered only as recently as 2009. It can be used against nginx as well, but it then involves tens of thousands of connections, takes longer than with Apache, and is easier to mitigate.
It seems like a serious problem, but again, let’s put it in perspective: in the last 15 years, from what I remember, I have had a similar attack only once or twice. I don’t remember whether it was an attack on the web server or on some other service, just that it was a single machine making many connections. I fixed the problem by stopping the attacker at the firewall. Given that, while it shouldn’t be too hard to configure my servers to withstand such an attack, I consider this to be premature optimization.