17 April 2013
PHP is meant to die, continued

TL;DR: PHP is still a poor choice for continually-running processes. See this post for context. Read on for more proof.

There’s been some reaction to the previous entry. Some people agreed, mostly seasoned PHP folks; others just didn’t. Yeah, apparently it pleased or infuriated more than two people on the entire Internet.

First and foremost, a correction to the previous entry: PHP 5.3+ enables garbage collection by default; it’s an opt-out feature, not opt-in. So it’s probably enabled in all your scripts, even the fast-dying ones, provided your PHP version is recent enough and you didn’t do anything funny to your php.ini file.

Yes, I already acknowledged that PHP has had a garbage collection implementation since 5.3.0 (opt-in or opt-out, that’s not the problem). I also acknowledge that garbage collection works, and is able to take care of most circular references just fine. However, if you’re one of the many who think “hey, no one should be using anything below PHP 5.4 nowadays”, you’re clearly too young to remember how much blood, sweat and years it took to get rid of PHP 4, even when PHP 5.0 was reaching end-of-life, 5.1 was healthy, and 5.2 was already in the works.

Anyway, as previously stated too, garbage collection is a great thing, but not enough for PHP. It’s a borrowed feature that does not play well with old, fundamental decisions inherited from the original design. Garbage collection is not a magical solution for every problem, as many tried to argue. Let’s illustrate with another example.

React

In between the sea of praise and insults from the previous entry, some constructive-criticism folks suggested that I should try out projects that use PHP in a continually-running loop. One in particular was recurrently suggested and caught my attention: React, an event-driven, non-blocking I/O platform, somewhat similar to node.js (ok, maybe more than just similar).

Before starting, I can’t stress enough how much of a good thing I think React is. No, this is not irony. No, I don’t think it’s great because it aims to bring the wonders of node.js to PHP, which may be cool, but because it’s truly exceptional as the kind of project that pushes the boundaries of the underlying programming language further. I, like many others I’m sure, want React to be successful, and for that PHP should step up and support this kind of development… which, currently, it doesn’t, and I think the project will inevitably suffer from that. Let’s continue.

Why React, and not something else? Because it’s the kind of innovative stuff I’ve been talking about from the beginning: it uses PHP in a continually-running loop that isn’t just meant to recurrently run time-consuming database queries in the background, something that has already been solved by garbage collection (and, for sure, some patched-up solution involving periodic restarts), or worked around with cron jobs.

A preliminary note on deployment

Other React-like projects usually have a production deployment guide along the lines of “this is good for development/debugging, but you need something more robust for production deployments” 1. I can’t find anything like that for React, except a few notes for Ratchet, a PHP WebSockets framework built on top of React, so I made sure to do my homework and follow those tips to simulate a production-like environment. That means we’re assuming React follows in node.js’s footsteps: the production server is already built in; at most you should reverse-proxy the requests, but ultimately the continually-running process will manage them.

Besides that, I’ll be using the latest stable PHP release to date (5.4.13 by the time of this writing), with garbage collection enabled by default, of course.

Testing it yourself

Assuming that we’ll be using React for more than printing “Hello, World” in the browser, I took two examples from their own codebase (here, and here), mixed them together, and added a print line for quick debugging. No circular references, no special tricks. The goal is just to serve a shitload of data (in React’s own terms), sending 20MB of a single random ASCII character in response to each HTTP request. Here’s the result:

https://gist.github.com/anonymous/5357469
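
For reference, this is roughly what that mash-up looks like. The following is my own sketch against the React API of the day (0.x), not the literal gist contents:

 <?php
 require __DIR__ . '/vendor/autoload.php';

 $loop   = React\EventLoop\Factory::create();
 $socket = new React\Socket\Server($loop);
 $http   = new React\Http\Server($socket);

 $http->on('request', function ($request, $response) {
     // 20MB of a single random ASCII character, buffered in one go
     $data = str_repeat(chr(mt_rand(33, 126)), 20 * 1024 * 1024);

     $response->writeHead(200, array('Content-Type' => 'text/plain'));
     $response->end($data);

     // The debug line seen in the outputs below
     echo 'Allocated memory usage: ', number_format(memory_get_usage(true)),
          ', Memory Limit: ', (int) ini_get('memory_limit'),
          ', PID: ', getmypid(), "\n";
 });

 $socket->listen(1337);
 $loop->run();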

Running that file (granted that you have React installed 2, of course) will start the server on port 1337. Now, we can test this from the shell:

 while [ 1 ]; do wget -qO- http://127.0.0.1:1337; sleep 1; done

So, we’re sending one request after another, waiting a second in between, and repeatedly getting the following debug output:

 Allocated memory usage: 23,174,168, Memory Limit: 128, PID: 72928
 Allocated memory usage: 23,174,168, Memory Limit: 128, PID: 72928
 ...

So far so good, right? It allocates 20MB to send the response, plus a few more megabytes for PHP’s own requirements, and never gets past that, so the garbage collector works just fine, freeing memory after each request.

Sadly, this test is not even close to a real production environment. Requests won’t gently queue one behind the other and wait for the previous one to finish, just to behave nicely to your server. So let’s test this with a slightly less trivial, more production-like scenario (verbose, over multiple lines, for clarity):

 while [ 1 ]; do for i in {1..4}; do \
   wget -bqO- http://127.0.0.1:1337; \
 done; sleep 1; done

Now we’re sending 4 requests per second, with one twist: using the -b option in wget (meaning “run in the background”), all requests are executed in parallel, not one behind the other. So, approximately, we’re sending 4 parallel requests per second (yes, I’m still gentle enough to wait a second in between). Here’s the debugging output:

 Allocated memory usage: 23,174,168, Memory Limit: 128, PID: 64873
 Allocated memory usage: 39,974,960, Memory Limit: 128, PID: 64873
 Allocated memory usage: 46,866,264, Memory Limit: 128, PID: 64873
 Allocated memory usage: 62,603,296, Memory Limit: 128, PID: 64873
 Allocated memory usage: 72,787,752, Memory Limit: 128, PID: 64873
 Allocated memory usage: 87,614,320, Memory Limit: 128, PID: 64873
 Allocated memory usage: 105,745,480, Memory Limit: 128, PID: 64873

And then…

 Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20971532 bytes) 
 in /react-test/vendor/react/http/React/Http/Response.php on line 82 

Wait. What just happened?

The process died, because PHP’s default memory limit (128MB) was reached, just as expected. Everyone knows that.

You mean that the request died, not the process.

No, I mean the process. Everything. The server, the event loop, the request that went over the memory limit, and every other request in flight at the time. All dead.

So the garbage collector didn’t work?

The garbage collector was working just fine.

But aren’t you being especially tricky or aggressive with the React server?

I just mashed up a couple of their own examples and figured that, in a production environment, a few requests per second is not too much to ask for. It took less than five seconds to die. To make things worse, remember that this was tested against the loopback interface, on your own machine, with virtually no network latency.

So, should I increase the memory limit?

Sure, you can. And I’ll increase the number of requests per second, or the HTTP response size, or I’ll just put this in a real production environment with real network latency, and your process will die anyway. It’s a matter of math.

Can’t this be fixed?

It may be worked around, with clever or dumb techniques. None of them validate PHP as a suitable language for continually-running processes.

Can’t we just monitor and restart the process when it dies?

Sure, that has been suggested before, too, as a workaround. Monitoring your background processes is a good thing, generally speaking. However, monitoring is intended to prevent your infrastructure from collapsing in the event of an unexpected error, hopefully with enough details to identify, debug and fix the problem. In this case, we’re talking about a known, expected, inevitable issue. Restarting every time it happens may be good enough for your needs, but it still doesn’t validate PHP as a suitable tool for the job.
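
For the record, the crudest form of that workaround is a shell babysitter like this one (just a sketch; something like supervisord would be the grown-up version, and react-server.php is a stand-in name for whatever file starts your loop):

 while true; do php react-server.php; echo "server died, restarting..."; sleep 1; done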

Isn’t this a problem with your own machine because it isn’t a real server?

What? No. You may get slightly different results in your environment (maybe you need an extra request per second, or a larger HTTP response), but specific by-the-byte numbers aren’t important here. You should be able to bring death to the process quite easily. But more importantly, please read on. It’s essential to understand what we’re trying to fix in the first place.

An attempt on understanding

So, what exactly happened here? PHP veterans may already know, but let’s go for the full explanation.

First, let’s understand how React works. Like node.js, it uses an event loop. And, just as node.js claims (with maybe not the best terminology), it’s “single-threaded”. For React, what’s important to us is this: while it’s able to manage requests simultaneously, there are no forks, and everything happens within the same process, the original one, the one that runs the event loop. See the debug output from the previous test: the PID never changes.

Secondly, I think we can all agree that garbage collection frees memory once it’s not being used anymore. Yes, I know this is a very basic concept, and not specific to PHP. That said, if your data is still being transferred through the network, it’s obviously not ready to be freed.

However, the significant result of those combined factors is that the memory allocated for each individual request you send to React adds up within the one and only process that’s running. If you somehow have enough requests whose individually allocated memory sums past the memory_limit, your entire process (server and ongoing requests) will die. In “classic” PHP terms, it’s the equivalent of a fatal error in a single page bringing down Apache completely. Go figure.
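
It really is just arithmetic. Here’s the napkin version, using the default limit and the figures straight from the test outputs above:

 <?php
 $limit    = 128 * 1024 * 1024; // default memory_limit: 134,217,728 bytes
 $inFlight = 23174168;          // footprint with one buffered response (first test)
 $perReq   = 20 * 1024 * 1024;  // each additional 20MB response buffer

 // Additional responses the single process can hold:
 echo floor(($limit - $inFlight) / $perReq), "\n"; // 5; the next allocation is the fatal one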

Also, two factors were left out of the equation that would make things even worse. One is real network latency, as previously mentioned, which would keep memory allocated for much longer than a fraction of a second while a slow transfer completes. The other would be a service that takes more than a fraction of a second to generate its response. In both cases, your allocated memory will rise, won’t go down quickly, and will keep adding up 3.

What is this memory limit, anyway?

You probably figured it out by now, but this is not React’s problem. memory_limit, as well as max_execution_time, are old relics inherited from PHP’s original dying nature. You can read the manual for their formal definitions, but their spirit is this: if your PHP process is running for too long, or consuming too much memory, it needs to be killed. Why is that? Because PHP processes are meant to die sooner rather than later. That’s what the language was designed for from the very beginning.

You can argue that this kind of monitoring (for a process that’s consuming too much memory, or running for too long) belongs to the operating system’s toolset rather than to a programming language’s settings. I concur, but those settings have been there forever, and they reveal PHP’s true nature. Their defaults haven’t changed in a while. In the latest stable release, PHP 5.4.13, memory_limit is still 128MB, a perfectly valid value for serving web pages.
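
You can check both relics on a stock install; the values below are what a default 5.4 setup reports, as far as I can tell (the CLI being the one exception, since it disables the time limit):

 <?php
 var_dump(ini_get('memory_limit'));       // string(4) "128M"
 var_dump(ini_get('max_execution_time')); // string(2) "30" on web SAPIs; "0" on the CLI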

Is there any hope?

There are always workarounds. There were workarounds before garbage collection, and there are some after it.

You can disable memory_limit completely by setting its value to -1. That’s been strongly discouraged throughout the years, and by the PHP manual’s own literal definition, memory_limit "helps prevent poorly written scripts for eating up all available memory on a server" (sic). Judging by the default value, PHP thinks that any script going over 128MB is poorly written. You can disable the limit anyway, if you’re reckless enough.

Or maybe you can set a high memory_limit, ridiculously high, enough to suit your needs and make sure your process won’t die that quickly. I’ve tried that with the above example, and made it last a couple of additional seconds. Back when PHP had no garbage collection, I clearly remember being forced to set a limit of 4GB on a server. Now we’re talking about a similar solution. See why garbage collection isn’t a magic solution after all?
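
Either variant is one flag away when launching the process (react-server.php being, again, a stand-in name):

 php -d memory_limit=-1 react-server.php     # no limit at all
 php -d memory_limit=1024M react-server.php  # or just a much higher ceiling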

Whatever you do to the memory_limit setting, keep in mind that it’s global, and it is that way for a reason. It means that if you raise the memory limit to cope with a particularly troublesome request, you’ll be raising it for every other single thing that runs through PHP on your server. That’s also strongly discouraged, because in the name of a very specific problem you’re opening the door to potential problems everywhere else, where the rest of your PHP code would probably run just fine with a small fraction of the allowed memory.

But wait, there’s also ini_set. You can change configuration values dynamically in your code, without touching the server’s global PHP configuration. That’s true, but in our very particular case, it’s a non-solution. Remember, React runs as one, and only one, process. If you change the memory_limit with ini_set inside a very specific request handler, you’re changing it for the entire process and all other requests too. No, no joke. It’s the same thing as before, but instead of doing it at PHP’s global scope, you’re doing it at your application’s global scope 4.

Yeah, you can raise the memory_limit, serve a particularly memory-consuming request, and lower it again. But in the meantime, all of React’s other request handlers will run with the raised limit, including the ones that don’t need it. Remember: global setting, one process, simultaneous requests. Same thing over and over. Oops.
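
To make the scoping problem concrete, here’s a hypothetical handler doing the “polite” raise-and-serve dance, plugged into the server sketch from earlier:

 <?php
 $http->on('request', function ($request, $response) {
     // "Just for this heavy request"... except there is no such scope.
     ini_set('memory_limit', '512M');
     // Every handler running concurrently in this same process now
     // lives under the raised limit too, and lowering it back afterwards
     // can pull the rug out from a request that was counting on it.
     // ... build and send the heavy response here ...
 });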

Or maybe, in the near future, we’ll hear about a deployment option for React based on workers (like Gunicorn’s) instead of one omnipotent forever-running process. We can only dream about that today.

Or, just maybe… you may be able to configure a web server as a reverse proxy that forwards (uh?) only a few simultaneous requests, keeping the others waiting in a queue, along the lines of “don’t send me more than two of those requests at a time, or I’ll die”, if that’s performant enough for your needs. The question is… why would you ever need to apply those half-baked tricks? They have nothing to do with anything.

Additional questions, no answers

Why on earth are we listing those workarounds?

How come, by this point, there are still people who don’t ask themselves "Darn, is there maybe, just maybe, a fundamental problem in the language I chose for this?", or "Why am I spending more time on workarounds for a process that doesn’t like to keep running than on implementing the process itself?", or "Why do these ugly memory settings have to be set and unset in the middle of my business logic?"

Why do people keep arguing that PHP is perfect for continually-running processes just because they were able to work around its limitations once, believing that one workaround that happened to do the job in one constrained scenario ("I just restart it when it crashes!") is the magic solution for every other possible problem?

I have no clue. But there’s more.

Sequential nature

PHP, born as a scripting language, is meant to run in a sequential manner. Fundamentally, almost all “classic” programming works like that, and the problem doesn’t lie there at all.

It starts being troublesome when the language puts all its eggs in one basket and aims to die early: if everything is part of the same sequence, an early error will kill whatever else comes later, related to the error or not 5. Because of that, in our particular React example, where everything is part of the same process, an individual handler raising an uncaught exception will kill the entire server 6.
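
A quick sketch of how that plays out, again on top of the earlier server sketch (the handler and its exception are hypothetical):

 <?php
 $http->on('request', function ($request, $response) {
     // The corner case nobody foresaw:
     throw new RuntimeException('one request out of thousands hit this');
 });

 try {
     $loop->run();
 } catch (RuntimeException $e) {
     // By the time we land here, the loop has already stopped: the
     // listening socket and every in-flight request are gone. "Handling"
     // the exception means logging it and restarting from scratch.
 }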

But the real problem comes when you combine that with PHP’s totally incoherent amalgamation of global error handlers mixed with a very late implementation of exceptions. They don’t even talk to each other. Some errors are simply uncatchable. A lot has been said about how horribly designed this particular part of PHP is, so I won’t add much else. It’s part of its dying nature. PHP dies, sometimes you can’t even know why, and the best you can do is restart and hope for the best 7.

But, really, is there any hope?

React may be enough for your needs. If your responses are small and quick enough, or you don’t expect too much load, or you architect the solution in a way that the actual request handler is just a lightweight interface to something else, it may work. You can also take advantage of the non-blocking I/O (ignoring React’s own example for a shitload of data) and serve large data sets in smaller chunks, giving the garbage collector some air; probably slower, but alive.
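
For instance, here’s a rough sketch of that chunked approach; the timer API details vary between React versions, so treat the exact signatures as an assumption:

 <?php
 $http->on('request', function ($request, $response) use ($loop) {
     $response->writeHead(200, array('Content-Type' => 'text/plain'));

     $sent  = 0;
     $total = 20 * 1024 * 1024;
     $loop->addPeriodicTimer(0.01, function ($timer) use (&$sent, $total, $response, $loop) {
         // 64KB at a time instead of one 20MB buffer: allocations stay
         // small and short-lived, so the garbage collector can keep up.
         $response->write(str_repeat('x', 64 * 1024));
         $sent += 64 * 1024;
         if ($sent >= $total) {
             $loop->cancelTimer($timer);
             $response->end();
         }
     });
 });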

We’re not judging React here; we’re doing a technical analysis of PHP as a suitable tool for continually-running processes. In my opinion, the risk of having to craft half-baked, last-minute workarounds is too high to justify it against other tools. I’d be delighted to see React’s developers find a totally unforeseen solution to this, but I think PHP is standing in their way.

Finally, PHP has been slowly improving. This isn’t irony, either. It’s been around for well over a decade, it’s one of the most widely-used programming languages, and there’s no argument against that. No matter how ill-designed it is, flagging it as totally useless would be incoherent, short-sighted and a complete denial of reality. Even some long-awaited improvements have been made at the core and language-syntax level, to link just one of a few. But it’s been too many years, and it’s been consistently like that: PHP is always slowly improving. It’s been slowly improving forever. Right now, to me, the improvement speed that PHP has historically shown is just not enough anymore, and it is already, objectively, behind.

A more proper question would be “Is there any hope for PHP and continually-running processes?”.

No, right now, there isn’t.


  1. For instance, Flask has an insanely-detailed deployment chapter. In node.js, everything is expected to work out of the box, though I think proxying and monitoring should be a must, rather than a mere recommendation.

  2. You’ll need to install React through Composer. As an old-timer coming from the dark PEAR days, I can only say that the work that has been put into it is top-notch. If you’ve tried other attempts to centralise dependency management in PHP, you know what I’m talking about. 

  3. Node.js has its own limitations here too. Here’s an equivalent sample script. After a hundred concurrent requests it obviously starts struggling and performance suffers noticeably, but the server itself doesn’t die in a matter of seconds.

  4. Globals have been a long and unending nightmare in PHP. If you want some background, just try to come up with a properly architected solution to manage multiple timezones with PHP’s original datetime functions.

  5. See “Bring it to the foreground” in the original post.

  6. I’ve heard strong arguments stating that every error has a proper exit, every time, and that the perfect programmer should absolutely foresee all of that, catch every single possible error condition, and manage it accordingly. That reasoning implies that no one will ever experience an unexpected condition, especially not in a very busy instance, and that if you do, it’s your fault, because servers are perfect, operating systems are perfect, programming languages are perfect, database engines are perfect, time is infinite, and you’re flawed. Send me a ticket to that fantasy world, please.

  7. Some may say that the problem is that I’m not knowledgeable enough about errors and exceptions in PHP. I know them very well; that’s why I don’t want to go out and play with them anymore.
