4 April 2013
PHP is meant to die

Disclaimer: I’ve got 10+ years under my belt as a PHP developer. I started using it when PHP4 was a little boy and PHP5 only a dream in Zend’s mind. I did a lot with it, loved it, cursed it, and saw it grow and evolve, sometimes with great shame. I still use it for some legacy projects, but it stopped being my language of choice a while ago. Also, I’m not affiliated with any framework or tool mentioned here.

TL;DR: If your project will rely on continually-running processes to function, avoid PHP.

In my opinion, a lot of the hatred that PHP receives misses the utterly basic point: PHP is meant to die. That doesn’t mean that a perfectly capable (to some extent) programming language will disappear into nothingness; it just means that your PHP code can’t run forever. Now, 13 years after the release of PHP 4 in 2000, that concept still looks valid to me.

A dying programming model

The core PHP feature follows the simplest programming workflow: get input data, process it, display the output, and die. As much as I think sites like PHP Sadness are a fun way to help detect language inconsistencies, there’s not much they can do against that basic design choice. But there have been some attempts over time, for sure.

In the beginning, dying wasn’t that big of a deal for websites. You read something from the database, apply some logic or formatting to it, and display the result in between a sea of HTML tags. PHP is still really hard to beat at that; it’s the language’s core feature, despite the many horrors this approach has brought to the world of programming. Thankfully, most problems solvable by these techniques have already been solved, and the remaining ones are being tackled with more clever or modern stuff, because that tiny functionality is usually just a microscopic part of a bigger, complex project. Alas, PHP insists on dying when it’s done. When you try to do the opposite, bad stuff happens.

Not just websites anymore

Sending information from a contact form through email is still a good task for PHP. Nothing wrong with that (well, except that if I keep doing things like these after 10+ years, I probably made some very bad decisions along the way). But that’s also an example of something that has been solved a million times, in a million different little ways. Virtually any other web-oriented language or framework can do that too.

Let’s say we’re past that point. Now we’re writing a real application, a complex one. As complexity grows, so does your codebase. Let’s say you’re a capable enough programmer. You use PHP 5.x and modern programming techniques. Everything is cleanly abstracted into interfaces and classes. You know your way around existing production-quality libraries. You’re probably now dependent on your own ORM models, vendor code, custom interfaces, maybe your client API for a third-party RESTful interface, and so on. Everything is written in PHP.

And here is where the nightmare begins: you’ll inevitably need to run code in the background. Your application has reached a complexity level in which waiting for an HTTP request to do something isn’t enough. The reason may be anything: you have to process queued orders, or you have to cache information to speed up the actual HTTP response, you have to check periodically for due payments and proceed depending on the result, you have to constantly update pull-only data from external sources, you have to write some stuff to the database in batches to avoid performance degradation, you have to open and keep alive several network connections, or you have to implement the server-side part of a WebSockets application. These are just examples. There’s no end to this list; it depends on what you’re building.

Bring it to the foreground

The most infamous implementation of the above may sound familiar: no matter how “enterprise” PHP or the framework of your choice claims to be, it’s also supposed to be so cheap that you don’t have access to ssh, cron or any similar tool on your $5/month shared hosting server.

What is the easiest solution to that? Bring background task execution to the foreground, of course! Run the tasks at random times, for 1/nth of the page visits. That means that one visitor, randomly, will patiently wait while a system-wide background task runs at their expense, and only when it’s done will the actual request be processed for them. It’s alarming how many “serious” and “mature” PHP frameworks implement this in their internals, how difficult it is to find those blocks of code, and how intricate it is to untangle them.

Why this aberration? Well, special server credentials aren’t required; plain, old and feature-less FTP is more than enough. Everything is triggered by the client, and a little delay every once in a while for only one visitor shouldn’t be noticed. And that’s exactly why it’ll inevitably fail: it works great when testing, it works well when deploying, and it will inelegantly crash under load. You’re assuming that the number of visits you get will always be roughly the number you need to run some task every few minutes. But pseudo-randomness in hundredth slices is meaningless with thousands of requests per minute. Tasks that were supposed to run every once in a while start running several times in the blink of an eye. Weird issues start happening, like the same task running (and failing) twice at the exact same moment, errors appearing when tasks take locks on shared resources, and so on. Keep in mind that a crashing task also means a broken page, because, you know, PHP dies and everything else dies with it, including the user’s request. Dying doesn’t necessarily mean success.
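
To put rough numbers on that, here is a minimal sketch of the 1/nth trigger, written in Python rather than PHP. The function names and the figures (a 1-in-100 chance, 5 versus 3,000 requests per minute) are assumptions made up for illustration; they are not taken from any particular framework.

```python
import random


def should_run_task(n: int, rng: random.Random) -> bool:
    """Return True for roughly 1 out of every n requests."""
    return rng.randrange(n) == 0


def expected_runs_per_minute(requests_per_minute: int, n: int) -> float:
    """Average number of task runs triggered by visitors in one minute."""
    return requests_per_minute / n


def simulate_minute(requests_per_minute: int, n: int, seed: int = 0) -> int:
    """Count how many requests in one simulated minute trigger the task."""
    rng = random.Random(seed)
    return sum(should_run_task(n, rng) for _ in range(requests_per_minute))


if __name__ == "__main__":
    # Quiet site: on average the task runs about once every 20 minutes.
    print(expected_runs_per_minute(5, 100))
    # Busy site: the same setting now fires about once every two seconds.
    print(expected_runs_per_minute(3000, 100))
    # One simulated busy minute with the same 1-in-100 setting.
    print(simulate_minute(3000, 100))
```

The setting that looked harmless while testing is the very same one that stacks task runs on top of each other once traffic grows; nothing in the code changed, only the number of visitors.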

If you’re enterprise enough, you can move those background tasks to a cron job. While that’s somewhat better, it only works as long as time is good enough as the only control variable. For recurring tasks (and recurring is the core nature of crond), you’re always expecting a single run of a task to finish before the next one starts. That may be mostly true; when it isn’t, you may find yourself adjusting crontab intervals as your load increases, or settling for an interval long enough to be safe but not fast enough for your application. At scale that may get very, very hairy. In that scenario, you may start thinking about splitting system-wide tasks into small batches. And someone must coordinate those batches, someone that doesn’t die.
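
One common way to at least make the overlap visible is a lock file that each run tries to take before doing any work. The following is a minimal sketch in Python under stated assumptions: it is Unix-only (it relies on `fcntl.flock`), the helper names `try_acquire`, `release` and `run_once` are hypothetical, and it is not something the frameworks discussed here are known to ship.

```python
import fcntl
import os
from typing import Callable, Optional


def try_acquire(lock_path: str) -> Optional[int]:
    """Try to take an exclusive lock; return a file descriptor, or None if busy."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release(fd: int) -> None:
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def run_once(lock_path: str, task: Callable[[], None]) -> bool:
    """Run task() unless a previous run still holds the lock.

    Returns True if the task ran, False if this tick was skipped.
    """
    fd = try_acquire(lock_path)
    if fd is None:
        return False  # previous run still in progress: skip this tick
    try:
        task()
        return True
    finally:
        release(fd)
```

Note that this only turns a silent overlap into an explicit skip. It doesn’t make the task any faster, and it doesn’t coordinate batches, so the scaling problem described above is still there.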

Summon the daemons

Here’s where the nightmare gets really ugly. You’re at the point where the proper solution seems to be creating a daemon, a process that doesn’t die. Two examples of this would be establishing and maintaining a WebSockets connection, or creating the producer component in a Producer-Consumer implementation.

But PHP will forsake you.

There are several issues that just make PHP the wrong tool for this. Remember, PHP will die, no matter how hard you try. First and foremost, there’s the issue of memory leaks. PHP never cared to free memory once it’s not used anymore, because everything will be freed at the end, by dying. In a continually-running process, that will slowly keep increasing the allocated memory (which is, in fact, wasted memory), until it reaches PHP’s memory_limit value and kills your process without a warning. You did nothing wrong, except expecting the process to live forever. Under load, replace the “slowly” part with “pretty quickly”.

There have been improvements on the “don’t waste memory” front. Sadly, they’re not enough. As things get complex or the load increases, it’ll crash. Consider the following snippet of code:

https://gist.github.com/anonymous/5337645

Those are called circular references: objects that hold a reference to another object that in turn holds a reference to the original one. When you run that script, depending on the PHP version you use, it’ll slowly eat more and more memory, because at runtime PHP fails to recognise when all the objects in a circular reference aren’t used anymore, even if you explicitly try to free them with unset().
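
The gist itself is not reproduced here. As a rough analogue, the same effect can be observed in CPython, which also frees objects by reference counting and needs a separate cycle collector for cases like this one. The sketch below is Python, not the PHP snippet from the gist, and `Node` and `cycle_survives_refcounting` are hypothetical names chosen for illustration.

```python
import gc
import weakref


class Node:
    """A tiny object that can point at another Node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.other = None


def cycle_survives_refcounting() -> tuple:
    """Build a two-object cycle, drop it, and report whether it was freed.

    Returns (alive_after_del, alive_after_collect).
    """
    was_enabled = gc.isenabled()
    gc.disable()  # leave only reference counting active
    try:
        a, b = Node("a"), Node("b")
        a.other, b.other = b, a  # a -> b -> a: a circular reference
        probe = weakref.ref(a)   # lets us observe a without keeping it alive
        del a, b                 # roughly what unset() does in PHP
        alive_after_del = probe() is not None
        gc.collect()             # an explicit pass of the cycle collector
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_survives_refcounting())
```

With only reference counting at work, each object in the circle keeps the other one alive after both variables are gone. It takes a pass of the cycle collector to notice that nothing outside the circle points into it.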

That’s a very basic example. You may have more “nodes” in the circle, which is what happens when your program becomes more complex than the simple script above. We can argue whether circular references are good software design or not, but there’s one thing that uses them profusely: ORMs. For a variety of good reasons, an object representing a row in a database is always the same object in memory, and it’s referenced from all the other objects that need it, usually as defined by foreign key constraints. That saves a lot of trouble, like propagating changes in a single row throughout all instantiated models and their references. Now look at your database schema and count how many foreign keys you have… see how many circular references you may have in your running process? PHP won’t be able to tell which ones are being used and which ones aren’t; sooner or later memory will be exhausted, in use or not, and PHP will die.

But more importantly, keeping a program running forever was never PHP’s top priority; that’s why problems like the one above were never really solved. Garbage collection was introduced in PHP 5.3, in its first real implementation, as an opt-in feature. That’s right, the language tells you to ask “please, could you not leak memory for this one?”. That was around mid-2009, a relatively recent change. What happened before that? Absolutely nothing. Nobody cared if PHP leaked memory or not, because it was meant to die as soon as it was done. In PHP terms, leaking memory was just fine.

Not just about memory

There’s more to this. If you’ve used PHP an awful lot, you may have experienced this very weird issue:

 Fatal error: Exception thrown without a stack frame in Unknown on line 0

What does that mean? I honestly have no idea. I can’t find line 0 of an unknown PHP file. People seem to be as clueless as I am. Others say it’s related to PHP’s error and exception handling. No one knows for sure. I can only tell you the preconditions to get that error: continually running processes, working with a gigabyte-sized database, under heavy load/processing, in a production environment. That’s right, it’s something you wouldn’t see in your local instance.

I’ve also experienced sudden process crashes that were untraceable. They just died. PHP is full of cryptic, hard to debug or impossible to reproduce errors. Just look here.

Using the wrong tool

Do you see the pattern? I’ve inherited projects where PHP was used for daemons or other stuff that’s not just regular websites (yes, I’m a hired keyboard), and all of them shared that same problem. No matter how good or clever your idea looked on paper, if you want to keep the processes running forever they will crash, and they will do it really fast under load, for known or unknown reasons. That’s nothing you can really control; it’s because PHP is meant to die. The basic implementation, the core feature of the language, is to be suicidal, no matter what.

For a moderately new project, I’ve been using Python, virtualenv, Flask, Supervisor and Gunicorn. I’m amazed at how well they all work and how mature each of those components is, even though some of them are still considered to be in beta. But more importantly, they handle the not-dying thing really well. How do they do it, you may ask? They do nothing; Python just won’t commit suicide. That’s it, it’s part of the language’s core design. I’m not saying that’s the magic bullet for everything. Maybe I spent too much time with PHP and I’m easily impressed.

Conclusion

Yes, you can always find a workaround. You can increase PHP’s memory limit to ridiculous amounts and keep processes alive for a little longer. You can get creative with cron, shell scripts and other UNIX tools. You can use PHP for the website and another language for daemons. But keep in mind that we’re talking about mid-to-high levels of complexity here. It’s not only about the language itself but also about all the libraries and tools that you put on top of it, so adding a second programming language because your main one has abandoned you isn’t a trivial task. It’s also about working with stuff in which a 30-second delay may have disastrous results.

Maybe there are people who think that a more services-oriented architecture may help to overcome PHP’s limitations. Maybe there are people who can argue that PHP 5.4 or PHP 5.5 are a lot better, and that the language is slowly improving.

Thankfully, I’ve already moved on.

Discuss on HN or Reddit

Continued here


Some updates, based on people’s comments (yeah, the two of you, dear Internet strangers):

  • Yeah, I’m aware of the irony in smacking PHP from a PHP-based blog platform. That was intentional, like the tabloid-like post title.
  • No, I don’t want PHP to disappear. As I said, it’s a perfectly capable language to some extent. My intention was to encourage the use of the right tool for the job, nothing more.
  • Yes, I’m aware that message queues are a solution for some of the problems stated above (not on $5/month shared hosts, of course). In fact, in combination with cron, that was part of the solution for a particularly ill-designed project. However, using message queues as the proper solution is one thing; using them to circumvent the language’s limitations is a different one.
  • I have no problem using PHP; as also stated, I still use it for legacy projects. But right now I think Python is a far superior language, with far superior quality libraries, and not just because of the non-dying thing. Yes, I know a few Python 3 jokes too.