February 25, 2007
So Many Untapped PHP Features
I love PHP as a language. It can be used for quick and dirty scripts. Or, you can harness the object-oriented features to make a project very structured. I've been doing the latter almost exclusively for the past two years. The reason? I generally find projects are easier to maintain when functionality is encapsulated in classes. You never know when you might want to duplicate something, right? It's basically the list of benefits of object-oriented programming versus procedural.
Anyway, I have my heart set on creating a personal web site with blog, projects list, code repository, etc, much like what my friend Ben has done with www.benchodroff.com. The first step in creating your own web site is deciding what software will power it. I saw Ben was using WordPress. So, I downloaded a copy and glanced at the source. I immediately noticed:
- No PHP 5 code
- Similar functionality is denoted by function name prefixes, not encapsulated in classes
- SQL statements don't use bound parameters
- Very little distinction between model, view, and controller
- XML generation is via print() or echo(), not via PHP's XML classes or functions
Many of these immediately triggered alarms. The lack of SQL bound parameters means it is harder to defend against SQL injection. The lack of MVC increases the likelihood of XSS, since output escaping ends up scattered throughout the code instead of living in one view layer. The lack of class usage means it will take me longer to understand the API. The lack of PHP 5 visibility keywords (private, protected, public) means it will be difficult to ensure forward-compatible code.
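To make the bound-parameter point concrete, here is a minimal sketch. It assumes PDO with the SQLite driver and an in-memory database; the users table and its rows are made up for illustration and have nothing to do with WordPress's actual schema.

```php
<?php
// Hypothetical schema and data, for illustration only.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");
$db->exec("INSERT INTO users (name) VALUES ('bob')");

// A classic injection payload arriving as user input.
$input = "alice' OR '1'='1";

// Bound parameter: the value is sent separately from the SQL text,
// so the quote characters are treated as data, not as SQL syntax.
$stmt = $db->prepare('SELECT name FROM users WHERE name = :name');
$stmt->execute(array(':name' => $input));
$safeRows = $stmt->fetchAll(PDO::FETCH_COLUMN);

// String concatenation (the pattern to avoid): the payload rewrites
// the WHERE clause so that it matches every row.
$unsafeSql = "SELECT name FROM users WHERE name = '" . $input . "'";
$unsafeRows = $db->query($unsafeSql)->fetchAll(PDO::FETCH_COLUMN);

echo count($safeRows), " row(s) with a bound parameter\n";
echo count($unsafeRows), " row(s) with string concatenation\n";
```

With the bound parameter, no user is literally named "alice' OR '1'='1", so nothing comes back. The concatenated query returns every row in the table.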
After confirming my suspicions of WordPress's rough history via the National Vulnerability Database, I decided not to take my chances. The software may be easy to use and may work well, but I really don't feel like rolling the security dice. Besides, as someone who loves to evangelize PHP, why should I host my site on PHP software that doesn't utilize about three years of PHP language improvements?
Although I pick on WordPress for under-utilizing PHP, there are countless projects, many of them successful, that do the same. Drupal, which I love because it is one of the few web applications that handles modularization and feature extensibility properly, only utilizes PHP 4 features. Although I love the design philosophy of the software from a 20,000m overview, when you start to hack away, you find that the API is very difficult to understand. There are no classes, so all functionality is provided via functions in the global namespace. Also, the include files are named .inc, so Apache will serve their content as plain text by default. Users can then easily see what version you are running and cross-reference it against known exploits (luckily a .htaccess is generated that prevents this for many, but not all, installations).
Gallery, another commonly used PHP application, also fails to fully utilize PHP, although they are more graceful about it. Gallery 2 uses classes for almost everything. Like Drupal, I love their design philosophy of separating features into modules. Unlike Drupal, they take the proper approach and contain similar functionality in classes. Sadly, there are no PHP 5 visibility keywords on class variables and functions, but at least their naming convention distinguishes between public and non-public.
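The difference between a naming convention and real visibility keywords can be sketched roughly like this. Both classes are hypothetical and are not taken from Gallery 2; the leading underscore simply stands in for whatever convention a PHP 4 code base uses.

```php
<?php
// Hypothetical classes, for illustration only.

// PHP 4 style: "private" by naming convention only.
class LegacyAlbum
{
    var $_items = array(); // the underscore is a hint, not a rule
}

// PHP 5 style: the engine enforces who may touch what.
class Album
{
    public $title;               // part of the public API
    protected $items = array();  // visible to Album and its subclasses
    private $secretKey;          // visible only inside Album

    public function __construct($title, $secretKey)
    {
        $this->title = $title;
        $this->secretKey = $secretKey;
    }

    public function addItem($item)
    {
        $this->items[] = $this->normalize($item);
    }

    public function countItems()
    {
        return count($this->items);
    }

    private function normalize($item)
    {
        return trim($item);
    }
}

// Nothing stops outside code from reaching into the legacy class.
$legacy = new LegacyAlbum();
$legacy->_items[] = 'anything goes';

$album = new Album('Holiday', 's3cret');
$album->addItem('  beach.jpg ');

// Reflection reports the declared visibility without triggering
// the error that direct access to a private member would cause.
$legacyProp = new ReflectionProperty('LegacyAlbum', '_items');
$secretProp = new ReflectionProperty('Album', 'secretKey');
$normalize  = new ReflectionMethod('Album', 'normalize');

var_dump($legacyProp->isPublic());  // convention only
var_dump($secretProp->isPrivate()); // enforced
var_dump($normalize->isPrivate());  // enforced
```

In a code base that already follows a consistent convention, moving to the second form is mostly a matter of swapping the convention for the matching keyword.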
I could go on listing applications with similar behavior, but why waste all day? The more interesting topic is why these applications have not made the jump to utilize PHP 5's features. Others have speculated, and I tend to agree, that application developers are worried that PHP 5 adoption is too low and requiring its use will turn away users. Now, considering the improvements of the PHP 5 engine, both from a performance and security standpoint, there is no reason in my mind why a sane system administrator wouldn't be running PHP 5.2.1 (most recent at the time of this entry). When PHP 5 was released almost three years ago, there was a compelling reason not to upgrade: scripts broke, albeit mostly the ones that were written improperly. Now, very few applications written in strict PHP 4 will break when running on PHP 5. If they do break, chances are they are built so poorly that security holes, not PHP version compatibility, should be a bigger concern.
Now, it is very easy to point to low PHP 5 adoption on servers and take sides with application developers for sticking with PHP 4. But I think there is more to it than just that. After all, if given a compelling reason to upgrade, you'll do it, right? (Sadly, security is not compelling enough to many running PHP 4. Oh well.)
I believe the biggest problem hindering PHP 5 adoption is ignorance. The average PHP "programmer" is ignorant of the features available in PHP. Because the PHP syntax is easy for new programmers to read (at least compared to Perl, C, and arguably Python; note to self, learn more Python) and because the barrier to entry for PHP is low (much easier to iteratively develop against a scripting language than a compiled language, such as Java), many PHP programmers learn the language in an informal process. Because of the popularity of PHP, most start learning PHP by examining existing programs. This is how I started (probably trying to get Netjuke to work properly). And, since PHP 4 is still statistically more popular than PHP 5, chances are your first exposure to PHP will be a PHP 4 program. Sure, there are some excellent PHP 5 applications out there (some of the best PHP is, IMO, part of the Zend Framework), but statistically, you won't encounter them unless you've been using PHP for a while.
The differences between PHP 4 and PHP 5 are staggering. I consider them to be two separate languages. Therefore, it is of no surprise that the differences between PHP programmers are equally staggering. If you analyze the skills and abilities of PHP programmers and attempt to classify them, you will most likely encounter groups that correlate to the version of PHP they use. The greater the level of PHP understanding, the more likely that person is running the most recent version of PHP.
Although more people deploy than program PHP, we can attempt to infer the percentages of the skill level of PHP programmers by looking at the PHP version distribution. If I were to ask you what percentage of PHP programmers know about, much less use, PHP higher-version features, such as PDO, __call(), SimpleXML, SPL (DirectoryIterator, ArrayObject, spl_autoload_register() ), Filter extension, etc, what would you answer? From personal experience, I'd peg it below 10%. Remember, this is including all people who program PHP, whether it is a project lead working for Zend or some Joe Schmoe who made a module for Drupal. From the version distribution, we look for the upper 10th percentile, which correlates to PHP 5.1.0. Even though this graph maps PHP users, not programmers, I think it is safe to say my estimate is in the ballpark, although it appears to be a little high in terms of percentage. If 10% of PHP programmers used the advanced features enumerated earlier, then more than 10% (or 15% running 5.0.0 or greater) of the servers would have to be running PHP 5. So perhaps a more accurate estimate of PHP programmers who use advanced features is only around 5%? It is hard to say, and the correlation can be thrown off by numerous factors, including servers that don't expose PHP via HTTP headers for security purposes, something the smarter, advanced PHP programmers are probably more likely to do. I could speculate on percentages all day, but you can't ignore that there is a giant rift in both abilities and percentages between the top tier of PHP programmers and the rest. If nothing else, my personal experience can confirm this.
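For anyone who has not run into the features listed above, here is a quick tour of four of them: __call(), ArrayObject, the Filter extension, and SimpleXML. Treat it as a sketch rather than recommended design; the Settings class, the tag list, and the tiny XML feed are all invented for illustration.

```php
<?php
// Hypothetical class names and data, for illustration only.

// __call(): intercept calls to methods that were never declared.
class Settings
{
    private $values = array();

    public function __call($name, $args)
    {
        if (strpos($name, 'set') === 0) {
            $this->values[substr($name, 3)] = $args[0];
            return $this;
        }
        if (strpos($name, 'get') === 0) {
            $key = substr($name, 3);
            return isset($this->values[$key]) ? $this->values[$key] : null;
        }
        throw new BadMethodCallException("Unknown method: $name");
    }
}

$settings = new Settings();
$settings->setTheme('dark');
$theme = $settings->getTheme();

// SPL ArrayObject: an object that behaves like an array.
$tags = new ArrayObject(array('php', 'pdo'));
$tags[] = 'spl';
$tagCount = count($tags);

// Filter extension: validate input instead of trusting it.
$goodId = filter_var('42', FILTER_VALIDATE_INT);
$badId  = filter_var('42; DROP TABLE users', FILTER_VALIDATE_INT);

// SimpleXML: read XML as objects instead of parsing strings by hand.
$feed  = simplexml_load_string('<feed><entry><title>Hello</title></entry></feed>');
$title = (string) $feed->entry[0]->title;

var_dump($theme, $tagCount, $goodId, $badId, $title);
```

None of this is exotic, and each piece replaces code that a PHP 4 program would have to write and maintain by hand.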
What does all of this mean? It depends on who you are. If you know how to use all of the "advanced" features enumerated above, good for you. You know your stuff and many companies would love to have you on board. If you are wearing a security hat, be worried. Since PHP 4 is missing intuitive features for security best-practice coding (private methods, DB layer with easy-to-use parameter binding (PDO), etc), chances are the PHP programs you have deployed aren't as secure as you would like. I'm not saying PHP 5 is a magic cure-all, just that if you take advantage of its features, programs are more likely to be secure and maintainable. If you need to hire a PHP developer, ask questions about advanced PHP features. Programmers who know that stuff are much more valuable, even if you don't use the advanced features.
What can be done about it? This is a difficult question and it has been addressed by many. To all the application developers out there, I would say to not be afraid of migrating your application to PHP 5. So your users might gripe initially. So it takes a lot of effort. I understand. Just think of it as part of the regular maintenance you perform on any application. If you are going to migrate, my tip is to migrate as part of a complete rewrite. Unless your code base is as well organized as Gallery 2's (you can just drop in private, protected, and public keywords in classes), you shouldn't attempt to migrate in place. You will just end up with a giant hack (MediaWiki, for example). If you perform a complete rewrite, you can also take advantage of some of the amazing PHP frameworks out there. Zend Framework has already been mentioned, but Symfony deserves a plug as well. If you use those frameworks properly, I guarantee you will get more enjoyment from working on the application than you do now.
If you are a newcomer to PHP, I recommend that you NOT learn PHP from existing programs, unless that program is a reputable framework (as mentioned above). These frameworks are varied enough that you can find all common aspects of PHP in them and you will be exposed to proper practices in the process. For language reference, the official manual on php.net is a great one-stop shop. But only use it for quick reference. If you are looking for how to do X with PHP, check out the Zend Developer Zone, or follow the practices encouraged by a framework. (You can tell I really like frameworks, can't you?)
In conclusion, there are currently two PHP programmer "camps" that are vastly different in terms of size and skill level. Only a handful of PHP programmers actually use the time-saving (and often security best-practice) features of higher versions of PHP. PHP 5 is not a magic cure-all, however. Bad programmers are bad programmers. I feel that PHP 5 is more conducive to good programming practices than PHP 4. Unfortunately, PHP newcomers are more likely to be initially exposed to, and hooked on, the PHP 4 mentality. Furthermore, there are few resources available that encourage and evangelize PHP 5. Until this changes, I don't see PHP 5 mass adoption happening any time soon. The silver lining in all of this is that hiring PHP developers is relatively easy, as a candidate who knows how to use advanced PHP features is probably head and shoulders above every candidate who doesn't.