Stop writing PHP like it’s 2009…


It boggles my mind, having left engineering at Facebook only 2 months ago, that the outside world still largely seems to write PHP like it’s 2009.

It seems like people have never heard of Hack, HHVM, XHP… People still largely seem to use require() and include() statements everywhere in their code. What. The. Fudge.

I still think PHP is a great language to write frontend applications (the business logic and API layers anyways), but only if you use the following modern advances:

1. Hack

Typing your variables:

Ok, let’s be honest, the number 1 issue with PHP is its lack of strong typing. A variable can be anything, and most of the time is a ticking timebomb.

If you’ve ever had to write code like this after your script blew up…

if ($var !== null && is_int($var)) {
  //...
}

it means you probably got bit by trying to reference a null var or the wrong variable type.

Hack is a way to gradually add type information to PHP, and is an addition on top of the PHP language.

If you add Hack type annotations, your variables become strongly typed (including being marked as potentially null). For example:

class Foo {
  // Hack requires a visibility modifier on class properties.
  private ?int $var = null;
  // ... some code ...
}

This can be done with method signatures, class variables, etc… You can then check your code for errors by running hh_client, which will highlight any type errors.

There’s a much longer and better explanation of typing Hack on the Hack Documentation page: https://docs.hhvm.com/hack/overview/typing 
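To make that a bit more concrete, here is a minimal sketch of a fully annotated class. The Counter class and its method names are made up for illustration; the point is only to show where a nullable property forces you to handle null before the typechecker lets you treat it as an int.

```hack
<?hh

// Hypothetical class for illustration; not from any real codebase.
class Counter {
  // Nullable int: null means "nothing counted yet".
  private ?int $count = null;

  public function increment(int $by): void {
    // Copy the property into a local so the typechecker can refine its type.
    $current = $this->count;
    if ($current === null) {
      $current = 0;
    }
    // $current is known to be an int here, so the addition typechecks.
    $this->count = $current + $by;
  }

  // Non-nullable return type: returning $this->count directly would be
  // reported as an error, because it may still be null.
  public function get(): int {
    $current = $this->count;
    if ($current === null) {
      return 0;
    }
    return $current;
  }
}
```

Calling increment() with a string, or returning $this->count straight from get(), is the kind of mistake hh_client should flag before the code ever runs.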

Async:

The next thing, which is a giant leap forward for any decent PHP website, is to use Hack’s async/await keywords.

If you’ve never worked with a language feature like this, then let me explain.

Let’s say you need to do 3 function calls to your database to get 3 pieces of data. You need all 3 to calculate something on your page, but each requires a different SQL query.

Normally you would write:

$data1 = querySQL1();
$data2 = querySQL2();
$data3 = querySQL3();
$result = computeResult($data1, $data2, $data3);

Ok, so in actual fact, unless you’re explicitly doing something fancy, PHP is single threaded inside a given request. That means the server will first do a call to SQL for the first query and wait for the result, then do the second call, then the third.

So what’s wrong with this? The issue is that the time to compute the result is now the sum of the times for query1, query2 and query3.

But most databases are multi-threaded and can run operations in parallel. If on top of this your DB runs on an SSD and not a spinning disk, you can potentially really get some use out of your DB’s multi-core processor and true parallelism…

If you’re querying multiple DBs or services on different boxes, or making API calls, this works even more to your advantage.

So how do we fix this? With async/await:

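// This must run inside an async function, and each querySQLn() must itself
// be an async function returning an Awaitable.
list($data1, $data2, $data3) = await \HH\Asio\v(
  array(
    querySQL1(),
    querySQL2(),
    querySQL3(),
  )
);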

In this way, all three queries are sent off at the same time and awaited together. Now the time to get all 3 pieces of data back is roughly the time it takes to run the longest query, since all 3 are in flight at once.

The Hack docs on async explain this even better with diagrams: https://docs.hhvm.com/hack/async/introduction

Hack provides async MySQL, memcache and cURL implementations, so you can most likely just replace your calls with their libraries and take immediate advantage of the benefits.
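If you want to see the effect without touching a database, here is a rough sketch that fakes the three queries with sleeps. The fake_query() and load_page_data() functions are hypothetical stand-ins I made up for this example, and the top-level statements assume a plain `<?hh` file on an HHVM 3.x-era setup.

```hack
<?hh

// Hypothetical stand-in for a real async query: it sleeps, then returns a label.
async function fake_query(string $name, int $millis): Awaitable<string> {
  // \HH\Asio\usleep takes microseconds and yields control while waiting.
  await \HH\Asio\usleep($millis * 1000);
  return $name;
}

async function load_page_data(): Awaitable<Vector<string>> {
  // All three awaitables are created up front, then awaited together,
  // so the waits overlap instead of running back to back.
  return await \HH\Asio\v(array(
    fake_query('users', 300),
    fake_query('posts', 200),
    fake_query('likes', 100),
  ));
}

$start = microtime(true);
// join() blocks until the awaitable finishes; use it only at the top level.
$data = \HH\Asio\join(load_page_data());
$elapsed = microtime(true) - $start;

// Results come back in the order the awaitables were passed in.
echo implode(', ', $data->toArray())."\n";
printf("took about %.2f seconds\n", $elapsed);
```

Run sequentially, the three fake queries would take around 0.6 seconds in total; awaited together, the elapsed time should land close to the slowest one (about 0.3 seconds), though the exact figure will vary by machine.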

Collections:

Aaah the PHP array. Sometimes a vector. Sometimes a dictionary. Sometimes both.

Even if you may think you know what it contains, some other engineer on your team probably also thought they knew, and put the wrong type of variable inside it.

If you’ve ever used a language like C#, Java or C++, you may be familiar with Generics and Collections.

Hack introduces Collections that let you specify the type of their contents. This means you are no longer just blindly hoping that an array contains what it says it does. Now you know a given structure contains the type you want (string, int, etc…).

On top of this, if you still want to use PHP’s array so your code requires less refactoring, you can specify the array content types such as:

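class Bar {
  public array<int> $vector_of_ints = array();
  // The value type was lost from the original snippet; mixed is a placeholder.
  public array<string, mixed> $dictionary_with_string_keys = array();
}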

Again, if you then try to put the wrong type of variable inside the array, or try to use the vector array as a map by giving it a string key, the typechecker will report an error.
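For comparison, here is a small sketch using Hack’s own Vector and Map collections instead of typed arrays. The function names and the sample data are invented for the example; only Vector, Map and Map::values() come from Hack itself.

```hack
<?hh

// Hypothetical helper: sums a Vector<int>. Passing a Vector<string> here
// would be reported by the typechecker.
function sum_scores(Vector<int> $scores): int {
  $total = 0;
  foreach ($scores as $score) {
    $total += $score;
  }
  return $total;
}

// Hypothetical data source: a Map with string keys and int values.
function scores_by_user(): Map<string, int> {
  $scores = Map {};
  $scores['fred'] = 10;
  $scores['wilma'] = 15;
  return $scores;
}

// Map::values() returns a Vector of the map's values, in insertion order.
$total = sum_scores(scores_by_user()->values());
echo $total."\n";
```

Unlike a plain PHP array, a Vector can never quietly turn into a dictionary, so a reader of sum_scores() knows exactly what shape of data it accepts.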

2. HHVM

As you might have expected, Hack comes with its own runtime, since it cannot run directly on Zend’s PHP runtime.

HHVM stands for HipHop Virtual Machine, and was developed at Facebook to greatly increase the performance of running complex PHP at scale.

HHVM runs the entirety of Facebook, as well as some other major sites like Wikipedia, and has proven its performance benefits time and time again.

As HHVM can also run regular PHP without any Hack annotations, and will speed up that code too, you’re throwing money away by not using HHVM as your default PHP runtime.

For example, when Wikipedia switched to HHVM, the average load time for their pages was cut by more than half, and their servers went from an average of 70% CPU usage to an average of 12%, and this was 2 years ago. Since then the HHVM team has continued to make performance improvements, so you can expect it to be even better today.

HHVM needs to be fronted by an HTTP server like Apache or nginx in production, but can be used in development mode as a standalone server as well.

3. XHP

If there’s one thing I hate, it’s PHP/HTML soup. Something like the following just makes me gag:

$user_name = 'Fred';
$output = "<div>Hello $user_name</div>";

Worse is the guy who gets “clever” and decides he’s not going to open and close his HTML tags in the same spot, like so:

$user_name = 'Fred';
$output = "<div>Hello $user_name";
// some call to a function that takes in $output and is supposed to close the div tag
$output = addTheRestOfTheSoup($output);

And then you try to maintain this and…

[Image: jackie_chan_wtf]

XHP makes HTML a first-class citizen of PHP, by making it so you can write HTML outside of a string literal and have it parse and behave properly as XHP.

For example:

$user_name = 'Fred';
// In XHP, PHP expressions are embedded with braces rather than string interpolation.
$output = <div>Hello {$user_name}</div>;
addTheRestOfTheDivContentsTo($output);
//...
function addTheRestOfTheDivContentsTo(:div $div): :div {
  $div->appendChild("We come in peace");
  return $div;
}

As you can see, XHP also enforces that tags match, that is to say an open tag has a corresponding close tag, and that they are open and closed in proper order, not out of order.

XHP also takes care of escaping string variables for you, which prevents someone injecting HTML/JS into your page from user content, and reduces the risk that this vector of attack will be used against your site.

You can also create custom XHP classes for your own bits of re-usable HTML so you end up with something akin to “custom HTML tags” in your codebase, where you can have a tag for example which automatically puts in a link to your FB page, or even a tag that renders your entire header.

You can see more about this in the XHP docs: https://docs.hhvm.com/hack/XHP/introduction
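As a rough idea of what such a custom tag might look like, here is a sketch in the class-based XHP syntax of the HHVM 3.x era. It assumes the facebook/xhp-lib package has been installed with Composer (hence the autoload line); the fb-page-link tag, its page attribute and the "example" page name are all hypothetical.

```hack
<?hh

// Assumes facebook/xhp-lib was installed via Composer next to this file.
require_once(__DIR__.'/vendor/autoload.php');

// Hypothetical custom tag: <fb-page-link page="..." /> renders a link
// to the given Facebook page.
class :fb-page-link extends :x:element {
  // Declared attributes are type-checked; @required makes it mandatory.
  attribute string page @required;

  protected function render(): \XHPRoot {
    $url = 'https://www.facebook.com/'.$this->:page;
    // The href value is escaped by XHP when the element is rendered.
    return <a href={$url}>Find us on Facebook</a>;
  }
}

// The custom tag composes with built-in tags like any other element.
$header = <div><fb-page-link page="example" /></div>;
echo $header->toString()."\n";
```

The nice part of this design is that callers never build the URL or the anchor markup by hand, so the escaping and the tag structure live in exactly one place.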

But there’s more…

The above covers the basics of what HHVM, Hack and XHP are. Next time I hope to lay out a tutorial covering the basics of setting up a dev box with HHVM, HHVM-powered autoloading of classes, functions and constants, and a basic Controller framework to route incoming web requests.

Happy coding!

7 comments

  1. It’s interesting to read about these things in HHVM and Hack but I believe the reason not a lot of people are using these features outside of Facebook is simply because a good part of them are solved by PHP 7, Composer or templating libraries like Twig or Blade. So I agree that it would be nice to see people move on from early PHP 5 style but given the current community support of Hack / HHVM, these are probably not the solution.
