Clean file names using PHP preg_replace

It’s always a good idea to protect yourself from all sorts of possible malicious attempts by users (or even mistakes by misinformed users). Here we look at taking a string of text (a filename) that contains generally unsafe characters and stripping it down to a safe set.

Here’s a simple way to clean up filenames (or other text input) using PHP, leaving only alphanumeric characters, dashes, underscores, and periods. I’m not great with regular expressions, but it seems one should be able to use preg_replace() to replace every character that’s *not* within a defined range… but my first attempt at that didn’t work.

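I don’t want to assume too much, but it seems like /(![[:alnum:]_.-]*)/ should match all the baddies in the string. It doesn’t: in a regular expression, ! is just a literal exclamation mark, not a “not” operator (negation is written as ^ right after the opening bracket of a character class). The workaround I settled on is to find all the baddies by stripping the OK characters out of a copy of the string, which leaves only the bad characters in a temporary variable that can then be used to strip them from your string.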


$fname="Filename 123;".'"'."lal[a]*(/.jpg"; //yikes!
$replace=""; //what you want to replace the bad characters with
$pattern="/([[:alnum:]_.-]*)/"; //basically all the filename-safe characters
$bad_chars=preg_replace($pattern,"",$fname); //strip the safe characters, leaving only the "bad" ones
$bad_arr=str_split($bad_chars); //split them up into an array for the str_replace() func.
$fname=str_replace($bad_arr,$replace,$fname); //replace all instances of the bad chars with the replacement
echo $fname; //just echo the name for your satisfaction

Or, more simply:

$fname="Filename 123;".'"'."lal[a]*(/.jpg";
$replace="_";
$pattern="/([[:alnum:]_.-]*)/";
$fname=str_replace(str_split(preg_replace($pattern,"",$fname)),$replace,$fname); //strip the safe characters with "" so only the bad ones get split
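
For comparison, here is a minimal sketch of the same cleanup done in a single preg_replace() call. It relies on the fact that, in PCRE, a ^ placed immediately after the opening bracket negates a character class, so [^[:alnum:]_.-] matches any one character that is not filename-safe. The function name clean_filename() and its parameters are hypothetical, made up for this illustration.

```php
<?php
// Hypothetical helper (not from the original post): replaces every character
// that is not alphanumeric, underscore, period, or dash with $replace.
function clean_filename(string $fname, string $replace = "_"): string
{
    // [^...] is a negated bracket expression: it matches any single
    // character NOT listed inside the brackets.
    return preg_replace('/[^[:alnum:]_.-]/', $replace, $fname);
}

$fname = "Filename 123;" . '"' . "lal[a]*(/.jpg";
echo clean_filename($fname), "\n";     // prints Filename_123__lal_a____.jpg
echo clean_filename($fname, ""), "\n"; // prints Filename123lala.jpg
```

Because the replacement never passes through str_split() and str_replace(), a multi-character replacement string should also behave as expected here.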

Conclusion:
Though it might not seem like a big deal to replace spaces and the like with underscores, consider the possibility of a user injecting code or commands that, when the string is used in the right context, can compromise your site and its data:


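$fname="' OR super_top_secret=1 -- "; //the trailing "-- " comments out the rest of the query
$result=mysql_query("SELECT * FROM files where fname='$fname' LIMIT 1"); //mysql_* functions were removed in PHP 7; shown only to illustrate the injection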

And with that, a malicious filename makes all of our top secret files visible when the query should have returned only one. Granted, we should escape anything that goes into the DB query (or, better, use prepared statements), but as far as I know, it is possible to upload a file with that exact name (or change the name if the online app allows it). So for now, we’ll just restrict it to only characters that play nice with the web server.
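
Restricting the characters is only one layer. Here is a minimal sketch of the escaping side, assuming PDO with an in-memory SQLite database standing in for the real one (the post itself uses MySQL). The files table and its fname and super_top_secret columns come from the query above; the sample rows are made up for illustration. With a bound parameter, a malicious filename like the one above is compared as a plain string rather than being parsed as SQL.

```php
<?php
// Assumption: PDO with the pdo_sqlite driver is available; an in-memory
// SQLite database stands in for the MySQL database used in the post.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Made-up sample data for illustration only.
$db->exec("CREATE TABLE files (fname TEXT, super_top_secret INTEGER)");
$db->exec("INSERT INTO files VALUES ('public.jpg', 0), ('secret.pdf', 1)");

$fname = "' OR super_top_secret=1 -- "; // a malicious "filename"

// The ? placeholder is bound as data, so the quote and the OR clause
// never become part of the SQL statement itself.
$stmt = $db->prepare("SELECT * FROM files WHERE fname = ? LIMIT 1");
$stmt->execute([$fname]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

echo count($rows), "\n"; // no file has that literal name, so no rows match
```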

12 Comments

  1. I like your thinking here, but it seems overcomplicated. I haven’t tested this yet, but would it not be better to just replace all non-alpha characters with an empty string? I think that the regex pattern is something like /[^a-zA-Z]/, so the function call would be preg_replace("/[^a-zA-Z]/","",$value).

    The only problem with this is that it removes all of the dots as well. Since we are not trying to validate the filename, just ensure that it is safe, we can simply add . to the regex.

    So preg_replace("/[^a-zA-Z.]/","",$value) would give us a safe string.

  2. Thanks, found this very handy after I was having problems using the fopen function where the filename has a colon (:) in it.

    I ended up using the following, with my separator being the '-' character:

    preg_replace("/[^a-z0-9-]/", "-", strtolower($filename))

  3. Don’t you unnecessarily replace characters like underscore, hyphen, etc? And on Unix systems, basically any character is valid except for the forward slash.

    1. OS X, which is a unix system, often uses colon (:) instead of slashes in paths, so you can’t use colons in file names there.

      1. OS X, like the other *nix systems, uses slashes. The colon was used in Mac OS through 9. Regardless, I’m of the opinion that filenames that require a backslash to escape a character are a poor design decision.
