It’s always a good idea to protect yourself from all sorts of possible malicious attempts by users (or even mistakes by misinformed users). Here we look at taking a string of text (a filename) that contains generally unsafe characters and cleaning it up.
Here’s a simple way to clean up filenames (or other text input) using PHP, leaving only alphanumeric characters, dashes, underscores, and periods. I’m not great with regular expressions, but it seems one should be able to use preg_replace() to replace every character that’s *not* within a defined range. My first attempt didn’t work, though: it seems like /(![[:alnum:]_.-]*)/ should match all the baddies in the string. It doesn’t, because ! has no special meaning in a regular expression; a character class is negated with a leading ^ inside the brackets, as in [^[:alnum:]_.-].
The solution used here, rather, is to find all the baddies by stripping the OK characters out of a copy of the string. That leaves a temporary variable holding only the bad characters, which can then be used to strip them from your string.
$fname="Filename 123;".'"'."lal[a]*(/.jpg"; //yikes!
$replace=""; //what you want to replace the bad characters with
$pattern="/([[:alnum:]_.-]*)/"; //basically all the filename-safe characters
$bad_chars=preg_replace($pattern,"",$fname); //strip the safe characters, which leaves only the "bad" characters
$bad_arr=str_split($bad_chars); //split them up into an array for the str_replace() func.
$fname=str_replace($bad_arr,$replace,$fname); //replace all instances of the bad chars with the replacement
echo $fname; //just echo the name for your satisfaction
Or, more compactly, this time replacing the bad characters with underscores:
$fname="Filename 123;".'"'."lal[a]*(/.jpg";
$replace="_";
$pattern="/([[:alnum:]_.-]*)/";
$fname=str_replace(str_split(preg_replace($pattern,"",$fname)),$replace,$fname); //strip the safe chars first, then swap each bad char for $replace
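For comparison, a negated character class (a leading ^ inside the brackets) lets preg_replace() do the same job in a single call. This is only a minimal sketch: the function name sanitize_filename() and its default replacement are hypothetical and not part of the code above.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the post).
function sanitize_filename(string $fname, string $replace = '_'): string
{
    // [^...] matches any single character that is NOT in the class,
    // i.e. anything other than alphanumerics, underscore, period, or dash.
    return preg_replace('/[^[:alnum:]_.-]/', $replace, $fname);
}

$fname = "Filename 123;" . '"' . "lal[a]*(/.jpg";
echo sanitize_filename($fname, ''), "\n";  // Filename123lala.jpg
echo sanitize_filename($fname, '_'), "\n"; // Filename_123__lal_a____.jpg
```

Without the /u modifier, [:alnum:] covers only ASCII letters and digits, so accented or other multibyte characters would be replaced byte by byte.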
Conclusion:
Though it might not seem like a big deal to replace spaces and the like with underscores, consider the possibility of a user injecting code or commands that, when the string is used in the right context, can compromise your site and its data:
$fname="' OR super_top_secret=1 -- "; //the trailing "-- " comments out the rest of the query
$result=mysql_query("SELECT * FROM files where fname='$fname' LIMIT 1");
And with that, a malicious filename makes all of our top secret files visible when only one should have been. Granted, we should escape anything that goes into the DB query, but as far as I know, it is possible to upload a file with that exact name (or change the name if the online app allows it). So for now, we’ll just restrict filenames to characters that play nice with the web server.
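As a complement to restricting the filename, the query itself can be made safe with a bound parameter. The sketch below is an assumption-laden illustration rather than the original setup: it uses PDO with an in-memory SQLite database (so the pdo_sqlite extension is assumed) and a made-up two-row files table. The mysql_* functions used above were removed in PHP 7, which is another reason to prefer PDO or mysqli.

```php
<?php
// Hypothetical in-memory table standing in for the `files` table above.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE files (fname TEXT, super_top_secret INTEGER)');
$pdo->exec("INSERT INTO files VALUES ('public.jpg', 0), ('secret.jpg', 1)");

$fname = "' OR super_top_secret=1 -- "; // the hostile "filename"

// The placeholder keeps $fname as plain data; it is never parsed as SQL.
$stmt = $pdo->prepare('SELECT * FROM files WHERE fname = :fname LIMIT 1');
$stmt->execute([':fname' => $fname]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

echo count($rows), "\n"; // 0: no file has that literal name
```

Binding the value does not make the filename safe for the filesystem or the web server, so the character whitelist is still worth applying.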