Presenting several UTF-8 / Multibyte-aware escape functions.
These functions represent alternatives to mysqli::real_escape_string, as long as your DB connection and Multibyte extension are using the same character set (UTF-8), they will produce the same results by escaping the same characters as mysqli::real_escape_string.
This is based on research I did for my SQL Query Builder class:
https://github.com/twister-php/sql
<?php
/**
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/
if (function_exists('mb_ereg_replace'))
{
function mb_escape(string $string)
{
return mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x27\x5C]', '\\\0', $string);
}
} else {
function mb_escape(string $string)
{
return preg_replace('~[\x00\x0A\x0D\x1A\x22\x27\x5C]~u', '\\\$0', $string);
}
}
?>
Characters escaped are (the same as mysqli::real_escape_string):
00 = \0 (NUL)
0A = \n
0D = \r
1A = ctl-Z
22 = "
27 = '
5C = \
Note: preg_replace() is in PCRE_UTF8 (UTF-8) mode (`u`).
Enhanced version:
When escaping strings for `LIKE` syntax, remember that you also need to escape the special characters _ and %
So this is a more fail-safe version (even when compared to mysqli::real_escape_string, because % characters in user input can cause unexpected results and even security violations via SQL injection in LIKE statements):
<?php
/**
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
* In addition, the special control characters % and _ are also escaped,
* suitable for all statements, but especially suitable for `LIKE`.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/
if (function_exists('mb_ereg_replace'))
{
function mb_escape(string $string)
{
return mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]', '\\\0', $string);
}
} else {
function mb_escape(string $string)
{
return preg_replace('~[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]~u', '\\\$0', $string);
}
}
?>
Additional characters escaped:
25 = %
5F = _
Bonus function:
The original MySQL `utf8` character-set (for tables and fields) only supports 3-byte sequences.
4-byte characters are not common, but I've had queries fail to execute on 4-byte UTF-8 characters, so you should be using `utf8mb4` wherever possible.
However, if you still want to use `utf8`, you can use the following function to replace all 4-byte sequences.
<?php
// Modified from: https://stackoverflow.com/a/24672780/2726557
function mysql_utf8_sanitizer(string $str)
{
return preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $str);
}
?>
Pick your poison and use at your own risk!