Voting

: min(six, five)?
(Example: nine)

The Note You're Voting On

therselman at gmail dot com
7 years ago
Presenting several UTF-8 / Multibyte-aware escape functions.

These functions represent alternatives to mysqli::real_escape_string, as long as your DB connection and Multibyte extension are using the same character set (UTF-8), they will produce the same results by escaping the same characters as mysqli::real_escape_string.

This is based on research I did for my SQL Query Builder class:
https://github.com/twister-php/sql

<?php
/**
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/
if (function_exists('mb_ereg_replace'))
{
function
mb_escape(string $string)
{
return
mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x27\x5C]', '\\\0', $string);
}
} else {
function
mb_escape(string $string)
{
return
preg_replace('~[\x00\x0A\x0D\x1A\x22\x27\x5C]~u', '\\\$0', $string);
}
}

?>

Characters escaped are (the same as mysqli::real_escape_string):

00 = \0 (NUL)
0A = \n
0D = \r
1A = ctl-Z
22 = "
27 = '
5C = \

Note: preg_replace() is in PCRE_UTF8 (UTF-8) mode (`u`).

Enhanced version:

When escaping strings for `LIKE` syntax, remember that you also need to escape the special characters _ and %

So this is a more fail-safe version (even when compared to mysqli::real_escape_string, because % characters in user input can cause unexpected results and even security violations via SQL injection in LIKE statements):

<?php

/**
* Returns a string with backslashes before characters that need to be escaped.
* As required by MySQL and suitable for multi-byte character sets
* Characters encoded are NUL (ASCII 0), \n, \r, \, ', ", and ctrl-Z.
* In addition, the special control characters % and _ are also escaped,
* suitable for all statements, but especially suitable for `LIKE`.
*
* @param string $string String to add slashes to
* @return $string with `\` prepended to reserved characters
*
* @author Trevor Herselman
*/
if (function_exists('mb_ereg_replace'))
{
function
mb_escape(string $string)
{
return
mb_ereg_replace('[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]', '\\\0', $string);
}
} else {
function
mb_escape(string $string)
{
return
preg_replace('~[\x00\x0A\x0D\x1A\x22\x25\x27\x5C\x5F]~u', '\\\$0', $string);
}
}

?>

Additional characters escaped:

25 = %
5F = _

Bonus function:

The original MySQL `utf8` character-set (for tables and fields) only supports 3-byte sequences.
4-byte characters are not common, but I've had queries fail to execute on 4-byte UTF-8 characters, so you should be using `utf8mb4` wherever possible.

However, if you still want to use `utf8`, you can use the following function to replace all 4-byte sequences.

<?php
// Modified from: https://stackoverflow.com/a/24672780/2726557
function mysql_utf8_sanitizer(string $str)
{
return
preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $str);
}
?>

Pick your poison and use at your own risk!

<< Back to user notes page

To Top