Problem/Motivation
Email Address Internationalization (EAI) is based on a set of RFCs published in 2012 that enable the use of UTF-8 in the local part of an email address. Additionally, email addresses with Internationalized Domain Names and an ASCII-only local part are permitted.
We are currently excluding people with such email addresses, as they are not usable for registration.
Relevant RFCs are the following (taken from https://uasg.tech):
Overview and Framework for Internationalized Email
https://tools.ietf.org/html/rfc6530
SMTP Extension for Internationalized Email
https://tools.ietf.org/html/rfc6531
Internationalized Email Headers
https://tools.ietf.org/html/rfc6532
Internationalized Delivery Status and Disposition Notifications
https://tools.ietf.org/html/rfc6533
Drupal 8
The current validation of email addresses in Drupal 8 uses HTML5 based client side validation and a server side validation based on Egulias\EmailValidator.
While the server-side validation is based on the above RFCs and provides good support for internationalized email addresses, the client-side validation fails because it accepts only ASCII characters in the local part of email addresses, as defined in the current W3C Recommendation for HTML 5.2.
https://www.w3.org/TR/html/single-page.html#valid-e-mail-address
That will change in the next release of the W3C Recommendation, HTML 5.3, which will allow UTF-8 in the local part of email addresses.
https://w3c.github.io/html/single-page.html#valid-e-mail-address
The WHATWG is moving in this direction, too, for client-side validation:
https://github.com/whatwg/html/issues/4562
Drupal 7
Drupal 7 uses no HTML5-based client-side validation for email addresses. Its server-side validation is based on PHP's filter_var() with the FILTER_VALIDATE_EMAIL filter, which is based on RFC 822 and allows only a small subset of the email addresses possibly in public use.
Example internationalized email addresses and the validity in D7/D8:
| Structure | Status D7 | Status D8 / D9 | Example |
| ascii@ascii.new | ok | ok | Info1@ua-test.link |
| ascii@Ascii.long | ok | ok | info2@ua-test.technology |
| ascii@Idn.ascii | fail | ok | info3@普遍接受-测试.top |
| ascii@ascii.idn | fail | ok | info4@ua-test.世界 |
| ascii@Idn.idn | fail | ok | info5@普遍接受-测试.世界 |
| | fail | ok | uasg.tech@डाटामेल.भारत |
| ascii@Idn-open dot-idn | fail | fail | info5@普遍接受-测试。世界 |
| ascii@ascii.punycode | ok | ok | Info4@ua-test.xn--rhqv96g |
| ascii@Punycode.ascii | ok | ok | Info3@xn----f38am99bqvcd5liy1cxsg.top |
| ascii@Punycode.punycode | ok | ok | Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g |
| ascii@RTL.ascii | fail | ok | info6@ختبار-القبولالعالمي.top |
| ascii@RTL.RTL | fail | ok | user@السعودية.رسيل |
| unicode@ascii.new | fail | ok | 测试1@ua-test.link |
| unicode@Ascii.long | fail | ok | 测试2@ua-test.technology |
| unicode@Idn.ascii | fail | ok | 测试3@普遍接受-测试.top |
| unicode@ascii.idn | fail | ok | 测试4@ua-test.世界 |
| unicode@Idn.idn | fail | ok | 测试5@普遍接受-测试.世界 |
| unicode@Idn-open dot-idn | fail | fail | 测试5@普遍接受-测试。世界 |
| unicode@ascii.punycode | fail | ok | 测试4@ua-test.xn--rhqv96g |
| unicode@Punycode.ascii | fail | ok | 测试3@xn----f38am99bqvcd5liy1cxsg.top |
| unicode@Punycode.punycode | fail | ok | 测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g |
| unicode@RTL.ascii | fail | ok | 测试6@ختبار-القبولالعالمي.top |
| unicode@RTL.RTL | fail | ok | مستخدم@رسيل.السعودية |
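The punycode rows in the table are the ASCII-compatible encodings of the corresponding IDN rows, which is why they pass ASCII-only validation while the IDN forms fail. A minimal sketch of that conversion, assuming Python's built-in `idna` codec (IDNA 2003) is an acceptable stand-in; the helper name `domain_to_ascii` is hypothetical and this is not what Drupal does:

```python
def domain_to_ascii(address: str) -> str:
    """Return the address with its domain in ASCII (punycode) form.

    Only the domain is converted; the local part is left untouched,
    so a Unicode local part stays Unicode.
    """
    # Split on the last "@" so the local part is kept as-is.
    local_part, _, domain = address.rpartition("@")
    # The built-in "idna" codec converts each non-ASCII label to "xn--...".
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{ascii_domain}"


# Converts the "ascii@ascii.idn" example into its "ascii@ascii.punycode" form.
converted = domain_to_ascii("info4@ua-test.世界")
```

Note that this only helps for the domain part. An address with a Unicode local part still needs an EAI-aware validator and mail server.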
References:
https://uasg.tech/
https://en.wikipedia.org/wiki/International_email#Email_addresses
https://en.wikipedia.org/wiki/Email_address#Internationalization
Proposed resolution
Drupal 8
* Remove HTML 5 client side validation for the time being or wait until the specs update and the browser vendors adapt
Drupal 7
* Add Egulias\EmailValidator and use it for validation
Remaining tasks
Drupal 8
* Decide which way to go (wait or remove client side validation)
Drupal 7
* Add Egulias\EmailValidator and use it for validation #2343043: valid_email_address() should use egulias/EmailValidator and become deprecated
User interface changes
none
API changes
none
Data model changes
none
Comments
Comment #2
sanduhrs
Comment #3
sanduhrs
Comment #4
sanduhrs
Comment #7
skaught
Comment #11
larowlan
Is the client side validation still an issue in modern browsers?
Also this sounds like a task or feature request more than a bug
Comment #12
sanduhrs
After a quick and incomplete test on https://developer.mozilla.org/en-US/docs/Learn/Forms/Form_validation#fra...
it appears to me that the latest Firefox Developer Edition still does not support Unicode characters in the local part of an email address, and the latest Chromium even fails to validate email addresses with Unicode characters in the domain part.
I'd consider this a bug, because well-known mail servers like
Postfix http://ftp.uma.es/mirror/postfix/doc/SMTPUTF8_README.html
and Services like
Microsoft Exchange https://techcommunity.microsoft.com/t5/exchange-team-blog/eai-support-an...
already support communicating with EAI email addresses.
And as we use email addresses in a vital part of our application (registration, user identification), we too should fully support all forms of email addresses.
Currently (Drupal 9.3.11) for example, we support people registering accounts with email addresses that contain accents in the domain part, but then consider the non-accented form as taken.
For example, I can register an account with
sanduhrs@exâmple.eu
but then it is not possible anymore to register an account with
sanduhrs@example.eu
The domains above are considered valid and different by my domain registrar.
But whoever comes first wins in case of an account registration in Drupal, although they should be considered different identities.
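The collision described above can be illustrated with a small sketch. It assumes the uniqueness lookup compares addresses in an accent- and case-insensitive way, which is how a non-binary database collation typically behaves. The function names are hypothetical, and the Unicode normalization used here is only a rough approximation of such a collation, not its actual algorithm:

```python
import unicodedata


def accent_insensitive_key(address: str) -> str:
    """Approximate an accent- and case-insensitive comparison key."""
    # Decompose characters so accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", address)
    # Drop the combining marks, then fold case.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def binary_key(address: str) -> bytes:
    """Comparison key for a byte-wise (binary) comparison."""
    return address.encode("utf-8")


def is_taken(new_address, existing_addresses, key) -> bool:
    """Return True if new_address collides with an existing one under key."""
    return key(new_address) in {key(a) for a in existing_addresses}


registered = ["sanduhrs@exâmple.eu"]
# Under the accent-insensitive key the second address looks "taken".
collides = is_taken("sanduhrs@example.eu", registered, accent_insensitive_key)
# Under a binary comparison the two addresses stay distinct.
distinct = not is_taken("sanduhrs@example.eu", registered, binary_key)
```

Under these assumptions, only the binary comparison keeps the two addresses as separate identities.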
Comment #13
cilefen commented
I've made this a meta issue because it references various objectives and Drupal 7.
Comment #14
sanduhrs
Comment #15
sanduhrs
Comment #16
sanduhrs
Comment #17
sanduhrs
Client side validation isn't there yet, but is moving in this direction:
https://github.com/whatwg/html/issues/4562
Comment #18
sanduhrs
Comment #19
sanduhrs
Added some tests in #18 that should prove whether we support the different email formats documented by UASG.
Apparently the formats containing an open dot fail in all cases.
The behavior mentioned in #12, Drupal treating accented and non-accented email addresses the same, only manifests when running on MySQL.
Comment #20
sanduhrs
Comment #21
sanduhrs
As for client side validation asked in [#14509827-11].
The current Regex defined by WHATWG is very strict:
https://html.spec.whatwg.org/#valid-e-mail-address
Only 5 out of the 23 defined examples above validate.
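The count can be reproduced with a short sketch. The pattern below is a transcription of the regular expression given in the linked WHATWG section, adapted to Python syntax; treat the transcription as an assumption and check it against the spec. The address list is copied from the table in the issue summary:

```python
import re

# ASCII-only pattern, transcribed from the WHATWG "valid e-mail address" section.
WHATWG_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

# The 23 example addresses from the table in the issue summary.
EXAMPLES = [
    "Info1@ua-test.link",
    "info2@ua-test.technology",
    "info3@普遍接受-测试.top",
    "info4@ua-test.世界",
    "info5@普遍接受-测试.世界",
    "uasg.tech@डाटामेल.भारत",
    "info5@普遍接受-测试。世界",
    "Info4@ua-test.xn--rhqv96g",
    "Info3@xn----f38am99bqvcd5liy1cxsg.top",
    "Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g",
    "info6@ختبار-القبولالعالمي.top",
    "user@السعودية.رسيل",
    "测试1@ua-test.link",
    "测试2@ua-test.technology",
    "测试3@普遍接受-测试.top",
    "测试4@ua-test.世界",
    "测试5@普遍接受-测试.世界",
    "测试5@普遍接受-测试。世界",
    "测试4@ua-test.xn--rhqv96g",
    "测试3@xn----f38am99bqvcd5liy1cxsg.top",
    "测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g",
    "测试6@ختبار-القبولالعالمي.top",
    "مستخدم@رسيل.السعودية",
]

# fullmatch() anchors the pattern at both ends, like ^...$ in the spec.
valid = [a for a in EXAMPLES if WHATWG_EMAIL.fullmatch(a)]
valid_count = len(valid)
```

Because every character class in the pattern is limited to ASCII, only the addresses that are ASCII in both the local part and the domain (including the punycode forms) can match.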
Apparently browser vendors do not stick to that suggestion, though.
Firefox 101.0b2 client side validation.
Only 12 out of the 23 defined examples above validate.
Google Chrome / Chromium 101.0.4951.54 client side validation
Only 12 out of the 23 defined examples above validate.
So browsers appear to support Unicode characters in the domain and tld parts of the email address. The local-part is still ASCII only.
Server side, we support almost all of the formats using egulias/EmailValidator. Only the open-dot notation of two formats is invalid.
Also, we'd need to make sure to switch to a binary collation, e.g. utf8mb4_bin, for the mail field in the users_field_data table, which currently uses the non-binary collation utf8mb4_general_ci, to fix the behavior MySQL/MariaDB shows when using accents in domains, TLDs, or local parts. SQLite/PostgreSQL are not affected by this; AFAIK binary is the default there.
Comment #22
sanduhrs
Adding the official test data from
UASG 004 Test Cases for UA Readiness Evaluation EN
https://uasg.tech/download/uasg-004-use-cases-for-ua-readiness-evaluatio...
Comment #23
sanduhrs
Comment #24
sanduhrs
Add missing test cases.
Comment #29
kentr commented
This is probably outdated now that #1797438: HTML5 validation is preventing form submit and not fully accessible has landed.
I tested account creation with "info3@普遍接受-测试.top" and changed the email address to "info4@普遍接受-测试.top".
Both passed validation.