Problem/Motivation

Email Address Internationalization (EAI) is based on a set of RFCs published in 2012 that enable the use of UTF-8 in the local part of an email address. Additionally, email addresses with Internationalized Domain Names and an ASCII-only local part are permitted.
We are currently excluding people with such email addresses, as they are not usable for registration.

Relevant RFCs are the following (taken from https://uasg.tech):
Overview and Framework for Internationalized Email
https://tools.ietf.org/html/rfc6530
SMTP Extension for Internationalized Email
https://tools.ietf.org/html/rfc6531
Internationalized Email Headers
https://tools.ietf.org/html/rfc6532
Internationalized Delivery Status and Disposition Notifications
https://tools.ietf.org/html/rfc6533

Drupal 8
The current validation of email addresses in Drupal 8 uses HTML5 based client side validation and a server side validation based on Egulias\EmailValidator.

While the server side validation is based on the above RFCs and provides good support for internationalized email addresses, the client side validation fails because it accepts only ASCII characters in the local part of email addresses, as defined in the current W3C Recommendation for HTML 5.2.
https://www.w3.org/TR/html/single-page.html#valid-e-mail-address

That is expected to change with the next W3C Recommendation, HTML 5.3, which will allow UTF-8 in the local part of email addresses.
https://w3c.github.io/html/single-page.html#valid-e-mail-address

The WHATWG is moving in this direction, too, for client side validation:
https://github.com/whatwg/html/issues/4562

Drupal 7
Drupal 7 does not use HTML5 based client side validation. Its server side validation relies on PHP's filter_var() with the FILTER_VALIDATE_EMAIL filter, which is based on RFC 822 and allows only a small subset of the email addresses that may currently be in public use.

Example internationalized email addresses and their validity in D7/D8:

Structure                  Status D7  Status D8 / D9  Example
ascii@ascii.new            ok         ok              Info1@ua-test.link
ascii@Ascii.long           ok         ok              info2@ua-test.technology
ascii@Idn.ascii            fail       ok              info3@普遍接受-测试.top
ascii@ascii.idn            fail       ok              info4@ua-test.世界
ascii@Idn.idn              fail       ok              info5@普遍接受-测试.世界
(not given)                fail       ok              uasg.tech@डाटामेल.भारत
ascii@Idn-open dot-idn     fail       fail            info5@普遍接受-测试。世界
ascii@ascii.punycode       ok         ok              Info4@ua-test.xn--rhqv96g
ascii@Punycode.ascii       ok         ok              Info3@xn----f38am99bqvcd5liy1cxsg.top
ascii@Punycode.punycode    ok         ok              Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g
ascii@RTL.ascii            fail       ok              info6@ختبار-القبولالعالمي.top
ascii@RTL.RTL              fail       ok              user@السعودية.رسيل
unicode@ascii.new          fail       ok              测试1@ua-test.link
unicode@Ascii.long         fail       ok              测试2@ua-test.technology
unicode@Idn.ascii          fail       ok              测试3@普遍接受-测试.top
unicode@ascii.idn          fail       ok              测试4@ua-test.世界
unicode@Idn.idn            fail       ok              测试5@普遍接受-测试.世界
unicode@Idn-open dot-idn   fail       fail            测试5@普遍接受-测试。世界
unicode@ascii.punycode     fail       ok              测试4@ua-test.xn--rhqv96g
unicode@Punycode.ascii     fail       ok              测试3@xn----f38am99bqvcd5liy1cxsg.top
unicode@Punycode.punycode  fail       ok              测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g
unicode@RTL.ascii          fail       ok              测试6@ختبار-القبولالعالمي.top
unicode@RTL.RTL            fail       ok              مستخدم@رسيل.السعودية
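
The "idn" and "punycode" rows in the table describe the same domains in two spellings. As a rough illustration of that relationship (not of what Drupal or egulias/EmailValidator do internally), the following Python sketch converts only the domain part of an address to its ASCII form. The helper name `domain_to_ascii` is hypothetical, and Python's built-in `idna` codec implements the older IDNA 2003 rules, so results may differ from an IDNA 2008 implementation for some inputs.

```python
def domain_to_ascii(address: str) -> str:
    """Return the address with its domain converted to ASCII (Punycode) form.

    Hypothetical helper for illustration; the local part is left untouched.
    """
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Python's "idna" codec encodes each dot-separated label (IDNA 2003).
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


if __name__ == "__main__":
    for sample in ("info4@ua-test.世界", "测试4@ua-test.世界"):
        print(sample, "->", domain_to_ascii(sample))
```

Note that a Unicode local part stays as it is: Punycode only applies to the domain, which is why the unicode@...punycode rows still fail in D7.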

References:
https://uasg.tech/
https://en.wikipedia.org/wiki/International_email#Email_addresses
https://en.wikipedia.org/wiki/Email_address#Internationalization

Proposed resolution

Drupal 8
* Remove HTML 5 client side validation for the time being or wait until the specs update and the browser vendors adapt
Drupal 7
* Add Egulias\EmailValidator and use it for validation

Remaining tasks

Drupal 8
* Decide which way to go (wait or remove client side validation)
Drupal 7
* Add Egulias\EmailValidator and use it for validation #2343043: valid_email_address() should use egulias/EmailValidator and become deprecated

User interface changes

none

API changes

none

Data model changes

none

Release notes snippet

Original report by [username]

Comments

sanduhrs created an issue. See original summary.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

larowlan’s picture

Priority: Major » Normal
Status: Active » Postponed (maintainer needs more info)
Issue tags: +Bug Smash Initiative

Is the client side validation still an issue in modern browsers?

Also this sounds like a task or feature request more than a bug

sanduhrs’s picture

After a quick and incomplete test on https://developer.mozilla.org/en-US/docs/Learn/Forms/Form_validation#fra...
it appears to me that the latest Firefox Developer Edition still does not support Unicode characters in the local part of an email address, and the latest Chromium even fails to validate email addresses with Unicode characters in the domain part.

I'd consider this a bug, because well-known mail servers like
Postfix http://ftp.uma.es/mirror/postfix/doc/SMTPUTF8_README.html
and services like
Microsoft Exchange https://techcommunity.microsoft.com/t5/exchange-team-blog/eai-support-an...
already support communicating with EAI email addresses.

And as we use email addresses in a vital part of our application (registration, user identification), we too should fully support all forms of email addresses.

Currently (Drupal 9.3.11) for example, we support people registering accounts with email addresses that contain accents in the domain part, but then consider the non-accented form as taken.

For example I can register an account with
sanduhrs@exâmple.eu
but then it is not possible anymore to register an account with
sanduhrs@example.eu
The domains above are considered valid and different by my domain registrar.
But whoever comes first wins in case of an account registration in Drupal, although they should be considered different identities.

cilefen’s picture

Title: Email Address Internationalization (EAI) » [meta] Fully support the range of characters that are allowed in email addresses
Category: Bug report » Plan
Status: Postponed (maintainer needs more info) » Active
Issue tags: -IDN, -IDN emails

I've made this a meta issue because it references various objectives and Drupal 7.

sanduhrs’s picture

Client side validation isn't there yet, but is moving in this direction:
https://github.com/whatwg/html/issues/4562

sanduhrs’s picture

Added some tests in #18 that should prove whether we support the different email formats documented by UASG.

Apparently the formats containing an open dot fail in all cases.
The behavior mentioned in #12, Drupal treating accented and non-accented emails the same, only manifests when running on MySQL.
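
For context on the open-dot failures: the "open dot" in those addresses is U+3002 (ideographic full stop), not the ASCII full stop. IDNA 2003 (RFC 3490, section 3.1) asks applications to treat U+3002, U+FF0E and U+FF61 as label separators equivalent to ".". A minimal Python sketch of such a mapping follows; the helper name `normalize_domain_dots` is hypothetical, and whether Drupal should normalize input this way is not something this issue has decided.

```python
# Characters that RFC 3490 section 3.1 lists as equivalent to "." in domains.
IDNA_DOT_EQUIVALENTS = str.maketrans({
    "\u3002": ".",  # ideographic full stop
    "\uff0e": ".",  # fullwidth full stop
    "\uff61": ".",  # halfwidth ideographic full stop
})


def normalize_domain_dots(address: str) -> str:
    """Map IDNA dot equivalents in the domain part to an ASCII full stop.

    Hypothetical helper for illustration; the local part is not modified.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    return local + at + domain.translate(IDNA_DOT_EQUIVALENTS)


if __name__ == "__main__":
    print(normalize_domain_dots("info5@普遍接受-测试\u3002世界"))
```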

sanduhrs’s picture

As for the client side validation asked about in [#14509827-11]:

The current regex defined by WHATWG is very strict:
https://html.spec.whatwg.org/#valid-e-mail-address

Only 5 out of the 23 defined examples above validate.

Info1@ua-test.link                             ok
info2@ua-test.technology                       ok
info3@普遍接受-测试.top                            fail
info4@ua-test.世界                              fail
info5@普遍接受-测试.世界                            fail
uasg.tech@डाटामेल.भारत                            fail
info5@普遍接受-测试。世界                           fail
Info4@ua-test.xn--rhqv96g                      ok
Info3@xn----f38am99bqvcd5liy1cxsg.top          ok
Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g  ok
info6@ختبار-القبولالعالمي.top                   fail
user@السعودية.رسيل                             fail
测试1@ua-test.link                               fail
测试2@ua-test.technology                         fail
测试3@普遍接受-测试.top                                fail
测试4@ua-test.世界                                fail
测试5@普遍接受-测试.世界                                 fail
测试5@普遍接受-测试。世界                                 fail
测试4@ua-test.xn--rhqv96g                        fail
测试3@xn----f38am99bqvcd5liy1cxsg.top            fail
测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g    fail
测试6@ختبار-القبولالعالمي.top                     fail
مستخدم@رسيل.السعودية                            fail
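
The count above can be reproduced with a short script. The following Python sketch applies a transcription of the WHATWG "valid e-mail address" pattern to the 23 sample addresses. The pattern was copied by hand from the spec, so treat it as an approximation of what a conforming browser would do; the names `WHATWG_EMAIL`, `SAMPLES` and `is_valid_whatwg` are made up for this example.

```python
import re

# Hand-transcribed from the WHATWG HTML "valid e-mail address" definition.
WHATWG_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

# The 23 UASG sample addresses from the issue summary.
SAMPLES = [
    "Info1@ua-test.link",
    "info2@ua-test.technology",
    "info3@普遍接受-测试.top",
    "info4@ua-test.世界",
    "info5@普遍接受-测试.世界",
    "uasg.tech@डाटामेल.भारत",
    "info5@普遍接受-测试。世界",
    "Info4@ua-test.xn--rhqv96g",
    "Info3@xn----f38am99bqvcd5liy1cxsg.top",
    "Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g",
    "info6@ختبار-القبولالعالمي.top",
    "user@السعودية.رسيل",
    "测试1@ua-test.link",
    "测试2@ua-test.technology",
    "测试3@普遍接受-测试.top",
    "测试4@ua-test.世界",
    "测试5@普遍接受-测试.世界",
    "测试5@普遍接受-测试。世界",
    "测试4@ua-test.xn--rhqv96g",
    "测试3@xn----f38am99bqvcd5liy1cxsg.top",
    "测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g",
    "测试6@ختبار-القبولالعالمي.top",
    "مستخدم@رسيل.السعودية",
]


def is_valid_whatwg(address: str) -> bool:
    """Return True if the whole address matches the WHATWG pattern."""
    return WHATWG_EMAIL.fullmatch(address) is not None


passing = [address for address in SAMPLES if is_valid_whatwg(address)]

if __name__ == "__main__":
    for address in SAMPLES:
        print("ok  " if is_valid_whatwg(address) else "fail", address)
    print(f"{len(passing)} out of {len(SAMPLES)} validate")
```

Because every character class in the pattern is spelled out as an ASCII range, any address containing a non-ASCII character in either part is rejected, which matches the list above.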

Apparently browser vendors do not stick to that suggestion, though.

Firefox 101.0b2 client side validation.
Only 12 out of the 23 defined examples above validate.

Info1@ua-test.link                             ok
info2@ua-test.technology                       ok
info3@普遍接受-测试.top                            ok
info4@ua-test.世界                              ok
info5@普遍接受-测试.世界                            ok
uasg.tech@डाटामेल.भारत                            ok
info5@普遍接受-测试。世界                           ok (fail server side)
Info4@ua-test.xn--rhqv96g                      ok
Info3@xn----f38am99bqvcd5liy1cxsg.top          ok
Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g  ok
info6@ختبار-القبولالعالمي.top                   ok
user@السعودية.رسيل                             ok
测试1@ua-test.link                               fail
测试2@ua-test.technology                         fail
测试3@普遍接受-测试.top                                fail
测试4@ua-test.世界                                fail
测试5@普遍接受-测试.世界                                 fail
测试5@普遍接受-测试。世界                                 fail
测试4@ua-test.xn--rhqv96g                        fail
测试3@xn----f38am99bqvcd5liy1cxsg.top            fail
测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g    fail
测试6@ختبار-القبولالعالمي.top                     fail
مستخدم@رسيل.السعودية                           fail

Google Chrome / Chromium 101.0.4951.54 client side validation
Only 12 out of the 23 defined examples above validate.

Info1@ua-test.link                             ok
info2@ua-test.technology                       ok
info3@普遍接受-测试.top                            ok
info4@ua-test.世界                              ok
info5@普遍接受-测试.世界                            ok
uasg.tech@डाटामेल.भारत                            ok
info5@普遍接受-测试。世界                           ok
Info4@ua-test.xn--rhqv96g                      ok
Info3@xn----f38am99bqvcd5liy1cxsg.top          ok
Info5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g  ok
info6@ختبار-القبولالعالمي.top                   ok
user@السعودية.رسيل                             ok
测试1@ua-test.link                               fail
测试2@ua-test.technology                         fail
测试3@普遍接受-测试.top                                fail
测试4@ua-test.世界                                fail
测试5@普遍接受-测试.世界                                 fail
测试5@普遍接受-测试。世界                                 fail
测试4@ua-test.xn--rhqv96g                        fail
测试3@xn----f38am99bqvcd5liy1cxsg.top            fail
测试5@xn----f38am99bqvcd5liy1cxsg.xn--rhqv96g    fail
测试6@ختبار-القبولالعالمي.top                     fail
مستخدم@رسيل.السعودية                           fail

So browsers appear to support Unicode characters in the domain and TLD parts of the email address.
The local part is still ASCII only.

Server side we support almost all of the formats using egulias/EmailValidator.
Only the two formats using the open-dot notation are invalid.

We would also need to switch the mail field in the users_field_data table from the non-binary collation utf8mb4_general_ci it currently uses to a binary collation, e.g. utf8mb4_bin, to fix the behavior MySQL/MariaDB shows when accents are used in domains, TLDs or local parts. SQLite and PostgreSQL are not affected by this; AFAIK binary comparison is the default there.
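
To illustrate why the collation matters, here is a small Python sketch that imitates an accent- and case-insensitive comparison and contrasts it with an exact comparison. It is only a rough approximation of how a collation such as utf8mb4_general_ci behaves, not the actual MySQL algorithm, and the helper name `accent_insensitive_key` is hypothetical.

```python
import unicodedata


def accent_insensitive_key(value: str) -> str:
    """Build a comparison key that ignores accents and letter case.

    Rough stand-in for a non-binary, accent-insensitive collation.
    """
    decomposed = unicodedata.normalize("NFD", value)
    # Drop combining marks such as the circumflex in "â".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


if __name__ == "__main__":
    first = "sanduhrs@exâmple.eu"
    second = "sanduhrs@example.eu"
    # Exact (binary-like) comparison keeps the two addresses apart.
    print("exact match:", first == second)
    # Accent-insensitive comparison treats them as the same value.
    print("insensitive match:",
          accent_insensitive_key(first) == accent_insensitive_key(second))
```

With a uniqueness check built on the insensitive comparison, the second registration is reported as already taken, which is the behavior described in #12.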

sanduhrs’s picture

Add missing test cases.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.

kentr’s picture

This is probably outdated now that #1797438: HTML5 validation is preventing form submit and not fully accessible has landed.

I tested account creation with "info3@普遍接受-测试.top" and changed the email address to "info4@普遍接受-测试.top".

Both passed validation.