"Internationalisation with PHP and Intl" source code

First play and test

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//'hi' is Hindi, 'fa' is Farsi, 'ar_EG' is Egyptian Arabic
$locales = array('en', 'en_US', 'fr_FR', 'de_DE', 'hi', 'fa', 'ar_EG');

$number = 1234567890;

foreach($locales as $locale)
{
$formatter = new NumberFormatter($locale, NumberFormatter::DECIMAL);
echo $locale . ":t" . $formatter->format($number) . "n";
}

//Output:
//en: 1,234,567,890
//en_US: 1,234,567,890
//fr_FR: 1 234 567 890
//de_DE: 1.234.567.890
//hi: १,२३,४५,६७,८९०
//fa: ۱٬۲۳۴٬۵۶۷٬۸۹۰
//ar_EG: ١٬٢٣٤٬٥٦٧٬٨٩٠

?>

Sorting German

<?php



//some German surnames
$german_names = array('Weiß', 'Goldmann', 'Göbel', 'Weiss', 'Göthe',
'Goethe', 'Götz');

sort($german_names);

//gives Array ( [0] => Goethe [1] => Goldmann [2] => Göbel [3] => Göthe
//[4] => Götz [5] => Weiss [6] => Weiß )
//which COINCIDENTALLY is the Austrian sort order
print_r($german_names);

sort($german_names, SORT_STRING); //default is SORT_REGULAR
print_r($german_names); //gives same as above

//BTW, you're not going to get far with setlocale() if you don't
//have that particular locale supported on your OS!
//on *nixes, something like:

//> locale --all-locales
//will give you a list of all installed locales
//
//you can give setlocale() a *list* of locales to try if
//you're not sure how your OS is spelling it etc
//
//ICU (and therefore Intl) uses its own locales and is not dependent
//on the operating system for the locale data

setlocale(LC_ALL, 'de_DE');
sort($german_names, SORT_LOCALE_STRING);
//above gives Array ( [0] => Göbel [1] => Göthe [2] => Götz [3] => Goethe
//[4] => Goldmann [5] => Weiß [6] => Weiss )
//which isn't dictionary, phonebook or Austrian sort order :(
//[it seems to be "umlauted vowel comes before plain vowel, eszett
//comes before double ess" order]

setlocale(LC_ALL, 'de_AT');
//above gives same as for de_DE - ie. nothing good :(

//this is curious...
setlocale(LC_ALL, 'de_DE.utf8');
//above gives Array ( [0] => Göbel [1] => Goethe [2] => Goldmann
//[3] => Göthe [4] => Götz [5] => Weiss [6] => Weiß )
//which is our dictionary sort order!

//----Let's try using Intl--------

$coll = new Collator('de_DE');

$coll->sort($german_names);

//gives Array ( [0] => Göbel [1] => Goethe [2] => Goldmann [3] => Göthe
//[4] => Götz [5] => Weiss [6] => Weiß )
//which is our dictionary sort order!

//Collator constructor can accept UCA keywords
$coll = new Collator('de@collation=phonebook'); //see
http://userguide.icu-project.org/collation/architecture and
http://userguide.icu-project.org/locale

$coll->sort($german_names);

//gives Array ( [0] => Göbel [1] => Goethe [2] => Göthe [3] => Götz
//[4] => Goldmann [5] => Weiss [6] => Weiß )
//which is our phonebook sort order!

?>

Japanese era

<?php



$timezones = array('en_GB' => 'Europe/London', 'ja_JP' => 'Asia/Tokyo',
'ja_JP@calendar=japanese' => 'Asia/Tokyo');

$now = new DateTime(); //DateTime is a core PHP class as of version 5.2.0

foreach($timezones as $locale => $timezone)
{
$calendar = IntlDateFormatter::GREGORIAN;

if(strpos($locale, 'calendar=') !== false)
{
//slightly presumptuous as @calendar=gregorian also exists
$calendar = IntlDateFormatter::TRADITIONAL;
}

$formatter = new IntlDateFormatter($locale, IntlDateFormatter::FULL,
IntlDateFormatter::FULL, $timezone, $calendar);

echo 'It is now: "' . $formatter->format($now) . '" in ' .
"{$timezone}n";
}

//Last line of output gives "平成 23 年" which is Heisei 23!

?>

Korean numbers

<?php



//See:
//http://en.wikipedia.org/wiki/Korean_numerals
//http://askakorean.blogspot.com/2010/03/korean-language-series-sino-
korean.html

$number = 1234567890;

$formatter = new NumberFormatter('ko_KR', NumberFormatter::SPELLOUT);

echo "Korean spellout ({$formatter->getLocale()}):t"
. $formatter->format($number) . "n";
//above gives [Korean spellout (en_GB): one thousand two hundred and
thirty-four

//million, five hundred and sixty-seven thousand, eight hundred and
ninety]
//ie. locale has fallen back to system default (in this case en_GB)

//but ko_KR is a valid ICU locale string, so let's check:
$formatter = new NumberFormatter('ko_KR', NumberFormatter::CURRENCY);

echo "Korean currency ({$formatter->getLocale()}):t"
. $formatter->format($number) . "n";
//above gives [Korean currency (ko): ￦1,234,567,890] which is correct

//ok, so it looks like we don't have the rules for Korean spellout
//we'll have to supply the NumberFormatter with our own ruleset.
//the technical details of the ruleset format are at
//http://userguide.icu-project.org/formatparse/numbers/rbnf-examples
//BUT we *do* have a ruleset for Japanese spellout which we can modify and
use!
//(it's similar because it also counts in ten thousands and has non-Arabic
numerals)
//the Japanese spellout ruleset is this (construct a NumberFormatter for
Japanese
//spellout and then var_dump($formatter->getPattern())):
//
//
//pattern for japanese spellout
/*
string(1520) "%financial:
0: 零;
1: 壱;
2: 弐;
3: 参;
4: 四;
5: 伍;
6: 六;
7: 七;
8: 八;
9: 九;
10: 拾;
11: 拾>%financial>;
20: <%financial<拾;
21: <%financial<拾>%financial>;
100: <%financial<百;
101: <%financial<百>%financial>;
1000: <%financial<千;
1001: <%financial<千>%financial>;
10000: <%financial<萬;
10001: <%financial<萬>%financial>;
100000000: <%financial<億;
100000001: <%financial<億>%financial>;

1000000000000: <%financial<兆;
1000000000001: <%financial<兆>%financial>;
10000000000000000: =#,##0=;
-x: マイナス>%financial>;
x.x: <%financial<点>%financial>;
%traditional:
0: 〇;
1: 一;
2: 二;
3: 三;
4: 四;
5: 五;
6: 六;
7: 七;
8: 八;
9: 九;
10: 十;
11: 十>%traditional>;
20: <%traditional<十;
21: <%traditional<十>%traditional>;
100: 百;
101: 百>%traditional>;
200: <%traditional<百;
201: <%traditional<百>%traditional>;
1000: 千;
1001: 千>%traditional>;
2000: <%traditional<千;
2001: <%traditional<千>%traditional>;
10000: <%traditional<万;
10001: <%traditional<万>%traditional>;
100000000: <%traditional<億;
100000001: <%traditional<億>%traditional>;
1000000000000: <%traditional<兆;
1000000000001: <%traditional<兆>%traditional>;
10000000000000000: =#,##0=;
-x: マイナス>%traditional>;
x.x: <%traditional<・>%traditional>;
"
*/

//basically, for Japanese we count in groups of ten thousands
//and we have a traditional set of characters
//and a financial (anti-forgery) set of characters.
//
//Korean also counts in groups of ten thousands

//so let's modify this pattern for Korean
//(note that we'll use only South Korean Sino-Korean numbers)

$korean_pattern = '%hangul:
0: 영; 1: 일; 2: 이; 3: 삼; 4: 사; 5: 오; 6: 육; 7: 칠; 8: 팔; 9:
구; 10: 십;
11: 십>%hangul>;
20: <%hangul<십;
21: <%hangul<십>%hangul>;
100: 백;
101: 백>%hangul>;
200: <%hangul<백;
201: <%hangul<백>%hangul>;
1000: 천;
1001: 천>%hangul>;
2000: <%hangul<천;
2001: <%hangul<천>%hangul>;
10000: <%hangul<만;
10001: <%hangul<만>%hangul>;
100000000: <%hangul<억;
100000001: <%hangul<억>%hangul>;
1000000000000: <%hangul<조;
1000000000001: <%hangul<조>%hangul>;
10000000000000000: =#,##0=;
-x: ->%hangul>;
x.x: <%hangul<・>%hangul>;
%hanja:
0: 〇;
1: 一;
2: 二;
3: 三;
4: 四;
5: 五;
6: 六;
7: 七;
8: 八;
9: 九;
10: 十;
11: 十>%hanja>;
20: <%hanja<十;
21: <%hanja<十>%hanja>;
100: 百;
101: 百>%hanja>;

200: <%hanja<百;
201: <%hanja<百>%hanja>;
1000: 千;
1001: 千>%hanja>;
2000: <%hanja<千;
2001: <%hanja<千>%hanja>;
10000: <%hanja<萬;
10001: <%hanja<萬>%hanja>;
100000000: <%hanja<億;
100000001: <%hanja<億>%hanja>;
1000000000000: <%hanja<兆;
1000000000001: <%hanja<兆>%hanja>;
10000000000000000: =#,##0=;
-x: ->%hanja>;
x.x: <%hanja<・>%hanja>;';

$formatter = new NumberFormatter('ko_KR',
NumberFormatter::PATTERN_RULEBASED,
$korean_pattern);

$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%hangul");

$numbers = array_merge(range(0, 20), range(30, 100, 10),
array(1000, 10000, 100000000, 1000000000000));

foreach($numbers as $number)
{
echo "{$number}: {$formatter->format($number)}n";
}

//outputs a correct list of Korean Hangul numbers

?>

strftime (not on presentation)

<?php



//out of curiosity, let's see how PHP's core strftime() handles
//some different locale date formats

//BTW, you're not going to get far with setlocale() if you don't
//have that particular locale supported on your OS!
//on *nixes, something like:
//> locale --all-locales
//will give you a list of all installed locales
//

//you can give setlocale() a *list* of locales to try if
//you're not sure how your OS is spelling it etc
//
//ICU (and therefore Intl) uses its own locales and is not dependent
//on the operating system for the locale data

$locales = array(array('fi_FI.utf8', 'fi_FI'), array('ja_JP.utf8',
'ja_JP'),
array('fr_FR.utf8', 'fr_FR'));

$format = "%A %d %B %Y"; //ie. "Wednesday 17 August 2011"

foreach($locales as $locale_array)
{
$locale = setlocale(LC_TIME, $locale_array);
echo "{$locale}:t" . strftime($format) . "n";
}

//Output:
//fi_FI.utf8: keskiviikko 17 elokuu 2011
//ja_JP.utf8: 水曜日 17 8 月 2011
//fr_FR.utf8: mercredi 17 août 2011

//Comment:
//Not bad at all. The Japanese date won't be very natural-looking for a
//native Japanese speaker as the day and year aren't quantified with the
//appropriate character.
//Also seem to be some implementation issues
(http://uk.php.net/manual/en/function.strftime.php)
//Intl extension is giving us more power and flexibility

?>

Japanese financial numbers (not on presentation)

<?php



//SUPPORTED LOCALES FOR "SPELLOUT" IS A LOT MORE LIMITED
//THAN FOR DECIMAL OR CURRENCY ETC...
//
//from ICU website:
//"ICU provides number spellout rules for several locales,
//but not for all of the locales that ICU supports, and not all of the
predefined rule types.
//Also, as of release 2.6, some of the provided rules are known to be
incomplete."

$number = 1234567890;

$formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);

echo "Default Japanese spellout:t" . $formatter->format($number) . "n";

//above gives [十二億三千四百五十六万七千八百九十] - the usual kanji numbers

//see "Key/Type Definitions" at http://www.unicode.org/reports/tr35
$formatter = new NumberFormatter('ja_JP@numbers=jpanfin',
NumberFormatter::SPELLOUT);

echo "Modified locale spellout:t" . $formatter->format($number) . "n";
//above also gives [十二億三千四百五十六万七千八百九十] - not our financial kanji
numbers!

//Hmmmm, but if we now var_dump($formatter->getPattern()) we get:

//pattern for japanese spellout
//(interestingly, financial kanji here and at
http://www.sljfaq.org/afaq/banknote-numbers.html differ)
/*
string(1520) "%financial:
0: 零;
1: 壱;
2: 弐;
3: 参;
4: 四;
5: 伍;
6: 六;
7: 七;
8: 八;
9: 九;
10: 拾;
11: 拾>%financial>;
20: <%financial<拾;
21: <%financial<拾>%financial>;
100: <%financial<百;
101: <%financial<百>%financial>;
1000: <%financial<千;
1001: <%financial<千>%financial>;
10000: <%financial<萬;
10001: <%financial<萬>%financial>;
100000000: <%financial<億;
100000001: <%financial<億>%financial>;
1000000000000: <%financial<兆;
1000000000001: <%financial<兆>%financial>;
10000000000000000: =#,##0=;
-x: マイナス>%financial>;
x.x: <%financial<点>%financial>;
%traditional:
0: 〇;

1: 一;
2: 二;
3: 三;
4: 四;
5: 五;
6: 六;
7: 七;
8: 八;
9: 九;
10: 十;
11: 十>%traditional>;
20: <%traditional<十;
21: <%traditional<十>%traditional>;
100: 百;
101: 百>%traditional>;
200: <%traditional<百;
201: <%traditional<百>%traditional>;
1000: 千;
1001: 千>%traditional>;
2000: <%traditional<千;
2001: <%traditional<千>%traditional>;
10000: <%traditional<万;
10001: <%traditional<万>%traditional>;
100000000: <%traditional<億;
100000001: <%traditional<億>%traditional>;
1000000000000: <%traditional<兆;
1000000000001: <%traditional<兆>%traditional>;
10000000000000000: =#,##0=;
-x: マイナス>%traditional>;
x.x: <%traditional<・>%traditional>;
"
*/

//so the financial kanji are in there but how to wrangle them out??

$formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);

$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET,
"%financial");

echo "setTextAttribute spellout:t" . $formatter->format($number) . "n";
//above gives [拾弐億参千四百伍拾六萬七千八百九拾] - bingo!

//now, out of curiosity

//same formatter as above

$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET,
"%traditional");

echo "setTextAttribute spellout:t" . $formatter->format($number) . "n";
//yes, this gives [十二億三千四百五十六万七千八百九十]

//notice that the %traditional and $financial patterns
//differ in more than just the characters used
//(for example, look at each format for the value of 100).
//let's take a look

$numbers = array(100, 199, 200, 201, 1000, 1999, 2000, 2001);

$traditional_formatter = new NumberFormatter('ja_JP',
$financial_formatter = new NumberFormatter('ja_JP',
$financial_formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET,
"%financial");

foreach($numbers as $number)
{
echo "{$number} as traditional:t" . $traditional_formatter-
>format($number) . "n";
echo "{$number} as financial:t" . $financial_formatter-
>format($number) . "n";
echo "----------------n";
}

//outputs:
/*
100 as traditional: 百
100 as financial: 壱百
----------------
199 as traditional: 百九十九
199 as financial: 壱百九拾九
----------------
200 as traditional: 二百
200 as financial: 弐百
----------------
201 as traditional: 二百一
201 as financial: 弐百壱
----------------
1000 as traditional: 千
1000 as financial: 壱千
----------------
1999 as traditional: 千九百九十九
1999 as financial: 壱千九百九拾九
----------------
2000 as traditional: 二千
2000 as financial: 弐千
----------------

2001 as traditional: 二千一
2001 as financial: 弐千壱
----------------
*/
//We see that the different rules enable the financial spellout to write,
say, "one thousand" instead of
//the traditional "thousand". This clearly makes sense in an anti-forgery
context.

//User exercise: compare and contrast with PHPs' core localeconv()

?>

Locale::acceptFromHttp (not on presentation)

<?php


//We can set Intl's locale based on the browser's HTTP_ACCEPT_LANGUAGE
header.
//Browser's send this header based on their "prefered language" setting.
//Only power users would tinker with this setting directly, but we can
assume
//that it is *usually* correct.
//Google sites are quite good at using this header, try changing your
//browser's prefered language setting and then visit your favourite
//Google site!

header("Content-Type: text/html; charset=UTF-8;");

echo 'Browser's Accept-Language header: ' .
$_SERVER['HTTP_ACCEPT_LANGUAGE'] . ' ';

$browser_locale =
Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);
echo 'Decided browser locale: ' . $browser_locale . ' ';

Locale::setDefault($browser_locale);
echo 'Intl default locale now: ' . Locale::getDefault() . ' '; //a
check

$all_variants = Locale::getAllVariants(Locale::getDefault());
echo 'All variants: ';
print_r($all_variants);
echo ' ';

$language_name = Locale::getDisplayLanguage(Locale::getDefault());
echo 'Language display name: ' . $language_name . ' ';

$region_name = Locale::getDisplayRegion(Locale::getDefault());
echo 'Region display name: ' . $region_name . ' ';

$script_name = Locale::getDisplayScript(Locale::getDefault());
echo 'Script display name: ' . $script_name . ' ';

$variant_name = Locale::getDisplayVariant(Locale::getDefault());
echo 'Variant display name: ' . $variant_name . ' ';

$keywords = Locale::getKeywords(Locale::getDefault());
echo 'Keywords: ';
print_r($keywords);
echo ' ';

?>

"Internationalisation with PHP and Intl" source code

More Related Content

What's hot (20)

Similar to "Internationalisation with PHP and Intl" source code (20)

More from Daniel_Rhodes (9)

Recently uploaded (20)

"Internationalisation with PHP and Intl" source code