Working with Swedish Letters in PHP and MySQL

A user made a request to include Swedish letters in my A-Z Directory module for Joomla. A quick check on Wikipedia showed the Swedish alphabet consisted of the “modern 26-letter basic Latin alphabet (‘A’ through ‘Z’) plus, ‘Å’, ‘Ä’, and ‘Ö’, in that order.”

Displaying Swedish Letters in PHP

To generate the 26-letter alphabet, I used range('A', 'Z'), but simply adding on the three additional characters to that array either gave me weird ♦ characters or did not display them at all.

I struggled for hours with iconv, mb_convert_encoding, and all manner of encoding, decoding, and converting, but nothing worked in getting the characters to display properly using PHP while at the same time matching those characters against values stored in Joomla’s #__contact_details MySQL database table.

I had a breakthrough when instead of merging the characters themselves to the array, I merged their HTML entities to the array. But, before I could do that, I ran my English and Swedish characters through html_entity_decode to make sure they were decoded consistently.

$english = range("A", "Z");
$swedish = array("Å", "Ä", "Ö");
array_walk($english, "_azDecode");
array_walk($swedish, "_azDecode");
$alphabet = array_merge($english, $swedish);

The _azDecode function follows:

function _azDecode(&$item){
$item = html_entity_decode($item, ENT_NOQUOTES, "UTF-8");
return $item;
}

MySQL Collation

The critical piece to Wikipedia’s description of the Swedish alphabet is “in that order”. This turned out to be more problematic than I anticipated. The output from the database was sorting the Swedish letters as ‘Ä’, ‘Å’, and ‘Ö’. And more troubling was that the output was not differentiating between names that ended with ‘Å’ and ‘Ä’. Again, I struggled for hours with setlocale, strcmp, and strcoll, but to no avail. After reconstructing my SQL statement in phpMyAdmin, I realized it had nothing to do with PHP — PHP was faithfully interpreting the results of the query.

It was here that I stumbled across MySQL’s COLLATE clause. I couldn’t change the collation of the database table, but as stated in the MySQL reference manual, “with the COLLATE clause, you can override whatever the default collation is for a comparison.” MySQL’s COLLATE clause can be used with ORDER BY and AS in addition to various other parts of SQL statements.

With that information, I was able to successfully order the results based on the first letter of the contact’s last name within the context of Joomla’s database API.

$query->order("SUBSTRING_INDEX(" . $db->quoteName('name') . ", ' ', -1) COLLATE utf8_swedish_ci");