Posts

PHP htmlspecialchars_decode doesn’t handle nordic/german characters like å, ä and ö

The PHP functions htmlspecialchars and it’s reverse htmlspecialchars_decode only handles the following characters:

  • ‘&’ (ampersand) becomes ‘&’
  • ‘”‘ (double quote) becomes ‘"’ when ENT_NOQUOTES is not set.
  • ”’ (single quote) becomes ‘'’ only when ENT_QUOTES is set.
  • ‘<‘ (less than) becomes ‘&lt;’
  • ‘>’ (greater than) becomes ‘&gt;’

If you want to output html text containing nordic/german characters like Å, Ä, Ö and Ü in dialog boxes (popups) these characters also needs to be converted. The following PHP function does this for you:

function unhtml( $string ) {
$string = str_replace ( '&amp;', '&', $string );
$string = str_replace ( '&#039;', '\'', $string );
$string = str_replace ( '&quot;', '"', $string );
$string = str_replace ( '&lt;', '<', $string );
$string = str_replace ( '&gt;', '>', $string );
$string = str_replace ( '&uuml;', 'ü', $string );
$string = str_replace ( '&Uuml;', 'Ü', $string );
$string = str_replace ( '&auml;', 'ä', $string );
$string = str_replace ( '&Auml;', 'Ä', $string );
$string = str_replace ( '&ouml;', 'ö', $string );
$string = str_replace ( '&Ouml;', 'Ö', $string );
$string = str_replace ( '&aring;', 'å', $string );
$string = str_replace ( '&Aring;', 'Å', $string );
return $string;
}

It is important that the code is saved in UTF-8 encoding (or the format your web page is using). Edit the code in for example Windows notedpad and use Save as. Now you can select UTF-8 encoding when saving the file.

Or if you are using UNIX / Linux you can use iconv to convert the file if it is not already in the correct format. First, to find out the current encoding for your file, use the file command:

$ file --mime-encoding unhtml.php
unhtml.php: iso-8859-1

Now, to convert it to UTF-8 using the iconv command:

iconv -f iso-8859-1 -t utf-8 unhtml.php > unhtml-utf-8.php

The reverse of unhtml is of course the html function:

/* Encodes specific characters for display as html */
function html($string) {
  $string = str_replace ( '&', '&amp;', $string );
  $string = str_replace ( '\'', '&#039;', $string );
  $string = str_replace ( '"', '&quot;', $string );
  $string = str_replace ( '<', '&lt;', $string );
  $string = str_replace ( '>', '&gt;', $string );
  $string = str_replace ( 'ü', '&uuml;', $string );
  $string = str_replace ( 'Ü', '&Uuml;', $string );
  $string = str_replace ( 'ä', '&auml;', $string );
  $string = str_replace ( 'Ä', '&Auml;', $string );
  $string = str_replace ( 'ö', '&ouml;', $string );
  $string = str_replace ( 'Ö', '&Ouml;', $string );
  $string = str_replace ( 'å', '&aring;', $string );
  $string = str_replace ( 'Å', '&Aring;', $string );
  return $string;
}
?>

How does Internationalized Domain Names (IDN) work?

By the introduction of Internationalized Domain Names (IDN) it is now possible to register domain names on the Internet, containing regional characters, like for example www.räksmörgås.se.This is possible for a number of the Top Level Domains (TLD), like .se and .nu.

Read more