
Htmlentities() Makes Chinese Characters Unusable

We have a web application where we allow users to enter their own HTML in a text area. We save that data to our database. When we load the HTML data into the text area, of course,

Solution 1:

Have you tried using htmlspecialchars?

I currently use that in production and it's fine.

$foo = "我的名字叫萨沙";
echo '<textarea>' . htmlspecialchars($foo) . '</textarea>';

Alternately,

// Note: the 'HTML-ENTITIES' encoding in mbstring is deprecated as of PHP 8.2.
$str = "&#20320;&#22909;";
echo mb_convert_encoding($str, 'UTF-8', 'HTML-ENTITIES');

As found on http://www.techiecorner.com/129/php-how-to-convert-iso-character-htmlentities-to-utf-8/
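If the mbstring route is not available, the same decoding can probably be done with html_entity_decode() instead. This is a minimal sketch, assuming the stored string contains numeric entities and the page is served as UTF-8; the variable names are only illustrative.

```php
<?php
// Numeric character references for two Chinese characters (same sample as above).
$str = '&#20320;&#22909;';

// Decode the entities into real UTF-8 characters.
$decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

// Escape again for safe output inside a textarea; htmlspecialchars() only
// touches &, <, > and quotes, so the Chinese characters stay readable.
echo '<textarea>' . htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8') . '</textarea>', "\n";
```

The explicit 'UTF-8' argument keeps the behaviour the same across PHP versions, whatever the configured default is.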

Solution 2:

Specify the charset, e.g. UTF-8, and it should work.

echo htmlentities($data, ENT_COMPAT, 'UTF-8'); 
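To see why the charset argument matters, here is a small sketch comparing the two cases. It assumes $data holds UTF-8 text; the sample string is made up for illustration.

```php
<?php
// Hypothetical sample input: UTF-8 text containing markup.
$data = '你好 <b>world</b>';

// With the charset stated, multibyte characters pass through untouched
// and only the HTML-significant characters are escaped.
$safe = htmlentities($data, ENT_COMPAT, 'UTF-8');

// Treating the same bytes as ISO-8859-1 escapes each byte on its own,
// which is what turns the Chinese text into unreadable entities.
$mangled = htmlentities($data, ENT_COMPAT, 'ISO-8859-1');

echo $safe, "\n";
echo $mangled, "\n";
```

On older PHP versions the default charset for htmlentities() was ISO-8859-1, which would explain the behaviour in the question if no charset was passed.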

Solution 3:

PHP is pretty appalling in terms of framework-wide support for international character sets (although it's slowly getting better, especially in PHP5, but you don't specify which version you're using). There are a few mb_ (multibyte, as in multibyte characters) functions to help you out, though.

This example may help you (from here):

<?php
/**
 * Multibyte equivalent for htmlentities() [lite version :)]
 *
 * @param string $str
 * @param string $encoding
 * @return string
 */
function mb_htmlentities($str, $encoding = 'utf-8') {
    mb_regex_encoding($encoding);
    // '&' is replaced first so the entities produced below are not escaped again.
    $pattern = array('&', '<', '>', '"', '\'');
    $replacement = array('&amp;', '&lt;', '&gt;', '&quot;', '&#39;');
    for ($i = 0; $i < count($pattern); $i++) {
        $str = mb_ereg_replace($pattern[$i], $replacement[$i], $str);
    }
    return $str;
}
?>

Also, make sure your page is specifying the same character set. You can do this with a meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Solution 4:

Most likely you're not using the correct encoding. If you already know your output encoding, pass it as the charset argument of the htmlentities() function.

If you haven't settled on an internal encoding yet, take a look at the iconv functions; iconv_set_encoding("internal_encoding", "UTF-8"); might be a good start. Note that on PHP 5.6 and later this setting is deprecated in favour of the default_charset ini setting.
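As a rough sketch of settling on one encoding application-wide, assuming PHP 5.6 or later and that everything is stored and served as UTF-8, setting default_charset once should let htmlentities() pick it up without an explicit argument. The sample string is hypothetical.

```php
<?php
// Assumption: the whole application uses UTF-8.
ini_set('default_charset', 'UTF-8');

$data = '我的名字叫萨沙 <em>hi</em>';

// With no charset argument, htmlentities() falls back to default_charset.
$escaped = htmlentities($data);

echo $escaped, "\n";
```

Passing the charset explicitly, as in Solution 2, is still the safer choice when the code has to run on servers whose configuration you don't control.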
