C:/drupal/includes/unicode.inc File Reference

Enumerations
enum	UNICODE_ERROR
enum	UNICODE_SINGLEBYTE
enum	UNICODE_MULTIBYTE
Functions
	unicode_check ()
	_unicode_check ()
	unicode_requirements ()
	drupal_xml_parser_create (&$data)
	drupal_convert_to_utf8 ($data, $encoding)
	drupal_truncate_bytes ($string, $len)
	truncate_utf8 ($string, $len, $wordsafe=FALSE, $dots=FALSE)
	mime_header_encode ($string)
	mime_header_decode ($header)
	_mime_header_decode ($matches)
	decode_entities ($text, $exclude=array())
	_decode_entities ($prefix, $codepoint, $original, &$table, &$exclude)
	drupal_strlen ($text)
	drupal_strtoupper ($text)
	drupal_strtolower ($text)
	_unicode_caseflip ($matches)
	drupal_ucfirst ($text)
	drupal_substr ($text, $start, $length=NULL)

Enumeration Type Documentation

enum UNICODE_ERROR

Indicates an error during the check for PHP Unicode support.

Definition at line 7 of file unicode.inc.

enum UNICODE_MULTIBYTE

Indicates that full Unicode support with the PHP mbstring extension is being used.

Definition at line 18 of file unicode.inc.

enum UNICODE_SINGLEBYTE

Indicates that standard PHP (emulated) Unicode support is being used.

Definition at line 12 of file unicode.inc.

Function Documentation

_decode_entities ($prefix, $codepoint, $original, &$table, &$exclude)

Helper function for decode_entities().

Definition at line 351 of file unicode.inc.

_mime_header_decode ($matches)

Helper function for mime_header_decode().

Definition at line 309 of file unicode.inc.

References drupal_convert_to_utf8().

_unicode_caseflip ($matches)

Helper function for case conversion of Latin-1. Used for flipping U+C0-U+DE to U+E0-U+FD and back.

Definition at line 450 of file unicode.inc.
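
The flip described above can be sketched as follows. This is an illustration, not the code from unicode.inc: the names sketch_caseflip() and sketch_latin1_upper() are hypothetical. It assumes that the affected characters are stored as two-byte UTF-8 sequences starting with 0xC3, and that toggling bit 0x20 of the second byte switches between the uppercase and lowercase forms. The division sign U+00F7 is skipped here on purpose, which may differ from the original.

```php
<?php
// Hypothetical helper: flips the case of one two-byte UTF-8 sequence
// in the Latin-1 Supplement range by toggling bit 0x20 of its second byte.
function sketch_caseflip($matches) {
  return $matches[0][0] . chr(ord($matches[0][1]) ^ 0x20);
}

// Uppercases only the accented lowercase letters U+00E0-U+00FE,
// skipping U+00F7 (the division sign), which is not a letter.
function sketch_latin1_upper($text) {
  return preg_replace_callback('/\xC3[\xA0-\xB6\xB8-\xBE]/', 'sketch_caseflip', $text);
}
```

ASCII letters are left untouched by this sketch; a full uppercase routine would handle them separately.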

_unicode_check ()

Perform checks about Unicode support in PHP, and set the right settings if needed.

Because Drupal needs to be able to handle text in various encodings, we do not support mbstring function overloading. HTTP input/output conversion must be disabled for similar reasons.

Parameters:

	$errors	Whether to report any fatal errors with form_set_error().

Definition at line 38 of file unicode.inc.

References get_t().

Referenced by unicode_check(), and unicode_requirements().
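
A rough sketch of how such a check might map the PHP configuration to one of the three status values documented above. The constant names and numeric values, as well as the function sketch_unicode_status(), are assumptions made for illustration. The HTTP input/output conversion checks mentioned above are left out for brevity.

```php
<?php
// Hypothetical status values standing in for the three constants above.
const SKETCH_UNICODE_ERROR = -1;
const SKETCH_UNICODE_SINGLEBYTE = 0;
const SKETCH_UNICODE_MULTIBYTE = 1;

// Decide the status from two facts: whether mbstring is loaded and
// whether mbstring function overloading is switched on.
function sketch_unicode_status($mbstring_loaded, $func_overload) {
  if (!$mbstring_loaded) {
    // No mbstring: fall back to emulated support.
    return SKETCH_UNICODE_SINGLEBYTE;
  }
  if ((int) $func_overload !== 0) {
    // Overloading would change the meaning of strlen(), substr(), etc.
    return SKETCH_UNICODE_ERROR;
  }
  return SKETCH_UNICODE_MULTIBYTE;
}

$status = sketch_unicode_status(
  extension_loaded('mbstring'),
  ini_get('mbstring.func_overload')
);
```

Passing the two facts in as arguments keeps the decision logic testable without changing php.ini.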

decode_entities ($text, $exclude=array())

Decode all HTML entities (including numerical ones) to regular UTF-8 bytes. Double-escaped entities will only be decoded once ("&amp;lt;" becomes "&lt;", not "<").

Parameters:

	$text	The text to decode entities in.
	$exclude	An array of characters which should not be decoded. For example, array('<', '&', '"'). This affects both named and numerical entities.

Definition at line 331 of file unicode.inc.

Referenced by db_connect(), drupal_html_to_text(), and format_rss_channel().
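
To illustrate the single-pass decoding and the effect of $exclude, here is a minimal sketch built on PHP's html_entity_decode(). The function sketch_decode_entities() is hypothetical and is not the implementation in unicode.inc, which uses its own entity table.

```php
<?php
// Hypothetical sketch: decode each entity exactly once, but keep the
// original entity text when the decoded character is listed in $exclude.
function sketch_decode_entities($text, array $exclude = array()) {
  $pattern = '/&(#x?[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/';
  return preg_replace_callback($pattern, function ($m) use ($exclude) {
    $decoded = html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
    return in_array($decoded, $exclude, TRUE) ? $m[0] : $decoded;
  }, $text);
}

// A double-escaped entity is decoded one level only.
$once = sketch_decode_entities('&amp;lt;');
// Excluding '<' keeps both the named and the numerical form escaped.
$kept = sketch_decode_entities('&lt;b&gt; &#60;', array('<'));
```

Because the text is scanned once from left to right, the "&" produced by decoding "&amp;" is never re-examined.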

drupal_convert_to_utf8 ($data, $encoding)

Convert data to UTF-8.

Requires the iconv, GNU recode or mbstring PHP extension.

Parameters:

	$data	The data to be converted.
	$encoding	The encoding that the data is in.

Returns: Converted data or FALSE.

Definition at line 173 of file unicode.inc.

References watchdog().

Referenced by _mime_header_decode(), and drupal_xml_parser_create().
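
The extension fallback described above might look roughly like this. The function name sketch_convert_to_utf8() is hypothetical, and the GNU recode branch is omitted because that extension is not available in current PHP releases.

```php
<?php
// Hypothetical sketch: try iconv first, then mbstring, and report
// failure with FALSE when neither extension is available.
function sketch_convert_to_utf8($data, $encoding) {
  if (function_exists('iconv')) {
    return @iconv($encoding, 'UTF-8', $data);
  }
  if (function_exists('mb_convert_encoding')) {
    return @mb_convert_encoding($data, 'UTF-8', $encoding);
  }
  return FALSE;
}

// "caf" followed by a Latin-1 encoded e-acute (byte 0xE9).
$out = sketch_convert_to_utf8("caf\xE9", 'ISO-8859-1');
```

Callers should compare the result against FALSE with === before using it, since an empty string is a valid conversion result.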

drupal_strlen ($text)

Count the number of characters in a UTF-8 string. This is less than or equal to the byte count.

Definition at line 401 of file unicode.inc.

Referenced by _form_validate(), theme_username(), and truncate_utf8().
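
One way to count characters without mbstring is to ignore UTF-8 continuation bytes (0x80-0xBF), since every character contributes exactly one non-continuation byte. The function sketch_strlen() below is a hypothetical illustration of that idea and assumes valid UTF-8 input.

```php
<?php
// Hypothetical sketch: strip continuation bytes, then count what is left.
function sketch_strlen($text) {
  return strlen(preg_replace('/[\x80-\xBF]/', '', $text));
}

$text = "t\xC3\xA9st";           // "test" with an e-acute, 5 bytes long
$bytes = strlen($text);          // byte count
$chars = sketch_strlen($text);   // character count
```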

drupal_strtolower ($text)

Lowercase a UTF-8 string.

Definition at line 432 of file unicode.inc.

Referenced by book_export(), parse_size(), and template_preprocess_page().

drupal_strtoupper ($text)

Uppercase a UTF-8 string.

Definition at line 415 of file unicode.inc.

Referenced by drupal_ucfirst(), and tablesort_sql().

drupal_substr ($text, $start, $length=NULL)

Cut off a piece of a string based on character indices and counts. Follows the same behavior as PHP's own substr() function.

Note that for cutting off a string at a known character/substring location, the usage of PHP's normal strpos/substr is safe and much faster.

Definition at line 470 of file unicode.inc.

Referenced by drupal_ucfirst(), theme_username(), and truncate_utf8().
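
The difference between byte offsets and character offsets can be shown with a small sketch. The function sketch_substr() is hypothetical; it splits the string into characters with the PCRE /u modifier and slices the resulting array, which is simple but slower than a byte-level approach. Edge cases may differ slightly from substr().

```php
<?php
// Hypothetical sketch: character-based substr() for valid UTF-8 input.
function sketch_substr($text, $start, $length = NULL) {
  $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
  if ($chars === FALSE) {
    // Input was not valid UTF-8.
    return FALSE;
  }
  return implode('', array_slice($chars, $start, $length));
}

$text = "h\xC3\xA9llo";                      // "hello" with an e-acute
$by_bytes = substr($text, 1, 2);             // the two bytes of the e-acute
$by_chars = sketch_substr($text, 1, 2);      // e-acute followed by "l"
```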

drupal_truncate_bytes ($string, $len)

Truncate a UTF-8-encoded string safely to a number of bytes.

If the end position is in the middle of a UTF-8 sequence, it scans backwards until the beginning of the byte sequence.

Use this function whenever you truncate a string at a byte offset that may fall inside a multi-byte character. If you are sure that you are splitting on a character boundary (e.g. after using strpos() or similar), you can safely use substr() instead.

Parameters:

	$string	The string to truncate.
	$len	An upper limit on the returned string length.

Returns: The truncated string.

Definition at line 209 of file unicode.inc.

Referenced by mime_header_encode().
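
The backward scan described above can be sketched as follows. The function sketch_truncate_bytes() is a hypothetical illustration that assumes valid UTF-8 input, where continuation bytes lie in the range 0x80-0xBF.

```php
<?php
// Hypothetical sketch: cut at $len bytes, backing up to the start of a
// multi-byte sequence if the cut would land in the middle of one.
function sketch_truncate_bytes($string, $len) {
  if (strlen($string) <= $len) {
    return $string;
  }
  $next = ord($string[$len]);
  if ($next < 0x80 || $next >= 0xC0) {
    // The byte after the cut starts a new character: safe to cut here.
    return substr($string, 0, $len);
  }
  // The cut is inside a sequence: walk back over continuation bytes
  // until $len points at the lead byte, which is then dropped as well.
  while (--$len >= 0 && ord($string[$len]) >= 0x80 && ord($string[$len]) < 0xC0) {
  }
  return substr($string, 0, max($len, 0));
}

$text = "t\xC3\xA9st";                     // "test" with an e-acute
$cut = sketch_truncate_bytes($text, 2);    // offset 2 splits the e-acute
```

The result can be shorter than $len, which is why $len is documented as an upper limit.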

drupal_ucfirst ($text)

Capitalize the first letter of a UTF-8 string.

Definition at line 457 of file unicode.inc.

References drupal_strtoupper(), and drupal_substr().

Referenced by system_modules(), and system_modules_confirm_form().

drupal_xml_parser_create (&$data)

Prepare a new XML parser.

This is a wrapper around xml_parser_create() which extracts the encoding from the XML data first and sets the output encoding to UTF-8. This function should be used instead of xml_parser_create(), because PHP 4's XML parser doesn't check the input encoding itself. "Starting from PHP 5, the input encoding is automatically detected, so that the encoding parameter specifies only the output encoding."

This is also where unsupported encodings will be converted. Callers should take this into account: $data might have been changed after the call.

Parameters:

	&$data	The XML data which will be parsed later.

Returns: An XML parser object.

Definition at line 126 of file unicode.inc.

References drupal_convert_to_utf8(), and watchdog().
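
The first step, extracting the declared encoding from the XML data, might be sketched like this. The function sketch_xml_declared_encoding() and its default of 'utf-8' when no encoding is declared are assumptions for illustration; the conversion and parser creation steps are not shown.

```php
<?php
// Hypothetical sketch: read the encoding attribute of the XML declaration,
// accepting either quote style, and assume UTF-8 when none is declared.
function sketch_xml_declared_encoding($data) {
  if (preg_match('/^<\?xml[^>]+encoding=(["\'])([^"\']+)\1/', $data, $match)) {
    return $match[2];
  }
  return 'utf-8';
}

$declared = sketch_xml_declared_encoding('<?xml version="1.0" encoding="ISO-8859-2"?><a/>');
$default = sketch_xml_declared_encoding('<?xml version="1.0"?><a/>');
```

A caller would then compare the result against the encodings the parser supports and convert $data when necessary.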

mime_header_decode ($header)

Complement to mime_header_encode().

Definition at line 299 of file unicode.inc.

References $header.
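
As an illustration of the decoding direction, the sketch below handles encoded words whose charset is UTF-8, in both the B (base64) and Q (quoted-printable) forms. The function sketch_mime_header_decode() is hypothetical; other charsets, which would need a conversion to UTF-8, are left untouched here.

```php
<?php
// Hypothetical sketch: decode "=?UTF-8?B?...?=" and "=?UTF-8?Q?...?=" words.
function sketch_mime_header_decode($header) {
  $pattern = '/=\?UTF-8\?([QB])\?([^?]+)\?=/i';
  return preg_replace_callback($pattern, function ($m) {
    if (strtoupper($m[1]) === 'B') {
      return base64_decode($m[2]);
    }
    // In the Q form an underscore stands for a space.
    return quoted_printable_decode(str_replace('_', ' ', $m[2]));
  }, $header);
}

$from_b = sketch_mime_header_decode('=?UTF-8?B?dMOpc3QudHh0?=');
$from_q = sketch_mime_header_decode('=?UTF-8?Q?t=C3=A9st_file?=');
```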

mime_header_encode ($string)

Encodes MIME/HTTP header values that contain non-ASCII, UTF-8 encoded characters.

For example, mime_header_encode('tést.txt') returns "=?UTF-8?B?dMOpc3QudHh0?=".

See RFC 2047 for more information.

Notes:

Only encode strings that contain non-ASCII characters.
We progressively cut off a chunk with drupal_truncate_bytes(). This ensures that each chunk starts and ends on a character boundary.
Using \n as the chunk separator may cause problems on some systems and may have to be changed to \r\n or \r.

Definition at line 279 of file unicode.inc.

References $output, and drupal_truncate_bytes().

Referenced by drupal_mail_send().
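
The chunking described in the notes can be sketched as follows. Both function names are hypothetical, and the default chunk size of 45 bytes is an assumption chosen so that each encoded word stays short; it is not taken from unicode.inc.

```php
<?php
// Hypothetical helper: cut at most $len bytes without splitting a
// UTF-8 sequence (assumes valid UTF-8 input).
function sketch_chunk_cut($string, $len) {
  if (strlen($string) <= $len) {
    return $string;
  }
  while ($len > 0 && (ord($string[$len]) & 0xC0) === 0x80) {
    $len--;
  }
  return substr($string, 0, $len);
}

// Hypothetical sketch: base64-encode non-ASCII values chunk by chunk.
function sketch_mime_header_encode($string, $chunk_size = 45) {
  if (!preg_match('/[^\x20-\x7E]/', $string)) {
    return $string;   // pure ASCII: nothing to encode
  }
  $output = '';
  while (strlen($string) > 0) {
    $chunk = sketch_chunk_cut($string, $chunk_size);
    if ($chunk === '') {
      break;          // chunk size too small to hold one character
    }
    $output .= ' =?UTF-8?B?' . base64_encode($chunk) . "?=\n";
    $string = substr($string, strlen($chunk));
  }
  return trim($output);
}

$encoded = sketch_mime_header_encode("t\xC3\xA9st.txt");
```

Because every chunk ends on a character boundary, each encoded word can be decoded on its own.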

truncate_utf8 ($string, $len, $wordsafe=FALSE, $dots=FALSE)

Truncate a UTF-8-encoded string safely to a number of characters.

Parameters:

	$string	The string to truncate.
	$len	An upper limit on the returned string length.
	$wordsafe	Flag to truncate at last space within the upper limit. Defaults to FALSE.
	$dots	Flag to add trailing dots. Defaults to FALSE.
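
How the two flags might interact is sketched below. The function sketch_truncate_utf8() is hypothetical, and two details are assumptions: the trailing dots are taken to be the four-character string " ...", and their length is reserved from $len so that the result never exceeds the limit.

```php
<?php
// Hypothetical sketch: character-based truncation for valid UTF-8 input.
function sketch_truncate_utf8($string, $len, $wordsafe = FALSE, $dots = FALSE) {
  $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
  if ($chars === FALSE || count($chars) <= $len) {
    return $string;   // already short enough (or not valid UTF-8)
  }
  if ($dots) {
    $len = max($len - 4, 0);   // leave room for ' ...'
  }
  $result = implode('', array_slice($chars, 0, $len));
  if ($wordsafe) {
    // Look one character past the limit so that a space directly
    // after the last allowed character still counts as a word break.
    $window = implode('', array_slice($chars, 0, $len + 1));
    $last_space = strrpos($window, ' ');
    if ($last_space) {
      $result = substr($window, 0, $last_space);
    }
  }
  if ($dots) {
    $result .= ' ...';
  }
  return $result;
}

$text = "caf\xC3\xA9 au lait";                        // 12 characters
$plain = sketch_truncate_utf8($text, 6);              // hard cut
$safe = sketch_truncate_utf8($text, 6, TRUE);         // cut at last space
$dotted = sketch_truncate_utf8($text, 10, FALSE, TRUE);
```

Using strrpos() and substr() on the window is safe here because a space is a single byte and therefore always sits on a character boundary.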