UTF8CodepointToUnicode

Converts a UTF-8-encoded character to its Unicode code point value (written U+XXXX).

Declaration

Source position: lazutf8.pas line 91

function UTF8CodepointToUnicode(
  p: PChar;
  out CodepointLen: Integer
): Cardinal;

Arguments

`p`		Pointer to the UTF-8-encoded character (typically within a string) to convert.
`CodepointLen`		Output: the number of bytes used by the codepoint.

Function result

Unicode character value for the UTF-8 character.

Description

UTF8CodepointToUnicode is a Cardinal function that converts the UTF-8-encoded character at p to its Unicode code point: the unique number that identifies the character, conventionally written in hexadecimal as U+XXXX. For example, the letter 'A' (decimal 65) is U+0041.
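As a rough illustration of the arithmetic involved, the following self-contained sketch decodes one character by masking the payload bits of the lead byte and then appending six bits from each continuation byte. DecodeUTF8Sketch is a hypothetical helper written for this page, not the LazUTF8 implementation; returning 0 for malformed or overlong input is an assumption modeled on the behavior described in this section.

```pascal
program DecodeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper, NOT the LazUTF8 implementation: a simplified
// decoder that shows the bit arithmetic of UTF-8. Malformed input
// yields 0 (an assumption modeled on the documented behavior).
function DecodeUTF8Sketch(p: PChar; out Len: Integer): Cardinal;
const
  // Smallest code point that legitimately needs 2, 3 or 4 bytes
  MinValue: array[2..4] of Cardinal = ($80, $800, $10000);
var
  b: Byte;
  i: Integer;
begin
  Result := 0;
  Len := 0;
  if p = nil then
    Exit;                       // nothing to decode
  b := Ord(p[0]);
  if b < $80 then
  begin
    Len := 1;                   // 0xxxxxxx: plain ASCII
    Exit(b);
  end
  else if (b and $E0) = $C0 then
  begin
    Len := 2;                   // 110xxxxx: 5 payload bits
    Result := b and $1F;
  end
  else if (b and $F0) = $E0 then
  begin
    Len := 3;                   // 1110xxxx: 4 payload bits
    Result := b and $0F;
  end
  else if (b and $F8) = $F0 then
  begin
    Len := 4;                   // 11110xxx: 3 payload bits
    Result := b and $07;
  end
  else
  begin
    Len := 1;                   // stray continuation or invalid lead byte
    Exit(0);
  end;
  // Each continuation byte must look like 10xxxxxx and adds 6 bits.
  // A terminating #0 fails this test, so we never read past the string.
  for i := 1 to Len - 1 do
  begin
    if (Ord(p[i]) and $C0) <> $80 then
    begin
      Len := 1;
      Exit(0);
    end;
    Result := (Result shl 6) or (Ord(p[i]) and $3F);
  end;
  // Reject overlong forms (e.g. #$C1#$81 for 'A') and values past U+10FFFF.
  if (Result < MinValue[Len]) or (Result > $10FFFF) then
    Result := 0;
end;

var
  S: string;
  Len: Integer;
  CP: Cardinal;
begin
  S := 'A';
  CP := DecodeUTF8Sketch(PChar(S), Len);
  // Int64 cast widens explicitly so one IntToHex overload is chosen
  WriteLn('U+', IntToHex(Int64(CP), 4), ' length ', Len);  // U+0041 length 1
  S := #$E2#$82#$AC;                                         // euro sign
  CP := DecodeUTF8Sketch(PChar(S), Len);
  WriteLn('U+', IntToHex(Int64(CP), 4), ' length ', Len);  // U+20AC length 3
end.
```

For the euro sign, the lead byte $E2 contributes 0010, and the continuation bytes $82 and $AC contribute 000010 and 101100, giving binary 0010 0000 1010 1100 = $20AC.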

CodepointLen is an output parameter that receives the number of bytes occupied by the UTF-8-encoded codepoint. It normally contains a value in the range 1..4 (the possible lengths of a UTF-8 byte sequence). It can contain 0 (zero) when p is an empty (nil) PChar value.
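A common pattern is to walk a string one code point at a time, advancing a PChar by CodepointLen after each call. The sketch below assumes a Lazarus/FPC setup where the LazUtils package (unit LazUTF8) is available; DumpCodepoints is a hypothetical helper written for illustration. It stops at the byte length of the string rather than waiting for CodepointLen to become 0, because Pascal strings may contain #0 and the 0 length is documented only for an empty PChar.

```pascal
program WalkCodepoints;

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8;

// Prints every code point of S as U+XXXX and returns how many were found.
// The pointer is advanced by CodepointLen bytes after each call.
function DumpCodepoints(const S: string): Integer;
var
  p, pEnd: PChar;
  Len: Integer;
  CP: Cardinal;
begin
  Result := 0;
  if S = '' then
    Exit;
  p := PChar(S);
  pEnd := p + Length(S);          // Length counts bytes, not characters
  while p < pEnd do
  begin
    CP := UTF8CodepointToUnicode(p, Len);
    if Len < 1 then
      Break;                      // defensive: never loop without advancing
    WriteLn('U+', IntToHex(Int64(CP), 4), ' uses ', Len, ' byte(s)');
    Inc(p, Len);
    Inc(Result);
  end;
end;

begin
  // "A", "é" and "€" written as explicit UTF-8 byte sequences
  WriteLn(DumpCodepoints('A' + #$C3#$A9 + #$E2#$82#$AC), ' code points');
end.
```

For this valid input the loop should report U+0041, U+00E9 and U+20AC with lengths 1, 2 and 3, followed by a count of 3.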

The function result contains the Unicode code point as a Cardinal value; hexadecimal is only the conventional way of writing it. The result can be 0 (zero) when p does not point to a valid UTF-8-encoded character.

Use UTF8FixBroken to fix invalid UTF-8 encoding in the string.

Use UnicodeToUTF8 to convert a Unicode character value to its UTF-8-encoded value.
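For valid input, UTF8CodepointToUnicode and UnicodeToUTF8 should act as inverses, so a round trip makes a quick sanity check. This sketch assumes the LazUTF8 overload UnicodeToUTF8(CodePoint: Cardinal): string; the program and variable names are illustrative only.

```pascal
program RoundTrip;

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8;

var
  S, Back: string;
  CP: Cardinal;
  Len: Integer;
begin
  S := #$E2#$82#$AC;                            // "€" encoded as UTF-8
  CP := UTF8CodepointToUnicode(PChar(S), Len);  // UTF-8 -> code point
  Back := UnicodeToUTF8(CP);                    // code point -> UTF-8
  WriteLn('U+', IntToHex(Int64(CP), 4), ', ', Len, ' bytes, round trip ok: ', Back = S);
end.
```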

Remark:

UTF8CodepointToUnicode does not check whether the codepoint is actually defined in Unicode tables.

Version 4.0

Generated 2025-05-03
