Reference for unit 'LazUTF8' (#lazutils)

UTF8CodepointToUnicode

Converts a UTF-8-encoded character to its unique Unicode U+XXXX character value.

Declaration

Source position: lazutf8.pas line 88

function UTF8CodepointToUnicode(
  p: PChar;
  out CodepointLen: Integer
): Cardinal;

Arguments

p

  

The UTF-8-encoded string value.

CodepointLen

  

Number of bytes needed for the codepoint.

Function result

Unicode character value for the UTF-8 character.

Description

UTF8CodepointToUnicode is a Cardinal function used to convert a UTF-8-encoded character to its Unicode codepoint value, the number conventionally written in hexadecimal as U+XXXX. For example: the letter 'A' (decimal 65) is expressed in Unicode as U+0041, so the function returns 65 ($41) for it.

CodepointLen is an output variable used to store the number of UTF-8-encoded bytes needed for the codepoint. It will normally contain a value in the range 1..4 (the possible lengths, in bytes, of a codepoint in the UTF-8 encoding scheme). It can contain 0 (zero) when p is an empty (nil) PChar value.

The return value for the function contains the Unicode codepoint value as a Cardinal data type; hexadecimal is only the notation used to display it. It can contain 0 (zero) when the value in p is not a valid UTF-8-encoded character.
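The following is a minimal sketch of how the function might be used to walk a string one codepoint at a time, advancing the pointer by CodepointLen after each call. The program name, variable names, and sample bytes are illustrative assumptions. The sample string is assumed to be valid UTF-8, so no error handling is shown.

```pascal
program CodepointDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, LazUTF8;

var
  S: String;
  P: PChar;
  CP: Cardinal;
  Len: Integer;
begin
  // 'A', followed by U+00E9 and U+20AC written as explicit UTF-8 bytes
  S := 'A'#$C3#$A9#$E2#$82#$AC;
  P := PChar(S);
  // Stop at the terminating #0 of the string
  while P^ <> #0 do
  begin
    CP := UTF8CodepointToUnicode(P, Len);
    WriteLn('U+', IntToHex(Int64(CP), 4), ' (', Len, ' byte(s))');
    // Defensive guard so the loop cannot stall if no length is reported
    if Len < 1 then
      Break;
    Inc(P, Len);
  end;
end.
```

For this sample input the loop visits three codepoints (U+0041, U+00E9, U+20AC), occupying 1, 2, and 3 bytes respectively.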

Use UTF8FixBroken to fix invalid UTF-8 encoding in the string.

Use UnicodeToUTF8 to convert a Unicode character value to its UTF-8-encoded value.
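As a rough illustration of how the two functions relate, the sketch below encodes a codepoint with UnicodeToUTF8 and decodes it again with UTF8CodepointToUnicode. The program and variable names are assumptions made for the example, and it uses the UnicodeToUTF8 overload that returns a String.

```pascal
program RoundTripDemo;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  Encoded: String;
  CP: Cardinal;
  Len: Integer;
begin
  // Encode U+20AC (euro sign) as UTF-8 bytes
  Encoded := UnicodeToUTF8($20AC);
  // Decode the first (and only) codepoint in the encoded string
  CP := UTF8CodepointToUnicode(PChar(Encoded), Len);
  // The decoded value and byte length should match what was encoded
  if (CP = $20AC) and (Len = Length(Encoded)) then
    WriteLn('round trip ok')
  else
    WriteLn('round trip failed');
end.
```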

Remark: UTF8CodepointToUnicode does not check whether the codepoint is actually defined in Unicode tables.
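Because of this, callers that need some validation of the decoded value have to add it themselves. The sketch below shows one possible caller-side check. IsUnicodeScalarValue is a hypothetical helper, not part of LazUTF8. It only tests that the value lies in the range 0..$10FFFF and outside the surrogate block $D800..$DFFF; it does not consult the Unicode tables, so it cannot tell whether a codepoint is actually assigned.

```pascal
program ScalarCheckDemo;

{$mode objfpc}{$H+}

uses
  LazUTF8;

// Hypothetical helper (not part of LazUTF8): range check only,
// does not determine whether the codepoint is assigned in Unicode.
function IsUnicodeScalarValue(CP: Cardinal): Boolean;
begin
  Result := (CP <= $10FFFF) and ((CP < $D800) or (CP > $DFFF));
end;

var
  S: String;
  CP: Cardinal;
  Len: Integer;
begin
  // U+20AC written as explicit UTF-8 bytes
  S := #$E2#$82#$AC;
  CP := UTF8CodepointToUnicode(PChar(S), Len);
  if IsUnicodeScalarValue(CP) then
    WriteLn('in range')
  else
    WriteLn('out of range');
end.
```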

Version 3.2 Generated 2024-02-25