Gets the number of valid UTF-8 codepoints in the specified value.
Source position: lazutf8.pas line 87
function UTF8CodepointCount(const s: string): SizeInt;
function UTF8CodepointCount(p: PChar; ByteCount: SizeInt): SizeInt;
s: String with the codepoints examined in the routine.
p: PChar type with the codepoints examined in the routine.
ByteCount: Number of bytes in the PChar value.
Return value: Integer value with the number of valid codepoints including combining characters.
UTF8CodepointCount is an overloaded SizeInt function used to determine the number of UTF-8 codepoints found in the specified value. It is similar to the UTF8Length routine, but excludes any invalid codepoints found in the input value from the count in the return value. The overloaded variants allow the input value to be specified using either the String or the PChar type.
UTF8CodepointCount iterates over the bytes in the s or p argument, and increments the return value each time a valid UTF-8 codepoint is found. UTF8CodepointLen (in system.pp) is called to get the size of each UTF-8 codepoint. Valid codepoints include those represented using combining characters. The process is repeated until all of the bytes in the input value have been examined, or until a codepoint with a length of zero (0) is encountered.
The return value is zero (0) if the s or p arguments are empty, or when the ByteCount argument is zero (0).
For example:
// var
//   Utf8Str, InvalidUtf8Str: String;
//   Cnt, Len: Integer;

{A macron (decomposed)}
Utf8Str := 'A' + #$CC#$84;
{invalid single byte UTF-8}
InvalidUtf8Str := #$C0#$C1#$F5#$F6#$F7#$F8#$F9#$FA#$FB#$FC#$FD#$FE#$FF;

Cnt := UTF8CodePointCount(Utf8Str); // Cnt = 2
Len := UTF8Length(Utf8Str); // Len = 2

Cnt := UTF8CodePointCount(InvalidUtf8Str); // Cnt = 0
Len := UTF8Length(InvalidUtf8Str); // Len = 13

Cnt := UTF8CodePointCount(InvalidUtf8Str + Utf8Str); // Cnt = 2
Len := UTF8Length(InvalidUtf8Str + Utf8Str); // Len = 15
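The fragment above omits the program scaffolding. The following is a minimal sketch of a complete console program that calls both overloaded variants. It assumes LazUtils version 4.0 or later is on the unit search path. The program name and the variable names are illustrative only and do not come from the LazUtils sources.

```pascal
program CodepointCountDemo; // hypothetical program name

{$mode objfpc}{$H+}

uses
  LazUTF8; // declares UTF8CodepointCount and UTF8Length

var
  S: String;
  Cnt: SizeInt;
begin
  // 'A' followed by the bytes $CC $84 (U+0304 COMBINING MACRON),
  // the same value used in the example above
  S := 'A' + #$CC#$84;

  // String overload; the example above documents a result of 2 here
  Cnt := UTF8CodepointCount(S);
  WriteLn('String overload: ', Cnt);

  // PChar overload: pass the buffer and its size in bytes
  Cnt := UTF8CodepointCount(PChar(S), Length(S));
  WriteLn('PChar overload: ', Cnt);

  // Documented to return 0 for an empty value or a ByteCount of 0
  WriteLn('Empty string: ', UTF8CodepointCount(''));
  WriteLn('Zero ByteCount: ', UTF8CodepointCount(PChar(S), 0));
end.
```

The PChar variant may be convenient when only part of a buffer should be examined, since ByteCount limits how many bytes are read.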
Added in LazUtils version 4.0.
Gets the length of a UTF-8-encoded string in codepoints.
Returns the size of the UTF-8 codepoint in bytes.
Fast version of UTF8Length.
Returns the number of bytes needed for the UTF-8 codepoint starting at p.
Version 4.0 | Generated 2025-05-03