Reference for unit 'LazUTF8' (#lazutils)

UTF8CodepointCount

Gets the number of valid UTF-8 codepoints in the specified value.

Declaration

Source position: lazutf8.pas line 87

function UTF8CodepointCount(const s: string): SizeInt;

function UTF8CodepointCount(p: PChar; ByteCount: SizeInt): SizeInt;

Arguments

s

String with the UTF-8-encoded codepoints examined in the routine (String overload).

p

PChar value with the UTF-8-encoded codepoints examined in the routine (PChar overload).

ByteCount

Number of bytes in the PChar value to examine.

Function result

Number of valid codepoints found in the input value. Combining characters are counted as codepoints in their own right.

Description

UTF8CodepointCount is an overloaded SizeInt function used to determine the number of UTF-8 codepoints in the specified value. It is similar to UTF8Length, but invalid codepoints in the input value are not included in the count. The overloaded variants accept the input value either as a String, or as a PChar with a byte count.

UTF8CodepointCount iterates over the bytes in the s or p argument and increments the return value each time a valid UTF-8 codepoint is found. UTF8CodepointLen (in system.pp) is called to get the size of each UTF-8 codepoint. Combining characters are valid codepoints, and each one is counted separately. The process is repeated until all of the bytes in the input value have been examined, or until a codepoint with a length of zero (0) is encountered.
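The counting loop described above can be sketched as follows. This is a minimal, hypothetical re-implementation and not the LazUTF8 source. The names ValidSeqLen and CountValidUtf8Codepoints are made up for illustration. ValidSeqLen applies the RFC 3629 well-formedness rules in place of the UTF8CodepointLen call from system.pp. Invalid bytes are skipped one at a time, which is what the mixed-string example below implies. The sketch does not model every detail of UTF8CodepointLen, such as stopping on a zero-length result.

```pascal
program CodepointCountSketch;

{$mode objfpc}{$H+}

{ Hypothetical helper standing in for UTF8CodepointLen: returns the byte
  length of a well-formed UTF-8 sequence at p (looking at no more than
  MaxLen bytes), or 0 if p does not start a valid sequence. The rules
  follow RFC 3629: no overlong forms, no surrogates, nothing above U+10FFFF. }
function ValidSeqLen(p: PChar; MaxLen: SizeInt): SizeInt;
var
  B0, Lo, Hi: Byte;
  Need: SizeInt;
  I: Integer;
begin
  Result := 0;
  Need := 0;
  if MaxLen <= 0 then
    Exit;
  B0 := Ord(p[0]);
  Lo := $80;  // allowed range for the second byte
  Hi := $BF;
  case B0 of
    $00..$7F: Need := 1;
    $C2..$DF: Need := 2;
    $E0:      begin Need := 3; Lo := $A0; end;  // reject overlong forms
    $E1..$EC, $EE..$EF: Need := 3;
    $ED:      begin Need := 3; Hi := $9F; end;  // reject surrogates
    $F0:      begin Need := 4; Lo := $90; end;  // reject overlong forms
    $F1..$F3: Need := 4;
    $F4:      begin Need := 4; Hi := $8F; end;  // reject > U+10FFFF
  else
    Exit;  // $80..$C1 and $F5..$FF never start a valid sequence
  end;
  if Need > MaxLen then
    Exit;  // truncated sequence
  if Need >= 2 then
  begin
    if (Ord(p[1]) < Lo) or (Ord(p[1]) > Hi) then
      Exit;
    for I := 2 to Need - 1 do
      if (Ord(p[I]) and $C0) <> $80 then
        Exit;  // not a continuation byte
  end;
  Result := Need;
end;

{ Hypothetical counterpart of the PChar overload: counts each valid
  sequence once and skips invalid bytes one at a time. }
function CountValidUtf8Codepoints(p: PChar; ByteCount: SizeInt): SizeInt; overload;
var
  Len: SizeInt;
begin
  Result := 0;
  if p = nil then
    Exit;
  while ByteCount > 0 do
  begin
    Len := ValidSeqLen(p, ByteCount);
    if Len > 0 then
    begin
      Inc(Result);  // one valid codepoint (combining marks included)
      Inc(p, Len);
      Dec(ByteCount, Len);
    end
    else
    begin
      Inc(p);       // skip a single invalid byte
      Dec(ByteCount);
    end;
  end;
end;

{ Hypothetical counterpart of the String overload }
function CountValidUtf8Codepoints(const s: string): SizeInt; overload;
begin
  Result := CountValidUtf8Codepoints(PChar(s), Length(s));
end;

begin
  WriteLn(CountValidUtf8Codepoints('A' + #$CC#$84));             // 2
  WriteLn(CountValidUtf8Codepoints(#$C0#$C1#$F5#$FF));           // 0
  WriteLn(CountValidUtf8Codepoints(#$C0#$C1 + 'A' + #$CC#$84));  // 2
end.
```

Skipping a single byte after an invalid lead byte lets the loop resynchronize on the next valid sequence. This is why valid codepoints that follow a run of invalid bytes are still counted.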

The return value is zero (0) if the s or p argument is empty, or when the ByteCount argument is zero (0).
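The PChar overload takes an explicit byte count. The following sketch assumes LazUtils 4.0 or later, where UTF8CodepointCount was added. The values in the comments come from the behavior documented on this page, not from a recorded run.

```pascal
program CodepointCountPChar;

{$mode objfpc}{$H+}

uses
  LazUTF8;  // UTF8CodepointCount requires LazUtils 4.0 or later

var
  S: String;
begin
  S := 'A' + #$CC#$84;  // 'A' followed by U+0304 COMBINING MACRON

  // PChar overload over all bytes of S: same result as the String overload
  WriteLn(UTF8CodepointCount(PChar(S), Length(S)));  // 2

  // A ByteCount of zero examines no bytes
  WriteLn(UTF8CodepointCount(PChar(S), 0));  // 0

  // An empty String yields zero as well
  WriteLn(UTF8CodepointCount(''));  // 0
end.
```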

For example:

program UTF8CodepointCountExample;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  Utf8Str, InvalidUtf8Str: String;
  Cnt, Len: SizeInt;
begin
  {A macron (decomposed): 'A' followed by U+0304 COMBINING MACRON}
  Utf8Str := 'A' + #$CC#$84;

  {single bytes that never occur in valid UTF-8}
  InvalidUtf8Str := #$C0#$C1#$F5#$F6#$F7#$F8#$F9#$FA#$FB#$FC#$FD#$FE#$FF;

  Cnt := UTF8CodepointCount(Utf8Str); // Cnt = 2
  Len := UTF8Length(Utf8Str); // Len = 2

  Cnt := UTF8CodepointCount(InvalidUtf8Str); // Cnt = 0
  Len := UTF8Length(InvalidUtf8Str); // Len = 13

  Cnt := UTF8CodepointCount(InvalidUtf8Str + Utf8Str); // Cnt = 2
  Len := UTF8Length(InvalidUtf8Str + Utf8Str); // Len = 15
end.

Version info

Added in LazUtils version 4.0.

See also

UTF8Length

Gets the length of a UTF-8-encoded string in codepoints.

UTF8CodepointSize

Returns the size of the UTF-8 codepoint in bytes.

UTF8LengthFast

Fast version of UTF8Length.

UTF8CharacterLength

Returns the number of bytes needed for the UTF-8 codepoint starting at p.

UTF8CodepointLen


Version 4.0 Generated 2025-05-03