Reference for unit 'LazUTF8' (#lazutils)

UTF8FixBroken

Replaces all invalid UTF-8 characters in a string with the specified character.

Declaration

Source position: lazutf8.pas line 109

procedure UTF8FixBroken(
  P: PChar;
  ReplaceChar: Char = #$20
); overload;

procedure UTF8FixBroken(
  var S: string;
  ReplaceChar: Char = #$20
);

Arguments

P

Pointer to the null-terminated, UTF-8-encoded text examined and updated in place by the routine.

ReplaceChar

Character used to replace invalid UTF-8 bytes in the input argument. The default value is the Space character (#$20, decimal 32).

Arguments

S

String with the UTF-8-encoded text examined and updated by the routine. The repaired value is returned in S.

ReplaceChar

Character used to replace invalid UTF-8 bytes in the input argument. The default value is the Space character (#$20, decimal 32).

Description

UTF8FixBroken is an overloaded routine used to replace all invalid UTF-8 characters in the specified value with a replacement character. The overloaded variants allow the UTF-8-encoded content to be specified using either a PChar or a String type.
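A minimal usage sketch of both overloads follows. It assumes a program compiled with the LazUtils package available (for example, a Lazarus project); the program name and the sample byte values are illustrative only. Because the PChar variant writes into the buffer it is given, the example calls UniqueString first so that it writes into a private copy of the string rather than into shared or constant data.

```pascal
program FixBrokenDemo;

{$mode objfpc}{$H+}

uses
  LazUTF8; // from the LazUtils package (assumed to be on the unit path)

var
  S, Raw: string;
begin
  // String variant with the default ReplaceChar (space).
  S := 'abc' + #$FF + 'def';       // $FF never occurs in valid UTF-8
  UTF8FixBroken(S);
  WriteLn('[', S, ']');            // [abc def]

  // PChar variant with an explicit ReplaceChar.
  Raw := 'ab' + #$80 + 'cd';       // $80 is a continuation byte with no lead byte
  UniqueString(Raw);               // the PChar variant writes in place: own the buffer first
  UTF8FixBroken(PChar(Raw), '?');
  WriteLn('[', Raw, ']');          // [ab?cd]
end.
```

Passing PChar(Raw) selects the PChar overload, because a typecast expression cannot bind to the var S: string parameter of the String overload.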

ReplaceChar is the character written in place of any invalid UTF-8 byte found in the input value. Its default value is the Space character (#$20, decimal 32).

The PChar variant examines the byte values in P to determine where an invalid UTF-8 sequence occurs. This includes malformed 1-, 2-, or 3-byte sequences, byte values outside the ranges allowed in UTF-8, and byte sequences commonly used in cross-site scripting (XSS) attacks. UTF8FixBroken stops processing at the first #0 byte (decimal 0). Replacements are written directly into the buffer referenced by P; no new buffer is allocated.
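The in-place scanning technique can be illustrated with a simplified, self-contained sketch. This is not the LazUTF8 implementation: it assumes the well-formed byte ranges of RFC 3629, which also reject overlong forms (for example, $C0 $BC as a disguised '<') and encoded UTF-16 surrogates, and it covers sequences of up to 4 bytes. It replaces each byte that does not start a well-formed sequence with ReplaceChar. The names IsCont, ValidSeqLen, and FixBrokenUTF8 are hypothetical.

```pascal
program FixBrokenSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: True if C is a UTF-8 continuation byte (10xxxxxx).
function IsCont(C: Char): Boolean;
begin
  Result := (Ord(C) and $C0) = $80;
end;

// Hypothetical helper: length (1..4) of the well-formed sequence starting at P,
// or 0 if the bytes at P do not start one (RFC 3629 ranges). Short-circuit
// evaluation guarantees that no byte after a #0 terminator is read.
function ValidSeqLen(P: PChar): Integer;
var
  B0, B1: Byte;
begin
  Result := 0;
  B0 := Ord(P[0]);
  if B0 < $80 then
    Exit(1);                                  // ASCII
  B1 := Ord(P[1]);                            // safe: P[0] <> #0, so P[1] exists
  case B0 of
    $C2..$DF:                                 // 2 bytes ($C0, $C1 would be overlong)
      if IsCont(P[1]) then
        Result := 2;
    $E0..$EF:                                 // 3 bytes
      if IsCont(P[1]) and IsCont(P[2])
         and not ((B0 = $E0) and (B1 < $A0))  // overlong form
         and not ((B0 = $ED) and (B1 > $9F))  // UTF-16 surrogate
      then
        Result := 3;
    $F0..$F4:                                 // 4 bytes
      if IsCont(P[1]) and IsCont(P[2]) and IsCont(P[3])
         and not ((B0 = $F0) and (B1 < $90))  // overlong form
         and not ((B0 = $F4) and (B1 > $8F))  // above U+10FFFF
      then
        Result := 4;
  end;
  // $80..$C1 and $F5..$FF never start a sequence, so Result stays 0.
end;

// Hypothetical sketch: repairs a null-terminated buffer in place.
procedure FixBrokenUTF8(P: PChar; ReplaceChar: Char = ' ');
var
  Len: Integer;
begin
  if P = nil then Exit;
  while P^ <> #0 do
  begin
    Len := ValidSeqLen(P);
    if Len = 0 then
    begin
      P^ := ReplaceChar;  // replace only the offending byte...
      Inc(P);             // ...and re-examine the following bytes on their own
    end
    else
      Inc(P, Len);        // skip over a well-formed sequence
  end;
end;

var
  S: string;
begin
  // 'a', overlong '<' ($C0 $BC), 'b', stray continuation byte ($80), valid 'é' ($C3 $A9)
  S := 'a' + #$C0#$BC + 'b' + #$80 + #$C3#$A9;
  UniqueString(S);                  // writable private copy before writing through a PChar
  FixBrokenUTF8(PChar(S), '?');
  WriteLn(S = 'a??b?' + #$C3#$A9);  // TRUE: only the three broken bytes changed
end.
```

Replacing one byte at a time keeps the content length unchanged, which is what makes an update through a bare PChar (with no length or capacity argument) possible.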

The String variant passes the content of S as a PChar to FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If none are found, S is left unchanged. Otherwise, UniqueString is called so that S refers to its own copy of the string data (reference count 1), and the PChar variant is called to repair that copy in place. Other variables that shared the original string data are not affected.
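The effect of the UniqueString call can be observed through the public API: when two variables share one string buffer, only the variable passed to UTF8FixBroken is repaired. This sketch assumes LazUtils is available; the program and variable names are illustrative.

```pascal
program FixBrokenSharing;

{$mode objfpc}{$H+}

uses
  LazUTF8; // from the LazUtils package (assumed to be on the unit path)

var
  A, B: string;
begin
  A := 'x?';
  A[2] := #$FF;             // A is now a heap string holding 'x'#$FF
  B := A;                   // A and B share one buffer (reference count 2)

  UTF8FixBroken(A);         // String variant: finds $FF, makes A unique, repairs A

  WriteLn(A = 'x ');        // TRUE: A was repaired
  WriteLn(B = 'x' + #$FF);  // TRUE: B still holds the original bytes
end.
```

Calling the PChar variant directly on PChar(A) while A is shared would skip this step; in Free Pascal a PChar typecast does not make the string unique, so the change would also be visible through B.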

Version info

Modified in LazUtils version 4.0 to include the ReplaceChar argument.

See also

FindInvalidUTF8Codepoint

Finds the position where an invalid UTF-8 codepoint is found in the string.

UniqueString


Version 4.0 Generated 2025-05-03