Reference for unit 'LazUTF8' (#lazutils)

UTF8FixBroken

Replaces all invalid UTF-8 characters with spaces.

Declaration

Source position: lazutf8.pas line 106

procedure UTF8FixBroken(P: PChar); overload;

procedure UTF8FixBroken(var S: string); overload;

Arguments

P

PChar with the UTF-8-encoded values examined in the routine.

S

String with the UTF-8-encoded values examined in the routine.

Description

UTF8FixBroken is an overloaded routine used to replace all invalid UTF-8 characters with spaces. The overloaded variants allow the UTF-8-encoded content to be specified using either a PChar or a String type.

The PChar variant examines the specified byte values to determine when an invalid UTF-8 codepoint is found. This includes byte values that fall outside of the ranges allowed in UTF-8, and common byte sequences used to inject XSS vulnerabilities.

Replacements are written in place to the buffer referenced by the original PChar argument.

UTF8FixBroken stops processing at the first occurrence of the byte value #0 (Decimal 0).
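The null-terminator behavior of the PChar variant can be illustrated with a minimal sketch. It assumes a project that has the LazUtils package on its unit path; the program name and the Buf array are hypothetical and exist only for this demonstration.

```pascal
program FixBrokenPChar;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  // Hypothetical demo buffer: two segments separated by a #0 byte.
  Buf: array[0..4] of Char;
begin
  Buf[0] := 'A';
  Buf[1] := #$80;  // stray continuation byte, invalid on its own
  Buf[2] := #0;    // processing is documented to stop here
  Buf[3] := #$80;  // same invalid byte, but located after the #0
  Buf[4] := #0;

  // The buffer is modified in place through the PChar argument.
  UTF8FixBroken(PChar(@Buf[0]));

  WriteLn('Byte before #0: ', Ord(Buf[1]));
  WriteLn('Byte after #0:  ', Ord(Buf[3]));
end.
```

Going by the description above, the byte before the #0 should come back as a space (decimal 32), while the byte after it should be left untouched. Callers holding data with embedded #0 bytes would therefore need to process each segment separately.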

The String variant converts the argument to a PChar type and calls FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. When one is found, UniqueString is called so that S holds its own unshared copy of the data before it is modified. Because S is a var parameter, the corrected content is returned in S itself.
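A minimal usage sketch for the String variant follows. It assumes the LazUtils package is available to the project; the program name and test string are hypothetical. FindInvalidUTF8Codepoint is used here only to check the string before and after the call, relying on its convention of returning a negative value when no invalid codepoint is found.

```pascal
program FixBrokenString;

{$mode objfpc}{$H+}

uses
  LazUTF8;

var
  S: string;
begin
  // Build 'A', a stray continuation byte ($80), then 'B'.
  S := 'A?B';
  S[2] := #$80;

  if FindInvalidUTF8Codepoint(PChar(S), Length(S)) >= 0 then
    WriteLn('Invalid UTF-8 found before the fix');

  // S is passed as a var parameter and is updated in place.
  UTF8FixBroken(S);

  if FindInvalidUTF8Codepoint(PChar(S), Length(S)) < 0 then
    WriteLn('No invalid UTF-8 remains after the fix');

  WriteLn('Length after the fix: ', Length(S));
end.
```

Since invalid bytes are replaced with spaces rather than removed, the length of S is expected to stay the same. Other variables that shared the original string data should not be affected, because the routine calls UniqueString before modifying S.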

See also

FindInvalidUTF8Codepoint

  

Finds the position where an invalid UTF-8 codepoint is found in the string.

UniqueString


Version 3.2 Generated 2024-02-25