Replaces all invalid UTF-8 characters in a string with the specified character.
Source position: lazutf8.pas line 109
procedure UTF8FixBroken(
  P: PChar;
  ReplaceChar: Char = #$20
); overload;
procedure UTF8FixBroken(
  var S: string;
  ReplaceChar: Char = #$20
);
P
PChar with the UTF-8-encoded values examined in the routine.
ReplaceChar
Character used to replace invalid codepoints in the input argument. The default value for the argument is the Space character (decimal 32, hex $20).
S
String with the UTF-8-encoded values examined in the routine.
ReplaceChar
Character used to replace invalid codepoints in the input argument. The default value for the argument is the Space character (decimal 32, hex $20).
UTF8FixBroken is an overloaded routine used to replace all invalid UTF-8 characters in the specified value with a replacement character. The overloaded variants allow the UTF-8-encoded content to be specified using either a PChar or a String type.
ReplaceChar contains the character used to replace any invalid UTF-8 characters found in the input value. The default value for ReplaceChar is the Space character (hex $20, decimal 32).
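As an illustration of the ReplaceChar argument, the following minimal sketch calls the String variant once with the default and once with an explicit replacement character. It assumes a project that depends on the LazUtils package (version 4.0 or later, since earlier versions lack ReplaceChar); the program name and the sample string are made up for the example.

```pascal
program ReplaceCharDemo;
{$mode objfpc}{$H+}
uses
  LazUTF8; // provided by the LazUtils package

var
  S: string;
begin
  // #$FF can never occur in valid UTF-8, so it stands in for a broken byte.
  S := 'abc' + #$FF + 'def';
  UTF8FixBroken(S);        // ReplaceChar omitted: the Space default applies
  WriteLn('[', S, ']');

  S := 'abc' + #$FF + 'def';
  UTF8FixBroken(S, '?');   // explicit ReplaceChar
  WriteLn('[', S, ']');
end.
```

The brackets in the output make a Space replacement visible.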
The PChar variant examines the byte values at P to find invalid UTF-8 codepoints. These include malformed 1-, 2-, or 3-byte sequences, values that fall outside the ranges allowed in UTF-8, and byte sequences commonly used in XSS injection attacks. UTF8FixBroken stops processing at the first occurrence of the byte value #0 (decimal 0). Replacements are written in place, to the memory referenced by the original PChar argument.
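The in-place behavior and the #0 stopping rule described above can be observed with a small buffer. This is only a sketch under the same assumptions as the documentation text (LazUtils 4.0 or later); the buffer layout and program name are invented for the example.

```pascal
program PCharDemo;
{$mode objfpc}{$H+}
uses
  LazUTF8; // provided by the LazUtils package

var
  Buf: array[0..6] of Char;
begin
  Buf[0] := 'a';
  Buf[1] := #$80;  // stray continuation byte, invalid on its own
  Buf[2] := 'b';
  Buf[3] := #0;    // terminator: processing is documented to stop here
  Buf[4] := #$80;  // lies beyond the terminator
  Buf[5] := 'c';
  Buf[6] := #0;

  // The buffer itself is modified; no copy is made.
  UTF8FixBroken(PChar(@Buf[0]), '?');

  WriteLn('Byte before #0 replaced:  ', Buf[1] = '?');
  WriteLn('Byte after #0 untouched:  ', Buf[4] = #$80);
end.
```

If the routine behaves as documented, both lines should report TRUE. Callers that need the original bytes should copy the buffer before the call.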
The String variant casts the input argument to a PChar and calls FindInvalidUTF8Codepoint to locate invalid UTF-8 byte sequences. If none are found, S is left unchanged. Otherwise, UniqueString is called so that S has its own reference-counted copy of the data, and the overloaded PChar variant is called to replace the invalid bytes in that copy.
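The sequence of steps in the String variant can be sketched roughly as follows. FixBrokenSketch is a hypothetical helper written for this illustration, not the LazUtils source; it assumes that FindInvalidUTF8Codepoint returns a negative value when no invalid codepoint is found, and that LazUtils 4.0 or later is in use.

```pascal
program StringVariantSketch;
{$mode objfpc}{$H+}
uses
  LazUTF8; // provided by the LazUtils package

// Hypothetical helper mirroring the documented steps.
procedure FixBrokenSketch(var S: string; ReplaceChar: Char);
begin
  if S = '' then Exit;
  // Assumption: a negative result means the string is already valid.
  if FindInvalidUTF8Codepoint(PChar(S), Length(S)) < 0 then Exit;
  UniqueString(S);                       // make sure S does not share its data
  UTF8FixBroken(PChar(S), ReplaceChar);  // repair the private copy in place
end;

var
  A, B: string;
begin
  A := 'x' + #$FF + 'y';
  B := A;                    // B refers to the same string data as A
  FixBrokenSketch(A, '?');
  WriteLn('A cleaned:        ', Pos(#$FF, A) = 0);
  WriteLn('B still original: ', Pos(#$FF, B) > 0);
end.
```

The UniqueString call is what keeps B intact: without it, writing through PChar(A) could also alter any other string variable that shares the same data.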
Modified in LazUtils version 4.0 to include the ReplaceChar argument.
See also
FindInvalidUTF8Codepoint
Finds the position where an invalid UTF-8 codepoint is found in the string.
Version 4.0 | Generated 2025-05-03