Reference for unit 'LConvEncoding' (#lazutils)

GuessEncoding

Tries to determine the encoding used for the specified value.

Declaration

Source position: lconvencoding.pas line 102

function GuessEncoding(const s: string): string;

Arguments

s

String with the content examined in the routine.

Function result

Encoding name detected, or a default value.

Description

GuessEncoding is a String function which tries to determine the encoding used for the value specified in S. The return value contains an encoding name, like 'utf8' or 'ISO-8859-1'. It is an empty string ('') when S is empty.

GuessEncoding checks S for various Byte Order Marks at the start of the value, including: UTF8BOM, UTF16LEBOM, and UTF16BEBOM. When present, the BOM determines the encoding used for the value.
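The BOM check can be tried with a short console program. This is a minimal sketch, assuming the LazUtils package is on the unit search path; the program name and the sample payload are illustrative and not part of the unit.

```pascal
program GuessBomDemo;

{$mode objfpc}{$H+}

uses
  LConvEncoding; // from the LazUtils package

var
  WithBom, Encoding: string;
begin
  // UTF8BOM is the constant named in the description; the payload is arbitrary.
  WithBom := UTF8BOM + 'plain text';

  // The BOM at the start of the value decides the result.
  Encoding := GuessEncoding(WithBom);
  WriteLn('Detected: ', Encoding);
end.
```

Replacing UTF8BOM with UTF16LEBOM or UTF16BEBOM exercises the other two BOM checks in the same way.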

Next, it checks for an explicit '{%encoding' marker at the start of the value. When present, the value after the marker (up to the closing '}' character) is normalized and used as the return value.
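The marker check can be exercised in a similar way. The sketch below assumes the marker takes the form '{%encoding NAME}', with a space between the keyword and the name; the name cp1252 and the text that follows it are only illustrations.

```pascal
program GuessMarkerDemo;

{$mode objfpc}{$H+}

uses
  LConvEncoding; // from the LazUtils package

var
  Content: string;
begin
  // Hypothetical file content: the marker on the first line, then ordinary text.
  Content := '{%encoding cp1252}' + LineEnding + 'unit Example;';

  // The name found after the marker is normalized before it is returned,
  // so the result may differ in case or punctuation from what was written.
  WriteLn('Detected: ', GuessEncoding(Content));
end.
```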

Finally, it checks whether S is valid UTF-8 (which includes plain ASCII). The bytes in S are examined in order until a sequence that is not a valid UTF-8 code point is encountered.

When EncodingValid is True, EncodingAnsi is assumed. Otherwise, the default encoding for the platform is used. If that default is EncodingUTF8, the return value is changed to 'ISO-8859-1', because the system may use UTF-8 while the value in S does not. ISO-8859-1 maps every byte to a Unicode code point, which prevents data loss in encoding conversions.
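A common use of the result is to pass it to ConvertEncoding from the same unit. The following is a minimal sketch rather than a recommended implementation: the helper name ToUTF8Guessing and the sample bytes are hypothetical, and the guessed name depends on the platform default, so no output is shown.

```pascal
program GuessAndConvertDemo;

{$mode objfpc}{$H+}

uses
  LConvEncoding; // from the LazUtils package

// Hypothetical helper: converts Raw to UTF-8 using the guessed encoding.
function ToUTF8Guessing(const Raw: string): string;
var
  Enc: string;
begin
  Enc := GuessEncoding(Raw);
  if Enc = '' then
    Exit(Raw); // empty input yields an empty encoding name
  Result := ConvertEncoding(Raw, Enc, EncodingUTF8);
end;

var
  Raw: string;
begin
  // A lone #$E9 byte is not a valid UTF-8 sequence, so the UTF-8 check
  // is expected to fail and the fallback described above applies.
  Raw := 'caf' + #$E9;
  WriteLn('Guessed: ', GuessEncoding(Raw));
  WriteLn('Length after conversion: ', Length(ToUTF8Guessing(Raw)));
end.
```

Because the fallback never reports UTF-8 for a value that failed the UTF-8 check, the conversion always has a single-byte source encoding to work from.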


Version 3.2 Generated 2024-02-25