Tries to determine the encoding used for the specified value.
Source position: lconvencoding.pas line 102
function GuessEncoding(
  const s: string
):string;
s: String with the content examined in the routine.
Encoding name detected, or a default value.
GuessEncoding is a String function which tries to determine the encoding used for the value specified in S. The return value contains an encoding name, like 'utf8' or 'ISO-8859-1'. It is an empty string ('') when S is an empty string.
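A minimal usage sketch follows, assuming the LazUtils package (which provides the LConvEncoding unit) is on the unit search path. The program name and the sample text are hypothetical. ConvertEncoding and EncodingUTF8 come from the same unit. The guessed name depends on the platform default, so no output is shown.

```pascal
program GuessEncodingDemo;

{$mode objfpc}{$H+}

uses
  LConvEncoding;  // part of LazUtils

var
  Raw, Enc, Utf8Text: string;
begin
  // Hypothetical sample: 'café' stored as ISO-8859-1, where #$E9 is 'é'.
  // A lone #$E9 byte at the end of the value is not valid UTF-8.
  Raw := 'caf'#$E9;

  Enc := GuessEncoding(Raw);
  WriteLn('Guessed encoding: ', Enc);

  // Convert to UTF-8, using the guessed name as the source encoding.
  Utf8Text := ConvertEncoding(Raw, Enc, EncodingUTF8);
  WriteLn('Length after conversion: ', Length(Utf8Text));
end.
```

The result is only a guess, so callers that know the real encoding of their data may prefer to pass it to ConvertEncoding directly.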
GuessEncoding first checks S for a Byte Order Mark at the start of the value: UTF8BOM, UTF16LEBOM, or UTF16BEBOM. When present, the BOM determines the encoding used for the value.
Next, it checks for an explicit '{%encoding' marker at the start of the value. When present, the value after the marker (up to the closing '}' character) is normalized and used as the return value.
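The two checks described so far can be exercised with a short sketch, again assuming LazUtils is available. The program name is hypothetical, and 'cp1250' is an arbitrary encoding name chosen for illustration. The exact strings returned are not shown here because they are not stated in this topic.

```pascal
program GuessMarkerDemo;

{$mode objfpc}{$H+}

uses
  LConvEncoding;  // part of LazUtils

begin
  // A value that starts with the UTF-8 Byte Order Mark.
  WriteLn(GuessEncoding(UTF8BOM + 'text'));

  // A value that starts with an explicit encoding marker.
  // The name between the marker and '}' is normalized before it is returned.
  WriteLn(GuessEncoding('{%encoding cp1250}'#10'text'));
end.
```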
Finally, it checks for a valid UTF-8 encoding (which includes ASCII characters). The bytes in S are examined in order until the end of the value is reached or an invalid UTF-8 sequence is encountered.
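The idea behind this scan can be illustrated with a simplified, self-contained sketch. LooksLikeUtf8 is a hypothetical helper, not part of LConvEncoding, and it is not the routine's actual implementation. It checks only that each lead byte is followed by the right number of continuation bytes, and does not reject overlong forms, surrogates, or code points above U+10FFFF.

```pascal
program Utf8CheckSketch;

{$mode objfpc}{$H+}

// Hypothetical helper: True when every sequence in S has a plausible
// UTF-8 lead byte followed by the expected continuation bytes.
function LooksLikeUtf8(const S: string): Boolean;
var
  I, Need, K: Integer;
  B: Byte;
begin
  Result := False;
  I := 1;
  while I <= Length(S) do
  begin
    B := Ord(S[I]);
    if B < $80 then Need := 0                 // ASCII
    else if (B and $E0) = $C0 then Need := 1  // 2-byte sequence
    else if (B and $F0) = $E0 then Need := 2  // 3-byte sequence
    else if (B and $F8) = $F0 then Need := 3  // 4-byte sequence
    else Exit;                                // stray continuation or bad lead
    if I + Need > Length(S) then Exit;        // truncated sequence
    for K := 1 to Need do
      if (Ord(S[I + K]) and $C0) <> $80 then Exit;
    Inc(I, Need + 1);
  end;
  Result := True;
end;

begin
  WriteLn(LooksLikeUtf8('plain ASCII'));   // TRUE
  WriteLn(LooksLikeUtf8('caf'#$C3#$A9));   // TRUE: 'é' as a 2-byte sequence
  WriteLn(LooksLikeUtf8('caf'#$E9));       // FALSE: lone ISO-8859-1 byte
end.
```

Pure ASCII passes the check, which is why ASCII content is reported as UTF-8.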
When EncodingValid is True, EncodingAnsi is assumed. Otherwise, the default encoding for the platform is used. When that default is EncodingUTF8, the return value is changed to 'ISO-8859-1'. This is done because the system may use the UTF-8 encoding, but the value in S does not. ISO 8859-1 has a full mapping to Unicode, which prevents data loss in encoding conversions.
Version 3.2 | Generated 2024-02-25