Details

Type: Task
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Component/s: Build System
Labels:
- MSVC
Environment:

Windows 7 x64, system language: zh-HK, non-unicode program codepage: 950 (Big5)

Description

This issue is an attempt to replace ~~CORE-7415~~

MSVC uses the system ANSI character set when compiling non-UTF-8 source files. This is usually a non-issue on SBCS (single-byte character set) systems: the code itself is in the ASCII range, and non-ASCII characters appear only within string/char literals and comments. Most non-ASCII bytes map to valid characters in any SBCS codepage, and even if one doesn't map, it is either simply ignored (in comments) or compiled as-is (in string literals), emitting a mostly harmless C4819 warning.

However, on DBCS (double-byte character set) systems the situation gets much more complicated. With DBCS, part of the non-ASCII range (0x80 - 0xFF) is mapped as lead bytes, each indicating that the following byte is also part of the same character. The main problem is that when the source contains a non-ASCII byte intended for an SBCS codepage, it is treated as a lead byte on DBCS systems, which causes the byte immediately following it to be "eaten". In cases where, for example, the "eaten" byte is the quote that ends a string literal, the result is a broken source file which fails to compile.
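The "eaten" closing quote can be illustrated with a small sketch. This is only a simplified model of a lead-byte-aware scanner, not MSVC's actual lexer; `is_cp950_lead_byte` and `find_closing_quote` are hypothetical names, and the lead-byte range 0x81 - 0xFE for codepage 950 is stated as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: cp950 (Big5) lead bytes occupy 0x81..0xFE. */
static int is_cp950_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/*
 * Hypothetical helper: `s` holds the bytes that follow an opening '"'.
 * Returns the index of the closing '"' as the scanner would see it,
 * or -1 if no closing quote is found within `len` bytes.
 * With dbcs != 0, every lead byte swallows the byte after it.
 */
static long find_closing_quote(const unsigned char *s, size_t len, int dbcs)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len) {
        if (dbcs && is_cp950_lead_byte(s[i])) {
            i += 2;             /* lead byte + trail byte form one character */
            continue;
        }
        if (s[i] == '\\') {
            i += 2;             /* skip a backslash escape */
            continue;
        }
        if (s[i] == '"')
            return (long)i;     /* end of the literal */
        i++;
    }
    return -1;                  /* unterminated literal */
}
```

For the bytes `c a f 0xE9 "` (an e-acute in Windows-1252 directly before the closing quote), the SBCS scan finds the quote, while the DBCS scan treats 0xE9 as a lead byte, consumes the quote as its trail byte, and reports an unterminated literal.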

Therefore, in order to allow successful compilation across all systems, source files containing non-ASCII chars should be fixed.

From my observation, there are several cases where this situation can occur:

1. Comments in the header containing the author name with accented characters, e.g. Herv*é* Poussineau in `base/system/autochk/autochk.c`

Most of these cases won't cause any errors. They can be ignored, but should be fixed if C4819 is to be avoided.

2. Character literals in keyboard layout files, as mentioned in ~~CORE-7417~~, e.g. line 178 of `dll/keyboard/kbdhu/kbdhu.c`

This can be fixed by simply replacing the char literal with an integer literal.
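A minimal sketch of that replacement follows. The table below is hypothetical and does not reproduce the actual contents of `kbdhu.c`; it only shows the pattern of writing the code point as an integer so that the source file stays pure ASCII.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical key table. Each entry was formerly a character literal
 * containing a non-ASCII byte; it is now the numeric code point, with the
 * character named in an ASCII-only comment.
 */
static const wchar_t accented_keys[] = {
    0x00E9, /* U+00E9 LATIN SMALL LETTER E WITH ACUTE */
    0x00F6, /* U+00F6 LATIN SMALL LETTER O WITH DIAERESIS */
    0x0151, /* U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE */
};

static size_t accented_key_count(void)
{
    return sizeof accented_keys / sizeof accented_keys[0];
}

static wchar_t accented_key_at(size_t i)
{
    assert(i < accented_key_count());
    return accented_keys[i];
}
```

Because the file no longer contains any byte above 0x7F, the result should not depend on the codepage of the build machine.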

3. Translations for the command-line setup utility, e.g. `base/setup/usetup/lang/bg-BG.h`

These files are tricky because the string literals in them are intended to be compiled as-is, bit-exact, and are intended to be displayed with the corresponding SBCS codepage (note that there are no translations for DBCS codepages, except for Japanese, which only uses the single-byte chars of cp932). For this reason, these files cannot simply be converted into Unicode, since the compiler wouldn't convert them back into their corresponding codepages. The only way to ensure proper compilation is to convert the non-ASCII bytes into escape sequences, but that will cause serious annoyance when doing translations.
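As a rough sketch of what such a conversion could look like, the hypothetical helper below (`escape_non_ascii` is not an existing tool in the tree) rewrites every byte above 0x7F as a three-digit octal escape. Octal is chosen over `\x` here because a hex escape has no length limit in C, so `"\xE91"` would be read as a single escape, whereas an octal escape stops after three digits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy `len` bytes from `src` into `dst`, replacing
 * every byte >= 0x80 with a three-digit octal escape such as \351.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if `dst` (capacity `cap`) is too small.
 * Quotes and backslashes are copied unchanged; they are assumed to be
 * escaped already in the original literal.
 */
static long escape_non_ascii(const unsigned char *src, size_t len,
                             char *dst, size_t cap)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    for (i = 0; i < len; i++) {
        if (src[i] >= 0x80) {
            if (cap - out < 5)      /* 4 escape chars + NUL */
                return -1;
            snprintf(dst + out, cap - out, "\\%03o", (unsigned)src[i]);
            out += 4;
        } else {
            if (cap - out < 2)      /* 1 char + NUL */
                return -1;
            dst[out++] = (char)src[i];
        }
    }
    dst[out] = '\0';
    return (long)out;
}
```

The escaped output compiles to the same bytes as the original literal on any system codepage, which is the bit-exact property these files need, at the cost of making the strings unreadable for translators.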

4. Resource files, which should have been fixed completely with ~~CORE-9021~~

Additional References:

All codepages used in Windows: https://msdn.microsoft.com/en-us/goglobal/bb964654

Issue Links

relates to

- CORE-7417 Fix kbdhu building error (Resolved)
- CORE-7415 Fix VS building when the default code page is a MBCS (Resolved)
- CORE-7807 MSVC Warning C4819 (invalid character in source file) when building on system with Windows-1252-incompatible code page (Resolved)

People

Assignee: Bug Zilla

Reporter: Alvin Wong

Votes: 0

Watchers: 1

Dates

Created: 2015-04-26 06:25

Updated: 2017-09-27 14:29