  1. Core ReactOS
  2. CORE-20341

Buggy generation of l_intl.nls, leading to wrong uppercasing in some non-Latin languages

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Tools
    • None

    Description

      While installing ReactOS in Russian, warnings such as the one below appear in the debug log during 2nd-stage installation:
      (sdk\lib\rtl\unicode.c:556) \u0434 is not valid for OEM

      This happens when RtlIsValidOemCharacter() is invoked. This function can be called, for example, by RtlGenerate8dot3Name() from FatSelectNames() when new directories/files are created on a FAT(12/16/32) volume and the filesystem driver tries to generate an 8.3 short file name.

      In this example, RtlIsValidOemCharacter() is invoked with *Char == 0x0434 (note: 16-bit WCHAR), and the global NlsMbOemCodePageTag is FALSE. Execution then goes to:

       546         /* Receive Unicode character from the table */
       547         UnicodeChar = RtlpUpcaseUnicodeChar(NlsOemToUnicodeTable[(UCHAR)NlsUnicodeToOemTable[*Char]]);
       548 
       549         /* Receive OEM character from the table */
       550         OemChar = NlsUnicodeToOemTable[UnicodeChar];
      

      After some investigation, I can confirm that both the NlsUnicodeToOemTable and NlsOemToUnicodeTable arrays contain valid values, corresponding to the CP866 codepage generated from this file (which in turn comes from: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP866.TXT ),
      and that the conversions are correct:

      nt!RtlIsValidOemCharacter+0xdd:
      8058c20d 0fb755fc        movzx   edx,word ptr [ebp-4]
      kd> ??*Char
      wchar_t 0x434 'д'
      kd> ??NlsUnicodeToOemTable[*Char]
      char 0xa4 ''
      kd> ??NlsOemToUnicodeTable[0xa4]
      wchar_t 0x434 'д'
      kd> ??NlsOemToUnicodeTable[(UCHAR)NlsUnicodeToOemTable[*Char]]
      wchar_t 0x434 'д'
      

      corresponding to this line:

      0xa4	0x0434	#CYRILLIC SMALL LETTER DE
      

      Furthermore, we can note, above in this same file:

      0x84	0x0414	#CYRILLIC CAPITAL LETTER DE
      

      i.e., the uppercase of Cyrillic small letter De (0x0434) should be Cyrillic capital letter De (0x0414).

      However, the upper-casing: UnicodeChar = RtlpUpcaseUnicodeChar(...) returns the WRONG result:

      kd> ??UnicodeChar
      wchar_t 0x40c 'Ќ'
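
The effect of the wrong uppercase value on the round trip can be sketched with a toy model. This is not the ReactOS code: the two-entry tables, the `toy_` and `upcase_` names, and the rule that a character is rejected when it maps to a default OEM character of '?' are all assumptions made for illustration, based on U+040C having no entry in the CP866 lines quoted above.

```c
#include <stdint.h>

#define OEM_DEFAULT_CHAR 0x3F /* '?', assumed fallback for unmapped characters */

/* Toy Unicode -> OEM table holding only the two CP866 entries quoted above. */
static uint8_t toy_unicode_to_oem(uint16_t wc)
{
    switch (wc) {
    case 0x0414: return 0x84; /* CYRILLIC CAPITAL LETTER DE */
    case 0x0434: return 0xA4; /* CYRILLIC SMALL LETTER DE */
    default:     return OEM_DEFAULT_CHAR;
    }
}

/* Toy OEM -> Unicode table, the inverse of the one above. */
static uint16_t toy_oem_to_unicode(uint8_t oem)
{
    switch (oem) {
    case 0x84: return 0x0414;
    case 0xA4: return 0x0434;
    default:   return oem;
    }
}

typedef uint16_t (*upcase_fn)(uint16_t);

/* Uppercasing as CP866.TXT suggests it should behave. */
static uint16_t upcase_expected(uint16_t wc)
{
    return wc == 0x0434 ? 0x0414 : wc;
}

/* Uppercasing as observed in the debugger session. */
static uint16_t upcase_observed(uint16_t wc)
{
    return wc == 0x0434 ? 0x040C : wc;
}

/* Same shape as the two table lookups quoted from unicode.c. */
static int toy_is_valid_oem(uint16_t wc, upcase_fn upcase)
{
    uint16_t unicode_char = upcase(toy_oem_to_unicode(toy_unicode_to_oem(wc)));
    uint8_t oem_char = toy_unicode_to_oem(unicode_char);

    return oem_char != OEM_DEFAULT_CHAR;
}
```

In this model, `toy_is_valid_oem(0x0434, upcase_expected)` accepts the character, while `toy_is_valid_oem(0x0434, upcase_observed)` rejects it, because 0x040C has no OEM mapping in the toy table.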
      

      The upper-case data comes from the l_intl.nls file, which is MANUALLY generated with our create_nls host-tool from Unicode's http://www.unicode.org/Public/12.0.0/ucd/UnicodeData.txt
      (similar auto-generated code from Wine can be found there).

      Looking in UnicodeData.txt for where the returned "0x40c" character could come from, the only explanation I have is that the generator code for l_intl.nls parsed the following lines incorrectly:

      1040C;DESERET CAPITAL LETTER AY;Lu;0;L;;;;;N;;;;10434;
      ...
      10434;DESERET SMALL LETTER AY;Ll;0;L;;;;;N;;;1040C;;1040C
      

      This can happen if the numerical values, which are LARGER than the NT 16-bit WCHAR, are truncated: 0x1040C --> 0x040C and 0x10434 --> 0x0434.
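
The suspected mechanism can be reproduced in a few lines. This is a simplified model, not the create_nls code: the flat table and the names `toy_upcase_table`, `record_upcase_truncating` and `replay_de_entries` are hypothetical, and the real l_intl.nls layout may differ. It relies only on UnicodeData.txt being ordered by code point, so the Deseret entry is processed after the Cyrillic one.

```c
#include <stdint.h>

/* Simplified model: one 16-bit uppercase entry per 16-bit index. */
static uint16_t toy_upcase_table[0x10000];

/* Mimics the suspected bug: both values are cut down to their low 16 bits. */
static void record_upcase_truncating(uint32_t code, uint32_t upper)
{
    toy_upcase_table[(uint16_t)code] = (uint16_t)upper;
}

/* Replays the two relevant UnicodeData.txt entries in file order and
 * returns what the table then holds for U+0434. */
static uint16_t replay_de_entries(void)
{
    record_upcase_truncating(0x0434, 0x0414);   /* CYRILLIC SMALL LETTER DE */
    record_upcase_truncating(0x10434, 0x1040C); /* DESERET SMALL LETTER AY */

    return toy_upcase_table[0x0434];
}
```

After the replay, the slot for 0x0434 holds 0x040C: the truncated Deseret entry silently overwrites the correct Cyrillic mapping, which matches the value seen in the debugger.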

      A hint appears to be here in create_nls.c,
      where suspicious 16-bit casts such as (WCHAR) or (WORD) appear:

       701         /* 0. Code value */
       702         code = (WORD)strtol(p, &p, 16);
      

      and similarly for the upper/lower-case mappings (variable case_mapping).
      Converting these variables to full 32-bit unsigned int, removing these casts, AND performing boundary checks (the table doesn't appear to store Unicode code points > 0xFFFF) looks like a valid solution.
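
A minimal sketch of that approach follows, assuming the standard UnicodeData.txt layout in which field 0 is the code value and field 12 the simple uppercase mapping. The helper names (`field_start`, `parse_hex_field`, `parse_bmp_upcase`) are hypothetical and do not come from create_nls.c; the real fix would have to fit that tool's existing parsing loop.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BMP 0xFFFFu /* largest code point a 16-bit WCHAR table can index */

/* Returns the start of field `index` (0-based) in a ';'-separated line,
 * or NULL if the line has fewer fields. Empty fields are preserved. */
static const char *field_start(const char *line, int index)
{
    while (index > 0) {
        line = strchr(line, ';');
        if (line == NULL)
            return NULL;
        line++;
        index--;
    }
    return line;
}

/* Parses a hexadecimal field without truncation.
 * Returns 0 if the field is missing, empty, malformed or out of range. */
static int parse_hex_field(const char *field, uint32_t *value)
{
    char *end;
    unsigned long parsed;

    if (field == NULL)
        return 0;
    parsed = strtoul(field, &end, 16);
    if (end == field)
        return 0; /* empty field: no digits consumed */
    if (*end != ';' && *end != '\0' && *end != '\n' && *end != '\r')
        return 0; /* trailing garbage inside the field */
    if (parsed > 0x10FFFFul)
        return 0; /* beyond the Unicode range */
    *value = (uint32_t)parsed;
    return 1;
}

/* Returns 1 and fills code/upper only if the line has an uppercase mapping
 * and both values fit in 16 bits; otherwise returns 0 so that the caller
 * can skip the entry instead of storing a truncated value. */
static int parse_bmp_upcase(const char *line, uint32_t *code, uint32_t *upper)
{
    uint32_t c, u;

    if (!parse_hex_field(field_start(line, 0), &c))
        return 0;
    if (!parse_hex_field(field_start(line, 12), &u))
        return 0;
    if (c > MAX_BMP || u > MAX_BMP)
        return 0;
    *code = c;
    *upper = u;
    return 1;
}
```

With this check, the two Deseret lines quoted above are rejected, so neither can overwrite the entry for 0x0434.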
      The similar code for the Wine-generated files appears to be in this Perl script.
      (IMPORTANT NOTE: Updated script can be found at: https://github.com/wine-mirror/wine/blob/master/tools/make_unicode )

      A final question is how all this code could support UTF-16 values that span more than one 16-bit code unit (surrogate pairs) in the future...
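
For reference, the standard UTF-16 encoding of a code point above 0xFFFF is sketched below. It only illustrates why a table indexed by a single WCHAR cannot describe such characters; the function name `encode_utf16` is hypothetical and nothing here is taken from the ReactOS or Wine sources.

```c
#include <stdint.h>

/* Encodes one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written to out (1 or 2),
 * or 0 if the value is not a valid scalar value. */
static int encode_utf16(uint32_t code_point, uint16_t out[2])
{
    if (code_point > 0x10FFFF ||
        (code_point >= 0xD800 && code_point <= 0xDFFF))
        return 0; /* out of range, or a lone surrogate value */

    if (code_point <= 0xFFFF) {
        out[0] = (uint16_t)code_point; /* fits in a single unit */
        return 1;
    }

    code_point -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (code_point >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (code_point & 0x3FF)); /* low surrogate */
    return 2;
}
```

DESERET SMALL LETTER AY (0x10434) becomes the pair 0xD801 0xDC34, so uppercasing it would require looking at two WCHARs at once rather than one table slot.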

          People

            Assignee: Unassigned
            Reporter: hbelusca