The RC SUPport LIBrary
======================

This library is meant to satisfy the need for extensions to RC resource
functionality. To better understand the information in this document, you may
first want to read "About Resource Files":
http://msdn.microsoft.com/en-us/library/windows/desktop/aa380599.aspx

Although it currently contains only number-dependent string handling, the
library is not intended to be limited to that, should other similar needs
arise. Number-dependence is a way of selecting a specific string according to
a language-specific rule. The most typical use case is plural-forms handling,
where different languages have different plural forms and different rules
related to them.

For an overview of the issues that led to the creation of this library, see:
https://jira.reactos.org/browse/CORE-8438



Contents
========

    Introduction
        Presentation
        Advantages and disadvantages
        Number-dependent resources (in localizations)
    Number-dependent resource syntax
        The structure-set model
        Rule-sets format
        String-sets format
        Tutorial
        Existing rule-sets
    Library API



Introduction
============

Presentation
------------

Resource-definition script (RC) files offer a simple and efficient way of
handling various types of application data needed at run-time. Their
simplicity, however, does not come without limitations, as became apparent
when the need for a new type of resource object arose. In such a case, the
options seemed to be either to develop an extension of the RC compiler able to
efficiently handle a larger set of resource-definition statements, or to take
advantage of the limited support for custom-data definition that RC has and
tailor it as much as possible to the new needs. This library explores the
second option. It relies on the script's RCDATA statement for raw definitions
of specific resource structures and implements methods designed to make use of
them in an easy manner. You can learn more about RCDATA from its MSDN page:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa381039.aspx


Advantages and disadvantages
----------------------------

One advantage of implementing the solution as a library is the same as for
many other solutions: portability. Having your application depend on a small
library is better than having to rely on some non-standard (or not yet
standardized) extension of the RC compiler. There is also the freedom to
develop this library further at any time, with minimal friction, in order to
satisfy new needs should they arise.

As for disadvantages, a library cannot, compared to some hypothetical
extension of the RC compiler, be an optimal solution by any means. That is
because the RC compiler can, like any compiler for that matter, do things at
compile-time, and there is no way to compensate for that (performance-wise)
with any kind of run-time tricks. That being said, performance was an
important consideration in designing the data model handling that this library
employs.


Number-dependent resources (in localizations)
---------------------------------------------

One case where RC falls short with its predefined resource statements is when
asymmetrical resources for different localizations become necessary, that is,
when a structure needs to be defined differently for different languages. One
such case is the definition of strings that depend on grammatical number. In a
language like English, there are strings for only two instances, singular and
plural, like "file" and "files". (In practice, when a message gets formatted
for the user interface, the two are often merged into one, "file(s)", but this
approach is imperfect for English and downright impossible for other
languages.) Other languages may have only one form (fewer than English) or
more than two for such instances. (English also needs more than two forms in
some real-life cases, like ordinal numbers, which will be used later as an
example.) Now, while things can be dealt with by considering in advance the
number of instances that occur in a given language, it gets tricky when the
number of languages, and especially the number of necessary forms for a given
language, is unknown. This is where the current library steps in, by providing
an abstraction layer that deals with this part of the problem. To achieve
this, some formal definitions of new RC structures first become necessary, so
that these can be correctly read and used later, at run-time.



Number-dependent resource syntax
================================

The structure-set model
-----------------------

In order to achieve resource number-dependence, a model of two separate RC
structures was designed. One of them stores the body of all the possible
strings of a given context, and the other one addresses the complexity of
choosing the right string given an external number (provided at run-time).
We'll call this pair of structures "string-set" and "rule-set" respectively.
This library uses the rule-set to obtain an index to one of the available
strings defined in the string-set structure. Both these structures can be
defined differently from one language to another in order to address the
specifics of each language's number-dependent grammar.

Although all the needed information can be defined in one single structure
(and it was, in the initial steps of development), the design was changed
along the way to the current form due to several considerations:

*  Modularity: the rule-related information, which varies less, becomes
   available across multiple string-storing parts, and the strings-per-rule
   ratio is dealt with (the number of strings is always less than or equal to
   the number of rules).
*  Simplicity: each of the resulting structures is less adorned.
*  Speed: the rule-sets got smaller and are more likely to be processed faster
   due to caching.


Rule-sets format
----------------

A rule-set is a series of integer numbers embedded into an RCDATA structure:

    WHATEVERRULESET_ID RCDATA
    BEGIN
        amount_of_rules,
        first_lowbound, first_divisor, first_highlimit, first_result,
        /* ...all the other rules */
        last_lowbound, last_divisor, last_highlimit, last_result
    END

That is, the amount of rules followed by sequences of three or four numbers
forming the rule definitions.

How does it work? The library uses the rule-set and the aforementioned
external number (which we'll call the "deciding number" from now on) to
generate an index for the string-set resource. Deducing the result involves a
pass through the RCDATA block, interpreting its raw content. When processing
this structure, the library expects enough data for as many rules as declared
in "amount_of_rules" (which is a WORD, i.e. an "unsigned short int" value)
before it stops. Each rule consists of three or four WORD integers, which are
the rule's constraints (the "low bound" and the "high limit"), the divisor,
and the rule's result. The amount varies with the rule's divisor: if the
divisor is zero, the next part of the rule (the high limit) is omitted;
otherwise it is present. A rule's result is considered valid if all the rule's
constraints are satisfied, and only the result of the last validated rule is
returned in the end. Checking a rule's constraints involves the following
operations on the deciding number:

*  Comparison: whether it is greater than or equal to the rule's low bound.
*  Modulo operation with the rule's divisor (if the divisor is not zero).
*  Comparison: whether the remainder is smaller than the rule's high limit (if
   the divisor is not zero).

The first comparison is made between the deciding number and the rule's low
bound; the second comparison is between the rule's high limit and the
remainder of the deciding number divided by the rule's divisor.
The case where all these operations are involved defines a "cycling rule".
If the rule's divisor is zero (which makes it a "non-cycling rule"), then the
low bound remains the rule's only constraint for result validation.
Here's an example of an English plural-forms rule-set with non-cycling rules
(unlike the one defined in pfrsd_2a.rc):

    ENGLISH_PLURAL_FORMS_RULESET_ID RCDATA
    BEGIN
        3, /* three rules in total */
        0, 0, 0,    /* this rule is always validated, as any unsigned integer
                     * number is greater than or equal to zero; the rule's
                     * (and the rule-set's default) result is "0"
                     */
        1, 0, 1,    /* this rule gets validated only when the deciding number
                     * is greater than or equal to one, thus it overrides the
                     * previous rule for all but zero; the rule's result is
                     * "1"
                     */
        2, 0, 0     /* this rule gets validated when the deciding number is
                     * greater than or equal to two, thus it overrides the
                     * previous rules for all but zero and one; the rule's
                     * result is "0"
                     */
    END
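
The evaluation described above can be sketched in C roughly as follows. This is
only an illustration, not the library's actual code: the function name
"eval_ruleset", the use of "uint16_t" to stand in for WORD, the "unsigned int"
type of the deciding number, and the -1 return value for "no rule validated"
are all assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical evaluator (not the library's API): walks a rule-set laid out
 * as described above and returns the result of the last validated rule,
 * or -1 (an assumed convention) when no rule was validated. */
static int eval_ruleset(const uint16_t *rs, unsigned int n)
{
    int result = -1;
    uint16_t count = *rs++;             /* amount_of_rules */

    while (count-- > 0) {
        uint16_t low = *rs++;           /* low bound */
        uint16_t divisor = *rs++;       /* zero means non-cycling */
        int valid = (n >= low);

        if (divisor != 0) {
            uint16_t high = *rs++;      /* present only in cycling rules */
            valid = valid && (n % divisor < high);
        }
        if (valid)
            result = *rs;               /* last validated rule wins */
        rs++;                           /* step over the rule's result */
    }
    return result;
}

/* The non-cycling English rule-set from the example above, as raw WORDs. */
static const uint16_t english_rules[] = {
    3,
    0, 0, 0,
    1, 0, 1,
    2, 0, 0
};
```

With the rule-set above, this sketch yields 0 for a deciding number of 0, 1 for
a deciding number of 1, and 0 again for anything from 2 up.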

To exemplify cycling rules, the following rule-set determines whether the
deciding number is odd or even (admittedly not a very useful rule-set outside
its demonstrative purpose, considering that math doesn't vary across
languages):

    ODD_OR_EVEN_RULESET_ID RCDATA
    BEGIN
        2, /* two rules in total */
        0, 0, 1,    /* this rule is always validated, as any unsigned integer
                     * number is greater than or equal to zero; the rule's
                     * (and the rule-set's default) result is "1"
                     */
        0, 2, 1, 0  /* this cycling rule gets validated when the deciding
                     * number is greater than or equal to zero and the
                     * remainder of its division by two is less than one,
                     * which gives us all the even numbers and leaves the odd
                     * numbers validated only by the previous rule; the rule's
                     * result is "0"
                     */
    END

Note that a rule's values can be set to the maximum value (65535, which can
also be written as -1), which may be useful in some cases. Here's an example
of the English plural-forms rule-set with a cycling rule (as defined in
pfrsd_2a.rc), commented:

    /* ENGLISH_PLURAL_FORMS_RULESET_ID */ PLURALFORMS_RULESET_ID RCDATA
    BEGIN
        2, /* two rules in total */
        0, 0, 1,    /* this rule is always validated, as any unsigned integer
                     * number is greater than or equal to zero; the rule's
                     * (and the rule-set's default) result is "1"
                     */
        1, -1, 2, 0 /* this cycling rule gets validated when the deciding
                     * number is greater than or equal to one and the
                     * remainder of its division by the highest value
                     * available (i.e. the deciding number unchanged) is
                     * smaller than two, which gives us a very narrow
                     * validating case of the deciding number being one; the
                     * rule's result is "0"
                     */
    END
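
Why a divisor of -1 leaves the deciding number unchanged can be checked with a
few lines of C. The sketch below assumes that WORD maps to "uint16_t" and uses
a hypothetical helper, "rule_matches", that tests a single rule; neither the
helper nor the "RULE_MAX" macro is part of the library.

```c
#include <stdint.h>

/* -1 stored in a 16-bit unsigned field wraps around to 65535. */
#define RULE_MAX ((uint16_t)-1)

/* Hypothetical helper: tests one rule against the deciding number n.
 * For a non-cycling rule (divisor == 0) the high limit is ignored. */
static int rule_matches(unsigned int n, uint16_t low, uint16_t divisor,
                        uint16_t high)
{
    if (n < low)
        return 0;               /* first constraint: low bound */
    if (divisor == 0)
        return 1;               /* non-cycling: nothing else to check */
    return n % divisor < high;  /* second constraint: remainder vs. limit */
}
```

Here "rule_matches(1, 1, RULE_MAX, 2)" holds, while
"rule_matches(2, 1, RULE_MAX, 2)" does not, since the remainder of 2 divided by
65535 is 2 itself. Under the assumptions of this sketch, deciding numbers of
65535 and above wrap around (65535 leaves a remainder of 0), so the trick only
holds below that value.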


String-sets format
------------------

A string-set is an RCDATA structure embedding a rule-set reference ID and a
series of explicitly zero-terminated literal strings:

    WHATEVERSTRINGSET_ID RCDATA
    BEGIN
        ruleset_id,
        first_string,
        second_string,
        /* ...all the other strings */
        last_string
    END

The "ruleset_id" is a WORD integer; the strings are regular strings, each
ending with an explicit terminating null character ('\0'). Here's the
string-set corresponding to the odd/even rule-set:

    ODD_OR_EVEN_STRINGSET_ID RCDATA
    BEGIN
        ODD_OR_EVEN_RULESET_ID, /* ID of the previously-defined rule-set */
        "even\0", /* index 0 */
        "odd\0" /* index 1 */
    END
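
To make the layout more concrete, here is a minimal C sketch of how the raw
bytes of such a string-set could be walked. It assumes a little-endian WORD
followed by 8-bit strings; the function name "stringset_get", the byte array,
and the numeric rule-set ID 101 are made up for this example and are not taken
from the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the string stored at `index`, or NULL when
 * the index is past the last string. The rule-set ID found at the start of
 * the block is stored in *ruleset_id when that pointer is not NULL. */
static const char *stringset_get(const unsigned char *data, size_t size,
                                 unsigned int index, uint16_t *ruleset_id)
{
    const unsigned char *p;
    const unsigned char *end = data + size;

    if (size < 2)
        return NULL;
    if (ruleset_id != NULL)
        *ruleset_id = (uint16_t)(data[0] | (data[1] << 8));

    p = data + 2;                       /* strings start after the WORD */
    while (p < end) {
        const unsigned char *nul =
            (const unsigned char *)memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            return NULL;                /* unterminated trailing string */
        if (index == 0)
            return (const char *)p;
        --index;
        p = nul + 1;                    /* skip to the next string */
    }
    return NULL;
}

/* The odd/even string-set as raw bytes; the ID value 101 is invented. */
static const unsigned char odd_even_stringset[] = {
    101, 0,
    'e', 'v', 'e', 'n', '\0',
    'o', 'd', 'd', '\0'
};
```

In a real program the bytes would presumably come from the module's resources
(for instance through the Win32 FindResource, LoadResource, LockResource, and
SizeofResource functions) rather than from a hand-written array.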

Note that although string-sets depend on (and act as clients of) their
rule-set, when actually developing a new number-dependent pair it is easier to
start by laying out the string-set with all the possible strings first. The
following tutorial shows how to do that.


Tutorial
--------

To better illustrate the mechanics of string selection, let's take a
step-by-step approach and create a comprehensive rule-set and its
corresponding string-set for a non-trivial case. In English, determining the
ordinal numeral suffixes makes a good example. There are four different
endings in total: "st" (attached to the cardinal numeral "1" and the like,
resulting in "1st"), "nd" (as in "2nd"), "rd" (as in "3rd"), and "th" (as in
"4th"). Thus, a string-set can be defined as follows:

    ORDINAL_STRINGSET_ID RCDATA
    BEGIN
        ORDINAL_RULESET_ID, /* reference to the correct rule-set */
        "st\0", /* index 0, for case one */
        "nd\0", /* index 1, for case two */
        "rd\0", /* index 2, for case three */
        "th\0" /* index 3, for case four */
    END

Note that for the strings inside this structure the terminating null character
('\0') must be specified manually.

Now let's shape a working rule-set for the numbers one to ten. This initially
limited domain and the simplistic approach of its rule-set are chosen
intentionally, for didactic reasons, and will be developed step by step to
cover the whole domain of unsigned numbers. First we'll write the rules, then
we'll specify their amount in the first row of data. To cover the rule for
"1st" we can say: greater than or equal to 1 (the low bound), with 0 as
divisor (no cyclicity for now), resulting in index 0: "1, 0, 0,". Then, for
the second rule (case two), we say: greater than or equal to 2, again with no
cyclicity, index 1: "2, 0, 1,". The same goes for cases three and four,
resulting in "3, 0, 2," and "4, 0, 3". Thus we have four rules, and a simple
rule-set structure as follows:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        4, /* the amount of rules */
        1, 0, 0, /* for 1 (case one) */
        2, 0, 1, /* for 2 (case two) */
        3, 0, 2, /* for 3 (case three) */
        4, 0, 3 /* for 4 and above (case four) */
    END

For a given (deciding) number N, the evaluation procedure applies the
specified amount of rules. In each rule, N is compared with the first number
in the row; validation of the rule continues if N is greater than or equal to
the rule's low bound, otherwise evaluation jumps to the next rule. The second
number in each rule is the divisor (for cyclicity), which is not considered
for now, so we leave it 0. The last number is the resulting index of the rule
(and of the rule-set, if none of the later rules gets validated).

Here, for "0", no rule in the rule-set can be validated (and therefore no
matching string can be retrieved). This can be fixed by adding an exception
rule for 0 (assuming "0th" is its correct ordinal form): "0, 0, 3,". Our
rule-set now becomes:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5, /* the amount of rules */
        0, 0, 3, /* for 0 (case four) */
        1, 0, 0, /* for 1 (case one) */
        2, 0, 1, /* for 2 (case two) */
        3, 0, 2, /* for 3 (case three) */
        4, 0, 3 /* for 4 and above (case four) */
    END

Now let's expand the numeral domain further, beyond the range of 0 to 10. We
can observe that case one occurs again for other numbers that have "1" as
their least significant digit (e.g. 21, 31, 41, ..., 91, 101, 121, ...). We'll
call this "cyclic occurrence", and a number-dependent resource rule can
specify an additional restriction for it. Remember the second number, left "0"
in the rule-set rows above? If it is non-zero, then the number N that got
compared with the first number of the rule is now divided by this second
number. In this case the rule must have four numbers instead of three, where
the third one (the rule's high limit) is compared with the remainder of that
division: the remainder has to be smaller than the high limit. This comparison
acts as a second constraint that will again validate or invalidate the rule.
The last number is the rule's resulting index, as before.

Let's rewrite the second rule (for case one) to cover all the numbers that
have "1" as their least significant digit: "1, 10, 2, 0", which reads as "if N
is greater than or equal to 1, and the remainder of its division by 10 is
smaller than 2, return index 0". Rewriting the rest of the rules in the same
manner, we get:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5, /* the amount of rules */
        0, 0, 3, /* for 0 (case four) */

        /* (intended to be) cyclic, for each ten */
        1, 10, 2, 0, /* (intended to be) for 1 (case one) */
        2, 10, 3, 1, /* (intended to be) for 2 (case two) */
        3, 10, 4, 2, /* (intended to be) for 3 (case three) */
        4, 10, 5, 3 /* (intended to be) for 4 and above (case four) */
    END

This has two problems. The first concerns the initial constraint (the deciding
number having to be greater than or equal to 1, 2, 3, and 4), which is no
longer necessary, as the validation can be done solely by comparing the
division remainder. The second concerns the order of the constraints: the
later rules are more permissive than the previous ones and thus override them,
resulting in case four for every number above 3. Let's rectify this by
removing the initial constraints and by reversing the order of the rules:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5, /* the amount of rules */
        0, 0, 3, /* acting as default - case four */

        /* cyclic, for each ten */
        0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0, /* for 1 and 0 (case one) */
        0, 10, 1, 3 /* for 0 (case four) */
    END
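
The difference between the two orderings can be checked with a small C sketch.
The evaluator below is a hypothetical stand-in written for this example (it is
not the library's code, and returning -1 when no rule is validated is an
assumed convention); the two arrays, whose names are also made up, hold the
first cyclic attempt and the rectified rule-set.

```c
#include <stdint.h>

/* Hypothetical evaluator: the result of the last validated rule wins. */
static int eval_ruleset(const uint16_t *rs, unsigned int n)
{
    int result = -1;
    uint16_t count = *rs++;

    while (count-- > 0) {
        uint16_t low = *rs++;
        uint16_t divisor = *rs++;
        int valid = (n >= low);

        if (divisor != 0) {
            uint16_t high = *rs++;      /* only cycling rules have it */
            valid = valid && (n % divisor < high);
        }
        if (valid)
            result = *rs;
        rs++;
    }
    return result;
}

/* First cyclic attempt: the later, more permissive rules win. */
static const uint16_t ordinal_first_try[] = {
    5,
    0, 0, 3,
    1, 10, 2, 0,
    2, 10, 3, 1,
    3, 10, 4, 2,
    4, 10, 5, 3
};

/* Rectified: no low bounds, most permissive rule first. */
static const uint16_t ordinal_reordered[] = {
    5,
    0, 0, 3,
    0, 10, 4, 2,
    0, 10, 3, 1,
    0, 10, 2, 0,
    0, 10, 1, 3
};
```

For a deciding number of 21, the first attempt ends on index 3 ("th"), because
its last rule accepts any remainder below 5, while the reordered rule-set ends
on index 0 ("st") as intended.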

Here, the first rule of the rule-set sets case four as the default result,
then each rule imposes a tighter constraint than the previous one: the second
rule overrides the default (case four, the "th" ordinal suffix) for the values
of N whose least significant digit is smaller than 4 (that is 0, 1, 2, and 3),
then the next rules narrow this pool further and further, to smaller than 3
(0, 1, and 2), smaller than 2 (0 and 1), and smaller than 1 (only 0)
respectively.

This works for all the numbers whose least significant digit is 1, 2, 3, or
something else, but in English there is one additional exception for the
numbers 11 to 13, which all take the case four suffix: "11th" instead of
"11st", "12th" instead of "12nd", and "13th" instead of "13rd". We add one
more exception rule for them and get the new rule-set:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        6, /* the amount of rules */
        0, 0, 3, /* acting as default - case four */

        /* cyclic, for each ten */
        0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0, /* for 1 and 0 (case one) */
        0, 10, 1, 3, /* for 0 (case four) */
        11, -1, 14, 3 /* exception for 11, 12, and 13 (case four) */
    END

The added line has its first number defining the low bound, then uses the
cyclicity information to define a high limit: "-1" acts as the highest integer
number, and the remainder of any division in which the dividend is smaller
than the divisor is the dividend itself. The result is a rule that addresses
only the numbers 11 to 13.

Finally, it should be noted that in real life this last exception should
also apply to other domains, like 111 to 113, 211 to 213, and so on, or in
other words it should be cycled in each new hundred. After defining cyclic
rules for it, we finally get the complete rule-set:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        10, /* the amount of rules */
        0, 0, 3, /* acting as default - case four */
        
        /* cyclic, for each ten */
        0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0, /* for 1 and 0 (case one) */
        0, 10, 1, 3, /* for 0 (case four) */
        
        /* cyclic, for each hundred */
        0, 100, 14, 3, /* for 13, 12, 11, and lower (case four) */
        0, 100, 4, 2, /* for 3, 2, 1, and 0 (case three) */
        0, 100, 3, 1, /* for 2, 1, and 0 (case two) */
        0, 100, 2, 0, /* for 1 and 0 (case one) */
        0, 100, 1, 3 /* for 0 (case four) */
    END
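
As a rough check of the complete rule-set, the following C sketch pairs it with
the ordinal string-set and formats a few numbers. Everything named here
("eval_ruleset", "format_ordinal", the arrays) is hypothetical and written for
this example only; in particular, it is not the library's API, and the -1
"nothing validated" return value is an assumed convention.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical evaluator: the result of the last validated rule wins. */
static int eval_ruleset(const uint16_t *rs, unsigned int n)
{
    int result = -1;
    uint16_t count = *rs++;

    while (count-- > 0) {
        uint16_t low = *rs++;
        uint16_t divisor = *rs++;
        int valid = (n >= low);

        if (divisor != 0) {
            uint16_t high = *rs++;      /* only cycling rules have it */
            valid = valid && (n % divisor < high);
        }
        if (valid)
            result = *rs;
        rs++;
    }
    return result;
}

/* The complete ordinal rule-set from above, as raw WORDs. */
static const uint16_t ordinal_rules[] = {
    10,
    0, 0, 3,
    0, 10, 4, 2,
    0, 10, 3, 1,
    0, 10, 2, 0,
    0, 10, 1, 3,
    0, 100, 14, 3,
    0, 100, 4, 2,
    0, 100, 3, 1,
    0, 100, 2, 0,
    0, 100, 1, 3
};

/* The strings of the ordinal string-set, in index order. */
static const char *const ordinal_suffixes[] = { "st", "nd", "rd", "th" };

/* Writes n followed by its ordinal suffix into buf and returns buf. */
static const char *format_ordinal(unsigned int n, char *buf, size_t size)
{
    int index = eval_ruleset(ordinal_rules, n);

    snprintf(buf, size, "%u%s", n, index < 0 ? "" : ordinal_suffixes[index]);
    return buf;
}
```

Given a large enough buffer, "format_ordinal" produces "1st", "2nd", "3rd",
and "4th" for the first four numbers, "11th" to "13th" for the exceptions,
"21st" for the next cycle of tens, and "111th" for the next cycle of hundreds.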

Note that usually you won't have to write rule-sets, as by far the most
relevant use case is working with resources that involve the grammatical
inflection of cardinal numerals, for which rule-sets already exist (see
below).


Existing rule-sets
------------------

This library defines plural-forms rule-sets for a number of languages. These
rule-sets are found in the "gndrrd" directory (which stands for Grammatical
Number Dependent Resources Rule-sets Definitions). Below is an alphabetical
list of the languages for which rule-set definitions already exist. The
information used as the source for writing these rule-sets:
https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html

    Language                File name
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Afrikaans               gndrrd_b.rc
    Albanian                gndrrd_b.rc
    Amharic                 gndrrd_c.rc
    Arabic                  gndrrd_l.rc
    Armenian                gndrrd_b.rc
    Assamese                gndrrd_b.rc
    Azeri                   gndrrd_b.rc
    Basque                  gndrrd_b.rc
    Belarusian              gndrrd_h.rc
    Bengali                 gndrrd_b.rc
    Bosnian                 gndrrd_h.rc
    Breton                  gndrrd_c.rc
    Bulgarian               gndrrd_b.rc
    Catalan                 gndrrd_b.rc
    Chinese                 gndrrd_c.rc
    Croatian                gndrrd_h.rc
    Czech                   gndrrd_i.rc
    Danish                  gndrrd_b.rc
    Dutch                   gndrrd_b.rc
    English                 gndrrd_b.rc
    Estonian                gndrrd_b.rc
    Faroese                 gndrrd_b.rc
    Filipino                gndrrd_c.rc
    Finnish                 gndrrd_b.rc
    French                  gndrrd_c.rc
    Frisian                 gndrrd_b.rc
    Fula                    gndrrd_a.rc
    Gaelic                  gndrrd_o.rc
    Galician                gndrrd_b.rc
    Georgian                gndrrd_a.rc
    German                  gndrrd_b.rc
    Greek                   gndrrd_b.rc
    Greenlandic             gndrrd_b.rc
    Gujarati                gndrrd_b.rc
    Hausa                   gndrrd_b.rc
    Hebrew                  gndrrd_b.rc
    Hindi                   gndrrd_b.rc
    Hungarian               gndrrd_b.rc
    Icelandic               gndrrd_p.rc
    Indonesian              gndrrd_a.rc
    Irish                   gndrrd_m.rc
    Italian                 gndrrd_b.rc
    Japanese                gndrrd_a.rc
    Kannada                 gndrrd_b.rc
    Kazakh                  gndrrd_a.rc
    Khmer                   gndrrd_a.rc
    Kinyarwanda             gndrrd_b.rc
    Korean                  gndrrd_a.rc
    Kurdish                 gndrrd_b.rc
    Kyrgyz                  gndrrd_a.rc
    Lao                     gndrrd_a.rc
    Latvian                 gndrrd_d.rc
    Lithuanian              gndrrd_g.rc
    Luxembourgish           gndrrd_b.rc
    Macedonian              gndrrd_q.rc
    Malagasy                gndrrd_c.rc
    Malay                   gndrrd_a.rc
    Malayalam               gndrrd_b.rc
    Maltese                 gndrrd_r.rc
    Maori                   gndrrd_c.rc
    Mapudungun              gndrrd_c.rc
    Marathi                 gndrrd_b.rc
    Meithei                 gndrrd_b.rc
    Mongolian               gndrrd_b.rc
    Nepali                  gndrrd_b.rc
    Norwegian               gndrrd_b.rc
    Occitan                 gndrrd_c.rc
    Oriya                   gndrrd_b.rc
    Pashto                  gndrrd_b.rc
    Persian                 gndrrd_a.rc
    Polish                  gndrrd_j.rc
    Portuguese (Brazilian)  gndrrd_c.rc
    Portuguese (Portugal)   gndrrd_b.rc
    Punjabi                 gndrrd_b.rc
    Romanian                gndrrd_f.rc
    Romansh                 gndrrd_b.rc
    Russian                 gndrrd_h.rc
    Sakha                   gndrrd_a.rc
    Sami                    gndrrd_b.rc
    Serbian                 gndrrd_h.rc
    Sindhi                  gndrrd_b.rc
    Sinhalese               gndrrd_b.rc
    Slovak                  gndrrd_i.rc
    Slovene                 gndrrd_k.rc
    Sotho                   gndrrd_b.rc
    Spanish                 gndrrd_b.rc
    Swahili                 gndrrd_b.rc
    Swedish                 gndrrd_b.rc
    Tajik                   gndrrd_c.rc
    Tamil                   gndrrd_b.rc
    Tatar                   gndrrd_a.rc
    Telugu                  gndrrd_b.rc
    Thai                    gndrrd_a.rc
    Tibetan                 gndrrd_a.rc
    Tigrinya                gndrrd_c.rc
    Turkish                 gndrrd_c.rc
    Turkmen                 gndrrd_b.rc
    Ukrainian               gndrrd_h.rc
    Urdu                    gndrrd_b.rc
    Uyghur                  gndrrd_a.rc
    Uzbek                   gndrrd_c.rc
    Vietnamese              gndrrd_a.rc
    Walloon                 gndrrd_c.rc
    Welsh                   gndrrd_n.rc
    Wolof                   gndrrd_a.rc
    Yoruba                  gndrrd_b.rc

Note that although not all existing languages are listed (because the
information needed to define their rule-sets was not available), it is likely
that many of the missing languages share one of the existing plural-forms
rules, just as a great many of the listed ones do.



Library API
===========

/* TO BE COMPLETED LATER, IN THE MEANTIME SEE THE LIBRARY HEADER */