This library is meant to satisfy the need for functionality extensions related to RC resources. To better understand the information in this document, you may first want to read more in About Resource Files: http://msdn.microsoft.com/en-us/library/windows/desktop/aa380599.aspx
Although it currently contains only number-dependent string handling, the library is not intended to be limited to that, should other similar needs arise. Number-dependence is a way of determining a specific string according to a language-specific rule. The most illustrative use case is plural-forms handling, where different languages have different plural forms and different rules for choosing among them.
For an overview of the issues that led to the creation of this library, see: https://jira.reactos.org/browse/CORE-8438
Resource-definition script (RC) files offer a simple and efficient way of handling various types of application data needed at run-time. Their simplicity, however, does not come without limitations, as became apparent when the need for a new type of resource object arose. In such a case, the options seemed to be either to develop an extension of the RC compiler so that it could efficiently handle a larger set of resource-definition statements, or to take advantage of the limited support for custom-data definition that RC has and tailor it as much as possible to the new needs. This library explores the second option. It relies on the script's RCDATA statement for specific raw resource structure definitions and implements methods designed to make use of them in an easy manner. You can learn more about RCDATA from its MSDN page: https://msdn.microsoft.com/en-us/library/windows/desktop/aa381039.aspx
One advantage of implementing the solution as a library is one shared by many other solutions - portability. Having your application depend on a small library is better than having to rely on some non-standard (or not-yet-standardized) extension of the RC compiler. There is also the freedom to develop this library further at any time, with minimal friction, in order to satisfy new needs should they arise.
As for disadvantages, a library cannot, compared to a hypothetical extension of the RC compiler, be an optimal solution. That is because the RC compiler, like any compiler for that matter, can do things at compile-time, and no run-time trick can compensate performance-wise for that. That being said, performance was an important consideration in designing the data-handling model that this library employs.
An example of where RC falls short with its predefined resource statements is when asymmetrical resources become necessary for different localizations, that is, when a structure must be allowed to be defined differently for different languages. One such case is the definition of strings that depend on grammatical number. In a language like English, there may be strings for only two instances - singular and plural, like "file" and "files". (Actually, when a message gets formatted for the user interface, they are often merged into one - "file(s)", but such an approach is imperfect for English and downright impossible for other languages.) Other languages may have only one form (fewer than English) or more than two for such instances. (English also needs more than two forms in some real-life cases, like ordinal numbers, which will be used later as an example.) Now, while things can be dealt with by considering in advance the number of instances that occur in a given language, it gets tricky when the number of languages, and especially the number of necessary forms for a given language, is unknown. This is where the library steps in, by providing an abstraction layer that deals with this part of the problem.
To achieve this, some formal definitions of new RC structures first become necessary, so that these can be correctly read and used later, at run-time.
In order to achieve resource number-dependence, a model of two separate RC structures was designed. One of them stores the body of all the possible strings of the given context, and the other one addresses the complexity of choosing the right string given an external number (provided at run-time). We'll call this pair of structures "string-set" and "rule-set" respectively. The library uses the rule-set to obtain an index to one of the available strings defined in the string-set structure. Both structures can be defined differently from one language to another in order to address the specifics of each language's number-dependent grammar.
Although all the needed information can be defined in one single structure (and it was, in the initial steps of development), the design was changed along the way to its current form for several reasons. These are modularity (the less varied rule-related information becomes available to multiple string-storing parts, and each string-storing part only has to deal with its own strings per rule ratio, where the amount of strings is always less than or equal to the amount of rules), simplicity (each of the resulting structures is less cluttered), and speed (the rule-sets got smaller and are more likely to be processed faster due to caching).
A rule-set is a series of integral numbers embedded into an RCDATA structure:
WHATEVERRULESET_ID RCDATA
BEGIN
amount_of_rules,
1strule_lowbound, 1strule_divisor, 1strule_highlimit, 1strule_result,
/* ...all the other rules */
lastrule_lowbound, lastrule_divisor, lastrule_highlimit, lastrule_result
END
That is, the amount of rules followed by sequences of three or four numbers, each sequence forming a rule definition.
How does it work? The library uses the rule-set and the aforementioned external number (which from now on we'll call the "deciding number") to generate an index into the string-set resource. Deducing the result involves a pass through the RCDATA block, interpreting its raw content. When processing this structure, the library expects enough data for the amount of rules declared in "amount_of_rules" (which is a WORD, i.e. an unsigned 16-bit "short int" value) before it stops. Each rule consists of three or four WORD integrals, stored in this order: the low bound, the divisor, the high limit, and the rule's result. The low bound and the high limit are the rule's constraints. The amount varies with the divisor: when the divisor is zero the high limit is omitted, otherwise it must be present. A rule is validated if all of its constraints are satisfied, and only the result of the last validated rule is returned in the end. Checking a rule's constraints involves the following operations on the deciding number:
Comparison - whether it is greater than or equal to the rule's low bound. Modulo operation with the rule's divisor (only if the divisor is not zero). Comparison - whether the remainder is smaller than the rule's high limit (only if the divisor is not zero).
The first comparison is made between the deciding number and the rule's low bound; the second comparison is made between the rule's high limit and the remainder of the deciding number divided by the rule's divisor.
The case where all these operations are involved defines a "cycling rule". If the rule's divisor is zero (which defines it as a "non-cycling rule"), the low bound remains the rule's only constraint for result validation. Here's an example of an English plural-forms rule-set with non-cycling rules (unlike the one defined in gndrrd_b.rc):
ENGLISH_PLURAL_FORMS_RULESET_ID RCDATA
BEGIN
3, /* three rules in total */
0, 0, 0, /* this rule is always validated, as any unsigned number is
* greater than or equal to zero; the rule's (and rule-set's
* default) result is "0" */
1, 0, 1, /* this rule gets validated only when the deciding number is
* greater than or equal to one, thus overrides the previous
* rule for all but zero; the rule's result is "1"
*/
2, 0, 0 /* this rule gets validated when the deciding number is greater
* than or equal to two, thus overrides the previous
* rules for all but zero and one; the rule's result is "0"
*/
END
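The evaluation pass described above can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the library's actual code: the name `ndr_evaluate` is hypothetical, the rule-set is assumed to be already available in memory as an array of 16-bit values, and the deciding number is assumed to be an unsigned 32-bit integer. The sample array mirrors the non-cycling English rule-set above.

```c
#include <stddef.h>
#include <stdint.h>

/* The non-cycling English rule-set from above, as raw WORD values. */
static const uint16_t english_rules[] = {
    3,          /* amount of rules */
    0, 0, 0,    /* low bound, divisor (0 = non-cycling), result */
    1, 0, 1,
    2, 0, 0
};

/* Hypothetical helper, not the library's API: walks `count` WORD values and
 * returns the result of the last validated rule. Returns -1 when no rule was
 * validated or when the data ends before all declared rules were read. */
static int ndr_evaluate(const uint16_t *data, size_t count, uint32_t n)
{
    int result = -1;
    size_t pos = 1;                 /* data[0] holds the amount of rules */
    uint16_t rules, i;

    if (count == 0)
        return -1;
    rules = data[0];

    for (i = 0; i < rules; i++) {
        uint16_t low, divisor;
        int valid;

        if (count - pos < 3)        /* a rule needs at least three values */
            return -1;
        low = data[pos++];
        divisor = data[pos++];
        valid = (n >= low);         /* first constraint: the low bound */

        if (divisor != 0) {         /* cycling rule: one extra value */
            uint16_t high = data[pos++];
            if (count - pos < 1)    /* the result value must still follow */
                return -1;
            valid = valid && (n % divisor) < high;
        }
        if (valid)
            result = data[pos];     /* later rules override earlier ones */
        pos++;
    }
    return result;
}
```

With these definitions, `ndr_evaluate(english_rules, 10, 1)` yields 1, while the deciding numbers 0 and 2 both yield 0, matching the comments in the rule-set.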
To exemplify the cycling rules, the following rule-set determines whether the deciding number is odd or even (admittedly not a very useful rule-set outside its demonstrative purpose, considering that maths doesn't vary across languages):
ODD_OR_EVEN_RULESET_ID RCDATA
BEGIN
2, /* two rules in total */
0, 0, 1, /* this rule is always validated, as any unsigned number is
* greater than or equal to zero; the rule's (and rule-set's
* default) result is "1" */
0, 2, 1, 0 /* this cycling rule gets validated when the deciding number is
* greater than or equal to zero and the remainder of its
* division by two is less than one, which gives us all the even
* numbers and leaves the odd numbers validated only by the
* previous rule; the rule's result is "0"
*/
END
Note that a rule's values can be set to the maximum WORD value (65535, which can also be written as -1), which may be useful in some cases. Here's an example of the English plural-forms rule-set with a cycling rule (as defined in gndrrd_b.rc), commented:
/* ENGLISH_PLURAL_FORMS_RULESET_ID */ GNDR_RULESET_ID RCDATA
BEGIN
2, /* two rules in total */
0, 0, 1, /* this rule is always validated, as any unsigned number is
* greater than or equal to zero; the rule's (and rule-set's
* default) result is "1" */
1, -1, 2, 0 /* this cycling rule gets validated when the deciding number is
* greater than or equal to one and the remainder of its division
* by the highest value available (i.e. the deciding number
* unchanged) is smaller than two, which gives us a very narrow
* validating case of the deciding number being one; the rule's
* result is "0"
*/
END
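The arithmetic behind the "-1" divisor can be checked with a few lines of C. This is a sketch only: the helper name `cycling_rule_matches` is hypothetical, and the deciding number is assumed to be an unsigned 32-bit value, which this document does not state.

```c
#include <stdint.h>

/* Hypothetical helper: tests the two constraints of one cycling rule.
 * A zero divisor is rejected here because it denotes a non-cycling rule. */
static int cycling_rule_matches(uint32_t n, uint16_t low, uint16_t divisor,
                                uint16_t high)
{
    return divisor != 0 && n >= low && (n % divisor) < high;
}

/* "-1" written in a WORD slot is the same bit pattern as 65535. */
static const uint16_t max_word = (uint16_t)-1;
```

For the rule "1, -1, 2, 0", `cycling_rule_matches(1, 1, max_word, 2)` is true, while the deciding numbers 0 and 2 do not match. Under the 32-bit assumption made here, the remainder equals the deciding number only while that number stays below 65535; a value such as 65536 leaves a remainder of 1 and would match as well.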
A string-set is an RCDATA structure embedding a rule-set reference ID and a series of explicitly zero-delimited literal strings:
WHATEVERSTRINGSET_ID RCDATA
BEGIN
ruleset_id,
first_string,
second_string,
/* ...all the other strings */
laststring
END
The "ruleset_id" is a WORD integral; the strings are regular strings, each ending with an explicitly written terminating null character ('\0'). Here's the string-set corresponding to the odd/even rule-set:
ODD_OR_EVEN_STRINGSET_ID RCDATA
BEGIN
ODD_OR_EVEN_RULESET_ID, /* reference to the correct rule-set for strings */
"even\0", /* index 0 */
"odd\0" /* index 1 */
END
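Reading a string back from such a block could look like the sketch below. It rests on assumptions not confirmed by this document: the helper name `stringset_get` is hypothetical, the block is assumed to be byte-packed with a little-endian WORD followed by single-byte (ANSI) strings, and the sample bytes use a made-up rule-set ID of 101.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sample bytes as they are assumed to be laid out for the odd/even
 * string-set: WORD 101 (hypothetical ID), then "even\0" and "odd\0". */
static const unsigned char odd_even_block[] = {
    0x65, 0x00,
    'e', 'v', 'e', 'n', 0,
    'o', 'd', 'd', 0
};

/* Hypothetical helper: returns the index-th zero-terminated string of the
 * block, or NULL if the block holds fewer strings or is malformed. The
 * rule-set ID is stored through ruleset_id when that pointer is not NULL. */
static const char *stringset_get(const unsigned char *block, size_t size,
                                 unsigned index, uint16_t *ruleset_id)
{
    size_t pos = 2;                         /* skip the leading WORD */

    if (size < 2)
        return NULL;
    if (ruleset_id != NULL)                 /* assumed little-endian */
        *ruleset_id = (uint16_t)(block[0] | (block[1] << 8));

    while (pos < size) {
        const unsigned char *end = memchr(block + pos, '\0', size - pos);
        if (end == NULL)                    /* unterminated tail: give up */
            return NULL;
        if (index == 0)
            return (const char *)(block + pos);
        index--;
        pos = (size_t)(end - block) + 1;    /* step past the terminator */
    }
    return NULL;
}
```

Here index 0 resolves to "even" and index 1 to "odd", while index 2 returns NULL because the block holds only two strings. The explicit '\0' terminators are what make this scan possible.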
Note that although string-sets depend on (and act as clients of) their rule-set, when actually developing a new number-dependent pair it is easier to start by laying out the string-set with all the possible strings first. The following tutorial shows how to do that.
To better illustrate the mechanics of string selection, let's take a step-by-step approach by creating rule-sets and their corresponding string-sets for several examples.
In English, a good demonstration case can be made for ordinal numeral suffixes. The rule here is that different numbers get different suffixes: "st" for cardinal numeral "1" resulting in "1st", "nd" for "2", "rd" for "3", and "th" for "4" and higher. Thus, a string-set can be defined as follows:
ORDINAL_STRINGSET_ID RCDATA
BEGIN
ORDINAL_RULESET_ID, /* reference to the correct rule-set */
"st\0", /* index 0, for case one */
"nd\0", /* index 1, for case two */
"rd\0", /* index 2, for case three */
"th\0" /* index 3, for case four */
END
Note that the terminating null character ('\0') of each string inside this structure must be specified manually.
Now let's shape a working rule-set for the numbers one to ten. This initially limited domain and the simplistic approach of its rule-set are chosen intentionally for didactic reasons, and will be developed further, step by step, to cover the whole domain of unsigned numbers. First we'll write the rules, then we'll specify their amount in the first row of data. To cover the rule for "1st" we can say greater than or equal to 1 (the low bound), with 0 as divisor (no cyclicity for now), resulting in index 0: "1, 0, 0,". Then for the second rule, for case two, we say greater than or equal to 2, again with no cyclicity, index 1: "2, 0, 1,". The same goes for cases three and four, resulting in "3, 0, 2," and "4, 0, 3". Thus we have four rules, and a simple rule-set structure as follows:
ORDINAL_RULESET_ID RCDATA
BEGIN
4, /* the amount of rules */
1, 0, 0, /* for 1 (case one) */
2, 0, 1, /* for 2 (case two) */
3, 0, 2, /* for 3 (case three) */
4, 0, 3 /* for 4 and above (case four) */
END
For a given number N (the deciding number), the evaluation procedure applies the specified amount of rules in order. In each rule, N is compared with the first number in the row: the rule is validated if N is greater than or equal to it, and not validated if N is less. The second number in each rule is the divisor (for cyclicity), which is not considered for now, so we leave it 0. The last number is the resulting index, which is returned if none of the following rules is validated.
Here, for "0" no rule in the rule-set can be validated (and therefore no matching string can be retrieved). This can be fixed by adding an exception rule for 0 (assuming "0th" is the correct ordinal form for it): "0, 0, 3,". Our rule-set now becomes:
ORDINAL_RULESET_ID RCDATA
BEGIN
5, /* the amount of rules */
0, 0, 3, /* for 0 (case four) */
1, 0, 0, /* for 1 (case one) */
2, 0, 1, /* for 2 (case two) */
3, 0, 2, /* for 3 (case three) */
4, 0, 3 /* for 4 and above (case four) */
END
Now let's expand the numeral domain further, beyond the range of 0 to 10. We can observe that case one occurs again at other numbers that have "1" as their least significant digit (e.g. 21, 31, 41, ..., 91, 101, 121, ...). We'll call this "cyclic occurrence", and in an NDR rule an additional restriction can be specified for it. Remember the second number, left as "0" in the rule-set rows above? If it is not 0, the number N that gets compared with the first number in the rule is then divided by this second number. In that case the rule has to have four numbers instead of three, where the third one is compared with the remainder of that division to validate or invalidate the rule. The last number in the rule is the rule's resulting index, as before. Let's rewrite the second rule (for case one) to cover all the numbers that have "1" as their least significant digit: "1, 10, 2, 0", which reads as "if N is greater than or equal to 1, and the remainder of its division by 10 is smaller than 2, return index 0". Rewriting the rest of the rules in a similar manner, we get:
ORDINAL_RULESET_ID RCDATA
BEGIN
5, /* the amount of rules */
0, 0, 3, /* for 0 (case four) */
/* (intended to be) cyclic, for each ten */
1, 10, 2, 0, /* (intended to be) for 1 (case one) */
2, 10, 3, 1, /* (intended to be) for 2 (case two) */
3, 10, 4, 2, /* (intended to be) for 3 (case three) */
4, 10, 5, 3 /* (intended to be) for 4 and above (case four) */
END
This has two problems. The first regards the initial constraint (the deciding number having to be greater than or equal to 1, 2, 3, and 4), which is no longer necessary as the validation can be done solely by comparing the division remainder. The second is about the order of the constraints: the later rules are more permissive than the previous ones and thus override them, resulting in case four for every number above 3. Let's rectify this by removing the initial constraints and by reversing the order of the rules:
ORDINAL_RULESET_ID RCDATA
BEGIN
5, /* the amount of rules */
0, 0, 3, /* acting as default - case four */
/* cyclic, for each ten */
0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
0, 10, 2, 0, /* for 1 and 0 (case one) */
0, 10, 1, 3 /* for 0 (case four) */
END
Here, the first rule of the rule-set prepares case four as the default result, then each rule imposes a tighter constraint than the previous one: the second rule overrides the default (case four - the "th" ordinal suffix) for the values of N whose least significant digit is smaller than 4 (that is 0, 1, 2, and 3), then the next rules narrow this pool further and further to smaller than 3 (0, 1, and 2), smaller than 2 (0 and 1), and smaller than 1 (0 only) respectively.
This works for all the numbers whose least significant digit is 1, 2, 3, or something else, but in English there is one additional exception for the numbers 11 to 13, which all take the case four suffix - "11th" instead of "11st", "12th" instead of "12nd", and "13th" instead of "13rd". We add one more exception rule for them and get the new rule-set:
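To see the overriding at work, the sketch below records which rules validate for a given N in both the first cyclic attempt and the rectified rule-set. It is an illustration under assumptions, not library code: `trace_rules` is a hypothetical name, the data is assumed to be well-formed with at most 32 rules, and the deciding number is assumed to be an unsigned 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* First cyclic attempt: low bounds kept, rules in ascending order. */
static const uint16_t first_attempt[] = {
    5,
    0, 0, 3,
    1, 10, 2, 0,
    2, 10, 3, 1,
    3, 10, 4, 2,
    4, 10, 5, 3
};

/* Rectified rule-set: low bounds dropped, order reversed. */
static const uint16_t rectified[] = {
    5,
    0, 0, 3,
    0, 10, 4, 2,
    0, 10, 3, 1,
    0, 10, 2, 0,
    0, 10, 1, 3
};

/* Hypothetical tracing helper: bit i of the return value is set when rule i
 * validates for n; *result receives the last validated rule's result, or -1
 * if no rule validates. */
static uint32_t trace_rules(const uint16_t *data, uint32_t n, int *result)
{
    uint32_t mask = 0;
    size_t pos = 1;                 /* data[0] holds the amount of rules */
    uint16_t i;

    *result = -1;
    for (i = 0; i < data[0]; i++) {
        uint16_t low = data[pos++];
        uint16_t divisor = data[pos++];
        int valid = (n >= low);

        if (divisor != 0) {         /* cycling rule carries a high limit */
            uint16_t high = data[pos++];
            valid = valid && (n % divisor) < high;
        }
        if (valid) {
            mask |= (uint32_t)1 << i;
            *result = data[pos];
        }
        pos++;
    }
    return mask;
}
```

For N = 21 the first attempt validates all five rules (mask 0x1F), so the last one wins and the result is 3 ("th"). The rectified set validates only the first four (mask 0x0F), so the rule for remainders below 2 is the last to validate and the result is 0 ("st").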
ORDINAL_RULESET_ID RCDATA
BEGIN
6, /* the amount of rules */
0, 0, 3, /* acting as default - case four */
/* cyclic, for each ten */
0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
0, 10, 2, 0, /* for 1 and 0 (case one) */
0, 10, 1, 3, /* for 0 (case four) */
11, -1, 14, 3 /* exception for 11, 12, and 13 (case four) */
END
The first number of the added line defines the low bound. The line then uses the cyclicity information to define a high limit: with "-1" (which acts as the highest integral number) as divisor, the remainder is the deciding number itself, since the remainder of any division where the dividend is smaller than the divisor is the dividend itself. The result is a rule that addresses only the numbers 11 to 13.
Finally, it should be noted that in real life the exception added last should also apply to other domains, like 111 to 113, 211 to 213, and so on, or in other words - it should cycle in each new hundred. After defining cyclic rules for it we finally get the complete rule-set:
ORDINAL_RULESET_ID RCDATA
BEGIN
10, /* the amount of rules */
0, 0, 3, /* acting as default - case four */
/* cyclic, for each ten */
0, 10, 4, 2, /* for 3, 2, 1, and 0 (case three) */
0, 10, 3, 1, /* for 2, 1, and 0 (case two) */
0, 10, 2, 0, /* for 1 and 0 (case one) */
0, 10, 1, 3, /* for 0 (case four) */
/* cyclic, for each hundred */
0, 100, 14, 3, /* for 13, 12, 11, and lower (case four) */
0, 100, 4, 2, /* for 3, 2, 1, and 0 (case three) */
0, 100, 3, 1, /* for 2, 1, and 0 (case two) */
0, 100, 2, 0, /* for 1 and 0 (case one) */
0, 100, 1, 3 /* for 0 (case four) */
END
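As a check of the complete rule-set, the following sketch evaluates it for a few deciding numbers and picks the matching suffix. The helper names `ordinal_index` and `ordinal_suffix` are hypothetical, the data is assumed to be well-formed, and the deciding number is assumed to be an unsigned 32-bit value; the library's own interface may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* The complete ordinal rule-set from above, as raw WORD values. */
static const uint16_t ordinal_rules[] = {
    10,
    0, 0, 3,
    0, 10, 4, 2,
    0, 10, 3, 1,
    0, 10, 2, 0,
    0, 10, 1, 3,
    0, 100, 14, 3,
    0, 100, 4, 2,
    0, 100, 3, 1,
    0, 100, 2, 0,
    0, 100, 1, 3
};

/* The matching strings, in string-set index order. */
static const char *const ordinal_suffixes[] = { "st", "nd", "rd", "th" };

/* Hypothetical helper: result of the last validated rule, or -1 if none. */
static int ordinal_index(uint32_t n)
{
    const uint16_t *p = ordinal_rules + 1;  /* skip the amount of rules */
    int result = -1;
    uint16_t i;

    for (i = 0; i < ordinal_rules[0]; i++) {
        uint16_t low = *p++;
        uint16_t divisor = *p++;
        int valid = (n >= low);

        if (divisor != 0) {                 /* cycling rule */
            uint16_t high = *p++;
            valid = valid && (n % divisor) < high;
        }
        if (valid)
            result = *p;                    /* last validated rule wins */
        p++;
    }
    return result;
}

/* Hypothetical helper: maps the index to its suffix string. */
static const char *ordinal_suffix(uint32_t n)
{
    int index = ordinal_index(n);
    return index < 0 ? "" : ordinal_suffixes[index];
}
```

Under these assumptions 1, 21, and 101 map to "st", 22 maps to "nd", 123 maps to "rd", and 0, 4, 11 to 13, and 111 to 113 map to "th".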
Note that you usually won't have to write rule-sets, as by far the most relevant use case is working with resources that involve grammar inflection by cardinal numerals, for which rule-sets already exist (see below).
This library defines plural-forms rule-sets for a number of languages. These rule-sets are found in the "gndrrd" directory (which stands for Grammatical Number Dependent Resources Rule-sets Definitions). Below is a list of the languages for which rule-set definitions already exist.
The information used as the source for writing these rule-sets:
https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html
Language | File name |
---|---|
Afrikaans | gndrrd_b.rc |
Albanian | gndrrd_b.rc |
Amharic | gndrrd_c.rc |
Arabic | gndrrd_l.rc |
Armenian | gndrrd_b.rc |
Assamese | gndrrd_b.rc |
Azeri | gndrrd_b.rc |
Basque | gndrrd_b.rc |
Belarusian | gndrrd_h.rc |
Bengali | gndrrd_b.rc |
Bosnian | gndrrd_h.rc |
Breton | gndrrd_c.rc |
Bulgarian | gndrrd_b.rc |
Catalan | gndrrd_b.rc |
Chinese | gndrrd_c.rc |
Croatian | gndrrd_h.rc |
Czech | gndrrd_i.rc |
Danish | gndrrd_b.rc |
Dutch | gndrrd_b.rc |
Georgian | gndrrd_a.rc |
German | gndrrd_b.rc |
Greek | gndrrd_b.rc |
Greenlandic | gndrrd_b.rc |
English | gndrrd_b.rc |
Estonian | gndrrd_b.rc |
Faroese | gndrrd_b.rc |
Filipino | gndrrd_c.rc |
Finnish | gndrrd_b.rc |
French | gndrrd_c.rc |
Frisian | gndrrd_b.rc |
Fula | gndrrd_a.rc |
Irish | gndrrd_m.rc |
Gaelic | gndrrd_o.rc |
Galician | gndrrd_b.rc |
Gujarati | gndrrd_b.rc |
Hausa | gndrrd_b.rc |
Hebrew | gndrrd_b.rc |
Hindi | gndrrd_b.rc |
Hungarian | gndrrd_b.rc |
Icelandic | gndrrd_p.rc |
Indonesian | gndrrd_a.rc |
Italian | gndrrd_b.rc |
Japanese | gndrrd_a.rc |
Kannada | gndrrd_b.rc |
Kazakh | gndrrd_a.rc |
Khmer | gndrrd_a.rc |
Kinyarwanda | gndrrd_b.rc |
Korean | gndrrd_a.rc |
Kurdish | gndrrd_b.rc |
Kyrgyz | gndrrd_a.rc |
Lao | gndrrd_a.rc |
Latvian | gndrrd_d.rc |
Lithuanian | gndrrd_g.rc |
Luxembourgish | gndrrd_b.rc |
Macedonian | gndrrd_q.rc |
Malagasy | gndrrd_c.rc |
Malay | gndrrd_a.rc |
Malayalam | gndrrd_b.rc |
Maltese | gndrrd_r.rc |
Maori | gndrrd_c.rc |
Mapudungun | gndrrd_c.rc |
Marathi | gndrrd_b.rc |
Meithei | gndrrd_b.rc |
Mongolian | gndrrd_b.rc |
Nepali | gndrrd_b.rc |
Norwegian | gndrrd_b.rc |
Occitan | gndrrd_c.rc |
Oriya | gndrrd_b.rc |
Pashto | gndrrd_b.rc |
Persian | gndrrd_a.rc |
Polish | gndrrd_j.rc |
Portuguese (Brazilian) | gndrrd_c.rc |
Portuguese (Portugal) | gndrrd_b.rc |
Punjabi | gndrrd_b.rc |
Romanian | gndrrd_f.rc |
Romansh | gndrrd_b.rc |
Russian | gndrrd_h.rc |
Sakha | gndrrd_a.rc |
Sami | gndrrd_b.rc |
Serbian | gndrrd_h.rc |
Sindhi | gndrrd_b.rc |
Sinhalese | gndrrd_b.rc |
Slovak | gndrrd_i.rc |
Slovene | gndrrd_k.rc |
Sotho | gndrrd_b.rc |
Spanish | gndrrd_b.rc |
Swedish | gndrrd_b.rc |
Swahili | gndrrd_b.rc |
Tajik | gndrrd_c.rc |
Tamil | gndrrd_b.rc |
Tatar | gndrrd_a.rc |
Telugu | gndrrd_b.rc |
Thai | gndrrd_a.rc |
Tibetan | gndrrd_a.rc |
Tigrinya | gndrrd_c.rc |
Turkish | gndrrd_c.rc |
Turkmen | gndrrd_b.rc |
Ukrainian | gndrrd_h.rc |
Urdu | gndrrd_b.rc |
Uyghur | gndrrd_a.rc |
Uzbek | gndrrd_c.rc |
Vietnamese | gndrrd_a.rc |
Walloon | gndrrd_c.rc |
Welsh | gndrrd_n.rc |
Wolof | gndrrd_a.rc |
Yoruba | gndrrd_b.rc |
Note that not all existing languages are listed (because the information necessary to define rule-sets for them was unavailable). It is likely, however, that many of the missing languages share one of the existing plural-forms rules, just as a great many of the listed ones do.