The RC SUPport LIBrary
======================

This library is intended to satisfy the requirements for functionality
extensions related to RC resources. To better understand the information
described in this document, you may first want to read About Resource Files:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa380599.aspx

Although it currently contains only number-dependent string handling, the
library is not intended to be limited to that if other similar needs arise.
Number-dependence is a way of determining a specific string according to a
language-specific rule. The most typical use-case is plural-forms handling,
where different languages have different plural forms and different rules
governing them.

For an overview of the issues that led to the creation of this library, you
may visit:
https://jira.reactos.org/browse/CORE-8438


Contents
========

    Introduction
        Presentation
        Advantages and disadvantages
        Number-dependent resources (in localizations)
    Number-dependent resource syntax
        The structure-set model
        Rule-set format
        String-set format
        Tutorial
        Existing rule-sets
    Library API


Introduction
============

Presentation
------------

The resource-definition script (RC) files offer a simple and efficient way
of handling various types of application data needed at run-time. This
simplicity, however, does not come without limitations, as became apparent
when the need for a new type of resource object occurred. In such a case, the
options seemed to be either to develop an extension of the RC compiler so
that it can efficiently handle a larger set of resource-definition
statements, or to take advantage of the limited support for custom-data
definition that RC has, tailoring it as much as possible to the new needs.
This library explores the second option.
It relies on the script's RCDATA statement for the definition of specific raw
resource structures and implements methods designed to make them easy to use.
You can learn more about RCDATA from its MSDN page:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa381039.aspx

Advantages and disadvantages
----------------------------

One advantage of implementing the solution as a library is portability, as
with many other library-based solutions. Having your application depend on a
small library is better than relying on some non-standard (or not yet
standardized) extension of the RC compiler. There is also the freedom to
develop this library further at any time, with minimal friction, in order to
satisfy new needs if they arise.

As for disadvantages, a library cannot be as optimal a solution as a
hypothetical extension of the RC compiler. That is because the RC compiler,
like any compiler, can do things at compile-time, and there is no way to
compensate for that (performance-wise) with any kind of run-time tricks. That
being said, performance was an important consideration in designing the data
model handling that this library employs.

Number-dependent resources (in localizations)
---------------------------------------------

One example where RC falls short with its predefined resource statements is
when asymmetrical resources for different localizations become necessary,
that is, when a structure needs to be defined differently for different
languages. One such case is the definition of strings that depend on
grammatical number. In a language like English, there are strings for only
two instances - singular and plural, like "file" and "files". (In practice,
when a message is formatted for the user interface, the two are often merged
into one - "file(s)", but this approach is imperfect for English and
downright impossible for other languages.)
Other languages may have only one form (fewer than English) or more than two
for such instances. (English also needs more than two forms in some real-life
cases, like the ordinal numbers that will be used later as an example.)

Things are manageable when the number of instances that occur in a given
language is known in advance, but they get tricky when the number of
languages, and especially the number of necessary forms for a given language,
is unknown. Here is where this library steps in, by providing an abstraction
layer that deals with this part of the problem. To achieve this, some formal
definitions of new RC structures first become necessary, so that these
structures can be correctly read and used later, at run-time.


Number-dependent resource syntax
================================

The structure-set model
-----------------------

In order to achieve resource number-dependence, a model of two separate RC
structures was designed. One of them stores the body of all the possible
strings of a given context, and the other one addresses the complexity of
choosing the right string given an external number (provided at run-time).
We'll call this pair of structures "string-set" and "rule-set" respectively.
This library uses the rule-set to obtain an index to one of the available
strings defined in the string-set structure. Both structures can be defined
differently from one language to another in order to address the specifics
of each language's number-dependent grammar.

Although all the needed information can be defined in one single structure
(and it was, in the initial steps of development), the design was changed
along the way to the current form due to several considerations.
These considerations are:
  * Modularity - the rule-related information, which varies less, becomes
    available across multiple string-storing parts, and the other way
    around, the strings-per-rule ratio can be dealt with (the amount of
    strings is always less than or equal to the amount of rules).
  * Simplicity - each of the resulting structures is less adorned.
  * Speed - the rule-sets are smaller and more likely to be processed faster
    due to caching.

Rule-set format
---------------

A rule-set is a series of integer numbers embedded into an RCDATA structure:

    WHATEVERRULESET_ID RCDATA
    BEGIN
        amount_of_rules,
        first_lowbound, first_divisor, first_highlimit, first_result,
        /* ...all the other rules */
        last_lowbound, last_divisor, last_highlimit, last_result
    END

That is, the amount of rules followed by sequences of three or four numbers
forming the rule definitions.

How does it work? The library uses the rule-set and the aforementioned
external number (which we'll call the "deciding number" from now on) to
generate an index for the string-set resource. The process of result
deduction involves a pass through the RCDATA block, interpreting its raw
content. When processing this structure, the library expects enough data for
the number of rules declared in "amount_of_rules" (which is a WORD, i.e. an
"unsigned short int" value) before it stops. Each rule consists of three or
four WORD integers: the low bound, the divisor, the high limit, and the
rule's result. The low bound and the high limit are the rule's constraints.
The amount varies because the high limit is present only when the rule's
divisor is different from zero; when the divisor is zero, the high limit is
omitted. A rule's result is considered valid if all the rule's constraints
are satisfied, and only the result of the last validated rule is returned in
the end.

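The variable-length layout described above can be illustrated with a minimal
sketch in Python. This is not the library's C implementation; the function
name decode_rule_set is hypothetical, and the sketch assumes that the RCDATA
block is available as raw bytes holding little-endian 16-bit WORDs.

```python
import struct

def decode_rule_set(blob):
    """Decode a raw rule-set block into a list of
    (low_bound, divisor, high_limit, result) tuples.
    high_limit is None for rules whose divisor is zero."""
    count_of_words = len(blob) // 2
    words = struct.unpack("<%dH" % count_of_words, blob[:2 * count_of_words])
    amount_of_rules = words[0]
    pos = 1
    rules = []
    for _ in range(amount_of_rules):
        low_bound, divisor = words[pos], words[pos + 1]
        pos += 2
        high_limit = None
        if divisor != 0:
            # Only rules with a non-zero divisor carry a high limit.
            high_limit = words[pos]
            pos += 1
        result = words[pos]
        pos += 1
        rules.append((low_bound, divisor, high_limit, result))
    return rules

# Two rules: a three-WORD rule (divisor 0) and a four-WORD rule (divisor 2).
raw = struct.pack("<8H", 2, 0, 0, 1, 0, 2, 1, 0)
rules = decode_rule_set(raw)
print(rules)
```

The decoder only recovers the structure of the rules; how the constraints are
checked against the deciding number is a separate step.
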
Checking a rule's constraints involves the following operations on the
deciding number:
  * Comparison - whether it is greater than or equal to the rule's low bound.
  * Modulo operation with the rule's divisor (if the divisor is not zero).
  * Comparison - whether the remainder is smaller than the rule's high limit
    (if the divisor is not zero).

The first comparison is made between the deciding number and the rule's low
bound; the second comparison is between the rule's high limit and the
remainder of the deciding number divided by the rule's divisor. A rule where
all these operations are involved is a "cycling rule". If the rule's divisor
is zero (which makes it a "non-cycling rule"), then the low bound remains the
rule's only constraint for result validation.

Here's an example of an English plural-forms rule-set with non-cycling rules
(unlike the one defined in pfrsd_2a.rc):

    ENGLISH_PLURAL_FORMS_RULESET_ID RCDATA
    BEGIN
        3,          /* three rules in total */
        0, 0, 0,    /* this rule is always validated, as any unsigned integer
                     * number is greater than or equal to zero; the rule's
                     * (and the rule-set's default) result is "0" */
        1, 0, 1,    /* this rule gets validated only when the deciding number
                     * is greater than or equal to one, thus it overrides the
                     * previous rule for all but zero; the rule's result is
                     * "1" */
        2, 0, 0     /* this rule gets validated when the deciding number is
                     * greater than or equal to two, thus it overrides the
                     * previous rules for all but zero and one; the rule's
                     * result is "0" */
    END

In order to exemplify the cycling rules, the following rule-set determines
whether the deciding number is odd or even (admittedly not a very useful
rule-set outside its demonstrative purpose, considering that math doesn't
vary across different languages):

    ODD_OR_EVEN_RULESET_ID RCDATA
    BEGIN
        2,          /* two rules in total */
        0, 0, 1,    /* this rule is always validated, as any unsigned integer
                     * number is greater than or equal to zero; the rule's
                     * (and the rule-set's default) result is "1" */
        0, 2, 1, 0  /* this cycling rule gets validated when the deciding
                     * number is greater than or equal to zero and the
                     * remainder of its division by two is less than one,
                     * which gives us all the even numbers and leaves the odd
                     * numbers validated only by the previous rule; the
                     * rule's result is "0" */
    END

Note that the rule's values can be set to the maximum value (65535 or -1),
which may be useful in some cases.

Here's the example of the English plural-forms rule-set with a cycling rule
(as defined in pfrsd_2a.rc), commented:

    /* ENGLISH_PLURAL_FORMS_RULESET_ID */
    PLURALFORMS_RULESET_ID RCDATA
    BEGIN
        2,           /* two rules in total */
        0, 0, 1,     /* this rule is always validated, as any unsigned
                      * integer number is greater than or equal to zero; the
                      * rule's (and the rule-set's default) result is "1" */
        1, -1, 2, 0  /* this cycling rule gets validated when the deciding
                      * number is greater than or equal to one and the
                      * remainder of its division by the highest value
                      * available (i.e. the deciding number unchanged) is
                      * smaller than two, which gives us a very narrow
                      * validating case of the deciding number being one;
                      * the rule's result is "0" */
    END

String-set format
-----------------

A string-set is an RCDATA structure embedding a rule-set reference ID and a
series of explicitly zero-delimited literal strings:

    WHATEVERSTRINGSET_ID RCDATA
    BEGIN
        ruleset_id,
        first_string,
        second_string,
        /* ...all the other strings */
        last_string
    END

The "ruleset_id" is a WORD integer; the strings are regular strings ending
with the null terminator ('\0').

Here's the string-set corresponding to the odd/even rule-set:

    ODD_OR_EVEN_STRINGSET_ID RCDATA
    BEGIN
        ODD_OR_EVEN_RULESET_ID, /* ID of the previously-defined rule-set */
        "even\0",               /* index 0 */
        "odd\0"                 /* index 1 */
    END

Although string-sets depend on (and act as clients of) their rule-set, when
actually developing a new number-dependent pair it is easier to start by
laying out the string-set with all the possible strings first.
The following tutorial shows how to do that.

Tutorial
--------

In order to better illustrate the mechanics of string selection, let's take
a step-by-step approach by creating a comprehensive rule-set and its
corresponding string-set for a non-trivial case. In English, a good example
is determining the ordinal numeral suffixes. There are four different
endings in total: "st" (attachable to the cardinal numeral "1" and alike,
resulting in "1st"), "nd" (as in "2nd"), "rd" (as in "3rd"), and "th" (as in
"4th"). Thus, a string-set can be defined as follows:

    ORDINAL_STRINGSET_ID RCDATA
    BEGIN
        ORDINAL_RULESET_ID, /* reference to the correct rule-set */
        "st\0",             /* index 0, for case one */
        "nd\0",             /* index 1, for case two */
        "rd\0",             /* index 2, for case three */
        "th\0"              /* index 3, for case four */
    END

Note that for the strings inside this structure the null terminator ('\0')
must be specified manually.

Now let's shape a working rule-set for the numbers one to ten. This
initially limited domain and the simplistic approach of its corresponding
rule-set are chosen intentionally for didactic reasons, and will be developed
step by step to cover the whole domain of unsigned numbers. First we'll write
the rules, then we'll specify their amount in the first row of data.

To cover the rule for "1st" we can say greater than or equal to 1 (the low
bound), with 0 as divisor (no cyclicity for now), resulting in index 0:
"1, 0, 0,". For the second rule, for case two, we say greater than or equal
to 2, again with no cyclicity, index 1: "2, 0, 1,". The same goes for cases
three and four, resulting in "3, 0, 2," and "4, 0, 3".
Thus we have four rules, and a simple rule-set structure as follows:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        4,          /* the amount of rules */
        1, 0, 0,    /* for 1 (case one) */
        2, 0, 1,    /* for 2 (case two) */
        3, 0, 2,    /* for 3 (case three) */
        4, 0, 3     /* for 4 and above (case four) */
    END

For a given (deciding) number N, the evaluation procedure applies the
specified amount of rules. In each rule, N is compared with the first number
in the row; the rule validation continues if N is greater than or equal to
the rule's low bound, otherwise the procedure jumps to the next rule. The
second number in each rule is the divisor (for cyclicity), which is not
considered for now, so we leave it 0. The last number is the resulting index
of the rule (and of the rule-set, if none of the later rules gets validated).

Here, for "0", no rule in the rule-set can be validated (and therefore no
matching string can be retrieved). This can be fixed by adding an exception
rule for 0 (assuming "0th" as the correct ordinal form for it): "0, 0, 3,".
Our rule-set now becomes:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5,          /* the amount of rules */
        0, 0, 3,    /* for 0 (case four) */
        1, 0, 0,    /* for 1 (case one) */
        2, 0, 1,    /* for 2 (case two) */
        3, 0, 2,    /* for 3 (case three) */
        4, 0, 3     /* for 4 and above (case four) */
    END

Now let's expand the numeral domain further, beyond the limit of 0 to 10. We
can observe that case one can be found again at other numbers that have "1"
as their least significant digit (e.g. 21, 31, 41, ..., 91, 101, 121, ...).
We'll call this "cyclic occurrence", and an additional restriction about it
can be specified in a number-dependent resource rule. Remember the second
number, left "0" in the rule-set rows above? If it is different from 0, then
the number N that got compared with the first number of the rule is now
divided by this second number.
If this second number is non-zero, the rule has to have four numbers instead
of three, where the third one - the rule's high limit - is compared with the
remainder of the previous division (the remainder must be smaller than it).
This comparison acts as a second constraint that will again validate or
invalidate the rule. The last number is the rule's resulting index, as
before.

Let's rewrite the second rule (for case one) to cover all the numbers that
have "1" as their least significant digit: "1, 10, 2, 0", which reads as "if
N is greater than or equal to 1, and the remainder of its division by 10 is
smaller than 2, return index 0". Rewriting the rest of the rules in the same
manner, we get:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5,              /* the amount of rules */
        0, 0, 3,        /* for 0 (case four) */
        /* (intended to be) cyclic, for each ten */
        1, 10, 2, 0,    /* (intended to be) for 1 (case one) */
        2, 10, 3, 1,    /* (intended to be) for 2 (case two) */
        3, 10, 4, 2,    /* (intended to be) for 3 (case three) */
        4, 10, 5, 3     /* (intended to be) for 4 and above (case four) */
    END

This has two problems. The first regards the initial constraint (the deciding
number having to be at least 1, 2, 3, or 4), which is no longer necessary, as
the validation can be done solely on the division remainder comparison. The
second is about the order of the rules: the later rules are more permissive
than the previous ones and thus override them, resulting in case four for
every number above 3.

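The overriding effect can be checked with a small simulation. The following
is a minimal sketch in Python of the evaluation procedure as described in
this document, not the library's actual code; the names evaluate, SUFFIXES
and FLAWED_RULES are hypothetical, and each rule is written as a
(low_bound, divisor, high_limit, result) tuple.

```python
def evaluate(rules, n):
    """Return the result of the last validated rule, or None if no rule
    was validated for the deciding number n."""
    selected = None
    for low_bound, divisor, high_limit, result in rules:
        if n < low_bound:
            continue  # first constraint (low bound) not satisfied
        if divisor != 0 and n % divisor >= high_limit:
            continue  # cycling constraint (high limit) not satisfied
        selected = result  # a later validated rule overrides earlier ones
    return selected

SUFFIXES = ["st", "nd", "rd", "th"]

# The intermediate rule-set from the text above.
FLAWED_RULES = [
    (0, 0, None, 3),
    (1, 10, 2, 0),
    (2, 10, 3, 1),
    (3, 10, 4, 2),
    (4, 10, 5, 3),
]

for n in (1, 2, 3, 4, 21, 22, 23):
    print(n, SUFFIXES[evaluate(FLAWED_RULES, n)])
```

For 1, 2 and 3 the expected suffixes come out, but 21, 22 and 23 all end up
with "th", because the last rule also accepts remainders 0 to 3 once the
deciding number is at least 4.
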
Let's rectify this by removing the initial constraints and by reversing the
order of the rules:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        5,              /* the amount of rules */
        0, 0, 3,        /* acting as default - case four */
        /* cyclic, for each ten */
        0, 10, 4, 2,    /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1,    /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0,    /* for 1 and 0 (case one) */
        0, 10, 1, 3     /* for 0 (case four) */
    END

Here, the first rule of the rule-set prepares case four as the default
result, then each rule imposes a stricter constraint than the previous one:
the second rule overrides the default (case four - the "th" ordinal suffix)
for the values of N with their least significant digit smaller than 4 (which
is 0, 1, 2, and 3), then the next rules narrow this pool further and further,
to smaller than 3 (0, 1, and 2), smaller than 2 (0 and 1), and smaller than 1
respectively.

This works for all the numbers having their least significant digit as 1, 2,
3, or something else, but in English there is one additional exception for
the number domain of 11 to 13, which all take the case four suffix - "11th"
instead of "11st", "12th" instead of "12nd", and "13th" instead of "13rd".
We add one more exception rule for them and get the new rule-set:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        6,              /* the amount of rules */
        0, 0, 3,        /* acting as default - case four */
        /* cyclic, for each ten */
        0, 10, 4, 2,    /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1,    /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0,    /* for 1 and 0 (case one) */
        0, 10, 1, 3,    /* for 0 (case four) */
        11, -1, 14, 3   /* exception for 11, 12, and 13 (case four) */
    END

The added line has its first number defining the low bound, then uses the
cyclicity information to define a high limit: the divisor "-1" acts as the
highest integer number, and the remainder of any division of a dividend
smaller than the divisor is the dividend itself. The result is a rule that
addresses only the numerical domain 11 to 13 of all the numbers.

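To see the string-set and the rule-set working together, here is a minimal
sketch in Python that combines the six-rule rule-set above with the ordinal
string-set. It is an illustration only, not the library's API: the names
parse_string_set, pick_index and ordinal, as well as the numeric value used
for ORDINAL_RULESET_ID, are hypothetical. It also assumes that the raw
string-set block is a little-endian WORD followed by single-byte,
NUL-terminated strings.

```python
import struct

def parse_string_set(blob):
    """Split a raw string-set block into (ruleset_id, list_of_strings)."""
    (ruleset_id,) = struct.unpack_from("<H", blob, 0)
    pieces = blob[2:].split(b"\0")[:-1]  # drop the empty piece after the last NUL
    return ruleset_id, [piece.decode("ascii") for piece in pieces]

def pick_index(rules, n):
    """The last validated rule wins; None means no rule was validated."""
    index = None
    for low_bound, divisor, high_limit, result in rules:
        if n >= low_bound and (divisor == 0 or n % divisor < high_limit):
            index = result
    return index

ORDINAL_RULESET_ID = 101  # hypothetical resource ID

# Rules as (low_bound, divisor, high_limit, result) tuples.
ORDINAL_RULES = [
    (0, 0, None, 3),
    (0, 10, 4, 2),
    (0, 10, 3, 1),
    (0, 10, 2, 0),
    (0, 10, 1, 3),
    (11, 0xFFFF, 14, 3),  # "-1" stored as a WORD is 65535
]

raw_string_set = struct.pack("<H", ORDINAL_RULESET_ID) + b"st\0nd\0rd\0th\0"
ruleset_id, suffixes = parse_string_set(raw_string_set)

def ordinal(n):
    return "%d%s" % (n, suffixes[pick_index(ORDINAL_RULES, n)])

print([ordinal(n) for n in (1, 2, 3, 4, 11, 12, 13, 21, 22)])
```

With this rule-set the sketch yields "11th", "12th" and "13th", while 21 and
22 still get "st" and "nd".
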
Finally, it should be noticed that in real life the last added exception
should also apply to other domains, like 111 to 113, 211 to 213, and so on,
or in other words - cycled in each new hundred. After defining cyclic rules
for it, we finally get the complete rule-set:

    ORDINAL_RULESET_ID RCDATA
    BEGIN
        10,             /* the amount of rules */
        0, 0, 3,        /* acting as default - case four */
        /* cyclic, for each ten */
        0, 10, 4, 2,    /* for 3, 2, 1, and 0 (case three) */
        0, 10, 3, 1,    /* for 2, 1, and 0 (case two) */
        0, 10, 2, 0,    /* for 1 and 0 (case one) */
        0, 10, 1, 3,    /* for 0 (case four) */
        /* cyclic, for each hundred */
        0, 100, 14, 3,  /* for 13, 12, 11, and lower (case four) */
        0, 100, 4, 2,   /* for 3, 2, 1, and 0 (case three) */
        0, 100, 3, 1,   /* for 2, 1, and 0 (case two) */
        0, 100, 2, 0,   /* for 1 and 0 (case one) */
        0, 100, 1, 3    /* for 0 (case four) */
    END

Note that usually you won't have to write rule-sets yourself, as by far the
most relevant use-case is working with resources involving grammar
inflection formulas related to cardinal numerals, for which rule-sets
already exist (see below).

Existing rule-sets
------------------

This library defines several plural-forms rule-sets for a number of
languages. These rule-sets are found in the "gndrrd" directory (which stands
for Grammatical Number Dependent Resources Rule-sets Definitions). Below is
an alphabetical list of the languages for which rule-set definitions already
exist.
The information used as the source for writing these rule-sets:
https://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html
http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html

    Language                File name
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Afrikaans               gndrrd_b.rc
    Albanian                gndrrd_b.rc
    Amharic                 gndrrd_c.rc
    Arabic                  gndrrd_l.rc
    Armenian                gndrrd_b.rc
    Assamese                gndrrd_b.rc
    Azeri                   gndrrd_b.rc
    Basque                  gndrrd_b.rc
    Belarusian              gndrrd_h.rc
    Bengali                 gndrrd_b.rc
    Bosnian                 gndrrd_h.rc
    Breton                  gndrrd_c.rc
    Bulgarian               gndrrd_b.rc
    Catalan                 gndrrd_b.rc
    Chinese                 gndrrd_c.rc
    Croatian                gndrrd_h.rc
    Czech                   gndrrd_i.rc
    Danish                  gndrrd_b.rc
    Dutch                   gndrrd_b.rc
    English                 gndrrd_b.rc
    Estonian                gndrrd_b.rc
    Faroese                 gndrrd_b.rc
    Filipino                gndrrd_c.rc
    Finnish                 gndrrd_b.rc
    French                  gndrrd_c.rc
    Frisian                 gndrrd_b.rc
    Fula                    gndrrd_a.rc
    Gaelic                  gndrrd_o.rc
    Galician                gndrrd_b.rc
    Georgian                gndrrd_a.rc
    German                  gndrrd_b.rc
    Greek                   gndrrd_b.rc
    Greenlandic             gndrrd_b.rc
    Gujarati                gndrrd_b.rc
    Hausa                   gndrrd_b.rc
    Hebrew                  gndrrd_b.rc
    Hindi                   gndrrd_b.rc
    Hungarian               gndrrd_b.rc
    Icelandic               gndrrd_p.rc
    Indonesian              gndrrd_a.rc
    Irish                   gndrrd_m.rc
    Italian                 gndrrd_b.rc
    Japanese                gndrrd_a.rc
    Kannada                 gndrrd_b.rc
    Kazakh                  gndrrd_a.rc
    Khmer                   gndrrd_a.rc
    Kinyarwanda             gndrrd_b.rc
    Korean                  gndrrd_a.rc
    Kurdish                 gndrrd_b.rc
    Kyrgyz                  gndrrd_a.rc
    Lao                     gndrrd_a.rc
    Latvian                 gndrrd_d.rc
    Lithuanian              gndrrd_g.rc
    Luxembourgish           gndrrd_b.rc
    Macedonian              gndrrd_q.rc
    Malagasy                gndrrd_c.rc
    Malay                   gndrrd_a.rc
    Malayalam               gndrrd_b.rc
    Maltese                 gndrrd_r.rc
    Maori                   gndrrd_c.rc
    Mapudungun              gndrrd_c.rc
    Marathi                 gndrrd_b.rc
    Meithei                 gndrrd_b.rc
    Mongolian               gndrrd_b.rc
    Nepali                  gndrrd_b.rc
    Norwegian               gndrrd_b.rc
    Occitan                 gndrrd_c.rc
    Oriya                   gndrrd_b.rc
    Pashto                  gndrrd_b.rc
    Persian                 gndrrd_a.rc
    Polish                  gndrrd_j.rc
    Portuguese (Brazilian)  gndrrd_c.rc
    Portuguese (Portugal)   gndrrd_b.rc
    Punjabi                 gndrrd_b.rc
    Romanian                gndrrd_f.rc
    Romansh                 gndrrd_b.rc
    Russian                 gndrrd_h.rc
    Sakha                   gndrrd_a.rc
    Sami                    gndrrd_b.rc
    Serbian                 gndrrd_h.rc
    Sindhi                  gndrrd_b.rc
    Sinhalese               gndrrd_b.rc
    Slovak                  gndrrd_i.rc
    Slovene                 gndrrd_k.rc
    Sotho                   gndrrd_b.rc
    Spanish                 gndrrd_b.rc
    Swahili                 gndrrd_b.rc
    Swedish                 gndrrd_b.rc
    Tajik                   gndrrd_c.rc
    Tamil                   gndrrd_b.rc
    Tatar                   gndrrd_a.rc
    Telugu                  gndrrd_b.rc
    Thai                    gndrrd_a.rc
    Tibetan                 gndrrd_a.rc
    Tigrinya                gndrrd_c.rc
    Turkish                 gndrrd_c.rc
    Turkmen                 gndrrd_b.rc
    Ukrainian               gndrrd_h.rc
    Urdu                    gndrrd_b.rc
    Uyghur                  gndrrd_a.rc
    Uzbek                   gndrrd_c.rc
    Vietnamese              gndrrd_a.rc
    Walloon                 gndrrd_c.rc
    Welsh                   gndrrd_n.rc
    Wolof                   gndrrd_a.rc
    Yoruba                  gndrrd_b.rc

Note that not all existing languages are listed (the information necessary
to define rule-sets for them was unavailable), but it is likely that many of
the missing languages share one of the existing plural-forms rule-sets, just
as a great deal of the listed ones do.


Library API
===========

/* TO BE COMPLETED LATER, IN THE MEANTIME SEE THE LIBRARY HEADER */