file-online-preview/server/libreoffice/share/numbertext/mr.sor


2021-06-23 02:26:22 +00:00
# mr.sor for - MARATHI Indian Language (mr-IN)
# In many Indian languages including MARATHI, the rules for reading numbers (from 0 up to 100) are complex and inconsistent.
# e.g. a number is often read with the units place first and then the tens place: 34 is read as " चौतीस " where " चौ " stands for 4 (units place), followed by the part for 30 (tens place), which is the inverse of the number reading logic in ENGLISH (where it is read as Thirty Four)
# Pronunciation of numbers also changes and follows virtually no logic: e.g. 54 is read as " चौपन्न ", where the part for 50 (tens place) differs from the word used when the number 50 is read on its own!
# When the units place is 9, the number is read with reference to the NEXT ten: e.g. in 39, " एकोण " stands for 9 (units place) and is followed by a reference to 40 (the NEXT ten), which is the inverse of the number reading logic in ENGLISH (where it is read as Thirty Nine, with reference to the previous tens place)
# The same units place is read very differently with different tens places: e.g. in 27, 47, 67 and 77 the same units place 7 is read differently each time ... very difficult to frame any logic!
# Therefore we have hard coded the numbers from 0 to 100 with Marathi translations.
# Number reading beyond the hundreds place is very similar to English logic ... hence no problem in coding further Marathi numbers
# --------------------------------------
# Number List in English , word in MARATHI
# - Ankur Joshi , pharmankur@gmail.com
#
^0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
# ------------------------
# separator function
:0+ #
:0*\d?\d " आणि " # ि
:\d+ ", " # , ,
# ------------------------
(\d)(\d\d) $1[ $2] # default. Note the intentional space before $ in [ $2]; it is there to eliminate the zero error after 100
(\d{1,2})(\d\d\d) $1 [ $2] #
(\d{1,2})(\d{5}) $1 $(:\2)$2 # 5 zeros after the number, thus it is a LAKH [ the 5th power of 10 is LAKH, thus the expression (\d{5}); next is CRORE, the 7th power, i.e. a difference of 2 powers, thus the expression (\d{1,2}). Follow this pattern if the names of decimal places need adjusting in future ]
(\d{1,2})(\d{7}) $1 $(:\2)$2 # 7 zeros after the number, thus it is a CRORE
(\d{1,1})(\d{9}) $1 $(:\2)$2 # 9 zeros after the number, thus it is an ABJA (equivalent to a BILLION)
(\d{1,1})(\d{10}) $1 $(:\2)$2 # 10 zeros after the number, thus it is a KHARVA
(\d{1,1})(\d{11}) $1 ि$(:\2)$2 # 11 zeros after the number, thus it is a NIKHARVA
(\d{1,1})(\d{12}) $1 $(:\2)$2 # 12 zeros after the number, thus it is a MAHAPADMA (equivalent to a TRILLION)
(\d{1,1})(\d{13}) $1 $(:\2)$2 # 13 zeros after the number, thus it is a SHANKU
(\d{1,1})(\d{14}) $1 ि$(:\2)$2 # 14 zeros after the number, thus it is a JALADHI
(\d{1,1})(\d{15}) $1 $(:\2)$2 # 15 zeros after the number, thus it is an ANTYA (equivalent to a QUADRILLION)
(\d{1,1})(\d{16}) $1 $(:\2)$2 # 16 zeros after the number, thus it is a MADHYA
(\d{1,50})(\d{17}) $1 $(:\2)$2 # 17 zeros after the number, thus it is a PARARDH. Practically unlimited numbers are summed up in Parardh.
### The above naming of decimal places follows the work of BHASKARACHARYA (1150 AD) in the book LILAVATI , SHLOKA (Verse) no 11 & 12 which is -->
### : : |
### ि : : || 11 ||
### ि ि : |
### : : : || 12 ||
###
### Verse is translated as -->
### Positions of the digits from right to left are unit, ten, hundred, thousand, ten thousand, hundred thousand (lakh), million, ten million (Crore), hundred million, billion (abja), Kharva, Nikharva, Mahapadma, Sanku, Jaladhi, Antya, Madhya, Parardha. The value of each digit on the left is ten times that on the right.
### Although for practical purpose this verse goes up to parardha (17th power of 10), there are terms for numbers up to 140th power of 10 in Sanskrit.
### In todays practice wordings and are not used and if used are replaced by and / respectively (Not used here)
###
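# Worked example (a sketch, assuming each pattern has to match the whole digit string, as the rules above suggest):
#   1234567   (7 digits) matches (\d{1,2})(\d{5}) : $1 = 12 , $2 = 34567   -> 12 LAKH followed by the reading of 34567
#   123456789 (9 digits) matches (\d{1,2})(\d{7}) : $1 = 12 , $2 = 3456789 -> 12 CRORE followed by the reading of 3456789
#   Each remainder ($2) is read again by the shorter rules, so 34567 splits further into 34 THOUSAND and 567.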
# ------------------------
# negative number
[-](\d+) |$1
# ------------------------
# decimals
0[.,]
([-]?\d+)[.,] $1|
"([-]?\d+[.,])([^0]\d)" $1| |$2 # e.g.
"([-]?\d+[.,])(0)(0)(\d)" $1| |$2 |$3 |$4 # e.g.
"([-]?\d+[.,])(0)(\d\d)" $1| |$2 |$3 # e.g.
"([-]?\d+[.,])(\d\d\d)" $1| |$2 # e.g. , upto 3 places after decimal, decimals read in hundreds
"([-]?\d+[.,])(\d)(\d)(\d)(\d)" $1| |$2 |$3 |$4 |$5 # e.g.
"([-]?\d+[.,]\d*)(\d)" $1| |$2
# ------------------------
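# Worked example for the decimal rules above (a sketch, assuming the first matching rule is the one applied):
#   3.005  matches the rule with (0)(0)(\d)       : $1 = 3. , then 0 , 0 and 5 are handled as separate digits
#   3.125  matches the rule with (\d\d\d)         : $1 = 3. , then 125 is handled as one whole number
#   3.1415 matches the rule with (\d)(\d)(\d)(\d) : $1 = 3. , then 1 , 4 , 1 and 5 are handled as separate digits
# ------------------------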
# currency
# unit/subunit singular/plural
us:([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*) \2
ss:([^,]*),([^,]*),([^,]*),([^,]*) \3
sp:([^,]*),([^,]*),([^,]*),([^,]*) \4
AED:(\D+) $(\1: ि, ि, ि, ि)
AUD:(\D+) $(\1: ि , ि , , )
BGN:(\D+) $(\1: lev, leva, stotinka, stotinki)
BWP:(\D+) $(\1: pula, pula, thebe, thebe)
CAD:(\D+) $(\1: , , , )
CHF:(\D+) $(\1: ि , ि , , )
CNY:(\D+) $(\1: , , , )
CZK:(\D+) $(\1: Czech koruna, Czech koruny, halér, halére)
EEK:(\D+) $(\1: kroon, kroonid, sent, senti)
EUR:(\D+) $(\1: , , , )
GBP:(\D+) $(\1: ि, ि, , )
GHS:(\D+) $(\1: Ghana cedi, Ghana cedis, pesewa, pesewas)
GMD:(\D+) $(\1: dalasi, dalasi, butut, bututs)
HKD:(\D+) $(\1: , , , )
HRK:(\D+) $(\1: kuna, kuna, lipa, lipa)
HUF:(\D+) $(\1: forint, forint, fillér, fillér)
# --- Using Indian Rupee Symbol " ₹ " ------
INR:(\D+) $(\1: , , , )
# ------------------------------------------
JMD:(\D+) $(\1: Jamaica dollar, Jamaica dollars, cent, cents)
JPY:(\D+) $(\1: , , , )
KES:(\D+) $(\1: Kenyan shilling, Kenyan shillings, cent, cents)
KRW:(\D+) $(\1: Korean won, Korean won, jeon, jeon)
KWD:(\D+) $(\1: ि, ि, ि, ि)
LRD:(\D+) $(\1: Liberian dollar, Liberian dollars, cent, cents)
LSL:(\D+) $(\1: loti, maloti, sente, lisente)
LTL:(\D+) $(\1: litas, litai, centas, centai)
LVL:(\D+) $(\1: lats, lati, santims, santimi)
MGA:(\D+) $(\1: ariary, ariaries, iraimbilanja, iraimbilanja)
MUR:(\D+) $(\1: Mauritian rupee, Mauritian rupees, cent, cents)
MXN:(\D+) $(\1: Mexican peso, Mexican pesos, centavo, centavos)
MWK:(\D+) $(\1: Malawian kwacha, Malawian kwacha, tambala, tambala)
MYR:(\D+) $(\1: Ringgit, Ringgit, cent, cents)
NAD:(\D+) $(\1: Namibian dollar, Namibian dollars, cent, cents)
NGN:(\D+) $(\1: naira, naira, kobo, kobo)
NZD:(\D+) $(\1: , , , )
PGK:(\D+) $(\1: kina, kina, toea, toea)
PHP:(\D+) $(\1: Philippine peso, Philippine pesos, centavo, centavos)
PKR:(\D+) $(\1: ि , ि , , )
PLN:(\D+) $(\1: zloty, zlotys, grosz, groszy)
RON:(\D+) $(\1: Romanian leu, Romanian lei, ban, bani)
RSD:(\D+) $(\1: Serbian dinar, Serbian dinars, para, para)
RUB:(\D+) $(\1: Russian ruble, Russian rubles, kopek, kopeks)
RWF:(\D+) $(\1: Rwandese franc, Rwandese francs, centime, centimes)
SAR:(\D+) $(\1: ि, ि, , )
SDG:(\D+) $(\1: Sudanese pound, Sudanese pounds, piastre, piastres)
SGD:(\D+) $(\1: ि , ि , , )
SLL:(\D+) $(\1: leone, leones, cent, cents)
SZL:(\D+) $(\1: lilangeni, emalangeni, cent, cents)
THB:(\D+) $(\1: baht, baht, satang, satang)
TRY:(\D+) $(\1: Turkish lira, Turkish lira, kurus, kurus)
TTD:(\D+) $(\1: Trinidad and Tobago dollar, Trinidad and Tobago dollars, cent, cents)
TZS:(\D+) $(\1: Tanzanian shilling, Tanzanian shillings, cent, cents)
UAH:(\D+) $(\1: hryvnia, hryvnia, kopiyka, kopiyka)
UGX:(\D+) $(\1: Uganda shilling, Uganda shillings, cent, cents)
USD:(\D+) $(\1: . . , . . , , )
X[AO]F:(\D+) $(\1: CFA franc, CFA francs, centime, centimes)
ZAR:(\D+) $(\1: South African rand, South African rand, cent, cents)
ZMK:(\D+) $(\1: Zambian kwacha, Zambian kwacha, ngwee, ngwee)
ZWL:(\D+) $(\1: Zimbabwe dollar, Zimbabwe dollars, cent, cents)
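# Worked example for the currency table above (a sketch, assuming the usual chaining of Soros function calls):
#   "BGN 2" matches "([A-Z]{3}) ([-]?\d+)([.,]00?)?" , whose output calls $(BGN:up)
#   BGN:up matches BGN:(\D+) with \1 = up , giving $(up: lev, leva, stotinka, stotinki)
#   up: returns its second field, so the unit name is " leva" (unit plural)
#   "BGN 1" takes the same path through us: and gets the first field, " lev" (unit singular)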
"(JPY [-]?\d+)[.,](\d\d)0" $1
"(JPY [-]?\d+[.,]\d\d)(\d)" $1 $2 ि
"([A-Z]{3}) ([-]?1)([.,]00?)?" $2$(\1:us)
"([A-Z]{3}) ([-]?\d+)([.,]00?)?" $2$(\1:up)
"(CNY [-]?\d+)[.,](\d)0?" $1 $2
"(CNY [-]?\d+[.,]\d)(\d)" $1 $2
"((MGA|MRO) [-]?\d+)[.,]0" $1
"((MGA|MRO) [-]?\d+)[.,]2" $1 ि |$(1)$(\2:ss)
"((MGA|MRO) [-]?\d+)[.,]4" $1 ि |$(2)$(\2:sp)
"((MGA|MRO) [-]?\d+)[.,]6" $1 ि |$(3)$(\2:sp)
"((MGA|MRO) [-]?\d+)[.,]8" $1 ि |$(4)$(\2:sp)
"(([A-Z]{3}) [-]?\d+)[.,](01)" $1 ि |$(1)$(\2:ss)
"(([A-Z]{3}) [-]?\d+)[.,](\d)" $1 ि |$(\30)$(\2:sp)
"(([A-Z]{3}) [-]?\d+)[.,](\d\d)" $1 ि |$3$(\2:sp)
== money ==
"(JPY [-]?\d+)[.,](\d\d)0" $1
"(JPY [-]?\d+[.,]\d\d)(\d)" $1 $2 ि
"([A-Z]{3}) ([-]?1)([.,]00?)?" $2$(\1:us)
"([A-Z]{3}) ([-]?\d+)([.,]00?)?" $2$(\1:up)
"(CNY [-]?\d+)[.,](\d)0?" $1 $2
"(CNY [-]?\d+[.,]\d)(\d)" $1 $2
"(MGA|MRO) ([-]?\d+)[.,]0" $2$(\1:us)
"(MGA|MRO) ([-]?\d+)[.,]2" $2 ि 1/5$(\1:us)
"(MGA|MRO) ([-]?\d+)[.,]4" $2 ि 2/5$(\1:up)
"(MGA|MRO) ([-]?\d+)[.,]6" $2 ि 3/5$(\1:up)
"(MGA|MRO) ([-]?\d+)[.,]8" $2 ि 4/5$(\1:up)
"([A-Z]{3}) ([-]?1)" $2$(\1:us)
"([A-Z]{3}) ([-]?\d+)" $2$(\1:up)
"(([A-Z]{3}) ([-]?\d+))[.,](01)" $3 ि 1/100$(\2:us)
"(([A-Z]{3}) ([-]?\d+))[.,](\d)" $3 ि \40/100$(\2:up)
"(([A-Z]{3}) ([-]?\d+))[.,](\d\d)" $3 ि \4/100$(\2:up)
"(([A-Z]{3}) ([-]?\d+))[.,](\d\d\d)" $3 ि \4/1000$(\2:up)
# ------------------------
# Ordinal ------
# Ordinal number reading in Marathi is GENDER dependent ( and not as simple as in English, where anyone at no. 1 is read as FIRST )
# in Marathi there are 3 gender identities Male ि , Female िि & Neutral ि (similar to masculine, feminine, neuter in Swiss )
# -----------------------
# If a sentence refer to MALE subject equivalent of FIRST is ि
# ordinal masculine --- ि
== ordinal-masculine ==
([-]?\d+) $(ordinal-masculine |$1)
(.*) \1ि
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
([-]?\d+)[.,](.*) $1 $(\2) # Ordinal of Decimals
(.*) \1 # General Masculine Ordinal
# -----------------------
# If a sentence refer to FEMALE subject equivalent of FIRST is ि
# ordinal feminine --- िि
== ordinal-feminine ==
([-]?\d+) $(ordinal-feminine |$1)
(.*) \1ि
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
([-]?\d+)[.,](.*) $1 $(\2) # Ordinal of Decimals
(.*) \1 # General Feminine Ordinal
# -----------------------
# If a sentence refer to NEUTRAL subject equivalent of FIRST is ि / ि
# ordinal neutral --- ि
== ordinal-neutral ==
([-]?\d+) $(ordinal-neutral |$1)
(.*) \1ि
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
(.*) \1
([-]?\d+)[.,](.*) $1 $(\2) # Ordinal of Decimals
(.*) \1 # General Neutral Ordinal
# -----------------------
# As the SUBJECT of the sentence being formed is unknown and out of scope of this code, default ordinal numbering is set to output all possible GENDERs separated by " / " , and hence may not deliver grammatically correct sentences ( we have hard coded the Ordinal numbers from 1-10 with all possible GENDERs separated by " / " )
# This is done deliberately considering ease of use.
# Although separate commands for masculine, feminine & neutral exist, a user may not be aware of them. So by providing all gender words in the default ordinal option, the user will at least get some relevant output.
# ordinal default --- ordinal words with all gender options separated by " / "
== ordinal == # Default
([-]?\d+) $(ordinal |$1)
(.*) \1ि / \1 ि / \1 ि
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
(.*) \1 / \1 / \1
([-]?\d+)[.,](.*) $1 $(\2) / $1 $(\2) / $1 $(\2) # Ordinal of Decimals
(.*) \1 / \1 / \1 # General ALL Gender Ordinals
# -----------------------
# We have also provided a generalized method whose result for "ELEVEN" is similar to saying "Rank Eleven" in English, in a gender neutral way.
# ordinal Sequential --- ordinal-sq
== ordinal-sq ==
([-]?\d+) $(ordinal-sq |$1)
(.*) \1
# ------------------------
# ordinal-number
# Not relevant in Marathi
== ordinal-number ==
([-]?\d+) $(ordinal-number |$2)
(.*) \2
# ------------------------
# cardinal
# Not relevant in Marathi
== cardinal(-)? ==
([-]?\d+) $(cardinal |$2)
(.*) \2
# ------------------------
== year ==
(1[0-9])00 $1
(1[0-9])([0-9][0-9]) $1 $2 # e.g. 1857 = , 1947 =
(2[0-9])([0-9][0-9]) $1 $2 # e.g. 2021 =
(3[0-9])([0-9][0-9]) $1 $2
(4[0-9])([0-9][0-9]) $1 $2
(5[0-9])([0-9][0-9]) $1 $2
(6[0-9])([0-9][0-9]) $1 $2
(7[0-9])([0-9][0-9]) $1 $2
(8[0-9])([0-9][0-9]) $1 $2
(9[0-9])([0-9][0-9]) $1 $2
(.*) $(year-remove-and $1)
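# Worked example for the year rules above (a sketch, assuming whole-string matching and first-match-wins ordering):
#   1900 matches (1[0-9])00 first     : $1 = 19 , with no trailing two-digit part
#   1947 matches (1[0-9])([0-9][0-9]) : $1 = 19 , $2 = 47 -> the reading of 19 followed by the reading of 47
#   Inputs that match none of the four-digit patterns fall through to (.*) and are handed to year-remove-and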
# ------------------------
== year-remove-and ==
"(.*) and (.*)" \1 \2
(.*) \1
== help ==
"" $(1)|, $(2), $(3)\n$(\0 ordinal)$(\0 ordinal-masculine)$(\0 ordinal-feminine)$(\0 ordinal-neutral)$(\0 ordinal-sq)$(\0 ordinal-number)year: $(year 1999), two thousand, $(year 2001)
"" \ncurrency \(for example, INR\): $(INR 2.5)\nmoney INR: $(money INR 2.5) \1: $(\1 1), $(\1 2), $(\1 3)\n