I just learnt from "Format numbers as currency in Python" that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For instance,
    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'
How about the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? babel provides babel.numbers.parse_number to parse local numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse local currency into numbers?
I went through Python: removing characters except digits from string.
    # Way 1 (Python 2 only: string.maketrans and two-argument str.translate)
    import string
    all = string.maketrans('', '')
    nodigs = all.translate(all, string.digits)
    s = '$123,456.79'
    n = s.translate(all, nodigs)  # 12345679, lost `.`

    # Way 2
    import re
    n = re.sub(r"\D", "", s)  # 12345679

Neither way takes care of the decimal separator (.).
Remove all non-numeric characters, except for ., from a string (refer to here),
    import re

    # Way 1:
    s = '$123,456.79'
    n = re.sub(r"[^0-9.]", "", s)  # 123456.79

    # Way 2:
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)  # 123456.79

This does keep the decimal separator (.).
But the above solutions don't work when it comes to other formats, for instance the fr_FR output shown earlier (123 456,79 €), where the comma is the decimal mark.
As you can see, the format of currency varies. What is the ideal way to parse currency into numbers in a general way?
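A quick check makes the problem concrete. The sketch below runs the second regex against the two strings that format_currency produced in the first snippet; it only uses the standard library, and the variable names are just for illustration.

```python
import re

# Strings in the shape format_currency produced in the first snippet.
us = '$123,456.79'
fr = '123\xa0456,79\xa0\u20ac'  # fr_FR: no-break space as group separator, comma as decimal mark

# Keep only digits and the dot, as in "Way 2" above.
non_decimal = re.compile(r'[^\d.]+')

us_clean = non_decimal.sub('', us)
fr_clean = non_decimal.sub('', fr)

print(us_clean)  # 123456.79
print(fr_clean)  # 12345679, the decimal comma was stripped along with everything else
```

The French amount comes out a hundred times too large, because the regex has no way to know which character is the decimal mark in a given locale.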
Using babel
The babel documentation notes that number parsing is not fully implemented yet, but they have done a lot of work to get currency info into the library. You can use get_currency_name() and get_currency_symbol() to get currency details, and all the other get_... functions to get the normal number details (decimal point, minus sign, etc.).
Using that information you can strip the currency details (name, sign) and groupings (e.g. , in the US) from a currency string. Then you change the decimal details into the ones used by the C locale (- for minus, and . for the decimal point).
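The steps just described can be sketched without babel by passing the locale's symbols in by hand. This is only an illustration: normalize_amount and its parameters are hypothetical names, and in real use the symbol, group separator and decimal mark would come from babel's get_... functions rather than being hard-coded.

```python
from decimal import Decimal

def normalize_amount(text, symbol, group, decimal_mark, minus='-', plus='+'):
    """Strip symbol and grouping, then map sign and decimal mark to C-locale form."""
    # 1. Drop the pieces that carry no numeric information.
    for token in (plus, symbol, group):
        if token:
            text = text.replace(token, '')
    # 2. Map the locale's minus sign and decimal mark to '-' and '.'.
    text = text.replace(minus, '-').replace(decimal_mark, '.')
    # 3. Remove any leftover whitespace (including no-break spaces).
    text = ''.join(text.split())
    return Decimal(text)

# Symbols below are supplied by hand for the sake of the example.
print(normalize_amount('$123,456.79', '$', ',', '.'))                      # 123456.79
print(normalize_amount('-US$123.456,78', 'US$', '.', ','))                 # -123456.78
print(normalize_amount('123\xa0456,79\xa0\u20ac', '\u20ac', '\xa0', ','))  # 123456.79
```

The order matters: grouping characters are removed before the decimal mark is rewritten, otherwise a locale that groups with . (such as pt_BR) would end up with several decimal points. Returning a Decimal rather than a float avoids rounding surprises with money amounts.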
This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):
    import re
    import os
    from babel import numbers as n
    from babel.core import default_locale

    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value

    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)

    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]

    for v,c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value :', c, ':', info.value)
        print('Extra info :', info.name.encode('utf-8')
                            , info.symbol.encode('utf-8'))
The output looks promising (in US locale):
    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value : USD : 123456.79
    Extra info : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value : PLN : -123456.78
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value : PLN : 123456.79
    Extra info : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value : IDR : 123457
    Extra info : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value : JPY : 123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value : JPY : -123457
    Extra info : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value : CNY : 123456.79
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value : CNY : -123456.78
    Extra info : b'Chinese Yuan' b'CN\xc2\xa5'
And it still works in different locales (Brazil is notable for using the comma as a decimal mark):
    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value : USD : 123456.79
    Extra info : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value : PLN : -123456.78
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value : PLN : 123456.79
    Extra info : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value : IDR : 123457
    Extra info : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value : JPY : 123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value : JPY : -123457
    Extra info : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value : CNY : 123456.79
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value : CNY : -123456.78
    Extra info : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) use different encodings themselves. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
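To illustrate the iso-8859-2 versus iso-8859-1 remark, here is a small standard-library check on the zloty symbol that appears in the en_US output above. It says nothing about how babel stores its locale data; it only shows why the choice of encoding matters for this particular symbol.

```python
symbol = 'z\u0142'  # the Polish zloty symbol, as printed in the en_US run above

print(symbol.encode('utf-8'))       # b'z\xc5\x82', matches the output above
print(symbol.encode('iso-8859-2'))  # b'z\xb3', Latin-2 has a slot for this letter

# Latin-1 has no code point for U+0142, so encoding fails.
try:
    symbol.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```

Keeping everything as Unicode strings and encoding to UTF-8 only at the output boundary, as the script above does, sidesteps the issue.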