Parse currency into numbers in Python

Anonymous (unverified), submitted on 2019-12-03 02:20:02

Question:

I just learnt from Format numbers as currency in Python that the Python module babel provides babel.numbers.format_currency to format numbers as currency. For instance,

    from babel.numbers import format_currency

    s = format_currency(123456.789, 'USD', locale='en_US')  # u'$123,456.79'
    s = format_currency(123456.789, 'EUR', locale='fr_FR')  # u'123\xa0456,79\xa0\u20ac'

How about the reverse, from currency to numbers, such as $123,456,789.00 --> 123456789? babel provides babel.numbers.parse_number to parse local numbers, but I didn't find anything like parse_currency. So, what is the ideal way to parse local currency into numbers?


I went through Python: removing characters except digits from string.

    # Way 1 (Python 2 only: string.maketrans is not available in Python 3)
    import string
    all_chars = string.maketrans('', '')
    nodigs = all_chars.translate(all_chars, string.digits)

    s = '$123,456.79'
    n = s.translate(all_chars, nodigs)    # 12345679, lost `.`

    # Way 2
    import re
    n = re.sub(r"\D", "", s)              # 12345679

Neither way preserves the decimal separator.


Remove all non-numeric characters, except for ., from a string (refer to here),

    import re

    # Way 1:
    s = '$123,456.79'
    n = re.sub(r"[^0-9.]", "", s)   # 123456.79

    # Way 2:
    non_decimal = re.compile(r'[^\d.]+')
    s = '$123,456.79'
    n = non_decimal.sub('', s)      # 123456.79

This does preserve the decimal separator.


But the above solutions don't work when it comes to, for instance,

As you can see, the format of currency varies. What is the ideal way to parse currency into numbers in a general way?
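A minimal sketch of the failure, reusing the fr_FR string from the format_currency example at the top of the question (the variable names are only illustrative). Because fr_FR uses a comma as the decimal mark, the "keep digits and dots" regex silently drops it:

```python
import re

# fr_FR output copied from the format_currency example in the question.
s = u'123\xa0456,79\xa0\u20ac'

# The "keep digits and dots" pattern from Way 2.
non_decimal = re.compile(r'[^\d.]+')
n = non_decimal.sub('', s)

# The comma decimal mark is removed with everything else,
# so the amount is off by a factor of 100.
print(n)  # 12345679
```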

Answer 1:

Using babel

The babel documentation notes that number parsing is not fully implemented yet, but a lot of work has gone into getting currency info into the library. You can use get_currency_name() and get_currency_symbol() to get currency details, and the other get_... functions to get the normal number details (decimal point, minus sign, etc.).

Using that information, you can strip the currency details (name, sign) and groupings (e.g. , in the US) from a currency string. Then you change the decimal details into the ones used by the C locale (- for minus, and . for the decimal point).

This results in the following code (I added an object to keep some of the data, which may come in handy in further processing):

    import re, os
    from babel import numbers as n
    from babel.core import default_locale

    class AmountInfo(object):
        def __init__(self, name, symbol, value):
            self.name = name
            self.symbol = symbol
            self.value = value

    def parse_currency(value, cur):
        decp = n.get_decimal_symbol()
        plus = n.get_plus_sign_symbol()
        minus = n.get_minus_sign_symbol()
        group = n.get_group_symbol()
        name = n.get_currency_name(cur)
        symbol = n.get_currency_symbol(cur)
        remove = [plus, name, symbol, group]
        for token in remove:
            # remove the pieces of information that shall be obvious
            value = re.sub(re.escape(token), '', value)
        # change the minus sign to a LOCALE=C minus
        value = re.sub(re.escape(minus), '-', value)
        # and change the decimal mark to a LOCALE=C decimal point
        value = re.sub(re.escape(decp), '.', value)
        # just in case remove extraneous spaces
        value = re.sub(r'\s+', '', value)
        return AmountInfo(name, symbol, value)

    #cur_loc = os.environ['LC_ALL']
    cur_loc = default_locale()
    print('locale:', cur_loc)
    test = [ (n.format_currency(123456.789, 'USD', locale=cur_loc), 'USD')
           , (n.format_currency(-123456.78, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'PLN', locale=cur_loc), 'PLN')
           , (n.format_currency(123456.789, 'IDR', locale=cur_loc), 'IDR')
           , (n.format_currency(123456.789, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(-123456.78, 'JPY', locale=cur_loc), 'JPY')
           , (n.format_currency(123456.789, 'CNY', locale=cur_loc), 'CNY')
           , (n.format_currency(-123456.78, 'CNY', locale=cur_loc), 'CNY')
           ]

    for v,c in test:
        print('As currency :', c, ':', v.encode('utf-8'))
        info = parse_currency(v, c)
        print('As value    :', c, ':', info.value)
        print('Extra info  :', info.name.encode('utf-8')
                             , info.symbol.encode('utf-8'))

The output looks promising (in US locale):

    $ export LC_ALL=en_US
    $ ./cur.py
    locale: en_US
    As currency : USD : b'$123,456.79'
    As value    : USD : 123456.79
    Extra info  : b'US Dollar' b'$'
    As currency : PLN : b'-z\xc5\x82123,456.78'
    As value    : PLN : -123456.78
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : PLN : b'z\xc5\x82123,456.79'
    As value    : PLN : 123456.79
    Extra info  : b'Polish Zloty' b'z\xc5\x82'
    As currency : IDR : b'Rp123,457'
    As value    : IDR : 123457
    Extra info  : b'Indonesian Rupiah' b'Rp'
    As currency : JPY : b'\xc2\xa5123,457'
    As value    : JPY : 123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : JPY : b'-\xc2\xa5123,457'
    As value    : JPY : -123457
    Extra info  : b'Japanese Yen' b'\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123,456.79'
    As value    : CNY : 123456.79
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123,456.78'
    As value    : CNY : -123456.78
    Extra info  : b'Chinese Yuan' b'CN\xc2\xa5'

And it still works in different locales (Brazil is notable for using the comma as a decimal mark):

    $ export LC_ALL=pt_BR
    $ ./cur.py
    locale: pt_BR
    As currency : USD : b'US$123.456,79'
    As value    : USD : 123456.79
    Extra info  : b'D\xc3\xb3lar americano' b'US$'
    As currency : PLN : b'-PLN123.456,78'
    As value    : PLN : -123456.78
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : PLN : b'PLN123.456,79'
    As value    : PLN : 123456.79
    Extra info  : b'Zloti polon\xc3\xaas' b'PLN'
    As currency : IDR : b'IDR123.457'
    As value    : IDR : 123457
    Extra info  : b'Rupia indon\xc3\xa9sia' b'IDR'
    As currency : JPY : b'JP\xc2\xa5123.457'
    As value    : JPY : 123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : JPY : b'-JP\xc2\xa5123.457'
    As value    : JPY : -123457
    Extra info  : b'Iene japon\xc3\xaas' b'JP\xc2\xa5'
    As currency : CNY : b'CN\xc2\xa5123.456,79'
    As value    : CNY : 123456.79
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'
    As currency : CNY : b'-CN\xc2\xa5123.456,78'
    As value    : CNY : -123456.78
    Extra info  : b'Yuan chin\xc3\xaas' b'CN\xc2\xa5'

It is worth pointing out that babel has some encoding problems. That is because the locale files (in locale-data) use different encodings themselves. If you're working with currencies you're familiar with, that should not be a problem. But if you try unfamiliar currencies, you might run into problems (I just learned that Poland uses iso-8859-2, not iso-8859-1).
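The normalization steps described in this answer can also be sketched with only the standard library, which makes the order of operations easier to see. This is a rough illustration, not babel's API: normalize_amount and its parameters are hypothetical names, and the caller is assumed to pass in the locale's symbols explicitly (in the code above they come from babel's get_... functions). It returns a Decimal rather than a string, since Decimal avoids binary floating-point rounding on monetary values.

```python
import re
from decimal import Decimal


def normalize_amount(text, symbol, group, decimal_mark, minus='-'):
    """Hypothetical helper: strip currency details and return a Decimal.

    All symbols are supplied by the caller; nothing is looked up from a locale.
    """
    # Remove the currency symbol first, then the grouping separator.
    for token in (symbol, group):
        if token:
            text = text.replace(token, '')
    # Map the locale's minus sign and decimal mark to the C-locale forms.
    # The grouping separator is already gone, so a '.' used for grouping
    # cannot be confused with the new decimal point.
    text = text.replace(minus, '-').replace(decimal_mark, '.')
    # Drop any remaining whitespace, including no-break spaces.
    text = re.sub(r'\s+', '', text)
    return Decimal(text)


# Strings taken from the outputs shown earlier in this thread.
us = normalize_amount('$123,456.79', '$', ',', '.')
br = normalize_amount('-US$123.456,78', 'US$', '.', ',')
fr = normalize_amount(u'123\xa0456,79\xa0\u20ac', u'\u20ac', u'\xa0', ',')

print(us)  # 123456.79
print(br)  # -123456.78
print(fr)  # 123456.79
```

The sketch assumes the minus sign leads the string; formats that mark negatives with a trailing sign or with parentheses would need extra handling.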


