============
price-parser
.. image:: https://img.shields.io/pypi/v/price-parser.svg
:target: https://pypi.python.org/pypi/price-parser
:alt: PyPI Version
.. image:: https://img.shields.io/pypi/pyversions/price-parser.svg
:target: https://pypi.python.org/pypi/price-parser
:alt: Supported Python Versions
.. image:: https://travis-ci.org/scrapinghub/price-parser.svg?branch=master
:target: https://travis-ci.org/scrapinghub/price-parser
:alt: Build Status
.. image:: https://codecov.io/github/scrapinghub/price-parser/coverage.svg?branch=master
:target: https://codecov.io/gh/scrapinghub/price-parser
:alt: Coverage report
price-parser
is a small library for extracting price and currency from
raw text strings.
Features:
- robust price amount and currency symbol extraction
- zero-effort handling of thousand and decimal separators
The main use case is parsing prices extracted from web pages.
For example, you can write a CSS/XPath selector which targets an element
with a price, and then use this library for cleaning it up,
instead of writing custom site-specific regex or Python code.
License is BSD 3-clause.
Installation
::
pip install price-parser
price-parser requires Python 3.6+.
Usage
Basic usage
from price_parser import Price
price = Price.fromstring("22,90 €")
price
Price(amount=Decimal('22.90'), currency='€')
price.amount # numeric price amount
Decimal('22.90')
price.currency # currency symbol, as appears in the string
'€'
price.amount_text # price amount, as appears in the string
'22,90'
price.amount_float # price amount as float, not Decimal
22.9
If you prefer, Price.fromstring
has an alias price_parser.parse_price
,
they do the same:
from price_parser import parse_price
parse_price("22,90 €")
Price(amount=Decimal('22.90'), currency='€')
The library has extensive tests (900+ real-world examples of price strings).
Some of the supported cases are described below.
Supported cases
Unclean price strings with various currencies are supported;
thousand separators and decimal separators are handled:
Price.fromstring("Price: $119.00")
Price(amount=Decimal('119.00'), currency='$')
Price.fromstring("15 130 Р")
Price(amount=Decimal('15130'), currency='Р')
Price.fromstring("151,200 تومان")
Price(amount=Decimal('151200'), currency='تومان')
Price.fromstring("Rp 1.550.000")
Price(amount=Decimal('1550000'), currency='Rp')
Price.fromstring("Běžná cena 75 990,00 Kč")
Price(amount=Decimal('75990.00'), currency='Kč')
Euro sign is used as a decimal separator in a wild:
Price.fromstring("1,235€ 99")
Price(amount=Decimal('1235.99'), currency='€')
Price.fromstring("99 € 95 €")
Price(amount=Decimal('99'), currency='€')
Price.fromstring("35€ 999")
Price(amount=Decimal('35'), currency='€')
Some special cases are handled:
Price.fromstring("Free")
Price(amount=Decimal('0'), currency=None)
When price or currency can't be extracted, corresponding
attribute values are set to None:
Price.fromstring("")
Price(amount=None, currency=None)
Price.fromstring("Foo")
Price(amount=None, currency=None)
Price.fromstring("50% OFF")
Price(amount=None, currency=None)
Price.fromstring("50")
Price(amount=Decimal('50'), currency=None)
Price.fromstring("R$")
Price(amount=None, currency='R$')
Currency hints
currency_hint
argument allows to pass a text string which may (or may not)
contain currency information. This feature is most useful for automated price
extraction.
Price.fromstring("34.99", currency_hint="руб. (шт)")
Price(amount=Decimal('34.99'), currency='руб.')
Note that currency mentioned in the main price string may be
preferred over currency specified in currency_hint
argument;
it depends on currency symbols found there. If you know the correct currency,
you can set it directly:
price = Price.fromstring("1 000")
price.currency = 'EUR'
price
Price(amount=Decimal('1000'), currency='EUR')
Decimal separator
If you know which symbol is used as a decimal separator in the input string,
pass that symbol in the decimal_separator
argument to prevent price-parser
from guessing the wrong decimal separator symbol.
Price.fromstring("Price: $140.600", decimal_separator=".")
Price(amount=Decimal('140.600'), currency='$')
Price.fromstring("Price: $140.600", decimal_separator=",")
Price(amount=Decimal('140600'), currency='$')
Contributing
Use tox_ to run tests with different Python versions::
tox
The command above also runs type checks; we use mypy.
.. _tox: https://tox.readthedocs.io
Changes
0.3.4 (2020-11-25)
0.3.3 (2020-02-05)
- Fixed installation issue on some Windows machines.
0.3.2 (2020-01-28)
- Improved Korean and Japanese currency detection.
- Declare Python 3.8 support.
0.3.1 (2019-10-21)
- Redundant $ signs are no longer returned as a part of currency, e.g.
for
SGD$ 100
currency would be SGD
, not SGD$
.
0.3.0 (2019-10-19)
- New
Price.fromstring
argument decimal_separator
allows to override
decimal separator for the cases where it is known
(i.e. disable decimal separator detection); - NTD and RBM unofficial currency names are added;
- quantifiers in regular expressions are made non-greedy, which provides
a small speedup;
- test improvements.
0.2.4 (2019-07-03)
- Declare price-parser as providing type annotations (pep-561). This enables
better type checking for projects using price-parser.
- improved test coverage
0.2.3 (2019-06-18)
- Follow-up for 0.2.2 release: improved parsing of prices with 4+ digits
after a decimal separator.
0.2.2 (2019-06-18)
- Fixed parsing of prices with 4+ digits after a decimal separator.
0.2.1 (2019-04-19)
- 23 additional currency symbols are added;
A$
alias for Australian Dollar is added.
0.2 (2019-04-12)
Added support for currencies replaced by euro.
0.1.1 (2019-04-12)
Minor packaging fixes.
0.1 (2019-04-12)
Initial release.