Inspired by ordereddict,
this is a packaging of an improved shlex module for Python 2 that handles
Unicode properly.
Shlex is "A lexical analyzer class for simple shell-like syntaxes."
If you've found your way here,
you probably already know that the standard shlex doesn't handle Unicode prior
to Python 3
(see `bug 1170 <http://bugs.python.org/issue1170>`_ for details).
Since Python 2.7.3, however,
it accepts unicode objects.
Sadly, it still does not handle non-ASCII characters:

.. code-block:: python

    >>> import sys, shlex
    >>> sys.version
    '2.7.5+ ...'
    >>> shlex.split(u'Hello world')
    ['Hello', 'world']
    >>> shlex.split(u'café')
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "/usr/lib/python2.7/shlex.py", line 275, in split
        lex = shlex(s, posix=posix)
      File "/usr/lib/python2.7/shlex.py", line 25, in __init__
        instream = StringIO(instream)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

This module does handle unicode objects and byte strings under Python 2.x:

.. code-block:: python

    >>> import ushlex as shlex
    >>> shlex.split(u'café')
    [u'caf\xe9']
    >>> shlex.split(u'echo "☺ ☕ ♫"')
    [u'echo', u'\u263a \u2615 \u266b']
    >>> from ushlex import split as shplit
    >>> shplit('echo "hello there"')
    ['echo', 'hello there']
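
For code that has to run under both Python 2 and Python 3, one possible pattern is to choose the implementation at import time. The sketch below is not part of ushlex. It assumes ushlex is installed on Python 2 and that the standard shlex is sufficient on Python 3, and ``split_command`` is a hypothetical helper name used only for illustration.

```python
import sys

if sys.version_info[0] >= 3:
    # On Python 3 the standard library module works on str (Unicode) directly.
    import shlex
else:
    # Assumption: the ushlex package is installed in the Python 2 environment.
    import ushlex as shlex


def split_command(line):
    """Split a shell-like command line into tokens (hypothetical helper)."""
    return shlex.split(line)


tokens = split_command(u'echo "hello there"')
```

Callers then use ``split_command`` everywhere and never import either module directly, which keeps the version check in one place.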

I found these release notes inside::

    # Module and documentation by Eric S. Raymond, 21 Dec 1998
    # Input stacking and error message cleanup added by ESR, March 2000
    # push_source() and pop_source() made explicit by ESR, January 2001.
    # Posix compliance, split(), string arguments, and
    # iterator interface by Gustavo Niemeyer, April 2003.
    # Modified to support Unicode by Colin Walters, Dec 2007

Bugs
----

Packaging-only bugs may be submitted to Bitbucket.
Do not enter bugs for ushlex itself,
as the packager is not the author.