print-nonascii: print lines that contain non-ASCII characters.
print-nonascii is a Unix CLI that locates lines in text files or stdin input
that contain non-ASCII characters, which is helpful when diagnosing character
encoding problems.
Lines can be printed as-is and/or using abstract representations of non-ASCII
characters in one of several formats; namely:
- -v, --caret ... the same representation cat -v uses, based on caret notation.
- --bash ... per-byte two-digit hex. escape sequences such as \xc3
- --psh ... PowerShell Unicode escape sequences such as `u{20ac} for €
Note: --psh only works correctly with properly UTF-8-encoded input.
Line numbers can be prepended on request, and output for multiple input
files is by default preceded with headers identifying each input file.
Caveat: For now, no automated tests are run before releases.
Examples
$ cat <<'EOF' > /tmp/test.txt
one
twö
three
EOF
$ print-nonascii /tmp/test.txt
twö
$ print-nonascii -n /tmp/test.txt
2:twö
$ print-nonascii --psh --raw /tmp/test.txt
twö
tw`u{f6}
$ print-nonascii --bash --raw /tmp/test.txt
twö
tw\xc3\xb6
$ print-nonascii -n /tmp/test.txt /tmp/test.txt
2:twö
2:twö
Installation
Prerequisites
- When installing from the npm registry: macOS and Linux
- When installing manually: any Unix platform with bash that also has perl installed.
Installation from the npm registry
With Node.js installed, install the package as follows:
[sudo] npm install print-nonascii -g
Note:
- Even if you don't use Node.js, its package manager, npm, works across platforms and is easy to install; try curl -L https://git.io/n-install | bash
- Whether you need sudo depends on how you installed Node.js / io.js and whether you've changed permissions later; if you get an EACCES error, try again with sudo.
- The -g ensures global installation and is needed to put print-nonascii in your system's $PATH.
Manual installation
- Download the CLI as print-nonascii.
- Make it executable with chmod +x print-nonascii.
- Move it or symlink it to a folder in your $PATH, such as /usr/local/bin (macOS) or /usr/bin (Linux).
Usage
Find concise usage information below; for complete documentation, read the manual online, or, once installed, run man print-nonascii (print-nonascii --man if installed manually).
$ print-nonascii --help
Prints lines that contain non-ASCII characters.
print-nonascii [--<mode> [-r]] [-n] [-b] [file ...]
print-nonascii -q [file ...]
--<mode> prints abstract representations of non-ASCII chars.; one of:
--caret, -v ... use caret notation, as cat -v would.
--bash ... represent non-ASCII bytes as \xhh
--psh ... (PowerShell) represent non-ASCII Unicode characters as
Unicode escape sequences: <backtick>u{h...}
-r, --raw ... with --<mode>, print each matching line as-is too, first.
-n, --line-number ... prefix the output lines with their line number from
the original file, using format "<line-number>:" - decimal line numbers,
no padding, no space before or after the ":"
-b, --bare ... suppress per-input-filename headers
-q ... quiet mode: produce no output; signal presence of non-ASCII chars.
with exit code 0; exit code 100 signals that there are none.
Standard options: --help, --man, --version, --home
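The exit codes of quiet mode lend themselves to scripting. The sketch below shows one way a script might branch on them. The function classify_file and the PRINT_NONASCII override variable are hypothetical and not part of the CLI; treating any exit code other than 0 or 100 as an error is likewise an assumption.

```shell
# Hypothetical helper: classify a file based on the exit code of
# `print-nonascii -q`. PRINT_NONASCII is an assumed override hook that
# lets a different command stand in for the CLI (e.g. for testing).
classify_file() {
  rc=0
  "${PRINT_NONASCII:-print-nonascii}" -q "$1" || rc=$?
  case $rc in
    0)   echo "non-ascii" ;;    # at least one line has non-ASCII chars.
    100) echo "ascii-only" ;;   # no non-ASCII chars. found
    *)   echo "error" ;;        # assumption: anything else is a failure
  esac
}
```

With the CLI installed, classify_file /tmp/test.txt would be expected to report non-ascii for the sample file from the Examples section.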
License
Copyright (c) 2017 Michael Klement mklement0@gmail.com (http://same2u.net), released under the MIT license.
Acknowledgements
This project gratefully depends on the following open-source components, according to the terms of their respective licenses.
npm dependencies below have an optional suffix denoting the type of dependency: the absence of a suffix denotes a required run-time dependency; (D) denotes a development-time-only dependency, (O) an optional dependency, and (P) a peer dependency.
npm dependencies
Changelog
Versioning complies with semantic versioning (semver).
- v0.0.3 (2017-09-11):
  - [enhancement] Header lines are now only printed for input files that produce at least 1 output line.
- v0.0.2 (2017-09-10):
  - [fix] Header line is no longer printed twice when --<mode> is combined with --raw.
  - Header line now uses a tab char. to separate prefix ### from the filename.
- v0.0.1 (2017-09-10):