greplica
A grep clone in Python with both CLI and library interfaces, supporting ANSI color coding and more.
Known Differences with grep
- The -D, --devices option is not supported and no support is planned. All inputs are handled as
file streams only.
- Context cannot be given as a raw number (-NUM); use -A, -B, or -C with an explicit NUM instead.
- The Python module re is used internally for all regular expressions. The input regular expression
is modified only when basic regular expressions are used. See --help for more information.
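To make the basic-mode difference concrete, here is a minimal sketch of the kind of rewrite the --help text describes for the characters ?+{}|(). It is not greplica's actual code: the function name basic_to_python_regex is hypothetical, and edge cases such as bracket expressions are handled only naively.

```python
import re

# Characters whose escaped/unescaped meaning differs between grep "basic"
# syntax and Python's re syntax (the set named in the --help text).
_SWAPPED = set('?+{}|()')


def basic_to_python_regex(pattern: str) -> str:
    """Swap the escaping of ?+{}|() so a basic-style pattern suits re.

    Hypothetical helper for illustration; greplica's own logic may differ.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\' and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # '\(' is a group in basic syntax; re spells that as a bare '('.
            out.append(nxt if nxt in _SWAPPED else ch + nxt)
            i += 2
        elif ch in _SWAPPED:
            # A bare '(' is a literal in basic syntax; re needs '\('.
            out.append('\\' + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


print(basic_to_python_regex(r'\(ab\)\+'))  # prints (ab)+
# The translated pattern behaves like the basic one: "ab" repeated.
assert re.fullmatch(basic_to_python_regex(r'\(ab\)\+'), 'abab')
```

In extended and perl modes no such step is needed, since the pattern is handed to re unchanged.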
Contribution
Feel free to open a bug report or a pull request on GitHub.
Installation
This project is uploaded to PyPI at https://pypi.org/project/greplica/
To install, ensure you are connected to the internet and execute: python3 -m pip install greplica --upgrade
Once installed, there will be a script called greplica under Python's scripts directory. If grep is
not found on the system, then a script called grep will also be installed. Ensure Python's scripts
directory is listed in the PATH environment variable so that the script can be executed from the
command line.
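One quick way to confirm the PATH setup is to ask Python where the script resolves. The sketch below uses only the standard library; the helper name find_script is made up for this example.

```python
import shutil
from typing import Optional


def find_script(name: str = 'greplica') -> Optional[str]:
    """Return the full path of the named script if PATH exposes it, else None."""
    return shutil.which(name)


if __name__ == '__main__':
    location = find_script()
    if location is None:
        print("greplica was not found; add Python's scripts directory to PATH")
    else:
        print('greplica found at ' + location)
```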
CLI Help
usage: greplica [-E | -F | -G] [-P] [-e EXPRESSIONS] [-f FILE [FILE ...]] [-i]
[--no-ignore-case] [-w] [-x] [--end END] [-z] [-s] [-v] [-V] [--help]
[-m NUM] [-b] [-n] [--line-buffered] [-H] [-h] [--label LABEL] [-o] [-q]
[--binary-files TYPE] [-a] [-I] [-d ACTION] [-r] [-R]
[--include GLOB [GLOB ...]] [--exclude GLOB [GLOB ...]]
[--exclude-from FILE [FILE ...]] [--exclude-dir GLOB [GLOB ...]] [-L] [-l]
[-c] [-T] [-Z] [--result-sep SEP] [--name-num-sep SEP] [--name-byte-sep SEP]
[--context-group-sep SEP] [--context-result-sep SEP]
[--context-name-num-sep SEP] [--context-name-byte-sep SEP] [-B NUM] [-A NUM]
[-C NUM] [--color [WHEN]] [-U]
[EXPRESSIONS] [FILE [FILE ...]]
Reimplementation of grep command entirely in Python.
positional arguments:
EXPRESSIONS Expressions to search for, separated by newline character (\n). This
is required if --regexp or --file are not specified.
FILE Files or directories to search. Stdin will be searched if not
specified, unless -r is specified. Then current directory will be
recursively searched. How directories are handled is controlled by -d
and -r options.
Expression Interpretation:
-E, --extended-regexp
EXPRESSIONS are "extended" regular expressions. In this mode,
greplica passes regular expressions directly to Python re without
modification. This for the most part matches original "extended"
syntax, but be aware that there will be differences.
-F, --fixed-strings EXPRESSIONS are strings
-G, --basic-regexp EXPRESSIONS are "basic" regular expressions. In this mode, greplica
modifies escaping sequences for characters ?+{}|() before passing to
Python re. This for the most part matches original "basic" syntax,
but be aware that there will be differences.
-P, --perl-regexp EXPRESSIONS are "perl" regular expressions. In this mode, greplica
passes regular expressions directly to Python re without
modification. This for the most part matches original "perl" syntax,
but be aware that there will be differences.
-e EXPRESSIONS, --regexp EXPRESSIONS
use EXPRESSIONS for matching
-f FILE [FILE ...], --file FILE [FILE ...]
take EXPRESSIONS from FILE
-i, --ignore-case ignore case in expressions
--no-ignore-case do not ignore case (default)
-w, --word-regexp match whole words only
-x, --line-regexp match whole lines only
--end END end-of-line character for parsing search files (default: \n); this
does not affect file parsing for -f or --exclude-from
-z, --null-data same as --end='\0'
Miscellaneous:
-s, --no-messages suppress error messages
-v, --invert-match select non-matching lines
-V, --version display version information and exit
--help display this help text and exit
Output control:
-m NUM, --max-count NUM
stop after NUM lines
-b, --byte-offset print line's byte offset with each line
-n, --line-number print line number with each line
--line-buffered flush output on each line
-H, --with-filename print file name with each line
-h, --no-filename suppress the file name output
--label LABEL use LABEL as the standard input file name
-o, --only-matching show only nonempty parts of lines that match
-q, --quiet, --silent
suppress all normal output
--binary-files TYPE sets how binary file is parsed; TYPE is 'binary', 'text', or
'without-match'
-a, --text same as --binary-files=text
-I same as --binary-files=without-match
-d ACTION, --directories ACTION
controls how directory input is handled in FILE; ACTION is 'read',
'recurse', or 'skip'
-r, --recursive same as --directories=recurse
-R, --dereference-recursive
same as --directories=recurse_links
--include GLOB [GLOB ...]
limit files to those matching GLOB
--exclude GLOB [GLOB ...]
skip files that match GLOB
--exclude-from FILE [FILE ...]
read FILE for exclude file name globs
--exclude-dir GLOB [GLOB ...]
skip directories that match GLOB
-L, --files-without-match
print only names of FILEs with no selected lines
-l, --files-with-matches
print only names of FILEs with selected lines
-c, --count print only a count of selected lines per FILE
-T, --initial-tab currently just adds tabs to each sep value (will make better later)
-Z, --null adds 0 to the end of result-sep
--result-sep SEP String to place between header info and search output
--name-num-sep SEP String to place between file name and line number when both are
enabled
--name-byte-sep SEP String to place between file name and byte number when both are
enabled
--context-group-sep SEP
String to print between context groups
--context-result-sep SEP
String to place between header info and context line
--context-name-num-sep SEP
String to place between file name and line number on context line
--context-name-byte-sep SEP
String to place between file name and byte number on context line
Context Control:
-B NUM, --before-context NUM
print NUM lines of leading context
-A NUM, --after-context NUM
print NUM lines of trailing context
-C NUM, --context NUM
print NUM lines of output context
--color [WHEN], --colour [WHEN]
use ANSI escape codes to highlight the matching strings; WHEN is
'always', 'never', or 'auto'
-U, --binary do not strip CR characters at EOL (MSDOS/Windows)
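The context options (-B, -A, -C) and the separator options interact in a way that is easier to see in code. The following is a conceptual sketch in plain Python, not greplica's implementation: it uses a substring test instead of a regular expression, the names context_groups and format_groups are hypothetical, and the separators are the defaults shown in the library options (':' for matches, '-' for context lines, '--' between groups).

```python
from typing import List, Tuple

Row = Tuple[int, bool, str]  # (1-based line number, is_match, text)


def context_groups(lines: List[str], needle: str,
                   before: int = 0, after: int = 0) -> List[List[Row]]:
    """Group matching lines together with their surrounding context lines."""
    groups: List[List[Row]] = []
    last_emitted = -1  # index of the last line already placed in a group
    for idx, text in enumerate(lines):
        if needle not in text:
            continue
        start = max(idx - before, last_emitted + 1)
        stop = min(idx + after, len(lines) - 1)
        # Open a new group only when lines were skipped since the last output.
        if not groups or start > last_emitted + 1:
            groups.append([])
        for j in range(start, stop + 1):
            groups[-1].append((j + 1, needle in lines[j], lines[j]))
        last_emitted = stop
    return groups


def format_groups(groups: List[List[Row]]) -> str:
    """Render groups with line numbers, ':' for matches and '-' for context."""
    chunks = []
    for group in groups:
        chunks.append(''.join(
            '{}{}{}\n'.format(num, ':' if is_match else '-', text)
            for num, is_match, text in group))
    return '--\n'.join(chunks)


sample = ['a', 'x', 'b', 'c', 'd', 'x', 'e']
print(format_groups(context_groups(sample, 'x', before=1, after=1)), end='')
```

Overlapping or adjacent context ranges are merged into one group, so the group separator only appears where lines were actually skipped.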
Library Help
greplica can be used as a library from another module. The following is a simple example.
from greplica.grep import Grep
grep_obj = Grep()
grep_obj.add_expressions('hello .*ld')
grep_obj.add_files('file1.txt', 'path/to/file2.txt', 'path/to/directory/')
grep_obj.directory_handling_type = Grep.Directory.RECURSE
data = grep_obj.execute()
# File-level results
for f in data.files:
    print('{}, {}, {}, {}'.format(f.filename, f.start_index, f.stop_index, f.num_matches))
# Line-level results
for l in data.lines:
    print('{}, {}, {}, {}'.format(l.filename, l.line_num, l.byte_offset, l.line))
# Informational messages
for i in data.info:
    print('{}, {}'.format(i.filename, i.info))
# Errors encountered during the search
for e in data.errors:
    print('{}, {}'.format(e.filename, e.err_str))
The following describes initialization arguments to Grep.
__init__(self, out_file:io.IOBase=None, err_file:io.IOBase=None, default_in_file:io.IOBase=None)
'''
Initializes Grep
Inputs: out_file - a file object to pass to print() as 'file' for regular messages.
This should be set to sys.stdout if writing to terminal is desired.
Writing to file is skipped when this is set to None. (default: None)
err_file - a file object to pass to print() as 'file' for error messages.
This should be set to sys.stderr if writing to terminal is desired.
Writing to file is skipped when this is set to None. (default: None)
default_in_file - default input file stream used when no files added.
This should be set to sys.stdin if reading from terminal is desired by default.
An exception will be raised on execute() if this is None and no files were added.
(default: None)
'''
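Because out_file is handed to print(), any text stream should work, which suggests capturing the output in memory. The helper below is a sketch under that assumption; the name grep_to_text is hypothetical, and the exact text written (file name prefixes, color codes) depends on the options in effect.

```python
import io


def grep_to_text(pattern: str, *paths: str) -> str:
    """Run a search and return the text greplica writes to out_file."""
    # Imported inside the function so the helper can be defined even on a
    # machine where greplica has not been installed yet.
    from greplica.grep import Grep

    captured = io.StringIO()
    # err_file is left as None, so error messages are not written anywhere.
    grep_obj = Grep(out_file=captured)
    grep_obj.add_expressions(pattern)
    grep_obj.add_files(*paths)
    # The output is already captured, so skip filling the result lists.
    grep_obj.execute(return_matches=False)
    return captured.getvalue()
```

Passing sys.stdout and sys.stderr instead of a StringIO gives the terminal behavior described in the docstring.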
The following methods may be called to add expressions, file paths, and globs.
add_dir_exclude_globs(self, *args:Union[str, List[str]])
'''
Skip directories that match given globs.
'''
add_expressions(self, *args:Union[str, List[str]])
'''
Adds a single expression or list of expressions that Grep will search for in selected files.
Inputs: all arguments must be list of strings or string - each string is an expression
'''
add_file_exclude_globs(self, *args:Union[str, List[str]])
'''
Skip files that match given globs.
'''
add_file_include_globs(self, *args:Union[str, List[str]])
'''
Limit files to those matching given globs.
'''
add_files(self, *args:Union[str, List[str]])
'''
Adds a single file or list of files that Grep will crawl through. Each entry must be a path
to a file or directory. Directories are handled based on value of directory_handling_type.
Inputs: all arguments must be list of strings or string - each string is a file path
'''
clear_dir_exclude_globs(self)
'''
Clear all directory exclude globs previously added by add_dir_exclude_globs().
'''
clear_expressions(self)
'''
Clears all expressions that were previously set by add_expressions().
'''
clear_file_exclude_globs(self)
'''
Clear all file exclude globs previously added by add_file_exclude_globs().
'''
clear_file_include_globs(self)
'''
Clear all file include globs previously added by add_file_include_globs().
'''
clear_files(self)
'''
Clear all files that were previously set by add_files().
'''
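The add_* methods above can be combined to narrow a recursive search. This is a sketch using only the methods and attributes documented here; the helper name search_text_files is hypothetical, and how globs are matched against paths is not specified in this document, so treat the patterns as illustrative.

```python
from typing import List, Tuple


def search_text_files(root: str, pattern: str) -> List[Tuple[str, int]]:
    """Recursively search root, limited to files matching '*.txt'."""
    # Imported inside the function so the helper can be defined even on a
    # machine where greplica has not been installed yet.
    from greplica.grep import Grep

    grep_obj = Grep()
    grep_obj.add_expressions(pattern)
    grep_obj.add_files(root)
    grep_obj.add_file_include_globs('*.txt')  # limit files to this glob
    grep_obj.add_dir_exclude_globs('.git')    # skip directories matching this glob
    grep_obj.directory_handling_type = Grep.Directory.RECURSE
    result = grep_obj.execute()
    # Each entry in result.lines carries the file name and line number.
    return [(entry.filename, entry.line_num) for entry in result.lines]
```

The matching clear_* methods remove the globs again, so one Grep object can be reused for several searches.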
The following Grep options may be adjusted.
search_type:Grep.SearchType = Grep.SearchType.BASIC_REGEXP
ignore_case:bool = False
word_regexp:bool = False
line_regexp:bool = False
no_messages:bool = False
invert_match:bool = False
max_count:int = None
output_line_numbers:bool = False
output_file_name:bool = False
output_byte_offset:bool = False
line_buffered:bool = False
end = b'\n'
results_sep:str = ':'
name_num_sep:str = ':'
name_byte_sep:str = ':'
context_sep:str = '--\n'
context_results_sep:str = '-'
context_name_num_sep:str = '-'
context_name_byte_sep:str = '-'
color_mode:Grep.ColorMode = Grep.ColorMode.AUTO
directory_handling_type:Grep.Directory = Grep.Directory.READ
label:str = '(standard input)'
quiet:bool = False
only_matching:bool = False
binary_parse_function:Grep.BinaryParseFunction = Grep.BinaryParseFunction.PRINT_ERROR
strip_cr:bool = True
before_context_count:int = 0
after_context_count:int = 0
print_matching_files_only:bool = False
print_non_matching_files_only:bool = False
print_count_only:bool = False
space_numbers_by_size:bool = False
grep_color_dict:dict
At any point, reset() may be called to reset all settings.
reset(self)
'''
Resets all Grep state values except for out_file, err_file, and default_in_file.
'''
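As a sketch of how the option attributes fit together, the helper below configures a Grep object roughly like the command line `greplica -i -n -C 1`. The mapping from attribute names to CLI flags is inferred from the names and is an assumption, and build_numbered_search is a hypothetical name.

```python
def build_numbered_search(pattern: str, *paths: str):
    """Return a Grep object set up for a case-insensitive, numbered search."""
    # Imported inside the function so the helper can be defined even on a
    # machine where greplica has not been installed yet.
    from greplica.grep import Grep

    grep_obj = Grep()
    grep_obj.add_expressions(pattern)
    grep_obj.add_files(*paths)
    grep_obj.ignore_case = True          # assumed equivalent of -i
    grep_obj.output_line_numbers = True  # assumed equivalent of -n
    grep_obj.before_context_count = 1    # assumed equivalent of -B 1
    grep_obj.after_context_count = 1     # assumed equivalent of -A 1
    return grep_obj
```

Calling execute() on the returned object runs the search, and calling reset() afterwards puts the options back to the defaults listed above, so ignore_case would read False again.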
The following method executes using all data set above.
execute(self, return_matches:bool=True) -> GrepResult
'''
Executes Grep with all the assigned attributes.
Inputs: return_matches - set to True to fill in lines, info, and errors in the result
- set to False if outputting to terminal is the only thing that is
desired, saving memory
Returns: a GrepResult object
Raises: ValueError if no expressions added
ValueError if no files added and no default input file set during init
'''
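Since execute() raises ValueError when no expressions or no input were set, callers that build a Grep object from user input may want to guard the call. A minimal sketch follows; safe_execute is a hypothetical helper that accepts any already configured Grep object.

```python
import sys


def safe_execute(grep_obj):
    """Run execute() and report the documented ValueError cases instead of raising."""
    try:
        return grep_obj.execute()
    except ValueError as err:
        # Raised when no expressions were added, or when no files were added
        # and no default input file was given at initialization.
        print('greplica is not fully configured: {}'.format(err), file=sys.stderr)
        return None
```

For example, passing a freshly created Grep() with nothing added should return None here, because no expressions have been added yet.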