A command-line tool to help you quickly inspect your log files and identify patterns.
pip install logmine
cat sample/Apache_2k.log | logmine
logmine helps to cluster the logs into multiple clusters with common patterns, along with the number of messages in each cluster.
You can get more granular clusters by adjusting the -m value: the lower the value, the more detail you will get.
cat sample/Apache_2k.log | logmine -m0.2
The text in red is a placeholder for the multiple values that fit the pattern. You can replace it with your own placeholder using the -p option:
cat sample/Apache_2k.log | logmine -m0.2 -p'---'
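To illustrate what the placeholder stands for, here is a minimal sketch of how a pattern could be derived from one cluster: fields shared by every line are kept, and fields that vary are replaced by the placeholder. It assumes all lines in the cluster have already been split into the same number of fields. The paper also aligns lines of different lengths, which this sketch skips. `generate_pattern` is a hypothetical helper, not part of logmine's API, and the two input lines are illustrative.

```python
from typing import List


def generate_pattern(cluster: List[List[str]], placeholder: str) -> List[str]:
    """Hypothetical helper: keep fields shared by all lines, mask the rest.

    Assumes every line in the cluster has the same number of fields.
    """
    pattern = []
    for column in zip(*cluster):
        # Identical across all lines -> keep the literal field;
        # otherwise the field varies, so emit the placeholder instead.
        pattern.append(column[0] if len(set(column)) == 1 else placeholder)
    return pattern


cluster = [
    "jk2_init() Found child 6725 in scoreboard slot 10".split(),
    "jk2_init() Found child 6726 in scoreboard slot 8".split(),
]
print(" ".join(generate_pattern(cluster, "---")))
# prints: jk2_init() Found child --- in scoreboard slot ---
```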
You can define variables to reduce the number of unnecessary patterns and get fewer clusters. For example, the command below replaces all time strings with the <time> variable:
cat sample/Apache_2k.log | logmine -m0.2 -p'---' -v "<time>:/\\d{2}:\\d{2}:\\d{2}/"
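The variable format documented for -v is "name:/regex/". The sketch below shows one way such a spec could be parsed and applied before clustering, so that every timestamp compares as the same value. `parse_variable` and `mask_variables` are hypothetical helpers, not logmine's internal code. The Python raw string `r"...\d..."` plays the role of the shell's double-quoted `\\d` escaping.

```python
from __future__ import annotations

import re


def parse_variable(spec: str) -> tuple[str, re.Pattern]:
    """Hypothetical parser for the documented "name:/regex/" format."""
    name, sep, rest = spec.partition(":")  # split at the first colon only
    if not sep or len(rest) < 2 or not (rest.startswith("/") and rest.endswith("/")):
        raise ValueError(f"expected name:/regex/, got {spec!r}")
    return name, re.compile(rest[1:-1])


def mask_variables(line: str, variables: list[tuple[str, re.Pattern]]) -> str:
    # Replace every match with the variable name. A function replacement
    # is used so the name is inserted literally (no backslash escapes).
    for name, regex in variables:
        line = regex.sub(lambda _match: name, line)
    return line


variables = [parse_variable(r"<time>:/\d{2}:\d{2}:\d{2}/")]
print(mask_variables("[Sun Dec 04 04:47:44 2005] [notice] workerEnv.init() ok", variables))
# prints: [Sun Dec 04 <time> 2005] [notice] workerEnv.init() ok
```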
LogMine is an implementation of the paper of the same name, "LogMine: Fast Pattern Recognition for Log Analytics". The idea is to use a distance function to calculate the distance between two log lines and group them into clusters.
The distance function is designed to work well on log datasets, where all log messages from the same application are generated by a finite set of formats.
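A minimal sketch of such a distance function, following our reading of the paper's formula: the distance is 1 minus the summed field scores divided by the number of fields in the longer line. Matching fields score k1 and matching variables score k2, as described for -k1 and -k2 in the usage text. The function names, and the convention that masked variables look like `<name>`, are assumptions for illustration, not logmine's actual implementation.

```python
from typing import Sequence


def is_variable(token: str) -> bool:
    # Assumption for this sketch: masked variables look like "<name>".
    return len(token) > 2 and token.startswith("<") and token.endswith(">")


def score(x: str, y: str, k1: float = 1.0, k2: float = 1.0) -> float:
    if is_variable(x) and is_variable(y):
        return k2 if x == y else 0.0  # variables match by name
    return k1 if x == y else 0.0      # fixed values match literally


def distance(p: Sequence[str], q: Sequence[str], k1: float = 1.0, k2: float = 1.0) -> float:
    """1 - (sum of field scores) / (field count of the longer line)."""
    longest = max(len(p), len(q))
    if longest == 0:
        return 0.0
    # zip stops at the shorter line; extra fields simply score nothing.
    total = sum(score(x, y, k1, k2) for x, y in zip(p, q))
    return 1.0 - total / longest


a = "Found child 6725 in slot 10".split()
b = "Found child 6726 in slot 8".split()
print(round(distance(a, b), 3))  # prints: 0.333
print(distance(a, "Found child 6725".split()))  # prints: 0.5
```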
The Max Distance variable (max_dist, or the -m option) represents the maximum distance between log messages in the same cluster. The smaller max_dist is, the more clusters are generated. This can be useful for analyzing a set of log messages at multiple levels.
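To show how max_dist controls the number of clusters, here is a simplified one-pass clustering sketch, based on our understanding of the paper: each line is compared with the first member of every existing cluster, joins the first cluster within max_dist, and otherwise starts a new one. It uses a stripped-down distance (k1 = 1, no variables) and omits the optimizations and multi-level hierarchy the paper describes. `cluster_lines` is hypothetical and the log lines are illustrative.

```python
from typing import List


def distance(p: List[str], q: List[str]) -> float:
    # Simplified field-by-field distance: k1 = 1, no variable handling.
    longest = max(len(p), len(q)) or 1
    same = sum(1 for x, y in zip(p, q) if x == y)
    return 1.0 - same / longest


def cluster_lines(lines: List[str], max_dist: float) -> List[List[List[str]]]:
    """Hypothetical one-pass clustering keyed on each cluster's first member."""
    clusters: List[List[List[str]]] = []
    for line in lines:
        fields = line.split()
        for cluster in clusters:
            if distance(cluster[0], fields) <= max_dist:  # cluster[0] = representative
                cluster.append(fields)
                break
        else:  # no cluster was close enough
            clusters.append([fields])
    return clusters


logs = [
    "Found child 6725 in slot 10",
    "Found child 6726 in slot 8",
    "workerEnv.init() ok /etc/httpd/conf/workers2.properties",
    "Found child 6727 in slot 7",
]
for max_dist in (0.6, 0.2):
    print(max_dist, len(cluster_lines(logs, max_dist)))
# prints:
# 0.6 2
# 0.2 4
```

With the default-like 0.6, the three "Found child" lines share a cluster; at 0.2 each differs too much from the others to join, which mirrors why -m0.2 in the examples above yields more granular output.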
More details on the clustering algorithm and pattern generation are available in the paper.
max_dist and many other variables
All contributions are welcome.
Install virtualenv (and optionally twine, if you intend to publish):
python3 -m pip install virtualenv twine
Create the virtualenv (if it does not exist yet):
python3 -m virtualenv -p $(which python3) .v
Activate the virtualenv
source ./.v/bin/activate
Run tests:
./test.sh
Run the dev version:
./logmine sample/Apache_2k.log
Publish (first bump the version in setup.py following semver):
./publish.sh
usage: logmine [-h] [-m MAX_DIST] [-v [VARIABLES [VARIABLES ...]]]
[-d DELIMETERS] [-i MIN_MEMBERS] [-k1 K1] [-k2 K2]
[-s {desc,asc}] [-da] [-p PATTERN_PLACEHOLDER] [-dhp] [-dm]
[-dhv] [-c]
[file [file ...]]
LogMine: a log pattern analyzer
positional arguments:
file Filenames or glob pattern to analyze. Default: stdin
optional arguments:
-h, --help show this help message and exit
-m MAX_DIST, --max-dist MAX_DIST
This parameter controls the granularity of the
clustering algorithm. A lower value provides
more granular clusters (more clusters generated).
Default: 0.6
-v [VARIABLES [VARIABLES ...]], --variables [VARIABLES [VARIABLES ...]]
List of variables to replace before processing the log
file. A variable is a pair of a name and a regex
pattern. Format: "name:/regex/". During processing,
LogMine will consider all texts that match
variable regexes to be the same value. This is useful
to reduce the number of unnecessary clusters generated,
at the cost of processing time. Default: None
-d DELIMETERS, --delimeters DELIMETERS
A regex pattern used to split a line into multiple
fields. Default: "\s+"
-i MIN_MEMBERS, --min-members MIN_MEMBERS
Minimum number of members in a cluster to show in the
result. Default: 2
-k1 K1, --fixed-value-weight K1
Internal weighting variable. This value will be used
as the weight value when two fields have the same
value. This is used in the score function to calculate
the distance between two lines. Default: 1
-k2 K2, --variable-weight K2
Similar to k1 but for comparing variables. Two
variables are considered the same if they have the same
name. Default: 1
-s {desc,asc}, --sorted {desc,asc}
Sort the clusters by number of members. Default: desc
-da, --disable-number-align
Disable number align in output. Default: True
-p PATTERN_PLACEHOLDER, --pattern-placeholder PATTERN_PLACEHOLDER
Use a string as placeholder for patterns in output.
Default: None
-dhp, --disable-highlight-patterns
Disable highlighting for patterns in output. Default:
True
-dm, --disable-mask-variables
Disable masks for variables in output. When disabled
variables will be shown as the actual value. Default:
True
-dhv, --disable-highlight-variables
Disable highlighting for variables in output. Default:
True
-c, --single-core Force LogMine to only run on 1 core. This will
increase the processing time. Note: the result output
can differ from a multi-core run;
this is expected. Default: False
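The -i/--min-members and -s/--sorted options act on the clusters after they are built. A hypothetical sketch of that filtering and ordering, based only on the descriptions in the usage text (keep clusters with at least min_members lines, then sort by member count), could look like this; `select_clusters` is not part of logmine.

```python
from typing import List


def select_clusters(clusters: List[List[str]], min_members: int = 2, order: str = "desc") -> List[List[str]]:
    """Hypothetical post-processing mirroring -i/--min-members and -s/--sorted."""
    if order not in ("desc", "asc"):
        raise ValueError("order must be 'desc' or 'asc'")
    # Drop clusters that are too small to be shown in the result.
    kept = [c for c in clusters if len(c) >= min_members]
    # Sort by number of members, largest first for "desc".
    return sorted(kept, key=len, reverse=(order == "desc"))


clusters = [["a"], ["b1", "b2", "b3"], ["c1", "c2"]]
print([len(c) for c in select_clusters(clusters)])            # prints: [3, 2]
print([len(c) for c in select_clusters(clusters, 1, "asc")])  # prints: [1, 2, 3]
```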
By default, logmine writes the analysis results to stdout. To capture this output instead, pass a file-like object to the set_output_file() method, as in the example below:
import io

buffer = io.StringIO()
lm = LogMine() # pass the usual parameters
lm.output.set_output_file(file=buffer)
lm.run()
# The captured output can be accessed in the buffer.
print(buffer.getvalue())