fluent-plugin-grok-parser
This is a Fluentd plugin to enable Logstash's Grok-like parsing logic.
| fluent-plugin-grok-parser | fluentd    | ruby   |
|---------------------------|------------|--------|
| >= 2.0.0                  | >= v0.14.0 | >= 2.1 |
| < 2.0.0                   | >= v0.12.0 | >= 1.9 |
Grok is a macro to simplify and reuse regexes, originally developed by Jordan Sissel.
This is a partial implementation of Grok's grammar that should meet most needs.
You can use it wherever you used the format parameter to parse text. In the following example, it extracts the first IP address that matches in the log.
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type grok
grok_pattern %{IP:ip_address}
</parse>
</source>
If you want to try multiple grok patterns and use the first matched one, you can use the following syntax:
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type grok
<grok>
pattern %{COMBINEDAPACHELOG}
time_format "%d/%b/%Y:%H:%M:%S %z"
</grok>
<grok>
pattern %{IP:ip_address}
</grok>
<grok>
pattern %{GREEDYDATA:message}
</grok>
</parse>
</source>
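The first-match behaviour can be pictured with a small Ruby sketch. It is not the plugin's implementation: the regexes are simplified stand-ins for expanded grok patterns, and first_match is a hypothetical helper introduced only for illustration.

```ruby
# Simplified stand-ins for expanded grok patterns (assumptions, not the
# plugin's bundled definitions). Order matters: the first match wins.
ORDERED_PATTERNS = [
  /(?<ip_address>\d{1,3}(?:\.\d{1,3}){3})/,  # roughly %{IP:ip_address}
  /(?<message>.*)/                           # roughly %{GREEDYDATA:message}
].freeze

# Hypothetical helper: return the named captures of the first pattern
# that matches, or nil when none does.
def first_match(line, patterns = ORDERED_PATTERNS)
  patterns.each do |re|
    m = re.match(line)
    return m.named_captures if m
  end
  nil
end

p first_match("192.168.0.1 GET /index.html")  # matched by the IP pattern
p first_match("hello world")                  # falls through to the catch-all
```

Because the catch-all pattern comes last, it only sees lines that the more specific patterns rejected.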
You can also parse multiline text.
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type multiline_grok
grok_pattern %{IP:ip_address}%{GREEDYDATA:message}
multiline_start_regexp /^[^\s]/
</parse>
</source>
You can use multiple grok patterns to parse your data.
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type multiline_grok
<grok>
pattern Started %{WORD:verb} "%{URIPATH:pathinfo}" for %{IP:ip} at %{TIMESTAMP_ISO8601:timestamp}\nProcessing by %{WORD:controller}#%{WORD:action} as %{WORD:format}%{DATA:message}Completed %{NUMBER:response} %{WORD} in %{NUMBER:elapsed} (%{DATA:elapsed_details})
</grok>
</parse>
</source>
Note that when no pattern matches, Fluentd keeps accumulating data in the buffer indefinitely while it waits for complete data. You can use this parser without multiline_start_regexp when you know your data structure perfectly.
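To make the role of the start regexp concrete, here is a minimal Ruby sketch of how such a regexp can delimit records. group_multiline is a hypothetical helper, not part of the plugin, and unlike a tailing input it sees the whole input at once, so it can flush the last record at the end.

```ruby
# Hypothetical helper illustrating the idea behind multiline_start_regexp:
# a line that matches the start regexp opens a new record, and every other
# line is appended to the record currently being buffered.
def group_multiline(lines, start_regexp)
  records = []
  buffer = nil
  lines.each do |line|
    if start_regexp.match?(line)
      records << buffer if buffer   # the previous record is now complete
      buffer = line.dup
    elsif buffer
      buffer << "\n" << line        # continuation line
    end
    # lines seen before any start line are dropped in this sketch
  end
  records << buffer if buffer       # flush the final record
  records
end

lines = [
  "192.168.0.1 first line",
  "  continued",
  "10.0.0.1 second record"
]
p group_multiline(lines, /^[^\s]/)
```

Each grouped record is what a grok pattern would then be applied to as a single string.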
See also: Config: Parse Section - Fluentd
time_format (string) (optional): The format of the time field.
grok_pattern (string) (optional): The grok pattern. You cannot specify multiple grok patterns with this parameter.
custom_pattern_path (string) (optional): Path to a file that contains custom grok patterns.
grok_failure_key (string) (optional): The key under which the grok failure reason is stored.
grok_name_key (string) (optional): The key under which the name of the matching grok section is stored.
multiline_start_regexp (string) (optional): The regexp that matches the beginning of a multiline record. This is only for "multiline_grok".
Example using grok_failure_key:
<source>
@type dummy
@label @dummy
dummy [
{ "message1": "no grok pattern matched!", "prog": "foo" },
{ "message1": "/", "prog": "bar" }
]
tag dummy.log
</source>
<label @dummy>
<filter>
@type parser
key_name message1
reserve_data true
reserve_time true
<parse>
@type grok
grok_failure_key grokfailure
<grok>
pattern %{PATH:path}
</grok>
</parse>
</filter>
<match dummy.log>
@type stdout
</match>
</label>
This generates the following events:
2016-11-28 13:07:08.009131727 +0900 dummy.log: {"message1":"no grok pattern matched!","prog":"foo","message":"no grok pattern matched!","grokfailure":"No grok pattern matched"}
2016-11-28 13:07:09.010400923 +0900 dummy.log: {"message1":"/","prog":"bar","path":"/"}
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type grok
grok_name_key grok_name
grok_failure_key grokfailure
<grok>
name apache_log
pattern %{COMBINEDAPACHELOG}
time_format "%d/%b/%Y:%H:%M:%S %z"
</grok>
<grok>
name ip_address
pattern %{IP:ip_address}
</grok>
<grok>
name rest_message
pattern %{GREEDYDATA:message}
</grok>
</parse>
</source>
This adds keys like the following:
grok_name: "apache_log" if the record matches COMBINEDAPACHELOG
grok_name: "ip_address" if the record matches IP
grok_name: "rest_message" if the record matches GREEDYDATA
The grokfailure key is added to the record if the record does not match any grok pattern.
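The way the two keys end up in a record can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the plugin's code: the section regexes are simplified, parse_with_names is a hypothetical helper, and the failure record simply mirrors the sample output shown earlier (original text under message plus the failure reason).

```ruby
# Hypothetical stand-ins for named <grok> sections; the regexes are
# simplified assumptions, not the bundled pattern definitions.
NAMED_SECTIONS = [
  ["ip_address", /(?<ip_address>\d{1,3}(?:\.\d{1,3}){3})/],
  ["word_only",  /\A(?<word>[a-z]+)\z/]
].freeze

# Try each section in order. On success, tag the record with the section
# name; on failure, keep the text and record the failure reason.
def parse_with_names(text, sections: NAMED_SECTIONS,
                     name_key: "grok_name", failure_key: "grokfailure")
  sections.each do |name, re|
    m = re.match(text)
    return m.named_captures.merge(name_key => name) if m
  end
  { "message" => text, failure_key => "No grok pattern matched" }
end

p parse_with_names("10.0.0.5 connected")  # tagged with grok_name "ip_address"
p parse_with_names("hello")               # tagged with grok_name "word_only"
p parse_with_names("???")                 # carries the grokfailure key
```

With a catch-all section such as GREEDYDATA at the end of the list, the failure branch is never reached, which is why the sketch omits one.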
See also test code for more details.
<source>
@type tail
path /path/to/log
tag grokked_log
<parse>
@type grok
<grok>
name mylog-without-timezone
pattern %{DATESTAMP:time} %{GREEDYDATA:message}
timezone Asia/Tokyo
</grok>
</parse>
</source>
This parses the time value in the "Asia/Tokyo" timezone.
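As a rough illustration of what interpreting a zone-less timestamp in a given timezone means, the Ruby sketch below attaches a fixed UTC offset before parsing. parse_in_offset is a hypothetical helper, and the timestamp layout is an assumption. Asia/Tokyo is UTC+09:00 all year round, so a constant offset is sufficient for this example; zones with daylight saving time would need a real timezone database.

```ruby
require "time"

# Hypothetical helper: interpret a timestamp that carries no zone
# information as if it had been written in the given UTC offset.
def parse_in_offset(text, format, offset)
  Time.strptime("#{text} #{offset}", "#{format} %z")
end

t = parse_in_offset("2016-11-28 13:07:08", "%Y-%m-%d %H:%M:%S", "+0900")
puts t.utc_offset       # 32400 seconds east of UTC
puts t.getutc.iso8601   # 2016-11-28T04:07:08Z
```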
See Config: Parse Section - Fluentd for more details about timezone.
Grok patterns look like %{PATTERN_NAME:name} where ":name" is optional. If "name" is provided, then it becomes a named capture. So, for example, if you have the grok pattern
%{IP} %{HOST:host}
it matches
127.0.0.1 foo.example
but only extracts "foo.example" as {"host": "foo.example"}.
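The named-capture rule can be sketched in a few lines of Ruby. The pattern table below is a deliberately tiny, simplified assumption (the bundled IP and HOST patterns are more thorough), and expand_grok is a hypothetical helper rather than the plugin's API.

```ruby
# Tiny, simplified pattern table (an assumption for illustration).
BASE_PATTERNS = {
  "IP"   => '\d{1,3}(?:\.\d{1,3}){3}',
  "HOST" => '[A-Za-z0-9][A-Za-z0-9.-]*'
}.freeze

# Hypothetical helper: expand %{NAME} and %{NAME:field} into a Regexp.
# A field name becomes a named capture; without one, the pattern is only
# matched and nothing is extracted.
def expand_grok(pattern, table = BASE_PATTERNS)
  source = pattern.gsub(/%\{(?<name>[A-Z0-9_]+)(?::(?<field>\w+))?\}/) do
    m = Regexp.last_match
    body = table.fetch(m[:name])
    m[:field] ? "(?<#{m[:field]}>#{body})" : "(?:#{body})"
  end
  Regexp.new(source)
end

re = expand_grok("%{IP} %{HOST:host}")
p re.match("127.0.0.1 foo.example").named_captures  # only "host" is extracted
```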
Please see patterns/* for the patterns that are supported out of the box.
You can add your own Grok patterns by creating your own Grok file and telling the plugin to read it. This is what the custom_pattern_path parameter is for.
<source>
@type tail
path /path/to/log
<parse>
@type grok
grok_pattern %{MY_SUPER_PATTERN}
custom_pattern_path /path/to/my_pattern
</parse>
</source>
custom_pattern_path can be either a directory or a file. If it is a directory, the plugin reads all the files in it.
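The file-or-directory handling can be sketched as below. load_custom_patterns is a hypothetical helper, and the file layout is an assumption: one "NAME pattern" definition per line, as in common grok pattern files, with blank lines and lines starting with "#" ignored.

```ruby
require "tmpdir"

# Hypothetical loader: accepts a file or a directory and returns a
# name => pattern hash built from "NAME pattern" lines (assumed layout).
def load_custom_patterns(path)
  files =
    if File.directory?(path)
      Dir.glob(File.join(path, "*")).select { |f| File.file?(f) }.sort
    else
      [path]
    end
  files.each_with_object({}) do |file, patterns|
    File.foreach(file) do |line|
      line = line.strip
      next if line.empty? || line.start_with?("#")
      name, body = line.split(/\s+/, 2)
      patterns[name] = body if body
    end
  end
end

# Usage: write a throwaway pattern file and load its directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "my_pattern"), "# custom\nMY_SUPER_PATTERN \\d+ items\n")
  p load_custom_patterns(dir)
end
```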
Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.
The syntax is
grok_pattern %{GROK_PATTERN:NAME:TYPE}...
e.g.,
grok_pattern %{INT:foo:integer}
Unspecified fields are parsed as the default string type.
The supported types are listed below:
string
bool
integer ("int" would NOT work!)
float
time
array
For the time and array types, an optional extra field can follow the type name. For the "time" type, you can specify a time format like you would in time_format.
For the "array" type, the extra field specifies the delimiter (the default is ","). For example, if a field called "item_ids" contains the value "3,4,5", types item_ids:array parses it as ["3", "4", "5"]. Alternatively, if the value is "Adam|Alice|Bob", types item_ids:array:| parses it as ["Adam", "Alice", "Bob"].
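The conversions described above can be sketched in Ruby as follows. convert is a hypothetical helper, and the exact rules are assumptions (for instance, which strings count as true for bool). The time type is left out to keep the sketch short, and any unlisted type name, including "int", is rejected.

```ruby
# Hypothetical converter for captured string values. type_spec is a type
# name optionally followed by ":" and an argument, e.g. "array:|".
def convert(value, type_spec)
  type, arg = type_spec.split(":", 2)
  case type
  when "string"  then value
  when "bool"    then %w[true yes 1].include?(value.downcase)  # assumed truthy strings
  when "integer" then value.to_i
  when "float"   then value.to_f
  when "array"   then value.split(arg || ",")  # delimiter defaults to ","
  else raise ArgumentError, "unsupported type: #{type}"
  end
end

p convert("42", "integer")
p convert("3,4,5", "array")
p convert("Adam|Alice|Bob", "array:|")
```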
Here is a sample config using the Grok parser with in_tail and type annotations in the pattern:
<source>
@type tail
path /path/to/log
format grok
grok_pattern %{INT:user_id:integer} paid %{NUMBER:paid_amount:float}
tag payment
</source>
If you want to use this plugin with Fluentd v0.12.x or earlier, use version 1.x of this plugin.
See also: Plugin Management | Fluentd
Apache 2.0 License