Updated Developing Regex in Fail2ban (markdown)

master
Egbert 2020-09-30 16:41:50 -04:00
parent 5aa9acb324
commit 0a35688424
1 changed files with 15 additions and 13 deletions

@ -57,29 +57,29 @@ PRE-FILTER MATCHED
==================
If you have a single-line pattern, skip this section and leave `prefregex` empty or undefined.
Ideally, `prefregex` describes the part that is common to every line within the same log file. Common patterns found in a line-by-line log file include:
* date (pretty much always)
* daemon name (optional)
* subroutine name and/or line number (optional)
* process ID (optional)
* severity level (optional)
So, an ideal `prefregex` must properly support every combination of the patterns listed above (some always present, others mostly optional) so that it works for everyone who uses the application that generates the logs.
A secondary benefit of `prefregex` is that it leaves `failregex` with only the most dynamic (and interesting) part of the line, because `prefregex` takes the most common parts of the line (see the list above).
```console
<---- prefregex --->|<------- failregex ------>
3-Jan-2020 myscript: Dynamic error message part
```
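To make the split concrete, here's a rough sketch in Python (fail2ban is written in Python, so its regexes use Python `re` syntax). This is not fail2ban's code: it only simulates the two stages with plain `re`, and the named group `content` stands in for the `<F-CONTENT>` tag. The date is kept in the pattern only to mirror the diagram; in fail2ban the date is normally matched separately via `datepattern` before `prefregex` sees the line.

```python
import re

# Hypothetical log line, taken from the diagram above.
line = "3-Jan-2020 myscript: Dynamic error message part"

# Stage 1: a prefregex-like pattern consumes the common prefix and captures
# the rest in a named group (standing in for fail2ban's <F-CONTENT> tag).
prefregex = re.compile(r"^\d{1,2}-[A-Z][a-z]{2}-\d{4} myscript: (?P<content>.+)$")

# Stage 2: a failregex-like pattern only ever sees the captured content.
failregex = re.compile(r"^Dynamic error (?P<msg>.+)$")


def two_stage_match(text):
    """Return the failregex match on the content, or None if either stage fails."""
    pre = prefregex.match(text)
    if pre is None:
        return None  # prefix did not match, so failregex is never tried
    return failregex.match(pre.group("content"))


m = two_stage_match(line)
print(m.group("msg"))  # message part
```

Note that when the prefix doesn't match, the failregex stage is never tried at all. That is why a too-strict `prefregex` can silently hide every failure.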
Furthermore, the really good reason to support `prefregex`, single pattern or not, is to accommodate whichever daemon/script writes that log file. Each user of your filter may configure that daemon differently, including or excluding certain fields on each line of the log file, and your filter has to support all of those variations. If the daemon/script has no logging parameters that affect the format of its log file, then `prefregex` may not be for you.
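Here is a minimal sketch of that idea, again in plain Python `re` rather than fail2ban itself. The `myscript` daemon name, the `[1234]` process ID and the `[warn]` severity format are all hypothetical. Optional non-capturing groups `(?:...)?` let one prefix pattern accept every combination of those settings, so the content handed on to the failregex stage stays the same.

```python
import re

# Hypothetical lines from the same daemon under four different user settings.
lines = [
    "myscript: login failed",               # plain
    "myscript[1234]: login failed",         # with process ID
    "myscript: [warn] login failed",        # with severity level
    "myscript[1234]: [warn] login failed",  # with both
]

# One prefregex-like pattern with optional groups covers every combination.
prefregex = re.compile(
    r"^myscript"
    r"(?:\[\d+\])?"        # optional process ID, e.g. [1234]
    r": "
    r"(?:\[[a-z]+\] )?"    # optional severity level, e.g. [warn]
    r"(?P<content>.+)$"    # stands in for <F-CONTENT>
)

for line in lines:
    print(prefregex.match(line).group("content"))  # "login failed" each time
```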
To Pre-Filter or Not To Pre-Filter
-----------------------------------
This section only applies if you have (or will have) multiple patterns within this same filter file that you are creating or modifying.
If a pre-defined `prefregex` already exists and you know it works, you can move on to the next section. If you are creating one, read on.
You can tell whether the (default or customized) `prefregex` actually works by adding '`-l HEAVYDEBUG`' to your `fail2ban-regex` command line:
```bash
@ -98,7 +98,7 @@ and note the value of `'content:'`. This content comes after the `datepattern`;
Note: the `'content:'` value has an extra space at its beginning, so be careful with the `^` and make sure your pattern starts with `^ ` (note the space after the caret symbol).
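A quick way to convince yourself of that leading space is Python's `re` module on a hypothetical `'content:'` value. The address comes from the 192.0.2.0/24 documentation range.

```python
import re

# Hypothetical 'content:' value; note the leading space.
content = " query from 192.0.2.1 denied"

print(bool(re.match(r"^query", content)))   # False: content starts with a space
print(bool(re.match(r"^ query", content)))  # True: note the space after ^
```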
In this example, I've opted to use the optional `prefregex` because I know there is going to be more than one fail-matched pattern, and I don't want future contributors to have to deal with it later on.
NEW CONFIG FILE
===============
@ -115,7 +115,7 @@ prefregex = ^ <F-CONTENT>.+</F-CONTENT>$
```
The above custom `prefregex` ensures that the leading space character is removed before the remaining content is sent to `failregex`. This new `prefregex` returns just the interesting part captured by '`<F-CONTENT>.+</F-CONTENT>`', which is basically everything after that lone (but unwanted) space character.
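Here is roughly what that `prefregex` does, sketched with plain Python `re`. The named group `content` stands in for the `<F-CONTENT>` tag. This is a simulation, not fail2ban's internals, and the content value is hypothetical.

```python
import re

content = " query from 192.0.2.1 denied"  # hypothetical, leading space included

# Plain-re stand-in for: prefregex = ^ <F-CONTENT>.+</F-CONTENT>$
prefregex = re.compile(r"^ (?P<content>.+)$")

stripped = prefregex.match(content).group("content")
print(repr(stripped))                       # 'query from 192.0.2.1 denied'
print(bool(re.match(r"^query", stripped)))  # True: failregex can now start with ^query
```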
WARNING: This is a greedy regex. Many regexes are unsafe: they contain neither a start anchor (`^`) nor an end anchor (`$`), and they contain catch-alls like `.+`, especially when immediately followed by the imprecise `<HOST>` tag, which accepts almost any word as a hostname.
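Here's a small, hypothetical demonstration of why that matters, in plain Python `re`. The log line is made up, and `HOST` is a simplified IPv4-only stand-in for fail2ban's much richer `<HOST>` pattern. Because `.+` is greedy and nothing anchors the pattern, the loose version grabs the *last* address-looking token on the line. Here that token is attacker-controlled text, not the client.

```python
import re

# Hypothetical line: the client is 192.0.2.1, but the queried name is
# attacker-controlled text that happens to look like an address.
line = "query from 192.0.2.1 for name 203.0.113.9"

# Simplified stand-in for <HOST>: any dotted IPv4-looking token.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

loose = re.compile(r"query.+ " + HOST)                           # unanchored, greedy .+
strict = re.compile(r"^query from " + HOST + r" for name \S+$")  # anchored, explicit

print(loose.search(line).group("host"))   # 203.0.113.9 (the wrong, injected address)
print(strict.search(line).group("host"))  # 192.0.2.1
```

The anchored version spells out what comes before and after the address, so the host can only come from one place.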
Back on track: when I run `fail2ban-regex` again with '`-l HEAVYDEBUG`', the new output shows:
```console
@ -138,7 +138,7 @@ Remember the above command; we are going to use it each time we modified the fil
FAILREGEX MATCHED
==================
Focus on the `failregex` portion of the filter config file. New regex patterns for `failregex` go under the `[Definition]` section.
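For orientation, here is a sketch of how such a `[Definition]` section might look, read back with Python's standard `configparser`. Fail2ban has its own config reader, so treat this only as an approximation of how the INI layout is parsed. The patterns are hypothetical. The point is that each indented continuation line under `failregex` adds another pattern.

```python
import configparser

# Hypothetical local filter file contents; the patterns are illustrative only.
FILTER = r"""
[Definition]
prefregex = ^ <F-CONTENT>.+</F-CONTENT>$
failregex = ^query from <HOST> for name \S+$
            ^query denied for <HOST>$
"""

# interpolation=None so that characters like % in regexes are taken literally
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(FILTER)

# Each indented continuation line is one more failregex pattern.
for pattern in parser["Definition"]["failregex"].splitlines():
    print(pattern)
```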
Using `failregex` means that there MUST be at least one regex group match such as:
* '`<HOST>`' - hostname
@ -150,16 +150,16 @@ Using `failregex` means that there MUST be at least one regex group match such a
* '`<F-NOFAIL>`' - Used as a mark for no-failure condition for a helper to accumulate
* '`<F-MLFID>`' - Multi-line failure ID, used to group related lines into one multi-line set.
* '`<F-MLFFORGET>`' - Forget the multi-line set identified by `<F-MLFID>`.
* '`<F-USER>`' - Unix-like username (login, ssh)
* '`<F-ALT_USER>`' - Indicates non-Unix username (Dovecot's SMTP account name).
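To see what those tags buy you, here's a hypothetical, greatly simplified expansion of `<HOST>` and `<F-USER>` into ordinary Python named groups. Fail2ban performs its own, much more elaborate expansion internally. `TAGS` and `expand_tags` below are made up for illustration, and the `<HOST>` stand-in only accepts IPv4 addresses.

```python
import re

# Hypothetical, greatly simplified tag expansion; fail2ban's real <HOST>
# pattern also accepts IPv6 addresses and hostnames.
TAGS = {
    "<HOST>": r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})",
    "<F-USER>": "(?P<user>",
    "</F-USER>": ")",
}


def expand_tags(pattern):
    """Replace each known tag with a plain-re named group (illustration only)."""
    for tag, group in TAGS.items():
        pattern = pattern.replace(tag, group)
    return pattern


failregex = r"^login failed for <F-USER>\S+</F-USER> from <HOST>$"
m = re.match(expand_tags(failregex), "login failed for alice from 192.0.2.1")
print(m.group("user"), m.group("host"))  # alice 192.0.2.1
```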
So, do what I do… Make a generic `failregex` in your new local filter config file, like this:
```ini
failregex = query.+<HOST>
```
WARNING: Don't make my example a permanent change, because `.+` is evil. Do no evil (except during this troubleshooting and development of the regex). Just don't forget to eventually replace every `.+` and `.*` with a more specific, static pattern. Also don't forget to make sure `^` is at the beginning and to add `$` at the end. Skip the `$` for now, though, as we're still developing a working match.
Notice that there is no '`$`' to catch the end-of-line match condition? Add those `$` last, because we're just trying to match… ANYTHING!
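A small sketch of why the `$` can wait until the end, in plain Python `re` with hypothetical content lines. The loose development pattern matches both lines. The final pattern is anchored with `^` and `$`, so it rejects the line with trailing text.

```python
import re

# Hypothetical content lines (after prefregex has removed the leading space).
good = "query from 192.0.2.1 denied"
noisy = "query from 192.0.2.1 denied; extra text appended by someone else"

dev = re.compile(r"query.+denied")                                  # development: match anything
final = re.compile(r"^query from \d{1,3}(?:\.\d{1,3}){3} denied$")  # anchored at both ends

for text in (good, noisy):
    print(bool(dev.search(text)), bool(final.search(text)))
# prints "True True" for good, then "True False" for noisy
```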
Re-run `fail2ban-regex` with '`-l HEAVYDEBUG`' and notice the '`T: Matched FailRegex part`':
```console
@ -169,7 +169,9 @@ Now I am matching SOMETHING!
Notice the convoluted patterns after '`query.+`'? These long patterns represent the '`<HOST>`' part. We can safely ignore them for now.
Most importantly, I am MATCHING something that starts with '`^query.+`'! Yippee!
That evil `.+` is only temporary; we'll get rid of that at the end.
GYRATING TOWARD FULL MATCH
==========================