Merge pull request #429 from grooverdan/filter-develop-doco

DOC: Filter development doco
2013-11-10 14:10:10 -08:00 · 2013-11-10 14:10:10 -08:00 · e8aa676cf5
parent 191c4fda1b b8f40fef1b
commit e8aa676cf5
1 changed files with 88 additions and 9 deletions
--- a/97
+++ b/97
@ -289,15 +289,19 @@ TIP: Some applications log spaces at the end. If you are not sure add \s*$ as
     the end part of the regex.
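
A minimal sketch of the trailing-space tip above, written with Python's re module and kept outside the patch itself. The log line and both patterns are hypothetical and only illustrate the effect of ending a regex with \s*$ rather than a bare $.

```python
import re

# Hypothetical log line with two trailing spaces (not from a real application).
line = "Authentication failure from 192.0.2.10  "

# Ends with a bare $: the trailing spaces prevent a match.
strict = r"^Authentication failure from (?P<host>\S+)$"
# Ends with \s*$: trailing whitespace is tolerated.
tolerant = r"^Authentication failure from (?P<host>\S+)\s*$"

strict_match = re.search(strict, line)
tolerant_match = re.search(tolerant, line)

print("strict:", strict_match)
print("tolerant:", tolerant_match.group("host") if tolerant_match else None)
```

The strict pattern fails because \S+ cannot consume the spaces and $ only matches at the very end of the string.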

 If your regex is not matching, http://www.debuggex.com/?flavor=python can help
-to tune it:
+to tune it. fail2ban-regex -D ... will present Debuggex URLs for the regexes
+and sample log files that you pass into it.

+In general, when using regex debuggers to generate fail2ban filters:
 * use regex from the ./fail2ban-regex output (to ensure all substitutions are
-done) and replace <HOST> with (?&.ipv4). Make sure that regex type set to
-Python;
-* for the test data put your log output with the time removed;
- when you have fixed the regex put it back into your filter file.
+done)
+* replace <HOST> with (?&.ipv4)
+* make sure that the regex type is set to Python
+* for the test data, use your log output with the date/time removed

-Please spread the good word about debuggex - Serge Toarca is kindly continuing
+When you have fixed the regex, put it back into your filter file.
+
+Please spread the good word about Debuggex - Serge Toarca is kindly continuing
 its free availability to Open Source developers.

 Finishing up:
@ -327,7 +331,7 @@ failregex, while matching inserted text to the <HOST> part, they have the
 ability to deny any host they choose.

 So the <HOST> part must be anchored on text generated by the application, and
-not the user, to a extent sufficient to prevent user inserting the entire text
+not the user, to an extent sufficient to prevent the user inserting the entire text
 matching this or any other failregex.

 Ideally filter regex should anchor at the beginning and at the end of log line.
@ -377,7 +381,7 @@ Note if we'd just had the expression:
 Then provided the user put a space in their command they would have never been
 banned.

-2. Filter regex can match other user injected data
+2. Unanchored regex can match other user injected data

 From the Apache vulnerability CVE-2013-2178
 ( original ref: https://vndh.net/note:fail2ban-089-denial-service ).
@ -398,7 +402,82 @@ Now the log line will be:
 As this log line doesn't match other expressions hence it matches the above
 regex and blocks 192.168.33.1 as a denial of service from the HTTP requester.

-3. Application generates two identical log messages with different meanings
+3. Overly greedy pattern matching
+
+From: https://github.com/fail2ban/fail2ban/pull/426
+
+An example ssh log (simplified):
+
+    Sep 29 17:15:02 spaceman sshd[12946]: Failed password for user from 127.0.0.1 port 20000 ssh1: ruser remoteuser
+
+As we assume the username can include anything, including spaces, it's prudent
+to put .* here. The remote user can also be anything, so let's not make
+assumptions there either.
+
+    failregex = ^%(__prefix_line)sFailed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
+
+So this works. The problem arises if the text matched by the .* after ruser is
+injected by the user to be 'from 1.2.3.4'. The resultant log line is:
+
+    Sep 29 17:15:02 spaceman sshd[12946]: Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4
+
+Testing with:
+
+    fail2ban-regex -v 'Sep 29 17:15:02 Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4' '^ Failed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$'
+
+TIP: I've removed the bit that matches __prefix_line from the regex and log.
+
+Shows:
+
+    1) [1] ^ Failed \S+ for .* from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
+       1.2.3.4  Sun Sep 29 17:15:02 2013
+
+It should have matched 127.0.0.1. The first greedy .* in the regex matched
+until the end of the string. There was no "from <HOST>" left to match, so the
+regex engine worked backwards from the end of the string until this was matched.
+
+The result was that 1.2.3.4, injected by the user, was matched and the wrong IP
+was banned.
+
+The solution here is to make the first .* non-greedy with .*?. Here it matches
+as little as required and the fail2ban-regex tool shows the output:
+
+    fail2ban-regex -v 'Sep 29 17:15:02 Failed password for user from 127.0.0.1 port 20000 ssh1: ruser from 1.2.3.4' '^ Failed \S+ for .*? from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$'
+
+    1) [1] ^ Failed \S+ for .*? from <HOST>( port \d*)?( ssh\d+)?(: ruser .*)?$
+       127.0.0.1  Sun Sep 29 17:15:02 2013
+
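The same greedy versus non-greedy behaviour can be reproduced outside fail2ban-regex. This is only a sketch using Python's re module: the HOST pattern below is an assumed, simplified IPv4 stand-in for the <HOST> tag, not fail2ban's actual substitution, and the date is stripped from the log line by hand.

```python
import re

# Assumption: simplified IPv4 pattern standing in for fail2ban's <HOST> tag.
HOST = r"(?P<host>\d{1,3}(?:\.\d{1,3}){3})"

# Log line from the example above with the date removed (note the leading space).
line = (" Failed password for user from 127.0.0.1 port 20000 ssh1: "
        "ruser from 1.2.3.4")

tail = r"( port \d*)?( ssh\d+)?(: ruser .*)?$"
greedy = r"^ Failed \S+ for .* from " + HOST + tail
lazy = r"^ Failed \S+ for .*? from " + HOST + tail

# Greedy .* backtracks from the end and stops at the last " from ".
greedy_host = re.search(greedy, line).group("host")
# Non-greedy .*? expands from the start and stops at the first " from ".
lazy_host = re.search(lazy, line).group("host")

print("greedy:", greedy_host)
print("non-greedy:", lazy_host)
```

With the greedy pattern the captured host is the injected 1.2.3.4; with .*? it is 127.0.0.1, in line with the fail2ban-regex output above.
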
+So the general case here is a log line that contains:
+
+    (fixed_data_1)<HOST>(fixed_data_2)(user_injectable_data)
+
+Where the regex that matches fixed_data_1 is greedy and matches the entire
+string, before moving backwards, and where user_injectable_data can supply
+text that the regex then matches.
+
+Another case:
+
+ref: https://www.debuggex.com/r/CtAbeKMa2sDBEfA2/0
+
+A webserver logs the following without URL escaping:
+
+    [error] 2865#0: *66647 user "xyz" was not found in "/file", client: 1.2.3.1, server: www.host.com, request: "GET ", client: 3.2.1.1, server: fake.com, request: "GET exploited HTTP/3.3", host: "injected.host", host: "www.myhost.com"
+
+regex:
+
+    failregex = ^ \[error\] \d+#\d+: \*\d+ user "\S+":? (?:password mismatch|was not found in ".*"), client: <HOST>, server: \S+, request: "\S+ .+ HTTP/\d+\.\d+", host: "\S+"
+
+The .* matches to the end of the string. It finds that it can't continue to match
+", client ... so it moves back from the end and finds the user-injected web URL:
+
+    ", client: 3.2.1.1, server: fake.com, request: "GET exploited HTTP/3.3", host: "injected.host
+
+In this case there is a fixed host: "www.myhost.com" at the end so the solution
+is to anchor the regex at the end with a $.
+
+If this wasn't the case, then the first .* would need to be changed so that it
+didn't capture beyond <HOST>.
+
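The webserver case can be checked in the same way. This is a sketch with Python's re module, not fail2ban itself: HOST is again an assumed, simplified IPv4 stand-in for <HOST>, and the sample line is given a leading space as if the date had been removed.

```python
import re

# Assumption: simplified IPv4 pattern standing in for fail2ban's <HOST> tag.
HOST = r'(?P<host>\d{1,3}(?:\.\d{1,3}){3})'

# Sample line from above; leading space assumed to remain after date removal.
line = (
    ' [error] 2865#0: *66647 user "xyz" was not found in "/file", '
    'client: 1.2.3.1, server: www.host.com, request: "GET ", '
    'client: 3.2.1.1, server: fake.com, '
    'request: "GET exploited HTTP/3.3", host: "injected.host", '
    'host: "www.myhost.com"'
)

unanchored = (
    r'^ \[error\] \d+#\d+: \*\d+ user "\S+":? '
    r'(?:password mismatch|was not found in ".*"), client: ' + HOST +
    r', server: \S+, request: "\S+ .+ HTTP/\d+\.\d+", host: "\S+"'
)
anchored = unanchored + '$'

unanchored_match = re.search(unanchored, line)
anchored_match = re.search(anchored, line)

print("unanchored:", unanchored_match.group("host"))
print("anchored:", anchored_match)
```

Without the $ the greedy .* lets the regex settle on the injected client: 3.2.1.1. With the $ that match is no longer possible; for this simplified sample line the anchored pattern matches nothing at all, so the injected address is never captured.
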
+4. Application generates two identical log messages with different meanings

 If the application generates the following two messages under different
 circumstances: