There are two ways that it would make sense to write the OpenRC
service script for fail2ban:
1. Use the fail2ban-client program to stop, start, reload, etc. the
server; and try to figure out whether or not it worked afterwards.
2. Use the start-stop-daemon program built into OpenRC to manage the
fail2ban-server process. This works only for starting and stopping,
because the "reload" command is sent over an undocumented protocol,
but has the benefit that you get immediate feedback about the result
of calling fail2ban-server.
The existing service script combined the two in a way that appeared to
work, but didn't make too much sense. It used start-stop-daemon to
initiate the fail2ban-client program with either a "start" or "stop"
argument. So long as everything goes fine, that appears to work. But
the start-stop-daemon is not actually monitoring the fail2ban-client
program; it's supposed to be monitoring the fail2ban-server process
that gets started as side-effect.
The existing stop() function does not do quite what you'd expect; for
example the "stop" command is never sent. Again, the daemon does
ultimately get stopped so long as the hard-coded PID file contains
what you think it does -- so it "works" -- but is misleading.
This commit changes everything to use the second approach above, where
start-stop-daemon manages everything. This was done mainly to simplify
the service script, because now the default start() and stop() phases
can be used, allowing us to delete them from our copy. One might worry
that there is some special magic behind "fail2ban-client start" and
"fail2ban-client stop", however that does not appear to be the
case. Admittedly, if in the future those two commands begin to do
something nonstandard, the service script would need to be changed
again to take the first approach above and use fail2ban-client for
everything.
The extra "showlog" command in our OpenRC service script was more
trouble than it was worth: the only thing it did was call "less" on a
log file, and the service script is only guessing at the location of
the log file (only the fail2ban server knows its true location).
It's not like "/etc/init.d/fail2ban showlog" is that much easier to type
than "less /var/log/fail2ban.log" in the first place, so I think the
extra complexity (5 more lines in the service script) is not worth it.
If the "retry" variable is set in the service script, we don't have to
pass it to start-stop-daemon explicitly. While we can't immediately
eliminate any code with this change, it will be necessary later to
adopt the default OpenRC stop() function.
If our service is installed under some other name, then we don't want
the service script to say things like "Starting fail2ban..." because
the name "fail2ban" won't make any sense at that point. Instead, we
use the $RC_SVCNAME variable to ensure that the service name matches
what we tell the user. Typically, however, $RC_SVCNAME will still be
"fail2ban".
Our OpenRC service script performs two tasks before starting the service:
1. It removes any stake sockets (from e.g. a system crash).
2. It ensures that the PID file directory exists.
These have both been moved into the "start_pre" phase, which is
designed to do such things (and will allow us to simplify the "start"
phase in the future). The existing "mkdir -p" has also been converted
into a "checkpath -d" command which is built-in to OpenRC.
OpenRC has a special variable "pidfile" that should be used to store
the location of the daemon's PID file. This commit replaces two
instances of said location with one variable.
The FAIL2BAN variable in our OpenRC service script was a combination
of two standard OpenRC variables, "command" and "command_args". This
commit simply replaces the custom variable with the two standard
ones. This will aid future simplifications of the service script.
Our OpenRC service script contained a "need logger" dependency, which
meant that the life cycle of the fail2ban service was tied to that of
the system logger service. That isn't quite correct: fail2ban
functions fine even if the system logger is stopped:
1. fail2ban is capable of analyzing non-syslog log files.
2. Even if fail2ban is solely analyzing syslog files, we don't
want to stop the fail2ban service simply because syslog was
stopped -- fail2ban just won't see any new log lines until
syslog is started again.
This commit changes the "need net" dependency to "use net", which will
still attempt to start the system logger service, but which won't kill
fail2ban if the system logger is ever stopped.
The "need net" dependency in our OpenRC service script was incorrect:
the fail2ban service does not need a working WAN to function. This
issue is well-documented and is covered in the OpenRC Service Script
Guide, currently located at
https://github.com/OpenRC/openrc/blob/master/service-script-guide.md
Our OpenRC conf file already tells users how to find the available
options that can be placed in the FAIL2BAN_OPTIONS variable, so having
a specific example of,
FAIL2BAN_OPTIONS="-x"
doesn't provide much more information. In fact, it makes you wonder
why it's there in the first place: does the init script have some kind
of problem with stale sockets? It used to, but that problem has been
fixed. This commit removes the redundant example.
There were two paths mentioned in comments in the fail2ban OpenRC conf
file, but those paths aren't guaranteed to be correct (until/unless we
integrate the conf file with the build system).
The first comment referenced the physical location of the associated
init script, and in my opinion is not useful to an end user in the
first place. It has been removed: OpenRC users know what this file
is for, there's no reason to repeat it in a comment.
The second comment contained an absolute path to fail2ban-client, and
I've removed the leading path components because "fail2ban-client" is
generally run from your $PATH.
We ship a service script and configuration file for "gentoo" that are
actually more generally applicable: they work on any system where
OpenRC is used. This commit simply renames the files from "gentoo" to
"openrc" to reflect the fact that they are in no way Gentoo-specific.
jailreader.py: additionally relocate the option `logpath` after all log-related data (backend, date-pattern, etc) that may be needed by the first usage (gh-2173).
Thanks to Matt Stancliff (mattsta)
When new log paths are configured, their start offset is immediately determined
by a filter searching for (now - findTime).
But, since findTime is configured *after* the log is loaded and
searched, logs are only searched back by the default 10 minute findTime,
regardless of user configuration of jail settings.
So, findTime must be configured before logpath or else the default findtime
is used, which ignores any findtime time defined by the user.
This fixes new reads on startup for actual log files. The systemd filter
always performed as expected due to being setup after the jail's
findtime config submission.
additionally provides more info if handler/conversion failed (with double protection inside catch-case);
tests/utils.py: log handler "_MemHandler" of LogCaptureTestCase fixed now to be safe also (test-cases only);
tests/misctestcase.py: the safe logging of all possible constellations is covered in testSafeLogging now.
both should be additionally exception-safe, so avoid possible errors in log-handlers (concat, str. conversion, etc);
test cases extended to cover any possible variants (invalid chars in unicode, bytes, str + unterminated char-sequence) with both cases (with replace of chars, with and without errors inside adapter-handlers).