There are two ways that it would make sense to write the OpenRC
service script for fail2ban:
1. Use the fail2ban-client program to stop, start, reload, etc. the
server; and try to figure out whether or not it worked afterwards.
2. Use the start-stop-daemon program built into OpenRC to manage the
fail2ban-server process. This works only for starting and stopping,
because the "reload" command is sent over an undocumented protocol,
but has the benefit that you get immediate feedback about the result
of calling fail2ban-server.
The existing service script combined the two in a way that appeared to
work, but didn't make too much sense. It used start-stop-daemon to
initiate the fail2ban-client program with either a "start" or "stop"
argument. So long as everything goes fine, that appears to work. But
the start-stop-daemon is not actually monitoring the fail2ban-client
program; it's supposed to be monitoring the fail2ban-server process
that gets started as side-effect.
The existing stop() function does not do quite what you'd expect; for
example the "stop" command is never sent. Again, the daemon does
ultimately get stopped so long as the hard-coded PID file contains
what you think it does -- so it "works" -- but is misleading.
This commit changes everything to use the second approach above, where
start-stop-daemon manages everything. This was done mainly to simplify
the service script, because now the default start() and stop() phases
can be used, allowing us to delete them from our copy. One might worry
that there is some special magic behind "fail2ban-client start" and
"fail2ban-client stop", however that does not appear to be the
case. Admittedly, if in the future those two commands begin to do
something nonstandard, the service script would need to be changed
again to take the first approach above and use fail2ban-client for
everything.
The extra "showlog" command in our OpenRC service script was more
trouble than it was worth: the only thing it did was call "less" on a
log file, and the service script is only guessing at the location of
the log file (only the fail2ban server knows its true location).
It's not like "/etc/init.d/fail2ban showlog" is that much easier to type
than "less /var/log/fail2ban.log" in the first place, so I think the
extra complexity (5 more lines in the service script) is not worth it.
If the "retry" variable is set in the service script, we don't have to
pass it to start-stop-daemon explicitly. While we can't immediately
eliminate any code with this change, it will be necessary later to
adopt the default OpenRC stop() function.
If our service is installed under some other name, then we don't want
the service script to say things like "Starting fail2ban..." because
the name "fail2ban" won't make any sense at that point. Instead, we
use the $RC_SVCNAME variable to ensure that the service name matches
what we tell the user. Typically, however, $RC_SVCNAME will still be
"fail2ban".
Our OpenRC service script performs two tasks before starting the service:
1. It removes any stake sockets (from e.g. a system crash).
2. It ensures that the PID file directory exists.
These have both been moved into the "start_pre" phase, which is
designed to do such things (and will allow us to simplify the "start"
phase in the future). The existing "mkdir -p" has also been converted
into a "checkpath -d" command which is built-in to OpenRC.
OpenRC has a special variable "pidfile" that should be used to store
the location of the daemon's PID file. This commit replaces two
instances of said location with one variable.
The FAIL2BAN variable in our OpenRC service script was a combination
of two standard OpenRC variables, "command" and "command_args". This
commit simply replaces the custom variable with the two standard
ones. This will aid future simplifications of the service script.
Our OpenRC service script contained a "need logger" dependency, which
meant that the life cycle of the fail2ban service was tied to that of
the system logger service. That isn't quite correct: fail2ban
functions fine even if the system logger is stopped:
1. fail2ban is capable of analyzing non-syslog log files.
2. Even if fail2ban is solely analyzing syslog files, we don't
want to stop the fail2ban service simply because syslog was
stopped -- fail2ban just won't see any new log lines until
syslog is started again.
This commit changes the "need net" dependency to "use net", which will
still attempt to start the system logger service, but which won't kill
fail2ban if the system logger is ever stopped.
The "need net" dependency in our OpenRC service script was incorrect:
the fail2ban service does not need a working WAN to function. This
issue is well-documented and is covered in the OpenRC Service Script
Guide, currently located at
https://github.com/OpenRC/openrc/blob/master/service-script-guide.md
Our OpenRC conf file already tells users how to find the available
options that can be placed in the FAIL2BAN_OPTIONS variable, so having
a specific example of,
FAIL2BAN_OPTIONS="-x"
doesn't provide much more information. In fact, it makes you wonder
why it's there in the first place: does the init script have some kind
of problem with stale sockets? It used to, but that problem has been
fixed. This commit removes the redundant example.
There were two paths mentioned in comments in the fail2ban OpenRC conf
file, but those paths aren't guaranteed to be correct (until/unless we
integrate the conf file with the build system).
The first comment referenced the physical location of the associated
init script, and in my opinion is not useful to an end user in the
first place. It has been removed: OpenRC users know what this file
is for, there's no reason to repeat it in a comment.
The second comment contained an absolute path to fail2ban-client, and
I've removed the leading path components because "fail2ban-client" is
generally run from your $PATH.
We ship a service script and configuration file for "gentoo" that are
actually more generally applicable: they work on any system where
OpenRC is used. This commit simply renames the files from "gentoo" to
"openrc" to reflect the fact that they are in no way Gentoo-specific.
add descriptions to stop syslog errors for extra_started_commands when running:
rc-service ipset describe
Oct 28 15:13:30 xxxx daemon.warn /etc/init.d/fail2ban[26446]: ^[[1m^[[36mreload^[[m: no description
Oct 28 15:13:30 xxxx daemon.warn /etc/init.d/fail2ban[26447]: ^[[1m^[[36mshowlog^[[m: no description
- setup process generates `build/fail2ban.service` from `files/fail2ban.service.in` using distribution related bin-path;
- bug-fixing by running setup with option `--dry-run` (note: specify option `--dry-run` before `install`, like `python setup.py --dry-run install`);
- test cases extended to cover dry-run.
The fail2ban server can take several seconds to shut down. This can
make Gentoo's start-stop-service time out and decide that stopping has
failed, even if it actually succeeds a few seconds later.
The default timeout for start-stop-service if --retry is not specified
appears to be 5 seconds. Increase that to 30 seconds to be sure that if
fail2ban-server is going to be able to stop, it has time to do so.
- starting service in normal mode (without forking)
- does not restart if service exited normally (exit-code 0, e.g. stopped via fail2ban-client)
- does not restart if service can not start (exit-code 255, e.g. wrong configuration, etc.)
- service can be additionally started/stopped with commands (fail2ban-client, fail2ban-server)
Currently, if fail2ban is killed (or crashes), its status will be
reported by '/etc/init.d/fail2ban status' as 'running' even though it
is not. Attempting to restart the service also fails, because Gentoo
unsuccessfully tries to stop the service.
By using start-stop-daemon and providing a pidfile, Gentoo will
instead report the status as 'crashed' and allow the service to be
restarted as normal.
These fixes are pretty pedantic, but they do simplify the script a
little.
* Checking the existence of a file/directory before creating/deleting
it adds complexity and raciness. There are better options.
* mkdir -p does the job of making sure a directory exists. (It only
fails if there's a filesystem error or something.)
* Likewise, rm -f doesn't fail if the file doesn't exist.
* rm -r isn't neccessary because the socket shouldn't be a directory.
(If it is for some reason, that should be an error.)
As a useful side effect, prevents "Unable to contact server. Is it
running?" mails from cron when fail2ban hasn't been (intentionally)
running nor thus logging anything either.
(a) use static-network-up, since it is more generic than the started networking event
(b) do not hook into network deconfiguration to speed up shutdown
(c) expect fork, per the use of the "-f" option
(d) use a variable for the run directory to make changing it simpler
(e) handle the situation of a left over socket file
(f) use the -f option to be able to track the PID
No longer directly exec the server, do not remove the PID file because it is unnecessary to do so. No longer respawns because Upstart can not track the process with the starter command.