

What To Do When Bacula Crashes (Kaboom)

If you are running on a Linux system and you have a set of working configuration files, it is very unlikely that Bacula will crash. As with all software, however, it may crash someday, particularly if you are running on another operating system or using a new or unusual feature.

This chapter explains what you should do if one of the three Bacula daemons (Director, File, Storage) crashes. By crashing, we mean that the daemon terminates abnormally because of an error. There are many cases where Bacula detects errors (such as PIPE errors) and fails a job; these are not considered crashes. In addition, under certain conditions Bacula will detect a fatal error in the configuration, such as lack of permission to read/write the working directory. In that case, Bacula will force itself to crash with a SEGFAULT. However, before crashing, Bacula will normally display a message indicating why. For more details, please read on.

Traceback

Each of the three Bacula daemons has a built-in exception handler which, in case of an error, will attempt to produce a traceback. If successful, the traceback will be emailed to you.

For this to work, you need to ensure that a few things are set up correctly on your system:

  1. You must have a version of Bacula built with debug information turned on and not stripped of debugging symbols.

  2. You must have an installed copy of gdb (the GNU debugger), and it must be on Bacula's path. On some systems such as Solaris, gdb may be replaced by dbx.

  3. The btraceback script installed with Bacula must be in the same directory as the daemon that dies, and it must be marked as executable.

  4. The script file btraceback.gdb must have the correct path to it specified in the btraceback file.

  5. You must have a mail program that is on Bacula's path. By default, this mail program is bsmtp, so it must be correctly configured.

  6. If you run either the Director or Storage daemon under a non-root userid, you will most likely need to modify the btraceback file to use something like sudo (to raise to root privileges) for the call to gdb, so that gdb has the proper permissions to debug Bacula.
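Several items in the list above can be checked before a crash ever happens. The following is a minimal sketch, not something shipped with Bacula: the function name check_traceback_prereqs is hypothetical, and it assumes btraceback names btraceback.gdb as a literal path after -x, as in the key btraceback line. It cannot check item 1 (debug symbols) or item 6 (root permissions).

```shell
# Hypothetical helper (not part of Bacula): report traceback prerequisites
# that appear to be missing for the daemon installed in directory $1.
# Prints one line per problem and returns 0 only if none were found.
check_traceback_prereqs() {
  dir=$1
  rc=0
  # Item 2: gdb (or dbx on Solaris) must be on the PATH.
  if ! command -v gdb >/dev/null 2>&1 && ! command -v dbx >/dev/null 2>&1; then
    echo "no gdb or dbx on PATH"
    rc=1
  fi
  # Item 3: btraceback must sit next to the daemon and be executable.
  if [ ! -x "$dir/btraceback" ]; then
    echo "btraceback missing or not executable in $dir"
    rc=1
  else
    # Item 4: the file named after -x in btraceback must exist.
    # Only a literal path is understood, not a shell variable.
    gdb_script=$(sed -n 's/.*-x[[:space:]][[:space:]]*\([^[:space:]]*\).*/\1/p' \
      "$dir/btraceback" | head -n 1)
    if [ -z "$gdb_script" ] || [ ! -f "$gdb_script" ]; then
      echo "btraceback.gdb not found (looked for '$gdb_script')"
      rc=1
    fi
  fi
  # Item 5: the mail program (bsmtp by default) must be on the PATH.
  if ! command -v bsmtp >/dev/null 2>&1; then
    echo "no bsmtp on PATH"
    rc=1
  fi
  return $rc
}
```

For example, check_traceback_prereqs /opt/bacula/bin (a placeholder directory) lists whatever it finds wrong and exits non-zero if anything was reported.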

If all the above conditions are met, the daemon that crashes will produce a traceback report and email it to you. If they are not, you can either run the debugger by hand as described below, or you may be able to correct the problems by editing the btraceback file. I recommend not spending too much time trying to get the traceback to work, as it can be very difficult.

The changes that might be needed are to add a correct path to the gdb program, correct the path to the btraceback.gdb file, change the mail program or its path, or change your email address. The key line in the btraceback file is:

gdb -quiet -batch -x /home/kern/bacula/bin/btraceback.gdb \
     $1 $2 2>&1 | bsmtp -s "Bacula traceback" your-address@xxx.com
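The same line can be written with each of those settings pulled out into a variable, which makes the edits described above easier to spot. This is a hedged sketch rather than the btraceback script Bacula ships: send_traceback and the GDB, GDB_SCRIPT, MAILER and ADDRESS variables are hypothetical names, and the defaults are simply the placeholders from the line above.

```shell
# Hypothetical, parameterized version of the key btraceback line.
# $1 = path to the daemon executable, $2 = PID of the running daemon.
# Each setting can be overridden from the environment; the defaults are
# the placeholders from the manual and must be adapted to your system.
send_traceback() {
  gdb_cmd=${GDB:-gdb}                      # full path to gdb if not on the PATH
  gdb_script=${GDB_SCRIPT:-/home/kern/bacula/bin/btraceback.gdb}
  mailer=${MAILER:-bsmtp}                  # e.g. mail or Mail instead of bsmtp
  address=${ADDRESS:-your-address@xxx.com}
  # gdb attaches to process $2 of executable $1, runs the commands in
  # btraceback.gdb, and its combined output becomes the mail body.
  # $mailer is left unquoted on purpose so it can carry extra options.
  "$gdb_cmd" -quiet -batch -x "$gdb_script" "$1" "$2" 2>&1 \
    | $mailer -s "Bacula traceback" "$address"
}
```

In a real btraceback script, the last line would then be send_traceback "$1" "$2", passing along the executable path and PID the script was called with.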

Since each daemon has the same traceback code, a single btraceback file is sufficient if you are running more than one daemon on a machine.

Testing The Traceback

To test the traceback feature manually, start Bacula, then obtain the PID of the main daemon thread (there are multiple threads). The output produced here will look different depending on your OS and kernel version. The long command lines below have been wrapped to fit on this page:

[kern@rufus kern]$ ps fax --columns 132 | grep bacula-dir
 2103 ?        S      0:00 /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2104 ?        S      0:00  \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2106 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2105 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf

which in this case is 2103. Then, while Bacula is running, run btraceback, giving it the path to the Bacula executable and the PID. In this case, it is:

./btraceback /home/kern/bacula/k/src/dird/bacula-dir 2103

It should produce an email showing you the current state of the daemon (in this case the Director), and then exit leaving Bacula running as if nothing happened. If this is not the case, you will need to correct the problem by modifying the btraceback script.
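Picking the main PID out of the ps listing can also be scripted. This sketch assumes the ps fax layout shown above, where the fifth column holds the command and the other threads show \_ there instead; the function name main_pid is hypothetical.

```shell
# Hypothetical helper: read `ps fax` output on stdin and print the PID of
# the top-level process whose command ends in the daemon name $1.
# Thread entries carry "\_" in column 5, so they never match.
main_pid() {
  awk -v name="$1" '$5 ~ ("(^|/)" name "$") { print $1; exit }'
}
```

For example: ./btraceback /home/kern/bacula/k/src/dird/bacula-dir "$(ps fax --columns 132 | main_pid bacula-dir)".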

Typical problems are that gdb (or dbx on Solaris) is not on the default path; fix this by specifying its full path in the btraceback file. Another common problem is that the script has not been modified so that bsmtp uses an appropriate SMTP server with the proper syntax for it. If you use the mail program instead and it is not on the default path, the traceback will also fail. On some systems, it is preferable to use Mail rather than mail.

Getting A Traceback On Other Systems

It should be possible to produce a similar traceback on systems other than Linux, using either gdb or some other debugger. Solaris with dbx installed works quite well. On other systems, you will need to modify the btraceback script to invoke the correct debugger, and possibly modify the btraceback.gdb script to contain appropriate commands for your debugger. If anyone succeeds in making this work with another debugger, please send us a copy of what you modified. Keep in mind that for any debugger to work, it will most likely need to run as root, so you may need to modify the btraceback script accordingly.

Manually Running Bacula Under The Debugger

If for some reason you cannot get the automatic traceback, or if you want to interactively examine the variable contents after a crash, you can run Bacula under the debugger. Assuming you want to run the Storage daemon under the debugger (the technique is the same for the other daemons, only the name changes), you would do the following:

  1. Start the Director and the File daemon. If the Storage daemon also starts, you will need to find its PID as shown above (ps fax | grep bacula-sd) and kill it with a command like the following:

          kill -15 PID
    

    where you replace PID by the actual value.

  2. At this point, the Director and the File daemon should be running but the Storage daemon should not.

  3. cd to the directory containing the Storage daemon.

  4. Start the Storage daemon under the debugger:

        gdb ./bacula-sd
    

  5. Run the Storage daemon:

         run -s -f -c ./bacula-sd.conf
    

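    You may replace ./bacula-sd.conf with the full path to the Storage daemon's configuration file. The -s option tells the daemon not to install its own signal handlers, and -f keeps it running in the foreground.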

  6. At this point, Bacula will be fully operational.

  7. In another shell window, start the Console program and do whatever is necessary to cause Bacula to die.

  8. When Bacula crashes, the gdb shell window will become active and gdb will show you the error that occurred.

  9. To get a general traceback of all threads, issue the following command:

           thread apply all bt
    

    After that you can issue any debugging command.
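Steps 5 and 9 can be prepared in advance as a gdb command file, the same mechanism btraceback uses with -x. This is a hedged sketch: write_sd_gdb_cmds is a hypothetical helper, and the configuration path must not contain spaces because it is written unquoted into the run line.

```shell
# Hypothetical helper (not part of Bacula): write a gdb command file that
# runs the Storage daemon as in step 5 and, once the daemon stops,
# prints a backtrace of all threads as in step 9.
write_sd_gdb_cmds() {
  conf=$1   # path to bacula-sd.conf (no spaces)
  out=$2    # command file to create
  cat > "$out" <<EOF
run -s -f -c $conf
thread apply all bt
EOF
}
```

Used from the Storage daemon's directory as write_sd_gdb_cmds ./bacula-sd.conf /tmp/bacula-sd.gdb followed by gdb -x /tmp/bacula-sd.gdb ./bacula-sd, gdb should start the daemon immediately; because -batch is not given, it should return to its prompt after the backtrace so you can continue debugging interactively.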

Getting Debug Output from Bacula

Each of the daemons normally has debug code compiled into the program, but disabled. There are two ways to enable the debug output. One is to add the -d nnn option on the command line when starting the daemon (or on the run line when running it under the debugger). The nnn is the debug level, and generally anything between 50 and 200 is reasonable. The higher the number, the more output is produced. The output is written to standard output.
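Since the debug output goes to standard output, it is convenient to keep the daemon in the foreground with -f and save a copy of everything it prints. This is a minimal sketch: run_with_debug is a hypothetical wrapper, and the daemon path, level, and file names in the usage note are placeholders.

```shell
# Hypothetical wrapper (not part of Bacula): start daemon $1 in the
# foreground with debug level $2 and configuration file $3, showing its
# output on the terminal and copying it into log file $4.
run_with_debug() {
  daemon=$1   # e.g. ./bacula-sd
  level=$2    # debug level; 50-200 is generally reasonable
  conf=$3     # e.g. ./bacula-sd.conf
  log=$4      # file that receives a copy of the output
  "$daemon" -f -d "$level" -c "$conf" 2>&1 | tee "$log"
}
```

For example, run_with_debug ./bacula-sd 100 ./bacula-sd.conf /tmp/sd-debug.log runs the Storage daemon at level 100 until it is stopped.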

The second way of getting debug output is to turn it on dynamically from the Console with the setdebug command. The full syntax of the command is:

 setdebug level=nnn client=client-name storage=storage-name dir

If none of the options are given, the command will prompt you. You can selectively turn debugging on or off in any or all of the daemons (i.e. it is not necessary to specify all the components of the above command).
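The Console can also be driven from a script, which is handy for switching debug output on and off around a test. This sketch assumes bconsole reads commands from standard input when it is not attached to a terminal; set_debug and the BCONSOLE variable are hypothetical, and the resource names in the usage note are placeholders for the Client and Storage names in your configuration. Giving at least one target keeps setdebug from prompting.

```shell
# Hypothetical helper: send "setdebug level=LEVEL [targets...]" to the
# Director through bconsole's standard input. BCONSOLE may hold a full
# path plus options (it is deliberately left unquoted); default: bconsole.
set_debug() {
  level=$1
  shift
  cmd="setdebug level=$level"
  # Remaining arguments select the daemons, e.g. client=NAME storage=NAME dir.
  [ $# -gt 0 ] && cmd="$cmd $*"
  printf '%s\n' "$cmd" | ${BCONSOLE:-bconsole}
}
```

For example, set_debug 100 storage=File dir raises the level on a Storage daemon named File (a placeholder) and on the Director, and set_debug 0 storage=File dir turns it back off.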

Kern Sibbald 2009-02-06