Bacula Enterprise Edition Documentation text image transdoc
Search

Main

New Features in 9.4.0

Cloud Backup

A major problem of Cloud backup is that data transmission to and from the Cloud is very slow compared to traditional backup to disk or tape. The Bacula Cloud drivers provide a means to quickly finish the backups and then to transfer the data from the local cache to the Cloud in the background. This is done by first splitting the data Volumes into small parts that are cached locally then uploading those parts to the Cloud storage service in the background, either while the job continues to run or after the backup Job has terminated. Once the parts are written to the Cloud, they may either be left in the local cache for quick restores or they can be removed (truncate cache).

Cloud Volume Architecture

Figure: Bacula Cloud Architecture

The picture shown above shows two Volumes (Volume0001 and Volume0002) with their parts in the cache. Below the cache, one can see that Volume0002 has been uploaded or synchronized with the Cloud.

Note: Regular Bacula disk Volumes are implemented as standard files that reside in the user defined Archive Directory. On the other hand, Bacula Cloud Volumes are directories that reside in the user defined Archive Directory. Each Cloud Volume's directory contains the cloud Volume parts which are implemented as numbered files (part.1, part.2, ...).

Cloud Restore

During a restore, if the needed parts are in the local cache, they will be immediately used, otherwise, they will be downloaded from the Cloud as needed. The restore starts with parts already in the local cache but will wait in turn for any part that needs to be downloaded. The Cloud part downloads proceed while the restore is running.

With most Cloud providers, uploads are usually free of charge, but downloads of data from the Cloud are billed. By using local cache and multiple small parts, you can configure Bacula to substantially reduce download costs.

The MaximumFileSize Device directive is still valid within the Storage Daemon and defines the granularity of a restore chunk. In order to limit volume parts to download during restore (specially when restoring single files), it might be useful to set the MaximumFileSize to a value smaller than or equal to the MaximumPartSize.

Compatibility

Since a Cloud Volume contains the same data as an ordinary Bacula Volume, all existing types of Bacula data may be stored in the cloud - that is client encrypted, compressed data, plugin data, etc. All existing Bacula functionality, with the exception of deduplication, is available with the Bacula Cloud drivers.

Deduplication and the Cloud

At the current time, Bacula Global Endpoint Backup does not support writing to the cloud because the cloud would be too slow to support large hashed and indexed containers of deduplication data.

Virtual Autochangers and Disk Autochangers

If you use a Bacula Virtual Autochanger you will find it compatible with the new Bacula Cloud drivers. However, if you use a third party disk autochanger script such as Vchanger, unless or until it is modified to handle Volume directories, it may not be compatible with Bacula Cloud drivers.

Security

All data that is sent to and received from the cloud by default uses the HTTPS protocol, so your data is encrypted while being transmitted and received. However, data that resides in the Cloud is not encrypted by default. If you wish extra security of your data while it resides in the cloud, you should consider using Bacula's PKI data encryption feature during the backup.

Cache and Pruning

The Cache is treated much like a normal Disk based backup, so that in configuring Cloud the administrator should take care to set "Archive Device" in the Device resource to a directory where he/she would normally start data backed up to disk. Obviously, unless he/she uses the truncate/prune cache commands, the Archive Device will continue to fill.

The cache retention can be controlled per Volume with the CacheRetention attribute. The default value is 0, meaning that the pruning of the cache is disabled.

The CacheRetention value for a volume can be modified with the update command or via the Pool directive CacheRetention for newly created volumes.

New Commands, Resource, and Directives for Cloud

To support Cloud, there are new bconsole commands, new Storage Daemon directives and a new Cloud resource that is specified in the Storage Daemon's Device resource.

New Cloud Bconsole Commands

  • Cloud The new cloud bconsole command allows you to do a number of things with cloud volumes. The options are the following:
    • None. If you specify no arguments to the command, bconsole will prompt with:
        Cloud choice: 
           1: List Cloud Volumes in the Cloud
           2: Upload a Volume to the Cloud
           3: Prune the Cloud Cache
           4: Truncate a Volume Cache
           5: Done
        Select action to perform on Cloud (1-5):
      
      The different choices should be rather obvious.

    • Truncate This command will attempt to truncate the local cache for the specified Volume. Bacula will prompt you for the information needed to determine the Volume name or names. To avoid the prompts, the following additional command line options may be specified:
      • Storage=xxx
      • Volume=xxx
      • AllPools
      • AllFromPool
      • Pool=xxx
      • MediaType=xxx
      • Drive=xxx
      • Slots=nnn
    • Prune This command will attempt to prune the local cache for the specified Volume. Bacula will respect the CacheRetention volume attribute to determine if the cache can be truncated or not. Only parts that are uploaded to the cloud will be deleted from the cache. Bacula will prompt you for the information needed to determine the Volume name or names. To avoid the prompts, the following additional command line options may be specified:
      • Storage=xxx
      • Volume=xxx
      • AllPools
      • AllFromPool
      • Pool=xxx
      • MediaType=xxx
      • Drive=xxx
      • Slots=nnn
    • Upload This command will attempt to upload the specified Volumes. It will prompt you for the information needed to determine the Volume name or names. To avoid the prompts, you may specify any of the following additional command line options:
      • Storage=xxx
      • Volume=xxx
      • AllPools
      • AllFromPool
      • Pool=xxx
      • MediaType=xxx
      • Drive=xxx
      • Slots=nnn
    • List This command will list volumes stored in the Cloud. If a volume name is specified, the command will list all parts for the given volume. To avoid the prompts, you may specify any of the following additional command line options:
      • Storage=xxx
      • Volume=xxx
      • Storage=xxx

Cloud Additions to the DIR Pool Resource

Within the bacula-dir.conf file each Pool resource there is an additional keyword CacheRetention that can be specified.

Cloud Additions to the SD Device Resource

Within the bacula-sd.conf file each Device resource there is an additional keyword Cloud that must be specified on the Device Type directive, and two new directives Maximum Part Size and Cloud.

New Cloud SD Device Directives

  • Device Type The Device Type has been extended to include the new keyword Cloud to specify that the device supports cloud Volumes. Example:
      Device Type = Cloud
    
  • Cloud The new Cloud directive permits specification of a new Cloud Resource. As with other Bacula resource specifications, one specifies the name of the Cloud resource. Example:
      Cloud = S3Cloud
    
  • Maximum Part Size This directive allows one to specify the maximum size for each part. Smaller part sizes will reduce restore costs, but may require a small additional overhead to handle multiple parts. The maximum number of parts permitted in a Cloud Volume is 524,288. The maximum size of any given part is approximately 17.5TB.

Example Cloud Device Specification

An example of a Cloud Device Resource might be:

Device {
  Name = CloudStorage
  Device Type = Cloud
  Cloud = S3Cloud
  Archive Device = /opt/bacula/backups
  Maximum Part Size = 10000000
  Media Type = CloudType
  LabelMedia = yes
  Random Access = Yes;
  AutomaticMount = yes
  RemovableMedia = no
  AlwaysOpen = no
}

As you can see from the above, the Cloud directive in the Device resource contains the name (S3Cloud) of the Cloud resource that is shown below.

Note also the Archive Device is specified in the same manner as one would use for a File device. However, in place of containing files with Volume names, the archive device for the Cloud drivers will contain the local cache, which consists of directories with the Volume name; and these directories contain the parts associated with the particular Volume. So with the above Device resource, and the two cache Volumes shown in figure (here) above would have the following layout on disk:

  /opt/bacula/backups
     /opt/bacula/backups/Volume0001
        /opt/bacula/backups/Volume0001/part.1
        /opt/bacula/backups/Volume0001/part.2
        /opt/bacula/backups/Volume0001/part.3
        /opt/bacula/backups/Volume0001/part.4
     /opt/bacula/backups/Volume0002
        /opt/bacula/backups/Volume0002/part.1
        /opt/bacula/backups/Volume0002/part.2
        /opt/bacula/backups/Volume0002/part.3

The Cloud Resource

The Cloud resource has a number of directives that may be specified as exemplified in the following example:

default east USA location:

Cloud {
  Name = S3Cloud
  Driver = "S3"
  HostName = "s3.amazonaws.com"
  BucketName = "BaculaVolumes"
  AccessKey = "BZIXAIS39DP9YNER5DFZ"
  SecretKey = "beesheeg7iTe0Gaexee7aedie4aWohfuewohGaa0"
  Protocol = HTTPS
  UriStyle = VirtualHost
  Truncate Cache = No
  Upload = EachPart
  Region = "us-east-1" 
  MaximumUploadBandwidth = 5MB/s
}

central europe location:

Cloud {
  Name = S3Cloud
  Driver = "S3"
  HostName = "s3-eu-central-1.amazonaws.com"
  BucketName = "BaculaVolumes"
  AccessKey = "BZIXAIS39DP9YNER5DFZ"
  SecretKey = "beesheeg7iTe0Gaexee7aedie4aWohfuewohGaa0"
  Protocol = HTTPS
  UriStyle = VirtualHost
  Truncate Cache = No
  Upload = EachPart
  Region = "eu-central-1"
  MaximumUploadBandwidth = 4MB/s
}

For Amazon Cloud, refer to http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region to get a complete list of regions and corresponding endpoints and use them respectively as Region and HostName directive.

For CEPH S3 interface:

Cloud {
  Name = CEPH_S3
  Driver = "S3"
  HostName = ceph.mydomain.lan
  BucketName = "CEPHBucket"
  AccessKey = "xxxXXXxxxx"
  SecretKey = "xxheeg7iTe0Gaexee7aedie4aWohfuewohxx0"
  Protocol = HTTPS
  Upload = EachPart

  UriStyle = Path            # Must be set for CEPH
}

The directives of the above Cloud resource for the S3 driver are defined as follows:

Name = <Device-Name>

The name of the Cloud resource. This is the logical Cloud name, and may be any string up to 127 characters in length. Shown as S3Cloud above.

Description = <Text>

The description is used for display purposes as is the case with all resource.

Driver = <DriverName>

This defines which driver to use. It can be S3. There is also a File driver, which is used mostly for testing.

Host Name = <Name>
This directive specifies the hostname to be used in the URL. Each Cloud service provider has a different and unique hostname. The maximum size is 255 characters and may contain a tcp port specification.

Bucket Name = <Name>

This directive specifies the bucket name that you wish to use on the Cloud service. This name is normally a unique name name that identifies where you want to place your Cloud Volume parts. With Amazon S3, the bucket must be created previously on the Cloud service. The maximum bucket name size is 255 characters.

Access Key = <String>

The access key is your unique user identifier given to you by your cloud service provider.

Secret Key = <String>

The secret key is the security key that was given to you by your cloud service provider. It is equivalent to a password.

Protocol = <HTTP | HTTPS>

The protocol defines the communications protocol to use with the cloud service provider. The two protocols currently supported are: HTTPS and HTTP. The default is HTTPS.

Uri Style = <VirtualHost | Path>

This directive specifies the URI style to use to communicate with the cloud service provider. The two Uri Styles currently supported are: VirtualHost and Path. The default is VirtualHost.

Truncate Cache = <Truncate-kw>

This directive specifies when Bacula should automatically remove (truncate) the local cache parts. Local cache parts can only be removed if they have been uploaded to the cloud. The currently implemented values are:

  • No Do not remove cache. With this option you must manually delete the cache parts with a bconsole Truncate Cache command, or do so with an Admin Job that runs an Truncate Cache command. This is the default.
  • AfterUpload Each part will be removed just after it is uploaded. Note, if this option is specified, all restores will require a download from the Cloud. Note: Not yet implemented.
  • AtEndOfJob With this option, at the end of the Job, every part that has been uploaded to the Cloud will be removed (truncated). Note: Not yet implemented.

Upload = <Upload-kw>

This directive specifies when local cache parts will be uploaded to the Cloud. The options are:

  • No Do not upload cache parts. With this option you must manually upload the cache parts with a bconsole Upload command, or do so with an Admin Job that runs an Upload command. This is the default.
  • EachPart With this option, each part will be uploaded when it is complete i.e. when the next part is created or at the end of the Job.
  • AtEndOfJob With this option all parts that have not been previously uploaded will be uploaded at the end of the Job. Note: Not yet implemented.

Maximum Upload Bandwidth = <speed>

The default is unlimited, but by using this directive, you may limit the upload bandwidth used globally by all devices referencing this Cloud resource.

Maximum Download Bandwidth = <speed>

The default is unlimited, but by using this directive, you may limit the download bandwidth used globally by all devices referencing this Cloud resource.

Region = <String<

The Cloud resource can be configured to use a specific endpoint within a region. This directive is required for AWS-V4 regions. ex: Region="eu-central-1"

File Driver for the Cloud

As mentioned above, one may specify the keyword File on the Driver directive of the Cloud resource. Instead of writing to the Cloud, Bacula will instead create a Cloud Volume but write it to disk. The rest of this section applies to the Cloud resource directives when the File driver is specified.

The following Cloud directives are ignored: Bucket Name, Access Key, Secret Key, Protocol, Uri Style. The directives Truncate Cache and Upload work on the local cache in the same manner as they do for the S3 driver.

The main difference to note is that the Host Name, specifies the destination directory for the Cloud Volume files, and this Host Name must be different from the Archive Device name, or there will be a conflict between the local cache (in the Archive Device directory) and the destination Cloud Volumes (in the Host Name directory).

As noted above, the File driver is mostly used for testing purposes, and we do not particularly recommend using it. However, if you have a particularly slow backup device you might want to stage your backup data into an SSD or disk using the local cache feature of the Cloud device, and have your Volumes transferred in the background to a slow File device.

WORM Tape Support

Automatic WORM (Write Once Read Multiple) tapes detection has been added in 10.2.

When a WORM tape is detected, the catalog volume entry is changed automatically to set Recycle=no. It will prevent the volume from being automatically recycled by Bacula.

There is no change in how the Job and File records are pruned from the catalog as that is a separate issue that is currently adequately implemented in Bacula.

When a WORM tape is detected, the SD will show WORM on the device state output (must have debug greater or equal to 6) otherwise the status shows as !WORM

Device state:
   OPENED !TAPE LABEL APPEND !READ !EOT !WEOT !EOF WORM !SHORT !MOUNTED ...

The output of the used volume status has been modified to include the worm state. It shows worm=1 for a worm cassette and worm=0 otherwise. Example:

Used Volume status:
Reserved volume: TestVolume001 on Tape device "nst0" (/dev/nst0)
   Reader=0 writers=0 reserves=0 volinuse=0 worm=1

The following programs are needed for the WORM tape detection:

  • sdparm
  • tapeinfo

The new Storage Device directive Worm Command must be configured as well as the Control Device directive (used with the Tape Alert feature).

Device {
  Name = "LTO-0"
  Archive Device = "/dev/nst0"
  Control Device = "/dev/sg0"    # from lsscsi -g
  Worm Command = "/opt/bacula/scripts/isworm %l"
...
}