Catalog Services
General
This chapter is intended to be a technical discussion of the catalog services and as such is not targeted at end users but rather at developers and system administrators who want or need to know more of the working details of Bareos.
We only support PostgreSQL. Support for SQLite or MySQL has been discontinued in Version >= 21.0.0. Remaining code has been deleted Version >= 23.0.0. PostgreSQL was chosen because it is a full-featured, very mature database.
Filenames and Maximum Filename Length
In general, PostgreSQL permit storing arbitrary long path names and file names in the catalog database. However, PostgreSQL limits the maximum length of an index item to 2730, which will affect the length of a path (without the filename) that can be stored in the catalog database. Nevertheless, the 2730 is not a hard limit as PostgreSQL will apply some lossless compression to strings before adding them to its index.
The guarantee is, that you will be able to store paths at least 2730 bytes long. You may or may not run into PostgreSQL index errors when using longer paths.
Database Table Design
Because the catalog database may contain very large amounts of data for large sites, we have made a modest attempt to normalize the data tables to reduce redundant information. While reducing the size of the database significantly, it does, unfortunately, add some complications to the structures.
In simple terms, the catalog database must contain a record of all Jobs run by Bareos, and for each Job, it must maintain a list of all files saved, with their File Attributes (permissions, create date, …), and the location and Media on which the file is stored. This is seemingly a simple task, but it represents a huge amount of interlinked data. The data stored in the File records, which allows the user or administrator to obtain a list of all files backed up during a job, is by far the largest volume of information put into the catalog database.
Although the catalog database has been designed to handle backup data for multiple clients, some users may want to maintain multiple databases, one for each machine to be backed up. This reduces the risk of confusion of accidental restoring a file to the wrong machine as well as reducing the amount of data in a single database, thus increasing efficiency and reducing the impact of a lost or damaged database.
Database Tables
Path
The Path table contains shown above the path or directory names of all directories on the system or systems.
Column Name |
Data Type |
Remark |
---|---|---|
PathId |
serial |
Primary Key |
Path |
text |
Full Path |
The filename and any disk name are stripped off. As with the filename, only one copy of each directory name is kept regardless of how many machines or drives have the same directory. These path names should be stored in Unix path name format.
File
The File table contains one entry for each file backed up by Bareos.
Column Name |
Data Type |
Remark |
---|---|---|
FileId |
serial |
Primary Key |
FileIndex |
integer |
The sequential file number in the Job |
JobId |
integer |
Link to Job Record |
PathId |
integer |
Link to Path Record |
DeltaSeq |
smallint |
|
MarkId |
integer |
Used to mark files during Verify Jobs |
Fhinofo |
numeric(20) |
|
Fhnode |
numeric(20) |
|
LStat |
text |
File attributes in base64 encoding |
MD5 |
text |
MD5/SHA1 signature in base64 encoding |
name |
text |
File that is backed up multiple times (as is normal) will have multiple entries in the File table. This will probably be the table with the most number of records. Consequently, it is essential to keep the size of this record to an absolute minimum. At the same time, this table must contain all the information (or pointers to the information) about the file and where it is backed up.
This table contains by far the largest amount of information in the catalog database, both from the stand point of number of records, and the stand point of total database size.
As MD5 hash (also termed message digests) consists of 128-bit (16-byte).
A typical format (eg. md5sum
, …) to represent them is as a sequence
of 32 hexadecimal digits. However, in the MD5 column, the digest is
represented as base64.
To compare a Bareos digest with command line tools, you have to use
openssl dgst -md5 -binary $PATH_OF_YOUR_FILE | base64
Please note, even the table column is named MD5, it is used to store any kind of digest (MD5, SHA1, …). This is not directly indicated by the value, however, you can determine the type depending of the length of the digest.
Job
The Job table contains one record for each Job run by Bareos.
Column Name |
Data Type |
Remark |
---|---|---|
JobId |
serial |
Primary Key |
Job |
text |
Unique Job Name |
Name |
text |
Job Name |
Type |
char(1) |
Job Type: Backup, Copy, Clone, Archive, Migration |
Level |
char(1) |
Job Level |
ClientId |
integer |
Client index |
JobStatus |
char(1) |
Job Termination Status |
SchedTime |
timestamp |
Time/date when Job scheduled |
StartTime |
timestamp |
Time/date when Job started |
EndTime |
timestamp |
Time/date when Job ended |
RealEndTime |
timestamp |
Time/date when original Job ended |
JobTDate |
bigint |
Start day in Unix format but 64 bits; used for Retention period. |
VolSessionId |
integer |
Unique Volume Session ID |
VolSessionTime |
integer |
Unique Volume Session Time |
JobFiles |
integer |
Number of files saved in Job |
JobBytes |
bigint |
Number of bytes saved in Job |
ReadBytes |
bigint |
|
JobErrors |
integer |
Number of errors during Job |
JobMissinFiles |
integer |
Number of files not saved (not yet used) |
PoolId |
integer |
Link to Pool Record |
FileSetId |
integer |
Link to FileSet Record |
PriorJobId |
integer |
Link to prior Job Record when migrated |
PurgedFiles |
smallint |
Set when all File records purged |
HasBase |
smallint |
Set when Base Job run |
HasCache |
smallint |
|
Reviewed |
smallint |
|
Comment |
text |
The Name field of the Job record corresponds to the Name resource record given in the Director’s configuration file.
The Job field contains a combination of the Name and the schedule time of the Job by the Director. Thus for a given Director, even with multiple catalog databases, the Job will contain a unique name that represents the Job.
For a given Storage daemon, the VolSessionId and VolSessionTime form a unique identification of the Job. This will be the case even if multiple Directors are using the same Storage daemon.
The JobStatus field specifies how the job terminated.
FileSet
The FileSet table contains one entry for each FileSet that is used.
Column Name |
Data Type |
Remark |
---|---|---|
FileSetId |
serial |
Primary Key |
FileSet |
text |
FileSet name |
FileSetText |
text |
|
MD5 |
text |
MD5 checksum of FileSet |
CreateTime |
timestamp |
Time and date Fileset created |
The MD5 signature is kept to ensure that if the user changes anything inside the FileSet, it will be detected and the new FileSet will be used. This is particularly important when doing an incremental update. If the user deletes a file or adds a file, we need to ensure that a Full backup is done prior to the next incremental.
JobMedia
The JobMedia table contains one entry at the following: start of the job, start of each new tape file mark, start of each new tape, end of the job. You will have 2 or more JobMedia records per Job.
Column Name |
Data Type |
Remark |
---|---|---|
JobMediaId |
serial |
Primary Key |
JobId |
integer |
Link to Job Record |
MediaId |
integer |
Link to Media Record |
FirstIndex |
integer |
The index (sequence number) of the first file written for this Job to the Media |
LastIndex |
integer |
The index of the last file written for this Job to the Media |
StartFile |
integer |
Tape: The physical media file mark number of
the first block written for this Job.
Other: Upper 32-bit of the position of the
first block written for this Job.
|
EndFile |
integer |
Tape: The physical media file mark number of
the last block written for this Job
Other: Upper 32-bit of the position of the
last block written for this Job
|
StartBlock |
integer |
Tape: The number of the first block written
for this Job
Other: Lower 32-bit of the position of the
first block written for this Job
|
Endblock |
integer |
Tape: The number of the last block written for
this Job
Other: Lower 32-bit of the position of the
last block written for this Job
|
JobBytes |
numeric(20) |
The Volume use sequence number within the Job |
VolIndex |
integer |
The Volume use sequence number within the Job |
Device
This is the device table. It contains information about reading and or writing devices.
Column Name |
Data Type |
Remark |
---|---|---|
DeviceId |
serial |
|
Name |
text |
|
MediaTypeId |
integer |
|
StorageId |
integer |
|
DevMounts |
integer |
|
DevReadBytes |
bigint |
|
DevWriteBytes |
bigint |
|
DevReadBytesSinceCleaning |
bigint |
|
DevWriteBytesSinceCleaning |
bigint |
|
DevReadTime |
bigint |
|
DevWriteTime |
bigint |
|
DevReadTimeSinceCleaning |
bigint |
|
DevWriteTimeSinceCleaning |
bigint |
|
CleaningDate |
timestamp |
|
CleaningPeriod |
bigint |
Tape Volume
The number of records depends on the “Maximum File Size” specified in the Device resource. This record allows Bareos to efficiently position close to any given file in a backup. For restoring a full Job, these records are not very important, but if you want to retrieve a single file that was written near the end of a 100GB backup, the JobMedia records can speed it up by orders of magnitude by permitting forward spacing files and blocks rather than reading the whole 100GB backup.
Other Volume
StartFile and StartBlock are both 32-bit integer values. However, as the position on a disk volume is specified in bytes, we need this to be a 64-bit value.
Therefore, the start position is calculated as:
StartPosition = StartFile * 4294967296 + StartBlock
The end position of a job on a volume can be determined by:
EndPosition = EndFile * 4294967296 + EndBlock
Be aware, that you can not assume, that the job size on a volume is
EndPosition - StartPosition
. When interleaving is used other jobs
can also be stored between Start- and EndPosition.
EndPosition - StartPosition >= JobSizeOnThisMedia
Media (Volume)
The Media table contains one entry for each volume, that is each tape or file on which information is or was backed up. There is one volume record created for each of the NumVols specified in the Pool resource record.
Column Name |
Data Type |
Remark |
---|---|---|
MediaId |
serial |
Primary Key |
VolumeName |
text |
Volume name |
Slot |
bigint |
Autochanger Slot number or zero |
PoolId |
integer |
Link to Pool Record |
MediaType |
text |
The MediaType supplied by the user |
MediaTypeId |
integer |
The MediaTypeId |
LabelType |
integer |
The type of label on the Volume |
FirstWritten |
timestamp |
Time/date when first written |
LastWritten |
timestamp |
Time/date when last written |
LabelDate |
timestamp |
Time/date when tape labeled |
VolJobs |
bigint |
Number of jobs written to this media |
VolFiles |
bigint |
Number of files written to this media |
VolBlocks |
bigint |
Number of blocks written to this media |
VolMounts |
bigint |
Number of time media mounted |
VolBytes |
bigint |
Number of bytes saved in Job |
VolErrors |
bigint |
Number of errors during Job |
VolWrites |
bigint |
Number of writes to media |
VolCapacityBytes |
bigint |
Capacity estimate for this volume |
VolStatus |
text |
Status of media:
Full, Archive, Append,
Recycle, Read-Only, Disabled, Error,
Busy
|
Enabled |
smallint |
Whether or not Volume can be written |
Recycle |
smallint |
Whether or not Bareos can recycle the Volumes: Yes, No |
ActionOnPurge |
smallint |
What happens to a Volume after purging |
VolRetention |
bigint |
64 bit seconds until expiration |
VolUseDureation |
bigint |
64 bit seconds volume can be used |
MaxVolJobs |
bigint |
maximum jobs to put on Volume |
MaxVolFiles |
bigint |
maximume EOF marks to put on Volume |
MaxVolBytes |
bigint |
Maximum bytes to put on this media |
InChanger |
smallint |
Whether or not Volume in autochanger |
StorageId |
integer |
Storage record ID |
DeviceId |
integer |
Device record ID |
MediaAddressing |
smallint |
Method of addressing media |
VolReadTime |
bigint |
Time Reading Volume |
VolWriteTime |
bigint |
Time Writing Volume |
EndFile |
bigint |
End File number of Volume |
EndBlock |
bigint |
End block number of Volume |
LocationId |
integer |
Location record ID |
RecycleCount |
bigint |
Number of times recycled |
MinBlockSize |
bigint |
Minimum block size on this media |
MaxBlockSize |
bigint |
Maximum block size on this media |
InitialWrite |
timestamp |
When Volume first written |
ScratchPoolId |
integer |
Id of Scratch Pool |
RecyclePoolId |
integer |
Pool ID where to recycle Volume |
EncryptionKey |
text |
Key used for encryptoion |
Comment |
text |
User text field |
Pool
The Pool table contains one entry for each media pool controlled by Bareos in this database.
Column Name |
Data Type |
Remark |
---|---|---|
PoolId |
serial |
Primary Key |
Name |
text |
Pool Name |
NumVols |
integer |
Number of Volumes in the Pool |
MaxVols |
integer |
Maximum Volumes in the Pool |
UseOnce |
smallint |
Use volume once |
UseCatalog |
smallint |
Set to use catalog |
AcceptAnyVolume |
smallint |
Accept any volume from Pool |
VolRetention |
bigint |
64 bit seconds to retain volume |
VolUseDuration |
bigint |
64 bit seconds volume can be used |
MaxVolJobs |
integer |
max jobs on volume |
MaxVolFiles |
integer |
max EOF marks to put on Volume |
MaxVolBytes |
bigint |
max bytes to write on Volume |
AutoPrune |
smallint |
yes or no for autopruning |
Recycle |
smallint |
yes or no for allowing auto recycling of Volume |
ActionOnPurge |
smallint |
Default Volume ActionOnPurge |
PoolType |
text |
Backup, Copy, Cloned, Archive, Migration |
LabelType |
integer |
Type of label ANSI/Bareos |
LabelFormat |
text |
Label format |
Enabled |
smallint |
Whether or not Volume can be written |
ScratchPoolId |
integer |
Id of Scratch Pool |
RecyclePoolId |
integer |
Pool ID where to recycle Volume |
NextPoolId |
integer |
Pool ID of next Pool |
MinBlockSize |
integer |
Pool ID of next Pool |
MaxBlockSize |
integer |
Pool ID of next Pool |
MigrationHighBytes |
bigint |
High water mark for migration |
MigrationLowBytes |
bigint |
Low water mark for migration |
MigrationTime |
bigint |
Time before migration |
In the Media table one or more records exist for each of the Volumes contained in the Pool. The MediaType is defined by the administrator, and corresponds to the MediaType specified in the Director’s Storage definition record.
Client
The Client table contains one entry for each machine backed up by Bareos in this database. Normally the Name is a fully qualified domain name.
Column Name |
Data Type |
Remark |
---|---|---|
ClientId |
serial |
Primary Key |
Name |
text |
File Services Name |
UName |
text |
uname -a from Client (not yet used) |
AutoPrune |
smallint |
yes or no for autopruning |
FileRetention |
bigint |
64 bit seconds to retain Files |
JobRentention |
bigint |
64 bit seconds to retain Job |
Storage
The Storage table contains one entry for each Storage used.
Column Name |
Data Type |
Remark |
---|---|---|
StorageId |
serial |
Unique Id |
Name |
text |
Resource name of Storage device |
AutoChanger |
integer |
Set if it is an autochanger |
Counters
The Counter table contains one entry for each permanent counter defined by the user.
Column Name |
Data Type |
Remark |
---|---|---|
Counter |
serial |
Counter name |
MinValue |
integer |
Start/Min value for counter |
MaxValue |
integer |
Max value for counter |
CurrentValue |
integer |
Current counter value |
WrapCounter |
text |
Name of another counter |
Log
The Log table contains a log of all Job output.
Column Name |
Data Type |
Remark |
---|---|---|
LogId |
serial |
Primary Key |
JobId |
integer |
Points to Job record |
Time |
timestamp |
Time/date log record created |
LogText |
text |
Log text |
Location
The Location table defines where a Volume is physically.
Column Name |
Data Type |
Remark |
---|---|---|
LocationId |
serial |
Primary Key |
Location |
text |
Text defining location |
Cost |
integer |
Relative cost of obtaining Volume |
Enabled |
smallint |
Whether or not Volume is enabled |
LocationLog
The LocationLog table contains a log of all Job output.
Column Name |
Data Type |
Remark |
---|---|---|
LocLogId |
serial |
Primary Key |
Date |
timestamp |
Time/date log record created |
Comment |
text |
Time/date log record created |
MediaId |
integer |
Points to Media record |
LocationId |
integer |
Points to Location record |
NewVolStaus |
integer |
enum: Full, Archive, Append, Recycle, Purged Read-only, Disabled, Error, Busy, Used, Cleaning |
Enabled |
tinyint |
Whether or not Volume is enabled |
Version
The Version table defines the Bareos database version number. Bareos checks this number before reading the database to ensure that it is compatible with the Bareos binary file.
Column Name |
Data Type |
Remark |
---|---|---|
VersionId |
integer |
Primary Key |
BaseFiles
The BaseFiles table contains all the File references for a particular JobId that point to a Base file.
Column Name |
Data Type |
Remark |
---|---|---|
BaseId |
serial |
Primary Key |
JobId |
integer |
Reference to Job |
FileId |
bigint |
Reference to File |
FileIndex |
integer |
File Index number |
BaseJobId |
integer |
JobId of Base Job |
For example they were previously saved and hence were not saved in the current JobId but in BaseJobId under FileId. FileIndex is the index of the file, and is used for optimization of Restore jobs to prevent the need to read the FileId record when creating the in memory tree. This record is not yet implemented.
UML Diagram of Database Schema