This is a simple command-line utility that allows you to create and free a database, dump and import its contents, run some tests and more.
wgdb [shmname] <command> [command arguments]
The shared memory name identifies the database to use and is an arbitrary numeric value. If it is omitted, the default ("1000") will be used.
Commands commonly available:
help (or "-h") - display this text. version (or "-v") - display libwgdb version. free - free shared memory. export [-f] <filename> - write memory dump to disk (-f: force dump even if unable to get lock) import [-l] <filename> - read memory dump from disk. Overwrites existing memory contents (-l: enable logging after import). exportcsv <filename> - export data to a CSV file. importcsv <filename> - import data from a CSV file. replay <filename> - replay a journal file. info - print information about the memory database. add <value1> .. - store data row (only int or str recognized) select <number of rows> [start from] - print db contents. query <col> "<cond>" <value> .. - basic query. del <col> "<cond>" <value> .. - like query. Matching rows are deleted from database. createindex <column> - create ttree index. createhash <columns> - create hash index (JSON support). dropindex <index id> - delete an index. listindex - list all indexes in database. server [-l] [size b] - provide persistent shared memory for other processes (Windows). (-l: enable logging in the database). create [-l] [size] - create empty db of given size (non-Windows). (-l: enable logging in the database, mode: segment permissions (octal)).
Data may be exported to and imported from text files. This provides a way to exchange data between WhiteDB on different platforms or other data sources. The simplest format that is always available is CSV (comma separated values).
Since there is no straightforward mapping between most WhiteDB types and the CSV format, and since CSV itself is not standardized, only limited support for data types is available. The following data types are recognized when importing from CSV:
- NULL - empty string
- int - plain integer
- double - floating point number in fixed decimal notation
- date - ISO8601 date
- time - ISO8601 time with fractions of a second
- string - input data that does not match the above types
The field separator is , (comma). The decimal point separator is . (dot).
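To illustrate (the file name and its contents are made-up examples), a file data.csv containing

10,2.72,2013-10-24,some free text
25,,2013-10-25,another row

could be loaded into the default database with

wgdb importcsv data.csv

and the current contents written back out with

wgdb exportcsv out.csv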
WhiteDB may also provide RDF (if libraptor is available) and JSON (ongoing development, undocumented) support.
In the default configuration, WhiteDB runs in shared memory only. There
are two methods of providing data persistence: memory dumps, which are
snapshots of the current database contents and journal logs which contain
incremental updates to the database contents. Logs are only available
if the database is configured with the --enable-logging
option and need to be explicitly enabled.
A memory dump is the memory image of a WhiteDB database, saved into a file. It stores everything except concurrency control and journal file metadata. Images are not compatible between systems of different endianness or word size (the most common case would probably be 32-bit vs 64-bit systems). Also, different versions of the database library may use a different format for the memory image, therefore WhiteDB automatically refuses to import an image made with a different library version.
Type wgdb -v to list the compatibility information of the database
library. It will display something like this:
libwgdb version: 0.7.0
byte order: little endian
compile-time features:
  64-bit encoded data: yes
  queued locks: yes
  chained nodes in T-tree: yes
  record backlinking: yes
  child databases: no
  index templates: yes
Memory dumps are suitable for creating snapshots of the database for backup purposes. Type
wgdb export imagename.bin
to create a backup of the current database (since the shared memory name
was omitted, the
wgdb utility will use "1000" by default).
wgdb import imagename.bin
will restore the image from disk. It will completely overwrite the current memory contents. Note that if there is an existing memory segment, it needs to be large enough to fit the image. Otherwise the import will fail with an error message (and the shared memory segment will not be modified).
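If the existing segment turns out to be too small, one possible approach (the size below is only an illustrative value; note that freeing the segment discards its current contents) is to recreate the segment with a larger size before importing:

wgdb free
wgdb create 200000000
wgdb import imagename.bin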
If the wgdb tool is unable to access the memory image (for example, due
to a programming error that causes the database to become permanently locked),
a "rescue" dump can be created with
wgdb export -f rescue.bin
Note, however, that in such cases care must be taken, as it is unknown what kind of errors the image may contain.
Journal logs provide a way of keeping a continuous backup of the database. All of the changes to the shared memory are logged incrementally and the journal logs may be played back to repeat all of those changes, bringing the database to the same state as it was at the time the log file ended.
To enable journal logging, use
./configure --enable-logging --with-logdir=./logs
during the building of the database. Replace ./logs with wherever you'd like the library to store the journal files. This location and the names of the journal files cannot be changed at runtime, for security reasons. If you do not specify the log directory, /tmp (or \windows\temp) will be used by default, which may work for testing, but is probably undesirable in the long term.
Journaling must be enabled at the time of database creation. To do this from
the command line, supply the
-l switch to either the
wgdb create command
or when importing an image with the
wgdb import command. This will cause a
journal file to be created. It will be placed in logdir/wgdb.journal.shmname
where shmname is the database name, for example, "1000". The journal is then
incrementally written, until one of three things happens:
- A dump file is created, either by wgdb export or by calling the corresponding API function.
- A dump file is imported (again, by command line or API)
- A journal file is replayed.
In each of these cases, the current journal file is backed up, appending .0 to its name (or .1, if .0 already exists and so forth) and a fresh journal file is started.
The first case can be considered normal usage, as it creates a snapshot of the database. Since this snapshot contains everything that the journal has logged up to this point, the journal is no longer necessary for recovery and a fresh one will be used.
Importing the dump file will either be part of a recovery process or to simply work with a new image. In either case, the previous journal has become irrelevant to the current database contents, making it necessary to start a new one.
Finally, a journal replay itself will not be logged in a journal. Therefore, the database contents after the replay and the journal file that was in use during the replay have become inconsistent, similar to what happens when importing a memory dump. Generally, journal backups caused by these recovery actions should be cleaned up or moved away. The user is expected to handle this manually, case by case.
Assuming we use the ./logs directory to store the journal files and the database has been compiled with journal support, let’s start by enabling the journal and adding data to the database:
wgdb 1011 create -l
wgdb 1011 add 1 2 3
The contents of the ./logs directory will now be:
-rw-rw-rw- 1 user group 27 Dec 7 22:00 wgdb.journal.1011
And check the contents of the database by typing wgdb 1011 select 10; the single row 1 2 3 that we just added should be printed.
Let’s create a memory dump of this database. Make sure you don’t have example.bin already, as it will be overwritten.
wgdb 1011 export example.bin
Now we have these log files:
-rw-rw-rw- 1 user group  4 Dec 7 22:04 wgdb.journal.1011
-rw-rw-rw- 1 user group 27 Dec 7 22:00 wgdb.journal.1011.0
Note that the original file received the suffix .0 and a new one was created in its place. Let's add more data:
wgdb 1011 add Hello world
The database now contains two rows: the original 1 2 3 and the new Hello world.
Assume next that something destroyed our database. To simulate this, try
wgdb 1011 free.
We now know that we have a recent dump called example.bin and some journal files:
-rw-rw-rw- 1 user group 47 Dec 7 22:06 wgdb.journal.1011
-rw-rw-rw- 1 user group 27 Dec 7 22:00 wgdb.journal.1011.0
The newest journal is what we’re interested in. However, keep in mind that importing the dump restarts the journal. Nothing will happen to our precious log file, but in order to avoid confusion, let’s move it somewhere safe first:
mv -i ./logs/wgdb.journal.1011 ./logs/recover.me.1011
Import the dump (note that the -l switch is used to re-enable journaling so that our updates will be logged again after we’ve completed the recovery):
wgdb 1011 import -l example.bin
Try to list the database contents (wgdb 1011 select 10): only the row 1 2 3 restored from the dump is present; the Hello world row added after the dump is still missing.
Finally, recover the log:
wgdb 1011 replay ./logs/recover.me.1011
And the database contents after this will again include both rows, 1 2 3 and Hello world.
We managed to restore the latest state of the database by first importing the dump file, then replaying the journal file that was written after the dump was created. If we now continued to modify the database, the state could be recovered by importing example.bin and then replaying recover.me.1011 and the latest journal file wgdb.journal.1011, in that order.
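Spelled out as commands, and following the same precaution of moving the live journal aside before touching the database (the name recover2.1011 is only an illustration), such a future recovery might look like:

mv -i ./logs/wgdb.journal.1011 ./logs/recover2.1011
wgdb 1011 import -l example.bin
wgdb 1011 replay ./logs/recover.me.1011
wgdb 1011 replay ./logs/recover2.1011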
The recovery process creates a number of intermediate journals which we
may now clean up by
rm ./logs/wgdb.journal.1011.?. CAUTION: in this case,
we moved the log file that we cared about out of the way first, which is why this
command is safe to use here. In general, however, it is better to make
sure that the deleted log files do not contain anything not backed up
elsewhere. The easiest way would be to verify that the database is
healthy and create a fresh memory dump.
A reasonable procedure that helps keep track of image dumps and journals could be implemented with a script along the following lines (a sketch follows below):
- dump the memory segment into a file such as backup.YYYYMMDD.shmname.bin
- check that the dump was successful
- move logdir/wgdb.journal.shmname.0 to journal.YYYYMMDD.shmname
This way, logdir always contains only the current journal. Assuming that YYYYMMDD represents the current date, memory dumps and journals can be archived and accessed by date. If recovery is needed, the database can be restored from the latest image and the current journal in logdir/wgdb.journal.shmname (which should first be archived separately). Any journal backups in logdir should then be removed after the restore is successful, to ensure that step 3 archives the correct journal file next time.
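A minimal sketch of such a script is shown below. The paths, the shared memory name and the assumption that wgdb exits with a non-zero status when the dump fails are all illustrative and should be adapted and verified for your setup.

#!/bin/sh
# Sketch of the backup procedure described above (illustrative values).
SHM=1000                      # shared memory name (assumed)
LOGDIR=./logs                 # must match the --with-logdir build option (assumed)
DATE=$(date +%Y%m%d)

# step 1: dump the memory segment into a dated image file
# step 2: stop if the dump did not succeed (assumes a non-zero exit status on failure)
wgdb $SHM export backup.$DATE.$SHM.bin || exit 1

# step 3: the export rotated the journal; archive the rotated copy
if [ -f "$LOGDIR/wgdb.journal.$SHM.0" ]; then
    mv "$LOGDIR/wgdb.journal.$SHM.0" "journal.$DATE.$SHM"
fi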
dserve is work in progress and not part of the WhiteDB distribution yet.
It is a simple REST service tool that takes a query described with cgi parameters and prints json or csv output.
dserve is useful both as a ready-made tool and as an example/template for making
your own tools for WhiteDB data handling.
dserve is not compiled by default. Find it under the Examples folder as a
single source file dserve.c, modify and compile yourself by doing:
gcc dserve.c -o dserve -O2 -lwgdb
Copy the resulting executable under the cgi folder of the Apache server.
You can use dserve as a cgi program taking a few GET parameters, or as a command line program: on the command line, simply pass the urlencoded query string as a single argument. The first example prints rows 2..4 of the database.
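A command along the following lines should do it; the exact query string is an assumption pieced together from the from and count parameters documented below:

dserve 'op=search&from=2&count=3'

which produces output like this: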
content-length: 110
content-type: application/json

[
[1,1.1,"a simple string",[10,,"point to me"],"2013-10-24","23:17:36.68"],
[12,"an:uri",-2000],
[23,-12]
]
Observe that the fourth field contains a pointer to another record, printed as a sublist. The same example can be repeated with a small addition: the showid=yes parameter prepends automatic record id-s (encoded offsets) to each record as the first element.
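Assuming the same reconstructed query as above, that would be:

dserve 'op=search&from=2&count=3&showid=yes'

with output: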
content-length: 134
content-type: application/json

[
[23368,1,1.1,"a simple string",[23304,10,,"point to me"],"2013-10-24","23:17:36.68"],
[23408,12,"an:uri",-2000],
[23432,23,-12]
]
The third example asks for the first three records with field 0 less than 20:
dserve 'op=search&field=0&value=20&compare=lessthan&count=3'

content-length: 178
content-type: application/json

[
[10,,"point to me"],
[0,0.1,"a simple string",[10,,"point to me"],"2013-10-24","23:17:36.68"],
[1,1.1,"a simple string",[10,,"point to me"],"2013-10-24","23:17:36.68"]
]
dserve does not support adding or updating records in the database. We may add such a cgi utility in the future, but in any case it is a good idea to write your own utilities that fit your needs exactly.
Most of the dserve parameters are optional, i.e. have sensible defaults. Limits, error messages etc are all configurable by changing the macro definitions at the beginning of the C source.
You have to indicate the field numbers and the values to compare the contents of these fields with. The whole set of cgi parameters is as follows:
- db : numeric database name. Default 1000.
- op : either search or recids. Here we assume search; recids is covered later.
- field : field number (0,1,2, etc) to check against the compared value.
- value : value to check the field contents against. Examples: value=32, value=sometext.
- type : datatype of the value: null, int, double, str, char or record. Guessed from the value by default.
- compare : equal, not_equal, lessthan, greater, ltequal or gtequal.
- from : skip initial matching records, start from the result nr given here. Default 0.
- count : max number of matching records output. Default 10000.
- depth : max depth of nested trees of records. Default 100, hard limit 10000. Set to 0 for hiding all sublists. Modify the macro MAX_DEPTH_HARD to change the hard limit.
- format : either json or csv. Default json.
- escape : escaping of special characters in json strings: no (replace nothing), url (urlencode %, " and all non-printable and non-ascii characters, i.e. under 32 and over 126, to be completely ascii-safe) or json (not ascii-safe: replace only the minimal set indicated in the rfc). Default json. NB! This parameter has no effect for csv, where only " gets replaced with "".
- showid : if yes, print the record offset (automatic id) as the first element of each record, moving all the other fields one position to the right. Default no.
All the input values are assumed to be urlencoded.
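As an illustration of the formatting parameters (the values are arbitrary examples), the following asks for the first five records in csv format with record ids included:

dserve 'op=search&count=5&format=csv&showid=yes'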
The json output is a list of matching records. Each record is also presented as a list. The list elements are integers, doubles, strings or records (again represented as lists) pointed to from the field. Null is printed as an empty list [].
All the other datatypes, like dates, times, URI-s, strings with a language attribute etc in WhiteDB are converted to standard strings in a fairly intuitive manner. Blobs are url-encoded. For csv and too deep json branches the full internal record is represented by its offset (encoded pointer, i.e. automatic id).
Notice that for a graph database the json output can be a complex tree of records. Cycles are possible in the data, but runaway expansion of the output is prevented by the depth parameter.
Errors are reported by printing an error string as a single element of the output list, both for json and csv, like this:
content-length: 48
content-type: application/json

["unrecognized op: use op=search or op=recids"]
dserve should be able to free locks and detach the database even in case of
hard errors like segfaults.
Importantly, you can give several sets of field/value/compare/type parameters to perform a complex and-query.
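A sketch of such a query, repeating the parameter set once per condition (the field numbers, values and types are purely illustrative):

dserve 'op=search&field=0&value=20&compare=lessthan&type=int&field=1&value=0&compare=greater&type=double'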
If you use several fields in the query, you have to fill in the otherwise optional type and compare parameters for all the fields indicated. It is also OK to put the same field into several comparisons.
You can also have no fields in the query at all. In this case the full database will be traversed and printed according to the parameters given.
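For example (again illustrative), the following simply prints the first five records of the whole database:

dserve 'op=search&count=5'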
The second way to query is to use op=recids and simply indicate a list of record id-s (offsets: encoded record pointers).
You can learn the record id-s by using the
showid=yes parameter described before.
The database selection parameter db and the generic formatting parameters (such as format, escape and showid) function exactly as described above.
Things to note and useful code examples in dserve:
- does not require additional libraries except wgdb
- uses readlocks, does not use writelocks
- cgi parameter parsing
- various printing routines for WhiteDB values
- text is printed into an automatically growing string buffer
- error and timeout handling with signals: when a signal arrives, free the readlock, detach database and output ["internal error"] or ["timeout"]. Available on Linux only.