Ten million inserts and one million queries per second for a single core

Core operations

Basic WhiteDB operations are extremely fast:

  • Inserting a new record of five fields and setting the values:
    10 million records per second.
  • Scanning a non-indexed field, counting occurrences of a value: 50 million records per second.
  • Querying an indexed field from a database of 10 million records: 300 000 full queries per second.

The numbers are obtained on a laptop with an Intel Core i7-2820QM and 8 gigabytes of RAM on a 32-bit Gentoo Linux, in a single process / core. See the details below.

For multithreaded efficiency and more complex loads see the Yahoo benchmark results


Everybody knows that databases are not easy to compare: it really depends on actual usage.

This said, we provide a comparison of WhiteDB with the leading NoSQL databases MongoDB and Redis in the main-memory mode, using the Yahoo cloudserving benchmark.

The Yahoo benchmark is implemented in Java and runs predefined workloads against the chosen database engine, using one or multiple threads. Our results are favourable, although there is always room to improve.

Yahoo benchmark results »

Core speed tests

1. Preliminaries

The tests below measure the speed of basic operations: creating a database, adding a record, inserting values, performing scans and queries. The timings are obtained on a laptop with an Intel Core i7-2820QM and 8 gigabytes of 1333 Mhz RAM running 32-bit Gentoo Linux. Both the library and the tests are compiled with gcc 4.2.4, using the optimization level -O2.

We run these tests on a 32-bit system with the field size in a record being also 32 bits. Using 64 bit fields would probably make the operations slightly slower.

All the tests are run as a single process without threads, hence utilizing a single processor core and not requiring any locks. As the Yahoo benchmark results indicate, you can get higher throughput using multiple processes/cores. On the other hand, multiple processes doing writes would require locking, i.e. inserting wg_start_write, wg_start_read, wg_end_write, wg_end_read from the C API into the test code. Locking speed depends on both where you call the lock functions and the locking strategy set during WhiteDB configuration. As said before, the following tests do not require and do not use locking.

2. How to run the tests

The tests are available as small standalone source files speed1.c, speed2.c etc in Examples/speed. Compile and run them yourself by

gcc speed1.c -o speed1 -O2 -lwgdb
time ./speed1

Tests 1-8 create a new database, run the operations and then delete the database. Tests 10 and later will not delete the database: do it yourself by wgdb 10 free where 10 is the test number (look at the test source).

Make sure that you have sufficient shared memory before running the tests: all the tests assume that you have at least one gigabyte of shared memory available. Check the available amount and increase as necessary:

less /proc/sys/kernel/shmmax
sysctl kernel.shmmax=1000000000
Alternatively, replace the wg_attach_database(name, 1000000000) calls with wg_attach_local_database(1000000000) to use ordinary non-shared memory, accompanied by wg_delete_local_database later: this is ok for tests 1-9.

Below we will always present the wall-clock time as reported by time, including both user and system time.

3. Creating and deleting a database

speed1.c creates a new 1 gigabyte database and then immediately deletes the database, repeating the whole process 1000 times.

The time spent is ca 10 seconds, meaning that you can create and then delete a 1 gigabyte database 100 times per second.

Creating/deleting a smaller database is faster: it takes 13 seconds to do 100 000 creations/deletions of a 10 megabyte database (the total amount of memory involved is the same as in the previous test).

The database creation time increases roughly linearly with the database size. Database creation splits the whole space into a number of subareas and datastructures, mostly organized into free memory chunk lists for various sizes.

4. Inserting and deleting records

speed2.c creates a new 1 gigabyte database and then creates 10 million records of length 5, with record values initialized to NULL.

The time spent is ca 0.6 seconds, meaning that you can create one 5-field record in ca 60 nanoseconds.

In comparison, on the same system it takes ca 0.08 seconds to allocate 1 GB with conventional malloc and then just sequentially write 50 million integer 0 values: the same number of integers as the NULL-initialized fields in the database.

speed3.c performs a very similar task to speed2.c: it creates a new 1 gigabyte database and then creates 10 million records of length 5, this time not initializing the field values.

The time spent by this faster version is ca 0.349 seconds.

speed4.c, again, creates a new 1 gigabyte database and then creates 10 million records of length 5, with record values initialized to NULL, this time deleting each newly created record immediately after creation.

The time spent by this create/delete version is ca 1.1 seconds.

5. Inserting and setting field values

In real life you insert a record to fill it with real values, not just NULL.

Integer values

speed5.c creates a new 1 gigabyte database, then creates 10 million records of length 5, with field values then set to integers of increasing value.

Observe that integers stored are encoded. Also, we are using wg_set_new_field which assumes the field does not contain an allocated or indexed value. The conventional wg_set_field checks for the previous value (to deallocate/remove from index as necessary) and is slightly slower.

The time spent by this proper insert is 1.16 seconds. The slightly slower wg_set_field version takes 1.55 seconds.

Now, when we add code to read back the immediately written value from the base, decode and add it to a counter, we get the running time 1.77 seconds.

Next we modify the code to create just 1000 records, but fairly long ones: 50 thousand fields each, again filling them with growing integers. This gives us the same total number of fields as in the previous test.

The time spent by the long-record version is 0.94 seconds, slightly faster than the initial proper insertion test.

String values

So far we have used integers, which should be expected to be the fastest value type.

The next step is to experiment with strings. WhiteDB uses two different encoding methods for strings:

  • Strings shorter than 32 bytes are always separately allocated as 32-byte chunks, with the field containing a pointer to the string. BTW, the 32-byte limit can be modified by changing the SHORTSTR_SIZE macro in dballoc.h
  • Longer strings (and also short strings with an added language/namespace attribute) are allocated uniquely: a single copy is kept of each string. This saves space and allows very fast string equality comparison: just pointers have to be compared. However, encoding a long string involves computing a hash of a string, looking it up from the hash table and inserting into hash chain, which takes some time.

speed6.c creates a new 1 gigabyte database, then creates 1 million records of length 5, with field values then set to a 30-byte string. NB! We have decreased the number of records from 10 million in previous tests to 1 million for a short string: otherwise the strings won't fit into our meager 1 GB database.

The time spent by the string-fields test is 0.4 seconds. This would translate to 4 seconds when compared with the 10 million records of the integer test, making the string-record insertion ca 3.5 times slower.

Testing with a 10-byte string instead of a 30-byte string has a very small effect.

speed7.c performs the same operations as the previous speed6, but this time using a 100-byte string as a field value. The string is kept uniquely, essentially just one copy for the whole database.

The time spent by the test is 1.5 seconds, about three times slower than the 30-byte string test was. The memory requirements are a lot lower for this test than the previous test, but this would change if we'd save a different string each time.

Double values

speed8.c performs the same operations as the last string tests, but using a double value. Again, just one million records of five fields are created. All of these are allocated as 8-byte values. The field is assigned a pointer to the allocated value.

The time spent by the test is 0.19 seconds, which is significantly faster than the string tests. When compared to the integer tests of 10 million records, it would correspond to 1.9 seconds, which is a bit slower than the integer test.

6. Scanning, queries and pointers

The first test scans the database, i.e. goes through all the records sequentially, looking for specific values in a given field. This is what happens when you search through a non-indexed field. The second test will query an indexed field and demonstrate, as expected, real speed.

Scanning records over a non-indexed value

speed10.c builds 10 million records of five fields, with the value of field 3 assigned an integer stemming from the last five digits of the record number, thus storing each number between 0...100 000 to 100 different records.

speed11.c takes the previously built database and scans through all the records, checking whether field 3 contains the value 123. Matching records are counted: there are 100 of these.

The scanning test takes 0.2 seconds. In the other words, you can expect to scan through 50 million records in a second.

Querying records over an indexed value

speed12.c creates the T-tree index for the (integer) field 3 in the previously created database of 10 million records with 100000 different values in the field.

The indexing test takes 6.5 seconds.

speed13.c performs a query over the previously indexed field 3, again looking for the value 123 and counting all the matching records (exactly 100). In order to make the time measureable, we repeat the whole query building / searching / fetching / deallocating process one million times.

The time to perform million queries is 3.2 seconds. In the other words, you can make ca 300 000 queries per second. For smaller databases the query time decreases logarithmically: running the same test on a 1000-record database takes 1.1 seconds, i.e. ca one million queries per second is the maximum speed you could get.

Record pointers: the foundation of a graph database

Searching for a related record through an index using the record id (the standard SQL way) is neither the fastest nor the simplest way of going through complex data structures.

It is much faster and easier to store direct pointers to records.

speed15.c builds a database of 10 million records and stores a pointer to the previously created record to the field 3 of each record. Essentially, all the 10 million records will form a long chain from the last record back to the first. Additionally, we store a pointer to the last record into field 2 of the very first record, to directly access the last record later.

The building test takes 0.66 seconds.

speed16.c traverses through the whole chain of backward pointers and counts the 10 million records in a list.

The traversal test takes 0.15 seconds: in other words, you can traverse almost 100 million records in a linked list in a second.