I'm looking for a cross-platform database engine that can handle databases of up to hundreds of millions of records without severe degradation in query performance. It needs to h
I'm the author of hamsterdb.
Tokyo Cabinet and BerkeleyDB should work fine. hamsterdb definitely will work: it's a plain C API, open source, platform independent, very fast, and tested with databases up to several hundred GB and hundreds of millions of items.
If you are willing to evaluate it and need support, drop me a mail (contact form on hamsterdb.com) - I will help as well as I can!
bye, Christoph
For anyone finding this page a few years later: I'm now using LevelDB with some scaffolding on top to add the multiple indexes I need. In particular, it's a nice fit for embedded databases on iOS. I ended up writing a book about it! (Getting Started with LevelDB, from Packt in late 2013.)
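One common way to build that kind of scaffolding is a key-naming convention where each secondary-index entry points back at the primary record's key. Below is a minimal, illustrative sketch against LevelDB's C API (leveldb/c.h); the "p:"/"i:name:" prefixes, names and file path are placeholders of my own, and error checking is trimmed.

```c
/* Sketch of secondary indexing on top of LevelDB's C API.
 * The "p:"/"i:name:" key prefixes are an illustrative convention,
 * not anything LevelDB itself provides. Build with: cc demo.c -lleveldb */
#include <leveldb/c.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    char *err = NULL;

    leveldb_options_t *opts = leveldb_options_create();
    leveldb_options_set_create_if_missing(opts, 1);
    leveldb_t *db = leveldb_open(opts, "people.ldb", &err);
    if (err) { fprintf(stderr, "open: %s\n", err); return 1; }

    leveldb_writeoptions_t *wo = leveldb_writeoptions_create();

    /* Primary record: "p:<id>" -> value (error checks trimmed) */
    const char *pkey = "p:42";
    const char *val  = "Alice,alice@example.com";
    leveldb_put(db, wo, pkey, strlen(pkey), val, strlen(val), &err);

    /* Secondary index entry: "i:name:<name>" -> primary key */
    const char *ikey = "i:name:Alice";
    leveldb_put(db, wo, ikey, strlen(ikey), pkey, strlen(pkey), &err);

    /* Lookup by name: read the index entry, then the primary record */
    leveldb_readoptions_t *ro = leveldb_readoptions_create();
    size_t pklen = 0;
    char *pk = leveldb_get(db, ro, ikey, strlen(ikey), &pklen, &err);
    if (pk) {
        size_t vlen = 0;
        char *v = leveldb_get(db, ro, pk, pklen, &vlen, &err);
        if (v) { printf("found: %.*s\n", (int)vlen, v); leveldb_free(v); }
        leveldb_free(pk);
    }

    leveldb_readoptions_destroy(ro);
    leveldb_writeoptions_destroy(wo);
    leveldb_options_destroy(opts);
    leveldb_close(db);
    return 0;
}
```

Because LevelDB keeps keys sorted, iterating over the "i:name:" prefix with leveldb_create_iterator also gives ordered scans on the secondary key.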
I believe what you are looking for is BerkeleyDB: http://www.oracle.com/technology/products/berkeley-db/db/index.html
Never mind that it's Oracle, the license is free, and it's open-source -- the only catch is that if you redistribute your software that uses BerkeleyDB, you must make your source available as well -- or buy a license.
It does not provide SQL support, but rather direct lookups (via a B-tree or hash-table structure, whichever makes more sense for your needs). It's extremely reliable, fast, ACID-compliant, has built-in replication support, and so on.
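To give a feel for the programming model, a keyed put/get with the B-tree access method looks roughly like the sketch below; the file name and record contents are placeholders, and error checking is trimmed.

```c
/* Rough sketch of Berkeley DB's key/value C API (B-tree access method).
 * File name and record contents are placeholders; error handling trimmed.
 * Build with: cc demo.c -ldb */
#include <db.h>
#include <string.h>
#include <stdio.h>

int main(void) {
    DB *dbp;
    DBT key, data;

    db_create(&dbp, NULL, 0);
    dbp->open(dbp, NULL, "records.db", NULL, DB_BTREE, DB_CREATE, 0664);

    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = "user:42";
    key.size = (u_int32_t)strlen("user:42");
    data.data = "arbitrary binary payload";
    data.size = (u_int32_t)strlen("arbitrary binary payload");

    dbp->put(dbp, NULL, &key, &data, 0);           /* insert, no SQL involved */

    memset(&data, 0, sizeof(data));
    if (dbp->get(dbp, NULL, &key, &data, 0) == 0)  /* direct lookup by key    */
        printf("%.*s\n", (int)data.size, (char *)data.data);

    dbp->close(dbp, 0);
    return 0;
}
```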
Here is a small quote from the page I referred to above, listing a few features:
Data Storage
Berkeley DB stores data quickly and easily without the overhead found in other databases. Berkeley DB is a C library that runs in the same process as your application, avoiding the interprocess communication delays of using a remote database server. Shared caches keep the most active data in memory, avoiding costly disk access.
- Local, in-process data storage
- Schema-neutral, application native data format
- Indexed and sequential retrieval (Btree, Queue, Recno, Hash)
- Multiple processes per application and multiple threads per process
- Fine grained and configurable locking for highly concurrent systems
- Multi-version concurrency control (MVCC)
- Support for secondary indexes
- In-memory, on disk or both
- Online Btree compaction
- Online Btree disk space reclamation
- Online abandoned lock removal
- On disk data encryption (AES)
- Records up to 4GB and tables up to 256TB
Update: I just ran across this project and thought of the question you posted: http://tokyocabinet.sourceforge.net/index.html . It is under the LGPL, so not compatible with your restrictions, but it's an interesting project to check out nonetheless.
You didn't mention what platform you are on, but if Windows-only is OK, take a look at the Extensible Storage Engine (previously known as Jet Blue), the embedded ISAM table engine included in Windows 2000 and later. It's used for Active Directory, Exchange, and other internal components, and is optimized for a small number of large tables.
It has a C interface and supports binary data types natively. It supports indexes and transactions, and uses a log to ensure atomicity and durability. There is no query language; you have to work with the tables and indexes directly yourself.
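To give an idea of what that looks like in practice, here is a rough, untested sketch of a keyed lookup against an already-created ESE table. The database path, table name, index name and column id are placeholders, schema creation and error checking are omitted, and you link against esent.lib.

```c
/* Rough sketch of a keyed lookup with ESE (esent.h / esent.lib).
 * Database path, table name, index name and column id are placeholders;
 * real code must check every JET_ERR and create the schema first. */
#include <windows.h>
#include <esent.h>
#include <stdio.h>

int main(void) {
    JET_INSTANCE inst = 0;
    JET_SESID    sesid = 0;
    JET_DBID     dbid = 0;
    JET_TABLEID  tableid = 0;

    JetCreateInstance(&inst, "demo");
    JetInit(&inst);
    JetBeginSession(inst, &sesid, NULL, NULL);

    JetAttachDatabase(sesid, "people.edb", 0);
    JetOpenDatabase(sesid, "people.edb", NULL, &dbid, 0);
    JetOpenTable(sesid, dbid, "people", NULL, 0, 0, &tableid);

    /* No query language: pick an index, build a key, seek, read columns. */
    long id = 42;                              /* value to look up         */
    unsigned char name[256]; unsigned long cb = 0;
    JET_COLUMNID colName = 2;                  /* placeholder column id    */

    JetSetCurrentIndex(sesid, tableid, "ix_id");
    JetMakeKey(sesid, tableid, &id, sizeof(id), JET_bitNewKey);
    if (JetSeek(sesid, tableid, JET_bitSeekEQ) == JET_errSuccess) {
        JetRetrieveColumn(sesid, tableid, colName,
                          name, sizeof(name), &cb, 0, NULL);
        printf("found %lu bytes\n", cb);
    }

    JetCloseTable(sesid, tableid);
    JetEndSession(sesid, 0);
    JetTerm(inst);
    return 0;
}
```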
ESE doesn't like to open files over a network, and doesn't support sharing a database through file sharing. You're going to be hard pressed to find any database engine that supports sharing through file sharing. The Access Jet database engine (AKA Jet Red, totally separate code base) is the only one I know of, and it's notorious for corrupting files over the network, especially if they're large (>100 MB).
Whatever engine you use, you'll most likely have to implement the shared-usage functionality yourself in your own network server process, or use a separate client-server database engine.
SQLite tends to be the first option. It doesn't have to store your data as strings, but I think you have to build a SQL command to do the insertion, and that command will involve some string building.
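In practice the string building is limited to the SQL text itself; the values can be bound as binary parameters with a prepared statement and the statement reused for every insert. A small sketch (table and column names made up):

```c
/* Small sketch: inserting binary data into SQLite with a prepared
 * statement, so the SQL text stays fixed and the blob is bound directly.
 * Table/column names are made up. Build with: cc demo.c -lsqlite3 */
#include <sqlite3.h>
#include <stdio.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;
    const unsigned char payload[] = { 0xDE, 0xAD, 0xBE, 0xEF };

    sqlite3_open("records.db", &db);
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, data BLOB)",
        NULL, NULL, NULL);

    /* SQL is prepared once; values go in through bind calls, not strings. */
    sqlite3_prepare_v2(db, "INSERT INTO records (data) VALUES (?1)", -1,
                       &stmt, NULL);
    sqlite3_bind_blob(stmt, 1, payload, sizeof(payload), SQLITE_STATIC);
    if (sqlite3_step(stmt) != SQLITE_DONE)
        fprintf(stderr, "insert failed: %s\n", sqlite3_errmsg(db));

    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```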
BerkeleyDB is a well-engineered product if you don't need a relational DB. I have no idea what Oracle charges for it, or whether you would need a license for your application.
Personally, I would consider why you have some of your requirements. Have you done testing to verify the requirement that you need to do direct insertion into the database? It seems like you could take a couple of hours to write a wrapper that converts from whatever API you want to SQL, and then see whether SQLite, MySQL, etc. meet your speed requirements.
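If it helps, the wrapper interface can be tiny; something like the hypothetical header below, with the SQLite sketch above (or a BerkeleyDB/LevelDB backend) sitting behind it, is enough to benchmark insert and lookup speed before committing to an engine. All names here are made up for illustration.

```c
/* Hypothetical minimal key/value wrapper interface for benchmarking.
 * Implement it once per backend (SQLite, BerkeleyDB, ...) and compare
 * timings; every name below is made up for illustration. */
#ifndef KV_STORE_H
#define KV_STORE_H

#include <stddef.h>

typedef struct kv_store kv_store;   /* opaque handle, defined per backend */

kv_store *kv_open(const char *path);
int  kv_put(kv_store *s, const void *key, size_t keylen,
            const void *val, size_t vallen);
int  kv_get(kv_store *s, const void *key, size_t keylen,
            void *val, size_t *vallen);   /* in: capacity, out: length */
void kv_close(kv_store *s);

#endif /* KV_STORE_H */
```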
You might consider C-Tree by FairCom - tell 'em I sent you ;-)