Since your data is more than 3 GB, you will need to make sure that whatever database engine you select either handles tables that large, or lets you split things up into multiple tables, and I would suggest splitting no matter what the maximum size of a single table is. If you perform the split, do it as evenly as possible on a logical key break, so that it is easy to determine which table to use from the first one or two characters of the key. This greatly reduces search times by eliminating up front any records that could never match your query.
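For example, a split on the first character of the key could be as simple as the sketch below; it is only an illustration, and the SegmentIndex/cSegmentCount names are mine, not from any existing library:

    const
      cSegmentCount = 27;   // 'A'..'Z' plus one catch-all segment

    // Map a key to a segment so a lookup only ever touches one table/file pair.
    // Requires SysUtils in the uses clause for CharInSet.
    function SegmentIndex(const aKey: String): Integer;
    begin
      Result := cSegmentCount - 1;                  // catch-all for anything else
      if aKey = '' then
        Exit;
      if CharInSet(aKey[1], ['A'..'Z']) then
        Result := Ord(aKey[1]) - Ord('A')
      else if CharInSet(aKey[1], ['a'..'z']) then
        Result := Ord(aKey[1]) - Ord('a');
    end;

Each segment then gets its own data/index file pair, and a lookup only ever opens the one pair its key could possibly live in.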
If you just want raw performance and will only be performing read-only lookups into the data, then you are better served by one or more ordered index files that use a fixed-size record for your keys and point into your data file. You can then perform a binary search on this index easily and avoid any database overhead. For even more of a performance gain, you can pre-load/cache the midpoints into memory to reduce repetitive reads.
A simple fixed-size record for your specs might look like:

    type
      rIndexRec = record
        KeyStr  : String[15];   // short string, 15 chars max
        DataLoc : Integer;      // switch to Int64 if you're using gpHugeFile
      end;
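With the index file sorted on KeyStr, a lookup is then a plain binary search over the fixed-size records. Here is a minimal sketch, assuming the index is exposed as a TStream of sorted rIndexRec entries; the FindKey name and its parameters are mine, not part of the code above:

    // Binary search a stream of rIndexRec records sorted by KeyStr.
    // Returns the DataLoc of the matching key, or -1 if the key is not found.
    // Integer offsets keep the same < 2 GB limit noted in the edit below.
    function FindKey(const aKey: ShortString; aIndex: TStream): Integer;
    var
      aRec : rIndexRec;
      lo, hi, mid : Integer;
    begin
      Result := -1;
      lo := 0;
      hi := (aIndex.Size div SizeOf(rIndexRec)) - 1;
      while lo <= hi do
      begin
        mid := (lo + hi) div 2;
        aIndex.Seek(mid * SizeOf(rIndexRec), soFromBeginning);
        aIndex.Read(aRec, SizeOf(aRec));
        if aRec.KeyStr = aKey then
        begin
          Result := aRec.DataLoc;
          Exit;
        end;
        if aRec.KeyStr < aKey then
          lo := mid + 1
        else
          hi := mid - 1;
      end;
    end;

To get the midpoint caching mentioned earlier, you could hold the records at the top few split levels (the 1/2, 1/4, 3/4, ... positions) in an in-memory array and only start seeking in the file once the range has narrowed.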
For the initial load, use the TurboPower sort found in SysTools; the latest version for Delphi 2009/2010 can be downloaded from the SongBeamer website. The DataLoc would be the stream position of your data string record, and writing/reading might look like the following:
    // Append a data string to the data stream and return the position it was
    // written at (this becomes the DataLoc in the index record).
    // Requires Classes in the uses clause for TStream.
    function WriteDataString(const aDataString: String; aStream: TStream): Integer;
    var
      aLen : Integer;
    begin
      Result := aStream.Position;
      aLen := Length(aDataString);
      aStream.Write(aLen, SizeOf(aLen));
      if aLen > 0 then
        aStream.Write(aDataString[1], aLen * SizeOf(Char));
    end;
    // Read back the data string that was written at position aPos.
    function ReadDataString(aPos: Integer; aStream: TStream): String;
    var
      aLen : Integer;
    begin
      if aStream.Position <> aPos then
        aStream.Seek(aPos, soFromBeginning);
      Result := '';
      aStream.Read(aLen, SizeOf(aLen));
      if aLen = 0 then
        Exit;
      SetLength(Result, aLen);
      if aStream.Read(Result[1], aLen * SizeOf(Char)) <> aLen * SizeOf(Char) then
        raise Exception.Create('Unable to read entire data string');
    end;
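Tying the two together with the hypothetical FindKey sketched earlier (IndexStream and DataStream stand for whatever streams the index and data files are open in):

    var
      aPos   : Integer;
      aValue : String;
    begin
      aPos := FindKey('SOMEKEY', IndexStream);
      if aPos >= 0 then
        aValue := ReadDataString(aPos, DataStream)   // found: load the data string
      else
        aValue := '';                                // -1 means the key is not in the index
    end;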
When you are creating your index records, DataLoc would be set to the data string record's position (the result of WriteDataString above). It doesn't matter what order the records are loaded in, as long as the index records end up sorted. I used just this technique to keep a 6 billion record database up to date with monthly updates, so it scales to the extreme easily.
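In other words, the load pass might look something like this (AddRecord is just an illustrative name; the index file it produces still needs the sort pass afterwards, e.g. with the SysTools sort, before any binary searching):

    // Append one key/value pair: write the value to the data file and add an
    // (as yet unsorted) index record pointing at it.
    procedure AddRecord(const aKey, aValue: String; aDataStream, aIndexStream: TStream);
    var
      aRec : rIndexRec;
    begin
      aRec.KeyStr  := ShortString(aKey);   // silently truncated to 15 characters
      aRec.DataLoc := WriteDataString(aValue, aDataStream);
      aIndexStream.Write(aRec, SizeOf(aRec));
    end;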
EDIT: Yes, the code above is limited to around 2 GB per data file, but you can extend it by using gpHugeFile or by segmenting into multiple files. I prefer segmenting into multiple logical files of less than 2 GB each, which also takes up slightly less disk space (the index records can keep a 4-byte DataLoc instead of an Int64).