disk storage of arrays etc

北城以北 submitted on 2019-12-11 13:02:02

Question


Does anyone have experience with storing data on disk? What I have is an in-memory modelling application that can do calculations etc. Basically the data is stored as lists of objects that have nested key-value collections like Dictionary<int, Dictionary<int, T>>.

Right now I use SQL Server as a persistence layer, but I use very few of its features. So I'm thinking I could write/read the data to disk myself to reduce dependencies and ease installation.

So I wrote a little routine that writes each array to disk in roughly this format, where the words "ObjId", "Type", "Valid" and "Count" are not actually in the file; they're the 1st, 2nd, 3rd and 4th int in the byte[], followed by <int, T> pairs. The 52 comes from 4 * 4 + 3 * (4 + 8) = 16 + 36 (4 bytes per int, 8 per double).

    Bytes: 52

    ObjId: 123
    Valid: 234
    Type: double
    Count: 3
        1 .23
        2 .34
        3 .45

In the actual file there is no indentation or labels; the values are just sequential bytes in one long stream.
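A minimal C# sketch of this layout, assuming the field order given in the prose (ObjId, Type, Valid, Count; the illustration above lists Valid before Type), a hypothetical type code of 1 for double, and the little-endian encoding that BinaryWriter produces. `WriteRecord` is an illustrative name, not the routine from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical type code for double values; the question doesn't say which codes it uses.
const int TypeDouble = 1;

// Serializes one array: four int header fields, then (int key, double value) pairs.
byte[] WriteRecord(int objId, int valid, IReadOnlyList<(int Key, double Value)> pairs)
{
    using var ms = new MemoryStream();
    using var w = new BinaryWriter(ms);
    w.Write(objId);        // 1st int: ObjId
    w.Write(TypeDouble);   // 2nd int: Type
    w.Write(valid);        // 3rd int: Valid
    w.Write(pairs.Count);  // 4th int: Count
    foreach (var (key, value) in pairs)
    {
        w.Write(key);      // 4-byte int key
        w.Write(value);    // 8-byte double value
    }
    w.Flush();
    return ms.ToArray();
}

byte[] bytes = WriteRecord(123, 234, new (int Key, double Value)[] { (1, 0.23), (2, 0.34), (3, 0.45) });
Console.WriteLine(bytes.Length); // 52 = 4 * 4 + 3 * (4 + 8)
```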

This is fine for writing once. But when I want to write an extra value somewhere in the middle, I have to rewrite the whole thing. I also can't easily update a single value.
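Because every pair in this layout has the same width, updating one existing value can be done in place: the value of the i-th pair sits at a computable offset and can be overwritten without touching anything else. Only inserting a pair, which shifts every byte after it, forces the tail to be rewritten. A minimal sketch, assuming the 16-byte header and 12-byte (int, double) pairs described above; `UpdateValue` is a hypothetical helper, and a MemoryStream stands in for the file.

```csharp
using System;
using System.IO;
using System.Text;

const int HeaderSize = 16;  // ObjId, Type, Valid, Count
const int PairSize = 12;    // 4-byte int key + 8-byte double value

// Overwrites the double of the pair at 'index' in a record that starts at 'recordOffset'.
void UpdateValue(Stream file, long recordOffset, int index, double newValue)
{
    long position = recordOffset + HeaderSize + (long)index * PairSize + sizeof(int);
    file.Seek(position, SeekOrigin.Begin);
    using var w = new BinaryWriter(file, Encoding.UTF8, leaveOpen: true);
    w.Write(newValue);  // exactly 8 bytes are replaced; nothing else moves
}

// Build the example record from the question (type code 1 is a placeholder).
var stream = new MemoryStream();
var writer = new BinaryWriter(stream);
foreach (int field in new[] { 123, 1, 234, 3 }) writer.Write(field);
writer.Write(1); writer.Write(0.23);
writer.Write(2); writer.Write(0.34);
writer.Write(3); writer.Write(0.45);
writer.Flush();

UpdateValue(stream, recordOffset: 0, index: 1, newValue: 0.99);  // key 2: .34 -> .99
Console.WriteLine(stream.Length); // 52: the record did not grow or move
```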

One alternative is to write each object to a separate file so I would only have to rewrite that one. But that seems quite inefficient, because the files are 1 kB but take up 4 kB on disk, so I'd be wasting space there.
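To put a number on that waste: a file occupies whole clusters, so its on-disk size is its length rounded up to the cluster size. A small sketch, assuming 1 kB = 1024-byte files and 4 KB clusters (a common NTFS default); `AllocatedSize` is a hypothetical helper, and very small files that some file systems store inline are ignored.

```csharp
using System;

// On-disk size when space is allocated in whole clusters: round up to a multiple of clusterSize.
long AllocatedSize(long fileLength, long clusterSize) =>
    (fileLength + clusterSize - 1) / clusterSize * clusterSize;

const long FileLength = 1024;   // assumed size of one object file
const long ClusterSize = 4096;  // assumed cluster size

long slackPerFile = AllocatedSize(FileLength, ClusterSize) - FileLength;
Console.WriteLine(slackPerFile);           // 3072 bytes wasted per file
Console.WriteLine(slackPerFile * 10_000);  // 30720000 bytes for a hypothetical 10,000 objects
```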

So what do I need to do to be able to write incrementally to this file on disk? I know SQL Server has 'pages' where it writes data; is that the way to go?
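Pages are one common answer. The idea, sketched loosely here and not as SQL Server's actual page format (its pages are 8 KB and far more elaborate): divide one file into fixed-size pages, give each object its own page, and rewrite only that page when the object changes. A minimal sketch, assuming 4096-byte pages, objects that fit in one page, and a page table (`pageOf`) kept in memory; a real store would also persist the page table, reuse freed pages, and chain pages for larger objects. `WriteObject` and `ReadObject` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

const int PageSize = 4096;  // assumed page size

// Hypothetical in-memory page table: ObjId -> page number.
var pageOf = new Dictionary<int, long>();

// Writes (or rewrites) one object's bytes into its own page; other pages are not touched.
void WriteObject(FileStream file, int objId, byte[] data)
{
    if (data.Length > PageSize - sizeof(int))
        throw new ArgumentException("object does not fit in one page");
    if (!pageOf.TryGetValue(objId, out long page))
    {
        page = pageOf.Count;  // pages are handed out in order; no reuse in this sketch
        pageOf[objId] = page;
    }
    file.Seek(page * PageSize, SeekOrigin.Begin);
    using var w = new BinaryWriter(file, Encoding.UTF8, leaveOpen: true);
    w.Write(data.Length);  // length prefix: where the payload ends inside the page
    w.Write(data);
}

byte[] ReadObject(FileStream file, int objId)
{
    file.Seek(pageOf[objId] * PageSize, SeekOrigin.Begin);
    using var r = new BinaryReader(file, Encoding.UTF8, leaveOpen: true);
    return r.ReadBytes(r.ReadInt32());
}

string path = Path.GetTempFileName();
byte[] first, second;
using (var file = new FileStream(path, FileMode.Create, FileAccess.ReadWrite))
{
    WriteObject(file, 123, new byte[] { 1, 2, 3 });
    WriteObject(file, 456, new byte[] { 4, 5 });
    WriteObject(file, 123, new byte[] { 9, 9, 9, 9 });  // rewrites page 0 only
    first = ReadObject(file, 123);
    second = ReadObject(file, 456);
}
File.Delete(path);
Console.WriteLine($"{first.Length} {second.Length}"); // 4 2
```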

Is there any ready-made library for this type of problem? Maybe some virtual file that lets me treat the objects as separate byte[] but handles the storage as a single physical file? Ideally compressed... (pushing it, but who knows, I've been surprised before :-)
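Compression can be layered on top of whatever storage is used, by compressing each serialized array before writing it and decompressing it after reading. A minimal sketch using the GZipStream class from System.IO.Compression; `Compress` and `Decompress` are illustrative helpers, and how much you save depends entirely on the data. One consequence worth noting: compressed sizes change from write to write, so an updated record may no longer fit where the old one was.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Compresses one serialized array before it is stored.
byte[] Compress(byte[] data)
{
    var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
    {
        gzip.Write(data, 0, data.Length);
    }  // disposing the GZipStream writes the final block (and closes 'output')
    return output.ToArray();  // ToArray still works on a closed MemoryStream
}

// Restores the original bytes after reading them back.
byte[] Decompress(byte[] compressed)
{
    using var input = new MemoryStream(compressed);
    using var gzip = new GZipStream(input, CompressionMode.Decompress);
    using var output = new MemoryStream();
    gzip.CopyTo(output);
    return output.ToArray();
}

// Hypothetical record: 1,000 (int, double) pairs, left zero-filled so the size drop is obvious.
var raw = new byte[12_000];
byte[] packed = Compress(raw);
Console.WriteLine($"{raw.Length} bytes -> {packed.Length} bytes");
```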

Thanks in advance,

Gert-Jan


Answer 1:


If you don't want the overhead of an RDBMS, you could use a key-value database like Berkeley DB. There is a C# interface for it here:

Berkeley DB for .NET

You can have one entry for each array, and just rewrite that when you need to. The rest of the database file will be unchanged so it's much faster than rewriting the whole file.

You can reuse the serialization logic you've already implemented when you write out an array. All you need to add is a unique key for each array.
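Any byte sequence can serve as the key. One detail worth knowing: ordered key-value stores such as Berkeley DB's B-tree access method compare keys byte by byte unless you supply your own comparison, so writing the ObjId in big-endian order keeps non-negative ids in numeric order, while the little-endian bytes from BitConverter would not. Negative ids would still sort after positive ones. A minimal sketch; `MakeKey` and `CompareKeys` are hypothetical helpers, not part of any Berkeley DB binding.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: the ObjId as a 4-byte big-endian key.
byte[] MakeKey(int objId)
{
    var key = new byte[4];
    BinaryPrimitives.WriteInt32BigEndian(key, objId);
    return key;
}

// Byte-by-byte (lexical) comparison, as a B-tree store uses by default.
int CompareKeys(byte[] a, byte[] b) =>
    new ReadOnlySpan<byte>(a).SequenceCompareTo(new ReadOnlySpan<byte>(b));

Console.WriteLine(CompareKeys(MakeKey(2), MakeKey(256)) < 0);  // True: big-endian keeps numeric order
Console.WriteLine(CompareKeys(BitConverter.GetBytes(2), BitConverter.GetBytes(256)) < 0);  // False on little-endian machines
```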




Answer 2:


You won't be able to get around having either 1 file per object or having to rewrite the whole list of objects when you make a change. You could use SQLite. It is a single-file embedded database that is very fast and efficient. Because it runs inside your own process, your application doesn't depend on a separately installed database server.

If you are writing your data directly, you should read and write it in binary format. Each integer is then stored in a fixed number of bytes (4 for an int) instead of its ASCII representation, which takes one byte per digit: 1234567890 is 10 bytes as text but only 4 bytes as an int.

This will also speed up reading and writing the file.
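A quick way to see the size difference, as a minimal sketch: write one int with BinaryWriter and count the bytes of its decimal text. The value 1234567890 is just an example chosen to have ten digits.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

int value = 1234567890;

// Binary: an Int32 always takes 4 bytes, whatever its magnitude.
var buffer = new MemoryStream();
using (var w = new BinaryWriter(buffer, Encoding.UTF8, leaveOpen: true))
    w.Write(value);
long binarySize = buffer.Length;

// Text: one byte per ASCII digit, before adding any separator between values.
long textSize = Encoding.ASCII.GetByteCount(value.ToString(CultureInfo.InvariantCulture));

Console.WriteLine($"binary: {binarySize} bytes, text: {textSize} bytes"); // binary: 4 bytes, text: 10 bytes
```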

Some code from the article:

    using System;
    using System.Collections;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    Hashtable addresses = new Hashtable();
    addresses.Add("Jeff", "123 Main Street, Redmond, WA 98052");
    addresses.Add("Fred", "987 Pine Road, Phila., PA 19116");
    addresses.Add("Mary", "PO Box 112233, Palo Alto, CA 94301");

    // To serialize the hashtable and its key/value pairs,
    // you must first open a stream for writing.
    // In this case, use a file stream; the using block closes it even if serialization fails.
    using (FileStream fs = new FileStream("DataFile.dat", FileMode.Create))
    {
        // Construct a BinaryFormatter and use it to serialize the data to the stream.
        // Note: BinaryFormatter is obsolete in .NET 5 and later and unsafe for untrusted data.
        BinaryFormatter formatter = new BinaryFormatter();
        try
        {
            formatter.Serialize(fs, addresses);
        }
        catch (SerializationException e)
        {
            Console.WriteLine("Failed to serialize. Reason: " + e.Message);
            throw;
        }
    }
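BinaryFormatter is marked obsolete in .NET 5 and later, and deserializing untrusted data with it is unsafe. As a swapped-in alternative (not the article's code), the same key/value round trip can be written by hand with BinaryWriter and BinaryReader. This sketch uses Dictionary<string, string> instead of Hashtable and a MemoryStream in place of the file; `Save` and `Load` are hypothetical names, and a FileStream can be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Writes a count followed by length-prefixed key/value strings.
void Save(Stream stream, IReadOnlyDictionary<string, string> map)
{
    using var w = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true);
    w.Write(map.Count);
    foreach (var (key, value) in map)
    {
        w.Write(key);    // BinaryWriter prefixes each string with its length
        w.Write(value);
    }
}

// Reads the pairs back in the order Save wrote them.
Dictionary<string, string> Load(Stream stream)
{
    using var r = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true);
    int count = r.ReadInt32();
    var map = new Dictionary<string, string>(count);
    for (int i = 0; i < count; i++)
        map.Add(r.ReadString(), r.ReadString());  // arguments evaluate left to right: key, then value
    return map;
}

var addresses = new Dictionary<string, string>
{
    ["Jeff"] = "123 Main Street, Redmond, WA 98052",
    ["Fred"] = "987 Pine Road, Phila., PA 19116",
    ["Mary"] = "PO Box 112233, Palo Alto, CA 94301",
};
var buffer = new MemoryStream();
Save(buffer, addresses);
buffer.Position = 0;
var loaded = Load(buffer);
Console.WriteLine(loaded["Fred"]); // 987 Pine Road, Phila., PA 19116
```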



Answer 3:


There are a thousand and one ways of storing information on disk. You've already had suggestions about databases. You might also want to consider structured file formats such as HDF5 which has bindings for languages including C#. One of HDF5's strengths is its support for storing n-dimensional arrays.




Answer 4:


In addition to the other suggestions made here you could try MongoDB with NORM as a great, friction-free (no database to configure, no object relational mapping to create) way to store data without the overhead / cost of SQL server.



Source: https://stackoverflow.com/questions/4102237/disk-storage-of-arrays-etc
