Question
I have a binary format that is laid out like this:
magic number
name size blob
name size blob
name size blob
...
It is laid out this way so that it is easy to move through the file and find the right entry. But I would also like to remove an entry (let's call it a chunk, as that is what it is). I guess I could use std::copy/memmove with some iostream iterators to move the chunks that follow the deleted one forward, overwriting it. But then the space I freed sits at the end of the file, filled with unusable data (which I could zero out or not). I would most likely shrink the file afterwards.
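A minimal sketch of that shift-and-shrink idea, assuming C++17 so that std::filesystem::resize_file is available for the final truncation. The function name eraseRange and the 64 KiB block size are illustrative choices, not part of the format in question. It copies with plain seek/read/write rather than iostream iterators, and only the bytes after the removed range are rewritten.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: removes `length` bytes starting at `offset` by sliding
// the tail of the file forward block by block, then truncating the file.
void eraseRange(const std::filesystem::path& path,
                std::uintmax_t offset, std::uintmax_t length)
{
    const std::uintmax_t fileSize = std::filesystem::file_size(path);
    assert(offset <= fileSize && length <= fileSize - offset);

    {
        std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
        if (!file) throw std::runtime_error("cannot open file");

        std::vector<char> block(64 * 1024);      // arbitrary copy buffer size
        std::uintmax_t readPos = offset + length; // first byte to keep
        std::uintmax_t writePos = offset;         // where it has to end up
        while (readPos < fileSize) {
            const std::uintmax_t want =
                std::min<std::uintmax_t>(block.size(), fileSize - readPos);
            file.seekg(static_cast<std::streamoff>(readPos));
            file.read(block.data(), static_cast<std::streamsize>(want));
            if (!file) throw std::runtime_error("read failed");
            file.seekp(static_cast<std::streamoff>(writePos));
            file.write(block.data(), static_cast<std::streamsize>(want));
            if (!file) throw std::runtime_error("write failed");
            readPos += want;
            writePos += want;
        }
    } // the stream is flushed and closed here, before the file is resized

    // Drop the now-stale bytes at the end of the file.
    std::filesystem::resize_file(path, fileSize - length);
}
```

Calling eraseRange(path, chunkStart, chunkSize) with the offset and total size of the chunk to delete would leave the remaining chunks contiguous. The cost is proportional to the amount of data behind the deleted chunk, so deleting near the end of the file is cheap and deleting near the beginning is not.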
I know I can read all the data I want to keep into a buffer and write it to a new file, but I dislike rewriting the whole file just to delete one chunk.
Any ideas for the best way of removing data from a file?
Answer 1:
@MarkSetchell had a good idea for how to treat this problem:
I now have a magic number at the beginning of every chunk to check whether another valid chunk follows. After moving data towards the beginning, I position the write pointer right behind the last chunk and fill the space for the next magic number with zeros. So when listing the entries, the reader stops when there is no valid magic number, and if I add another entry it automatically overwrites the unused space.
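The scheme above can be sketched roughly as follows. The field encoding is an assumption, since the question does not specify it: a 4-byte magic, a 1-byte name length, the name, a 4-byte blob size and the blob, all in host byte order. The magic value and every function name are hypothetical, and the file-level magic number is left out. A std::stringstream stands in for the file so the example stays self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed chunk layout: [u32 magic][u8 nameLen][name][u32 blobSize][blob]
constexpr std::uint32_t kChunkMagic = 0x4B4E4843u; // arbitrary illustrative value

void writeChunk(std::ostream& os, const std::string& name, const std::string& blob)
{
    assert(name.size() <= 255);
    const std::uint32_t magic = kChunkMagic;
    const std::uint8_t nameLen = static_cast<std::uint8_t>(name.size());
    const std::uint32_t blobSize = static_cast<std::uint32_t>(blob.size());
    os.write(reinterpret_cast<const char*>(&magic), sizeof magic);
    os.write(reinterpret_cast<const char*>(&nameLen), sizeof nameLen);
    os.write(name.data(), nameLen);
    os.write(reinterpret_cast<const char*>(&blobSize), sizeof blobSize);
    os.write(blob.data(), blobSize);
}

// Zeroes the slot where the next magic number would go, so a reader stops
// here even if stale bytes follow.
void writeTerminator(std::ostream& os)
{
    const std::uint32_t zero = 0;
    os.write(reinterpret_cast<const char*>(&zero), sizeof zero);
}

// Returns the chunk names, stopping at the first invalid magic number.
std::vector<std::string> listChunks(std::istream& is)
{
    std::vector<std::string> names;
    for (;;) {
        std::uint32_t magic = 0;
        if (!is.read(reinterpret_cast<char*>(&magic), sizeof magic) ||
            magic != kChunkMagic)
            break;
        std::uint8_t nameLen = 0;
        std::uint32_t blobSize = 0;
        if (!is.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen)) break;
        std::string name(nameLen, '\0');
        if (!is.read(&name[0], nameLen)) break;
        if (!is.read(reinterpret_cast<char*>(&blobSize), sizeof blobSize)) break;
        is.seekg(blobSize, std::ios::cur); // skip over the blob
        names.push_back(std::move(name));
    }
    return names;
}

// Writes chunks a, b, c, then "deletes" b by rewriting c over it and zeroing
// the next magic slot. The old copy of c stays behind as stale bytes.
std::vector<std::string> demoDeleteMiddleChunk()
{
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    writeChunk(file, "a", "111");
    const std::streampos afterA = file.tellp();
    writeChunk(file, "b", "22222");
    writeChunk(file, "c", "333");

    file.seekp(afterA);
    writeChunk(file, "c", "333");
    writeTerminator(file);

    file.seekg(0);
    return listChunks(file);
}
```

In this sketch demoDeleteMiddleChunk() lists only a and c. Appending a new entry would start at the position of the zeroed magic number, which is how the unused space gets overwritten.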
Answer 2:
"I know I can read all the data I want to keep into a buffer and write it to a new file, but I dislike rewriting the whole file just to delete one chunk.
Any ideas for the best way of removing data from a file?"
You can't have the best of both worlds. If you want to preserve space, you will need something that describes the file's sections (let's call it an allocation table), with each section consisting of a sequence of shards.
A section would normally start off as a single shard, but as soon as it is de-allocated, its space is made available as a shard for a new section. You can then choose at what point you are willing to live with sharded (non-contiguous) sections (perhaps only after the file reaches a certain size limit).
The allocation table describes each section as a series (linked list) of shards (or a single shard, if contiguous). One could either reserve a fixed size for the allocation table, keep it in a different file, or shard it and give it the ability to reconstruct itself.
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

struct Section
{
    struct Shard
    {
        std::size_t baseAddr_; // offset of the shard from the start of the file
        std::size_t size_;     // length of the shard in bytes
    };
    std::string name_;
    std::size_t shardCount_;
    std::vector<Shard> shards_;
    std::istream& readFrom( std::istream& );
};

struct AllocTable
{
    std::size_t sectionCount_;
    std::vector<Section> sections_;
    std::size_t next_; // file offset of the next table, or size_t(-1) if none
    std::istream& readFrom( std::istream& is, AllocTable* previous )
    {
        //Brief code... error handling left as your exercise
        is >> sectionCount_;
        sections_.resize( sectionCount_ );
        for( std::size_t i = 0; i < sectionCount_; ++i )
        {
            sections_[i].readFrom( is );
        }
        is >> next_; //Note - no error handling for brevity
        if( next_ != static_cast<std::size_t>(-1) )
        {
            is.seekg( next_ ); //Seek to next_ from file beginning
            AllocTable nextTable;
            nextTable.readFrom( is, this );
            //Append the sections of the chained table(s) to this one
            sections_.insert( sections_.end(),
                nextTable.sections_.begin(), nextTable.sections_.end() );
        }
        return is;
    }
};
...
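To illustrate how de-allocated shards could be handed to new sections, here is a small in-memory sketch. It is not the answer's implementation: ShardAllocator, its member names and the "reuse the most recently freed hole first" policy are all assumptions made for the example, and it only tracks offsets and sizes without touching a file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Shard
{
    std::size_t baseAddr; // offset from the start of the file
    std::size_t size;     // length in bytes
};

// Hypothetical bookkeeping class: tracks which shards belong to which section
// and which shards are free for reuse.
class ShardAllocator
{
public:
    // Reserves `size` bytes for a new section, filling free shards first and
    // growing the file only for whatever does not fit.
    const std::vector<Shard>& allocate(const std::string& name, std::size_t size)
    {
        assert(sections_.count(name) == 0);
        std::vector<Shard>& shards = sections_[name];
        while (size > 0 && !free_.empty()) {
            Shard& hole = free_.back();
            const std::size_t take = std::min(size, hole.size);
            shards.push_back({hole.baseAddr, take});
            hole.baseAddr += take;
            hole.size -= take;
            size -= take;
            if (hole.size == 0) free_.pop_back();
        }
        if (size > 0) {
            shards.push_back({fileEnd_, size}); // contiguous tail at the end
            fileEnd_ += size;
        }
        return shards;
    }

    // De-allocates a section: its shards become available to later sections.
    void release(const std::string& name)
    {
        auto it = sections_.find(name);
        if (it == sections_.end()) return;
        free_.insert(free_.end(), it->second.begin(), it->second.end());
        sections_.erase(it);
    }

    std::size_t fileEnd() const { return fileEnd_; }

private:
    std::map<std::string, std::vector<Shard>> sections_;
    std::vector<Shard> free_;
    std::size_t fileEnd_ = 0;
};
```

For example, after allocating "x" (100 bytes) and "y" (50 bytes) and releasing "x", allocating "z" with 120 bytes yields two shards: the 100 bytes freed by "x" at offset 0, plus 20 new bytes at offset 150. A real implementation would also have to persist this table, for instance in the chained form the AllocTable struct reads.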
Source: https://stackoverflow.com/questions/24496367/remove-memory-from-the-middle-of-a-file