How to ensure that data doesn't get corrupted when saving to a file?

Submitted by 前提是你 on 2019-11-30 07:22:43

A lot of programs use this approach, but they usually keep more copies, to also guard against human error.

For example, Cadsoft Eagle (a program used to design circuits and printed circuit boards) keeps up to 9 backup copies of the same file, named file.b#1 ... file.b#9.
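A minimal Python sketch of rotating numbered backups in that spirit follows. It runs just before each save: the oldest copy is dropped, the others shift up by one, and the current file becomes backup #1. The `path + ".b#N"` naming and the helper names `backup_name`/`rotate_backups` are hypothetical illustrations, not Eagle's actual scheme or code.

```python
import os
import shutil

MAX_BACKUPS = 9  # mirrors the "up to 9 copies" example; choose your own limit


def backup_name(path: str, n: int) -> str:
    # Hypothetical naming scheme for illustration: design.brd -> design.brd.b#1
    return f"{path}.b#{n}"


def rotate_backups(path: str, max_backups: int = MAX_BACKUPS) -> None:
    """Shift existing backups up by one and copy the current file to backup #1."""
    if not os.path.exists(path):
        return  # first save: nothing to back up yet
    # Drop the oldest backup so the shift below never overwrites anything
    oldest = backup_name(path, max_backups)
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift N-1 -> N, ..., 1 -> 2, starting from the top
    for n in range(max_backups - 1, 0, -1):
        src = backup_name(path, n)
        if os.path.exists(src):
            os.replace(src, backup_name(path, n + 1))
    # Copy (not move) so the current file stays in place until the new save succeeds
    shutil.copy2(path, backup_name(path, 1))
```

Call `rotate_backups(path)` right before writing the new version. If that write then fails, the previous content is still available as backup #1.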

Another thing you can do to improve integrity is hashing: append a hash such as a CRC32 or MD5 at the end of the file. When you open the file, recompute the CRC or MD5; if it doesn't match the stored value, the file is corrupted. This will also detect files that someone modified with another program, accidentally or on purpose (although someone changing the file on purpose can simply recompute a plain CRC or MD5). It also gives you a way to find out whether the hard drive or USB disk has corrupted the file.
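A minimal Python sketch of the checksum idea, using CRC32 from the standard `zlib` module, follows. The file layout is an assumption for illustration: the payload followed by a 4-byte big-endian CRC32 trailer. `save_with_crc` and `load_with_crc` are hypothetical helper names. `hashlib.md5` or `hashlib.sha256` could replace the CRC with a longer trailer.

```python
import struct
import zlib

TRAILER = struct.Struct(">I")  # assumed layout: 4-byte big-endian CRC32 after the payload


def save_with_crc(path: str, payload: bytes) -> None:
    """Write payload followed by its CRC32 trailer."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    with open(path, "wb") as f:
        f.write(payload)
        f.write(TRAILER.pack(crc))


def load_with_crc(path: str) -> bytes:
    """Return the payload, or raise ValueError if the stored CRC does not match."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) < TRAILER.size:
        raise ValueError("file too short to contain a checksum")
    payload, trailer = data[:-TRAILER.size], data[-TRAILER.size:]
    (stored,) = TRAILER.unpack(trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != stored:
        raise ValueError("checksum mismatch: file is corrupted")
    return payload
```

A truncated file (for example after a power cut mid-write) or a flipped byte both fail the check, so the loader can fall back to a backup instead of reading garbage.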

Of course, the faster the save operation is, the less risk of losing data you have, but you cannot be sure that nothing will happen during or after writing.

Consider that hard drives, USB drives and the Windows OS all use caches. This means that even after your write call has returned, the OS or the disk itself may not yet have physically written the data to the disk.
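In Python you can at least push the data past the language and OS caches before relying on it, as in the sketch below. The helper name `write_durably` is hypothetical. `os.fsync` asks the OS to flush its cached data for that file to the device (on Windows it calls `_commit`). Whether the drive's own write cache honours that request depends on the hardware and its settings, so this narrows the window rather than closing it.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the OS to flush it to the storage device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's user-space buffer into the OS
        os.fsync(f.fileno())  # ask the OS to write its cached data to the device
```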

Another thing you can do is save to a temporary file and, if everything went well, move it into the real destination folder. This reduces the risk of ending up with half-written files.
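A minimal Python sketch of the temporary-file approach follows. `atomic_save` is a hypothetical helper name. The sketch assumes the temporary file is created in the same directory as the destination. That keeps both on one file system, so the final `os.replace` is a rename rather than a copy, and on POSIX file systems such a rename is atomic.

```python
import os
import tempfile


def atomic_save(path: str, data: bytes) -> None:
    """Write to a temp file next to `path`, flush it, then move it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same file system, so the final move is a plain rename
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # overwrites the destination if it exists
    except BaseException:
        # Remove the half-written temp file; the previous `path` is untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

On POSIX systems you may additionally fsync the containing directory so that the rename itself survives a crash. That step is left out here because it does not work on Windows.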

You can mix all these techniques together.

Rather than "always write to the oldest" you can use the "safe file write" technique of:

(Assuming you want to end up saving data to foo.data, and a file with that name contains the previous valid version.)

  • Write new data to foo.data.new
  • Rename foo.data to foo.data.old
  • Rename foo.data.new to foo.data
  • Delete foo.data.old
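The four steps map almost one-to-one onto Python's `os` module, as in the sketch below. `safe_write` is a hypothetical name, and the added `fsync` in step 1 is an assumption about how carefully the new file should be flushed. `os.replace` is used instead of `os.rename` because it also overwrites a stale `foo.data.old` left behind by an earlier crash, whereas `os.rename` fails on Windows if the destination exists.

```python
import os


def safe_write(path: str, data: bytes) -> None:
    new_path = path + ".new"
    old_path = path + ".old"
    # 1. Write new data to foo.data.new and force it to disk
    with open(new_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # 2. Rename foo.data to foo.data.old (nothing to rename on the very first save)
    if os.path.exists(path):
        os.replace(path, old_path)
    # 3. Rename foo.data.new to foo.data
    os.replace(new_path, path)
    # 4. Delete foo.data.old
    if os.path.exists(old_path):
        os.remove(old_path)
```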

At any one time you've always got at least one valid file, and you can tell which is the one to read just from the filename. This is assuming your file system treats rename and delete operations atomically, of course.

  • If foo.data and foo.data.new exist, load foo.data; foo.data.new may be broken (e.g. power off during write)
  • If foo.data.old and foo.data.new exist, both should be valid, but something died very shortly afterwards - you may want to load the foo.data.old version anyway
  • If foo.data and foo.data.old exist, then foo.data should be fine, but again something went wrong, or possibly foo.data.old couldn't be deleted.
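The recovery rules above could be encoded as a small Python function that runs before loading. `choose_file_to_load` is a hypothetical name. Returning `None` when only `foo.data.new` exists is an assumption: the rules above don't cover that case, and a lone `.new` file may be half-written.

```python
import os
from typing import Optional


def choose_file_to_load(path: str) -> Optional[str]:
    new_path, old_path = path + ".new", path + ".old"
    if os.path.exists(path):
        # foo.data + .new: .new may be broken, ignore it.
        # foo.data + .old: foo.data was fully renamed into place, so it should be fine.
        return path
    if os.path.exists(old_path):
        # .old + .new: the crash came between the renames; .new is probably
        # complete, but .old is the conservative choice suggested above.
        return old_path
    # Only .new (or nothing at all): no version is known to be complete.
    return None
```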

Alternatively, simply always write to a new file, including some sort of monotonically increasing counter - that way you'll never lose any data due to bad writes. The best approach depends on what you're writing though.
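A minimal Python sketch of the counter approach follows. Each save creates a brand-new file, and loading picks the highest number. The `<base>.<number>` naming with six-digit zero padding and the helper names are hypothetical. Opening with mode `"xb"` makes the write fail rather than overwrite an existing version. If the newest file could itself be half-written, a per-file checksum would let you fall back to the previous number.

```python
import os
import re
from typing import Dict, Optional


def _versions(directory: str, base: str) -> Dict[int, str]:
    """Map version number -> path for files named '<base>.<number>'."""
    pattern = re.compile(re.escape(base) + r"\.(\d+)")
    found: Dict[int, str] = {}
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            found[int(match.group(1))] = os.path.join(directory, name)
    return found


def save_new_version(directory: str, base: str, data: bytes) -> str:
    """Write data to a new file whose counter is one higher than any existing one."""
    versions = _versions(directory, base)
    n = max(versions) + 1 if versions else 1
    path = os.path.join(directory, f"{base}.{n:06d}")
    # "xb" refuses to overwrite, so an existing version can never be clobbered
    with open(path, "xb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return path


def latest_version(directory: str, base: str) -> Optional[str]:
    versions = _versions(directory, base)
    return versions[max(versions)] if versions else None
```

The trade-off is disk usage: old versions accumulate until you prune them, which is also what makes this approach immune to losing a previous save.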

You could also use .NET's File.Replace for this, which basically performs the last three steps for you. (Pass in null for the backup name if you don't want to keep a backup.)

In principle there are two popular approaches to this:

  • Make your file format log-based, i.e. do not overwrite in the usual save case, just append changes or the latest versions at the end.

or

  • Write to a new file, rename the old file to a backup and rename the new file into its place.

The first leaves you with (way) more development effort, but also has the advantage of making saves go faster if you save small changes to large files (Word used to do this AFAIK).
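To give a feel for the log-based approach, here is a minimal Python sketch. Every save appends one record, and loading replays the records in order. The record format is an assumption for illustration: an 8-byte header with payload length and CRC32, followed by a JSON-encoded dict of changed keys. `append_change` and `load_state` are hypothetical names. Replay stops at the first incomplete or corrupted record, which is what a save interrupted mid-append leaves behind.

```python
import json
import os
import struct
import zlib
from typing import Any, Dict

HEADER = struct.Struct(">II")  # assumed record header: payload length, CRC32 of payload


def append_change(path: str, change: Dict[str, Any]) -> None:
    """Append one change record; earlier records are never rewritten."""
    payload = json.dumps(change, sort_keys=True).encode("utf-8")
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def load_state(path: str) -> Dict[str, Any]:
    """Replay all complete, valid records; later records override earlier keys."""
    state: Dict[str, Any] = {}
    if not os.path.exists(path):
        return state
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start, end = pos + HEADER.size, pos + HEADER.size + length
        payload = data[start:end]
        if end > len(data) or zlib.crc32(payload) != crc:
            break  # torn or corrupted tail from an interrupted save: ignore it
        state.update(json.loads(payload))
        pos = end
    return state
```

A real implementation would need more than this. It should truncate the file back to the last good record before appending again, since anything written after a torn tail is never replayed. It also needs occasional compaction (rewriting the current state as a fresh file) so the log doesn't grow forever.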
