I am using svnX (0.9.13) on Mac OS X Lion (10.7.2 11C74) and seem to have what I believe is a corrupted SVN repository. I have searched the site for similar questions and
You should do a dump and load as David W. suggested. However, I ran into some gotchas along the way, so I would like to post a complete solution.
Corruption typically occurs in single files on some revisions. We don't need to discard an entire revision just because some file had a checksum mismatch.
First we will try disabling checksum verification by removing lines matching Text-content-md5 from the dump stream:
svnadmin dump my_repo | sed '/^Text-content-md5/d' | svnadmin load second_repo
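To see what that sed filter actually removes, here is a small sketch that runs it over a made-up node record. The record, its path, and the all-zero hash are placeholders I invented for illustration; they are not taken from a real repository.

```shell
#!/bin/sh
# strip_md5_headers: the same filter used in the pipeline above.
# It deletes every line that starts with "Text-content-md5".
strip_md5_headers() {
    sed '/^Text-content-md5/d'
}

# A made-up dump record. The hash is a placeholder, not a real checksum.
sample_record() {
    cat <<'EOF'
Node-path: trunk/README
Node-kind: file
Node-action: add
Prop-content-length: 10
Text-content-length: 6
Text-content-md5: 00000000000000000000000000000000
Content-length: 16

PROPS-END
hello
EOF
}

# Print the record with the checksum header removed.
sample_record | strip_md5_headers
```

Keep in mind that sed looks at every line of the stream, file contents included. If a versioned file happens to contain a line starting with Text-content-md5, that line would be deleted as well, so treat this as a rescue trick rather than a routine step.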
Dumping incrementally with a revision range lets us fix an error and continue from where we stopped. If an error happens during the dump and load, look for the last --- Committed revision X >>> --- message, pass X+1 as the starting revision in the -r parameter, and try again. This saves considerable time.
svnadmin dump --incremental -r1:100000 my_repo | sed '/^Text-content-md5/d' | svnadmin load second_repo
Or just load from the dumpfile:
sed '/^Text-content-md5/d' dumpfile.txt | svnadmin load second_repo
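The restart step described above (find the last committed revision, then continue from X+1) can be scripted if you keep the output of svnadmin load in a file. This is only a sketch: the function name next_start_rev and the log file name load.log are my own, and it assumes the log contains "Committed revision N" lines as quoted above and that revision numbers are the same in both repositories.

```shell
#!/bin/sh
# next_start_rev LOGFILE
# Prints X+1, where X is the last revision reported as committed in LOGFILE.
# Prints 1 if the log contains no committed revision at all.
next_start_rev() {
    last=$(sed -n 's/.*Committed revision \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo $(( ${last:-0} + 1 ))
}

# Possible usage (load.log is a hypothetical file name):
#   svnadmin dump --incremental -r1:100000 my_repo \
#     | sed '/^Text-content-md5/d' \
#     | svnadmin load second_repo | tee load.log
#   next_start_rev load.log
```

If the load renumbers revisions, the number printed here refers to the new repository and will not match the source repository, so check the log by hand in that case.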
If that was not enough, and you're getting a 'Premature end of content data in dumpstream' error or something similar, you should exclude the affected file from the dump completely with svndumpfilter:
svnadmin dump --incremental -r1:100000 my_repo | svndumpfilter exclude myproject/lib/thirdparty-all.jar | sed '/^Text-content-md5/d' | svnadmin load second_repo
The command above excludes the file myproject/lib/thirdparty-all.jar from the dump.
Extra information:
You can pass --bypass-prop-validation to the svnadmin load command. This works if the corruption is minor.
You can work around the Dump stream contains a malformed header (with no ':') error by appending | grep --binary-files=text -v '^* Dumped revision' to the pipeline (before svnadmin load).
Hope this post is useful to some people.
When you have a corrupt repository, your only real chance of saving the information is to do a dump and load. If you're lucky, doing a dump and load will sometimes correct the corruption.
If not, you can use the -r <from>:<to> parameter on the dump to skip over the bad revisions. You can create several dump files and merge them into a single repository, so you can skip over the bad revision numbers. I've noticed that each dump file starts with a complete revision of the repository at that revision, and the dump/load process is usually smart enough not to double up changes.
In fact, I believe you can even put several dumps into a single dump file without too many problems. The following should skip over revisions 1001 and 1204 which are bad revisions:
$ svnadmin dump -r1:1000 my_repos > dumpfile.txt
$ svnadmin dump --incremental -r1002:1203 my_repos >> dumpfile.txt
$ svnadmin dump --incremental -r1205:HEAD my_repos >> dumpfile.txt
$ svnadmin load my_repos2 < dumpfile.txt
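If there are more than a couple of bad revisions, working out the ranges by hand gets tedious. Here is a rough sketch of a helper that prints the good ranges for you. The name good_ranges is my own invention, and it assumes the bad revisions are given in ascending order and fall between the first and last revision. The last revision has to be a number; svnlook youngest my_repos will print the current one.

```shell
#!/bin/sh
# good_ranges FIRST LAST BAD...
# Prints one FROM:TO range per line, covering FIRST..LAST without the BAD revisions.
# Assumes BAD is sorted ascending and every BAD lies within FIRST..LAST.
good_ranges() {
    start=$1
    last=$2
    shift 2
    for bad in "$@"; do
        # Emit the stretch of good revisions before this bad one, if any.
        if [ "$bad" -gt "$start" ]; then
            echo "$start:$((bad - 1))"
        fi
        # Continue after the bad revision.
        if [ "$bad" -ge "$start" ]; then
            start=$((bad + 1))
        fi
    done
    # Emit whatever is left after the last bad revision.
    if [ "$start" -le "$last" ]; then
        echo "$start:$last"
    fi
}

# Example matching the commands above, with 1300 standing in for HEAD:
#   good_ranges 1 1300 1001 1204
# Each printed range can then be passed to svnadmin dump via -r.
```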
There are several Subversion backup scripts that back up the repository by taking dumps of the newest revisions. For example, the first time you run it, it dumps everything from the first revision to the last revision (say revision 1000). Then, the next day it dumps revision 1001 to the last revision (say 1003), and the next day, revision 1004 to the last revision.
To restore, you have to load all the dumps in order, but the backup times are supposed to be shorter than doing a full dump each time.
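The bookkeeping behind such a script is small. As a sketch only (this is not any particular backup script), the function below works out which range to dump next from a state file holding the last revision already backed up. The names next_backup_range and backup.state are hypothetical, and starting at revision 0 on the first run is my assumption.

```shell
#!/bin/sh
# next_backup_range STATE_FILE YOUNGEST
# Prints FROM:YOUNGEST for the revisions not yet backed up.
# Returns non-zero and prints nothing when there is nothing new to dump.
next_backup_range() {
    state_file=$1
    youngest=$2
    if [ -s "$state_file" ]; then
        # Resume right after the last revision recorded by the previous run.
        start=$(( $(cat "$state_file") + 1 ))
    else
        # First run: dump everything (assumption: start at revision 0).
        start=0
    fi
    [ "$start" -le "$youngest" ] || return 1
    echo "$start:$youngest"
}

# Possible usage (paths and file names are placeholders):
#   youngest=$(svnlook youngest /path/to/repo)
#   if range=$(next_backup_range backup.state "$youngest"); then
#       svnadmin dump --incremental -r "$range" /path/to/repo > "backup-$youngest.dump" &&
#           echo "$youngest" > backup.state
#   fi
```

The state file is only updated after a successful dump, so a failed run is retried from the same starting revision the next day.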
You can also do a hotcopy, but I don't find doing a hotcopy that much faster than doing a dump, and there could be issues if you have to move your repository to a different machine.