Question
I'm trying to write a program to store a large amount of data (hundreds of PB) on tapes. I'm using tar to group files together, but for technical reasons I've decided to write multiple tar archives to one tape.
To make it easy to find which data are on a tape, I've decided to create a small index and write it at the beginning of the tape. So I'm doing something like this:
# create a 1 MiB placeholder index file (random bytes, only to reserve the space)
head -c 1M < /dev/urandom > index.txt
# rewind tape
mt -f /dev/nst0 rewind
# write index to the beginning of the tape
dd bs=4k if=index.txt of=/dev/nst0
# write tar file to tape
dd bs=4k if=one.tar of=/dev/nst0
...
After I've copied all the tar files, I create a new index.txt with exactly the same size and write it to the beginning of the tape:
mt -f /dev/nst0 rewind
dd bs=4k if=index.txt of=/dev/nst0
But this corrupts the rest of the data. By corrupt I mean that if I rewind the tape and try to read from it, I can only read the index.txt file; after that no more data can be read, and mt status reports:
SCSI 2 tape drive:
File number=1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x5c (LTO-7).
Soft error count since last status=0
General status bits on (9010000):
EOD ONLINE IM_REP_EN
At first I thought dd had somehow damaged the EOF mark at the end of index.txt, so I tried to overwrite only the beginning of the file:
dd conv=notrunc count=10 bs=4k if=index.txt of=/dev/nst4
The weird thing is that after this, the first entry on the tape is only 40K (10 blocks of 4k each)!
Am I missing something about how the tape and the dd command behave?
P.S.: The data is stored in Ceph as objects and I need to download them, and I don't have enough local space to hold the contents of a whole tape.
Answer 1:
I had the same idea and hit the same problem. I am working on a simple tape backup program, which is basically a wrapper around tar that also writes a table of contents at the beginning, which can be retrieved with the list function. It also has a verify function to check whether the files in the archive still match their original checksums or whether something has been damaged.
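To make the table-of-contents idea concrete, here is a minimal sketch of how such an index could be built from a set of tar files. The function name build_index and the "tape file number, TAB, member path" layout are assumptions made for illustration; this is not TOCTAR's actual format. It only relies on `tar -tf`, which lists the members of an archive.

```shell
#!/bin/sh
# build_index START ARCHIVE...
# Hypothetical helper: prints one "<tape file number><TAB><member path>" line
# per archive member. START is the tape file number the first archive will
# get (mt counts tape files from 0).
build_index() {
    n=$1
    shift
    for archive in "$@"; do
        # tar -tf lists the member names without extracting anything.
        tar -tf "$archive" | while IFS= read -r member; do
            printf '%s\t%s\n' "$n" "$member"
        done
        n=$((n + 1))
    done
}
```

For example, `build_index 0 one.tar two.tar > index.txt` would produce an index in which every member of one.tar is tagged with tape file 0 and every member of two.tar with tape file 1.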
I wanted to implement a real append function but to my surprise, it doesn't seem possible to prevent the system from writing a filemark (at the wrong position, within the archive) after updating the TOC at the beginning.
However, my backup program, which goes by the name of "TOCTAR", also has a safety check that prevents the admin from overwriting the first archive on the tape if the tape file index option wasn't provided. It also has an auto-append feature, which tries to find all archives that were created by the program, leaves them untouched and creates a new tape file at the end. Maybe it'll be useful for your use case. Feel free to open a GitHub issue if you find something that's wrong or missing.
In short: I haven't managed to overwrite a section of a tape without truncating it. But you can create as many archives on a tape as you like, and my tape backup program "TOCTAR" might help you with that. (Shameless self-promotion.)
I'd post the GitHub URL here, but GitHub is currently down (504 Gateway Time-out). What a sad day.
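The append-only approach can be sketched roughly as follows. This is an illustration, not TOCTAR's code: it assumes the mt-st version of mt (whose `eod` operation spaces to the end of recorded data) and the non-rewinding device from the question. TAPE, DRY_RUN, run and append_archive are names invented for this sketch. With DRY_RUN=1 (the default here) the commands are only printed, so the sketch can be tried without a drive.

```shell
#!/bin/sh
# Illustrative settings, not part of any real tool.
TAPE="${TAPE:-/dev/nst0}"   # non-rewinding tape device, as in the question
DRY_RUN="${DRY_RUN:-1}"     # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# append_archive ARCHIVE: add ARCHIVE as a new tape file after existing data.
append_archive() {
    archive=$1
    # Space to the end of recorded data first, so nothing already on the
    # tape is overwritten (and therefore truncated).
    run mt -f "$TAPE" eod || return 1
    # When the device is closed after a write, the driver writes a filemark,
    # so each dd invocation ends up as its own tape file.
    run dd bs=4k if="$archive" of="$TAPE"
}
```

Calling `append_archive one.tar` in dry-run mode prints the mt and dd commands that would be executed; setting DRY_RUN=0 runs them against the drive. An index could be appended the same way as one more tape file, instead of being rewritten at the beginning of the tape.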
Update: https://github.com/c0xc/toctar.git
Source: https://stackoverflow.com/questions/61087471/overwrite-a-file-on-tape