> You may think that the numeric values (file_mode, file_size, file_mtime) would be encoded in base 10, or maybe in hex, or using plain binary numbers ("base 256"). But no, they're actually encoded as octal strings (with a NUL terminator, or sometimes a space terminator). Tar is the only file format I know of which uses base 8 to encode numbers.

Tar's "competitor", cpio, does this as well (at least in one of the popular implementations). The Xcode XIP file, if you're familiar with that particular format, is a couple of layers of wrapping on top of a cpio archive, so deep inside my tool for expanding these there's a spot where I read these octal fields out in a similar fashion.
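In code, reading one of those fields is just a strip-and-convert; a minimal sketch (the helper name is mine):

```python
def parse_tar_octal(field: bytes) -> int:
    """Decode a tar numeric field: ASCII octal digits, padded and
    terminated with NULs and/or spaces (e.g. the 8-byte mode field)."""
    digits = field.strip(b"\x00 ")
    return int(digits, 8) if digits else 0

# "000644 \0" as stored in the mode field -> 0o644 == 420
```

A real implementation also has to handle the GNU base-256 binary extension for values that overflow the field; this sketch covers only the classic octal encoding.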
Tape bandwidth has almost always been slow compared to source data devices. You need to start writing to the target device as fast as possible. Taking time to construct a full index just delays starting the slowest stage of things, while providing only relatively minimal benefit.

Generating an index is fairly straightforward: the file headers give you the information you need, including what you need to know to get to the next file header. Your only bottleneck is just how fast random reads are. If the file is on disk, it should be relatively trivial. It's even possible to do this relatively cheaply with Range headers if you're looking at a tar file stored on an HTTP endpoint, though you'll likely want to think about some smart pre-fetching behaviour.

I wouldn't view this as being about write speed. Keep in mind that the common ZIP format implements a very simple optimization to address this exact issue, which is writing the index at the end of the file instead of at the beginning. Actually there were tools that did this when writing to tape as well, but tar wasn't one of them. In fact a lot of common tape archiving solutions would just put the index on its own dedicated tape, which of course makes sense when you consider that a computer operator would probably need to find out which tape a record of interest was on.

Similarly, it is possible to build an "out-of-band" index for a tar file, but I'm not aware of a tool that does so, probably mostly because this is way less practical if the tarball has been compressed (compressing the files in the tarball instead fixes this issue, and I've written tools that do it that way before, but it doesn't seem to be common either).

Putting the master header at the end of ZIP files requires initially reading them backwards to locate the beginning of that data structure. This works fine on random-access devices but is of course pretty untenable on tape. Even if the tape drive could actually read backwards without seeking to read each successive block in the normal direction (I'm sure there's at least one weird tape drive out there that could, but not the common models, although if you read very small blocks you could get pretty good performance doing this as long as you kept the seeks within the buffer), the seek time to get to the end of the tape and then back to the beginning could be minutes.
Tar is pretty unusual for an archive file format. There's no archive header, no index of files to facilitate seeking, no magic bytes to help file and its ilk detect whether a file is a tar archive, no footer, and no archive-wide metadata. The only kind of thing in a tar file is a file object. It was designed around streaming files to a tape drive in a serial fashion.
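Concretely, each of those file objects is just a 512-byte header block followed by the file's data. A sketch of the classic (pre-POSIX, v7) header layout, with every field a fixed-width ASCII string:

```python
# Classic (v7) tar header: one 512-byte block per archived file.
# Numeric fields are NUL/space-terminated octal strings.
# (field, byte offset, length)
V7_HEADER_FIELDS = [
    ("name",       0, 100),  # file path, NUL-padded
    ("mode",     100,   8),  # permission bits, octal string
    ("uid",      108,   8),  # owner user id, octal string
    ("gid",      116,   8),  # owner group id, octal string
    ("size",     124,  12),  # file size in bytes, octal string
    ("mtime",    136,  12),  # modification time (Unix epoch), octal string
    ("chksum",   148,   8),  # simple sum of header bytes, octal string
    ("linkflag", 156,   1),  # entry type (regular file, hard/symlink)
    ("linkname", 157, 100),  # link target, NUL-padded
]
# The remaining 255 bytes of the block are zero padding; later ustar/pax
# variants define additional fields in that space.
```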