At file offset 0 in an MSF file is the MSF SuperBlock, which is laid out as follows:
struct SuperBlock {
char FileMagic[sizeof(Magic)];
ulittle32_t BlockSize;
ulittle32_t FreeBlockMapBlock;
ulittle32_t NumBlocks;
ulittle32_t NumDirectoryBytes;
ulittle32_t Unknown;
ulittle32_t BlockMapAddr;
};
The Stream Directory is the root of all access to the other streams in an MSF file. Beginning at byte 0 of the stream directory is the following structure:
struct StreamDirectory {
ulittle32_t NumStreams;
ulittle32_t StreamSizes[NumStreams];
ulittle32_t StreamBlocks[NumStreams][];
};
And this structure occupies exactly SuperBlock->NumDirectoryBytes bytes. Note that each of the last two arrays is of variable length, and in particular that the second array is jagged.
Example: Suppose a hypothetical PDB file with a 4KiB block size, and 4 streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}.
Stream 0: ceil(1000 / 4096) = 1 block
Stream 1: ceil(8000 / 4096) = 2 blocks
Stream 2: ceil(16000 / 4096) = 4 blocks
Stream 3: ceil(9000 / 4096) = 3 blocks
In total, 10 blocks are used. Let’s see what the stream directory might look like:
struct StreamDirectory {
ulittle32_t NumStreams = 4;
ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000};
ulittle32_t StreamBlocks[][] = {
{4},
{5, 6},
{11, 9, 7, 8},
{10, 15, 12}
};
};
In total, this occupies 15 * 4 = 60 bytes, so SuperBlock->NumDirectoryBytes would equal 60, and SuperBlock->BlockMapAddr would be an array of one ulittle32_t, since 60 <= SuperBlock->BlockSize.
Note also that the streams are discontiguous, and that part of stream 3 is in the middle of part of stream 2. You cannot assume anything about the layout of the blocks!
As may be clear by now, it is possible for a single field (whether it be a high level record, a long string field, or even a single uint16) to begin and end in separate blocks. For example, if the block size is 4096 bytes, and a uint16 field begins at the last byte of the current block, then it would need to end on the first byte of the next block. Since blocks are not necessarily contiguously laid out in the file, this means that both the consumer and the producer of an MSF file must be prepared to split data apart accordingly. In the aforementioned example, the high byte of the uint16 would be written to the last byte of block N, and the low byte would be written to the first byte of block N+1, which could be tens of thousands of bytes later (or even earlier!) in the file, depending on what the stream directory says.