Riven DVD Edition’s MPEG 1/2 Layer II decompression problem fixed

Thanks to the awesome work of Christian Walther, the MPEG 1/2 Layer II decompression problem has been addressed. This entry explains the aforementioned problem and how it was solved.

The problem

MPEG 1/2 Layer II packets (“frames” in MPEG terminology, a term I avoid here to prevent confusion with audio frames) always contain 1152 audio frames, where an audio frame holds one sample per channel for a given instant. Consequently, encoders have to decide what to do when an input signal’s length in frames is not an integer multiple of 1152.
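To make the arithmetic concrete, here is a minimal Python sketch. It assumes an encoder that keeps every input frame and fills the remainder of the last packet with something; the function name packet_layout is made up for illustration and does not come from any real encoder.

```python
FRAMES_PER_PACKET = 1152  # fixed by MPEG 1/2 Layer II


def packet_layout(signal_frames):
    """Return (packet_count, surplus_frames) for a signal of the given length.

    Assumption: the encoder rounds up to whole packets, so surplus_frames is
    the number of extra frames it has to fill with something.
    """
    packets = -(-signal_frames // FRAMES_PER_PACKET)  # ceiling division
    surplus = packets * FRAMES_PER_PACKET - signal_frames
    return packets, surplus


if __name__ == "__main__":
    # A 10000-frame signal needs 9 packets and leaves 368 frames to fill.
    print(packet_layout(10000))
```

Only signals whose length is an exact multiple of 1152 come out with a surplus of zero; every other length forces the encoder to choose where the extra frames go.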

There may well be a standard or convention for such a case, for example padding the beginning or the end of the signal with silence, or splitting the padding between the two. But none of those apply to Riven DVD’s MPEG 1/2 Layer II audio resources. I became aware of that fact when I finished the Core Media release and tested a number of cards with looping ambiance effects: there was a very noticeable gap in the audio playback whenever an MPEG 1/2 Layer II resource looped back.

The solution

So I started examining DVD audio resources versus their ADPCM counterparts from the CD edition. Let’s look at the beginning of the waveforms of such a pair.

Riven CD waveform

Riven DVD waveform

As you can see, the DVD version is clearly the same signal as the ADPCM version, only with some amount of garbage (not pure silence) at the beginning. The fact that it’s not pure silence made it far more difficult to determine the number of frames to drop at the beginning of the DVD resources. The problem lingered for weeks, then months, with no solution in sight.

And then it came to me: we have the “original” resources from the CD edition. We could run a cross-correlation analysis on a large number of CD-DVD pairs and see whether one offset comes out on top statistically. Having never done such an analysis, I posted a message on the Riven X development mailing list, in the faint hope that someone there might be able to help. To my great surprise, Christian Walther offered to perform the analysis. Here are his results for two sample resources.

Cross-correlation 31

Cross-correlation 35

The results couldn’t have been better. For all the files he analyzed, the point of maximum correlation was always at an offset of 481 frames, generally with a very sharp peak. We had our number.
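For readers who have never run this kind of analysis either, here is a minimal pure-Python sketch of the idea. It is not Christian’s actual tooling, which the post does not describe: the function name best_lag, the brute-force search, and the toy data are all assumptions made for illustration.

```python
def best_lag(reference, delayed, max_lag):
    """Return the lag (in frames) at which `delayed` best matches `reference`.

    For each candidate lag, `delayed` is shifted back by that many frames and
    the products of overlapping samples are summed. The lag with the largest
    sum is the point of maximum correlation.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * d for r, d in zip(reference, delayed[lag:]))
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    # Toy stand-ins: a "CD" signal and a "DVD" signal that is the same
    # signal preceded by three frames of low-level garbage.
    cd = [3, -1, 4, -1, 5, -9, 2, 6]
    dvd = [0.5, -0.5, 0.25] + cd
    print(best_lag(cd, dvd, max_lag=5))  # prints 3
```

On real resources the signals would be decoded sample arrays (one channel at a time, or a mono mixdown), and a library routine would be much faster than this quadratic loop. The principle is the same: run the search over many CD-DVD pairs and check whether the same lag wins every time.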

What about the end of DVD resources? Because a fixed number of frames has to be removed at the beginning, something must also happen at the end for the resource to fit the 1152 frames per packet restriction. Again, Christian provided the answer.

Waveform end comparison

It would seem that the DVD resources were simply trimmed at the end to fit. I’m not entirely sure how they made that work for resources meant to be looped, but empirical tests with the updated Riven X and MHKKit did not reveal audible gaps in cards that used to exhibit them.
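Put together, the playback-side fix amounts to dropping a fixed prefix from the decoded audio. The sketch below is one reading of the findings above, not the actual Riven X or MHKKit code; the names drop_leading_garbage and usable_frames are hypothetical, and it assumes that nothing needs to be removed at the end because the source was already trimmed there.

```python
FRAMES_PER_PACKET = 1152      # fixed by MPEG 1/2 Layer II
LEADING_GARBAGE_FRAMES = 481  # offset found by the cross-correlation analysis


def drop_leading_garbage(decoded_frames):
    """Return the decoded audio without the leading garbage frames."""
    return decoded_frames[LEADING_GARBAGE_FRAMES:]


def usable_frames(packet_count):
    """Number of frames left after the leading garbage is dropped.

    Assumption: every packet decodes to exactly 1152 frames and the end of
    the stream needs no further trimming.
    """
    return max(packet_count * FRAMES_PER_PACKET - LEADING_GARBAGE_FRAMES, 0)


if __name__ == "__main__":
    one_packet = list(range(FRAMES_PER_PACKET))  # stand-in for decoded frames
    trimmed = drop_leading_garbage(one_packet)
    print(len(trimmed), trimmed[0])  # prints 671 481
```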

I’m going to be checking things more thoroughly in the coming days, but I’m cautiously optimistic that this problem has been solved.