Interpreting WAV Data

Question 1

I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I've found the Python wave library and have been using that. However, I'm not sure how to interpret the data.

The getparams method of the opened wave file returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose it's possible that a frame is 2 samples, but the ambiguities do not end there.

96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds

However, iTunes reports the time as closer to 2 seconds and calculations based on file size and bitrate are in the ballpark of 2.7 seconds. What's going on here?

Additionally, how am I to know if the bytes are signed or unsigned?

Many thanks!

Question 2

Thank you for your help! I got it working and I'll post the solution here for everyone to use in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.
    Input: A file-like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
    and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file, "rb")
    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()
    raw_data = stream.readframes(num_frames)  # Returns byte data
    stream.close()

    total_samples = num_frames * num_channels
    # "<" forces little-endian, which is how WAV stores multi-byte samples
    if sample_width == 1:
        fmt = "<%iB" % total_samples  # read unsigned chars
    elif sample_width == 2:
        fmt = "<%ih" % total_samples  # read signed 2 byte shorts
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data  # Keep memory tidy (who knows how big it might be)

    # Samples are interleaved frame by frame, so deal them out round-robin
    channels = [[] for _ in range(num_channels)]
    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate

Question 3

"Two channels" means stereo, so it makes no sense to sum each channel's duration -- so you're off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, and I quote:

8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.

This is part of the specs of the WAV format (actually of its superset RIFF) and thus not dependent on what library you're using to deal with a WAV file.
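To make the two conventions concrete, here is a minimal sketch that maps raw integer samples of either width onto a common floating-point range. The function name normalize_sample and the choice of the range [-1.0, 1.0) are my own assumptions for illustration, not part of the WAV specification or the wave module.

```python
def normalize_sample(value, sample_width):
    """Map a raw integer PCM sample to a float in [-1.0, 1.0).

    Hypothetical helper: the name and the output range are illustrative choices.
    """
    if sample_width == 1:
        # 8-bit WAV samples are unsigned (0..255), with silence at 128.
        return (value - 128) / 128.0
    if sample_width == 2:
        # 16-bit WAV samples are signed (-32768..32767), with silence at 0.
        return value / 32768.0
    raise ValueError("This sketch only handles 8- and 16-bit samples.")
```

The only real difference between the two branches is the offset: 8-bit data has to be re-centred around zero before scaling, while 16-bit data is already centred.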

Question 4

Thank you! I can only hope it was my lack of sleep that kept me from noticing the stereo number ;-)

Question 5

I know that an answer has already been accepted, but I did some things with audio a while ago and you have to unpack the wave doing something like this.

import struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)  # wavedatalength = number of 16-bit samples, not bytes

Also, one package that I used was called PyAudio, though I still had to use the wave package with it.

Question 6

Each sample is 16 bits and there are 2 channels, so each frame takes 4 bytes.
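As a quick check, the 4-byte frame quoted in the question can be decoded by hand with the standard struct module. This is only a sketch; the names left and right assume the usual stereo ordering, in which the first sample of a frame belongs to the left channel.

```python
import struct

frame = b"\xc0\xff\xd0\xff"  # the single frame printed in the question

# "<" = little-endian (how WAV stores samples), "2h" = two signed 16-bit integers
left, right = struct.unpack("<2h", frame)
print(left, right)  # -64 -48
```

Both values are small negative numbers, which is what you would expect from a quiet passage in signed 16-bit audio.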

Question 7

The duration is simply the number of frames divided by the number of frames per second. From your data this is: 96333 / 44100 = 2.18 seconds.
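The same calculation can be done straight from the file with the wave module. A minimal sketch follows; the function name wav_duration_seconds is made up for this example.

```python
import wave

def wav_duration_seconds(wave_file):
    """Return the playing time of a WAV file (path or file-like object) in seconds."""
    with wave.open(wave_file, "rb") as stream:
        # A frame holds one sample per channel, so the channel count does not enter into it.
        return stream.getnframes() / stream.getframerate()
```

Because the frame count already accounts for all channels, a stereo file and a mono file with the same number of frames and the same frame rate have the same duration.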

Question 8

Building upon this answer, you can get a good performance boost by using numpy.frombuffer (the replacement for the deprecated binary mode of numpy.fromstring) or numpy.fromfile. Also see this answer.

Here is what I did:

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):
    if sample_width == 1:
        dtype = np.dtype("u1")  # unsigned char
    elif sample_width == 2:
        dtype = np.dtype("<i2")  # signed 2-byte short, little-endian as stored in WAV
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    # np.frombuffer replaces the deprecated binary mode of np.fromstring and does not copy
    channels = np.frombuffer(raw_bytes, dtype=dtype)

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved. All samples from channel M occur before all samples from channel M+1
        channels.shape = (n_channels, n_frames)

    return channels

Assigning a new value to shape will throw an error if it requires data to be copied in memory. This is a good thing, since you want to use the data in place (using less time and memory overall). The ndarray.T attribute does not copy either: transposing returns a view of the same data.

Reading directly from the file with np.fromfile will be even better, but you would have to skip the header using a custom dtype. I haven't tried this yet.

SapphireSun · Accepted Answer · 2010-02-09 06:18:05Z

