

About Computer Music and Audio Formats


  • MIDI
  • MP3
  • WAVE

    Introduction to MIDI

    Musical Instrument Digital Interface (MIDI) is a simple digital protocol. By connecting a computer to a keyboard through a MIDI connection, you get the musical equivalent of a word processor. You can play the keyboard to enter music into the computer, edit it on the computer using software such as Cakewalk, and then play it back through the keyboard. If you do not have a synthesizer, software is available to play back MIDI on the computer itself; I recommend Yamaha's XG50 Soft Synthesizer, Roland's Virtual Sound Canvas, and Creative's SBLive! together with SoundFonts. MIDI files do not specify precise instrument sounds; the result depends on which synthesizer you use. General MIDI is standardized by the MIDI Manufacturers Association.

    How is MIDI implemented?

    General MIDI specifies 128 standard instruments, as well as other common capabilities. A standard MIDI file is a sequence of chunks with the same general format as the chunks used by WAVE and AIFF. There are three types of MIDI files: type 0 files contain a single track, type 1 files contain multiple tracks that are played together, and type 2 files, which are less common, contain multiple tracks with no assumed relation among them.

    MThd (header) | MTrk (track 0) | MTrk (track 1) | MTrk (track 2) | ...
    MIDI File Format 1

    At the beginning of every MIDI file there is an MThd chunk whose six data bytes hold a 2-byte format type, a 2-byte number of tracks and a 2-byte time division. Each chunk begins with a 4-character type and a 4-byte data length. Delta times and other lengths inside the track data are stored as variable-length quantities of 1 to 4 bytes: if the most significant bit of a byte is set to 1, another byte follows to extend the value.

    MThd | 4-byte data length (= 6) | 2-byte format type | 2-byte no. of tracks | 2-byte time division (bit 15 = 0: ticks per beat; bit 15 = 1: frame rate and ticks per frame)
    MIDI Header Chunk
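
    As a rough illustration, the sketch below reads an MThd chunk with Python's standard struct module, following the layout in the figure above rather than any particular library's API. The file name example.mid is just a placeholder and error handling is omitted.

        import struct

        # Minimal sketch: read the MThd chunk at the start of a standard MIDI file.
        # All multi-byte values in a MIDI file are big-endian.
        with open("example.mid", "rb") as f:                  # placeholder file name
            chunk_type = f.read(4)                            # should be b"MThd"
            (length,) = struct.unpack(">I", f.read(4))        # data length, 6 for MThd
            fmt, ntracks, division = struct.unpack(">HHH", f.read(6))
            if division & 0x8000:                             # bit 15 set: SMPTE time
                frame_rate = 256 - (division >> 8)            # upper byte is a negative frame rate
                ticks_per_frame = division & 0xFF
                print(fmt, ntracks, frame_rate, "fps,", ticks_per_frame, "ticks/frame")
            else:                                             # bit 15 clear: musical time
                print(fmt, ntracks, division, "ticks per beat")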

    A track usually contains one instrument played on one channel, although several tracks may share the same channel. A track's data is a list of MIDI events, each preceded by a delta time: the interval since the previous event, measured in ticks. How long a tick lasts depends on the time division specified in the header and on the tempo, which can be changed by special events within the file.

    MTrk | 4-byte data length | MIDI events, meta events and sysex events, each preceded by a delta time ...
    MIDI Track Chunk

    A MIDI event is a packet of data that specifies a musical action: pressing or releasing a key with a given velocity on a given channel, moving the pitch wheel of a channel by some proportion, selecting an instrument sound for a channel, or changing the pressure (often used for vibrato) of all notes playing on a channel. Each packet begins with a status byte, identified by its most significant bit being set to 1, which indicates the type of event; it is followed by data bytes, which do not have the most significant bit set. MIDI uses a technique called running status: if a data byte appears where a status byte is expected, the previous status byte is reused. A sysex event is stored as (delta time, F0, length, sysex bytes ending in F7), which may be continued by (delta time, F7, length, data bytes).

    Variable-length delta time | Status byte (0x80-0xFF) | Data byte (0x00-0x7F) | Data byte (0x00-0x7F) | ...
    Typical MIDI Message Format
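
    The delta times and running status described above take only a few lines to decode. The following is a simplified sketch (the function names are invented for illustration, and meta and sysex events are skipped): it reads a variable-length quantity and reuses the previous status byte whenever a data byte appears where a status byte is expected.

        def read_varlen(data, pos):
            """Read a variable-length quantity (1-4 bytes, MSB = continuation flag)."""
            value = 0
            while True:
                byte = data[pos]
                pos += 1
                value = (value << 7) | (byte & 0x7F)
                if not byte & 0x80:            # MSB clear: last byte of the quantity
                    return value, pos

        def walk_channel_events(track_data):
            """Simplified event walk over a track's data, handling channel events only."""
            pos, status = 0, None
            while pos < len(track_data):
                delta, pos = read_varlen(track_data, pos)
                if track_data[pos] & 0x80:     # a real status byte
                    status = track_data[pos]
                    pos += 1
                # otherwise: running status, the previous status byte is reused
                if status is None or status >= 0xF0:
                    break                      # sysex / meta handling omitted in this sketch
                # program change (0xCn) and channel aftertouch (0xDn) carry one data byte
                ndata = 1 if (status & 0xF0) in (0xC0, 0xD0) else 2
                print(delta, hex(status), list(track_data[pos:pos + ndata]))
                pos += ndata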

    A channel message is directed to a particular destination and is subdivided into voice messages and mode messages. Channel 1 is represented by 0x0 and channel 16 by 0xF in the low nibble of the status byte.

    Channel Voice Message | Status | Data 1 | Data 2
    Note Off | 0x8n | Note (C4 = 60) | Velocity (0-127)
    Note On | 0x9n | Note | Velocity
    Key Aftertouch | 0xAn | Note | Pressure
    Control Change | 0xBn | Controller No. | Value
    Program Change | 0xCn | Program No. (0-127) | -
    Channel Aftertouch | 0xDn | Pressure | -
    Pitch Bend | 0xEn | LSB (0x00-0x7F, relative to the bend range) | MSB (0x00-0x7F; LSB 0x00, MSB 0x40 = no bend)
    Channel Messages, where n = channel number
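
    A note on and its matching note off are therefore just three bytes each. A minimal sketch (the helper names are my own) that builds the raw bytes for middle C on channel 1:

        def note_on(channel, note, velocity):
            """Channel voice message 0x9n: channel 1-16, note and velocity 0-127."""
            return bytes([0x90 | (channel - 1), note & 0x7F, velocity & 0x7F])

        def note_off(channel, note, velocity=64):
            """Channel voice message 0x8n (a note on with velocity 0 also ends a note)."""
            return bytes([0x80 | (channel - 1), note & 0x7F, velocity & 0x7F])

        print(note_on(1, 60, 100).hex())   # '903c64': note on, channel 1, C4, velocity 100
        print(note_off(1, 60).hex())       # '803c40': note off, channel 1, C4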

    Here is a list of controllers that can be used in MIDI. Controllers 0-31 carry the Most Significant Byte (MSB), giving 7 bits of resolution, while controllers 32-63 carry the corresponding Least Significant Byte (LSB) for finer control. Controllers 64 and above mostly act as switches or channel mode messages; a sketch after the table shows how an MSB/LSB pair combines into a single 14-bit value.

    Controller No. | Effect | Controller No. | Effect
    0 | Bank Select | 1 | Modulation
    2 | Breath | 4 | Foot Controller
    5 | Portamento Time | 6 | Data Entry MSB
    7 | Channel Volume | 8 | Balance
    10 | Pan | 11 | Expression
    12 | Effect 1 | 13 | Effect 2
    16-19 | General Purpose | 32-63 | LSB for controllers 0-31
    64 | Sustain | 65 | Portamento On/Off
    66 | Sostenuto | 67 | Soft Pedal
    68 | Legato Footswitch | 69 | Hold 2
    70 | Sound Controller 1 | 71 | Timbre / Harmonic Intensity
    72 | Release Time | 73 | Attack Time
    74 | Brightness | 75-79 | Sound Controllers
    80-83 | General Purpose | 84-90 | Portamento Control
    91-95 | Effects | 96 | Data Increment
    97 | Data Decrement | 98 | Non-Registered Parameter LSB
    99 | Non-Registered Parameter MSB | 100-101 | Registered Parameter LSB & MSB
    120 | All Sound Off (0xBn 78 00) | 121 | Reset All Controllers (0xBn 79 00)
    122 | Local Control (0xBn 7A 00 = off, 7F = on) | 123 | All Notes Off (0xBn 7B 00)
    124 | Omni Off | 125 | Omni On
    126 | Mono On (0xBn 7E M, where M = no. of channels, or 0 to use all the receiver's voices) | 127 | Poly On
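
    Both the coarse/fine controller pairs and the pitch bend message split a 14-bit value into a 7-bit MSB and a 7-bit LSB. A sketch of that arithmetic (the helper names are invented; controller 7 is channel volume, as in the table above):

        def pitch_bend(channel, value):
            """Pitch bend 0xEn: value 0-16383, where 8192 (LSB 0x00, MSB 0x40) means no bend."""
            lsb, msb = value & 0x7F, (value >> 7) & 0x7F
            return bytes([0xE0 | (channel - 1), lsb, msb])

        def control_change_14bit(channel, controller, value):
            """Send a 14-bit value: MSB on controller n, LSB on controller n + 32."""
            status = 0xB0 | (channel - 1)
            msb, lsb = (value >> 7) & 0x7F, value & 0x7F
            return bytes([status, controller, msb, status, controller + 32, lsb])

        print(pitch_bend(1, 8192).hex())                 # 'e00040': centred, no bend
        print(control_change_14bit(1, 7, 11000).hex())   # channel volume, coarse then fine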

    Meta events are used to store information like key signatures and copyright notices. They are time-stamped non-MIDI Events.

    Variable-length delta time | FF | 1-byte type | Variable-length length | Data ...

    Value (0x) | Event | Data
    01 | Text Event | ASCII text
    02 | Copyright Notice | ASCII text
    03 | Track Name | ASCII text (the length gives the number of text bytes)
    04 | Instrument Name | ASCII text
    05 | Lyric | ASCII text
    06 | Marker | ASCII text
    07 | Cue Point | ASCII text
    20 | MIDI Channel Prefix |
    2F | End of Track | Length is always 0
    51 | Set Tempo | Tempo in microseconds per beat; the default is 120 bpm
    54 | SMPTE Offset |
    58 | Time Signature | 1st byte is the numerator and 2nd byte the denominator (as a power of 2); 3rd byte is the number of MIDI clocks per metronome click; 4th byte is the number of 32nd notes (demisemiquavers) per 24 MIDI clocks
    59 | Key Signature | 1st byte: if positive, the number of sharps; if negative, the number of flats. 2nd byte: 0 = major key, 1 = minor key
    Meta Events and Different Types
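
    As a worked example, the Set Tempo meta event FF 51 03 tt tt tt stores the tempo as three bytes of microseconds per beat, so 500000 microseconds works out to the default 120 bpm:

        def parse_set_tempo(event):
            """Decode a Set Tempo meta event: FF 51 03 tt tt tt (microseconds per beat)."""
            assert event[0] == 0xFF and event[1] == 0x51 and event[2] == 3
            usec_per_beat = int.from_bytes(event[3:6], "big")
            return usec_per_beat, 60_000_000 / usec_per_beat    # tempo in beats per minute

        print(parse_set_tempo(bytes([0xFF, 0x51, 0x03, 0x07, 0xA1, 0x20])))
        # -> (500000, 120.0), the default tempo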


    Introduction to MP3

    MP3 is the file extension for MPEG-1 Layer 3 audio files. The Moving Picture Experts Group (MPEG) developed a standard way to compress video sequences; the MPEG standards are best known for video compression, but they also support high-quality audio compression. MP3 encoders typically compress CD-quality wave files to around one tenth of their original size. MP3 decoders such as Winamp decode and play back MP3 files. If songs were not copyrighted, imagine the ease of obtaining unlimited songs through the Internet without having to buy CDs! All you need is plenty of disk space and an MP3 decoder.

    How is MPEG Audio implemented?

    The audio portion of MPEG-1 is divided into three layers, each providing successively better quality at the cost of a more complex implementation. Layer 1 is the simplest and is best suited when data can be transferred quickly. Layer 3 offers better compression but requires much more computational power to compress and decompress. MPEG-2 is an extension of MPEG-1 that supports a wider variety of applications: MPEG-2 audio supports up to five-channel audio for high-quality surround sound, whereas MPEG-1 audio supports only two channels (stereo). An MPEG audio bitstream specifies the frequency content of a sound and how that content varies over time.

    The bitstream consists of frames of compressed data aligned to byte boundaries. Each frame contains a frame header that defines the format of the data. Layer 3 compression allows the compressed data to slop over, that is, a single frame may have data both before and after the frame header. The first 12 bits of the frame header are used for synchronization, and the remaining bits indicate the layer, sampling rate, bit rate, mode, copyright flag and other settings. The length of each frame is measured in slots.
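
    A decoder finds a frame by scanning for the sync bits and then unpicking the remaining header fields with shifts and masks. The sketch below handles MPEG-1 Layer 3 only; the bit positions and lookup tables reflect my reading of the MPEG-1 audio specification, so treat them as an illustration rather than a reference:

        # Bit rates (kbps) and sampling rates (Hz) for MPEG-1 Layer 3, indexed by header fields.
        BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
        SAMPLE_RATES = [44100, 48000, 32000]

        def parse_frame_header(header_bytes):
            """Pick apart a 4-byte MPEG-1 Layer 3 frame header."""
            h = int.from_bytes(header_bytes[:4], "big")
            if ((h >> 20) & 0xFFF) != 0xFFF:              # 12 sync bits, all ones
                raise ValueError("not aligned on a frame header")
            layer = 4 - ((h >> 17) & 0x3)                 # layer field 01 means Layer 3
            bitrate = BITRATES[(h >> 12) & 0xF] * 1000    # bits per second
            sample_rate = SAMPLE_RATES[(h >> 10) & 0x3]
            padding = (h >> 9) & 0x1                      # 1 if an extra slot is present
            return layer, bitrate, sample_rate, padding

        # 0xFFFB9004 is a typical header for 128 kbps, 44.1 kHz Layer 3 audio.
        print(parse_frame_header(bytes.fromhex("FFFB9004")))   # (3, 128000, 44100, 0)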

    MPEG Layer 1 stores 12 groups of 32 subband samples in each frame. Each sample requires 2 to 15 bits, and the bit allocation and scale factor information stored in the frame is needed to extract and rescale the samples.

    MPEG Layer 2 uses fewer subbands at lower bit rates to reduce the amount of allocation information required. A single Layer 2 frame stores 3 groups of 12 samples and allows scale factors to be shared across the 3 groups of samples.

    MPEG Layer 3 is more complex than Layer 2 as it allows the frame data to vary in size. This allows the Layer 3 compressor to vary the frame data size depending on the data to be compressed. The scale factor selection bits can apply to groups of scale factors and have different lengths for different subbands. Huffman encoding is used to store samples.
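
    Putting numbers to this: a Layer 1 frame carries 12 x 32 = 384 samples, while Layer 2 and Layer 3 frames carry 3 x 12 x 32 = 1152 samples, so the frame length follows from the bit rate and sampling rate. A small sketch of that arithmetic (the formulas are as I understand the MPEG-1 layers; padding adds one slot):

        def frame_length_bytes(layer, bitrate, sample_rate, padding=0):
            """Approximate MPEG-1 frame length; Layer 1 counts 4-byte slots, Layers 2/3 count bytes."""
            if layer == 1:
                return (12 * bitrate // sample_rate + padding) * 4    # 384 samples per frame
            return 144 * bitrate // sample_rate + padding             # 1152 samples per frame

        print(frame_length_bytes(3, 128_000, 44_100))   # 417: a typical Layer 3 frame at 128 kbps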

    To decode Layers 1 and 2, you reconstruct PCM (Pulse Code Modulation) audio after decoding the bits. The initial decoding gives groups of 32 subband samples, each being the amplitude of a particular frequency subband. The 32 subband samples are converted into 64 intermediate values by summing a collection of cosine waves, and successive sets of these values are then blended to obtain 32 output PCM samples.
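
    The "summing a collection of cosine waves" step is the matrixing stage of the synthesis filter bank. A rough sketch of just that stage, using the matrixing formula from the MPEG-1 audio specification (the windowing and overlap that produce the final 32 output samples are omitted):

        import math

        def matrix_subbands(subband_samples):
            """Matrixing: turn 32 subband samples into 64 intermediate values V[0..63]."""
            assert len(subband_samples) == 32
            return [sum(math.cos((16 + i) * (2 * k + 1) * math.pi / 64) * subband_samples[k]
                        for k in range(32))
                    for i in range(64)]

        # A signal confined to the lowest subband turns into a slowly varying cosine.
        v = matrix_subbands([1.0] + [0.0] * 31)
        print([round(x, 3) for x in v[:4]])   # [0.707, 0.672, 0.634, 0.596]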



    Introduction to WAVE

    WAVE is the native sound format used by Microsoft Windows. The overall structure is based on the Interchange File Format (IFF); Microsoft defined a similar general file format called the Resource Interchange File Format (RIFF). RIFF files are organized as a collection of nested chunks, and tags within the RIFF file identify the contents. Two common variations are WAVE files, which hold audio, and AVI files, which hold video.

    How is WAVE implemented?

    A WAVE file begins with the characters RIFF, followed by a 4-byte length and a type code. Most WAVE files contain both a fmt chunk and a data chunk; the fmt chunk describes the format of the sound. A typical WAVE file is laid out as: 4 bytes of chunk type (RIFF), 4 bytes holding the total file size minus 8, 4 bytes of RIFF container type (WAVE), 4 bytes of chunk type (fmt ), 4 bytes giving the format chunk data length (16), 16 bytes of format chunk data, 4 bytes of chunk type (data), 4 bytes giving the length of the sound data, and then the actual sound samples. There are nearly 100 compression codes registered with Microsoft for use in WAVE files, including PCM, Microsoft ADPCM and MPEG.
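
    Following that layout, the canonical 16-byte PCM fmt chunk can be unpacked directly. The sketch below (standard struct module only; the file name is a placeholder) walks the chunks and skips any it does not recognise, since RIFF values are little-endian and chunks are padded to even lengths:

        import struct

        def read_wave_format(path):
            """Walk the RIFF chunks of a WAVE file and return the PCM format fields."""
            with open(path, "rb") as f:
                riff, total_size, wave = struct.unpack("<4sI4s", f.read(12))
                assert riff == b"RIFF" and wave == b"WAVE"
                while True:
                    chunk_id, chunk_size = struct.unpack("<4sI", f.read(8))
                    if chunk_id == b"fmt ":
                        (codec, channels, sample_rate, byte_rate,
                         block_align, bits) = struct.unpack("<HHIIHH", f.read(16))
                        return codec, channels, sample_rate, bits
                    f.seek(chunk_size + (chunk_size & 1), 1)   # skip other chunks, word aligned

        print(read_wave_format("example.wav"))   # placeholder file; compression code 1 means PCM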


    Reference: Programmer's Guide to Sound by Tim Kientzle, published by Addison-Wesley.