Musical Instrument Digital Interface (MIDI) is a simple digital protocol. By connecting a computer to a keyboard through a MIDI connection, you get the musical equivalent of a word processor: you can play the keyboard to enter music into the computer, edit it on the computer using software such as Cakewalk, and then play it back through the keyboard. If you do not have a synthesizer, there is software available to play back MIDI on the computer. I recommend Yamaha's XG50 Soft Synthesizer, Roland's Virtual Sound Canvas, or Creative's SBLive! together with SoundFonts. MIDI files do not specify precise instrument sounds; what you hear depends on which synthesizer you use. General MIDI is standardized by the MIDI Manufacturers Association.
How is MIDI implemented?
General MIDI specifies 128 standard instruments, as well as other common capabilities. A standard MIDI file is a sequence of chunks with the same format as the chunks used by WAVE and AIFF. There are three types of MIDI files: type zero files contain one track, type one files contain multiple tracks, and type two files, which are less common, contain multiple tracks without assuming any relation among the tracks.
At the beginning of every MIDI file there is an MThd chunk containing 2 bytes for the file type, 2 bytes for the number of tracks, and 2 bytes for the time format. Lengths and delta times elsewhere in the file are stored as variable-length quantities of 1 to 4 bytes: if the most significant bit of a byte is set to 1, another byte follows to extend the value.
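As a rough illustration, the Python sketch below reads the MThd chunk just described; the filename song.mid is only a placeholder, and the 4-byte chunk name and 4-byte length that precede the three 2-byte fields follow the general chunk layout mentioned above.

    import struct

    with open("song.mid", "rb") as f:
        chunk_id = f.read(4)                           # should be b"MThd"
        (length,) = struct.unpack(">I", f.read(4))     # header data length, normally 6
        file_type, ntracks, division = struct.unpack(">HHH", f.read(6))
        print(chunk_id, file_type, ntracks, division)
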
A track usually contains one instrument played on one channel, although many tracks may share the same channel. The body of a track is a list of MIDI events, each preceded by a delta time value: the time interval since the previous event, measured in ticks. How ticks map onto real time depends on the time format specified in the header and can be changed by special events within the file.
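Delta times are stored as the variable-length quantities described above. A minimal sketch of decoding one:

    def read_var_len(data, pos):
        """Decode a variable-length quantity starting at data[pos].
        Each byte contributes 7 bits; a set high bit means another byte follows.
        Returns (value, new_pos)."""
        value = 0
        while True:
            byte = data[pos]
            pos += 1
            value = (value << 7) | (byte & 0x7F)
            if not byte & 0x80:          # high bit clear: this was the last byte
                return value, pos

    # Example: the bytes 0x81 0x40 decode to 0xC0 = 192 ticks.
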
A MIDI event is a packet of data that specifies a musical action: pressing a key with a given velocity on a given channel, releasing a key with a given velocity on a given channel, moving the pitch wheel of a channel by some proportion, selecting the instrument sound for a channel, or changing the pressure (vibrato) of all notes playing on a channel. Each packet begins with a status byte, a byte with the most significant bit set to 1 that indicates the type of event; it is followed by data bytes, which do not have the most significant bit set. MIDI uses a technique called running status: if a data byte appears where a status byte is expected, the previous status byte is reused. A Sysex event is stored as (delta time, F0, length, sysex bytes ending with F7); continuation data is stored as (delta time, F7, length, data bytes).
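The sketch below, reusing read_var_len from earlier, walks a track's event data and shows how running status works. Sysex and meta events are simply skipped by their length, so this is only an outline, not a complete parser.

    def parse_track_events(track_data):
        pos, status, events = 0, None, []
        while pos < len(track_data):
            delta, pos = read_var_len(track_data, pos)
            if track_data[pos] & 0x80:                 # a real status byte
                status = track_data[pos]
                pos += 1
            # otherwise: running status, so the previous status byte is reused
            if status == 0xFF:                         # meta event: FF, type, length, data
                pos += 1
                length, pos = read_var_len(track_data, pos)
                pos += length
                continue
            if status in (0xF0, 0xF7):                 # sysex event: length, data
                length, pos = read_var_len(track_data, pos)
                pos += length
                continue
            # program change and channel pressure carry one data byte, the rest two
            n_data = 1 if (status & 0xF0) in (0xC0, 0xD0) else 2
            events.append((delta, status, bytes(track_data[pos:pos + n_data])))
            pos += n_data
        return events
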
A channel message is directed to a particular channel and is subdivided into voice messages and mode messages. Channel 1 is represented by 0x0, and channel 16 by 0xF.
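For example, a channel voice status byte splits into its message type and channel like this (the variable names are just for illustration):

    status = 0x90                      # e.g. the first byte of a Note On packet
    message = status & 0xF0            # 0x90 = Note On, 0x80 = Note Off, 0xB0 = control change
    channel = (status & 0x0F) + 1      # 0x0 -> channel 1 ... 0xF -> channel 16
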
Controllers in MIDI fall into ranges. Controllers 0-31 carry the Most Significant Byte (MSB) of a value, providing 7 bits of resolution, while controllers 32-63 carry the corresponding Least Significant Byte for finer control. Controllers 64-127 act as switches.
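As a small illustration, a coarse/fine controller pair combines into a single 14-bit value; the specific controller numbers in the comments (7 for channel volume MSB, 39 for its LSB) are standard assignments rather than something listed above.

    msb = 0x50                 # data byte received for controller 7 (volume, coarse)
    lsb = 0x21                 # data byte received for controller 39 (volume, fine)
    volume = (msb << 7) | lsb  # combined 14-bit value in the range 0..16383
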
Meta events are time-stamped non-MIDI events used to store information such as key signatures and copyright notices.
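A sketch of reading one meta event, again reusing read_var_len: the byte FF is followed by a type byte, a variable-length length, and the data itself. The type codes in the comment are standard assignments, not values given above.

    def read_meta_event(data, pos):
        assert data[pos] == 0xFF           # meta events always start with FF
        meta_type = data[pos + 1]          # e.g. 0x59 = key signature, 0x02 = copyright notice
        length, pos = read_var_len(data, pos + 2)
        return meta_type, data[pos:pos + length], pos + length
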
Introduction to MP3
MP3 is the file extension for MPEG-1 Layer 3 audio files. The Moving Picture Experts Group (MPEG) developed a standard way to compress video sequences. The MPEG standard is best known for video compression, but it also supports high quality audio compression. MP3 encoders compress wave files to about one tenth of their original size at the maximum bit rate and sampling rate. MP3 decoders such as Winamp are able to decode and play back MP3 files. If songs were not copyrighted, imagine the ease of obtaining unlimited songs through the Internet without having to buy CDs! All you need is plenty of disk space and an MP3 decoder.
How is MPEG Audio implemented?
The audio portion of MPEG-1 is divided into three layers. Each layer provides successively better quality at the cost of a more complex implementation. Layer 1 is the simplest and is best suited when data can be transferred quickly. Layer 3 offers better compression but requires a lot of computational power to compress and decompress. MPEG-2 is an extension of MPEG-1 and supports a wider variety of applications. MPEG-2 audio supports up to five-channel audio, producing high quality surround sound, whereas MPEG-1 audio supports only two channels (stereo).
An MPEG audio bitstream specifies the frequency content of a sound and how that content varies over time. The bitstream consists of frames of compressed data aligned to byte boundaries. Each frame contains a frame header that defines the format of the data. Layer 3 compression allows the compressed data to slop over; that is, a single frame may have data both before and after its frame header. The first 12 bits of the frame header are used for synchronization, and the remaining bits indicate the layer, sampling rate, bit rate, mode, copyright status and other properties. Each frame is measured in slots.
MPEG Layer 1 stores 12 groups of 32 subband samples in each frame. Each sample requires 2 to 15 bits, and the allocation and scale factors are needed to extract the data from the samples. MPEG Layer 2 uses fewer subbands at lower bit rates to reduce the amount of allocation information required. A single Layer 2 frame stores 3 groups of 12 samples and allows scale factors to be shared across the 3 groups of samples. MPEG Layer 3 is more complex than Layer 2 because it allows the frame data to vary in size, so the Layer 3 compressor can adjust the amount of data per frame to the material being compressed. The scale factor selection bits can apply to groups of scale factors and have different lengths for different subbands. Huffman encoding is used to store the samples. To decode Layers 1 and 2, you need to reconstruct PCM (Pulse Code Modulation) audio after decoding the bits. The initial decoding gives groups of 32 subband samples, where each sample is the amplitude of a particular frequency subband. The 32 subband samples are converted to 64 PCM samples by summing a collection of cosine waves, and successive sets of PCM samples are then blended to obtain 32 output samples.
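As a rough illustration of the frame header just described, the Python sketch below picks apart a 4-byte MPEG-1 header. The bit positions and the Layer 3 bit-rate and sampling-rate tables are standard MPEG-1 values rather than anything given above, so treat this as an outline only.

    # Bit rates (kbit/s) for MPEG-1 Layer 3 (index 15 is invalid) and
    # sampling rates (Hz) for MPEG-1 (index 3 is reserved).
    LAYER3_BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
    SAMPLE_RATES = [44100, 48000, 32000]

    def parse_frame_header(header_bytes):
        hdr = int.from_bytes(header_bytes[:4], "big")
        if hdr >> 20 != 0xFFF:                        # the 12 synchronization bits
            raise ValueError("not aligned to a frame header")
        layer = 4 - ((hdr >> 17) & 0x3)               # 1, 2 or 3
        bitrate = LAYER3_BITRATES[(hdr >> 12) & 0xF] * 1000
        sample_rate = SAMPLE_RATES[(hdr >> 10) & 0x3]
        padding = (hdr >> 9) & 0x1
        # For Layer 3, header plus data occupy this many bytes per frame.
        frame_bytes = 144 * bitrate // sample_rate + padding
        return layer, bitrate, sample_rate, frame_bytes
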
Introduction to WAVE
WAVE is the native sound format used by Microsoft Windows. The overall structure is based on the Interchange File Format (IFF): Microsoft defined a general file format called the Resource Interchange File Format (RIFF). RIFF files are organized as a collection of nested chunks, and tags within the file identify the contents. Two common variations are WAVE files, which hold audio, and AVI files, which hold video.
How is WAVE implemented?
A WAVE file begins with the characters RIFF, followed by a 4-byte length and a type code. Most WAVE files contain both a fmt chunk and a data chunk. The fmt chunk contains information about the format of the sound. A typical WAVE file is laid out as 4 bytes of chunk type (RIFF), 4 bytes giving the total file size minus 8, 4 bytes of RIFF container type (WAVE), 4 bytes of chunk type (fmt ), 4 bytes giving the format chunk data length (16), 16 bytes of format chunk data, 4 bytes of chunk type (data), 4 bytes giving the length of the sound data, and then the actual sound samples. There are nearly 100 compression codes registered with Microsoft for use in WAVE files, including PCM, Microsoft ADPCM and MPEG.
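To tie the layout above together, here is a minimal sketch in Python (assuming a file named sound.wav whose fmt chunk is the 16-byte form described above) that walks the RIFF chunks and pulls out the format fields and the raw samples.

    import struct

    with open("sound.wav", "rb") as f:
        riff, total_size, wave = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave == b"WAVE"     # RIFF container holding WAVE data
        while True:
            header = f.read(8)
            if len(header) < 8:                        # end of file
                break
            chunk_id, size = struct.unpack("<4sI", header)
            body = f.read(size)
            if chunk_id == b"fmt ":
                (compression, channels, sample_rate,
                 byte_rate, block_align, bits) = struct.unpack("<HHIIHH", body[:16])
            elif chunk_id == b"data":
                samples = body                         # the actual sound samples

Reference: Programmer's Guide to Sound by Tim Kientzle, published by Addison-Wesley.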