Monday, November 24, 2008

Remaining Class Information

A couple of things to remember:
1. No class for Wednesday's section this week, November 26. No class for the Monday section next week, December 1.
2. Papers and presentations are due in 2 weeks! Be sure that you have handed in a proposal to me and that we've talked about your topic. Check your syllabus or the post below for more information about the papers and presentations.
3. Final exam is in 3 weeks! Make sure you have read chapters 1-8, and start reviewing now...
4. If you want to make up any sound journals or other assignments, be sure to let me know. You should have 6 sound journal entries at this point in time. A sound journal entry is not required for the Thanksgiving holiday weekend, but if you want to use it as a make-up entry, feel free.

As always let me know if you have any questions.

Loudspeakers & Monitoring

The quality of any sound is based on what you hear from the loudspeaker interacting with the room acoustics. But choosing the "best" loudspeaker depends on many factors, both for the loudspeaker and also the location it will be in. The key requirements, though, are that it must be able to handle a range of frequencies from 30 Hz to 16 kHz and reach 120 dB-SPL.

A loudspeaker is a transducer that converts electric energy to acoustic energy - essentially the opposite of a microphone. Like mics, speakers come with different elements. The moving coil is the most common and is used for a wide range of monitors because it is sturdier and handles a high SPL. Ribbon speakers are used for high frequencies because they have a smooth high frequency response. Electrostatic (capacitor) speakers are rarely used because they are expensive and have a low dB rating.

There are two ways of powering speakers. Passive power uses external amplifiers, while active power uses a built in amplifier to power the speaker internally. Speakers can't linearly output all frequencies - a large speaker that generates low frequency sound waves can't efficiently create high frequencies and vice versa. So multiple speaker drivers are needed to create the whole frequency range. Using a crossover network to divide the frequency spectrum into high/low sections, a large driver creates the low frequencies (woofer for the bass) and a small driver creates the high frequencies (tweeter for the treble). The point where the divide occurs is called the crossover frequency. One division is a 2-way system loudspeaker, with the crossover at 1500-2000 Hz. Two crossover frequencies is a 3-way system loudspeaker, with divides at 400-500 Hz, and 3500-4000 Hz.
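As a rough illustration of how a crossover network splits the spectrum between drivers, here is a minimal Python sketch. The crossover points (1800 Hz, and 450 Hz / 3750 Hz) are hypothetical values picked from inside the ranges given above, and the "midrange" driver name is an assumption for the 3-way case. A real crossover rolls off gradually around each point, so treat this hard split as a simplification.

```python
from bisect import bisect_right

def driver_for(freq_hz, crossovers_hz, drivers):
    """Return the driver that reproduces freq_hz.

    crossovers_hz must be sorted low to high, and there must be
    exactly one more driver than there are crossover frequencies.
    """
    if len(drivers) != len(crossovers_hz) + 1:
        raise ValueError("need one more driver than crossover points")
    # bisect_right counts how many crossover points lie at or below freq_hz,
    # which is also the index of the driver covering that band.
    return drivers[bisect_right(crossovers_hz, freq_hz)]

# Hypothetical example systems (values chosen from the ranges in the notes).
TWO_WAY = ([1800.0], ["woofer", "tweeter"])
THREE_WAY = ([450.0, 3750.0], ["woofer", "midrange", "tweeter"])
```

For example, `driver_for(100.0, *TWO_WAY)` picks the woofer, while `driver_for(1000.0, *THREE_WAY)` picks the midrange driver.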

A passive crossover network has an external amplifier before the crossover frequency divide, which may cause distortion. An active crossover network divides the signal before the two internal amplifiers power the drivers, which is also called biamping. This is less expensive, causes less distortion, requires less power, and gives a better transient response.

A speaker's specifications are based on tests in an anechoic chamber (a room with no reflections), so performance will differ in every room because of the different acoustics. Some things to look for include the following.
Wide frequency response - at least 40 Hz to 20 kHz is needed for professional audio. Studios use 3 sets of speakers: low quality ones with midrange response only, like in car radios; average quality ones with more highs and lows; and good, high powered, wide range speakers.
Linearity - where the input loudness is reproduced at the same level with no more than +/- 3 dB variance.
Amp power - sufficient power to create loud sound levels without distortion. 30 Watt tweeters and 100 Watt woofers.
Dynamic range - pro audio needs 40dB-SPL for soft sounds and 120dB for loud for an 80 dB range. Most consumer speakers only have 50dB to 105dB capabilities for a 55 dB range.
Sensitivity - tells you the overall efficiency of the speaker. Should be 93dB or higher.
Polar Response - how a speaker focuses sound at the monitoring position. You want to monitor with the fewest number of reflections - no effect of the room. Use bass traps for the low frequencies because the longer wavelengths can't be controlled by room dimensions.
Arrival Times - reproduced sound must reach the listener within 1 ms of one another.
Polarity - if the driver cone motions are opposite (compression versus rarefaction) then they are out of phase and must be rewired.
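A few of the numbers in the list above are simple arithmetic, so here is a small Python sketch to check them. The function names are my own, and the speed of sound (343 m/s) is an assumed value for room-temperature air, which the notes do not specify.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C

def dynamic_range_db(softest_db_spl, loudest_db_spl):
    """Dynamic range is the span between the softest and loudest levels."""
    return loudest_db_spl - softest_db_spl

def within_linearity(levels_db, reference_db, tolerance_db=3.0):
    """True if every measured level stays within +/- tolerance of the reference."""
    return all(abs(level - reference_db) <= tolerance_db for level in levels_db)

def arrival_difference_ms(distance_a_m, distance_b_m):
    """Difference in arrival time for two path lengths, in milliseconds."""
    return abs(distance_a_m - distance_b_m) / SPEED_OF_SOUND_M_S * 1000.0
```

With these, `dynamic_range_db(40, 120)` gives the 80 dB pro audio range and `dynamic_range_db(50, 105)` gives the 55 dB consumer range. Under the assumed speed of sound, a 1 ms arrival difference corresponds to a path difference of about 34 cm.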

Distortion is the appearance of a signal in the reproduced sound that was not in the original sound. It can be created anywhere in the sound chain. There are several types of distortion:
Intermodulation (IM) - 2 or more frequencies occur at the same time and create combination tones and dissonances unrelated to the original sounds. A rating of 0.5% IM or lower is best.
Harmonic - introduces harmonics that were not in the original signal; occurs when the input and output of a sound system are nonlinear.
Transient - inability of an audio component to respond quickly to a rapidly changing signal.
Loudness (overload) - signal recorded or played back at a level greater than the system can handle.

Monitor placement also affects the sound quality, dispersal and arrival time. Flush mounting the speakers keeps the walls and protrusions from affecting the sound. When monitoring stereo, there are 2 dimensions, so 2 symmetric speakers placed in an equilateral triangle with the listener are best for monitoring. Far field monitors are large high quality speakers flush with the wall, while near field monitors are on or near the console, closer to the listener, for just direct sound. Surround sound expands the depth by placing the listener in the middle of the aural image instead of in front. 5.1 surround has 6 discrete channels - 5 full range and one low frequency subwoofer.
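To make the equilateral triangle placement concrete, here is a minimal Python sketch that works out where the two speakers go. The coordinate setup (listener at the origin, facing along the y axis) and the function names are my own assumptions for illustration.

```python
import math

def distance(a, b):
    """Straight-line distance between two (x, y) points in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def stereo_positions(side_m):
    """Return (left, right) speaker positions for an equilateral triangle.

    The listener sits at (0, 0) facing the +y direction, and every side
    of the triangle (speaker to speaker, speaker to listener) is side_m.
    """
    depth = side_m * math.sqrt(3.0) / 2.0  # height of an equilateral triangle
    return (-side_m / 2.0, depth), (side_m / 2.0, depth)
```

For a 2 m triangle, `stereo_positions(2.0)` puts the speakers 1 m to either side of center and about 1.73 m in front of the listener.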

To calibrate the speaker system you need an objective measure of the correlation between the monitor sound and the room sound. A spectrum analyzer will display the frequency response over time or in real time (a real-time analyzer, or RTA). An SPL meter will give approximate levels. Fix the problems at the source by modifying the monitor position, amplifier, or acoustics instead of attempting to use signal processing/EQ to fix it.

Reference levels for film are 85dB, TV is 79dB, and music is 79-82dB.

Headphones give a wide, flat, uncolored frequency response that is consistent across different studios and allow you to hear subtle changes. However, you don't get the same sound quality as from monitors, and the aural image is unnaturally wide.

The 60%/60 min rule states that you should not listen to your headphones for more than 1 hour per day at 60% of the maximum volume.

Make your own loudspeakers

Sunday, November 23, 2008

Signal Processors

Signal processors are devices or software used to alter a characteristic of sound. A plug-in is an add-on software tool that provides a DAW with more signal processing alternatives than what is built in.

Spectrum processors affect the frequency aspect of the signal.
Equalizers (or EQ) alter the frequency response by increasing or decreasing the level of a signal at a specific portion of the spectrum. The increase or decrease is done around a center frequency, which is the most affected value, while the frequencies around it are changed by gradually smaller amounts. The range of frequencies affected is called the bandwidth. Shelving affects all frequencies above or below the selected frequency equally. There are fixed frequency EQs with a certain number of fixed center frequency options, generally knobs for high, upper middle, lower middle and low frequencies. Graphic EQ uses sliders instead of knobs. Parametric EQ allows for continuously variable frequencies and bandwidth size, which gives more flexibility and precision.

Filters attenuate bands of frequencies, usually at a preset point and with a steep drop. High pass (low cut) cuts frequencies below the preset point. Low pass (high cut) cuts frequencies above a preset point. Band pass filters have both a high and low cutoff point and allow the frequencies between them through. A notch filter cuts out an extremely narrow band, such as the hum at 60 Hz.
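Here is a minimal Python sketch of a low pass and a high pass filter, just to show the idea of letting one side of the cutoff through. It uses a simple one-pole (RC-style) design, which is my own choice for illustration; it has a gentle slope, so it is much less steep than the filters described above.

```python
import math

def one_pole_low_pass(samples, cutoff_hz, sample_rate_hz):
    """Pass frequencies below cutoff_hz and attenuate those above it."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant of the filter
    dt = 1.0 / sample_rate_hz               # time between samples
    alpha = dt / (rc + dt)                  # smoothing coefficient, 0..1
    out = []
    prev = 0.0
    for x in samples:
        # Move the output a fraction of the way toward the new input.
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

def one_pole_high_pass(samples, cutoff_hz, sample_rate_hz):
    """Pass frequencies above cutoff_hz by removing the low-passed part."""
    low = one_pole_low_pass(samples, cutoff_hz, sample_rate_hz)
    return [x - l for x, l in zip(samples, low)]
```

A constant (0 Hz) signal passes through the low pass almost unchanged once the filter settles, while the high pass output for the same signal decays toward zero.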

Psychoacoustic processors add clarity and definition with EQ and harmonics.

Time processors affect the time relationships in a signal.
Reverberation is created by random, multiple, blended repetitions of a sound. Dry sound is without reverb; wet sound is with added reverb.
Digital reverb is the most commonly used, where the original signal is delayed and attenuated multiple times and then added to the original signal. Predelay is the time between the direct sound and the early reflections, and tells you the room size.
Convolution reverb is a sample based process that multiplies the spectra of 2 audio files. The first is the acoustic signature of a space, called the Impulse Response. It is acquired by recording the room response after a quick impulse is played in the room. The second file is the source signal. When the two files are multiplied, the room characteristics are applied to the original signal, so any room or location can be overlaid on it.
Plate reverb comes from a mechanical-electronic device with a thin steel plate that vibrates and is miked and sent through a console.
Acoustic chambers are dedicated rooms that create realistic reverb, but are very expensive to build.
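To show what convolution reverb does to a signal, here is a minimal Python sketch. Multiplying the spectra of two signals is equivalent to convolving them in the time domain, so this sketch does the convolution directly with two loops. That is slow for real audio (real plug-ins use FFTs), and the tiny impulse response below is a made-up example, not a measured room.

```python
def convolve(signal, impulse_response):
    """Apply an impulse response to a signal by direct convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample triggers a scaled copy of the room response.
            out[i + j] += s * h
    return out

# Hypothetical "room": the direct sound plus one reflection at half level.
toy_impulse_response = [1.0, 0.5]
dry = [1.0, 0.0, 0.0]
wet = convolve(dry, toy_impulse_response)
```

Here a single click in the dry signal comes back as the click followed by its half-level reflection, which is exactly the toy room's signature.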

Choose a reverb that sounds natural, with bright highs and clear lows. Listen to vocals to check for clearness, and to sharp transients like a drumbeat to check for density.

Delay is the time interval between a sound and its repetition. Digital delay routes audio through an electronic buffer and holds it for a specific amount of time. The delay time is how long the sound is held. Feedback is how much of the delayed signal is returned. Higher feedback increases the number of repetitions and longer decay. No feedback means just one repetition.
Uses of delay include doubling, chorus, slapback echo, and prereverb delay.
Flanging is the original signal combined with a 0-20 ms time delayed replica. Phase cancellations occur and create comb-filter effects with peaks and dips in the frequency response that lead to a hollow, swishy sound. In phase (positive flanging) accents the even harmonics for a metallic sound, while out of phase (negative flanging) accents the odd harmonics for a warm sound.
Phasing uses a phase shift instead of a time shift, giving more irregular peaks and dips for a wavering vibrato and less effect on the pitch.
Morphing is a continuous seamless transition from one signal to another. It is not crossfading - the signal takes on actual characteristics of the other sound. Examples are available here.
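Going back to delay and flanging, here is a minimal Python sketch of two of the ideas above: a digital delay with feedback, and the dips a comb filter creates. The function names and the buffer layout are my own assumptions, and the comb function only covers the in phase case, where the signal is added to its delayed copy.

```python
def feedback_delay(samples, delay_samples, feedback, tail_samples=0):
    """Mix the input with a delayed copy; feedback returns part of the
    delayed signal to the buffer so the repetition repeats and decays."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    buffer = [0.0] * delay_samples  # circular buffer that holds the audio
    out = []
    pos = 0
    for x in list(samples) + [0.0] * tail_samples:
        delayed = buffer[pos]                  # audio from delay_samples ago
        out.append(x + delayed)
        buffer[pos] = x + feedback * delayed   # store input plus feedback
        pos = (pos + 1) % delay_samples
    return out

def comb_notch_frequencies(delay_ms, max_hz):
    """Frequencies of the dips when a signal is added to a delayed copy.

    Dips fall where the delay equals an odd number of half periods.
    """
    if delay_ms <= 0:
        raise ValueError("delay_ms must be positive")
    notches = []
    k = 0
    while (2 * k + 1) * 1000.0 / (2.0 * delay_ms) <= max_hz:
        notches.append((2 * k + 1) * 1000.0 / (2.0 * delay_ms))
        k += 1
    return notches
```

With no feedback, a single click produces just one repetition. With feedback of 0.5, each repetition comes back at half the level of the one before. For a 1 ms delay, the dips land at 500 Hz, 1500 Hz, 2500 Hz and so on, which is the evenly spaced "comb" shape.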

Amplitude processors affect the dynamic range of the signal.
A compressor's output level increases at a slower rate than its input level, which restricts the dynamic range and limits signal peaks. The compression ratio establishes the proportion of change between the input and output levels. This can range from 1.1:1 to 20:1. At 20:1, every 20 dB change in the input above the threshold gives only a 1 dB change in the output. The compression threshold is the level where the ratio takes effect. When it is reached, compression begins, which reduces the gain according to the amount the signal exceeds the threshold level and the ratio set. The knee is the point at which the compressor starts gain reduction; a hard knee is abrupt while a soft knee is smoother. Broadband compressors act on the dynamic range of the input signal across the entire frequency spectrum, while split band compressors affect the input signal independently by splitting the audio into multiple bands.
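The threshold and ratio relationship can be sketched in a few lines of Python. This assumes a hard knee and works on levels in dB rather than on actual audio samples; the function name is my own.

```python
def compressor_output_db(input_db, threshold_db, ratio):
    """Output level of a hard knee compressor for a given input level.

    Below the threshold the signal passes unchanged. Above it, every
    `ratio` dB of input change gives only 1 dB of output change.
    """
    if ratio < 1.0:
        raise ValueError("a compressor ratio is 1:1 or higher")
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio
```

For example, with a threshold of -20 dB and a 4:1 ratio, an input at -8 dB is 12 dB over the threshold, so the output is only 3 dB over it, at -17 dB.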

A Limiter is a compressor where the output level stays the same at a preset point, no matter what the input level is. Basically gives a ceiling on the loudness of a sound, and reduces the high frequency response.

A de-esser is a fast acting compressor that attenuates high frequencies to remove hissy consonants such as s, z, ch, and sh in vocals.

Expanders increase the dynamic range, essentially the opposite of compressors. They are triggered when the signal falls below a set threshold at a set ratio. A noise gate is used to reduce or eliminate low-level noise from amplifiers, ambience, rumble, noisy tracks, etc.
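Here is a minimal Python sketch of a downward expander and a very simple noise gate. Both are my own simplified versions: the expander works on dB levels only, and the gate just mutes any sample below the threshold with no attack or release time.

```python
def expander_output_db(input_db, threshold_db, ratio):
    """Output level of a downward expander for a given input level.

    At or above the threshold the signal passes unchanged. Below it,
    every 1 dB of input drop becomes `ratio` dB of output drop.
    """
    if input_db >= threshold_db:
        return input_db
    return threshold_db - (threshold_db - input_db) * ratio

def noise_gate(samples, threshold):
    """Mute every sample whose absolute value is below the threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in samples]
```

With a threshold of -40 dB and a 2:1 ratio, noise sitting at -50 dB gets pushed down to -60 dB, which widens the gap between the signal and the noise.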

Pitch Shifters use time compression and expansion to change a flat or sharp pitch to be in tune. With time compression, the signal runs faster and raises the pitch. Time expansion runs it slower and lowers the pitch.
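The link between playback speed and pitch can be sketched in Python. This assumes equal temperament, where each semitone is a frequency ratio of the twelfth root of 2; the notes do not specify a tuning system, so treat that as my assumption.

```python
import math

def speed_from_semitones(semitones):
    """Playback speed ratio needed to shift the pitch by some semitones."""
    return 2.0 ** (semitones / 12.0)

def semitones_from_speed(speed_ratio):
    """Pitch change in semitones caused by a playback speed ratio."""
    return 12.0 * math.log2(speed_ratio)
```

Running the signal twice as fast raises it 12 semitones (one octave), and running it at half speed lowers it by an octave. A slightly flat note only needs a speed ratio a little above 1.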

Noise processors reduce noise using DSP (Digital Signal Processing), removing clicks, cracks, humming, etc. Multieffects processors combine a number of functions into one unit.

Synchronization & Transfers

Synchronization is the ability to lock 2 or more digital signals or devices so that they operate at exactly the same rate. This is used to synch DAWs, consoles, synthesizers, outboard effects, and other digital equipment.

SMPTE is the time code (TC) developed for film based on the frame rate. HR:MIN:SEC:FRAMES is the format used. With 30 frames per second used, every 1/30 second of audio has a unique identifying number called the TC address. MIDI TC translates SMPTE to MIDI to control MIDI devices like a keyboard.
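To see how a TC address maps to a position in time, here is a minimal Python sketch that converts between a frame count and the HR:MIN:SEC:FRAMES format. It assumes a plain 30 frames per second count (non-drop-frame), and the function names are my own.

```python
def frames_to_timecode(total_frames, fps=30):
    """Convert a frame count to an HR:MIN:SEC:FRAMES address."""
    if total_frames < 0:
        raise ValueError("total_frames must not be negative")
    frames = total_frames % fps
    total_seconds = total_frames // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"

def timecode_to_frames(timecode, fps=30):
    """Convert an HR:MIN:SEC:FRAMES address back to a frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames
```

At 30 frames per second, one hour is 108,000 frames, so `frames_to_timecode(108000)` returns `01:00:00:00`.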

A word clock is a signal generated in a digital audio system to control the sampling frequency. You must synchronize the word clocks in individual devices so that the data signal is not degraded when transferring. A master clock sends a synchronization signal to the slave devices to synchronize all of the devices to the same sampling frequency.

Jitter is a variation in time from sample to sample that causes changes in the shape of the audio waveform and is caused by a degradation in word clock signals among digital devices. It results in lower detail and harsher sound. Using a master clock generator with low jitter and balanced, well-shielded digital cables will minimize the effect.

An audio driver is a low-level program that allows the transfer of audio signals to and from an audio interface. DAE (Digidesign Audio Engine) is a multichannel driver for Pro Tools, Logic and Digital Performer. Latency is the period of time it takes for data to get through the audio driver and interface to an output. Low latency is desirable.
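One common source of latency is the driver's audio buffer, so here is a small Python sketch of that piece of the math. The buffer size is my own assumption for illustration (the notes do not mention buffers), and real latency also includes converter and interface delays that this ignores.

```python
def buffer_latency_ms(buffer_frames, sample_rate_hz):
    """Time it takes to fill one buffer of audio, in milliseconds."""
    return buffer_frames / sample_rate_hz * 1000.0
```

For example, a hypothetical 441 frame buffer at 44.1 kHz takes 10 ms to fill, and halving the buffer size halves that part of the latency.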

To synchronize sound and film, a clapslate can be used. The clapstick stops the TC and shows the exact time for the audio.

Transferring, or dubbing, is copying audio from one device to another. Analog to analog transfer results in lower quality and worse SNR (signal to noise ratio). Analog to digital transfers must take into account that there is no headroom in digital like there is in analog. Digital to digital dubbing creates an exact replica.

Monday, November 10, 2008

Project Proposal

As you know, there is a paper and presentation due the second-to-last week of classes. [Monday, December 8 or Wednesday, December 10 depending on which section you are in.] In order to get you started early on this, a proposal of your topic is due next week. Please write a couple paragraphs or an outline of what you would like to research for the paper and presentation. Please turn in your proposal in class, not on your Sound Journal. To remind you, here are the guidelines for the project:

Topic can be any of the following:
- Historical uses or developments of audio
- An event, device, or person involved in the history of audio
- An alternative use of sound
- An area of audio that you are interested in pursuing

The final paper will be 2 pages in length, double-spaced with standard font and font sizes.
Presentations must be 5-10 minutes long and discuss your topic and findings.

Please let me know if you have any questions!

Helpful links:
AIA Library - search the online databases and the library resources
Google Scholar - has more validated sources than standard Google

Thursday, November 6, 2008

Digital Recording

In analog recording, the waveform of a signal being processed resembles the waveform of the actual sound itself. The analog signal is a continuous waveform, made up of an infinite number of points. In order to translate the analog signal into discrete digital levels, there are 2 main parameters to determine - the sampling rate and the bit depth (word length).

Sampling rate is the frequency at which digital samples are taken of the analog waveform. This is measured in Hertz, and is defined as the number of samples per second. An example of a common sampling rate is 44.1 kHz, which means that 44,100 samples are taken per second.

The Nyquist rate (often loosely called the Nyquist frequency) is defined as the minimum sampling rate that can be used without losing data about the signal. It is 2 times the highest frequency in the signal. For example, if the highest frequency in a song is 20 kHz, you must double this to get the minimum sampling rate of 40 kHz. Higher sampling rates (or "oversampling") have a wider/flatter frequency response, leading to more clarity and detail in the sound, with low noise and low distortion.
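Here is a minimal Python sketch of the doubling rule, plus what happens when the rule is broken. The second function shows aliasing, which is the name for the false lower frequency that gets recorded when the sampling rate is too low. Aliasing is not covered in the notes above, so treat that part as an extra illustration.

```python
def minimum_sampling_rate_hz(highest_frequency_hz):
    """The sampling rate must be at least twice the highest frequency."""
    return 2.0 * highest_frequency_hz

def alias_frequency_hz(frequency_hz, sampling_rate_hz):
    """Frequency that is actually captured at a given sampling rate.

    Frequencies at or below half the sampling rate come back unchanged.
    Higher ones fold back down into that range.
    """
    folded = frequency_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)
```

A 20 kHz signal needs at least 40 kHz sampling. If a 25 kHz tone is sampled at only 40 kHz, it shows up as a 15 kHz tone instead.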

Sampling is the time component, while quantization is the level component (like amplitude for the analog signal). Quantization is the process that converts the voltage of each sample into a discrete quantity and assigns bit values.

Bits are binary digits, using a base 2 system with a value of either 1 or 0. Decimal is the base 10 system that you are used to, using ten values, 0-9. In binary, voltage on represents a 1 while voltage off represents a 0. A digital word is a combination of 1s and 0s that creates a distinct value. For example, 10 is a 2 bit word. An n-bit word gives 2^n discrete levels.
1 bit = 2 levels = 1, 0
2 bit = 4 levels = 00, 11, 10, 01
3 bit = 8 levels = 000, 001, 011, 111, 100, 101, 010, 110

The longer the word length, the more quantizing levels are available and the more dynamic range, or resolution, is allowed. The higher resolution allows for a more accurate representation of an analog signal.
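The word length math can be tried out with a short Python sketch. The dynamic range formula is the common rule of thumb of about 6 dB per bit, which is not stated in the notes, and the quantizer assumes samples scaled between -1.0 and 1.0. Both are my own assumptions for illustration.

```python
import math

def quantization_levels(bits):
    """An n-bit word gives 2^n discrete levels."""
    return 2 ** bits

def all_words(bits):
    """List every n-bit word in counting order, such as 00, 01, 10, 11."""
    return [format(value, f"0{bits}b") for value in range(2 ** bits)]

def approximate_dynamic_range_db(bits):
    """Rule of thumb: about 6.02 dB of dynamic range per bit."""
    return 20.0 * math.log10(2 ** bits)

def quantize(sample, bits):
    """Snap a sample in [-1.0, 1.0] to the nearest of 2^n even levels."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    levels = 2 ** bits
    step = 2.0 / (levels - 1)               # spacing between levels
    index = round((sample + 1.0) / step)    # nearest level number
    index = max(0, min(levels - 1, index))  # clip out-of-range samples
    return -1.0 + index * step
```

By this rule of thumb, 16 bits gives 65,536 levels and roughly 96 dB of dynamic range. With only 1 bit, every sample gets snapped to either -1.0 or 1.0, which shows why short word lengths lose so much detail.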

Some additional information on binary is available here.

Digital audio is expressed with both items: 16 bit resolution and 44.1 kHz sampling rate.
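Putting the two numbers together tells you how much data the audio takes up. Here is a minimal Python sketch. The channel count (2, for stereo) is my own assumption, since the notes only give the resolution and sampling rate.

```python
def pcm_bit_rate(bit_depth, sample_rate_hz, channels=2):
    """Bits per second of uncompressed audio."""
    return bit_depth * sample_rate_hz * channels

def pcm_size_bytes(seconds, bit_depth, sample_rate_hz, channels=2):
    """Size of an uncompressed recording, ignoring any file header."""
    return pcm_bit_rate(bit_depth, sample_rate_hz, channels) * seconds // 8
```

For 16 bit, 44.1 kHz stereo this comes to 1,411,200 bits per second, so one minute of audio is a little over 10 MB before any file header is added.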

An example of an early digital system is the DAT recorder, or Digital Audiotape. However, the audio industry is generally going more towards a tapeless system, using flash/hard disk recorders instead.

Compression is a method to reduce the size of an audio file, by representing information with the fewest number of bits possible and removing redundancy. MP3s are an example of a codec algorithm that reduces the size from a WAV or AIFF audio file.

Interfacing with digital systems uses different protocols. The most common is AES/EBU, which is a pro audio connection interface standard that uses XLR cables. Firewire and USB are low cost, flexible, compatible standards that are used as well.

ISDN (integrated services digital network) is based on an all-digital public telephone network so that audio professionals can record from across the country without audio quality losses.

Mid-Term Exam

The mid-term exam will be next week (November 10 @ 6pm or November 12 @ 8am). Please study Chapters 1, 2, 4, and 5 from the required textbook. You are still responsible for a Sound Journal entry by Sunday at midnight this (and every) week! Start thinking about what you might want your final paper and presentation to be on, as proposals will be due in 2 weeks. Let me know if you have any questions, as always.