INTRO (revised 4/15/11): Being an engineer by education and profession, I might take testing methodology more seriously than most. I think it’s important to understand how to do it right. Even websites with a relatively large audience (and budget) are publishing invalid, wrong, biased, or meaningless test data. They apparently either don't know any better, don't care to do it right, or they're trying to mislead people on purpose.
RMAA IS NOT AS GREAT AS PEOPLE THINK: One of the biggest "tools" you see results from is RMAA. That's likely because it's free and works with standard PC sound interfaces. But, as I document in my RMAA Article, it has lots of problems and limitations. It’s often impossible to make a valid comparison between RMAA results made by one person to those made by another. And even RMAA results made by the same person may not be trustworthy. One common reason for this is RMAA itself has no concept of absolute levels. So test A might be done at a higher volume setting stressing the device, while test B was done at a much more favorable level. There are just too many uncontrolled variables that can make a bigger difference in the RMAA numbers than the device being tested.
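To see how much absolute level matters, it helps to have the conversions handy. Here’s a minimal sketch (using the standard 0 dBu = 0.775 V RMS reference; the example levels are hypothetical):

```python
import math

DBU_REF = 0.7746  # 0 dBu reference: ~0.775 V RMS (1 mW into 600 ohms)

def volts_to_dbu(v_rms: float) -> float:
    """Convert an RMS voltage to dBu (re 0.775 V)."""
    return 20 * math.log10(v_rms / DBU_REF)

def dbu_to_volts(dbu: float) -> float:
    """Convert dBu back to RMS volts."""
    return DBU_REF * 10 ** (dbu / 20)

# Two hypothetical RMAA runs: one near clipping at 2 V RMS, one at 200 mV.
# That 20 dB difference alone can change the THD+N and noise numbers far
# more than any real difference between the devices being compared.
print(round(volts_to_dbu(2.0), 1))   # ~ +8.2 dBu
print(round(volts_to_dbu(0.2), 1))   # ~ -11.8 dBu
```

Because RMAA only sees relative digital levels, nothing in its reports tells you where on this scale the test was actually run.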
14+ THINGS RMAA DOESN’T TEST: There are many important, and common, audio measurements RMAA does not test for. These include maximum output, output impedance, output power vs THD, square wave tests, slew rate, CCIF IMD, and many more. See the RMAA Article for more.
A REAL WORLD EXAMPLE: A friend built an amp and was raving about how it was so detailed he could finally hear differences in RCA interconnect cables. I was skeptical so we measured it and found out it was unstable and ringing (partly oscillating) to varying degrees depending on the input cable capacitance. So his cables did indeed sound different because they caused the amp to produce different amounts of ultrasonic garbage! We traced the problem to a poor ground scheme for the input stage and jacks.
The funny (sad?) part of the story is he seemed disappointed once the grounding was fixed and the amp stopped oscillating. He thought he was onto something great. He heard the severe instability as being "different" and hence "better". It's interesting (sad?) how misleading "designing by ear" can be with all the psychological biases that are typically present.
It's also worth noting he tested the above amp, in its original highly unstable form, with RMAA and it passed with flying colors. The ultrasonic oscillations were well above the cutoff of any soundcard's anti-aliasing filter and hence were invisible to RMAA. And apparently the audible side effects were also not picked up by RMAA. That's just one of many real-world examples why I take RMAA measurements with several big grains of salt.
BAD TESTING IS WORSE THAN NO TESTING: The great thing about the web is you can post something and share it with a very large number of people. If it's something that's obviously subjective, such as a favorite restaurant, those reading your review know their tastes might be different from yours. But if something appears factual, and the author seems to know what they're talking about, that's very different. But what if it's really wrong or misleading? Because so few people are in a position to verify the results, and the few who can likely won't take the time, the bad data is very likely to go unchallenged. So it just hangs out there on the web to potentially mislead anyone who happens to find it. This happens more than you might think!
VIRAL TEST RESULTS: There are many examples where someone posts their test results for some piece of new gear and the numbers look great, they rave about it, and soon others run out and buy the same gear and post their own glowing subjective reviews, and before you know it the product has an almost cult-like following of fans on the web. But if you trace some of these back to their source, there's sometimes only a single set of sketchy results that helped start it all. And it can be a long time before anyone tries to verify the results—if ever. All the rest of the glowing reviews are often purely subjective--i.e. people's personal opinions. And those are biased by all the other positive comments and other influences.
ANOTHER REAL EXAMPLE (added 5/14): I tested the AMB Mini3 headphone amp partly because I was impressed by the fairly complete set of impressive measurements listed on the AMB website—many from RMAA. It’s safe to say many others were impressed as well--over the last 3 or 4 years many have decided to build (or buy) a Mini3. But, it turns out, in proper testing the Mini3 didn’t come close to meeting many of its published measurements. The actual performance was much less impressive and even problematic in some areas.
PROPER TESTING ISN'T SIMPLE: Many think you can just hook up a few cables, run the RMAA calibration routine, click the "Go" button, and get good results. But it's far from that easy. And, ultimately, even used in the best way possible, RMAA still has some serious limitations and doesn't test some important things. So, at best, RMAA is only a partial window into the performance of a device. And making more accurate measurements, and all the ones RMAA can’t make, requires expensive instrumentation and lots of knowledge. Entire books have been written on the subject.
TEST CONDITIONS MAKE A HUGE DIFFERENCE: If I publish that a Ford Mustang can go from 0-60 MPH in 5.5 seconds, that sounds fairly impressive. But is that on perfectly flat ground or was it downhill? Was there a head or tail wind? How much extra weight was in the car? How accurately was it timed? Was it some guy in the passenger seat with a wristwatch or someone with professional timing equipment? Was the road wet with lots of wheelspin or dry with good traction? All of these things will significantly change a car's 0-60 time--sometimes dramatically.
MISLEADING INFORMATION: If I run a 0-60 test of a Mustang on dry pavement, and someone else does a 0-60 test on a fairly similar Chevy Camaro but does it in the rain, which one do you think will have the better result? Obviously the Mustang will get a lot more traction at the start and be the clear winner. But if you just saw the two numbers published on the web, with little explanation of how they were obtained, you might easily think the Mustang is a much faster car than the Camaro. This is exactly analogous to what happens when people test audio gear and post their results. While testing a car's acceleration in the rain is obviously a bad idea to most people, many don't even realize they’re making similar mistakes during audio testing.
THE BEST TEST RESULTS CAN BE READILY COMPARED: What good are test results if you can't make valid comparisons to other results? That's why car magazines try to test cars under conditions that are as controlled as possible. They correct for weight, wind, and even things like temperature, which affects an engine's horsepower. So when they test a Mustang in January in Detroit and a Camaro in August in California, you can safely compare the results to each other. But most of the audio results being published on the web cannot be compared in similar ways because they're often measured under different or unknown conditions.
THE BEST TEST RESULTS ARE VERIFIED BY OTHERS: In the scientific and medical communities results that can't be verified are completely dismissed as invalid. But in audio many tend to take them as fact. When you conduct testing in a controlled way, it's much easier for others to verify your results. But if they don't know under what conditions the tests were made--what signal levels, loads, settings, with what equipment, etc.--they're almost impossible to verify. And without being able to verify the results, there's no way to know if they're reasonably accurate.
THE FINEST TESTING ADVERTISING WON'T BUY: There are certainly organizations out there with the equipment and knowledge to run proper tests. But, sadly, most of the consumer oriented magazines and sites tend to leave out, or gloss over, anything very negative for fear of losing their advertisers (which very often make the very gear they're testing). A classic example is even some relatively expensive (i.e. $1000) A/V receivers come nowhere close to their advertised power ratings under real world conditions. But tests on these receivers are often done in such a way as to avoid revealing just how bad they really are. Why? Most likely because those same manufacturers advertise with the same organization publishing the review.
TECH SECTION (for test geeks, revised 4/15):
AUDIO ANALYZERS: The best solution is a dedicated instrument designed for audio testing. The two companies with the most market share are Audio Precision and Prism Sound. Audio Precision (ap.com) is widely considered the reference standard. Their entry-level product, the APx525, starts around $6,000 for the most basic analog-only 2 channel model with reduced specifications. And the pricing approaches $20,000 with the more popular options for analog and digital measurement. Their better models go up in price from there. Prism Sound (prismsound.com) offers the dScope Series III as competition to Audio Precision's analyzers and they're more reasonably priced with similar specifications. When fully configured for analog and digital measurement, the dScope is around $10,000. There are 2 analog-only versions at lower prices similar to the entry-level APx525.
OTHER ANALYZERS: There are a few other choices out there but they're typically either even more expensive and/or have relatively limited capabilities. Some examples are the Rohde & Schwarz UPV/UPL and Agilent U8903A. In my experience, these products are rarely used for consumer gear audio measurements. It makes the most sense to fully leverage the power of today's PCs for the heavy lifting (FFT, etc.) like the dScope and APx5xx products do. Stanford Research took an interesting approach with their relatively new SR1 which I’ve played with at trade shows. They essentially built a PC into a very large bench instrument along with an enhanced “sound card” and made a self-contained analyzer for under $9,000. But, to me, the dScope and APx5xx approaches make more sense. I don’t know of anyone using an SR1 and I’m not sure where it offers an advantage unless perhaps you don’t have a PC.
DISCONTINUED ANALYZERS (added 4/15): There are older discontinued products on the used market but they’re mostly relatively limited in what they can measure and/or have other serious issues. Beware of the older Audio Precision products (i.e. System One, System Two, etc.) as some require a proprietary ISA PC card interface (think IBM XT circa 1984) and are difficult, expensive, or impossible to use with modern PCs. And you can also find things like the HP 8903A and 8903B distortion analyzers, but like the early AP instruments, they’re big, heavy, clunky and make better boat anchors than audio analyzers. The AP ATS-2 was very limited in performance. And older self-contained devices like the Audio Precision ATS-1 usually can’t talk to a PC, which means you can’t even do a screen capture of a result.
PRISM dSCOPE: I use the Prism Sound dScope Series III for most of my measurements. It can run pre-defined or user-defined setups and scripts in a single mouse click. This makes running identical tests on different gear, or a whole series of tests, much easier and more consistent. It eliminates most sources of human error. The dScope overcomes nearly all the limitations of RMAA as well as providing much higher accuracy, absolute level measurement and much more. It has isolated balanced inputs and outputs and doesn’t suffer from ground loops and related problems. You can set the signal generator outputs (analog or digital) to 0.06 dB accuracy and measure/analyze input signals to 0.06 dB accuracy from literally less than 0.000001 volts up to 159 volts RMS with no external dividers or hardware required. The time base is accurate to a few parts per million and can be used for measuring the quality of a digital signal in the digital domain (i.e. true actual jitter, deviation, eye patterns, etc.). Here are the full specs for anyone interested.
AUDIO PRECISION & dSCOPE COMPARISON (added 4/15): Most of the difference between these competing products is in the analog performance and architecture. The best-in-the-world Audio Precision SYS-2722 manages a few dB less residual noise and distortion than the dScope but at a price that will buy you a brand new German luxury car. The dScope essentially matches or exceeds the performance of the less expensive APx5xx line and their older System One and System Two analyzers. Here are some of the more notable specifics:
- Analog Noise Floor – The dScope is rated at –115 dBu worst case residual noise and typically measures closer to –116 dBu (about 1.2 microvolts). This is within 1 dB of the newest Audio Precision APx5xx series analyzers, or the older AP analyzers like the System One and System Two units. To put this level of noise in perspective, the self generated Johnson noise of a single 4.7K resistor is approximately –115 dBu. Put another way, a single 10K resistor can produce more noise than the entire analyzer section of the dScope! The current Audio Precision flagship 2700 Series has residual noise across the audio band of –117.8 dBu. This is about 1.0 uV, or the noise you get from a single 2.7K resistor. When you consider both products have balanced inputs (which are inherently noisier than the unbalanced variety), an extreme input range from microvolts to around 200 volts, and analog circuitry in close proximity to lots of noisy digital hardware, this is very impressive performance from both companies.
- Signal Generator Distortion – Over the audio band, the dScope’s analog generator is rated at 0.0007% worst case distortion. The APx5xx is about the same. I have no way to measure just the generator, but even the dScope’s combined THD+N of the generator and analyzer is typically below 0.0006% at 1 kHz. The costly SYS-27xx is rated at 0.0003%, which is significantly better, but we’re talking about levels of distortion that are nearly always masked by other factors and any sane person would consider long since inaudible and well past the point of diminishing returns.
- Measurement Bandwidth – The dScope, APx5xx, and older AP analyzers have a maximum sampling rate of 192 kHz limiting the measurement bandwidth to 96 kHz (or > 90 kHz as AP likes to put it). The signal generator of the dScope is limited to 91 kHz. The much more costly AP SYS-2722 generates and measures out to 200 kHz. The good news is most anything that needs to be done beyond 90 kHz can be done using other equipment such as my 14 bit 100 MHz digital scope, 14 bit Tektronix 25 MHz waveform generator, and Agilent DMM. Put another way, there’s little justification in audio for having –120 dB noise measurements or 0.001% THD measurements past 90 kHz.
- Real Time Analysis – The dScope and flagship AP analyzers have the ability to measure a number of parameters in real time and that’s a big deal for R&D work. Audio Precision’s more reasonably priced products (like the APx515, APx525, APx585, and ATS-2) behave differently--more like fancy sound cards--and they lack this critical capability. A typical example: Most class AB output stages using bipolar transistors have a fairly narrow range of bias current for the lowest distortion. The dScope, using its Continuous Time (CT) detector, can internally measure the THD and display the result in real time. So you simply adjust the bias pot for the lowest distortion. An APx525, by comparison, has to capture an entire sample buffer, send it over the USB link, run an FFT, and finally display the result. So it’s harder to find the “null” point and optimal setting. The dScope can also let you monitor residual signals in real time. You can even listen to what the distortion (or signal) sounds like in real time if you want. And the CT detector can be used for noise, IMD, levels, channel balance, crosstalk and more. So let’s say you want to orient the power transformer for the least amount of hum in a product. You can just move it around and watch the 60 Hz/120 Hz readings in real time on the dScope. You can also apply a huge variety of filters to the live result. You have to buy the flagship AP series to get this capability but it’s standard in the dScope.
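The resistor comparison in the noise floor bullet above is easy to sanity-check. Here's a quick sketch using the standard Johnson (thermal) noise formula, assuming room temperature (300 K) and a 20 kHz audio bandwidth:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K

def johnson_noise_dbu(r_ohms: float, bandwidth_hz: float = 20e3,
                      temp_k: float = 300.0) -> float:
    """Thermal (Johnson) noise voltage of a resistor, expressed in dBu."""
    v_rms = math.sqrt(4 * k_B * temp_k * r_ohms * bandwidth_hz)
    return 20 * math.log10(v_rms / 0.7746)  # dBu re 0.775 V RMS

# A 4.7K resistor over the 20 kHz audio band sits right at the dScope's
# rated -115 dBu residual noise floor; 2.7K lands near the 2700 Series spec.
print(round(johnson_noise_dbu(4700), 1))   # ~ -115.9 dBu
print(round(johnson_noise_dbu(2700), 1))   # ~ -118.3 dBu
```

The exact numbers shift a little with the assumed temperature and bandwidth, but the point stands: a single small resistor generates about as much noise as the entire analyzer.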
AVERAGING IS A DOUBLE-EDGED SWORD (added 4/15): Some companies, especially those selling soundcard based audio software, publish specs that (often in the fine print) include averaging. The residual performance limits of most any analyzer are substantially improved by using averaging as it helps remove inherent noise in the analyzer. For example, the –115 dBu absolute noise floor of the dScope improves to –123 dBu with the right averaging. The noise is uncorrelated (relatively random) between the averaged passes and tends to cancel itself out. But, beware, averaging isn’t always applicable. If you’re trying to measure the random noise floor of the device under test you don’t want to unfairly average its noise out in the process. Averaging, when done wrong, can also degrade the accuracy of various FFT calculated measurements—especially twin-tone IMD measurements. So averaging is often no substitute for having a low noise floor to begin with—something to remember when trying to use say a PC sound card for high-end measurements. Yeah, averaging can improve the apparent loopback performance of your sound hardware, but it’s also unfairly improving whatever you’re trying to test.
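The cancellation effect is easy to demonstrate in simulation. Averaging N synchronized captures leaves the correlated test tone alone while the uncorrelated noise power drops by a factor of N (about 10*log10(N) dB). This is just a sketch with made-up levels, not how any particular analyzer implements it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_avg = 65536, 16
t = np.arange(n_samples)

# A small test tone buried in simulated analyzer noise
signal = 0.01 * np.sin(2 * np.pi * t * 1000 / 48000)

def noise_rms(n_passes):
    # Average n_passes synchronized captures; the tone is correlated
    # between passes, the noise is not and partially cancels.
    avg = np.mean([signal + rng.normal(0, 1e-3, n_samples)
                   for _ in range(n_passes)], axis=0)
    return np.sqrt(np.mean((avg - signal) ** 2))

improvement_db = 20 * np.log10(noise_rms(1) / noise_rms(n_avg))
print(round(improvement_db, 1))  # ~ 10*log10(16) = 12 dB
```

Which is exactly why averaged specs flatter the hardware: the same 12 dB "improvement" applies to the noise of whatever device you are trying to measure, too.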
HIGH-END BENCH DMM (revised 4/15): A surprising number of people try to make audio measurements with typical portable DMMs, and the readings are often grossly wrong without their realizing it. True RMS measurements are not trivial. In effect, the meter has to accurately measure the “area under the curve” and time average it—see True RMS Measurements for more information. This proves to be rather difficult across a wide range of frequencies if you want to maintain reasonable accuracy at high frequencies and not have the reading “hunt” up and down at low frequencies. The fact is, most DMMs priced under a few hundred dollars that claim “True RMS” are really only accurate around 60 Hz—i.e. power line frequencies. Some will measure sine waves accurately across the audio band, but many will not even do that. I have a $150 “True RMS” Extech meter--a relatively well regarded brand--that’s off by nearly 6 dB at 20 kHz compared to 60 Hz on a sine wave and is a joke above 1 kHz on non-sinusoidal waveforms. And really complex, rapidly changing waveforms like white/pink noise or real music drive such meters crazy. To do it right, you need expensive true RMS circuitry and the ability to optimize the sample rate and averaging for the waveform being measured. Good high-end bench DMMs, like the Agilent 344xx series, let you set these parameters. They also read directly in dB. I use a 6 1/2 digit Agilent true RMS bench DMM that's extremely accurate and flat from 10 Hz - 100 kHz for exact levels and other measurements. It has resolution down to 0.1 microvolts so it can even be used to measure noise.
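One reason cheap "True RMS" meters fall apart on non-sinusoidal waveforms (separate from their bandwidth roll-off) is that many are really average-responding: they rectify the signal, average it, and scale by the sine-wave form factor. That calibration is only correct for sine waves. A sketch of the error:

```python
import numpy as np

t = np.linspace(0, 1, 100_000, endpoint=False)
sine = np.sin(2 * np.pi * 10 * t)            # 10 full cycles
square = np.where(sine >= 0, 1.0, -1.0)      # same frequency, unit square wave

def true_rms(v):
    return np.sqrt(np.mean(v ** 2))

def avg_responding(v):
    # Average-responding "RMS": rectify, average, then scale by the sine
    # form factor pi/(2*sqrt(2)) ~ 1.111 -- only correct for sine waves.
    return np.mean(np.abs(v)) * np.pi / (2 * np.sqrt(2))

print(round(true_rms(sine), 4), round(avg_responding(sine), 4))    # both ~0.7071
print(round(true_rms(square), 4), round(avg_responding(square), 4))  # 1.0 vs ~1.1107
```

On the square wave the average-responding reading is about 11% (nearly 1 dB) high, and the error grows with crest factor, which is why music and noise signals are hopeless on such meters.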
THE RIGHT OSCILLOSCOPE (updated 4/15): Most sound cards and digital audio analyzers, best case, only have a usable bandwidth to around 90 kHz. But it’s useful to know what’s going on beyond 100 kHz—like with my friend’s amp mentioned at the start of this article. The right scope can be essential. But most digital scopes use 8 bit A/D converters which are borderline useless for a lot of audio measurements as they typically only have about 45 dB of dynamic range. And many of the cheaper ones, or PC-based USB digital scopes, have miserably slow waveform update rates which is a serious problem for audio use. They really suffer when analyzing non-repetitive waveforms. And most scopes have grounded single-ended inputs--grounded either to the power mains or to the PC via USB. This often creates noisy ground loops or other problems when making audio measurements. Here are a few examples of when typical digital scopes fail:
- Music Clipping Behavior - Lots of gear exhibits “ugly” behavior at clipping—but sometimes only into a real reactive load like speakers or headphones. You typically don’t want to drive many amps into clipping with sine waves into real loads as you would fry your tweeters, headphones, etc. Using real music, the slow refresh rate on typical digital scopes makes for on-screen waveforms that are jerky, blurry, and don’t even come close to following the music in real time. It’s more like watching a video with a far too slow frame rate and it’s hard to see what’s really going on. Worse, these scopes have a hard time even detecting brief infrequent clipping because their update rates are so slow (typically < 25 updates/second). Only a fraction of the music waveform is being sampled into the scope’s buffer and analyzed—the rest is missed completely. So you can easily miss random clipping. And even if you get lucky and capture a clipping event, when you try to zoom in to look for signs of instability, oscillation, “sticking”, “shoot through”, etc., you might not see much because of the 8 bit dynamic range. If it’s a power amp the scope is probably set for 20 volts/division to handle the 80+ V p-p waveform at clipping. A typical 8 bit scope only gives you about 0.6 volts of resolution per A/D step at that setting. So any clipping behavior less than a few volts will not be terribly visible as it would only represent a couple of A/D steps in the waveform. Good luck with that.
- FFT Behavior - Typical 8 bit digital scopes have a best case of about 48 dB of dynamic range. That’s further compromised by the resolution of their input gain ranges—i.e. the A/D often isn’t being operated over a full scale range. And, in some cheap scopes, the analog circuitry can further compromise the performance. It costs real money to design and manufacture wide input range, low distortion, flat response, low noise analog front ends with bandwidths out to 60+ MHz. So cheap scopes usually just aim for “close to 8 bit performance” and call it good enough. If you’re hoping to use one of these scopes for audio FFT work, you’ll find most things of interest completely lost in a very high noise floor.
- Amplifier Destruction – If you’re trying to evaluate the output of say a bridged amplifier, or an active ground design, with most scopes you have a big problem. Just like with a soundcard, or other PC audio interface, the grounded inputs mean connecting the scope will either damage the amplifier, or at best, make it shut down or perform very poorly. So people do crazy things like un-grounding their scope, or using a battery powered laptop, but even under these circumstances, you have to be really careful not to accidentally ground anything, or even electrocute yourself. Plus there’s often a significant level of parasitic (i.e. stray) capacitive grounding still present which creates a common mode high frequency signal and can cause problems in a variety of ways. And even on non-bridged gear, grounding the outputs through a different ground path often creates other problems and sometimes even damaging oscillation. Trying to float the scope or otherwise work around this can still create problems due to parasitic capacitance creating unwanted feedback loops.
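The resolution numbers in the clipping and FFT examples above are easy to sanity-check. A quick sketch, assuming an ideal converter and the common 8 vertical divisions (some scopes use 10, and real front ends fall short of the ideal figure):

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical full-scale SNR of an ideal N-bit converter: 6.02N + 1.76 dB."""
    return 6.02 * bits + 1.76

# Real 8 bit scopes land around 45-48 dB, below the ~50 dB ideal;
# 14 bits buys roughly 36 dB more headroom on paper.
print(round(ideal_dynamic_range_db(8), 1))   # ~49.9 dB
print(round(ideal_dynamic_range_db(14), 1))  # ~86.0 dB

def volts_per_step(volts_per_div: float, bits: int = 8,
                   divisions: int = 8) -> float:
    """Voltage resolution of one A/D step at a given volts/div setting."""
    return volts_per_div * divisions / 2 ** bits

print(round(volts_per_step(20.0), 3))  # ~0.625 V per step at 20 V/div, 8 bits
```

At 20 V/div a few volts of misbehavior at the clipping point really is only a couple of A/D steps.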
I regularly use four different scopes as each offers various advantages. There isn’t one scope that’s ideal for all audio work so I use the following:
- 100 MHz 14 bit digital scope. This improves the theoretical dynamic range from a typical scope’s 48 dB up to a much more useful 84 dB. It’s ideal for evaluating the spectrum beyond 90 kHz well into the megahertz region with a reasonably low noise floor. The downside is the update rate is relatively slow. This scope can also safely “float” for isolation but the inputs are not isolated from each other. This avoids most (but not all) of the grounded input problems.
- 200 MHz fast update deep buffer digital scope. This scope updates the entire sample buffer faster than most audio is sampled (i.e. > 44 kHz). This allows it to easily detect infrequent events (like clipping with real music). It’s also excellent for evaluating digital signals like S/PDIF and I2S. But it’s only an 8 bit A/D so it has limited dynamic range. The deep buffer allows capturing infrequent problems and zooming in while retaining full resolution.
- 60 MHz isolated digital scope. This scope has inputs that are fully isolated from each other and ground with low parasitic capacitance. It can handle several hundred volts of common mode signal without a problem. This scope is ideal for making measurements that normally would require expensive differential probes, such as the voltage drop across emitter resistors (i.e. the AC current waveform) in a power amp to look for things like shoot through. This scope also works where the ground schemes of other scopes create noise, ground loops, or even oscillation. Fully isolated inputs are very rare in scopes but they can be essential for certain measurements.
- 100 MHz analog scope. I have a nice analog scope with digital readout capability for when you really want to see something in true real-time without the limitations of any digital sampling. My 200 MHz fast digital scope offers a similar “real-time” visual waveform, but its 8 bit resolution limits zooming in on, say, clipping behavior as outlined in the example above. So there are a few things an analog scope can still do better. The downsides are not being able to capture and “freeze” non-repetitive waveforms, and less accurate measurements.
SIGNAL GENERATION: Most signal generators have much more distortion than even the $30 Sansa Clip+. A typical $200 bench function generator often has around 0.5% THD+N which isn’t very useful for measuring the THD of much of anything these days. While it’s relatively easy to generate reasonably low distortion sine waves with a PC soundcard or the dScope, you’re limited to a maximum of about 90 kHz. The PC-based solution also usually creates ground loop/isolation problems. If you need something higher in frequency things get more complicated. I use a 14 bit Tektronix AFG3000 series arbitrary waveform generator (AWG). It can produce sine waves up to 1 MHz with noise and distortion below –70 dB (about 0.03% THD+N) and up to 25 MHz with THD+N below –60 dB. It’s also flat +/- 0.15 dB from 0.001 Hz to 5 MHz and has an 18 ns rise time for impulse, square wave, and slew rate testing. The 14 bit Tek AWG, paired with the 14 bit 100 MHz digital scope, allows audio testing well beyond the bandwidth restrictions of even the flagship Audio Precision SYS-2722. This is useful for exploring slew rates, open loop characteristics, loop stability, Class-D artifacts, switching power supply artifacts, etc.
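Two of the relationships behind these specs are easy to work out. The slew rate a sine wave demands is 2*pi*f*Vpeak, and for a first-order system bandwidth relates to rise time by roughly BW = 0.35/t_rise. A sketch with an illustrative power amp example (the 40 V figure is just an assumption for the sake of the math):

```python
import math

def required_slew_rate_v_per_us(freq_hz: float, v_peak: float) -> float:
    """Minimum slew rate to reproduce a sine of v_peak volts at freq_hz."""
    return 2 * math.pi * freq_hz * v_peak / 1e6

# A hypothetical power amp swinging 40 V peak (80 V p-p) at 20 kHz:
print(round(required_slew_rate_v_per_us(20e3, 40), 1))  # ~5.0 V/us

def bandwidth_from_rise_time_mhz(t_rise_ns: float) -> float:
    """Classic first-order relation: BW ~ 0.35 / t_rise."""
    return 0.35 / (t_rise_ns * 1e-9) / 1e6

# The AWG's 18 ns rise time implies roughly 19 MHz of usable bandwidth:
print(round(bandwidth_from_rise_time_mhz(18), 1))  # ~19.4 MHz
```

That's why slew rate testing needs a generator and scope with bandwidth far beyond the audio band: the edges you're trying to characterize are megahertz phenomena.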
RMAA IMPROVED: For RightMark Audio Analyzer testing I use a Benchmark ADC1 for A/D and a Benchmark DAC1 Pre for D/A. Both work with built-in Windows drivers and hence play nice with RMAA. And both have some of the best specifications available at any price from a USB audio interface. The ADC1 features precise gain adjustments and metering, and the DAC1 has analog output level adjustment with the 24 bit D/A always operating at full resolution. The Benchmark “twins” overcome many of RMAA’s limitations. I use the Agilent DMM (or dScope) to set exact and repeatable levels and also use the correct loads. See the RMAA article for more.
PERFORMANCE EXAMPLES (updated 4/15): Here’s the dScope’s averaged absolute noise floor in dBu:
Here’s the dScope’s residual analog performance (loopback) with a “zoomed” vertical axis operating at the 400 mV reference level I use for many tests. The third harmonic is the worst at –110dB below 400 mV:
And, while it’s a bit silly, here’s the dScope loopback performance in the digital domain (click to get rid of the blur):
THE ENTIRE CHAIN'S PERFORMANCE: Here's the full deal from the photo at the start of this article. This is the analog output (signal generator) of the dScope, feeding the Benchmark ADC1 to digitize the signal. The ADC1 feeds the Benchmark DAC1 Pre via S/PDIF at 24/96, and the analog line output of the DAC1 is connected to the analyzer input of the dScope. The number on the left is pure THD (just the harmonics, not the noise floor) and the one on the right is the THD plus the noise in the entire signal chain. Note the bandwidth is out to 96 kHz on the analysis side. This is at 1 kHz and 0 dBFS digital while the analog levels are 2 V RMS (click for the full size image):
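The distinction between those two numbers is worth making concrete. This toy example (a synthetic signal with made-up distortion and noise levels, not the dScope's actual algorithm) computes THD from just the harmonic bins of an FFT, and THD+N from everything except the fundamental:

```python
import numpy as np

fs, n = 96000, 96000  # 1 second at 96 kHz, so FFT bins land exactly on 1 Hz
t = np.arange(n) / fs
rng = np.random.default_rng(1)

# Hypothetical device output: 1 kHz fundamental, a 3rd harmonic 80 dB down,
# plus broadband noise.
v = (np.sin(2 * np.pi * 1000 * t)
     + 1e-4 * np.sin(2 * np.pi * 3000 * t)
     + rng.normal(0, 1e-4, n))

spec = np.abs(np.fft.rfft(v)) / (n / 2)   # amplitude spectrum
power = spec ** 2

fund = power[1000]                                   # fundamental bin
harmonics = sum(power[k * 1000] for k in range(2, 10))  # harmonic bins only
everything_else = power.sum() - fund                 # harmonics + noise floor

thd = 100 * np.sqrt(harmonics / fund)          # THD: harmonics only
thd_n = 100 * np.sqrt(everything_else / fund)  # THD+N: harmonics plus noise
print(f"THD ~ {thd:.4f}%  THD+N ~ {thd_n:.4f}%")
```

With a clean, high-level signal the two numbers are close; as the noise floor rises relative to the harmonics, THD+N grows while THD alone stays put, which is why quoting only one of them can be misleading.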
BOTTOM LINE: Equipment of this level might be overkill for testing something like a $40 portable player. But it's nearly essential if you're designing or testing high-end audio gear. For example, just a small error in designing the PCB for a DAC or amplifier can seriously degrade the performance due to noise problems, grounding issues, etc. You can't just slap a D/A or audio chip on a PC board and expect to get anywhere near the manufacturer's specs without some very careful design work. But the only way to verify you have it close to right is to have the correct instruments to test with, or to get assistance from someone who does. So, for design work, and testing higher-end gear, this level of equipment is invaluable.