Someone asks, "Can you hear the difference?" and the room instantly divides. That's because this single question is actually doing the work of three.
Is there a change in the signal that instruments can actually measure at the load? (If physics doesn't allow it, you can't hear it.)
Can you detect that change in short, level-matched, controlled trials such as ABX?
Does that change matter over time — the kind of change you'll notice after an hour, a week, or a month, when you stop hunting for it and start listening?
This framework gives those three questions names: Threshold 1 (measurable), Threshold 2 (audible), and Threshold 3 (livable). These thresholds stack, but they do not always align, which is why one number, one sweep, or one test can end up being mistaken for the whole story.
Why These Arguments Go Straight to Nowhere
At some point, scanning yet another forum thread titled "Is It Better or Placebo?", you realize you'd rather discuss politics at the Thanksgiving dinner table than take the bait and hit 'Reply'. It's usually the same loop, the same cast, and the same predictable moves: someone posts a measurement, someone else counters with a listening experience, someone demands an ABX test, and then a goofy meme makes its appearance. These threads eventually devolve into a leg-lifting contest, with everyone staking out territory and daring the others to cross it. Wash, rinse, repeat.
But the interesting part is: nobody is actually wrong...
The measurement advocates are right that physics matters. The ABX camp is right that controlled testing can help isolate bias. And the listening crowd is right that some gear just feels more engaging than others.
So why does every conversation about subjectivity vs objectivity feel like we’re talking past one another?
The Three Thresholds
I've been chewing on these questions for decades, hunting for the 'why.' My own experience — as well as direct feedback from thousands of fellow audiophiles — kept highlighting a gap that measurements and quick tests didn’t appear to account for.
For years, I couldn't pin it down, so I kept gathering information on psychoacoustics, audio-adjacent sensory fields (especially vision), and cognitive science. The more I learned, the more it made sense to map what I found into three mental ‘buckets’.
It starts when someone asks "Is there a difference?"
They could mean: “Does it show up on an analyzer?” or “Can you pass a blind test?” or “Do you like it enough to want to live with it?”
Here is how I define the Three Thresholds:
Threshold 1: The Measurable (Domain of the Analyzer)
Can instrumentation see the change? Distortion products, noise spectra, impedance behavior, time-domain ringing, load interactions — things that exist whether anyone is listening or not. If physics doesn't allow it, you can't hear it. If a Threshold 1 claim is made, support it with the measurement(s).
Threshold 2: The Audible (Domain of Vigilance)
Can listeners reliably tell A from B under controlled conditions? Level-matching, rapid switching, ABX — short trials built to answer a specific question. It's powerful, but it also requires vigilance: you're on the hunt for a 'tell'. If a Threshold 2 claim is made, support it with the controlled detection method.
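For readers who like to see the arithmetic behind Threshold 2, here's a minimal sketch (my own illustration, not part of any standard) of how an ABX run is commonly scored: a one-sided binomial check of how likely the result would be from pure guessing. The function name, trial counts, and the 5% bar are illustrative assumptions, not a prescription.

```python
from math import comb

def abx_guess_probability(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` answers right out of `trials`
    ABX trials by guessing alone (each trial is a 50/50 coin flip)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Illustrative numbers only: 12 correct out of 16 trials.
p = abx_guess_probability(12, 16)
print(f"{p:.3f}")  # about 0.038; hard to chalk up to guessing at the usual 5% bar
```

The point isn't the exact cutoff; it's that Threshold 2 claims are the kind you can put a number on, which is part of what separates them from Threshold 3 claims.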
Threshold 3: The Livable (Domain of the Chair)
Can you live with it longer-term? This is where fatigue, ease, attention drift, and engagement show up. You notice it in how long you stay in the listening chair, whether listening becomes work, or whether it is truly 'go time'. If you're sharing a Threshold 3 experience — how a component settles into your system (and your life) after weeks in your own chair — support it with time.
These thresholds build on each other, but they don't always agree. A change can be measurable but not heard. Heard but not enjoyable. Hard to nail down in a quick test, but obvious over the longer term. Maybe you can't confidently pick a winner right after a swap, but one component keeps drawing you back into your listening chair. The goal here isn't to 'dunk' on measurements or blind tests — it's to stop forcing one tool to answer all three questions.
Not hearing a difference in a careful short test is useful information, but it’s not the same as answering what happens over time in your own chair. You don't have to turn in your 'Scientist' card to also trust your ears. There is common ground if we have a framework that includes both.
Once we name which question is on the table, we can separate the real 'signal' from the static. It helps us sort out what's actually being tested, so we can be on the same page.
"The specs are identical!" — Threshold 1 claim (Measurable)
"I couldn't hear it in a blind test." — Threshold 2 claim (Audible)
"I eventually got bored of it." — Threshold 3 claim (Livable)
These examples sound nice and tidy, but it’s not always so clear-cut. In practice, there are two ‘gotchas’ that might be lurking…
The Listener’s Trap
It was the late '80s, and I'd been a Zildjian cymbal fanboy my whole life. Then I heard some sweet cymbal sounds on a recording and learned that the drummer used Paiste's RUDE series. I was totally smitten with the sound. That same summer, while browsing Daryl Stuermer's (of Genesis) music store in West Allis, WI, I spotted a set of used RUDEs: hi-hats, a ride, and two crashes. They just needed a home, and I'm charitable that way… so I snagged them.
Took them home, set them up, and… ugh. Dark. One-dimensional. Nothing like what I'd heard on that album. I stuck with them for a few days, figuring maybe I'd warm up to them — "new cymbals, different feel, just give it time..." I was just about ready to cut my losses and trade them in when an odd thing happened: a guitar-player buddy came over for an impromptu session in our attic rehearsal space, and I flipped on the Tascam multitrack and Beyerdynamic mics just for kicks.
Days later, when I hit playback... I was listening to a completely different set of cymbals. Everything I'd wanted was right there on the tape. Rich, complex, alive, and that ride cymbal cut right through beautifully. The crashes sounded full, not like the dry 'trash can lids' I'd heard from behind the kit.
I eventually sold those cymbals with that drumset, and I still regret it. I think every drummer (and guitar player) has a 'one that got away' story. Those RUDEs were mine.
My best friend Brian Kelly — a guitar player who was doing session work for big names in 1960s Cincinnati at the tender age of sixteen — told me a similar story. While we were chatting about 'tone monster' electric guitars and how they often sound great unplugged, he flagged one notable exception: the Gibson L-5 archtop can sound awful unplugged, but plug it in… and it's a different beast.
Same instruments, different verdicts when the perspective changes. This recurring pattern shows up elsewhere too…
Highway Gothic: When Better Is Worse
Take a look at the green signs on the interstate tonight. For decades, the US standard was a font called Highway Gothic. It's a typographic detail monster: hard edges, high contrast, everything shouting LOOK AT ME. In quick identification tests, it wins.
But put that same font on a freeway at night, hit it with headlights, and the trap snaps shut. That 'maximum data' blooms into a glowing white blob. The light bleeds, closing up the holes in an 'e' or an 'a.' Typographers call it halation. In audio terms, it's glare — the difference between 'faux transparency' and "I can't listen more than an hour or two."
This is where a font called Clearview came into the picture. It had softer edges and slightly lower contrast. In a side-by-side 'showroom' test, it looked worse... less punchy, less immediately sharp. On the highway at night, legibility improved — largely because it tamed the halation and overglow — which is exactly the kind of 'looks softer, works better' tradeoff the showroom test can miss.
Audio has the same trap – some call it ‘Showroom Treble’. The demo that grabs you — the system with the sharpest attack, the most forward mids, the hyped presence — may win the quick test. You're not wrong for hearing it. Your brain is wired to notice transients, edges, contrast. But that same wiring doesn't predict what happens three hours later, when the sharpness becomes fatigue, and you find yourself losing interest and shutting your rig down for the evening.
The opposite trap exists too — the warm, syrupy presentation that flatters a vocal in a quick demo, but turns dense mixes into mush. Different bait, same hook.
The showroom score and the living-room score are not the same test.
The Clip Is the Sip
Audio demonstrations often use brief clips to showcase differences. Soft-drink marketers learned the limits of that approach decades ago: in a quick 'sip test', the sweeter formula tends to win, yet a single sip tells you little about which drink you'd actually want to finish. The clip and the sip are structurally identical problems.
A thirty-second demo of a component swap is the 'sip test' of audio. It captures the moment of maximum attention, the first impression, the edge-detection response. What it can't capture is what happens when you live with that component for two months. Does the system invite long listening sessions? Do you lose track of time? Or do you find yourself turning it off after an hour, slightly fatigued or bored?
The clip optimizes for Threshold 2. The living room optimizes for Threshold 3. They aren't the same test.
When Play Becomes Work
A component or system tweak that imparts a subtle edge or glare to the high frequencies might be undetectable in a five-second ABX switch. Your attention is consumed by the comparison task itself. But over a two-hour listening session, the effect of that edge accumulates.
Your auditory system spends extra effort compensating for the subtle wrongness. The brain is a prediction engine, constantly anticipating what comes next. When artifacts don't match natural sound, those predictions fail, and each failure has a price. It's like running too many programs at once on an old computer: one is easy, but a dozen churning in the background bog it down. The costs accumulate, and that burden takes its toll.
The bench is great at steady-state, but your grey matter lives in dynamics, pattern-matching, and recognition. When its predictions fail repeatedly, the brain flags the sound as 'off' – similar to the odd feeling you get from a wax figure that looks remarkably lifelike yet is obviously not alive. That wax-museum effect shows that technical accuracy can still leave an experiential hole that sets off alarm bells.
It’s like a chair — you can sit in a showroom office chair for a minute or two and declare it comfortable. Get it home, and after a couple of days, it's become an instrument of torture. The short test doesn't reveal the long cost.
‘Hunting’ identifies differences. ‘Soaking’ reveals whether you can live with them.
The Groove Test
You can put a metronome on a drum track and prove the timing is perfect, but if it doesn't make your head nod or toes tap... something's missing. I'll take the groove laid down in the pocket of a drummer like Bernard Purdie over a metronome every time. Audio works the same way. Perfection on the bench isn't always the groove in the room.
If you sat down for a 'quick' listening session after dinner... then realized it was 3am... or if you 'rediscovered' why you love music in the first place... you've hit paydirt.
You can count a lot in audio — and you should — but the thing you're actually buying is the part that makes you forget you're evaluating.
Threshold 1 is what you can measure. Threshold 3 is what draws you back for more music tomorrow. Threshold 2 sits between them — and it's where the tool gets confused for the destination. That's the Measurement Gap: the space between what the instruments and protocols can tell you and the way you actually listen to music.
The Preference Trap
Ask any guitar player about their favorite neck profile and the opinions will be all over the map. A vintage Telecaster 'baseball bat' U, modern thin, or something in between like a '60s C? Tall jumbo frets or thin vintage style? Everyone has a favorite, and everyone is convinced they're right. But the 'right' neck depends on how you're built, what feels comfortable, and how you play. A jazz player's dream neck might cramp a shredder's hand by the end of the first set. If you think the objective-vs-subjective debates are wild in our hobby, walk into any guitar-builder forum and start a topic about tonewoods on electric guitars. You'll see what 'audiophile-level' disagreement looks like — sprinkled with sawdust.
These same dynamics play out in audio, and they run deeper than "I like it" vs. "I don't."
Different Menus, Different Ears
I like to think of each audio manufacturer as operating similarly to a restaurateur. They build products that reflect what they think sounds good — their 'recipes.' Some listeners will love that product. Others will think it's just okay. Still others will have a visceral "this isn't for me" reaction. None of them are wrong.
To be clear: Broken or inferior gear is still broken and inferior. This is where Threshold 1 is a valuable benchmarking and screening tool.
Once you're in the world of competently engineered designs, you're often choosing between different priorities, not 'right vs wrong.' Manufacturers are well aware of this, because sometimes choosing more ‘X’ means less ‘Y’ in a design. These distinctions matter because people hear differently, value different aspects of music, and listen in different environments. It’s as personal as our palates. Our systems end up custom-fit — tuned for how/why we actually listen.
This doesn’t mean “I like this” is the same as “I’ve explained this”. One involves taste, the other is the ‘recipe’. Both matter, but each is an answer to a different question.
Now for a practical problem using my restaurant metaphor: how do you evaluate a restaurant from one bite, and how does a manufacturer communicate what their menu is really about in a 30-second sample or a quick blind test?
A friend of mine used to compete on the barbecue circuit — a competitive pitmaster who ranked in the top 1% nationally by the KCBS. His competition ribs were legendary: mahogany-dark, glazed to perfection, intensely flavored. But if you went to his house for a backyard cookout, he wouldn't dare serve that.
He told me flat out: "This isn't what I'd cook for my family or friends for a meal."
In competition, a judge takes exactly one bite. That single bite has to cut through the palate fatigue of tasting dozens of entries. So the recipe gets tuned for maximum immediate impact: extra salt, extra sugar, a texture dialed for a specific snap. The barbecue world calls it 'one-bite barbecue.' It's engineered for a six-second window.
Put those same ribs on a picnic table and ask someone to eat a whole rack. Different story. What dazzles in bite one... wears out its welcome by the third rib.
Competition food is a performance. Backyard food is a meal. Neither is wrong — they're optimized for different contexts.
The system that leaps out of the speakers in a showroom can win the quick audition. But the system that wins the quick audition isn't necessarily the system that wins an entire evening… or the month.
When the Metric Becomes the Target
Economists have a name for what happens when a single measure becomes the target: Goodhart's Law. "When a measure becomes a target, it ceases to be a good measure."
Barbecue judges look for the smoke ring — that pink band under the bark that reads as 'authentic low-and-slow.' The chemistry is real. But once judges started rewarding it, it became possible to chase the badge instead of the process. The signal of quality became a cosmetic target.
Audio has its own history of 'smoke rings'.
For a long stretch, THD was the number you could print in bold. The lower, the better; the scoreboard was public. So designers naturally chased it — cleaner sine waves at 1 kHz into a friendly load. In many cases, that work produced genuinely better gear.
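For anyone who wants to see what that bold number actually summarizes, here's a minimal sketch of the arithmetic behind a THD figure; the function name and the voltages are my own illustrative assumptions, and a real measurement comes from an analyzer's FFT of the harmonics, not hand-picked values.

```python
import math

def thd_percent(fundamental_rms: float, harmonic_rms: list[float]) -> float:
    """THD as a percentage: the RMS sum of the harmonic products
    relative to the fundamental."""
    harmonics_total = math.sqrt(sum(h ** 2 for h in harmonic_rms))
    return 100 * harmonics_total / fundamental_rms

# Illustrative numbers only: a 2 Vrms fundamental with three small harmonics.
print(round(thd_percent(2.0, [0.002, 0.001, 0.0005]), 3))  # ~0.115 %
```

It's a tidy single number, which is exactly why it made such an attractive scoreboard.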
But the '70s and '80s also taught us something important: you can ace the test and still miss the experience. Otala's work on transient intermodulation distortion was one of the moments that made people realize: you don't know… what you don't know...
Once a single metric becomes the scoreboard, the designs drift towards that number — and the goalpost shifts.
Today the 'smoke ring' shows up in new clothes: SINAD (signal-to-noise-and-distortion) leaderboards and spec-sheet races. Useful? Absolutely. It can keep you from buying junk. But it still doesn't answer the question most of us care about:
"Will I want to fire-up my rig on a random Tuesday night?"
Same Seat, Different View
I've sat in the exact same chair that well-known audio reviewers occupied at shows, listened to the exact same track they'd just auditioned, and come away with an entirely different impression. Same system. Same room. Same recording.
Two listeners in the same seat, hearing the same track, reaching different conclusions. Is this really a surprise?
Our audio systems are akin to prescription glasses — each one ground to fit its owner's perception. The goalposts aren't in the same place for everyone. What counts as the reference: the master tape? What the engineer heard in the control room on their own monitors? The musician's perspective as they played the instrument... or an audience member's in the 7th row, or the 15th?
Right Tool for the Right Job
Early on in my audiophile journey, I often wondered if I was listening ‘the right way’. "What should I be listening for, specifically?" — "How is the bass on this track?" — "Let me queue up a female vocal and see how that is." — "Wait, maybe I should try another recording that doesn't have that ‘sss’ emphasized so much."
I think most of us have been through this dance.
Are we even capable of hunting for ‘is it bright’ and ‘how’s the bass punch’, while simultaneously analyzing the soundstage? Can we do all of that and also let the music just take us on a ride?
Tasking vs Listening
ABX is a vigilance task. You're not 'just listening' — you're comparing. You're holding two sonic snapshots in your head, flipping back and forth, trying to catch a tell. That's a real skill, and it's exactly what Threshold 2 is built for.
But it's not the same headspace as an enjoyable, long listening session. When a system is really tuned, you stop grading it. The gear disappears. You relax. You're not hunting — you're immersed inside the music.
And here's the part that gets missed: those two modes don't play nicely at the same time. Turn the comparison dial up to ‘11’ and the dial for ‘immersion’ goes the other way.
Are we testing detection, or testing livability? If we run the wrong test, we can get a clean answer to the wrong question. Measurements and quick A/B tests help answer the questions they were built to ask, but there’s more to the story when we look closer and see the gaps. We can find examples of these types of gaps in daily life…
A Mattress of Fact
We’ve all been through the ‘mattress’ decision. In the showroom, this one is ‘too firm’, that one is ‘too soft’. It gives us a relative ‘Goldilocks’ read on what might work. You make the selection, get it home, and… “Oops – it felt right in the store, but my back is sore after a week of sleeping on it.” The great deal you got, the salesman’s pitch, and your spouse’s insistence that “this one will be better…” all fade behind the reality of actually living with it.
Quick A/B tests are similar. They aren’t flawed – they’re scoped. They answer: "Can I reliably hear a specific difference right now, under these conditions?" They weed out the obviously too firm or too plush, narrowing you down into a zone that might work.
Rapid swaps are biased toward 'switch-winner' traits — contrast, edges, spotlight detail — but what grabs you in the comparison doesn't always predict what you'll want to live with. If you've ever had a component that wowed you in the demo but wore you down over time, you didn't fail to hear it — you entered the wrong contest...
Just as with the mattress from the showroom, time acts as a truth serum, and you're left with a more honest read on what’s livable: "Does this change make me want to listen longer... and will I be drawn back for more tomorrow?" That's not a 30-second evaluation, and it doesn't live in the same mental state as 'spot the difference now.' It’s a bit like your spouse measuring the quality of your sleep by waking you up every ten minutes. It’s the same structural failure: the act of checking prevents the thing being checked.
The Tuesday Night Test
When I began my audiophile journey, I built a system around components that seemingly checked every box — Class A rated, reviewer-approved, specs that looked great on paper. I was proud of it. And then I noticed I was listening to my 'big rig' less and less. I'd sit down, play a couple of tracks, admire the 'resolution,' and then calmly proceed to do something else... I blamed the recordings. I blamed my mood. I thought maybe I was just over the hobby. But the truth was simpler: I'd optimized for someone else's destination. Those reviewers weren't wrong — they heard what they heard. They were listening in a different context, for a different purpose, with a different question in mind. I imported their answers without asking whether we shared the same question. That wasn't a failure of my ears. It was a failure of my map.
Here's how I think about it now: Measurements are like a topographic map — contour lines, elevation data, drainage patterns. I routinely use topo maps for landscape photography. They're incredibly useful for planning, for eliminating dead ends, for knowing what you're getting into. But the map can't tell you whether the light will be worth the long hike at 4 a.m., or whether the area will blow you away when you are actually there.
A/B tests are like travel photos — more vivid than the map, a little closer to the experience, but still just a representation. We've all seen stunning photos of places that felt ordinary in person. We've also visited places no one bothered to photograph that we still yearn to go back to years later.
“I can measure it” (T1) and “I can detect it” (T2) are tools for the journey — useful, often essential, but always in service of moving you toward the real destination: “I can live with it” (T3) — the random Tuesday night that pulls you into the chair… and keeps you there.
Those are three very different questions — two of them help you navigate. The last one is where you actually want to ‘live’.


