● Print the Quiet · 6 / 9
One Take, or Twenty
Why phoneme comping breaks something the ear can't name, and how Andy Wallace and Jeff Buckley found the honest middle at Bearsville.
There is a thing a singer does on take seven that they could not have done on take one. They have learned the song's geography. They know which line lifts and which line drops. They have started shaping the second verse as a response to the first verse, the chorus as the answer the verses were asking for. By take eleven they are getting good. By take eighteen they are committing — gambling, leaning into the risk that this take is the one, that they can let go and trust the shape they have built up underneath them.
The take after the take they trust is usually worse. They tighten. They overthink. They start performing the version of the song they think they should sing instead of singing the song. A great engineer hears this coming and stops them. A great engineer also knows that the next take, after the bad one, is often the best — because the singer has scared themselves into letting go.
This is what a take is, and it is why "did you keep all the takes?" is the most important question in any vocal session. The thing being made on tape is not a sequence of independent attempts at the same thing. It is a continuous emotional negotiation in time. Each take is a response to the take that came before it. The singer's body is learning.
This is the sixth piece in the Print the Quiet series. It is about what happens to that negotiation when modern recording lets us comp at the syllable level — and about the honest middle ground that Andy Wallace and Jeff Buckley found at Bearsville thirty years ago.
The Glyn Johns school
Glyn Johns engineered, produced, or mixed records for the Stones, the Who, the Faces, the Eagles, Led Zeppelin (the debut album), Bob Dylan, Eric Clapton, and Joan Armatrading, among others. He has a working philosophy that has not changed for fifty years: capture the live performance, do not punch in, do not fix later. Either the take is good or it is not. If it is not, you do another take. You do not stitch.
Johns has said, in interviews and in his memoir Sound Man, that he refuses on principle to use the punch-in. He has been mocked for this — it is, technically, antique — and he has been hugely influential because of it. The reason: punch-ins introduce a microsecond discontinuity that the ear may not consciously hear but the body clocks. The phrase that started before the punch and finishes after the punch is no longer one phrase. It is two phrases stitched together. Johns argues that listeners can feel it, and that the loss compounds across a record.
The same logic explains his famously minimal drum miking — three microphones on the Stones' "Honky Tonk Women," four mics on the Who's "Won't Get Fooled Again" — which now bears his name as the Glyn Johns method. He wanted to capture the kit as one event in one room. Adding mics and gates and triggers, in his view, would have replaced the kit with a representation of the kit. The takes had to be the kit, in the room, at the time.
This school has its inheritors. T-Bone Burnett records bands live to tape with vintage gear and lets the takes stand. Daniel Lanois is a comper, but he comps from full takes, not phonemes. Rick Rubin, perhaps the modern Johns, has said in many interviews that the first usable take is usually the best one, and that he stops singers from singing too many times — he wants them to commit before they get bored. Steve Albini, until his death, refused even to call himself a producer because the word implied imposition; his job was to capture what the band did, in the room they did it in.
This is one pole.
The phoneme-comp era
The other pole is what mainstream pop has done since approximately 2002, which is when Pro Tools 5 made phoneme-level editing trivial and engineers learned to do it.
The workflow is roughly this. The singer records ten to thirty full passes of the song over the course of a day or three. The engineer or vocal producer then opens all of them on adjacent Pro Tools tracks, lined up to a tempo grid, and assembles the released vocal by choosing the best version of each syllable, sometimes the best version of each consonant. The opening "ch" in church might come from take seven. The vowel from take twelve. The closing "ch" from take three. Each fragment is then re-tuned with Melodyne for pitch and timing, breath samples are removed or replaced with cleaner ones from elsewhere in the session, sibilance is de-essed, and the result is a vocal that is, by every measurable standard, perfect.
Listen to almost any major-label pop release from the last fifteen years. The vocal is on the grid. Every consonant is clean. The vibrato is even. The breath sounds are present but they are selected breath sounds, not the breath the singer happened to take after the line. The pitch is correct.
This is not a complaint about effort. Doing this well is a real craft. There are great phoneme-comp engineers, and the results, technically, are stunning.
The complaint is about the causal chain.
The causal chain of phrasing
Here is what we mean.
When a singer sings line four of a verse, the way they sing it is causally linked to how line three just felt. If line three caught them by surprise — too much breath spent, too much emotion in the wrong place, a bend that wobbled — they compensate on line four. They tighten the grip, or they let go further, or they re-attack the next vowel with extra commitment. The body is responding to itself in real time, the way an actor's third sentence in a monologue is informed by their first two.
This is causation. Line four is because of line three.
Now consider phoneme comping. Line three is taken from one performance; line four is taken from another. The line-four singer was not responding to the line-three singer. They were responding to a completely different version of line three, in a different take, where they were in a different emotional state, where the vibrato had landed differently, where the room was, microscopically, a different room.
Listeners cannot articulate the discontinuity. But — and this is the point that matters — listeners' nervous systems are exquisitely tuned to causation in vocal sound, because evolution built them to read other humans' speech for causal coherence. When someone you know is upset, you can hear it across one sentence. The same circuitry is running when you listen to a record.
Phoneme comping breaks the causal chain. Not always audibly, but always structurally. The result is technical perfection that listeners describe, vaguely, as feeling "off" or "cold" or "too much" — without being able to say why. The why is that the causal information has been removed, and their nervous system is reading the absence.
Comping at the full-take level — Andy Wallace's method on Hallelujah, Glyn Johns's method on his rare modern jobs, Daniel Lanois's standard practice — preserves the causal chain inside each take. The comp lives at the seams between full performances, not inside individual phrases. The body that sang line four was responding to the body that sang line three on the same pass. The continuity holds.
This is why a full-take comp can feel like a single performance and a phoneme comp, however perfect, cannot.
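For readers who think in data rather than tape, the distinction can be sketched as a small model. This is only an illustration under stated assumptions: the Fragment type, the count_seams function, and the take numbers are hypothetical, not taken from any real session or any DAW's file format. A comp is treated as an ordered list of fragments, each tagged with the lyric line it belongs to and the take it was lifted from. A seam is any point where the take changes between neighbours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    line: int  # lyric line this fragment belongs to (hypothetical numbering)
    take: int  # take the fragment was lifted from


def count_seams(comp):
    """Return (inside_line, at_boundary): how many take changes fall
    inside a lyric line versus at the boundary between two lines."""
    inside = boundary = 0
    for prev, cur in zip(comp, comp[1:]):
        if prev.take == cur.take:
            continue  # same pass, so the causal chain is unbroken here
        if prev.line == cur.line:
            inside += 1  # the take changes mid-phrase
        else:
            boundary += 1  # the take changes where the phrase ends
    return inside, boundary


# Hypothetical phoneme comp: lines 3 and 4 are each built from three takes.
phoneme_comp = [
    Fragment(3, 7), Fragment(3, 12), Fragment(3, 3),
    Fragment(4, 12), Fragment(4, 7), Fragment(4, 3),
]

# Hypothetical full-take comp: each line comes whole from a single take.
full_take_comp = [
    Fragment(3, 7), Fragment(3, 7), Fragment(3, 7),
    Fragment(4, 12), Fragment(4, 12), Fragment(4, 12),
]

print(count_seams(phoneme_comp))    # (4, 1)
print(count_seams(full_take_comp))  # (0, 1)
```

Both comps change takes between line three and line four. Only the phoneme comp also changes takes inside the lines, and that first number is the one the argument above is about.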
Sinatra, briefly
Frank Sinatra is the most quoted single-take legend in popular music. The story is that he walked into Capitol Studios in the 1950s, recorded each song in one or two takes with the band live, and walked out. The story is mostly true. Nelson Riddle, Billy May, Gordon Jenkins, the arrangers — they wrote the charts knowing Sinatra would not be doing them again. The band knew it. Sinatra knew it. There was, in some sessions, a third take held in reserve, but the released master was almost always the first or second pass.
The technical reason this worked: Sinatra had spent twenty years on the road before he started cutting masters at Capitol. He arrived at the session with the song already inside him. He was not learning the geography. He had already learned the geography on three hundred nightclub stages. By the time tape rolled, he was reporting on a relationship he had already established with the lyric.
This is not how modern pop is made. Modern pop is made by writing the song on the day, recording it on the day, and finding out what the song is during the recording. That process generates many takes because the singer is in the act of discovering the song on tape. There is nothing wrong with that workflow, but it is a different workflow, and it requires more comping. The honest version of the argument is not that comping is bad — it is that the modern workflow has necessitated heavy comping by changing the order of operations.
What's been lost in the change is the apprenticeship. Sinatra's first take worked because he had already done two thousand takes, just not on tape. The modern singer's twentieth take is doing the apprenticeship and the master at the same time.
The honest middle: Buckley at Bearsville
Wallace cut about twenty full takes of Hallelujah over the Grace sessions. The released master is a comp of somewhere between three and five of those takes. Wallace and Steve Berkowitz disagree, in print, about how much stitching there was — Wallace remembers a light comp, Berkowitz remembers a more substantial one. Alan Light's book The Holy or the Broken captures the disagreement and does not resolve it.
What is resolved is the structural fact: the comp was at the full-take level, not the syllable level. Wallace was choosing between complete emotional arcs. The seams between takes are at line boundaries or verse boundaries, not inside words. The causal chain inside each segment is intact. When you hear Buckley sing "the holy or the broken Hallelujah," the whole phrase is one performance — Buckley's body responding to itself across the breath, the bend, the resolution.
That is the honest middle ground. Not romantically one take. Not the modern phoneme comp. Twenty performances captured, three-to-five chosen, glued at structural boundaries, no pitch correction (Auto-Tune didn't yet exist), no time correction inside the takes. The result is a recording that listeners cannot tell from one performance because, in the way the body reads it, it is one performance.
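In digital terms, "glued at structural boundaries" can be sketched in a few lines. This is a minimal illustration, not a description of how the Grace master was actually assembled: the splice function, the sample values, the plan format, and the short linear crossfade at each seam are all assumptions made for the example. The point it shows is that every sample inside a chosen section is copied untouched from one take, and only the first few samples after a seam are blended.

```python
def splice(takes, plan, fade=4):
    """Assemble a comp from whole sections of time-aligned takes.

    takes: dict mapping take id -> list of float samples (equal lengths)
    plan:  ordered list of (start, end, take_id) sections covering the song
    fade:  number of samples blended after each seam (an assumed choice)
    """
    out = []
    prev_take = None
    for start, end, take_id in plan:
        section = list(takes[take_id][start:end])  # copy; nothing is retimed
        if prev_take is not None and prev_take != take_id:
            n = min(fade, len(section))
            for i in range(n):
                w = (i + 1) / (n + 1)  # weight of the incoming take
                old = takes[prev_take][start + i]
                section[i] = (1 - w) * old + w * section[i]
        out.extend(section)
        prev_take = take_id
    return out


# Two hypothetical eight-sample "takes", constant so the seam is easy to see.
takes = {7: [1.0] * 8, 12: [0.0] * 8}

# Verse one from take 7, verse two from take 12; the seam sits at sample 4.
plan = [(0, 4, 7), (4, 8, 12)]

comp = splice(takes, plan, fade=3)
print(comp)  # [1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```

With three to five takes chosen across a whole song, a plan like this has only a handful of entries, which is the structural difference from a syllable-level edit list.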
What this means for the working musician
A practical coda.
If you are a singer or an instrumentalist working in a modern session, three things follow.
One: track in full passes, not punch-ins, whenever you possibly can. It will cost you takes. It will also preserve the causal chain. Tell your engineer you would rather do twelve full passes than punch four bars at a time. The full passes will compound — by pass nine you will have learned the song in a way punching cannot teach you.
Two: ask your engineer to comp at the line level or the verse level, not the syllable level. This is a conversation worth having before the session. Most engineers will accommodate the request. The result will be a vocal that sounds more like you and less like everyone else.
Three: resist the urge to fix the imperfect take. The slightly-wobbly take is often the truer one. Listeners' bodies read the wobble as evidence that you were there. Punching it out replaces the evidence with a stand-in. The stand-in sounds correct and feels less.
If you are a producer or engineer, one thing follows.
Ask whether the take you are stitching needed the stitch. Sometimes it did. Often the unfixed take, with one questionable note, is the take. Listeners are forgiving of one off-pitch syllable inside a committed performance. They are unforgiving — at the level of the nervous system, even if not at the level of conscious thought — of a flawless performance assembled from twenty competing emotional states.
A note on AI vocals
It is worth saying out loud where this argument lands in 2026. AI vocal generation is now extraordinary at the phoneme level. The models can produce a clean, in-tune, emotionally plausible vocal performance of any song in any style in less time than it takes to read this sentence. What they have not yet learned to produce well is the causal chain across phrases: the "because of" relationship between line four and line three. Phrases in current AI vocals tend to be locally beautiful and globally disconnected, in the same way that early phoneme comps were. The body is not there to respond to itself.
That may change. But it tells us, again, where the human is showing up most clearly. The human is in the causation. The human is in the line that is a response to the line that just happened. The human is in the body that sang both lines on the same pass and let one inform the other.
Twenty takes, three to five chosen, no fixing inside the chosen ones. That is the recipe. Buckley wrote the receipt thirty years ago.
Print the Quiet is a Suede Social series on tone, dynamics, and the parts of music that don't fit on a lead sheet. Next: tape as a compressor.