The Rise of Second Brain Apps and What They Get Wrong About Audio
The “second brain” movement is one of the most successful intellectual products of the last decade. Tiago Forte’s PARA method. Roam Research and its rapid rise. Obsidian’s quieter dominance among power users. Notion’s databases. Reflect, Mem, Logseq, all the rest.
If you’re reading this in 2026, you almost certainly know what these are. You may use one. You may have abandoned three.
I want to make a specific argument: the entire second-brain ecosystem has a blind spot. It’s structurally biased toward text in a way that’s been invisible to the people who built it, because the people who built it are writers who read.
The blind spot is audio.
In an era where serious thinkers spend ten hours a week listening to podcasts and watching YouTube essays, comparable to the time they spend reading, there is no real “second brain” infrastructure for audio. We have built sophisticated tools for capturing, linking, and re-encountering text. For audio, we have… highlights? Voice memos? Sometimes? It’s an embarrassing gap.
This essay is about why that gap exists, what an audio brain would actually look like, and why I think this is one of the most underexplored opportunities in PKM right now.
The text bias of PKM
Start with the obvious. Almost every “second brain” tool’s primary input is text you type.
Roam, Obsidian, Logseq: type into a daily note. Notion: type into a page. Reflect: type. Mem: type. Readwise (the closest to a non-typing input) accepts highlights from Kindle, Pocket, and webpages, but the highlights are text. Even Readwise’s recent additions, including read-it-later integration, are about converting non-text into searchable text first.
This isn’t a complaint. Text is searchable, structured, durable, and easy to manipulate. The reason PKM tools are text-first is that text is the easiest substrate for the things PKM tries to do.
But the consequence is that anything that isn’t text gets second-class treatment. And the most important non-text substrate in the modern intellectual diet is audio.
How much audio you’re actually consuming
Let’s quantify the asymmetry.
A heavy reader in 2026 reads maybe one book a month, ~250 pages, ~5 hours of reading time, plus a few hours a week on long essays and Twitter. Call it 15-20 hours a week of text consumption.
A heavy listener consumes maybe 10-15 hours of podcasts per week, plus another 5 hours of YouTube essays and videos. Call it 15-20 hours a week of audio consumption.
In other words: for many serious knowledge workers in 2026, audio consumption equals or exceeds text consumption. And the entire PKM toolchain serves only the text half.
This isn’t because audio is unimportant. It’s because audio is harder to capture, harder to search, and harder to integrate, and so the tools that exist have addressed text first and audio later, or never.
Why audio is hard for PKM
Three reasons audio is hard:
1. It’s ephemeral. You hear a sentence and then it’s gone. There’s no surface to mark. Text, by contrast, sits still: you can underline, highlight, copy, paste. This is the central asymmetry. Audio doesn’t have a physical place you can put a pencil.
2. You can’t ctrl-F. Even after the fact, an audio file isn’t searchable in the way a text file is. You can scrub through, but scrubbing is slow. Transcripts help, but transcripts are expensive (computationally or financially) and lossy (you lose tone, emphasis, the quality of how something was said).
3. The capture context is wrong. Most audio consumption happens while you’re doing something else: walking, driving, cooking. The classic PKM gesture (write a note) requires both hands and visual attention. The audio context offers neither.
So the PKM tradition, which evolved around the assumption that you’re sitting at a desk with a keyboard, has no clean answer for the situation in which most audio consumption happens: standing, moving, hands busy.
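To make point 2 concrete, here is a minimal sketch of what “ctrl-F for audio” looks like once a transcript exists. It assumes the transcript is already available as timestamped segments; the `Segment` shape and the `find_in_transcript` helper are hypothetical names for illustration, not any particular tool’s format.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One chunk of a transcript (hypothetical shape, for illustration)."""
    start_seconds: float  # where this chunk begins in the audio
    text: str             # what was said


def find_in_transcript(segments, query):
    """Return the segments whose text contains `query`, ignoring case."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


segments = [
    Segment(12.0, "Welcome back to the show"),
    Segment(754.5, "The moat is really Distribution"),
    Segment(1290.0, "distribution beats product"),
]

# Each hit carries the offset you would seek to in the player.
hits = find_in_transcript(segments, "distribution")
print([s.start_seconds for s in hits])
```

The search itself is trivial. The hard parts are the ones named above: producing the transcript in the first place, and everything the transcript drops.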
The half-solutions that exist
Let’s give credit where it’s due. Several tools have tried to bridge the gap. None has fully done it.
Readwise added “Readwise Reader,” which can ingest podcast transcripts. The model is: download the transcript of an episode, highlight passages, sync them to your PKM. It works. The limitation: you’re not annotating in the moment. You’re annotating a transcript after the fact, which loses the magic of capture-at-the-moment.
Snipd lets you capture audio clips inside its player with AI-generated transcripts. This is closer to the right idea, because capture happens while you’re listening. But it only happens inside Snipd’s player, so it doesn’t integrate with your broader knowledge management.
Otter.ai transcribes whole audio files. Useful for research and journalism, useless for casual listening.
Voice memos + manual notes is what most people fall back to. It works, but at the cost of significant friction and with no integration with your text knowledge base.
Margin (the app I built) is my own attempt: press-and-hold capture from anywhere in iOS, transcribed on-device, anchored to the episode and timestamp. I think it’s closer than the others, but I’d be a bad essayist if I pretended it solves the whole problem. It solves capture. It doesn’t yet solve integration with your broader PKM system, which is the harder problem.
What an audio brain would actually look like
If I were designing the audio brain from first principles, here are the features it would have:
1. Capture without unlocking your phone. The gesture has to be available from the lock screen, the Action Button, or a wearable. Anything that requires opening an app has lost.
2. Voice as the primary input. Typing notes during audio listening is broken by definition. The notes are voice. Transcription happens automatically and on-device. (Both the speed and the privacy story matter.)
3. Timestamps as a first-class citizen. Every note is anchored to the second of the episode that triggered it. Tap a note → play that exact second. This is not optional; it’s the whole reason audio notes are different from text notes.
4. Two-way link with your text PKM. Notes flow into your existing system (Notion, Obsidian, Roam, whatever) so they’re searchable next to your text notes. Tags carry through. Backlinks work.
5. Re-encounter loops. Just as Readwise does for highlights, an audio brain should resurface old notes on a spaced schedule. The note you made six months ago about “the moat = distribution thing” should pop up next March, in case it now connects to something new.
6. Cross-listener community (optional). This is more controversial, but: the most interesting moments in podcasts are often the moments other smart listeners also caught. An audio brain could optionally surface “other Margin users also marked this moment”: a quiet, anonymized social layer that helps you discover what was important even on episodes you missed.
Nobody has built this end-to-end. Margin does (1)-(3) and is starting on (4). The full vision is two or three years out at minimum, possibly five.
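Features 3 and 5 are the easiest to pin down in code, so here is a minimal sketch of both. Treat it as an illustration under stated assumptions, not any app’s actual implementation: the `AudioNote` schema, its field names, and the review intervals are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AudioNote:
    """A voice note pinned to one moment of one episode (hypothetical schema)."""
    episode_id: str       # whatever identifier the player exposes
    episode_title: str
    offset_seconds: int   # position in the episode when capture started
    transcript: str       # transcription of the spoken note
    created_on: date
    tags: tuple[str, ...] = ()

    def timestamp_label(self) -> str:
        """Format the offset as H:MM:SS for display next to the note."""
        hours, rest = divmod(self.offset_seconds, 3600)
        minutes, seconds = divmod(rest, 60)
        return f"{hours}:{minutes:02d}:{seconds:02d}"


# Assumed spacing, in days after capture. A real scheduler would tune these
# and would also track reviews that were missed.
REVIEW_INTERVALS_DAYS = (7, 30, 90, 180, 365)


def review_dates(note: AudioNote) -> list[date]:
    """Every date on which this note should resurface."""
    return [note.created_on + timedelta(days=d) for d in REVIEW_INTERVALS_DAYS]


def due_for_review(notes: list[AudioNote], today: date) -> list[AudioNote]:
    """Notes with a review date that falls exactly on `today`."""
    return [n for n in notes if today in review_dates(n)]


note = AudioNote(
    episode_id="ep-1",
    episode_title="Example Episode",
    offset_seconds=3725,
    transcript="the moat = distribution thing",
    created_on=date(2026, 1, 1),
    tags=("strategy",),
)
print(note.timestamp_label())                        # 1:02:05
print(due_for_review([note], date(2026, 1, 31)))     # resurfaces after 30 days
```

The design point is that `offset_seconds` lives on the note itself, next to the episode identifier. Everything else (tap-to-play, export, resurfacing) can be derived from that pair.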
Why this is the underexplored opportunity
Here’s the case for why someone is going to build the full audio brain, and why it’s a big opportunity:
- The audience exists. Heavy listeners are exactly the audience that loves PKM. They’re the same people: knowledge workers, founders, scientists, designers.
- The infrastructure is ready. On-device speech recognition is fast enough. Spotify’s API exists. iOS lock-screen extensibility is rich enough to support good capture.
- The text-first incumbents are unlikely to build it. Roam, Obsidian, Notion are all deeply invested in their text-first identity. The audio brain would be a separate product, not a feature inside an existing tool.
If you’re a builder thinking about PKM and looking for an unclaimed corner: this one is unclaimed for the same reason most unclaimed corners are. It requires technical work in audio, mobile, OS-level capture, and OAuth integrations, none of which the text-first PKM crowd has wanted to do.
The corner is wide open.
What you can do today
If you don’t want to wait for someone to build the perfect audio brain, here’s a minimum viable system you can run today:
- Use Margin (or Snipd, or voice memos) for capture-in-the-moment.
- Export weekly to your existing PKM. For Margin, this means copying the week’s notes into Notion or Obsidian. For voice memos, it means transcribing the ones that survived.
- Tag aggressively. Use the same tags you use for text notes, so audio notes show up in the same searches.
- Schedule a quarterly review pass. Most audio insights age into either irrelevance or deeper relevance over months. The review is when you see which.
This isn’t elegant. It’s the manual version of what an integrated audio brain would do automatically. But it works.
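If you’d rather script the weekly export than copy and paste, the manual routine above can be sketched in a few lines. This assumes you can get your notes out of your capture tool as simple records; the `Note` shape and the `render_weekly_export` helper are hypothetical, and the output is plain Markdown with inline `#tags` of the kind Obsidian and similar tools read.

```python
from datetime import date, timedelta
from typing import NamedTuple


class Note(NamedTuple):
    """One captured note (hypothetical record shape, for illustration)."""
    episode: str
    offset_seconds: int
    text: str
    captured_on: date
    tags: tuple


def week_of(day):
    """Return the Monday and Sunday of the week containing `day`."""
    monday = day - timedelta(days=day.weekday())
    return monday, monday + timedelta(days=6)


def render_weekly_export(notes, day):
    """Render the notes captured in `day`'s week as one Markdown page."""
    start, end = week_of(day)
    in_week = sorted(
        (n for n in notes if start <= n.captured_on <= end),
        key=lambda n: (n.episode, n.offset_seconds),
    )
    lines = [f"# Audio notes, {start.isoformat()} to {end.isoformat()}", ""]
    current_episode = None
    for n in in_week:
        if n.episode != current_episode:
            if current_episode is not None:
                lines.append("")  # blank line between episodes
            lines.append(f"## {n.episode}")
            current_episode = n.episode
        minutes, seconds = divmod(n.offset_seconds, 60)
        # Reuse the same tags as your text notes so searches find both.
        tag_text = " ".join(f"#{t}" for t in n.tags)
        lines.append(f"- [{minutes}:{seconds:02d}] {n.text} {tag_text}".rstrip())
    return "\n".join(lines) + "\n"


notes = [
    Note("Show A", 125, "moat is distribution", date(2026, 3, 3), ("strategy",)),
    Note("Show A", 65, "pricing", date(2026, 3, 5), ()),
    Note("Show B", 10, "older note", date(2026, 2, 20), ("misc",)),
]
page = render_weekly_export(notes, date(2026, 3, 4))
print(page)
```

Save the returned string as a file in your vault, or paste it into a Notion page. The note from February is left out because it falls outside the week being exported.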
A small prediction
I’ll close with a prediction. In five years, the “second brain” category will have audio at its center, not at its periphery. The PKM tool people use in 2030 will treat voice notes from podcasts as a first-class equal to text notes from books and articles. The blind spot will close.
I hope to help close it. If you’re interested in that work, Margin is where I’m starting. The press-and-hold gesture is step one. The integrations are step two. The full audio brain is step five. We’ll get there.
Selinay