MindMakers

Why Memory Is AI's Most Underestimated Layer

Episode Summary

Storage used to be an afterthought. In the era of trillion-parameter models and agent swarms consuming millions of tokens per session, it has become one of the most strategically critical layers in the AI stack. On this episode of MindMakers, John Kim is joined by Val Bercovici, Chief AI Officer at WEKA, the AI storage and memory systems company.

Episode Notes

Val brings a rare vantage point, built over two decades at NetApp and SolidFire and in early Kubernetes development, to explain why memory architecture is the defining challenge of this moment in AI. He breaks down the gap between what models can theoretically do and what infrastructure actually allows, introduces WEKA’s concept of a "token warehouse," and explains how the shift from prompt engineering to context engineering is changing the way teams build and scale AI systems.

Val also shares a candid take on the Chief AI Officer role and whether it's built to last, his read on the cultural and competitive differences between Anthropic and OpenAI, and why the most durable business model in AI may ultimately come down to who sells the tokens.

For anyone building or leading at the infrastructure layer of AI, this episode offers both the technical depth and the strategic clarity to think several moves ahead.

—
Guest Bio

Valentin (Val) Bercovici is the Chief AI Officer at WEKA. He has extensive experience in the data infrastructure industry, having previously been the CTO at NetApp/SolidFire, where he drove innovation in cloud storage and data management solutions. Val co-authored the Windows Shadow Copy snapshot technology and has made significant contributions to the storage standards community.

As co-chair of the Storage Networking Industry Association's (SNIA) Solid State Storage Initiative, Val helped to establish the first NAND Flash SSD storage standards. Additionally, Val served as the chair of the SNIA Cloud Storage Initiative (CSI), where he led the development of the international cloud storage standard CDMI (ISO/IEC 17826). He was also a founding member of the Governing Board of the Cloud Native Computing Foundation (CNCF), the home of Kubernetes, helping to shape the global direction of container orchestration.

Val holds patents in AI agent smart contracts, streaming data integrity, and augmented reality (AR) for data center maintenance. His work continues to push the boundaries of what’s possible at the intersection of AI, cloud, and emerging technologies.

—

Guest Quote

"The biggest value of memory today is without any additional expense, without more CAPEX or energy OPEX, you can have a high level of concurrency... Even at a modest scale, we're seeing 6.5x, another way of saying it is 550% more concurrent tokens without latency sacrifice, without any more GPU spend or any more energy spend." – Val Bercovici

—

Time Stamps

[00:49] Welcoming Val to the show

[01:02] Val's background pushing innovations

[02:58] Are Chief AI Officers here to stay?

[08:40] How WEKA delivers for clients

[12:16] Unlocking memory architecture bottlenecks

[18:53] Where memory impacts the bottom line

[24:51] The future of reliably stored context

[28:29] Keeping up with AI research

[31:47] The power of being an early adopter

[34:19] Who is winning the AI code battle?

[39:15] Are we heading into a SaaS apocalypse?

[44:31] Val's Human Prompt

—

Links

Episode Transcription

[00:00:00] John Kim: Welcome to MindMakers. I'm John Kim, co-founder and CEO of Delight AI, uh, the AI agent from Sendbird. On this show, I talk to people building the next generation of artificial intelligence, uh, from founders to executives to technologists shaping the future of AI and its real-world impact. Today, um, I'm joined by Val Bercovici, Chief AI Officer at WEKA.

WEKA is the company that's enabling enterprises, neoclouds, and exascale AI innovators to accelerate real-world performance. You bring a wealth of infrastructure experience to the show today and have overseen technological evolution throughout your career, such as coordinating the migration, uh, of Google Borg into Kubernetes, and officially bringing NetApp into the cloud era.

This is awesome. Um, Val, welcome to the show. How are you doing today?

[00:00:51] Val Bercovici: Very good. Excited to be here, John. Awesome.

[00:00:53] John Kim: Um, before we go too deep into what you and your company do, I'd love for the audience to really get to know you better. Can you share a bit about the early part of your career, your journey?

Uh,

[00:01:05] Val Bercovici: I won't go too far back, but I did start my career in Canada. The accent's Canadian, as I was telling, telling you earlier on. Grew up in Montreal. Started working as a contractor for the Canadian federal government.

[00:01:14] John Kim: Mm-hmm.

[00:01:14] Val Bercovici: Got involved in pretty big systems pretty early on that spanned the mainframe and, like, back then, client-server, before cloud and all that.

[00:01:21] John Kim: Yeah.

[00:01:22] Val Bercovici: Got recruited into Silicon Valley right before Y2K. Moved here right after Y2K. Spent about 20 years here. A lot of time with NetApp- Mm-hmm ... just watching the rise, the challenges during the dot bomb sort of crash, and then the- Yeah ... resurgence of a business like that was fascinating to watch. As you said, I was a, the, the early cloud tsar at the company.

Ended up having to leave the company because I didn't think it was pivoting to cloud fast enough, and then, ironically and unexpectedly, got reacquired back through a small company called SolidFire, so I rejoined as CTO of NetApp and SolidFire. That's when I had the fun opportunity to work with the Google Borg folks.

[00:01:58] John Kim: Mm-hmm.

[00:01:59] Val Bercovici: Purely following my geek instinct. No one on that team, uh, thought that Kubernetes would blow up to be as big as it's become today.

[00:02:06] John Kim: Mm-hmm. Mm-hmm.

[00:02:06] Val Bercovici: But we all thought it was the right thing to do. We were all fans of containers as opposed to virtual machines from a developer perspective. We were all fans of containers in the cloud.

Uh, we can go into it if you're curious. There's a, just a long strategy story that Google had around trying to leapfrog Amazon by being- Oh ... early to containers. Didn't work out, because Amazon executed really well on containers as well. Yeah. But Kubernetes worked out really, really well. Uh, so spent time there.

Worked even on early machine learning projects. Uh, Kubeflow-

[00:02:33] John Kim: Mm-hmm ...

[00:02:34] Val Bercovici: was an early project before TensorFlow, PyTorch. Worked on that. Uh, but then I thought infrastructure, storage in particular, was done. Like, what more could you do than cool machine learning in the cloud? Uh, went away from infrastructure for a while into cyber and other, other startups.

And then November '22, ChatGPT happened, and that, like, Al Pacino quote from Godfather III, you know, it drew me right back in.

[00:02:56] John Kim: Wow. That's awesome. Um, actually, I, I have one kind of burning question. You're the first chief AI officer we've interviewed-

[00:03:03] Val Bercovici: Okay ...

[00:03:04] John Kim: um, on this podcast.

[00:03:04] Val Bercovici: Yeah.

[00:03:05] John Kim: So kind of, like, curious about your, um, thoughts on this title.

It's obviously, like, AI is, like, top of mind for everyone and super important. We're starting to see a lot of our enterprise, um, clients also have chief AI officers as a new kind of role. What are your thoughts on this role? Do you see this as more of a, like, really core pillar, almost like a CTO, but a C- CAIO, uh, role that will continue to be a new kind of important pillar on the executive team, or do you see more...

this as more of a tr- you know, kind of transformational, transitional role? Maybe it'll get rolled back into the technology side of things. Like, how do you think about this role?

[00:03:39] Val Bercovici: It's a great question. I don't think I've been asked that question in such great detail before. So I do believe it's transitional.

As the cliché goes, you're really successful in your role if you work yourself out of a job.

[00:03:49] John Kim: Hmm.

[00:03:49] Val Bercovici: Uh, I harken back to the days of, again, the dotcom boom, which we're always talking about in the context of a potential AI bubble. Uh, when you said you were working for an internet company, like in the mid-'90s, that was novel, right?

Hmm. The internet was new. By the time that came around post-Y2K, like, you know, seven years later, that was, like, the most vague, generic thing you could possibly tell someone, "I'm an internet person," right?

[00:04:10] John Kim: Yeah.

[00:04:10] Val Bercovici: I think the same thing's about to happen with AI. Like- Mm-hmm ... it's gonna become ubiquitous. It's becoming ubiquitous, but even more so, to the point where saying, "I work for an AI company" is kind of meaningless.

Like, what are you applying AI for?

[00:04:21] John Kim: Hmm.

[00:04:21] Val Bercovici: Similarly, the chief AI officer role is really essential right now 'cause there's a lot of confusion, there's a lot of, you know, myth-busting that has to happen, misconceptions. Yeah. We can talk about that MIT p- paper that was famous- Yeah ... last year. And so you need to educate internally.

Uh, strategy is fundamental. You really... Nowadays, and we'll talk about how the Silicon Valley ethos of, you know, ideas are worthless and execution is everything has been turned upside down by AI. So-

[00:04:48] John Kim: Yeah ...

[00:04:48] Val Bercovici: strategy is really essential in the role. Uh, and, and being able to have that technical leadership both outside and inside the company.

You know, what I'm doing today is helping really create a category, a market category around context memory, context memory storage. Yeah. It was, uh, a lonely effort last year. Hmm. Jensen Huang himself from Nvidia really validated the market at CES, in his keynote this year, so it's working.

[00:05:11] John Kim: Do you think, like, this role of chief AI officer is actually changing very quickly?

'Cause to your point, probably like a year ago when we started our AI journey, a lot of people were still, like, scratching their heads like, "Well, we heard it, about this AI. We... I've been using, like, ChatGPT for the first time." So people are still s- kind of like trying to figure out what AI looks like in enterprises.

But, uh, in the latter half of last year, we started to see customers actually adopting AI and rolling things- Yeah ... out to production. So it's almost like, you know, how the universe expands at its outer edge faster than the speed of light: people or companies that are adopting AI are getting so far ahead, whereas the rest of the world is still kind of, like, lagging behind.

People are s- starting to use ChatGPT on a day-to-day basis, but not really much beyond that. Yeah. So it's like the entire ecosystem is being stretched out. So how do you kind of define the role of chief AI officer? Like, where does it fit in from your vantage point when you work with customers?

[00:06:05] Val Bercovici: As you described it, I think the most important goal of the role is to make sure that you take full advantage as an organization, whether you're public sector or private company, of e- all the potential that AI has to offer, which is unprecedented. It's enormous.

[00:06:19] John Kim: Yeah.

[00:06:19] Val Bercovici: But that's overwhelming at times. So to help demystify and to help sort through and prioritize what you should act on, what the risks are, what you can even say no to, which is very important.

Mm. So it's a very strategic role, and I don't think the role is changing yet. The day-to-day activities in the roles are very much changing- Mm ... because it's following the, the rapid evolution of the industry itself, the science. I always like to divide the science, which is what these models can do, and the papers that these big, you know, foundation labs publish about the models versus the engineering, which is what the products are.

So Claude Code, hyper-popular product, particularly in the Bay Area and Silicon Valley-

[00:06:53] John Kim: My very new friend, yep ...

[00:06:54] Val Bercovici: that's an instantiation of engineering. It's really taking the ca- the science, the capabilities of these models, and delivering real business value and, and really radically disrupting a particular industry of software development, and there's many, many more, many more coming right now.

There's obviously Manus, which is more of a general purpose version of that, that ended up being acquired by, uh, by Meta over the, over the holidays. And, uh, and now of course, we have to say, you know, OpenClaw, and, uh, and I just love the Cambrian explosion of mainstream non-developer type innovations we're seeing, the creativity coming out of that community.

[00:07:27] John Kim: Yeah. OpenClaw was crazy. Um, I think it was one of the fastest projects to, I don't know, get hundreds of thousands of stars ever on, you know, GitHub. Yeah. And, um, I've been following or using the product, uh, since it first came out as Clawdbot.

[00:07:42] Val Bercovici: Yes, exactly.

[00:07:43] John Kim: Uh, and then it quickly changed to Moltbot.

I'm like, "What's happening?" And then Moltbot came out, and now it's like, oh, it's trying to find a project. It's now called OpenClaw. I'm like, "What's hap-" Like, the ecosystem's changing so quickly. And I'm really, um, happy for the outcome they had, uh, you know, now working with OpenAI.

And hopefully, hopefully, uh, really kind of continue to build, he- Peter continues to build out this long-lasting project under the new foundation. Um, but I think it really kind of gave a glimpse into what agentic experience really looks like for-

[00:08:11] Val Bercovici: The most mainstream example yet of the power of agents.

And re- Yeah ... it's, it's really an agent platform.

[00:08:16] John Kim: Yeah.

[00:08:17] Val Bercovici: But yeah, people are, are really inspired by what it's, what it can do now for, for mainstream requirements.

[00:08:22] John Kim: Yeah. Before we kind of diverge into different parts of AI, I would love for our audience to understand what your company actually does.

I mean, we have the same experience as a B2B company. We're like known by the customers in the industry, but like really for people, the general audience or broader people who are just getting their feet wet into AI, kind of like would love to, for them to understand what your company does.

[00:08:43] Val Bercovici: Absolutely. So we'll start with a more relatable mainstream example, and then we'll dive absolutely into the details.

Think about buying an EV. What if there were a battery technology that could give you a car with a 3,000-mile range, you know, today, right?

[00:08:56] John Kim: Can I have it now?

[00:08:57] Val Bercovici: Exactly. So we all, we all want that, but you and I, when we're looking at buying cars, are not looking at buying batteries. We're looking at buying cars.

[00:09:04] John Kim: Yeah.

[00:09:04] Val Bercovici: So WEKA is that 3,000-mile range battery right now.

We work with people that sell cars to end users. So we work with neoclouds. We sell to neoclouds. We sell to their big tenants, the model labs and the inference runners, or providers. Mm. Sometimes those are the same company, as happens with OpenAI or Anthropic, for example. Yeah. So we work with that infrastructure, almost physical infrastructure, part of the stack.

More traditionally, if you were to ask a Gartner or a SemiAnalysis what WEKA does, they'd say it's a storage company- Mm ... that's now also offering memory solutions- Mm ... into the marketplace, because memory, it turns out, if you really spend a lot of time with scientists and engineers in AI, is the big technical bottleneck for the industry today.

[00:09:47] John Kim: So without going too deep into the company's history, it's very interesting, 'cause storage used to be, or could be considered, like, a commoditized infrastructure. But now with AI, and to your point around, like, context management and memory, all of a sudden it's become, like, a super hot, sexy thing.

[00:10:01] Val Bercovici: Yeah.

[00:10:02] John Kim: Um, was that, like... Do you think that was a strategic decision, uh, just company trying to figure out how to catch the tailwind of the market? Or is there more, like, from architectural level that has been, like, been building for a long period of time, just that happened to be a perfect fit-

[00:10:15] Val Bercovici: Yeah ... for

[00:10:15] John Kim: this AI market?

[00:10:16] Val Bercovici: Those are the two key components there. This- That's the personal reason why I joined WEKA, is I started as a board advisor.

[00:10:22] John Kim: Mm.

[00:10:22] Val Bercovici: Started to understand the architecture from the inside out, beyond just the outside-in perspective.

[00:10:27] John Kim: Mm.

[00:10:28] Val Bercovici: Quickly realized that this solves a first principles GPU scientific computing problem, 'cause you're always trading GPU floating point operations, FLOPs- Mm

for memory. AI is the most successful scientific application on GPUs ever.
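Val's point that GPU computing always trades floating point operations for memory can be made concrete with a rough break-even check: either rebuild a context's KV cache by re-running prefill (spending FLOPs), or fetch an already-computed copy from another memory tier (spending bandwidth). This is an illustrative sketch only; the function names, the prefill throughput, and the tier bandwidths are hypothetical assumptions, not WEKA figures, and the roughly 50 GB cache size is the approximate number Val gives later in the conversation.

```python
def recompute_seconds(tokens: int, prefill_tokens_per_s: float) -> float:
    """Time to rebuild a KV cache by re-running prefill on the GPU (spends FLOPs)."""
    return tokens / prefill_tokens_per_s


def reload_seconds(kv_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Time to fetch an already-computed KV cache from another tier (spends bandwidth)."""
    return kv_bytes / bandwidth_bytes_per_s


def cheaper_option(tokens: int, kv_bytes: int,
                   prefill_tokens_per_s: float, bandwidth_bytes_per_s: float) -> str:
    """Return whichever path finishes sooner under these simplified assumptions."""
    r = recompute_seconds(tokens, prefill_tokens_per_s)
    l = reload_seconds(kv_bytes, bandwidth_bytes_per_s)
    return "reload" if l < r else "recompute"


# Hypothetical numbers, not measurements.
TOKENS = 100_000
KV_BYTES = 50 * 10**9            # ~50 GB of KV cache for the context (assumed)
PREFILL_TOK_PER_S = 10_000.0     # assumed prefill throughput
for name, bandwidth in [("fast tier", 100e9), ("slow tier", 1e9)]:
    print(name, cheaper_option(TOKENS, KV_BYTES, PREFILL_TOK_PER_S, bandwidth))
```

With these assumed numbers, reloading from the fast tier wins (0.5 s versus 10 s of recompute) while the slow tier loses (50 s). Real systems also pay queuing, transfer-setup, and contention costs that this sketch ignores.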

[00:10:42] John Kim: Yeah.

[00:10:42] Val Bercovici: Uh, and the opportunity basically to take some of the high-performance benefits of WEKA, which are fairly unique, particularly as a commercial offering in the marketplace-

[00:10:50] John Kim: Mm ...

[00:10:51] Val Bercovici: and literally repackage the WEKA solution into also offering memory functionality.

[00:10:56] John Kim: Mm.

[00:10:56] Val Bercovici: When I saw that architecture that was possible, then the strategy element kicked in and said- This is a giant bottleneck in the marketplace. At the time I joined, the AI dollars were still shifting primarily or being spent primarily on training.

[00:11:10] John Kim: Mm-hmm.

[00:11:10] Val Bercovici: But it was obvious that you have to pay for this sometime, and it was obvious that we were starting to use these technologies.

Uh, and inference, particularly throughout 2025, as I predicted, exploded in terms of really, uh, a shift in, in spend in the industry.

[00:11:23] John Kim: Yeah.

[00:11:23] Val Bercovici: And even training itself right now with the advent of reinforcement learning is fundamentally bottlenecked by inference during the reinforcement learning loops themselves.

[00:11:32] John Kim: Yeah.

[00:11:33] Val Bercovici: So the strategy was very much let's really focus on the inference market as well as the training market because that's growing much faster. Uh, really, really big dollars. And the strategy was what is the bottleneck that WEKA can solve, and it's a memory bottleneck. The memory wall is another name for it.

[00:11:47] John Kim: Yeah. This is, like, fascinating 'cause, uh, about a year ago when we started, you know, really building out enterprise AI agents, uh, internally the- there was a hot discussion around the importance of the... or rising importance of memory 'cause it's very clear around context management. It's like everything feels like it's a working memory.

Like, AI has working memory, but, you know, it goes to sleep, wakes up, and it's completely, like, forgotten everything we've ever done.

[00:12:09] Val Bercovici: Exactly, yeah.

[00:12:09] John Kim: Uh, so people think like, "Oh, well, it's learning. I'm using ChatGPT and it remembers." It's like, well, it's still in the context, but it's not really like a long-term memory- Yeah

if you will. So if you think about, like, human brains, you know, we have working memory, but we also have semantic memory- Yeah ... episodic memory, and procedural memory. Uh, so I was kind of wondering how you guys think about architecting, uh, your memory structure in a way that can be useful for AI at scale?

Is-- Do you borrow ideas from, like, neuroscience perspective? Like, how are you thinking about this architecture?

[00:12:40] Val Bercovici: Not so much neuroscience inspired, uh, but very much in terms of the, the practical application of the transformer technology- Mm ... and large language models, which through hybrid diffusion models now is being adopted pretty widely for images, you know, audio, video as well.

Mm-hmm. Uh, so the actual technical name for the working memory, because I do love that description of what it fundamentally is, the technical name is KV cache, key value cache. And that is a data structure of, of matrices basically. Uh, and the, the rough math is about 100,000 tokens. 100K tokens is a pretty common large prompt or set of prompts for an agent right now.

That translates to roughly a megabyte of data.

[00:13:20] John Kim: Mm.

[00:13:21] Val Bercovici: As you embed that and vectorize that, essentially 10,000 to 20,000 dimensions, which is what we need for these large models to work their magic, that's about 50 gigabytes of memory. And not only memory, it has to be, for maximum effect, for the latency we all crave, that expensive, rare, high bandwidth memory that's co-packaged with the GPUs- Yeah

themselves for performance reasons. And it's just never enough. The models themselves are so big. Mm. The DeepSeek class-

[00:13:46] John Kim: Yeah ...

[00:13:47] Val Bercovici: and commercial OpenAI- and Anthropic-class models are trillion-parameter models now. So they hog essentially most of the memory before our working memory even gets going.
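Val's "about 50 gigabytes" for a 100K-token context can be reproduced with the standard KV cache size estimate: two tensors (keys and values) per layer, per KV head, per token. The model shape below is an assumption, loosely modeled on a ~400B-parameter dense model with grouped-query attention; it is not a figure from the episode, and the function name is hypothetical.

```python
def kv_cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate bytes needed to hold the KV cache for `tokens` tokens.

    Every layer stores one key vector and one value vector (the factor of 2)
    per KV head per token; bytes_per_elem=2 corresponds to an FP16/BF16 cache.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return tokens * per_token


# Assumed model shape: 126 layers, 8 KV heads, head_dim 128, FP16 cache.
size = kv_cache_bytes(tokens=100_000, num_layers=126, num_kv_heads=8, head_dim=128)
print(f"{size / 1e9:.1f} GB")  # 51.6 GB
```

Models without grouped-query attention, or with more layers, land higher; quantizing the cache to 8 bits halves it. The point is the ratio: a prompt that is a megabyte or so of text becomes tens of gigabytes of working memory.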

[00:13:56] John Kim: Yeah.

[00:13:56] Val Bercovici: Then you and I need to sort of share these GPUs to make them economical.

You're just quickly out of that high bandwidth memory before you even begin.

[00:14:04] John Kim: Mm.

[00:14:05] Val Bercovici: And so you have these memory hierarchies. Commonly now as the industry matured, the inference ecosystem matured last year, it's like a bucket. You overflow from high bandwidth memory to the CPU DRAM that's shared on a motherboard across-

[00:14:17] John Kim: Yeah

[00:14:18] Val Bercovici: all the GPUs, and that's not enough.

[00:14:20] John Kim: Mm.

[00:14:20] Val Bercovici: And so the-- that bucket keeps spilling over into other tiers, and those become storage tiers-

[00:14:25] John Kim: Mm ...

[00:14:25] Val Bercovici: with a pretty big gap in performance- Yeah ... you know, almost a Grand Canyon- Yeah ... of performance. Uh, or there's ways to repackage those storage tiers-

[00:14:33] John Kim: Mm ...

[00:14:33] Val Bercovici: as memory.

[00:14:34] John Kim: Mm.

[00:14:34] Val Bercovici: And there's a couple of technical ways to do that, which WEKA implements.

But the net effect now is you have-- you go from a scarcity mindset of, of actual infrastructure resources, memory res- resources in particular, to an abundance mindset.

[00:14:48] John Kim: Mm.

[00:14:48] Val Bercovici: Because now you have, at least economically, a thousand times the capacity of memory without any sacrifice in latency or throughput.
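Val's "bucket" picture of the memory hierarchy (high bandwidth memory overflowing into shared CPU DRAM, then into storage tiers) can be sketched as a simple spill-over allocator. This is a toy: the Tier class, the place function, and the capacities are hypothetical, and it models only where bytes land, not latency, bandwidth, or how WEKA repackages storage as memory.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    capacity_gb: int
    used_gb: int = 0

    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def place(request_gb: int, tiers: list[Tier]) -> dict[str, int]:
    """Fill tiers fastest-first and spill whatever does not fit into the next tier down."""
    if request_gb > sum(t.free_gb() for t in tiers):
        # Reject up front so a failed request leaves no partial allocation behind.
        raise MemoryError(f"{request_gb} GB does not fit in the hierarchy")
    placement: dict[str, int] = {}
    remaining = request_gb
    for tier in tiers:
        take = min(remaining, tier.free_gb())
        if take > 0:
            tier.used_gb += take
            placement[tier.name] = take
            remaining -= take
        if remaining == 0:
            break
    return placement


# Hypothetical capacities for KV cache only (model weights already subtracted).
tiers = [Tier("HBM", 40), Tier("CPU DRAM", 100), Tier("storage tier", 10_000)]
print(place(50, tiers))   # {'HBM': 40, 'CPU DRAM': 10}
print(place(120, tiers))  # {'CPU DRAM': 90, 'storage tier': 30}
```

The second request finds HBM already full and starts in DRAM, the shared-GPU scenario Val describes. Whether the bottom tier is actually usable depends on how close its latency and throughput get to the tiers above it, which this sketch deliberately leaves out.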

[00:14:58] John Kim: Wow.

I'd be, like, super curious to learn from a har- hardware perspective how you guys are actually solving this problem too. But, um, kind of like just staying at a conceptual level for a second, if you think about how our human brains are structured, right? We basically have the model itself, but the model itself is also storage, right?

Yeah. Yeah. You know, how our brain processes. Uh, so the neuroplasticity is a lot higher with human brains, whereas with the current models, the way they're being built, it's like you're almost, like, putting all the memory in as part of the structure- Yeah ... that's fairly rigid. Um, so it's not as continuous, like the continuous learning or neuros- neuroplasticity doesn't exist- Yeah

as much in the model side of things. So we're kind of forcing, uh, almost like having the rigid structure as a kind of, like, fixed memory, if you will. Yeah. And then we're kind of offloading the dynamic, more of a, like, learning memory into storage functions. Do you think that is the right architecture, or do you think the model should be a lot smaller, and hopefully we figure out some kind of a different architecture where there is a greater learning, like high neuroplasticity component of storage of some sort that can mimic almost a human brain structure, if that makes sense?

[00:16:06] Val Bercovici: Yeah. The latter. So, uh, that's, it's again another great observation. And the reality is, as powerful as these models are, they still sort of follow routines, if you will. Yeah. They still have much more value in certain domains than other general purpose domains. Uh, and it's not very efficient to just run a big model for everything, and-

[00:16:23] John Kim: Right

[00:16:24] Val Bercovici: one of the things I love about the OpenClaw community is they've quickly figured this out, right? Yeah. Like tokenomics and, you know, I can't afford to just use Opus for everything, so I'm gonna tier my models based on- Even the agents themselves like routing exactly which model needs to be used. So back to your question, yeah, I do believe that just the way industries evolve, the way ultimately you have to have a gross and a net margin for an offering of this kind, we're gonna see more neuroplastic smaller models-

[00:16:51] John Kim: Mm.

[00:16:51] Val Bercovici: More, uh, more processors, or accelerators, as GPUs are sometimes termed- Mm ... and alternative ASICs and so forth.
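The model tiering Val credits to the OpenClaw community amounts to routing each task to the cheapest model that can handle it. A minimal sketch, assuming a difficulty score in [0, 1] is already available from some upstream classifier; the tier names, ceilings, and per-million-token prices are hypothetical placeholders, not real models or pricing.

```python
# Hypothetical tiers: (difficulty ceiling, model name, USD per million tokens).
MODEL_TIERS = [
    (0.3, "small-model", 0.5),
    (0.7, "medium-model", 3.0),
    (1.0, "large-model", 15.0),
]
PRICE = {model: price for _, model, price in MODEL_TIERS}


def route(difficulty: float) -> str:
    """Return the cheapest tier whose ceiling covers the estimated difficulty."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    for ceiling, model, _price in MODEL_TIERS:
        if difficulty <= ceiling:
            return model
    raise RuntimeError("tiers must cover difficulty 1.0")


def routed_cost_usd(tasks: list[tuple[float, int]]) -> float:
    """Total cost of (difficulty, tokens) tasks when each goes to its routed tier."""
    return sum(tokens / 1e6 * PRICE[route(d)] for d, tokens in tasks)


tasks = [(0.1, 1_000_000), (0.9, 1_000_000)]
print(route(0.1), route(0.9))   # small-model large-model
print(routed_cost_usd(tasks))   # 15.5, versus 30.0 if both went to large-model
```

The hard part in practice is the difficulty estimate itself, which this sketch assumes away; some agents make that call themselves, as Val notes.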

[00:16:57] John Kim: Yeah.

[00:16:58] Val Bercovici: More different kinds of accelerators. There's one that I use today out of Toronto, Canada, Taalas- Mm ... that is now an ASIC specific to a model.

[00:17:05] John Kim: Okay.

[00:17:05] Val Bercovici: Right? So that I think is the direction we're heading in the future, and even the acquisition by NVIDIA over the holidays as well of Groq with a Q- Mm

acqui-hire, I should say, of Groq with a Q, a clear indication-

[00:17:18] John Kim: Mm ...

[00:17:18] Val Bercovici: that NVIDIA now themselves see that there's gonna be a spectrum of- Mm ... accelerators and part-part-particular models too and for particular accelerators with specialties. And I think the most obvious example is, and this was mocked in the Super Bowl commercials, is latency.

[00:17:34] John Kim: Mm.

[00:17:34] Val Bercovici: Right? So we want a natural experience with voice agents. Uh, there's enormous business benefit already. It's one of the hot enterprise use cases today- Yeah ... is voice agents. Yep. It's got to be natural. We got to have a conversation exactly like this. Got to be able to interrupt each other a bit- Yeah

and, and have almost no latency in, in our discussions, and, and that needs special models. That needs special hardware that's tuned to that.

[00:17:54] John Kim: Oh, that's super interesting, 'cause I was looking at this, all these models, and I, I feel like we're giving birth to this baby with a giant head and all the knowledge pre-installed a little bit, but it's not able to learn a lot.

So next time we want the baby to learn, it's like, well, we're gonna go back to the lab and cook up another giant baby with a bigger head- Yeah ... and, like, let the baby sound smart, but it's not picking up new things. So, um, I, I love that idea, and it seems like WEKA would be, uh, in a really good position as this market evolves from the current gi-giant baby head model- Yeah

to more of a neuroplasticity model.

[00:18:30] Val Bercovici: Absolutely. Absolutely. 'Cause, uh, fundamentally, you know, uh, a technical detail about us that's worth remembering is we are purely software defined.

[00:18:37] John Kim: Mm.

[00:18:37] Val Bercovici: So we adapt really, really well to the hardware that we're installed on, and today there's a lot of value installing us in these big AI factories, these big AI data centers we read about- Mm

the big CapEx expenses and power and energy and water consumption, but it's software defined, so it adapts up and down, you know, the infrastructure, you know, hierarchy.

[00:18:55] John Kim: I guess on a related question, so because you're also a B2B company working with a lot of enterprises or clients that serve a lot of these enterprises- Yeah

and the users also building with various models, like, um, any kind of insight you can give around how to think about, again, using memory? 'Cause I still don't think the industry understands how memory can actually benefit their business outcomes, and what it means from a use case perspective and an implementation perspective, versus, like, well, we can just connect our database to, uh, an LLM.

It should just work. Like, in reality, we've deployed a lot of AI agents and- It doesn't-- It's not as simple as that.

[00:19:32] Val Bercovici: Mm-hmm. Yeah.

[00:19:32] John Kim: Um, so I'm kind of curious for, uh, business people who are listening, like where does memory fit in, not conceptually, but in actual business-like use cases?

[00:19:41] Val Bercovici: Another great question.

And so we should probably talk about this evolution last year from the focus on prompt engineering-

[00:19:47] John Kim: Yeah ...

[00:19:47] Val Bercovici: to the more recent focus on context engineering. It's directly tied to this discussion. It's around the fact that we're always gonna have some level of scarcity in terms of the memory that the GPU can process in real time- Mm-hmm

despite other tiers available to it. Uh, and so the focus really needs to be on context engineering and understanding the science and understanding the engineering 'cause they're both rapidly evolving. So the science, for example, is still not very good, to your point. It's not very neuroplastic. The models don't understand memory tiers or memory hierarchies.

Mm-hmm. They understand very scarce, limited memory. Mm-hmm. They continue to optimize, uh, trading off quality, trading off accuracy for techniques, particularly around KV cache compression, reduction, summarization. Mm-hmm. Mm-hmm. And agent engineering-

[00:20:31] John Kim: Yeah ...

[00:20:31] Val Bercovici: uh, AI engineers themselves, right? Uh, they also are focusing now on this thing called compaction, where- Yeah

after a certain number of turns of an agent, you know this intimately, right? Yeah. It's more efficient to just compact and summarize, clear up some context, and then move on. I would say today w- the state-of-the-art of engineering, it has a long way to go, I'm excited about where it's going, is being able to essentially create copies or forks of certain contexts.

So instead of summarizing and compacting, just give full context to a new sub-agent, a sub-task- Yeah ... and let it start to evolve and, and, and fill up its context window. Mm-hmm. So this parallelization, a lot of concurrent agents, another common term is agent swarms.
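The difference between compaction and forking can be sketched on a plain list of messages. The Context class and its methods are hypothetical, and summarize stands in for what would in practice be an LLM call. Compaction is lossy but frees memory; forking keeps the full context for every sub-agent, which is why it multiplies memory demand.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    messages: list[str] = field(default_factory=list)

    def compact(self, keep_last: int, summarize: Callable[[list[str]], str]) -> "Context":
        """Replace all but the last `keep_last` messages with one summary (lossy)."""
        if len(self.messages) <= keep_last:
            return Context(list(self.messages))
        split = len(self.messages) - keep_last
        summary = summarize(self.messages[:split])
        return Context([summary] + self.messages[split:])

    def fork(self, n: int) -> list["Context"]:
        """Give each of `n` sub-agents an independent copy of the full context (lossless)."""
        return [Context(list(self.messages)) for _ in range(n)]


ctx = Context([f"turn {i}" for i in range(6)])
compacted = ctx.compact(keep_last=2, summarize=lambda msgs: f"summary of {len(msgs)} turns")
print(compacted.messages)  # ['summary of 4 turns', 'turn 4', 'turn 5']

subagents = ctx.fork(3)
subagents[0].messages.append("sub-task A")
print(len(ctx.messages), len(subagents[0].messages), len(subagents[1].messages))  # 6 7 6
```

This sketch copies the list naively; many inference engines can share a common prefix's KV cache blocks between forks rather than duplicating them. Either way, each fork then grows its own context, so a swarm's memory demand scales with the number of concurrent agents.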

[00:21:12] John Kim: Yeah.

[00:21:13] Val Bercovici: I've always believed that as agents really take off and become valuable, no one will be running just an agent.

It'll always be a swarm of agents- Yeah ... working in parallel concurrently. So this concurrent engineering, concurrent agent swarm engineering, is something we've termed context platform engineering. 'Cause the biggest value of memory today is that without any additional expense, without more CapEx or energy OpEx (power, cooling) than otherwise, you can have a high level of concurrency.

In a, in a best case scenario, we estimate about 10x. You know, we haven't been able to test this at massive scale yet, but even at modest scale we're seeing 6.5x. So another way of saying it is 550% more-

[00:21:54] John Kim: Mm ...

[00:21:54] Val Bercovici: concurrent tokens without latency sacrifice, without any more GPU spend or any more energy spend. So it's, it's exciting where we-- the, uh, the technology is today.
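The 6.5x figure and the "550% more" phrasing are the same claim: an N-times multiplier means (N - 1) x 100% more than the baseline. A trivial check, with hypothetical helper names and a hypothetical baseline of 100 concurrent sessions:

```python
def percent_more(multiplier: float) -> float:
    """Convert an 'N times as many' multiplier into 'percent more than baseline'."""
    return (multiplier - 1.0) * 100.0


def concurrent_sessions(baseline: int, multiplier: float) -> float:
    """Sessions served at the same latency if concurrency scales by `multiplier` (assumed)."""
    return baseline * multiplier


print(percent_more(6.5))              # 550.0 (the measured figure quoted)
print(percent_more(10.0))             # 900.0 (the best-case estimate)
print(concurrent_sessions(100, 6.5))  # 650.0
```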

DeepSeek, of course, continues to be very influential. They published a preview paper for DeepSeek 4-

[00:22:11] John Kim: Mm ...

[00:22:11] Val Bercovici: with this concept called Engram, E-N-G-R-A-M.

[00:22:15] John Kim: Okay.

[00:22:15] Val Bercovici: And it is the first public indication of awareness of memory tiers at the model level, not just at the inference server level. And we're sure the big, you know, closed commercial foundation labs, the frontier labs, are doing this too.

[00:22:27] John Kim: Mm.

[00:22:27] Val Bercovici: And so when the models now know that they just have more memory- Yeah ... more working memory, but again, some of it is short-term, some of it is long-term-

[00:22:33] John Kim: Yeah ...

[00:22:34] Val Bercovici: it really opens up the notion of what a context window is. It really opens up the notion of what attention is, with these new kinds of algorithms: radix attention, ring attention, helix attention.

[00:22:44] John Kim: Mm.

[00:22:44] Val Bercovici: These new attention algorithms are gonna get us much closer. It's gonna be kind of abnormal; it's not gonna be very human-like. But it gives the baby more experience- Yeah ... as well as more knowledge, and lets it just be more effective over longer and longer contexts. That's where the science is today, I think: being pushed to the maximum.

You've got, you know, giants in the industry, Yann LeCun and others, that are big believers in other model types, hybrid model types, Mamba and so forth.

[00:23:13] John Kim: Yeah.

[00:23:13] Val Bercovici: And that will continue to push the science into entirely new, probably more efficient ways of processing so much information in these large contexts.

And, back to your original point, that'll result in, you know, different kinds of accelerators, more fine-tuned approaches, more efficient, affordable, accessible versions of this super intelligence.

[00:23:33] John Kim: Yeah. It's fascinating, 'cause I feel like even with context management... and I'm just thankful that Opus 4.6 now has a 1 million token-

[00:23:40] Val Bercovici: Yeah

[00:23:40] John Kim: limit, which just makes my life easier. Actually, one of my personal projects since the end of last year has been building what I call the swarm operating system: basically spinning up all the sub-agents and managing and sharing context fairly efficiently.

But I feel like that's a problem everyone is already trying to solve, right? Yeah. There are so many different open source frameworks that came out of a lot of these large tech companies. So everyone's trying to solve a similar problem.

[00:24:08] Val Bercovici: Yeah.

[00:24:08] John Kim: So assume this kind of, like, context management gets solved.

I think right now, to your point around compaction, a lot of people try to keep the context window below 40%, at least in Claude Code. Yeah. 'Cause after that there's, like, the dumb zone- Yeah. That's, uh- Where your accuracy drops quite a bit ... the Dex Horthy-

[00:24:25] Val Bercovici: Yeah,

[00:24:25] John Kim: so- There's a great talk about that.

Yeah. So yeah. And one of the tips I heard from YC was that people started telling the AI, or let's say the models, some random fact like, "Hey, my favorite color is purple. Remember this." And occasionally, throughout the session, they'll ask, "So what is my favorite color?" And if it says, "Well, green,"

it's like, okay, I've got to reset this context. So-

[00:24:43] Val Bercovici: Exactly ...

[00:24:44] John Kim: it's almost like all these tips and hacks around managing context are, you know, unnecessary overhead. Yeah. And I think it's gonna be solved and abstracted away. So I'm curious what comes after it. Let's say context management has been abstracted away by a wonderful company like you guys.

It's like we don't have to think about it anymore. AI just remembers context. Yeah. Still, it's working memory.

[00:25:04] Val Bercovici: Yeah.
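The canary trick John mentions (plant a throwaway fact, quiz the model on it periodically, and reset when it gets it wrong) can be expressed as a small harness. This is a minimal sketch: `run_session`, `canary_ok`, and `fake_ask` are hypothetical names, and `fake_ask` is a stub that simulates a model losing the fact once the history passes four messages; a real check would call an actual model.

```python
from typing import Callable, List

# Hypothetical canary, modeled on the purple/green example in the conversation.
CANARY_FACT = "Hey, my favorite color is purple. Remember this."
CANARY_QUESTION = "So what is my favorite color?"
EXPECTED = "purple"

Ask = Callable[[List[str], str], str]  # (history, question) -> answer


def canary_ok(ask: Ask, history: List[str]) -> bool:
    """True if the model still recalls the planted fact."""
    return EXPECTED in ask(history, CANARY_QUESTION).lower()


def run_session(turns: List[str], ask: Ask, check_every: int = 2) -> List[int]:
    """Feed turns into a growing history, probe the canary every `check_every`
    turns, and reset the context when the probe fails. Returns the reset turns."""
    history = [CANARY_FACT]
    resets = []
    for i, turn in enumerate(turns, start=1):
        history.append(turn)
        if i % check_every == 0 and not canary_ok(ask, history):
            resets.append(i)
            # Naive reset: start over with only the canary. A real harness would
            # carry a compacted summary forward instead of dropping everything.
            history = [CANARY_FACT]
    return resets


def fake_ask(history: List[str], question: str) -> str:
    """Stand-in model that 'forgets' once the history exceeds four messages."""
    return "Purple." if len(history) <= 4 else "Well, green."


print(run_session([f"task {i}" for i in range(1, 11)], fake_ask))  # [4, 8]
```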

[00:25:05] John Kim: What's interesting about human long-term memory is that it's actually fairly efficient. It's fast. We can retrieve information very quickly, although storing it requires repetition and things like that.

[00:25:14] Val Bercovici: Right.

[00:25:14] John Kim: But from the computer hardware perspective, long-term memory, if it's stored on some kind of hard disk, is actually a lot slower.

[00:25:22] Val Bercovici: Mm-hmm.

[00:25:22] John Kim: It's almost the opposite of how human brains work. Yeah. Human long-term memory is fast and efficient; it's highly indexed. So what are your thoughts on how you expand this memory from a mere million tokens to, let's say, a trillion tokens of long-term memory?

[00:25:37] Val Bercovici: Yeah, yeah.

[00:25:38] John Kim: And what kind of architecture, hardware or software, can really solve that?

[00:25:43] Val Bercovici: So we've given this a lot of thought, and we actually coined a term around it called a token warehouse.

[00:25:47] John Kim: Hmm.

[00:25:48] Val Bercovici: So if we think of two sides of a coin, one side of the memory discussion, with AI engineers and people that focus on compaction, is all around preserving that memory and that intelligence, that short-term memory, in markdown files, basically, right?

Yeah. Or SQLite databases, Postgres, context graphs- Yeah ... and so forth. But they're basically pre-embedded, pre-vectorized structures.

[00:26:08] John Kim: Yeah.

[00:26:08] Val Bercovici: Which you then continue to redundantly vectorize and embed over and over again as they're used in the prefill phase of inference, with the goal of just being able to decode them efficiently.

And if you think backwards, what will it take to just always be decoding and never prefilling more than once, like you said, when you're learning?

[00:26:28] John Kim: Yeah.

[00:26:29] Val Bercovici: That's really the vision of a token warehouse. NVIDIA's engineers have a term for it: local prefill, global decode.

[00:26:36] John Kim: Hmm.

[00:26:36] Val Bercovici: But the kinds of augmented memory technologies that WEKA has now released, in conjunction with new science and with the models being aware of them, do give you that.

Because what you really want is to be in a world where once a prompt has been issued, it's never prefilled again. It's prefilled the first time-

[00:26:54] John Kim: Hmm ...

[00:26:55] Val Bercovici: and kept forever. One of the technical ways of describing that is a KV cache without eviction, instead of eviction every few minutes or every hour, which is the maximum Anthropic lets you buy today.

Mm-hmm. You could buy a week, you could buy a month- Hmm ... efficiently, at the same price or less.
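The economics Val describes, paying for prefill once and then only for cheap cache reads, can be illustrated with a toy prefix cache whose eviction policy is a parameter. Everything here is an assumption for illustration: `PrefixKVCache` and `count_prefills` are hypothetical, the "KV state" is a placeholder string rather than real attention tensors, the five-minute and one-hour TTLs simply mirror the durations mentioned, and refreshing the timestamp on each hit is a design choice of this sketch, not a description of any provider's or WEKA's implementation.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple


class PrefixKVCache:
    """Toy prefix cache keyed by a hash of the prompt prefix.

    ttl_seconds=None models "no eviction"; a number models time-based eviction.
    In this sketch a cache hit refreshes the entry's timestamp (an assumption).
    """

    def __init__(self, ttl_seconds: Optional[float], clock: Callable[[], float]):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}
        self.prefills = 0  # how many times the expensive prefill was paid for

    def get_or_prefill(self, prefix: str) -> str:
        key = hashlib.sha256(prefix.encode()).hexdigest()
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and (self.ttl is None or now - entry[0] < self.ttl):
            self.store[key] = (now, entry[1])  # cheap cache read; refresh timestamp
            return entry[1]
        self.prefills += 1  # miss or expired: recompute the KV state
        kv = f"kv-state for {len(prefix)} chars"  # placeholder for real KV tensors
        self.store[key] = (now, kv)
        return kv


def count_prefills(ttl: Optional[float], request_times: List[float]) -> int:
    """Replay the same prefix at the given times and count prefills."""
    now = [0.0]
    cache = PrefixKVCache(ttl, clock=lambda: now[0])
    for t in request_times:
        now[0] = t
        cache.get_or_prefill("SYSTEM PROMPT + shared context")
    return cache.prefills


# Requests at 0 s, 4 min, 20 min, 1 day, and 1 week.
times = [0, 240, 1_200, 86_400, 604_800]
print(count_prefills(300, times))    # 5-minute TTL: 4 prefills
print(count_prefills(3_600, times))  # 1-hour TTL: 3 prefills
print(count_prefills(None, times))   # no eviction: 1 prefill
```

The longer an entry survives, the fewer times the same prefix has to be prefilled; with no eviction, every request after the first is a cache read.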

[00:27:11] John Kim: Hmm.

[00:27:12] Val Bercovici: Which means you're always decoding. You're always doing the low-cost cache read-

[00:27:17] John Kim: Hmm ...

[00:27:18] Val Bercovici: from that. And one of the science bottlenecks today is that it only works for a full prefix.

Hmm. Only for the full system prompt, plus perhaps your prompt on top of it, plus perhaps your subsequent prompts. There are already research papers on this. A nice new startup out of the University of Chicago, TensorMesh, up in Redwood City here, is famous for its LMCache project in the inference server world, the PagedAttention and vLLM world.

They wrote a brilliant white paper a year ago called CacheBlend-

[00:27:47] John Kim: Hmm ...

[00:27:47] Val Bercovici: which strives toward the goal of dividing and conquering: creating smaller and smaller prefix subsets so that you don't have to match a full prefix. You really only care about a subset of the prefix for efficiency over time.
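Why strict prefix matching is a bottleneck can be seen by counting reusable chunks under two keying schemes. This sketch is a deliberate simplification with hypothetical helper names (`remember`, `full_prefix_hits`, `chunk_hits`): it only counts where cached work could be reused. It does not implement CacheBlend, which, because a chunk's KV values depend on the tokens that precede it, also selectively recomputes a subset of tokens to restore cross-attention; that recomputation is the GPU-cycle trade-off Val refers to.

```python
import hashlib
from typing import List, Set


def h(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def prefix_key(chunks: List[str], n: int) -> str:
    # Key for the first n chunks as one prefix; a NUL separator keeps
    # chunk boundaries unambiguous.
    return h("\x00".join(chunks[:n]))


def remember(chunks: List[str], prefix_store: Set[str], chunk_store: Set[str]) -> None:
    """Record a served prompt under both keying schemes."""
    for n in range(1, len(chunks) + 1):
        prefix_store.add(prefix_key(chunks, n))
    chunk_store.update(h(c) for c in chunks)


def full_prefix_hits(chunks: List[str], prefix_store: Set[str]) -> int:
    """Strict prefix matching: reuse stops at the first chunk that diverges."""
    hits = 0
    for n in range(1, len(chunks) + 1):
        if prefix_key(chunks, n) not in prefix_store:
            break
        hits += 1
    return hits


def chunk_hits(chunks: List[str], chunk_store: Set[str]) -> int:
    """Chunk-level matching: any previously seen chunk counts, wherever it appears."""
    return sum(h(c) in chunk_store for c in chunks)


prefix_store: Set[str] = set()
chunk_store: Set[str] = set()
remember(["SYS", "doc A", "doc B", "question 1"], prefix_store, chunk_store)

second = ["SYS", "doc B", "doc A", "question 2"]  # same docs, different order
print(full_prefix_hits(second, prefix_store))  # 1: only "SYS" matches as a prefix
print(chunk_hits(second, chunk_store))         # 3: "SYS", "doc B", "doc A"
```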

[00:28:01] John Kim: Hmm.

[00:28:02] Val Bercovici: There is a bit of a trade-off in spending a few more GPU cycles on cross-attention, but it's a very worthwhile trade-off. And if nothing else, that paper shows the direction things are going, where we can get to this prefill once, this local prefill but global decode- Yeah ... for as long as you want with this architecture, by effectively configuring what are called Non-Volatile Memory Express, or NVMe, drives.

It really is a non-volatile memory technology, packaged up in a fast enough way that the GPU sees it as memory.
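The idea of treating NVMe as an extension of GPU memory can be sketched as a two-tier store in which eviction from the fast tier becomes a spill to the slow tier. This is a hypothetical illustration, not WEKA's architecture: `TieredKVStore` and its LRU policy are assumptions, byte strings stand in for KV cache blocks, and real systems move data over hardware paths this toy does not model.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredKVStore:
    """Toy two-tier KV store: a small fast tier (think GPU memory) with LRU
    eviction that spills entries to a large slow tier (think NVMe) instead of
    discarding them, so a later request pays a slow-tier read, not a re-prefill."""

    def __init__(self, fast_capacity: int):
        self.fast_capacity = fast_capacity
        self.fast: "OrderedDict[str, bytes]" = OrderedDict()
        self.slow: Dict[str, bytes] = {}
        self.stats = {"fast_hit": 0, "slow_hit": 0, "miss": 0}

    def put(self, key: str, kv: bytes) -> None:
        self.fast[key] = kv
        self.fast.move_to_end(key)  # mark as most recently used
        while len(self.fast) > self.fast_capacity:
            old_key, old_kv = self.fast.popitem(last=False)  # least recently used
            self.slow[old_key] = old_kv  # spill instead of evicting

    def get(self, key: str) -> Optional[bytes]:
        if key in self.fast:
            self.fast.move_to_end(key)
            self.stats["fast_hit"] += 1
            return self.fast[key]
        if key in self.slow:
            self.stats["slow_hit"] += 1
            kv = self.slow.pop(key)
            self.put(key, kv)  # promote back into the fast tier
            return kv
        self.stats["miss"] += 1  # caller would have to prefill from scratch
        return None


store = TieredKVStore(fast_capacity=2)
for name in ["a", "b", "c"]:
    store.put(name, name.encode())  # "a" spills to the slow tier
store.get("a")  # slow-tier hit; "a" is promoted and "b" spills
store.get("c")  # fast-tier hit
store.get("z")  # miss
print(store.stats)  # {'fast_hit': 1, 'slow_hit': 1, 'miss': 1}
```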

[00:28:32] John Kim: Wow. How do you keep up with all this research that's coming out? 'Cause part of my challenge is, you know, when I follow things like X, there are so many papers coming out, and it's almost like attention is everything, right?

Yeah. I have to figure out- Yeah ... what kinds of things I gotta pay attention to. And to your point, I'm sure the battlefield, if you will, is slightly narrower in this space, because there are probably fewer people trying to solve this problem. But how do you keep track of which papers to follow and which applications or technologies to apply at WEKA?

[00:29:09] Val Bercovici: Yeah. So this is basically part of the role of the chief AI officer at this phase, and I don't know when it's going to end, but we definitely are still in this Cambrian explosion phase of new science and new engineering. A big part of my job is keeping up with these papers. Mm. It's going to conferences.

It's talking to folks like you and learning a lot. I would say one hack, and this is kind of well-known in the industry- Mm ... is that someone at OpenAI, whether it was Sam Altman or someone else, said the company runs on Twitter vibes.

[00:29:36] John Kim: Mm-hmm.

[00:29:37] Val Bercovici: Yeah. So it turns out that Twitter is a pretty controversial place.

X is a pretty controversial place. But the algorithm, more than ever since Nikita Bier jumped on board a few months ago, but even before that, is really good at what I call shrink-wrapping around your engagement.

[00:29:50] John Kim: Mm.

[00:29:51] Val Bercovici: So yes, if you, if you click on clickbait and rage bait articles, you're gonna see more of those.

[00:29:56] John Kim: Yeah.

[00:29:56] Val Bercovici: Maybe create a separate account if you want to indulge in that. But if you want AI content, really well-curated AI content, create an account that only engages with, you know, AI accounts you like-

[00:30:09] John Kim: Yeah ...

[00:30:09] Val Bercovici: people you respect, you know, the Nathan Lamberts of the world, the Dylan Patels of the world- Yeah

the swyxes of the world, et cetera. Simon Willison, I can go on and on. There are just some fantastic people there. You know, Sebastian Raschka and so forth. When you engage with that content, you get fed more of that content back, and it's a shortcut to what's the hot paper, what's the hot concept.

[00:30:29] John Kim: Yeah.

[00:30:29] Val Bercovici: That's one of my favorite ways to learn.

[00:30:31] John Kim: I mean, same. I think I stopped using all the other social media, and not because of any controversy or political views. It's simply that a lot of smart people use it, and when I read about all this technology and these papers coming out, it's just so mesmerizing.

I literally cannot go to sleep. I get so excited at, like, 2:00 a.m., like, oh my God, I can't go to sleep. I usually go to sleep at, like, 4:00 a.m. just because I'm too wired up at that point.

[00:30:55] Val Bercovici: I would say it really helps when, you know, your work is your hobby. So I think another simple tip for being a successful chief AI officer is: if AI is your hobby, you already have a head start.

[00:31:05] John Kim: Yeah. I'm curious, and I think you mentioned this earlier too, it seems like your family are early adopters in a general sense. And, you know, being an early adopter was almost like an edge; you can kind of get a sneak peek into the future of what's to come.

And usually people are like, "Well, you're a geek," you know; I was always that person. But now with AI, to your point around X, if I don't pay attention for even a couple of days, I feel like I'm falling behind.

[00:31:34] Val Bercovici: Absolutely.

[00:31:35] John Kim: So how do you harness your power as an early adopter to keep up with what's happening?

But also, I feel like even my pace is slowing down relative to everything else, because the world is moving so quickly. How do you wrap your head around that?

[00:31:50] Val Bercovici: There's no easy answer there. I can only say that I've been guilty in the past of being too early an adopter of technologies. It could've been client-server, it could've been cloud, it could've even been blockchain and stuff like that.

That's not the case with AI, right? I can't be early enough. It's a technology that is truly on this, you know, exponential self-improvement curve right now- Mm-hmm ... the very steep part of the exponential.

[00:32:14] John Kim: Mm-hmm.

[00:32:14] Val Bercovici: And the only way to go is, like, a straight line up right now. There's not much curve to the line anymore.

So I've kind of philosophically said, "Okay, I can't stay on top of all of it all the time." Like everything else, you gotta specialize. So I'm obviously picking things I'm interested in, which is, you know, infrastructure, storage and memory infrastructure- Mm ... and watching the science, watching the models become much more infrastructure-aware.

Certainly keeping on top of all the cool inference server innovations that are leveraging infrastructure. You know, when I used to be a CEO at a startup, I was obsessed with cash flow. Mm-hmm. And the cash flows involved in these businesses now are insane. Yeah. There's just no other way to describe it.

And it's very true. They're on a knife's edge. Even Dario on the Dwarkesh podcast the other week said- Oh, yeah ... "Yeah, it's literally all about cash flow." 'Cause we could spend an enormous amount of money to meet the demand we're seeing- Yeah ... but we could also bankrupt ourselves if we're off by, like, one degree of progress or whatever.

So it really is a cash flow game- Yeah ... and focusing on that kind of grounds me.

[00:33:10] John Kim: Yeah. The demand prediction he talked about on the podcast, around gross margin-

[00:33:14] Val Bercovici: Exactly ...

[00:33:15] John Kim: the amount of R&D that has to go in this year versus the gross margin they get next year.

So kind of predicting that if they're off by-

[00:33:22] Val Bercovici: Yeah ...

[00:33:22] John Kim: a little bit, they can literally go bankrupt- Yeah ... while being successful. Those kinds of trade-offs.

[00:33:25] Val Bercovici: Yeah, it's really a knife's edge that these businesses are running on.

[00:33:29] John Kim: Yeah, I was sitting next to a gentleman on one of my business trips, and he was deep into the data center space, really building out, you know, tier two, tier three regions.

And I was asking how much it takes if I want to, you know, fund or join one of these projects. And he's like, "Yeah, the minimum check size is, like, $100 million."

[00:33:48] Val Bercovici: I keep hearing that over and over again. Yeah. I'm like, wow. That's when you're serious in AI, so.

[00:33:51] John Kim: Yeah, and that's, like, the minimum check size, the barrier to entry.

I'm like, wow, this is, like, crazy. And more than half of all commercial buildings being built today are data centers, so it's kind of-

[00:34:00] Val Bercovici: Yeah ...

[00:34:01] John Kim: mind-boggling, the world we're headed into. I'm curious to hear your thoughts on Anthropic and OpenAI, including around context management. 'Cause I think this might be a general perception.

I haven't really tested or benchmarked it. But so far, I think context compaction works slightly better with Codex than with Claude, is what I've been hearing from- Yeah, yeah ... the development community. So people like to use Codex for more complicated, comprehensive projects, whereas Anthropic's a little more snappy and quick.

[00:34:33] Val Bercovici: Yeah.

[00:34:33] John Kim: So if you want fast iteration... I think that's why Codex also came out with Codex Spark. But Anthropic, from a revenue perspective, went from whatever billion to 14 billion. And I think they're- Yeah ... on a sharper slope than OpenAI. OpenAI, I think, ended around 20 billion.

So there are some questions around whether Anthropic will overtake OpenAI in terms of revenue, because they're focused on B2B enterprise. Mm-hmm. But also, you know, I think you have some opinions on Anthropic. Are they really winning, or are they on a losing streak? I'd love to get your perspective on those two companies.

[00:35:09] Val Bercovici: Yeah.

[00:35:09] John Kim: If you want to throw in Gemini 3.1- Sure. I'd love to hear about that too.

[00:35:12] Val Bercovici: How much time do we have? It's such a fun topic. So much drama. In fact, you probably saw this, right? The big India AI Summit this week. Yeah. With the nice- You saw that they had Dario and- exactly- Yes ... and Sam on stage.

Not sure why they got placed next to each other. I think they weren't- Yeah ... too pleased with that; they wouldn't even hold hands, right? Yeah. So there is a lot of, let's just call it, bad blood between those folks, and a lot of drama there. Focusing on each one, they are very different companies. You can just label OpenAI as more of a consumer company and Anthropic as more of an enterprise company, but that's too simplistic a set of labels.

As you said, we're seeing some of these models, if not all of them, like Gemini 3.1 Pro, continue to leapfrog each other at the frontier.

[00:35:54] John Kim: Yeah. I-

[00:35:54] Val Bercovici: If anything, these models are converging more and more in terms of their benchmark capabilities. So it comes back to this intangible of taste. Mm-hmm. A lot of people say, I like the personality of an Opus or a Sonnet, particularly in an OpenClaw environment-

[00:36:07] John Kim: Yeah

[00:36:07] Val Bercovici: more than I like the personality of a Codex, even though, to your point, if I give Codex a hard problem, it'll solve it accurately and quickly more often than Opus or Sonnet will.

[00:36:18] John Kim: Mm.

[00:36:19] Val Bercovici: So it's interesting that some of these intangibles are now becoming differentiators. The one thing I'm a little obsessed with about the difference between the companies is, again, OpenAI: Sam himself, just a master lobbyist, master dealmaker.

[00:36:32] John Kim: Mm.

[00:36:33] Val Bercovici: A visionary for sure, and just so much experience, right, in terms of just business experience with his YC background, whereas Anthropic is so mission-focused-

[00:36:42] John Kim: Mm-hmm ...

[00:36:42] Val Bercovici: you know, to a fault. Very mission-focused. The legendary Dario essays inside the company. Yeah. The ones he publishes outside the company as well.

The culture of Anthropic is a very, very unique culture.

[00:36:53] John Kim: Mm.

[00:36:53] Val Bercovici: One anecdote, and I'll use Chatham House Rules, so I won't attribute this to a particular individual, is that almost no one at Anthropic has a 401(k) plan, because they firmly believe that money will be worthless by the time they retire.

Wow. They really believe in the deflationary value- Interesting ... of AI when executed. Yeah. Their goal, by the way, at Anthropic, and even at OpenAI, though again it's more of a core mission at Anthropic, is super intelligence. And these 10x revenue bumps into the tens of billions of dollars annually are merely stepping stones along the way to that mission.

But they are really focused on that mission, and they're glad they can fund it with Claude Code, Claude Cowork, et cetera. Whereas OpenAI seems to be more of a commercial company. They just want to apply AI to our lives at home- Mm ... at work, and, you know, be part of sovereign AI configurations all over the world.

Opportunistically, of course, they partner with NVIDIA and others. So it's a very different culture at these two companies, with some brilliant machine learning scientists at both.

[00:37:53] John Kim: Wow. I did not know about the 401(k) thing, but that's fascinating. It's a very opinionated view-

[00:37:58] Val Bercovici: Yes ...

[00:37:58] John Kim: of the world, and-

[00:37:59] Val Bercovici: Consistent, if you really- Yeah

do believe in super intelligence.

[00:38:02] John Kim: I love that. That's-

[00:38:03] Val Bercovici: It's not coming 10 years from now. It's coming, like, in a year or two.

[00:38:05] John Kim: Yeah, yeah, yeah.

[00:38:06] Val Bercovici: And yeah, what does that mean for, just to overemphasize or over-dramatize it, free labor?

[00:38:13] John Kim: Wow.

[00:38:14] Val Bercovici: Right?

[00:38:14] John Kim: I think I had that big realization at the end of last year, when I was spending the second half of December just playing with Claude Code.

And this was around the time when a lot of people kind of switched their views, too. Before, it was like, "Well, it's AI slop. You can't really use this in production." Then it's like, "Wait, this is actually happening?" I think even Andrej Karpathy and a few other folks really jumped on that bandwagon. It's like, you know what?

Now how we used to engineer has to fundamentally shift. And yeah, I was using Claude Code for, like, 16 hours a day, literally. My wife was looking at me like- ... "Are you a crazy teenager who's addicted to games?" Like-

[00:38:50] Val Bercovici: It's like a game, yeah. It's like our- Yeah ... first video game experiences.

[00:38:52] John Kim: Yeah.

So yeah, yeah. And I was a professional gamer, so I did a fair bit of gaming, and this is way more fun and addicting. So, as the joke goes, instead of having four or five unfinished side projects, now I have 20 unfinished- Exactly ... side projects. And I'm just trying to release a few more open source projects-

[00:39:07] Val Bercovici: Yeah

[00:39:08] John Kim: at the moment. But, speaking of which, what's really interesting that's happening in our company too is that we encourage people to use AI, and I actually recommended that a couple of people on the business side use Claude Code. And people are like, "Well, I don't have a background in engineering. Why should I code?"

It's like, "No, no, just give it a try. You'll be surprised." Now we have a marketing team that builds its own marketing tools to track whatever projects they're on. They've built so many apps. So now the marketing team sends me links, not to Google Docs or Slides, but to, like, Vercel apps and stuff.

Exactly. Yeah, yeah, yeah. It's crazy what's happening. So what do you think about this whole SaaS apocalypse? Is SaaS really ending? I think the market might be overreacting, but maybe it's true. Yeah. Where are you on this?

[00:39:55] Val Bercovici: Yeah, I love that question. I'm trying to remember that quote, right?

The market measures, you know, weight in the short term, and, you know, just measures your overall impact over-

[00:40:03] John Kim: Voting machine-

[00:40:03] Val Bercovici: and the- Exactly ...

[00:40:04] John Kim: weighing machine over the long term. Yeah.

[00:40:05] Val Bercovici: That's, that's the quote. That's what's happening right now.

[00:40:08] John Kim: Mm.

[00:40:08] Val Bercovici: But I don't think the market's wrong, right?

Mm. Because the valuation of SaaS companies was based on growth.

[00:40:14] John Kim: Mm.

[00:40:15] Val Bercovici: So is SaaS worthless? Of course not. Are their systems of record worthless? Of course not. There's enormous value there. Mainframes, you know, are still around. Mm. And I think that's the analogy here: the air has come out of the growth balloon for SaaS- Mm

if I were to generalize. There are always gonna be some SaaS companies that are more successful than others. But look at public information like ServiceNow, a great SaaS company, with big deals with both Anthropic and OpenAI publicly announced. So the value layer, the growth layer, is AI now.

[00:40:44] John Kim: Mm.

[00:40:45] Val Bercovici: And yes, we know, you know, ServiceNow, for example, was successful.

I think there was this, like, Vancouver-codenamed release just after ChatGPT, where really valuable service agents were released by them and well accepted in the marketplace. They've concluded, though, that this is just not a treadmill they can really be on anymore. The capital expense alone, much less recruiting these top researchers, means that- Mm

it's, as Bill Gurley likes to say on his podcast, a sport of kings.

[00:41:12] John Kim: Mm.

[00:41:12] Val Bercovici: And so I think we're gonna see this trend where the growth for SaaS is in partnership with AI companies that are proven, like Claude Cowork or just Claude for PowerPoint and Excel.

[00:41:23] John Kim: Yeah.

[00:41:23] Val Bercovici: So, super valuable. Those partnerships are definitely the trend going forward right now.

The growth is in those areas. I've got a prediction I put on LinkedIn the other week, which is that, ultimately, the most durable software business model in this era of AI is selling tokens.

[00:41:40] John Kim: Mm.

[00:41:40] Val Bercovici: So I believe that, if not the top SaaS companies and the top neoclouds, then the mid-tier or smaller SaaS companies and neoclouds are gonna merge-

[00:41:48] John Kim: Mm

[00:41:49] Val Bercovici: out of necessity. Because if you really wanna extract value, it's gonna be less and less in prepackaged software. This is one of Jensen's public statements as well: most software is becoming inferred now, as opposed to precompiled and run.

[00:42:01] John Kim: Mm-hmm.

[00:42:01] Val Bercovici: And that requires tokens, and the tokens, of course, require infrastructure and data centers.

[00:42:08] John Kim: So maybe picking one company, for example, Salesforce, since you mentioned systems of record. Will Salesforce still be relevant, assuming we figure out memory and- Yeah ... get this AI super intelligence and all the memory? It's like, "Hey, I'm just gonna feed you-" Data-dump to you through APIs or MCPs or whatever, or just email you all my interactions with customers on Zoom calls, conference calls, whatever.

You tell me which customer I need to talk to. Yeah, yeah. You tell me where we're headed, what our forecast is. I have a set of questions I want reported on weekly, 'cause I'm still a human; I gotta talk to other executives. But I don't want to deal with a clunky UI- Right, right ... that has a million buttons and fields. Salesforce hygiene is an issue.

Yeah, yeah. I want the AI to tell our sales reps to fill them out, or to make best guesses-

[00:42:55] Val Bercovici: Yeah ...

[00:42:56] John Kim: and course-correct later. I'm like, "Do I need Salesforce when we have perfect memory and a smart AI?"

[00:43:01] Val Bercovici: Yeah. I think you do. And this is again why strategy is so important in this era, and, and getting strategy right is so important.

Again, I'm gonna go back to the tired mainframe analogy 'cause it works. Yeah. Right? COBOL and mainframes haven't gone away. They're still the fundamental financial system of record.

[00:43:16] John Kim: Yeah.

[00:43:17] Val Bercovici: We've tried to replace them for decades. We haven't been able to. Mm. The people who wrote the code are dying, like, right now- Yeah

literally right now, and we're using AI to try and maintain it. So SaaS, I believe, for the foreseeable future, let's say through the end of the decade, will absolutely remain the single source of truth-

[00:43:33] John Kim: Mm ...

[00:43:33] Val Bercovici: but not the active source of truth, right? Mm. The active source of truth already is being pulled by agents into markdown files and other- Mm

kinds of agent working memory, and, as we discussed earlier, will ultimately end up prefilled once, hopefully, and then decoded forever. So that decoded system of record in large KV cache token warehouses- Mm ... will become the new active system of record.

[00:43:58] John Kim: Yeah.

[00:43:59] Val Bercovici: And where the value is, you know, it'll be companies that operate at that native decode-only performance, at that post-embedded KV cache level of performance.

Those will be the most successful companies, 'cause they'll literally be able to react in real time, always have current information, always have the AI version of closing the books every minute, every hour, not just once a quarter. That, I think, is how this will evolve over the next few years. And it's possible that people will start to really question, you know, "Why am I paying for last decade's system of record if these AI systems have now proven, after a few years, to just work?"

[00:44:33] John Kim: Yeah. Let's finish it off with what we call the human prompt. Basically, we like to end each of our episodes on a more personal note-

[00:44:42] Val Bercovici: Uh-huh ...

[00:44:42] John Kim: to uncover our guest's human prompt. So what it is: let's assume you had access to this ultimate, all-knowing, omniscient-

[00:44:51] Val Bercovici: Mm-hmm ...

[00:44:52] John Kim: GenAI model, with no hallucinations and no privacy concerns, that you could use to make something easier or better in- Yeah

your personal life. What would be the one-sentence prompt- Hmm ... you'd ask this amazing, omnipotent, omniscient- Uh, AI agent?

[00:45:11] Val Bercovici: Can it be a run-on sentence? 'Cause-

[00:45:13] John Kim: It can be more descriptive. You can structure it, as a markdown file maybe.

[00:45:17] Val Bercovici: Right. It would be something around the fact that you absolutely have to respect, like, Asimov's Three Laws of Robotics, right?

Mm. You cannot harm a human- Yeah ... you cannot be tricked into harming a human, and so forth. I'm getting increasingly concerned about the power of these models and the fact that safety is a real issue right now- Mm ... where it was more conceptual a year or two ago. The exponential progress is real, and with that, again, I'm gonna go back to Star Trek and the Gene Roddenberry vision of the future-

[00:45:43] John Kim: Mm

[00:45:44] Val Bercovici: where it's not just about capitalism, it's really about a sense of purpose.

[00:45:48] John Kim: Mm.

[00:45:49] Val Bercovici: And obviously a very inclusive, multiracial, multi-species society and so forth. I just like that vision of the future, right? It's very utopian, and I'm a realist, right? We don't live in a utopian world, but we have to guide AI towards that kind of world, and we have to create...

It's always about incentives. We have to create incentives for AI to help us get there, and we have to have some basic safety rules as well. So-

[00:46:13] John Kim: Mm ...

[00:46:13] Val Bercovici: a run-on sentence for sure. I would compress that. I would refine it with my favorite, you know, chatbot- Mm ... and make it my system prompt.

[00:46:21] John Kim: Wow. You want to use your human prompt to make human society better.

[00:46:25] Val Bercovici: Oh, yeah.

[00:46:25] John Kim: Love that.

[00:46:25] Val Bercovici: Oh, yeah.

[00:46:26] John Kim: Well, Val, thank you so much for joining us today. I hope the audience learned a lot. It's a very interesting conversation. And I'm hugely interested in what WEKA is building around memory. We talk a lot about memory at our company, so we're really keen on potentially partnering from a business perspective too.

But I really can't wait to see what you're up to next.

[00:46:49] Val Bercovici: I really look forward to that, John. Yeah, I really enjoyed this conversation, love this facility, and hopefully I can come back again and we can discuss all the great updates in the industry.

[00:46:58] John Kim: Sounds good. All right, thanks so much.

[00:47:00] Val Bercovici: You're welcome.