Microsoft Teams Insider

AV, AI and Meeting Equity - Multi-Camera, Multi-Agent, Multi-AI with Bo Pintea

Tom Arbuthnot

Bo Pintea from Huddly talks about the evolution of multi-camera setups and their role in creating more immersive and equitable meeting experiences. 

Bo shares insights from his Microsoft journey and current role at Huddly, covering:

  • Why IP-based AV and edge AI are critical for scalable multi-camera deployments
  • The science behind camera angles, spatial reasoning, and reducing hyper-gaze fatigue
  • How AI agents are reshaping meeting dynamics—from intelligent shot selection to real-time transcription
  • The future of corporate knowledge, digital twins, and who owns the data we generate
  • Why SMBs might leapfrog larger enterprises in adopting AI-first business models

Bo also reflects on the shift from human-centric UC to AI-augmented collaboration, and what it means for the next generation of workplace tools.

Thanks to AVI-SPL, this episode’s sponsor, for their continued support. 

Tom Arbuthnot: Hi, and welcome back to the Teams Insider Podcast. This week we are getting into all things multi-camera: not just the technology, but the experience they're looking to achieve, and the science behind multiple cameras, different angles, AI direction. Really interesting conversation with Bo.

Thanks so much to him for coming on the podcast. We get into his Microsoft history and a little bit of what he's doing at Huddly now. Also, many thanks to AVI-SPL, who are the sponsor of this podcast. Really appreciate their support of everything we're doing at Empowering.Cloud. And with that, on with the show.

Hi everybody. Welcome back to the podcast. Excited to finally get Bo on the podcast. I can't believe this is your first one, Bo. We've had so many good conversations at the shows. We're finally gonna get you, uh, on the record. So thanks so much for joining.

Bo Pintea: Yeah, great to be here, Tom. I know we had many great conversations on the sidelines of different events, and now we are in the comfort of our respective home offices, with all the gear that anybody out there would want.

So let's see how this works. 

Tom Arbuthnot: Yeah. Hopefully it should come together. So, for those that don't know you, they might know you from your Microsoft days. Maybe you can give us a bit of background on what you did at Microsoft and what you're doing these days as well.

Bo Pintea: Yeah. You know, once you cross kind of the 25-year mark, you start reminiscing about the good old days. So that puts me back in the last century, literally, when I joined Microsoft Exchange in 1998. So everything was happening on premises. I was responsible for some of the email protocols, and then I ended up in what became the Windows RTC server.

It was Exchange RTC server, then the Windows RTC server, then the Office Communications Server, then the Live Communications Server, and I might have gotten those two in the reverse order. And eventually Lync. And Lync was truly when we decided that it was time for us to make a play for the Holy Grail at the time, which was IP telephony.

So it was the first time we had the ability to break out from the peer-to-peer, you know, Windows-to-Windows communications, and break out to telephony. I remember because, the first time, we made an assumption that maybe people would not need phones on their desk.

Uh, but we were quickly, uh...

Tom Arbuthnot: Jamie's handle on Twitter, I think, is still No More Phones, which makes me laugh.

Bo Pintea: Yeah, exactly. You cannot find one nowadays. Well, you can, but most people, I think, just prefer to do it on their desktops. And primarily I attribute that to the fact that video has become such an important component of communications: being able to establish gaze, interaction, and be able to understand all the nonverbal cues.

Those are super important. 68% of all the information that you and I transact right now is nonverbal in nature.

Tom Arbuthnot: Awesome. So you went through the Microsoft journey that a lot of people on this pod did, and these days you are at Huddly. So what's the role there?

Bo Pintea: Yeah, so I joined Huddly almost two and a half years ago; in May it was two years.

And my role is quite broad. Obviously I'm still in Seattle, still in Redmond, so working a lot with Microsoft and with folks on the west coast, including Zoom, on defining what the future path is for our industry. And I joined Huddly because I had known the founders for a long time, since actually before I had come back for my second tour of duty at Microsoft, when I came back to Skype.

And I just have a strong alignment with the vision that Stein Ove and the company put together around multi-camera devices, multi-camera assistance, which is something that surely we'll get to talk more about.

Tom Arbuthnot: Yeah, Huddly was kind of, um, ahead of the game in that conversation of camera-based intelligence.

I remember, I can't remember what it was, but I remember years ago Frazier coming into one of the offices I worked at, like, look, we can do this, we can do this. And it's just developed from there. So yeah, I guess you are the perfect person to talk to for a perspective on multi-camera. And we've had a few pods where we've talked about this, and there's an interesting conversation about the kind of center-of-desk type devices.

There are a few in play now, and the multi-camera, outside-in approach. So I'd love you to give a perspective on that conversation.

Bo Pintea: Right. Perfect. So first of all, let me take a step back and say that while most of the focus these days is on multi-camera setups, there are actually three different trends unfolding in parallel, which are all relevant to this conversation.

And the first one is IP, right? We talked about IP telephony. For a variety of reasons, most cameras these days still connect via USB to the compute, right, to the codec, to the controller, whatever you want to call it. So I think it's super important, and if I'm looking into my crystal ball, 2027, 2028 onwards, I believe all these audio-video nodes, because it's not ultimately just about video...

I think we are talking about multi-camera solutions today. Next year we're going to be talking about multi audio-video perceptual nodes, whatever the term ends up being, where you capture both the audio and the video, and also play back the audio, around the meeting space, right?

You want to kind of fill it in on the perimeter. So IP is super important. Secondly, edge AI is super important, because not all multi-camera systems are made the same. There are two different schools of thought there as well. One of them, which seems to be prevalent but is not ours, is that all the video streams should be sent to the MTR or some compute for pre-processing, and then it's the compute that decides how to composite the view.

Our belief is that data should be processed as close to where it's being captured as possible, specifically on the camera, the audio-video node, at the edge. Because, and this speaks to scalability, right, if I want to have five, six, seven, ten cameras in a room, I cannot send ten 6K or 8K streams to any compute; it will just drown in the volume of data.
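
To put rough numbers on that scalability point, here is a back-of-envelope sketch. The resolutions, bitrates, and camera count below are illustrative assumptions, not Huddly specifications; the point is only the order-of-magnitude gap between shipping raw streams to a central compute and shipping edge-processed results over IP.

```python
# Back-of-envelope sketch: raw multi-camera streams to a central compute
# versus edge-processed streams over IP. All numbers are illustrative
# assumptions, not Huddly specs.

def raw_stream_gbps(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * fps * bits_per_pixel / 1e9

CAMERAS = 10                 # hypothetical large room
W, H, FPS = 3840, 2160, 30   # assume roughly 4K capture per camera

per_camera = raw_stream_gbps(W, H, FPS)
print(f"one raw stream:       {per_camera:5.1f} Gbit/s")
print(f"{CAMERAS} raw streams:      {per_camera * CAMERAS:5.1f} Gbit/s")  # far beyond USB or 1 GbE

# Edge processing: each camera runs its models locally and ships only an
# encoded candidate shot plus lightweight metadata (assumed sizes).
encoded_stream_mbps = 4      # assumed encoded bitrate per candidate shot
metadata_kbps = 50           # assumed bounding boxes / pose / gaze data
edge_total_mbps = CAMERAS * (encoded_stream_mbps + metadata_kbps / 1000)
print(f"edge-processed total: ~{edge_total_mbps:.0f} Mbit/s over IP")
```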

So first IP, second edge AI, and third is outside-in versus inside-out, right? And obviously we have some very strong opinions, not necessarily about what the correct architecture is, but about what architecture works best with the way people conduct their business. So we take our cues from how humans behave in a meeting space.

We also take our cues from how experiences are produced in adjacent sectors, like TV broadcast. What do those TV broadcasters do to retain the attention of their audiences? Because that's ultimately what it's about, right?

Tom Arbuthnot: Yeah, that engagement conversation is really interesting, isn't it?

Like, why can't I watch a debate on TV and feel super engaged, and yet it's the same for people on a Teams meeting, but it feels different? Can we get some of that spark of engagement we get in broadcast? Interesting.

Bo Pintea: Exactly. And I think, you know, look at all the discussion these days about agents, right? So you have different agents that are kind of replicating the roles that humans play in the real world. One of those agents, obviously, is the cameraman. What does the cameraman do in a TV studio? I mean, we've all seen a bunch of shows that deal with that life, so we are familiar with it even if we don't have firsthand experience.

So the cameramen kind of select what the best shots are from their perspective, and then all of these candidate shots are sent to a producer, right? So that producer, across all the shots, makes some determination as to how to engage the audience, right? And we have something similar, right?

We have a camera agent that presents those candidate shots, and then there is a distributed producer agent which decides which one shot gets broadcast, gets sent to the other participants.

And the natural way in which our brain works is that we don't look in all directions at all times, right? We're not spiders or whatever animals can look in all directions all the time. We kind of create and maintain a mental map of the meeting space. When we are physically present in a meeting room, we look around slowly, slowly, and then just one person at a time. So we want to create the same experience.
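
As a rough illustration of that cameraman/producer split, here is a minimal sketch. It is a hypothetical toy, not Huddly's actual agents: each camera agent scores its candidate shots locally at the edge, and a producer agent picks one shot to broadcast while avoiding cuts that are too frequent.

```python
# Minimal sketch of the cameraman/producer split described above.
# Hypothetical data structures; the real agents are far more sophisticated.
from dataclasses import dataclass

@dataclass
class CandidateShot:
    camera_id: str
    subject: str          # e.g. active speaker, reaction shot, overview
    score: float          # edge-computed quality/relevance score (0..1)

class CameraAgent:
    """Runs on the camera: proposes its best local shots (edge AI)."""
    def __init__(self, camera_id, detections):
        self.camera_id = camera_id
        self.detections = detections  # {subject: score} from on-device models

    def candidates(self, top_n=2):
        ranked = sorted(self.detections.items(), key=lambda kv: kv[1], reverse=True)
        return [CandidateShot(self.camera_id, s, v) for s, v in ranked[:top_n]]

class ProducerAgent:
    """Picks one shot to broadcast, penalising cuts that are too frequent."""
    def __init__(self, min_hold_seconds=4.0):
        self.min_hold = min_hold_seconds
        self.current, self.held_for = None, 0.0

    def direct(self, shots, dt):
        self.held_for += dt
        best = max(shots, key=lambda s: s.score)
        if self.current is None or (best.subject != self.current.subject
                                     and self.held_for >= self.min_hold):
            self.current, self.held_for = best, 0.0
        return self.current

# Example tick: two cameras propose shots, the producer chooses one to send.
cams = [CameraAgent("cam-1", {"speaker": 0.9, "overview": 0.4}),
        CameraAgent("cam-2", {"listener-reaction": 0.7})]
producer = ProducerAgent()
shot = producer.direct([c for cam in cams for c in cam.candidates()], dt=1.0)
print(shot)
```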

I think you and I spoke about some of the problems that cause people to have a less than optimal experience in the meeting room. One of those, and my favorite, is mirror anxiety, right? The fact that you have this secondary screen that shows you back to yourself, and then your brain, consciously or subconsciously, keeps looking at it, and that distracts you, right?

The second one, which is more interesting and relevant to what we are discussing, is called hypergaze. If I have a bunch of people looking at me from the screen, again, my brain will try to process and understand, continuously, what nonverbal cues these people are trying to send me.

And we simply cannot cope with that. Our brain is limited in its processing capabilities, so we cannot process more than two or three streams at the most. And the third one is being trapped. Like right now, I kind of feel trapped in this box, in the field of view of the camera; in a meeting room, I can get up and walk around.

So all of those issues we solve with our multi-camera system, right? The hypergaze problem we solve by, instead of providing 20 streams at the same time, having a carefully curated sequence of shots that gives the remote participant context, right? You are not just shown one person at a time.

If you and I are talking to each other in a meeting, we'll try to capture that dynamic and spatial relationship between the participants, because understanding spatial relationships as a remote participant is critical to me building the meeting graph and then reasoning on this meeting graph.

The typical questions that one unconsciously asks themselves are: where is this person in relation to the other person? Who's looking at whom? Where is the focal point of the meeting?
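
To make the meeting graph idea concrete, here is a small illustrative sketch, assuming a hypothetical data model rather than Huddly's real one: participants are nodes with room positions, gaze observations are edges, and the three questions above become simple queries.

```python
# Illustrative meeting-graph sketch for the questions above: who is where,
# who is looking at whom, and where the focal point of the meeting is.
# Hypothetical data model, not an actual product implementation.
from collections import Counter

class MeetingGraph:
    def __init__(self):
        self.positions = {}   # person -> (x, y) in room coordinates (metres)
        self.gaze = {}        # person -> person they appear to be looking at

    def add_person(self, name, position):
        self.positions[name] = position

    def observe_gaze(self, looker, target):
        self.gaze[looker] = target

    def relation(self, a, b):
        """Approximate distance between two participants."""
        (ax, ay), (bx, by) = self.positions[a], self.positions[b]
        return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

    def focal_point(self):
        """The person most others are currently looking at."""
        counts = Counter(self.gaze.values())
        return counts.most_common(1)[0][0] if counts else None

g = MeetingGraph()
g.add_person("Ana", (1.0, 2.0)); g.add_person("Bo", (3.5, 2.0)); g.add_person("Tom", (2.0, 4.0))
g.observe_gaze("Ana", "Bo"); g.observe_gaze("Tom", "Bo")
print(g.relation("Ana", "Bo"))   # how far apart Ana and Bo are (2.5 m)
print(g.focal_point())           # -> "Bo", the current focal point
```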

Tom Arbuthnot: It's interesting that the goal is not to get the best straight-on shot of everybody in the meeting and grid them out. That felt like where we were at a point in time, where it was like, we need all these angles to get the best face-on view, and then we present everybody and you can tell what's going on.

It's really interesting to think about that fatigue: your brain is hardwired to try and understand what's going on, so if you present 10 people in your field of view, you are constantly processing what all 10 are trying to do. But you've also got the problem that you're staring at certain screens and, if you are remote, don't even have the capacity to get a good view of any of those people. So that idea of AI direction is really interesting: how do we, A, keep it engaging and, B, not create that fatigue? Right?

Bo Pintea: Yeah. And you know, to your point, how do we process our surrounding reality? We have two eyes. We actually evolved from some predator, because, you know, prey has its eyes on the side, but we have the eyes in the front. And the reason we have them in the front is so that we can establish parallax. In order for us to be able to reach and grasp something, or grab the prey or whatever we evolved for, you need to be able to determine the actual relationship between you and the object you're trying to grasp.

So that's what we try to emulate with our multi-camera system: to allow people to create a 3D representation, a 3D mental map, of the meeting room. If you are looking at, let's take a 360 camera, right, it's sitting in the middle of the room, then it looks as if I am turning my head around and I only have one eye, right?

So you have this cylinder view of the surrounding space, and a cylinder is a two-dimensional space, it's a surface. Now you throw in a video bar at the front of the room, but they're all kind of on the same optical axis, right? So now you have 2.5D: you can make some inferences...

...about where people are in relation to each other, but you can't really accurately judge distances or line of sight. So you have a multi-camera system, you know, three cameras, four or five around the room, and we have pretty clear guidelines for our systems integrators on how to roll them out in such a way that every person is seen by two cameras. So you have this overlapping coverage, almost like two eyes, right, distributed, looking at every person.

Tom Arbuthnot: Yeah. I was interested to ask, actually, because with the 360 cams it's fairly obvious, right? We've got a front-of-room bar, we attach one of these 360s, and as you said, it's kind of solving the problem of getting a better face-on view of more people. Solving that problem, right. But I haven't seen any great guidance, and I talk to all the OEMs, so maybe I've missed it, on why three cameras versus five versus six, for what layout, for what seating plan. So it's interesting to think about what the optimal use case is.

Bo Pintea: Yeah, exactly. And I have to admit, you know, we probably haven't done a great job of educating the market either; we're still building the foundation, and we continuously refine the experience in order to provide our users with what's best for them. So the best way that I can describe it, and let's limit ourselves to five cameras: you have one front of room, under or above the display, and we have adaptive perspective correction that allows you to place it as high or as low as you want.

I was just having a kind of sidebar conversation with a customer earlier today: you have this massive 21:9 display that takes up pretty much the entire wall, so then you're able to place the camera higher than that, even. We see them being dropped from a ceiling mount, and then you compensate and adjust so that it feels like it's lower, so you don't have distortion at the edge of that.

Tom Arbuthnot: Yeah, you don't want the old-school CCTV top-down view.

Bo Pintea: Right, exactly. Like, you know, surveillance in a prison. So that's the one camera, right? That gives you usually the overview shot. But then cameras, let's call them two and three, and four and five, you have towards the middle of the walls, like three meters from the front wall and then six meters.

For those listening, I'm talking to you in meters. Yeah, yeah, we do meters.

Tom Arbuthnot: That's good. 

Bo Pintea: Multiply by three for feet, for the audience that's operating in the imperial system. So cameras two and three will look backwards into the room, while cameras four and five will look forwards into the room. That's how you get those overlapping fields of view, and those are super important, again, for the system to be able to reason in a three-dimensional space, because you have this SLAM, right? It's a different branch of computer science that allows you to do a reconstruction of the room. And you need these overlapping fields of view so you can say, well, I see a head, and I see it from this angle with this camera, and from this other angle with this other camera.

So now I can determine where that head is, what the pose is, right? I can do pose estimation and spatial relations with other heads much better than if I have just one view of that particular person.
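
A toy example of why those overlapping fields of view matter: with two cameras at known positions that both see the same head, the head's position on the floor plan can be recovered from the two bearing angles, which a single camera cannot do. This is a simplified 2D sketch, not the SLAM and pose-estimation pipeline Bo describes.

```python
# Toy triangulation sketch: two cameras with overlapping fields of view see
# the same head from different bearings, so its position in the room can be
# recovered. Simplified to a 2D floor plan; a real pipeline does far more.
import math

def triangulate(cam_a, bearing_a_deg, cam_b, bearing_b_deg):
    """Intersect two bearing rays (angles measured from the +x axis)."""
    ax, ay = cam_a
    bx, by = cam_b
    ta, tb = math.radians(bearing_a_deg), math.radians(bearing_b_deg)
    det = math.sin(ta) * math.cos(tb) - math.cos(ta) * math.sin(tb)
    if abs(det) < 1e-9:
        raise ValueError("bearings are parallel; cannot triangulate")
    s = (-(bx - ax) * math.sin(tb) + (by - ay) * math.cos(tb)) / det
    return (ax + s * math.cos(ta), ay + s * math.sin(ta))

# Hypothetical layout: two side-wall cameras, 3 m and 6 m from the front wall,
# each reporting the bearing at which it sees the same head.
cam_a, cam_b = (0.0, 3.0), (0.0, 6.0)
head_xy = triangulate(cam_a, 26.6, cam_b, -45.0)
print(head_xy)  # roughly (2.0, 4.0): the head's position on the floor plan
```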

Tom Arbuthnot: And has the ML and AI really advanced in recent years in being able to deal with multiple feeds and intelligently choose the shot? You talked about the kind of producer agent type scenario, right? I feel like we've seen so much advancement in AI in the last 36 months that surely some of that is coming through to much smarter capabilities device-side.

Bo Pintea: Right. Super important issue. Obviously we see a huge volume of advancements lately in the public arena, if you will, with LLMs. But even before that, you have specific advancements in the way video streams are processed. And we used to have, well, we still have in some of our systems, an Intel NPU, a neural processing unit, or AI accelerator, whatever term you want to use. We've recently, with our C1, shifted to a newer one from Ambarella, which is significantly more powerful.

So we have more headroom to run more models, and specifically what these models are doing...

Tom Arbuthnot: Sounds like you're giving me a new thing to compare on all the devices. Now I have to look at TOPS for the different devices.

Bo Pintea: Oh, absolutely. Actually, I'm not sure why that's not happening; we talk about it a lot, right?

But I don't see that being part of the standard evaluation matrix for devices. Because you don't know, at the current pace of innovation, what capabilities LLMs will have in two or three years, right? So we are going to be able to, and we are designing in such a way that you can, run some of these models at the edge, because there's just too much data and it's too expensive to feed everything to the cloud.

So you have to do it in a decentralized fashion. Actually, that's one of the things I had worked on at Microsoft, Azure Edge, edge AI, so I continue to have a passion for it. But specifically, we talked about what these models can understand, and they can understand...

...body pose estimation, head orientation. But you can also look at, again, I gave this example earlier, who's looking at whom, right? Even if these people are seen by different cameras. So you can reason across the entire visual space of the devices. But equally importantly, think about the way audio is done, and I think this conversation will accelerate over the next few months. Audio, for all intents and purposes, is being processed in a visual domain, right? Being super technical about it, you have a spectrogram, and then you analyze it as a picture, an unfolding, evolving picture.

So that's why all the advancements that we've seen in visual understanding start applying to audio understanding. You have this evolution from kind of hardcoded, classical audio pipelines to ML-based pipelines. A lot of the components that were hardcoded by developers are now learned from data. That's what machine learning ultimately is: you still have an algorithm that's deterministic, but it's being learned from data, and you are optimizing it towards a goal, right? And that goal is: isolate this voice. I know exactly what my voice looks like in the visual domain, so I can process in such a way that I eliminate everything else around it and just present it as the artifact that has to be shared with the rest of the participants.
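
A toy illustration of audio being processed in the visual domain: the signal becomes a spectrogram (a 2D array), voice isolation becomes estimating a mask over that image, and the masked image is turned back into audio. In a real ML pipeline the mask is predicted by a trained model; here it is faked with a simple frequency cutoff purely for illustration.

```python
# Toy illustration of "audio processed in the visual domain": the signal is
# turned into a spectrogram (a 2D image), a mask is applied over that image,
# and the masked image is turned back into audio. The "learned" mask is faked
# with a simple frequency cutoff here; a real pipeline predicts it with a model.
import numpy as np
from scipy.signal import stft, istft

fs = 16_000                                  # assumed sample rate
t = np.arange(fs * 2) / fs                   # two seconds of synthetic audio
voice = np.sin(2 * np.pi * 220 * t)          # stand-in "voice" at 220 Hz
noise = 0.5 * np.sin(2 * np.pi * 3000 * t)   # stand-in "noise" at 3 kHz
mixture = voice + noise

freqs, frames, Z = stft(mixture, fs=fs, nperseg=512)  # spectrogram: freq x time
mask = (freqs < 1000)[:, None]                         # pretend "learned" voice mask
_, isolated = istft(Z * mask, fs=fs, nperseg=512)      # back to a waveform

n = min(len(isolated), len(voice))
print("residual error vs clean voice:",
      round(float(np.mean((isolated[:n] - voice[:n]) ** 2)), 4))
```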

Tom Arbuthnot: Yeah, it's interesting, the audio. We had a conversation before, and I was kind of making the point that the audio of actually being in the room is the bar, and we're trying to replicate that.

But you made a really good point that, actually, with all the audio processing we do now, if you have a higher-end mic setup or some of the multi-mic, directional, beamforming mics, there's an argument to be made that, with that plus the processing, the remote audio experience might be better. As in, the echo's been processed out, the different speakers at different volumes in different parts of the room get optimized, and actually we can do a really good job of optimizing it for the remote participant.

Bo Pintea: Exactly. You're absolutely right. It's as if somebody is just whispering in your ear directly, even if they are 10 feet away. So that's why the mantra that we are working towards is better than being there.

And I think, you know, this is a great segue: we talk about AI, but AI is now embodied, right? In the sense that it actually occupies the same physical space that we do; it has its sensors spread all over the room. So if before, or until now at least, the main audience, the main customer, for using a unified communications system was...

...people, right? Usually remote participants. That's where most of the value is, and where people care about inclusivity and equity. But now you see the AI agent emerging as a persona, right? So it's equally important for the remote participant, the human, to hear well as it is for the AI agent to hear well, so it can transcribe and surface contextual insights during the meeting. I think that's a very important phase shift in the way we design our systems, because obviously everybody's talking about AI agents, but in practical terms, what does it mean? It means that you have these participants, some of which are representing the humans, the participants, and others are representing the company, right? And they have a goal of capturing the knowledge that's being created in the meeting and then persisting it and disseminating it, and ensuring that the business processes are unfolding properly.

Tom Arbuthnot: Yeah. We had a conversation with the person who looks after transcription for Microsoft and Teams, and the AI conversation has really brought the quality of transcription into focus, really sharply, because it's being used to feed the data conversation.

So, A, how good those models are at getting from the audio to the transcription, but, B, how much more important quality audio is to capture every part of that conversation.

Bo Pintea: Yeah. I mean, you know, we talked, I think yesterday as well, about what Satya calls frontier firms. What is a frontier firm? A lot of companies are just kind of bolting AI onto their processes today, but a true transformation to take advantage of these capabilities means that you have to put the AI at the core. And ultimately, you know, as they say, garbage in, garbage out: the AI will only be as good as the data that feeds it, right?

You don't want to end up with a model at the core of your corporation that hallucinates. So you architect your business operating system, let's call it, around shared data that's being leveraged by all your business applications. Today, a typical company will have Salesforce for their sales organization, they'll have some ERP software, they'll have some human resources software, and they're all kind of disjointed. Getting data from one to another involves writing a bunch of connectors, but you don't have a single source of truth.

Tom Arbuthnot: No. 

Bo Pintea: So, and...

Tom Arbuthnot: And all incomplete as well. I think everybody's very aware that it's really hard to get people to fill in a CRM or a Salesforce or an ERP to the level you would like, right? But potentially we're bridging that gap now, with AI being able to consume everything and kind of store and retain it.

Bo Pintea: Right, exactly. So again, my belief is that the AI model itself, well, a lot of the companies are using OpenAI models. But what is the asset that a company really has?

When I joined Microsoft, it used to be people, right? Because people's heads are where the knowledge that gets created resides. But nowadays I will argue that the core of the company is the data that it creates, the knowledge that it creates, and it's now spread across people's heads as well, unfortunately, well, fortunately for us still, but also in the code, in the IP filings, in Word documents, in PowerPoints; it's all over the place. So you want to standardize it and create this data spine that we spoke about, which you can then write business applications on top of. And when you have this end-to-end workflow, from...

...the data that salespeople collect from the field, which informs the product and feature prioritization and design, which informs the business strategy, so that the CEO can just look at a dashboard and see at a glance how the business is doing and what the opportunities are. That's, I think, how most of these frontier firms will spend their resources over the next two, three years.

It's a transformation akin to what happened in the nineties, when all the companies were deploying the SAPs of the world, yeah, the ERP software. So everything will be flipped a little bit on its head.

Tom Arbuthnot: It's interesting to think about. It's also interesting that actually some of the mid-market, SMB customers might be the first ones, because they're more adaptable, they're faster to move, they're willing to change and compete. And I have a hypothesis that actually some of these frontier firms won't be the tier-one brands we think about today. It'll be the competitors who are willing to go all in to compete and change that will first take advantage of this new capability.

Bo Pintea: Oh, you are absolutely right. Clay Christensen calls it the plane of non-consumption. The early adopters of any new technology, especially a disruptive technology, will be those companies or individuals that do nothing today, right? A lot of existing companies that are fairly advanced already have a lot of systems in place, and any new disruptive technology will usually not have the same capabilities as the technology it aims to replace. So you will see that the companies that don't really have a lot of ERP or SAP, or don't use Salesforce, they still use, I don't know, pen and paper, those are the ones that will adopt it first. That is my strong belief. And actually, you know, a lot of business research shows that to be the case, IP telephony again being a prime example: all the companies that had the fancy Nortel PBXs and Avaya PBXs, they didn't deploy Lync. It was those that didn't have anything else that...

Tom Arbuthnot: It was still the analog or non-IP systems jumping straight over the first-gen IP telephony into UC. Yeah. Super interesting to think about this. I feel like we're in such an exciting time, both in the audio-video collab conversation we had there about optimizing the experience, and then that next jump of, how are we going to collect this data and help people build this kind of corpus of corporate knowledge, and how will we use it? How will the agents use it? It's still definitely up for debate, I think, at this point.

Bo Pintea: Right. And I think especially relevant to the conversation that we're having about unified communications: if at the moment this is the bridge between humans, I think there will be this active chatter, like you and I will have our own respective agents. And as you and I are talking at human speed, they will negotiate and agree on things on our behalf and exchange relevant information at machine speed, behind our backs. And then, by the time our conversation is over, we would already have our notes transcribed, we'd already have our next meeting set up.

Uh, we'd have our...

Tom Arbuthnot: Our AI agent will just be telling us what to do. It'll be like, come on, you're not going fast enough.

Bo Pintea: They might. I mean, you know, we already see that with Teams, right? Where it tells you to emphasize certain words, or speak slower, or speak faster, or acknowledge the other people in the room.

But they will do a lot more than that, which brings up another interesting conversation topic, which is: who owns this data? You know, you're kind of your own master, right? But for those that work for a company, will their digital twins essentially be owned by the company?

Or will they continue to retain ownership of their memories and the intellectual property that they create? And I think, yeah...

Tom Arbuthnot: It strikes me as similar to the conversation, but way more accelerated, of the IP conversations. So some organizations, when you join, you basically sign up that anything you come up with during your employment term is effectively the company's IP.

And you can kind of see the logic of that, particularly in software development, where it's like, well, were you thinking about it off hours or on hours? But this is different, isn't it? This is: if I spend five years working at an organization and you've recorded everything I've said and everything I've done, is that your knowledge to carry on leveraging after I'm gone? Or, like, where's the line on that?

It's super interesting. 

Bo Pintea: Yes. I think a lot of lawyers will make a lot of money sorting out these issues. 

Tom Arbuthnot: Yeah. I mean, it feels kind of inevitable on some timeline. I mean, you are being paid during that time; it's a work asset coming out. But yeah, it's a weird thing to think about, that we've never had that concept of the value of the thoughts and expertise I'm bringing to the conversation living on beyond me.

Bo Pintea: Right. Yeah. I'm thinking of Harry Potter: the wizard had this thing called a Pensieve, I think it was, where he was pulling his memories into some sort of a device, right? So who owns that, then? Because obviously you think of a lot of other things during your workday that are not necessarily work-related.

Tom Arbuthnot: Yeah.

Bo Pintea: And what happens after I leave? You already have today a lot of controversy around, well, who owns my network, right? Will the digital twin stay with the company and continue to impersonate me in dealing with those business connections that I had made? Will there be a fork, essentially, between me and the digital me, and I have to start again with a new baby to educate? Yeah.

Tom Arbuthnot: Or can I take the digital me to the next org? Like, does that come with me? I remember OpenAI, um, Sam was hypothesizing about a kind of OpenAI login experience where your memories and brain stayed with that, and you go org to org, a bit like the Bloomberg thing: it was always your login, not the company's login.

So as you moved financial orgs, you kind of took your network with you, right? That was a really strong point for them and their network effects, because essentially the person owned the network and they took it business to business.

Bo Pintea: Yes. So now, do I want OpenAI to own that me? It's almost like, you know, we were talking yesterday about the Black Mirror episodes, right? So will I have to pay Sam, yeah, to maintain my...

Tom Arbuthnot: Uh, your tech tools, right?

Bo Pintea: So there are a lot of people, obviously, techno-anarchists, let's call them that, who believe that all of this should be persisted locally, and as much as possible you should retain control of it and expose just enough of it to the outside world. The same way that you actually have to open your mouth to communicate as a human, the AI would also have to have an equivalent of opening its mouth, and obviously you have strict controls from the human side on what it can say and what questions it can answer. We're seeing that a lot.

There's a company started by Tim Berners-Lee, who famously founded the web, yeah, called Inrupt, and you own your own data. I think they work with a lot of governments. So you have this concept of a hardware pod where you own your own data. And then if an insurance company comes, you can grant them access for the purpose of evaluating your suitability as a potential customer, just for a limited amount of time.

But they cannot actually copy that data. They can just...

Tom Arbuthnot: Yeah.

Bo Pintea: Query it, and then it has to be flushed. So I see an equivalent of that, where I have my agent, it can be talked to even without me present, but the actual data cannot be taken; yeah, it is mine. And you can do that, right?

You know, in machine learning you have the concept of federated learning. It was developed for banks to be able to pool their data to detect financial fraud across different domains. So they are not sharing the actual data; they're just sharing the gradients from those models running on the data.

So you can then aggregate this and arrive at a conclusion. And similarly, you can apply that concept to a meeting, right? Our agents can vote on an issue, and it's sort of anonymous: they arrive at a conclusion, but it cannot be traced back to who said what. It's, well, the group agrees that we should move in this direction versus that direction, and it's done without the agents disclosing how you truly feel.
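
A minimal sketch of that federated, privacy-preserving aggregation idea, illustrative only: each agent holds a private value (a gradient or, here, a vote), adds pairwise random masks that cancel in the sum, and shares only the masked value, so the group outcome is computable but no individual contribution is revealed.

```python
# Minimal sketch of the federated idea: each agent holds a private value (a
# gradient, or here simply a vote), adds pairwise random masks that cancel in
# the sum, and shares only the masked value. The group outcome is recoverable;
# the individual contributions are not. Illustrative only.
import random

def masked_shares(private_values, seed=0):
    rng = random.Random(seed)
    n = len(private_values)
    masks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.uniform(-10, 10)
            masks[i][j], masks[j][i] = r, -r   # pairwise masks cancel in the sum
    return [v + sum(masks[i]) for i, v in enumerate(private_values)]

# Three agents privately vote +1 (for) or -1 (against) a proposal.
votes = [1, 1, -1]
shared = masked_shares(votes)
print("what each agent shares:", [round(s, 2) for s in shared])  # looks like noise
print("group decision (sum): ", round(sum(shared), 6))           # 1.0 -> proposal passes
```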

Tom Arbuthnot: Fascinating. 

Bo Pintea: You arrive at consensus.

Tom Arbuthnot: Yeah. So it's so interesting what we're on the verge of and where this goes. Thanks so much for jumping on. You definitely have to come back, though, because we always have an interesting conversation with you. And yeah, I know you're very plugged into the UC side, but you're doing a lot of thinking on the AI side as well, so it's interesting times.

Bo Pintea: Thank you for your time, Tom. It was a pleasure, as always.

Tom Arbuthnot: Awesome. I'll see you again soon. Thanks a lot. 

Bo Pintea: Bye.