Decode AI

Exploring the Future of Autonomous AI Agents and When They Go Too Far

Michael & Ralf Season 1 Episode 14


In this episode of Decode AI, Ralf and Michael explore the evolving landscape of autonomous AI agents, focusing on OpenAI's Codex and its implications for software development. They discuss the capabilities of Codex and GitHub Copilot, delve into decision-making processes in AI, and share insights from a fascinating vending machine experiment. The conversation also highlights important AI communication protocols and upcoming events in the AI community.

Takeaways

Autonomous AI agents are becoming increasingly relevant in software development.
Codex is designed to assist in code development autonomously.
GitHub Copilot's agent mode requires user prompts, while Codex aims for greater independence.
Decision-making in AI agents is still a developing area.
The vending machine experiment illustrates potential pitfalls in AI decision-making.
AI communication protocols are essential for effective collaboration among agents.
Upcoming events like AgentCon provide opportunities for community engagement.
The AI landscape is rapidly evolving with new tools and technologies.
Understanding AI protocols is crucial for developers working with autonomous agents.
Continuous learning and adaptation are key in the AI field.

Reference Links

OpenAI Codex

Vending Bench Autonomous Agent goes wrong

MCP interaction protocols

AgentCon Soltau | AgentCon Berlin

https://cloudland.org

https://aka.ms/BookOfNews

AI, Microsoft Build, OpenAI, language models, AI development tools, hardware advancements, Google Gemini, technology development


Hello and welcome to another episode of Decode AI. I'm Ralf and with me is... Michael. Hello everyone. Was that the delay, man? Yeah, I had some glitches. Something on my screen was telling me we are now recording, and it had a short, I don't know, hiccup or something like that. So I hope this will be the only hiccup we have today. Keeping fingers crossed. If not, we will let you count the hiccups we have and you can tell us. Yeah, make a drinking game out of it. Michael, it's great to be back with you on our podcast. I hope you liked the latest episode and listened to what we had to tell you and what we announced: we shared some news and spoke about agents and things like the Model Context Protocol and so on and so forth. What we have with us today is something different. Well, do you really think that? It feels like it's always the same. We start to talk about agents again. Again. Again. So, whatever that means, we're going a little bit deeper into the fascinating world of autonomous AI agents today, I guess. Yeah, I think autonomous is the one thing that's not so common. I think agent in itself is something we hear a lot. Autonomous agents are something new that is coming. We have already talked about that, and we have one company joining the team again, or more, I would say. And it's one of the biggest players we have heard of in the... generative AI history. That's a description. I should print it on a t-shirt. It sounds like you want to talk about Codex, the autonomous agent by OpenAI. Yes, absolutely. And the most important question is why. And why? So let's have a look into it. We will go into the why, but for now, let our audience know what we're talking about here. Okay, let's start. So I would like to talk about Codex, an autonomous agent powered, or no, developed by OpenAI, not powered by it. It's a real product we have got from OpenAI. It's not available in every subscription.
It's currently in the Pro, Team, and Enterprise plans, and it will be in the Plus and Edu licenses in the near future, I would say. We don't have any release plans, but I assume, and it has been announced, that it will come to the other subscriptions as well. And this is something which is named like an earlier product from OpenAI. The first thing I read about this was that it's powered by the o3 model. And I thought, lame, that's an old one. That's not the latest fancy stuff. Why would you do so? And then I realized, okay, o3 is more about programming, and Codex also has its main focus on programming. So Codex is helping us with the development of code. Yeah, software development is the focus of Codex, which the name could have explained somehow. Exactly. And it's advertised as managing complex tasks, and autonomous agent always means it works almost on its own. And the question is, do we have something similar? Just to spoil it for you, there is something similar. And then why? Do you remember the why question? Yeah, for sure. So first of all, you've already spoiled that Codex is based on OpenAI's o3 model, and it runs in the cloud. So it's a cloud-based agent. That means no one has to install it themselves. And the aim of it is to be very autonomous. So that means it can operate independently, without the hands-on work of a software developer. And you were asking, do we have something similar already? Yes, we do. It's called GitHub Copilot. And with the agent mode, it can do almost the same things, but comparing both, it seems that Codex does things more autonomously, whereas GitHub Copilot agent mode will still need your prompt and will need some corrections. It iterates autonomously over your code and can do things there, but Codex promises that it needs less of that hands-on work and is steered by an AGENTS.md file. So that's pretty cool. We will see what happens there in the next stage.
And as you said, it's available in ChatGPT Pro, Team, and Enterprise. We don't know anything about pricing yet. Besides that, GitHub Copilot resides in the IDE, while Codex is more or less running autonomously, maybe in your pipeline or somewhere else. And that makes a huge difference here. It is also a leap forward to what we will see in the software development industry. OK, if I may. I know you definitely remember, but you as a listener may remember that I'm not a developer, so that really sounds like magic to me. I've seen a demo at the Global Azure event we had earlier. When was it, May? Early May, 2025, if you listen to this in the future. So there was a session with the GitHub Copilot agent mode, and there was a small, easy prompt and everything just went through, was redefined, tested, and it was kind of magic for me. Think about everything within my whole development process running automatically, autonomously, without any further interaction. I just drop maybe an item or something. I don't know, maybe you can help me with that in a second. And then it runs every step I usually have in a development process, like writing code, testing code, fixing code. Write it again, make it quicker, use some other ways to handle the issue. That sounds interesting. Yeah, it sounds very promising and I'll have a look at that. So GitHub Copilot agent mode is a fantastic thing when you're a developer and you're starting from scratch. Try it out. Give it a chance and then develop your thing further. It's a really, really cool start for your project, as it brings most of the things you will need. And I mean, what is very special about the Codex agent is also that it can make some kind of decisions across multiple steps. And this is new, because decision-making wasn't yet on the agenda of such agents. So I'm really curious to have a look at it. It will be fantastic. And why? The why is hopefully explained now.
Yeah, it sounds good. And I like the point about decision-making, which actually brings us to the second point we have on our agenda, where decision-making was not the best experience in the tests out there. So, well, I like the idea of autonomous agents. But here's an example of why there are maybe some circumstances where it's not running as you expect. There was a test about how we can use agents running autonomously for quite a while and use them to make some interesting decisions, business decisions, improving our business. And there was an experiment with a virtual company. And this virtual company also had one person responsible for the soda vending machine, or for the vending machine. Yeah, the experiment was the vending machine. So to be honest, the vending machine was treated like a company and... Oh, it was a company itself? I thought it was just part of it. No, no, no. It was the company itself, and they simulated this. So they simulated an AI agent running a vending machine in a simulated scenario. It wasn't really a vending machine. It was simulated. Everything was simulated. Yeah. But go on. They had multiple runs, and it didn't happen in every run, just to say that. But the interesting part was that after some time there was an issue, let me put it this way. So there was a decision made by the vending machine manager, and this was communicated, but the company, the virtual company, forgot about it. The daily fee of $2 for renting the space was... Yeah, it was identified as fraud. So the digital, what is it? The digital manager, the AI manager, yeah, wrote an email to the FBI and said, hey, there is fraud within our company and we need some support from the FBI, some investigation of the case. So that means in one instance, an AI agent informed the FBI about fraud because it forgot that there is a fee for the place where the vending machine is located, about $2.
And these $2 were missing from its account because it had forgotten about the fee, so it opened a fraud case with the FBI to investigate. That's so funny. You can just imagine something like that in the near future, if you have a ton of different autonomous agents and everything is running almost on its own and you just check on a regular basis whether everything is operating correctly. And then there is something that was communicated. It's not that the information was never there; it got lost during the process. Imagine that, and you don't get your coffee anymore, you just get, I don't know, water instead. The funny thing is also that the vending machine itself, or let's say the AI manager, was successful, because it was able to make some profit. It was right with predicting prices or adjusting prices. It handled the stock correctly and everything. So it was really cool that they showed that failure. It means when you hit the context window and exceed it, something gets lost and then you're fucked up. I'm sorry for that, but yeah, it's literally like that. It's completely fascinating to see that, and that they published this event is also very, very cool. That said, how do you find all that stuff? It's by reading, reading news and articles and stuff about AI. And Michael, you brought something with you today as a recommendation for our audience to read, right? Yes, absolutely. If you've not heard our previous episode, please do so. And the one before, because there you already heard something about MCP. I won't bring the same joke again, but listen to the other ones. I made a joke about that. And we have got more insights, I would say, technical deep dives into different technologies for agent-to-agent communication. I already mentioned MCP, but there is also the Agent Communication Protocol itself, the Agent2Agent protocol as a way of communication, and the Agent Network Protocol.
That means, yeah, you get some interesting overviews in this deep-dive article about the different technologies we will have in the area of agents to secure the communication and ensure that how it works is standardized. So you have something you can use as a reference. There are some... I'm a visual guy, you know, I don't like reading. There are some pictures with specific descriptions, but there are also references to the actual proper RFCs and a technical deep-dive description of everything, basically the four different ways in which it's planned to use agents and the communication protocols. Yeah. So these protocols are crucial for enabling seamless collaboration among autonomous agents, ensuring that they can effectively share information and coordinate actions in complex environments, such as Codex, for instance, or GitHub Copilot agents and other scenarios. Or if you're building an application which is based on AI agents, then you want to know how to transfer jobs or tasks between those agents, and these protocols are made for that. So you should have a read about this and keep an eye on it. This is why we wanted to share that with you. And I must say, Michael, as time is running, we're already in the middle of 2025. So what about events this year? Have we just had Global Azure, or are there others out there? Well, that's a tricky one. That's a fishy one. No, what is the right word? That's something you just set up for me. That's the point. Of course I know an event. Of course I can name at least one, maybe two or three. Yes, I'm a Microsoft guy. So we had the Build conference already. There is some interesting stuff in the Microsoft universe. We have some vendor-independent events like AgentCon at the... Oh gosh, that was close. At the end of June, and at the beginning of July we also have another big AI conference, or conference with AI. Let me put it this way, all right? So Build was named already, AgentCon as well.
When is Build happening, or has it already passed? It depends on when we release this episode. Well, I would say it has passed already. So, forward then to AgentCon. Where is it happening? Hold on a second. Regarding Build, there's always a Book of... News. Thank you. I was close to saying Book of Build. That's wrong. The Book of News, where you can find the shortcut to everything that's in the Microsoft universe for this part. But if you want to hear about different AI topics, agent-focused maybe, because the name is AgentCon, right? And you are interested in hearing it from real people, in person, where you can ask some questions afterwards or during the sessions. Then AgentCon on the 3rd of June in Soltau... the 30th of June. What did I say? I said 3rd, right? Yes. Then you can enjoy the amusement park, but not the conference. I'm sorry for that. I really prefer adventure park. You can find all the information in the show notes. Yes, written correctly, and the link to join. It was in my head, but not on my tongue. Let me summarize that really quickly for all of you out there. So first of all, get the notes from Build, the biggest developer conference on earth. So go and look for the Build news, or the Book of News of Build. You'll get all the information and a comprehensive overview and can have a look at it. And it's never too late to do so. Another thing Michael was saying is that we will have AgentCon Soltau in the amusement or adventure park Heide Park Soltau on the 30th of June 2025. It's a global community event running all over the globe with a focus on agentic AI. You'll get 30-minute sessions on one track there, or 60- to 120-minute workshops. It's free to attend. And we'll start in the afternoon so that you can use the morning and noon to visit Heide Park Soltau and get some adrenaline before you go into the conference. Is that correct, Michael? No, please do it again. No, yeah, of course. Of course, you're right.
It's embarrassing. I'm mixed up today. If you have a German speaking English, you know it's... Gosh. And yeah, that's one part. AgentCon is... I love the idea of having a free conference that is not only Microsoft-focused, and that's also something I really, really like. And if you are already there, you can go to, or actually you can stay for, another event, the CloudLand event, which takes place over the next couple of days after AgentCon. We have a code for you with which you can get a less pricey ticket if you want. Yes, and because it's four days with a ton of different sessions... well, you have the right numbers, right? 180 speakers, something like that? It was a huge number. It was a really huge number. Multiple tracks. And now we should decide who will talk about this. So the other thing Michael is highlighting here is CloudLand. Visit cloudland.org to get your information there. We have the MVP community there. The AWS Heroes will be there. CNCF Ambassadors will be there, and Google Cloud experts will be present at that event as well. So that means whether you are into public, private, or on-premises, it doesn't matter. You'll find your spot there. cloudland.org is the address to go to. Find your ticket voucher in our description and get your 15% reduction on the price. And we hope to see you there. Michael, you'll be around, right? Yes, for both events. I'm happy to join AgentCon and CloudLand. I will have a workshop at CloudLand, and I'm really looking forward to building some agents with you. That's cool. Yes, after just 14 episodes, I have figured out the schema of our podcast. Yes. And Michael is so shy. He's also an organizer, or part of the organizer team, for AgentCon, like me. We're hosting that event together. And I'm also part of CloudLand. I'm running two workshops and two sessions there, and I'll be present at AgentCon as well. So if you want to meet up with us, go there and let's have some fun together.
Michael, we've already been talking here for a long time. For today. I'm sorry. I'm rude. It feels like I didn't ask back and give you the opportunity to highlight your sessions as well. I'm sorry for that. As we said, you can find everything in the show notes. And yeah, we hope you found some interesting news in our episode. All referenced articles are in the show notes as well. Then I would like to say... No, did we do our recap already? Is your recap finished then? I didn't get it. Okay. So the recap of today's episode is: you got some information about OpenAI's Codex and the launch of Codex. So stay tuned for that. We're going to have a look at it. Then we had that pretty funny thing about an autonomous AI agent running a vending machine as a manager, with that funny story about the FBI case. And last but not least, there are a few events we are highlighting for you so that you can join us there. We're very much looking forward to seeing you there. We also had a reading tip for you to get some more insights into the important protocols like the Model Context Protocol, A2A, A to N, APN, and so on and so forth, or rather ANP. And that's pretty much it. That's actually a better summary and recap than I would have done. I spoiled the schema I just identified here: I'm flying at a high level and describing things in a very loose way, and you bring some context to the crazy stuff I was talking about that no one would understand without you. You're so rude to yourself. That's not true. All right, so I'll give it a try. All right, so, closing. Do we share today? We share. OK, great. Brilliant idea. So I start, okay? Stay tuned, stay interested. Sign up, listen up. Here we go, bye bye. Take care all, thanks for listening. Woo, I'm in. You made it.
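
A note on the vending machine story: in the episode, the failure is attributed to the agent exceeding its context window, so that the earlier message about the $2 daily fee got lost. The following is a minimal Python sketch of that effect, not the setup used in the experiment. It assumes a simple "keep only the most recent messages that fit" strategy and uses word counts as a crude stand-in for tokens; the function `build_context` and the sample messages are hypothetical.

```python
from collections import deque


def build_context(messages, max_tokens):
    """Keep the newest messages that fit into a fixed budget.

    Assumption: one word counts as one token. Real tokenizers differ.
    """
    kept = deque()
    used = 0
    # Walk backwards from the newest message and stop at the first one
    # that no longer fits, so older messages are silently dropped.
    for msg in reversed(messages):
        cost = len(msg.split())
        if used + cost > max_tokens:
            break
        kept.appendleft(msg)
        used += cost
    return list(kept)


# Hypothetical message history of a simulated vending machine agent.
history = ["Day 1: location charges a daily fee of $2."]
history += [
    f"Day {d}: sold drinks, restocked, adjusted prices." for d in range(2, 40)
]
history.append("Day 40: balance is $2 lower than expected.")

context = build_context(history, max_tokens=60)
fee_known = any("daily fee" in m for m in context)
print(fee_known)  # False: the fee notice no longer fits into the window
```

With a budget large enough for the whole history, the fee notice stays visible, which suggests the agent in this sketch would have no reason to treat the missing $2 as fraud. Keeping such standing facts in a pinned summary outside the rolling window is one common way to avoid this kind of loss.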
