
The Development Basics of Managed Inference and Agents

TAGS

  • Deeply Technical
  • Agents
  • AI
  • Heroku AI
  • Managed Inference and Agents


The Code[ish] Podcast is back! Join Heroku superfan Jon Dodson and Hillary Sanders from the Heroku AI Team for the latest entry in our “Deeply Technical” series. In this episode, the pair discuss Heroku Managed Inference and Agents—what it is, what it does, and why developers should be using it.

Hillary also shares tips for new developers entering the job market, and Jon pits 10 principal developers against one hundred fresh bootcamp graduates (hypothetically, of course).


Show Notes

Rachel West
Hello and welcome to Code[ish]. An exploration of the lives of modern developers. Join us as we dive into topics like languages and frameworks, data and event-driven architectures, artificial intelligence, and individual and team productivity. Tailored to developers and engineering leaders, this episode is part of our Deeply Technical series.

Jon Dodson
Hello everyone, my name is Jon Dodson and I work for Heroku on the Builds Team. I’ve been programming ever since my parents bought a VIC-20 and I wanted to make my own software and, if I’m being honest, it was mostly just weird games. I’m a huge Heroku superfan, and today, to talk about what’s awesome at Heroku, I’m joined by Heroku’s own Hillary Sanders. Hello, Hillary.

Hillary Sanders
Hey there.

Jon Dodson
So, Hillary, we’re just going to jump right into it. So, I wonder if you could tell us a bit about yourself and what you do at Heroku.

Hillary Sanders
Yeah, I am an AI engineering researcher with a background in statistics and neural networks, and I work on the Heroku AI Team where we build products to help Heroku customers use AI and integrate AI into their applications. And we also use AI to sort of build custom internal tools and pipelines and improve existing Heroku products.

Jon Dodson
That sounds fun. So, your journey as a software engineer, what’s it been like up to this point? Any tips for developers just starting out in the game?

Hillary Sanders
Yeah, my journey has been super fun and lucky. I think it started out because I was in college and I fell in love with statistics, obviously.

Jon Dodson
A common path.

Hillary Sanders
Yeah, a common path. I was puttering around taking way too many classes and took a stats one and it maybe was not exactly like a religious calling, but something like it. I think stats is, like, the study of how to optimally evaluate evidence about the world, which I find to be very beautiful and important and really speaks to how my mind works and how I felt the world should make decisions to maximize good.

But I realize that statement might sound horrifying to some people. If it does, then they should talk to me. But that’s what got me interested in stats. What I realized is that if you combine that with the power of big data and compute, stats becomes not only beautiful but very powerful, i.e. machine learning. And in this day and age, and certainly in the next couple decades perhaps, scary powerful.

And so that’s how I got into machine learning, I just fell in love with stats and started doing research with professors and then went to the Bay Area.

I got into neural networks, honestly, because I was confused by descriptions of them because no one could explain them back in the day. I actually didn’t study that in college, it was all about hierarchical Bayesian networks and Markov chains. Amazing things.

But I ended up getting into neural networks because I was just burningly curious why no one could explain how they worked, which made me really want to figure out how they worked. And that helped me end up getting some jobs for quite a few years doing neural network optimization and eventually led me to Heroku.

So that was sort of my journey to become an AI software engineer. As for tips, I’d comment on the job market today, which feels in part inauthentic coming from me, because I haven’t experienced a tech job market this bad before.

Jon Dodson
Agree, it’s rough out there.

Hillary Sanders
And I’m in a better position than most.

Jon Dodson
Yeah.

Hillary Sanders
Yeah. Because I think if you’re an entry-level junior dev, it’s worse than the experience I’m going to have. But I’ve also made lots of mistakes, so I might have some tips. My first ignorable tip would be to not write cover letters. I don’t know if this is correct, but I think they take a long time. Often people are just going to assume that you wrote them with an LLM. You can apply to a lot of jobs in the time it takes to write one.

So, unless you have something really meaningful and specific to say and the question is pretty unique, I think just skip it. And it’s not just for the time tradeoff, I think it’s to avoid depression because I am friends with people who have interviewed for a year and have gotten really depressed and it’s like hundreds and hundreds of applications. And it feels so depressing to write all these cover letters and kind of pour your heart out even though you’re trying not to, and then essentially be ghosted by all these companies. So, you know what? Just skip it. I think it’s too depressing and not very useful. That’s my first tip that is not really informed by data.

Jon Dodson
Right. No cover letters. Check.

Hillary Sanders
Yeah. Another one is, if bad companies reject you, don’t pay it too much heed, because that is an incredibly noisy signal.

Jon Dodson
Absolutely.

Hillary Sanders
I and others I know have gotten job offers for, like, twice the salary in the same week that other companies, ones paying half as much and sounding way less cool, rejected us. And that’s very standard. In fact, I would argue I’ve noticed a positive relationship between the quality of the job and company and the simplicity of the questions asked during the interview process, for reasons that maybe I won’t get into. So, if you get rejected from places, don’t take it too personally. Maybe take it as very, very noisy data on what to focus on.

Hillary Sanders
My third tip is the most uncomfortable one.

Jon Dodson
I can’t wait [chuckles].

Hillary Sanders
Yeah, it’s just awful. Once you actually get an interview, you have a much higher probability of getting the job, so interviews matter a lot. The code you write and how you do technically is important, but how you present yourself and how you communicate is also very important, and maybe often underestimated, because communication really matters on the job.

Hillary Sanders
So I recommend doing mock interviews and videotaping yourself and then gasp, watching them back over, which may cause you significant nausea and emotional discomfort. But I think like per unit time investment, it’s super, super effective at making you get better at interviews. And I really recommend it, even though it’s just the worst and terrible.

Jon Dodson
I agree with you there. I think doing mock interviews is one of the most important things you can do. Maybe practice with a friend who’s got a job, or there’s oftentimes an employment office around that you can do that with, or you could, just like you said, record yourself. There are plenty of interview questions online that you can practice with. So yeah, I think that’s great, that’s great advice.

Jon Dodson
So, Hillary, you’re on the Heroku AI Team, which is still a pretty new team at Heroku, and I’m wondering if you can tell us a bit about why it was created and what problems the team is trying to solve.

Hillary Sanders
Yeah, I think we’re trying to solve lots of problems. Essentially, Heroku was incredibly cool 15 years ago, very, like, hot.

Hillary Sanders
I think it’s still very cool, and we’re trying to do a great job of keeping up with the times. And so that leads us to focusing on two main areas when it comes to AI. One, making it really easy and seamless to incorporate AI into your Heroku applications, and making sure those AI components can easily interact with your databases and other components in your Heroku space.

And then also just using AI to make our existing products better. So, if you have databases on Heroku or apps on Heroku, or if you want to be vibe coding with Cursor or VS Code, we should be helping to make sure that AI is making that experience really, really good. Cursor should have an extension that makes it easy for the LLM to understand how to deploy your app as a Heroku app, that kind of thing. Enabling all of that is kind of the main goal, and I think that leads to lots of really interesting, fun products and features.

Jon Dodson
So Hillary, this is a really important question. My son’s eight, he enjoys watching various brainrot videos on YouTube, you know, as one does, and one such piece of nonsense is these versus videos, such as 10,000 Harry Potters versus a million Predators.

Like I said, real important. So, we’re going to apply this to AI. Which team do you think would make a better application faster in six months? Ten principal developers with no AI or 100 fresh bootcamp graduates with all current AI tools at their disposal?

Hillary Sanders
Woo… Okay. I have complicated feelings here, because there are a lot of trade-offs.

Jon Dodson
[Laughs] There are, absolutely.

Hillary Sanders
I will first raise a concern or discussion topic with the implicit hypothesis that a hundred engineers on a single application makes things better, unless you’re organizing it very well. I feel like that’s implied by the question.

Jon Dodson
Uh huh, that’s the first trap of the question. Absolutely. [Laughs]

Hillary Sanders
That is really hard.

Jon Dodson
Right. It is.

Hillary Sanders
If they all have to work together on one app and one code base…

Jon Dodson
Right.

Hillary Sanders
And have six months, in this specific situation, I would bet on the 10 principal developers. However, there are many permutations to this that I think would lead me to bet on the fresh boot camp grads, for sure.

Jon Dodson
Which permutation?

Hillary Sanders
Okay, so if they’re allowed to break up into groups of five or 10, grab the eight smartest people out of the 100 who do different things, and run off into a room. If they can all go be siloed and work together, and then maybe you take the best thing that all of those groups have built after six months, that I could see beating out the 10 principal developers for sure.

Jon Dodson
Agree. Agree on that. Yeah.

Hillary Sanders
Because like if you’re a fresh boot camp grad with like a good product vision and you’re using AI, like, sure, a lot of the time you might end up going in bad directions. Kind of because you just don’t have all of the lovely mistakes and learnings that the experienced developers have made along their career path.

Hillary Sanders
You might go in the wrong direction, but one or two of the teams will probably go in a really good direction. So, I would bet on them in that circumstance. Another permutation is if you shorten the time period. If you have one day to build a thing, or a week, or even a month, then maybe I bet on the bootcamp grads, because you can do so much, so fast with AI, and that is just a lot harder without it.

So that’s one thing. And additionally, if you have six months and you’re not integrating into existing complicated services, bureaucracy, etc., you can build a lot, and that means your project gets pretty complex and your code base gets fairly big, especially if you’re using too much AI. And existing LLMs struggle with various things. They’re amazing, but they also struggle with common-sense reasoning, and they struggle with understanding a really complex code base and then making changes to just a little bit of it.

There’s a lot of really cool tools that people are working on, and that Heroku’s interested in, to make that easier. Like shoving like a whole repo into the context window of a bigger and bigger model or trying to have good ways to look up the relevant parts of your code to put into your context window for your LLM so that it can edit your code or adjust things.

But that still doesn’t work amazingly on pretty complex code bases. So, on a six-month time span, I think there are decreasing marginal returns to AI, especially because if you made wrong turns in the beginning, that’s not going to be great. But if it’s a short time period and you have a clear vision, then yeah, junior devs with AI can build a lot.

Jon Dodson
Awesome. So, a different track here: Ruby on Rails. Ruby and the Rails framework are historically the language and platform of choice for Heroku. Now we support much more than that, including .NET, which we just added and which I’m really excited about. But historically, Ruby on Rails was the bread and butter of what we do. So, for you, what language would you consider to be the language of AI, if there even is one?

Hillary Sanders
I mean, Python for sure is super popular.

Jon Dodson
Yeah.

Hillary Sanders
If you’re trying to like develop on neural networks, you’re typically almost always using Python. It’s great. It’s very popular amongst AI enthusiasts. It’s very easy to use, lots of support and lots of packages. So, yeah.

Jon Dodson
All right. So, moving on to talk a little bit about MIA, which is our newly released product. So, question for you: what is MIA? What does it do? And why should customers use it?

Hillary Sanders
MIA, yay. MIA is Heroku’s Managed Inference and Agents add-on and it does a lot of things and it will do even more things. But essentially it is an add-on that makes it really easy to integrate AI, so specifically like large foundational models, like an LLM, like Claude Sonnet 3.7, into your Heroku apps.

So, it lets you kind of just add the add-on and attach a model like Claude directly to your app, without the hassle of external account setups, sending your data out of Heroku, API key management, or data security issues, and inference calls will just work. That is very nice and avoids a lot of hassle, but there are also a lot of special-sauce features that we have been adding to make the experience nicer and to take away a lot of the boilerplate code that you often have to write in certain situations.
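
For reference, here is a minimal sketch of what calling an attached model can look like in Python. The INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID config var names and the /v1/chat/completions path are assumptions based on the add-on docs as described here, so double-check them against the current documentation:

```python
# Minimal sketch: call a chat model attached via the Managed Inference and Agents add-on.
# Assumes the add-on exposes INFERENCE_URL, INFERENCE_KEY, and INFERENCE_MODEL_ID config
# vars and an OpenAI-style chat completions endpoint; adjust to the actual docs.
import os
import requests

resp = requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json={
        "model": os.environ["INFERENCE_MODEL_ID"],
        "messages": [
            {"role": "user", "content": "Summarize what this app does in two sentences."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```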

So, we have really cool features relating to agentic automatic tool execution, including some tools that just come built into MIA and will work off the bat automatically, without you having to deploy your own tool servers. We have nice dashboards, including dashboards for the tools you attach to your models. And we’re also providing a lot of really cool features relating to MCP.

MCP is an open-source protocol. It stands for Model Context Protocol, and it basically helps define how AI applications should interact with tools and databases and resources in like a nice standardized way. And we’re betting pretty heavy on MCP, and I’m very happy that we’re doing that. So, a lot of our features are built around making sure deploying your own MCP servers to Heroku is really easy and adding in yet more special sauce around that.

Jon Dodson
Great. I like MIA because it makes adding AI to your application as easy as adding a database or adding Redis, and it’s going to be first class. When I was originally looking at the API that the team designed, I loved the simplicity of it and the extensibility of it. It’s really great work.

Hillary Sanders
Yay!

Jon Dodson
[Laughs] Yeah, yay indeed. So, what’s the biggest problem MIA solves for Heroku customers and where do you think developers should absolutely consider adding MIA to their applications?

Hillary Sanders
Yeah. Okay. So, the biggest problem MIA solves, MIA solves like a lot of tiny annoying problems.

Jon Dodson
Right.

Hillary Sanders
Similar to what Heroku does really well. Like, we solve a lot of tiny annoying problems for you, so it’s not a frustrating experience to deploy apps. So, if I think about the biggest problem MIA solves, it’s probably pretty boring. You don’t have to go make an external account with like OpenAI and give them your credit card and worry that your data is being transferred to them.

That’s pretty simple, but it is nice. And then you have a very easy-to-set-up LLM or image model or embedding model that can easily connect to your other Heroku apps or databases if you so choose. So maybe that’s the biggest problem solved. It’s perhaps not the most interesting one, but it’s relevant to everyone who uses it. So, it’s kind of high impact.

Jon Dodson
Mm hmm. Yeah, absolutely.

Hillary Sanders
What was the other question? It was where do you think developers should be adding it? If they have an application where they were going to call out to an external third-party large foundational model, I would strongly consider using MIA, because you’re often getting the same thing, but it’s just going to work really nicely on Heroku, because you get all of these features built in for free, like dashboards and token consumption and all of those really nice features relating to MCP.

And if you want like an MCP server, oh, all of a sudden it’s pretty easy to add OAuth authentication and do server registration that works really well with MIA. And that kind of stuff is just going to be totally doable, but harder if you use an external third party LLM. So, I would say if you want to use large foundational models like these APIs in your apps, that is awesome and consider using MIA.

Jon Dodson
So, this is really, really important. In the Star Wars film Attack of the Clones, Padmé and Anakin travel to Naboo via the H-type Nubian yacht, a luxury vessel in Naboo’s fleet known for its sleek design and strong deflector shields. Obviously. According to a Tumblr blog post that did the math, the total travel time to Naboo was 10 days and 10 hours. So my question for you, Hillary, is how many movies or Disney Plus seasons do we need to cover this journey? Ten days, 10 hours, really important.

Hillary Sanders
Yeah, I think that’s a really excellent question.

Jon Dodson
Oh, good.

Hillary Sanders
And I have an answer, and that answer is it’s Andor season two.

Jon Dodson
Oh my gosh, I’m watching that right now. It’s so great.

Hillary Sanders
It’s a great show.

Jon Dodson
Yeah.

Hillary Sanders
Ten out of 10.

Jon Dodson
Yeah.

Hillary Sanders
I’m literally watching it for movie night tonight with friends. So, no spoilers.

Jon Dodson
Oh yeah, it’s fantastic.

Hillary Sanders
Yeah.

Jon Dodson
I agree.

Hillary Sanders
Yeah. If you’re listening now, like, what are you doing? Go watch it. Oops! After. Yeah, it’s really good.

Jon Dodson
So, what kinds of technologies did you and the team use in the development of MIA?

Hillary Sanders
Ooh, that’s a fun question! So insofar as MIA as it exists today, we are mostly using technologies that Heroku provides. So we’re dogfooding Heroku, which I think is just typically an excellent thing to do. We’re using Heroku Dynos and Heroku Private Spaces and Heroku Postgres and Heroku Redis. And despite what I said earlier about Python, a lot of our code is written in Go because we wanted routing to be fast.

Jon Dodson
Right.

Hillary Sanders
And we’re doing less neural network development; we want API routing to work really well. The actual models are hosted by Amazon in secure AWS accounts. We might add more models in the future, without the “might” part, actually. So that’s sort of the rough tech stack we’re using today. I will say that a while back we were playing with the idea of hosting our own models, and I think it was definitely the right decision to move away from that.

But we got to play with a really fun set of technologies, like Triton and TensorRT and vLLM and PyTorch, and doing a bunch of cost-performance analysis on different EC2 instances, like Inferentia and Trainium and classic GPU-powered EC2s. That was super fun. But those pieces of technology we’re not using in the current MIA.

We are, though, using a bit of Python. Like, we’re publishing various open-source MCP repos so people can use our first-party tools. If that floats their boat, they can deploy their own MCP servers and have that work with MIA, and they can also just clone some of our getting-started repos to do that kind of thing even more easily.
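
For a feel of what a small MCP tool server looks like, here is a minimal sketch using the open-source MCP Python SDK’s FastMCP helper. The word_count tool is made up and this is not taken from Heroku’s repos; it only illustrates the general shape of an MCP server:

```python
# Minimal MCP tool server sketch using the open-source MCP Python SDK (pip install mcp).
# The word_count tool is a hypothetical example; Heroku's first-party tools and
# getting-started repos will look different, so treat this only as a flavor of MCP.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the number of whitespace-separated words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport; HTTP-based transports are also supported
```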

Jon Dodson
Can you walk me through what you think is the coolest feature of MIA?

Hillary Sanders
Yeah, I don’t know if this feature will be generally available by the time this podcast comes out. If it’s not…

Jon Dodson
Right.

Hillary Sanders
It will be very soon. Probably the coolest feature of MIA, for me, is what we’re calling First-Party Automatic Tool Execution. So, MIA offers an agents endpoint, which allows you to tell your model, hey, you can call X, Y, Z tools. Normally, with an inference provider, the model would select a tool and then call back to you, and you, the client, would have to be like, oh, the model wants to call this tool, now I have to handle that, call out to some server I’ve deployed, and then give it back a response. But if you use the agents endpoint, you have the option of us just doing all of that control-loop nonsense for you.

And what is extra cool, and I think is maybe the coolest feature, is we are also offering tools that just come built in natively with MIA. So the idea is you can create an app on Heroku, attach it to MIA, maybe attach Claude Sonnet 3.7, and say, hey Claude, you have all these tools available that Heroku makes available, and I want you to write Python code and run it on a one-off dyno in my Heroku account. Or, I want you to be able to search Google, or I want you to be able to look at one of my read-only databases and tell me about it, or I want you to be able to parse this random PDF and talk about it. There are various tools that we just think are very useful and want to offer natively for free.

And what I think is really cool is that that’s something that people can just use in the first three minutes of using MIA, and it just doesn’t take a lot of boilerplate and a lot of work to get that working because again, like that’s something you can build yourself, but it is really nice if you don’t have to build it yourself and it just works beautifully in three minutes of writing a couple lines of code.
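
To make that concrete, here is a rough sketch of what a request to an agents-style endpoint with a built-in tool could look like. The /v1/agents/heroku path, the heroku_tool type, and the code_exec_python tool name are illustrative assumptions based on the description above, not the documented schema, so check the MIA docs before copying it:

```python
# Rough sketch: ask the agents endpoint to answer a prompt using a built-in tool.
# The endpoint path, tool type, and tool name are assumptions for illustration only;
# the real request/response schema (including any streaming) is defined in the MIA docs.
import os
import requests

resp = requests.post(
    f"{os.environ['INFERENCE_URL']}/v1/agents/heroku",
    headers={"Authorization": f"Bearer {os.environ['INFERENCE_KEY']}"},
    json={
        "model": os.environ["INFERENCE_MODEL_ID"],
        "messages": [
            {"role": "user", "content": "Write and run Python that prints the first 10 primes."}
        ],
        # Hypothetical built-in tool: code execution on a one-off dyno.
        "tools": [{"type": "heroku_tool", "name": "code_exec_python"}],
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.text)  # the response may arrive as server-sent events rather than one JSON body
```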

Jon Dodson
That’s my favorite kind of development.

Hillary Sanders
It’s, it’s pretty nice.

Jon Dodson
It’s incredible. So, oftentimes when teams get together to build products, we have technical disagreements. It happens. People disagree. I was wondering, when it came to building MIA, were there any disagreements the team had, or did the product just sort of naturally evolve without any?

Hillary Sanders
I guess we’ll have to define disagreements, right? Because I will say, honestly, the engineering team, the Heroku AI Team, is the most pro-social set of engineers I’ve ever worked with. And I’ve been doing this for like 12 years, so that should be a significant statement. So, if I define disagreements as high-uncertainty, high-impact, hot-topic decisions that we waffle on as a team, we’ve totally had those, because there are a lot of decisions where there’s not a clear answer and we really need to work through it together.

Like, we’re still waffling over some things like this. If you want the hot gossip that may or may not be deleted from this, one really interesting problem that we’re working through is what is the best API schema for future models that we release, or future endpoints that we release. Like, what’s the best API format? And there are just so many conflicting incentives here that are really important.

I don’t think there is a clear, perfect answer. There’s just a lot of different tradeoffs. You don’t want to overwhelm customers with choice. That’s exhausting. I don’t want to evaluate 250 types of jam. I just want like a delicious jam and I want to eat a snack, if that’s what I want. Same with Heroku. On the other hand, though, there’s tradeoffs between different API schemas.

Like, you can take your underlying model providers and use their API schemas, but then you have a different one per model. Or you can convert everything to a really popular schema, like OpenAI’s, but then you’ll kind of fall short on the edge cases. You can do a lot of work to fix that in many situations, but at the end of the day you’ll still have little edge cases that might make the experience less elegant: when you want to use a feature that our model supports but OpenAI doesn’t, or vice versa, or when you want to be explicit that we’re actually ignoring a feature, but you also don’t want to break things when people are running the code through OpenAI SDKs.

It’s like there’s so many benefits with doing that too. Like, it’s so nice to just use all of the example code online and all the packages online that really know OpenAI really well. And then to add to that party, you also have all of the custom stuff we’re building that needs to be like a superset on whatever API format we decide on that relates to the agentic capabilities that we’re offering with like the automatic tool execution and that kind of thing.

And so that’s a really fun maybe quote/unquote disagreement. But I think more just like super interesting problem that we will continue to think about solving really well as we release more and more things. And honestly, if you’re listening and have a strong opinion, email me because I’m curious to hear what you think because it’s a super tough problem and people have a lot of good opinions.
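
As a small illustration of one side of that tradeoff, and not a statement about what MIA’s final schema will be, the appeal of an OpenAI-compatible schema is that existing SDKs and example code keep working when you only swap the base URL and key. The env var names below are assumptions carried over from the earlier sketch:

```python
# Illustration of the "reuse the OpenAI-compatible ecosystem" side of the tradeoff:
# if a provider speaks an OpenAI-style schema, the official openai SDK (v1+) can talk
# to it by overriding base_url and api_key. The env var names here are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ['INFERENCE_URL']}/v1",
    api_key=os.environ["INFERENCE_KEY"],
)

response = client.chat.completions.create(
    model=os.environ["INFERENCE_MODEL_ID"],
    messages=[{"role": "user", "content": "Hello from a reused SDK!"}],
)
print(response.choices[0].message.content)
```

The cost, as discussed above, shows up in the edge cases where the underlying model and the borrowed schema disagree.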

Jon Dodson
Well, inbox flooded, I’m sure.

Hillary Sanders
I hope.

Jon Dodson
That would be great. So finally, to wrap things up here: thank you, Hillary, for talking to me today.

Hillary Sanders
Oh, thank you.

Jon Dodson
Yeah, you’re welcome. What’s coming next with MIA? What can we look forward to? I know it just came out. I know. And everyone’s looking to see what’s next, but I’m just curious.

Jon Dodson
What’s next for MIA?

Hillary Sanders
So much. I mean, mostly the things that we wanted to squeeze into our initial release that didn’t get squeezed in. The biggest category of stuff is a lot of really cool features relating to MCP, the Model Context Protocol I talked about. So, we’re offering different types of models, like an embedding model for RAG (Retrieval-Augmented Generation), an image model, and a couple of chat models, but you can really supercharge chat models with tools.

And so we’re making that big bet on MCP, and you’ll really be seeing a lot of super cool features relating to that pretty soon. So, I’m excited about that.
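
For anyone new to the RAG idea mentioned above, the core move is to embed your documents and your query with the same embedding model and rank documents by similarity. Here is a minimal sketch; the EMBEDDING_* config var names and the OpenAI-style /v1/embeddings path and response shape are assumptions for illustration:

```python
# Minimal RAG-flavored sketch: embed a query and some documents, then rank by cosine similarity.
# The EMBEDDING_* config var names and the OpenAI-style /v1/embeddings response shape are
# assumptions for illustration; check the add-on docs for the real names and schema.
import math
import os

import requests

def embed(texts):
    resp = requests.post(
        f"{os.environ['EMBEDDING_URL']}/v1/embeddings",
        headers={"Authorization": f"Bearer {os.environ['EMBEDDING_KEY']}"},
        json={"model": os.environ["EMBEDDING_MODEL_ID"], "input": texts},
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

docs = ["Heroku dynos run your app.", "Postgres stores your data.", "Andor is a great show."]
query_vec, *doc_vecs = embed(["How do I run my app on Heroku?"] + docs)
best_doc, _ = max(zip(docs, doc_vecs), key=lambda pair: cosine(query_vec, pair[1]))
print("Most relevant doc:", best_doc)
```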

Jon Dodson
Me too. And thank you again, Hillary, for talking to me today.

Hillary Sanders
Thank you very much.

Rachel West
Thanks for joining us for this episode of the Code[ish] Podcast. Code[ish] is produced by Heroku, the easiest way to deploy, manage, and scale your applications in the cloud. If you’d like to learn more about Code[ish] or any of Heroku’s podcasts, please visit Heroku.com/podcasts.

About Code[ish]

A podcast brought to you by the developer advocate team at Heroku, exploring code, technology, tools, tips, and the life of the developer.

Subscribe to Code[ish]


Hosted By: Jon Dodson
with Guest: Hillary Sanders