Video: Ship Code Faster with Claude Code on Vertex AI | Duration: 3612s | Summary: Ship Code Faster with Claude Code on Vertex AI | Chapters: Welcome and Introduction (21.695s), New Chapter (25.375s), Webinar Overview (62.600002s), Claude Code Overview (134.145s), Setup and Demo (354.04498s), Vertex AI Monitoring (656.675s), Vertex AI Comparison (976.43s), Enterprise Deployment Patterns (1101.06s), Cloud Environment Architecture (1268.5299s), Environment Control Demo (1376.4401s), Gateway Access Control (2005.42s), Usage Analytics & Impact (2456.29s), Troubleshooting & Best Practices (2725.555s), Wrap-Up & Resources (2911.125s)
Transcript for "Ship Code Faster with Claude Code on Vertex AI":
Hello, everybody. Thank you for joining us here. I'm Roy Arsan. I'm a solution architect here at Anthropic, and I'm joined by Ivan from Google. We're super excited to have you listen to our webinar. We're going to dive quickly into how to ship code faster with Claude Code on Vertex AI. With that, let's start with a quick intro, Ivan. Yeah, hi everyone. Nice to see you all here. I'm Ivan, I'm a developer advocate at Google Cloud working together with Anthropic and Roy. So pretty excited to be here today. So just a few housekeeping items before we begin. Number one, don't worry if you miss something; we're going to share a recording after this, in the next day. We also encourage you to submit questions in the Q&A chat. We have folks listening in the chat and helping answer questions. And do give us feedback. We are putting out more content in webinar form and looking for your feedback to help us address the topics that are top of mind. So what we're going to cover today is, number one, a quick overview of Claude Code. I'm going to walk you through how to set up Claude Code on Vertex AI. Then we're going to cover feature availability and the differences between platforms, whether you're running Claude Code on Anthropic or on Vertex AI. We want to make sure that picture is clear. Then Ivan will walk you through how to leverage GCP controls for security, access, and governance, along with best practices, and then we'll close together. By the end of the session, the goal is for you to be comfortable setting up Claude Code on Vertex AI, using it, and managing it. So let's dive right in. Before we talk about Claude Code, let's do a little bit of framing. Claude is a full platform. We provide the models at the bottom. This is the frontier intelligence layer. You're familiar with models like Sonnet, Haiku, and Opus. On top of that, we have the agent capabilities.
These are the building blocks like persistent memory, tools like web search, skills, agents, and subagents. And then on top of that is the surface area that you are interacting with, whether it's Claude Code or the Claude apps. This is where you have the Claude desktop app or Claude Cowork. And then you have the Claude Developer Platform, or the Claude Console, if you're building against our API to build your own products. So we're going to focus on this piece right here, Claude Code running on Vertex AI, but it's worth noting where it sits in the broader ecosystem. So what is Claude Code? Claude Code is our coding agent that lives in your terminal or VS Code. You give it a task, like fix a bug, refactor a module, or write tests for a service, and it goes and does it. It reads your codebase, makes edits, runs commands, and shows you everything that it's doing. So it maintains transparency and keeps you in the driver's seat if you wish to be in control. It's one of the fastest growing products that we've ever seen. Now how fast? This is already an old statistic, but Claude Code reached 1 billion in December 2025, which feels like a lifetime ago in AI. Last month, which is February, it crossed 2.5 billion, so it has more than doubled since then. The interesting stat, I think, is that enterprise is over half of that revenue. In the last six months, enterprise customers contributed 10x growth. So developers like you are driving this adoption, and much of this adoption is happening on Vertex AI. So we're thankful for our partnership with Google Cloud. Today, we'll show you how to run the Claude Code agent on Vertex AI with native GCP auth, IAM, and Cloud Monitoring all built in. One of the most compelling things about Claude Code is that it augments different aspects of your software development life cycle.
So anywhere from exploring an unfamiliar codebase to planning architecture to building features to debugging CI failures. It uses your team's existing CLI tools so you can focus on the solutions. CLIs like Git, Docker, or bq, if you work a lot with BigQuery like myself and Ivan here. On this quick slide, I'm not going to go through every box, but as you can imagine, Anthropic uses Claude Code heavily. We dogfood it. What's interesting is that we see the value show up across the entire organization, not just in engineering. Product uses it for fast prototyping. Data science builds notebooks. Marketing automates their ad creative. The pattern is that anywhere there's repetitive technical or knowledge work, Claude Code can create leverage. So this webinar is focused on developers and cloud administrators like yourselves, but it's worth noting that it's not just for software engineers. So let's dive right into the demo. Let me show you how to actually set this up. All right. So I'm jumping into my GCP console, aka the Cloud Console. Before any developer touches the terminal, you want to make sure that you are set up on the Google Cloud side. So first, you want to enable Vertex AI on the project where you are running Claude Code. Next, jump into the Model Garden. We're going to spend a minute here because this is important. You want to enable the models that you are using. I've already enabled Haiku 4.5. You want to make sure you enable the different models, specifically Haiku and Sonnet 4.6 or Opus 4.6. Haiku is useful because that's what Claude Code uses for a lot of the housekeeping items. It is a fast, cost-effective model, so we use it whenever we can to maximize your token efficiency.
So on this model card, the thing I want to direct your attention to is the specific capabilities of this model and where it is deployed. You can see that it does offer prompt caching, which is relevant for Claude Code. It does offer a global endpoint. So, again, check your model card to see the latest features and the latest version names. The specific syntax matters, as we're going to see later when we set things up on the Claude Code side. So in this case, make sure you copy this, and we're going to paste it shortly in the Claude Code settings. You can already see the global endpoint here, and this becomes relevant if you decide to use the global endpoint. And then last but not least, you really want to check your quotas. Just a quick note on quotas: you can access your quotas from the IAM admin panel. In this case, I'm zeroing in on the Vertex AI service. You have lots of quotas, but the most important ones are the queries per second, the requests per second, the input tokens per minute, and the output tokens per minute. Those are per model, per region. So you can see here, this is my quota for the global endpoint for Claude Sonnet 4.6. In this case, my newly provisioned GCP project has 3,000,000 input tokens per minute, and Haiku is much higher at 10,000,000 tokens per minute. As your team grows, you want to make sure that you have enough quota. A good rule of thumb is a couple hundred thousand tokens per minute per developer. That number will vary depending on your usage, obviously, but do plan for it. The good thing is you can keep track of your peak usage and current usage, and you can set up usage alerts so you can stay on top of it. But if you think you're going to hit that quota quickly, definitely file a quota request or talk to your account team before you roll it out to a team.
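The quota rule of thumb above can be turned into a quick back-of-the-envelope check. The sketch below is only illustrative: the function name is made up for this example, and the 200,000 tokens-per-minute default is one reading of "a couple hundred thousand"; actual per-developer usage will vary.

```python
def max_developers(quota_tpm: int, tpm_per_developer: int = 200_000) -> int:
    """Rough number of developers a tokens-per-minute quota can support.

    tpm_per_developer is an assumed planning figure, not a measured value.
    """
    if quota_tpm < 0 or tpm_per_developer <= 0:
        raise ValueError("quota must be >= 0 and the per-developer figure must be > 0")
    return quota_tpm // tpm_per_developer


# 3,000,000 input tokens per minute is the Sonnet quota shown in the demo project.
print(max_developers(3_000_000))  # 15
```

If the result is smaller than the team you plan to onboard, that is the signal to file a quota request before the rollout.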
So this is what we have on the Google Cloud side. This is the setup, and it's important to note that all of these controls are available to you out of the box. So let's jump into the terminal on the developer side. In this case, I'm working on a hypothetical app. It's an inventory management app. I'm building new features. This is my Git branch with new features. I already have Claude installed. You can install it with a one-liner from our docs page. What's important to note is that it leverages your GCP Application Default Credentials or whatever GCP auth flow you're using. So, number one, make sure you are logged in. I'm already logged in, but I want to show you the full command for everybody's awareness. This is going to create your auth token, and Claude Code will pick it up from there. The other thing I want to mention is the environment variables that I set up. You want to make sure that Claude Code is set up to use Vertex AI, and then you point it to your GCP project and then the endpoint. Those are the three important environment variables to set. In my case, I'm using the global endpoint. As I showed you earlier, the models that I'm using support a global endpoint, but you may wish to pin to a specific region. There are a lot of environment variables available for you to configure things the way you want. We do recommend you pin model versions to prevent Claude Code from picking up a new version that you haven't approved in Vertex AI. That's a common issue, so pinning the model version is highly recommended. Remember that value I asked you to copy from the model card? Paste it right here. In this case, ANTHROPIC_DEFAULT_HAIKU_MODEL is set to the value that we saw earlier.
The one thing I want to point out as well is that I want to make sure I use the 1 million token context window. So I add the bracketed [1m] suffix, for 1 million tokens. So this is really what you have to set up on your side. I also have CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS set. So, basically, I'm disabling all the beta headers. The reason is that Vertex AI will reject requests that have headers it doesn't recognize. So to be on the safe side and to minimize your error rate, currently it's good to disable that. Let's jump into Claude here. So I'm just going to fire up a Claude session and check the status. I want to validate that this is running against the Vertex AI endpoint, and sure enough, it is using Vertex AI with my project and the region that I set. You're probably already familiar with this, but let me show you what's going on in this Claude Code session. I have MCP servers set up. I have skills, and I also want to show you what's already in my context, so you can see you're not losing features. I have a custom agent. I have subagents, skills, MCP tools. I'm using the 1 million token window. Every request is backed by Vertex AI. So to generate some traffic, I'm just going to ask the model to test some new features that I added to the app. It's going to leverage Playwright in this case to actually open up the app and test the different features. With that, I'm going to jump back to the GCP console. One of the things that Vertex AI ships with is out-of-the-box dashboards. In this case, there's a Vertex AI Model Garden dashboard. It aggregates metrics and logs, so they're readily available for you to see what models are being used, what regions are being hit, and what your throughput is. So it's a lot of important visibility that you have available without any instrumentation.
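The environment setup described above can be sketched as a small helper that assembles the variables in one place. This is a sketch under assumptions, not the presenter's actual configuration: the talk only names ANTHROPIC_DEFAULT_HAIKU_MODEL and CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS, so the other variable names below are taken from the Claude Code documentation as best recalled and should be verified there, and the project ID and model IDs are placeholders to be replaced with the values from your own console and model cards.

```python
from typing import Dict, Optional


def vertex_env(
    project_id: str,
    region: str = "global",
    main_model: Optional[str] = None,
    haiku_model: Optional[str] = None,
) -> Dict[str, str]:
    """Build the environment variables that point Claude Code at Vertex AI."""
    env = {
        "CLAUDE_CODE_USE_VERTEX": "1",                  # use Vertex AI as the backend
        "ANTHROPIC_VERTEX_PROJECT_ID": project_id,      # project with the models enabled
        "CLOUD_ML_REGION": region,                      # "global" or a pinned region
        "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1",  # avoid unrecognized beta headers
    }
    # Pin model versions only when the caller supplies them.
    if main_model:
        env["ANTHROPIC_MODEL"] = main_model
    if haiku_model:
        env["ANTHROPIC_DEFAULT_HAIKU_MODEL"] = haiku_model
    return env


# Placeholder values only. The "[1m]" suffix follows the talk's description of
# how the 1 million token context window is requested.
settings = vertex_env(
    project_id="my-gcp-project",
    main_model="<sonnet-model-id-from-model-card>[1m]",
    haiku_model="<haiku-model-id-from-model-card>",
)
for name, value in sorted(settings.items()):
    print(f'export {name}="{value}"')
```

Printing export lines keeps the sketch side-effect free; the same dictionary could instead be merged into the environment of whatever process launches Claude Code.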
So I'm just going to refresh this to get the latest. There's not a whole lot of traffic because this is just one developer in this project, but you can see there's throughput. Let me actually reduce the time range to the last fifteen minutes. Okay. So there's a lot of visibility and SRE-level metrics here, so you can keep tabs on your latency, QPS, and error rate, and validate whether you have set up anything wrong on the Claude Code side, for example the regional endpoint for the models, or whether you're using the wrong model. The other thing I want to point out is that the cache read percentage is pretty high. So you can validate that prompt caching is working even though it's a global endpoint. This should be good news for those of you who traditionally pinned to a regional endpoint just for caching. Now, you could still pin to regional for other reasons, like regional residency, and we're going to talk about that later. And then last but not least, in terms of provisioned throughput, if that's what you have set up, you can also keep track of provisioned throughput utilization to make sure you are right-sized from a capacity point of view. I know I spent a bit of time on the Vertex AI Model Garden dashboard, but the bottom line is that without any additional setup, you have production-grade observability for Claude Code with Vertex AI. The other thing I want to mention is that if you enable the audit logs for Vertex AI, specifically the Data Access audit logs, you can actually see down to the request level. Now, you're not going to see the actual request, the prompt and response, but you will see all the Vertex AI model prediction requests. So this helps you drill down to the specific users interacting with Vertex AI and the trend over time.
So as you saw, folks, the setup is just a few environment variables. You tell Claude Code to use Vertex AI, point it to your project, and then you run Claude. That's it. Your existing GCP IAM governs the access. One thing we didn't go through is that you can set up IAM access control so that only specific users are able to access Vertex AI, so not just anyone can hit that endpoint outside of your access management. So most of the Claude Code features work out of the box. And now that you've seen this setup, let's talk about how the picture looks when it comes to running Claude Code on Anthropic versus Vertex AI. This is a common question. Here's the full comparison. The key takeaway is that you have the CLI capability and the VS Code extension. You lose out on the desktop, web, mobile, and Slack experiences. But the CLI covers the majority of the features: client-side sandboxing, plugins, and proxy support, which Ivan is going to talk about. Tools like web search are also available. Another place where Vertex differs is that managed settings are distributed via mobile device management rather than through your Anthropic enterprise account. Also, user cost control is handled through your gateway, which Ivan is going to cover, but you also have the GCP project quotas that I talked about earlier. Vertex also gives you something pretty unique: native request and response logging and sampling straight into BigQuery. So that's available if you want to audit anything, along with the ability to run inside your own security perimeter in GCP, of course. Ivan is going to show you how to build on top of all of this in the next section. Ivan, over to you. Thank you, Roy. So now let's move from how do I set up Claude Code with Vertex AI to how do I run it in an enterprise context.
So I've been working with several teams that adopted Claude Code on Vertex AI, and there are some patterns that really make the difference. For the rest of this presentation we are going to cover them. In particular, when you roll out any AI coding tool at enterprise scale, platform teams usually look at three main areas. The first one is related to environment controls. Any development tool that has access to a shell needs some kind of execution boundaries. Claude Code already handles this on the client side, as you know and as Roy was also mentioning. For example, it asks for your approval before running any Bash command, and it ships with a sandboxed Bash as well. But in an enterprise context you may want additional controls on the server side too, especially when compliance or data governance is involved. The second consideration is related to access. Claude Code works great with individual credentials, as Roy was highlighting before, but when you go from five developers to 50 or hundreds of developers, you naturally want to start having a centralized layer: a single place that allows you to manage credentials, set developer policies, and control budgets. So this is the second and most important consideration, everything related to access. The third and last one is related to impact. At some point, once your team is enabled with Claude Code, your leadership or your stakeholders may ask questions like, what's the adoption trend? What are we spending? You already have this data from the operational perspective, but you want to provide it to your stakeholders in a way that lets them run their own financial analysis and see the impact of Claude Code at the enterprise level. So we are going to cover these three main considerations, and I will show you how you can implement architectures on Google Cloud to address them.
But before we do that, let's look at the environment that we are going to use. Compared to what Roy just shared, the environment that we use for the remaining demos is not a local setup. As you can see, it's an architecture, an environment that runs on Google Cloud. So how does it work? It's pretty simple. Developers open a browser and land on a portal that requires sign-in. Only authorized developers can get through, and once you have access, you have a web portal that allows you to provision a development environment with the IDE installed. In our demo it will be Visual Studio Code with the Claude Code integration, so the extension and everything else. It all runs on a VM that has no public IP, so all the traffic related to this VM is controlled. The VM connects to two main things. On one side we have the APIs that give you access to Claude models on Vertex AI. On the other side, as you can see in this picture, we have some services that we are going to use with Claude Code, including MCP servers, LLM gateways, and, at the end, dashboards. Some of them run on serverless functions like Cloud Run. But in terms of authorization, everything is at the VM level: the VM has a service account, and that is used to invoke those services. In the next three demos we are going to use all the pieces in this diagram. So let's start with the first consideration and I will show you the demo related to it. The first consideration, as I mentioned, is environment control. We want to have some execution boundaries on the dev tool, in this case on Claude Code. Again, Claude Code is very good on the client side. It asks for approval. It comes with sandboxing at the OS level.
So you can use Seatbelt if you use macOS or bubblewrap if you use Linux, but in an enterprise context, you want additional boundaries on the server side. So when you run code in Claude Code, it runs in a completely separate environment that has its own dedicated compute. This is very useful, for example, when you have shared development environments or when teams have specific requirements. So this is related to code. The same thing applies to web search. Claude Code comes with web search capabilities that developers can disable or enable, but at the enterprise level you want more governance over these choices. Usually you set these controls at the organization level, so the security team has better visibility and better control over these capabilities. So how do you implement these two patterns, the server-side code execution and the control over the search capability of Claude Code? Here is how you can do that on Google Cloud. For code execution, you can build an MCP server that runs on a serverless function like Cloud Run using FastMCP, for example, which is one of the most common frameworks for building MCP tools. Each time Claude Code needs to run code, it runs that code in an isolated environment. It calls the MCP server through an authenticated proxy, and the MCP server in this case exposes several tools that allow us to create a sandbox, execute the code, and delete the sandbox. So it's pretty straightforward. Under the hood, the MCP server calls the Vertex AI Agent Engine API. Vertex AI Agent Engine is a platform that provides a set of services for building agentic applications, and in this case you will use the code execution service. The code execution service provides a sandbox, which is an ephemeral VM managed by Google. It's an isolated environment with no network access.
So it supports several languages, it keeps state across multiple sessions that use the same sandbox, and you can also set a time limit to cap how long the code execution can run. So when the code runs in the sandbox, again, it's running remotely, not on the client side, and it's isolated. If something crashes, it crashes in the sandbox and not in your local environment. For web search, it's easier. You don't need any external integration with MCP. There is a native integration of Claude models on Vertex AI with the search capabilities, and Claude Code will use the standard web search tool to call the API. But the cool part of search on Vertex AI is the governance. These search capabilities can be enabled only by the organization administrators. So it's an org-level control, developers cannot override it, and if your project runs in a VPC, for example, web search will be blocked by design. So again, these are all ways to give you more control over what can be executed. To give you an idea of the environment that I just showed and how it works, let me jump into a demo. Okay, so you should see my screen. This is the portal that I was describing before. As a developer, I access this web portal, I sign in, and then I land on this page. As you can see, this page gives me two options. I can open my development environment, and I can track my usage and my token counts with a gateway dashboard that I will show you later. But for now, let's focus on how Claude Code can execute code remotely in the sandbox environment. So I will jump into my development environment. This is what it looks like. Just so you know, all the code that we are going to show you today will be available at the end of this webinar. Today we're going to run three demos. The first one is the sandbox.
Now, while I explain what I've done, let me quickly ask Claude to execute it so we will see the results of running the code remotely. For the sake of the demo, I created a few scripts that allow me to run the entire demo. But essentially, what I want to show you is that in the first demo you have a setup script. The setup script allows you to connect to the remote MCP server through a gateway that I will create, and to use the tools available on this remote MCP server to execute the code. So essentially this script configures the Claude settings to point to the remote MCP server to run the code. How did I build the MCP server to run the code? As I said, I use FastMCP, and here you can see the code that I wrote to create a sandbox environment and execute the code. As I said in the presentation, we use the Vertex AI Agent Engine API. Just to give you an idea, this is the operation that allows me to create a sandbox, and as I said, you can define some configuration to hard stop the code execution in the sandbox. I also have a function here that gets the sandbox that I created, takes the code that Claude Code sends as input, optionally along with some input files from your local environment, and executes the code in the isolated environment. Once you create this MCP server, you can package it in a simple Docker image and deploy it on Cloud Run. So these are essentially the steps that you need to cover in order to get an endpoint that allows you to execute code remotely. Now, as you can see, Claude is running the entire process of setting up the connection with the remote MCP server, and then it will run a test. The test is a very simple Python script that runs in the remote environment, and then I get back the results.
It takes some time, but while we are waiting, I just want to show you what my MCP server looks like on Cloud Run. This is the view that you get, so you can observe the logs and all the interactions that Claude Code is having with my MCP server. Here I have some past interactions, and if you are fast enough, because it spins up the sandbox code environment on the fly, you can also jump into Vertex AI Agent Engine. So this is Vertex AI, and Agent Engine, as I said, is the platform that provides several services around agentic applications, and you will see a sandbox popping up here that allows you to run the code. This is a temporary environment, as I said, so it will be deleted. Now, as you see, the proxy is set, the prompt is sent, and it is running in the background. It takes some time, but I can guarantee you that the code will get executed. This is just to give you an overview of what remote code execution looks like. So to recap, you create an MCP server that connects to the Vertex AI Agent Engine API to reach the code execution environment. Yeah, and you see, we got the results. This is the fun little piece of code that I asked Claude to print. So you connect with the remote APIs, you create the sandbox, then you set up the proxy, you change the configuration file on the Claude side, and that's it. From the developer perspective, nothing changed. As you can see, it's just a few settings like the ones Roy was showing you. Okay, so this shows you how you can quickly set up a remote development environment. At this point, we have set up one of the potential environment controls that you can have. Now let's talk about access. Claude Code works great with individual Google Cloud credentials. Each developer authenticates with ADC, as Roy was mentioning, and you are essentially ready to run. And this works very well with small teams, right?
But as you grow, you probably need a centralized layer. The reason is that you want one place to manage credentials instead of doing it at the machine level, you want to set per-developer or per-team rate limits and budgets, and you want visibility into who is using the models and how much, with the ability to manage all these things in just one place. That's where the gateway pattern comes in. Essentially, it sits in the middle between the developers who want to use Claude on Vertex AI and Vertex AI itself, and it gives you this control without affecting the developer experience, as I will show you in the demo. So how can you set up the gateway pattern on Google Cloud? It's pretty simple. One of the most common gateway projects that you can use is LiteLLM, which is open source. Again, you can deploy it on Cloud Run, and on the developer side, as I said, nothing changes. You just need to change an environment variable in your Claude Code settings, ANTHROPIC_BASE_URL, so that it points to the gateway, and Claude Code will work exactly as it did before without the gateway. On the platform side, things change, because now with the gateway you have a central control point. The gateway authenticates to Vertex AI with its own service account, so there is one service account, configured in one place, to manage the access. With the gateway layer, every developer also gets a virtual key, so the gateway knows who made every request: not just that a request happened, but that this developer used this model and burned this many tokens at this cost. The important thing about the virtual key is that you can set rate limits on it. So again, you have control over the API keys, and you can set budget caps if you want that kind of control. The other cool thing about having a gateway, in particular a gateway like LiteLLM, is that it provides two outputs.
First of all, you can log everything that is collected by the gateway in Cloud SQL, so a PostgreSQL database that feeds the LiteLLM dashboard that I'm going to show you in a few minutes. This dashboard gives you a real-time view of the requests, token counts, cost of the endpoint, and so on. The second output, which is also important, is that LiteLLM handles OpenTelemetry tracing, and you can push these traces directly into Cloud Trace with a collector. Again, this allows you to collect a lot of information that you can then ingest into BigQuery to build custom dashboards. With that being said, let me jump into the second demo and share my screen again. Okay, so we are back in our demo environment, and in this case I want to run the second demo, the one related to the LiteLLM gateway. Again, first of all, let's ask Claude to run the entire demo end to end while I explain what I did. Essentially everything is in the setup script again. In this setup there are three things that I want to highlight. The first thing that we are going to change is related to the proxy. We create a proxy that allows us to connect Claude Code from this development environment to the remote gateway. Then, as I said, we create a virtual key, and the virtual key allows the developer to be identified in the gateway. So again, it gives you that level of control over the token burn and the API calls that are made. And then we also need to reconfigure Claude Code so that it uses the gateway. The beauty of Claude Code is that you can just change a settings file, as Roy was also sharing. As I was saying, in this case we just need to change the ANTHROPIC_BASE_URL variable to point to the gateway, and we need to disable the direct connection that Claude Code had with Vertex AI.
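The switch from direct Vertex AI access to the gateway can be sketched as a transformation of the developer's environment. This is a minimal sketch under assumptions, not the demo's setup script: ANTHROPIC_BASE_URL is the variable named in the talk, while passing the virtual key through ANTHROPIC_AUTH_TOKEN and removing CLAUDE_CODE_USE_VERTEX to turn off the direct connection are assumptions based on the Claude Code gateway documentation as best recalled. The gateway URL and key values are hypothetical.

```python
from typing import Dict


def gateway_env(current: Dict[str, str], gateway_url: str, virtual_key: str) -> Dict[str, str]:
    """Return a copy of the environment that routes Claude Code through a gateway."""
    env = dict(current)  # leave the caller's mapping untouched
    # Assumption: dropping this flag disables the direct Vertex AI connection.
    env.pop("CLAUDE_CODE_USE_VERTEX", None)
    env["ANTHROPIC_BASE_URL"] = gateway_url    # all requests now go to the gateway
    env["ANTHROPIC_AUTH_TOKEN"] = virtual_key  # per-developer virtual key (assumed variable)
    return env


direct = {"CLAUDE_CODE_USE_VERTEX": "1", "CLOUD_ML_REGION": "global"}
routed = gateway_env(direct, "https://llm-gateway.example.internal", "sk-hypothetical-key")
print(sorted(routed))
```

Because the gateway holds the service account that talks to Vertex AI, the developer's environment only needs the gateway address and the virtual key that identifies them.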
So now all the API calls are made through the gateway, just by changing an environment variable. So these are the two things that you need to change. Of course, this setup file is pretty simple, and as you see, Claude has already set it up and run 10 prompts. Why did I run these 10 prompts? Because the beauty of LiteLLM is that we now have access to a dashboard that allows me to see this consumption. So let me quickly find the dashboard. You should see the dashboard now. When you open the dashboard, you have several components in the LiteLLM gateway. I'm not going to cover all of them. What I really want to show you is that it gives you very detailed observability. So here you have the usage. In the last few days I made almost 100 requests, most of them went through, and this is the amount of tokens that I burned, input and output, and the cost. You have the cost by provider. You can have multiple providers; in this case we have just Vertex AI. Then if I go to the logs, these are all my API calls associated with the key of the particular developer that I set, and I can click on each entry and see in detail all the request information and the metrics. The response in this case is not available. Why? Because we redacted it. I set it up so that it doesn't show the response, but here you have all the metadata associated with this request. This is just a bit of what you can see in this dashboard. You can go ahead and see the model activity, this is the trend over time, this is the model that I called, and so on. So this is a very powerful tool in terms of controls and operations. Okay, so with that being said, let me quickly go here. At this point we have covered environment control and access. We just need to cover the last step, which is related to impact. So we have the server-side execution boundaries, and we have the centralized access.
Now, as the adoption of Claude Code grows in enterprise contexts, you may have leadership and stakeholders asking: what does the usage look like? What's the cost by team? What are the models that give us the best value? And so on. Some of these answers you can find in the dashboard that I was just showing you, but usually stakeholders want to have a better view of this data, because maybe they want to merge it with other financial data. Let's assume that you build a product with Claude Code, right? So Claude Code in that sense will represent a cost, and they want to do some cost and revenue analysis, so they need to merge the model consumption data with that data. So you need an analytical layer that is fully customizable on their side in order to run this analysis. The good news is that on Google Cloud, collecting this data and building this dashboard is pretty simple, and you can do it in three steps. First of all, you can leverage the native integration of Cloud Trace with BigQuery. Why Cloud Trace? Because, as I said, the gateway will push some of the information that I was showing you in the dashboard to Cloud Trace. So what you can do is collect spans in a trace bucket, and then you can create a dataset linked to this bucket, and this dataset will give you a read-only view of all the spans. On top of this read-only view, you can create as many SQL views as you want. So you can have queries that will allow you, for example, to count the total tokens per model and the costs associated with them. You can have SQL queries that calculate the cost by hour and by model. This is important trend information for your financial teams. Once you collect all this data in BigQuery, you can use a visualization tool like Looker to build your custom charts.
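To make the two aggregations just mentioned concrete (total tokens per model, and cost by hour and model), here is a small Python sketch over span records that have already been exported. The field names (model, input_tokens, output_tokens, start_time) and the price table are hypothetical; the real span schema and pricing have to be taken from your own BigQuery view and billing data. In BigQuery itself the same logic would be written as SQL views.

```python
from collections import defaultdict
from datetime import datetime


def tokens_per_model(spans):
    """Sum requests and input/output tokens for each model."""
    totals = defaultdict(lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0})
    for span in spans:
        row = totals[span["model"]]
        row["requests"] += 1
        row["input_tokens"] += span["input_tokens"]
        row["output_tokens"] += span["output_tokens"]
    return dict(totals)


def cost_by_hour_and_model(spans, price_per_1k):
    """Sum estimated cost per (hour, model) bucket.

    price_per_1k maps model -> {"input": price, "output": price} per 1,000
    tokens. These prices are placeholders, not real list prices.
    """
    costs = defaultdict(float)
    for span in spans:
        # Truncate the span start time to the hour it falls in.
        hour = datetime.fromisoformat(span["start_time"]).replace(
            minute=0, second=0, microsecond=0
        )
        price = price_per_1k[span["model"]]
        cost = (
            span["input_tokens"] / 1000 * price["input"]
            + span["output_tokens"] / 1000 * price["output"]
        )
        costs[(hour.isoformat(), span["model"])] += cost
    return dict(costs)


if __name__ == "__main__":
    # Hypothetical exported spans and placeholder prices.
    spans = [
        {"model": "model-a", "input_tokens": 1000, "output_tokens": 500,
         "start_time": "2025-01-01T10:15:00"},
        {"model": "model-a", "input_tokens": 2000, "output_tokens": 500,
         "start_time": "2025-01-01T10:45:00"},
    ]
    prices = {"model-a": {"input": 1.0, "output": 2.0}}
    print(tokens_per_model(spans))
    print(cost_by_hour_and_model(spans, prices))
```

Keeping the price table separate from the span data means finance teams can swap in their own rates, or join against revenue data, without touching the aggregation logic.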
So just to give you a quick sense, because I think we are running long with the presentation, I'll just quickly share how this looks at the end in BigQuery, and you will be able to run the code yourself. So, as I said, after you leverage the integration from Cloud Trace to BigQuery, this is how BigQuery looks, for people that are new. BigQuery is an analytical data warehouse that you can use. So you will get the _AllSpans table, and this table contains essentially all the information that was more or less powering the LiteLLM dashboard that I was showing you before. So from this data, on top of this view, you can run several queries and create additional views. As I said, you may want to run latency analysis, usage per model and so on. So at the end of the day, you get a table with this kind of row, and you can get information related to your models. In this case, this is a very simple query that tells me the number of requests and the input and output tokens consumed by each model. Of course you need to clean this data a little bit, but it just gives you an idea of the information that you can leverage. And then from this table you can just use Looker to, let me see if I can share this. So, from the BigQuery table, you can create a visualization like this one. And I mean, in this case, it looks pretty similar to what I was showing you before in the LiteLLM dashboard. But imagine you have other data related, again, to the revenue that the product that you're building with Claude Code is generating. You can merge this and build very powerful financial reports that give you an idea of the impact that Claude Code has on your product life cycle and your business. Okay, so with that being said, let me go back to my presentation. The last thing that I want to share with you is mostly related to some troubleshooting.
So once you start using Claude Code on Vertex AI, there are some common issues that developers usually face and that you want to be aware of. The two main ones are related to some error codes: the 404 and the 429. The 404 is a very simple error to interpret. It means model not found, and it usually means that you forgot to enable the model in Model Garden. So don't forget to go to Model Garden, find the model that you want and click enable, as Roy was mentioning. And if you already enabled the model but you're still getting this, it's probably related to the region that you are choosing. So be sure that you choose the right region. Some models support global, some others do not support global, so be sure to set the Vertex AI region environment variable in the Claude Code configuration. The other error that is pretty common is the 429, which is quota exceeded. Now with respect to this one, again, it's very important that you set the configuration correctly. As Roy was mentioning, don't forget to set all the environment variables that are related to the model and, for example, use a smaller, faster model like Haiku for background work, like summarization, rather than big models such as Opus. Also, whenever possible, again, play with the different regions; check which region you are using, whether it's global or not. Usually the global endpoint can help because it provides a wider quota pool that you can use. But again, it depends on your configuration. And the last thing that I want to mention is the auth conflicts that you may face. So Vertex AI, as we also discussed in this webinar, is handled with Google Cloud credentials. The main thing to watch is that you don't have both the gcloud auth login and the GOOGLE_APPLICATION_CREDENTIALS environment variable set at the same time. Just pick one depending on your environment. If you are on a VM, as I was showing, probably a service account would be good.
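The 404, 429 and auth-conflict checks described above can be turned into a small checklist helper. This is only a sketch in Python: the variable names CLOUD_ML_REGION (for the Vertex AI region) and ANTHROPIC_SMALL_FAST_MODEL (for the background model) are assumptions about the Claude Code configuration rather than names given in the talk, GOOGLE_APPLICATION_CREDENTIALS is the standard Google Cloud variable, and the hint wording is illustrative.

```python
def diagnose(status_code: int, env: dict) -> list:
    """Map a Vertex AI error code to the checks suggested in the talk."""
    hints = []
    if status_code == 404:  # model not found
        hints.append("Enable the model in Model Garden.")
        if not env.get("CLOUD_ML_REGION"):  # assumed variable name
            hints.append("Set the Vertex AI region variable.")
        else:
            hints.append("Check that the model is available in the chosen region.")
    elif status_code == 429:  # quota exceeded
        if not env.get("ANTHROPIC_SMALL_FAST_MODEL"):  # assumed variable name
            hints.append("Configure a smaller, faster model for background work.")
        if env.get("CLOUD_ML_REGION") != "global":
            hints.append("Try the global endpoint if the model supports it.")
    return hints


def has_auth_conflict(env: dict, gcloud_adc_present: bool) -> bool:
    """True when a service account key file and a gcloud login are both active.

    gcloud_adc_present is supplied by the caller (for example after checking
    whether a gcloud auth login has been done on this machine).
    """
    return bool(env.get("GOOGLE_APPLICATION_CREDENTIALS")) and gcloud_adc_present


if __name__ == "__main__":
    # Hypothetical environment with no region configured.
    for hint in diagnose(404, {}):
        print("-", hint)
```

In practice you would pass `dict(os.environ)` as `env`; the helper takes a plain dictionary so it can be exercised without touching the real environment.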
If you are on a laptop, probably the user authentication like the one that Roy shared works best. And also there is this new GCP auth refresh feature that automatically refreshes the ADC tokens, so you don't need to handle this part yourself. So I hope with this I covered more or less the three main enterprise considerations and how to solve them on GCP, and also some of the issues that you may face. Back to you, Roy, for a quick reference and to wrap up. Thank you, Ivan. Folks, that was a lot of ground we covered together. So I'm going to leave you with this quick reference guide. These four steps will help you have a smooth rollout. We tried to capture as many steps as possible here, but if you think about it, it's about setting up your auth, setting up the quota, configure, and govern. Number one is the authentication: leverage Google ADC, Application Default Credentials. Set GCP auth refresh, which was just shipped a couple of weeks ago. Very excited about that, because now every time your token expires, Claude Code will actually trigger your auth flow, so it will not interrupt your sessions regardless of what your auth flow is; you can just point it to that. If you're using a sandbox like Ivan showed you earlier, definitely leverage IAP with the VMs. And the gateway pattern, if needed: if you want centralized authentication, using a central API key or service account credentials, you can use the gateway for that. The next step is the quota, setting up your quotas. Using the global endpoint is helpful for maximum availability and the highest quotas, actually. That's why it's listed there. You want to use the right model for the task. I showed you Sonnet earlier as my default coding model, but if you are doing a lot of reasoning and planning, you may want to use Opus 4.6 for that. Again, you can use the gateway pattern there as well if you want to have user-level cost control and visibility.
The visibility that is provided out of the box that I showed earlier is telemetry across the models and different regions, but it's not at the user level. So if that's what you're looking for, the gateway pattern is helpful. And then configuration is key. Claude Code's hierarchical settings model is very powerful. You have the managed settings, which give you as organization admins the ability to enforce the tools, permissions, the sandbox policy, the MCP allow list, and what domains Claude Code can visit. So you have fine-grained controls to begin with, and at the same time developers have the flexibility to customize settings at the project level or user level, but they cannot override the managed ones. And everything that Ivan showed you from a governance perspective are additional controls provided to you by the platform, by GCP. And then last but not least, use OTel tracing as well as part of your governance: export those logs, audit your tool usage, you may want to audit your skills usage and things like that. So that's it. Questions: I'm sure there are going to be a lot of questions, so definitely keep firing those away in the chat. Reach out to either one of us on socials if you have additional questions. We're both active on LinkedIn. And finally, last but not least, additional resources to throw at you. Everything Ivan showed you, the demo code, is going to be available for you, in addition to the sample app that I showed you earlier. There's a lot of documentation on both sides on Claude Code: how to set it up on Vertex AI, but also the managed settings that I talked about, the permissions and the LLM gateway. And then that link at the end, Vertex AI setup, lands you directly in Model Garden so you can enable those models. Yeah, I think with that we have a wrap. Any other thoughts, Ivan? No, no. I mean, I think it was a great session. I hope you enjoyed it as much as I did.
The only thing is that, yeah, you can easily use Claude Code on Vertex AI and you can have the control that you need at the enterprise level. Again, all the demos that we showed are going to be available, so just go and try them yourself. Thank you all for joining us and I hope you enjoy