How to Build and Deploy a Managed Agent With the Antigravity SDK and Gemini API

A single POST to the Interactions API spins up a Linux sandbox, runs Gemini 3.5 Flash in a tool-use loop, and bills you for every token it burns getting there.

AnIntent Editorial

By the end of this walkthrough, you will have a custom Antigravity managed agent on the Gemini API that lives behind a stable identifier, mounts a Git repository at boot, and resumes its own filesystem across calls. This is the right tool when you need an autonomous code-executing agent in production without renting GPUs, running Firecracker yourself, or stitching together a model, a code interpreter, and a browser tool by hand.

The core mental shift: you are no longer sending a prompt and reading a completion. You are starting a process. Unlike a standard chat request that produces a single output, an Antigravity interaction is an agentic workflow. A single request triggers an autonomous loop of reasoning, tool execution, code running, and file management. That changes how you authenticate, how you bill, and how you debug.

What Managed Agents Actually Are on the Gemini API

Managed Agents launched at Google I/O 2026 on May 19 as one of five surfaces sharing the same harness, alongside the Antigravity 2.0 desktop app, the agy CLI, the Antigravity SDK, and the Gemini Enterprise Agent Platform. A single API call spins up an agent that reasons, uses tools, and executes code in an isolated Linux environment. The agent is powered by the Antigravity agent harness, built on Gemini 3.5 Flash, and available through the Interactions API and Google AI Studio.

The distinction from a standard generateContent call is architectural, not cosmetic. According to Google's Gemini API documentation, each interaction provisions or reuses a Linux sandbox where the agent can run Bash, Python, and Node.js, install packages, read and write files that persist across calls, and hit Google Search or arbitrary URLs. The docs also flag a feature that matters for long sessions: automatic context compaction kicks in around 135k tokens so multi-turn runs do not silently drop earlier state.

State is the headline. As ofox.ai's breakdown of Antigravity 2.0 notes, the Managed Agents tier keeps state persistent across calls and does not reset between turns, which is what distinguishes it architecturally from stateless chat completions.

Get the SDK Installed and Authenticated

The Python SDK ships in the same google-genai package that already serves the Gemini models. Philipp Schmid's developer guide shows the install path is just pip install google-genai followed by exporting your GEMINI_API_KEY. The TypeScript surface mirrors it through @google/genai.

If you intend to call the REST endpoint directly, the contract is rigid. Google's documentation specifies that you POST to https://generativelanguage.googleapis.com/v1beta/interactions with the header Api-Revision: 2026-05-20 and the agent identifier antigravity-preview-05-2026. Omit either value and the request returns a 400 before the sandbox is ever provisioned.
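If you are skipping the SDK, the REST contract can be sketched with the standard library alone. Treat this as a minimal sketch rather than a reference client: the URL, the Api-Revision value, and the agent identifier come from the paragraph above, while the JSON body field names (copied from the SDK keyword arguments) and the use of the x-goog-api-key header for authentication are assumptions.

```python
import json
import urllib.request

INTERACTIONS_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"
API_REVISION = "2026-05-20"
AGENT_ID = "antigravity-preview-05-2026"


def build_interaction_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Interactions endpoint."""
    # Assumption: the JSON body mirrors the SDK keyword arguments.
    body = {"agent": AGENT_ID, "input": prompt, "environment": "remote"}
    return urllib.request.Request(
        INTERACTIONS_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,  # assumption: API key passed as a header
            "Api-Revision": API_REVISION,
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the build step separate from the send step lets you log or unit-test the exact headers before any sandbox is provisioned.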

A minimum viable Python call looks like this:

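from google import genai

# Client() reads the API key from the GEMINI_API_KEY environment variable.
client = genai.Client()

# One call starts the whole agent loop inside a hosted sandbox.
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Clone github.com/octocat/Hello-World, run the tests, summarize failures.",
    environment="remote",
)
print(interaction.output_text)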

That single call boots a sandbox. Each interaction runs inside a Linux sandbox (Ubuntu, Python 3.12, Node.js 22) with 4 CPU cores and 16 GB RAM, isolated at the OS level so the agent can install packages, run code, and write files without affecting your machine.

Wire In Your Repo Before the First Token

The most common mistake on day one is sending the agent a prompt that says "clone this repo and..." when the platform already has a native primitive for that. The environment parameter accepts a structured object with a sources array, and each source is mounted into the sandbox before the agent reasons at all.

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Audit /workspace/repo for unused dependencies and open a patch file.",
    environment={
        "type": "remote",
        "sources": [
            {"type": "repository",
             "source": "https://github.com/my-org/backend.git",
             "target": "/workspace/repo"},
            {"type": "gcs",
             "source": "gs://my-bucket/fixtures/",
             "target": "/workspace/data"},
            {"type": "inline",
             "target": "/workspace/config.yaml",
             "content": "mode: audit\noutput: patch"},
        ],
    },
)

Three mount types cover almost every real workflow: Git repositories, Google Cloud Storage buckets, and inline content passed straight in the request body. Schmid's guide documents the same shape, and using it directly cuts the agent's first-loop token spend because it never has to issue git clone as a tool call.
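Because a malformed source only surfaces after the request is in flight, it can help to validate the mounts locally first. The helper below is a sketch under stated assumptions: build_environment is a hypothetical name, the three type strings are the ones used in the example above, and the absolute-path and duplicate-target checks are local sanity rules rather than documented platform behavior.

```python
from typing import Any

ALLOWED_SOURCE_TYPES = {"repository", "gcs", "inline"}


def build_environment(sources: list[dict[str, Any]]) -> dict[str, Any]:
    """Validate mount definitions and wrap them in a remote environment object."""
    seen_targets: set[str] = set()
    for src in sources:
        kind = src.get("type")
        if kind not in ALLOWED_SOURCE_TYPES:
            raise ValueError(f"unknown source type: {kind!r}")
        target = src.get("target", "")
        if not target.startswith("/"):
            raise ValueError(f"target must be an absolute path: {target!r}")
        if target in seen_targets:
            raise ValueError(f"duplicate mount target: {target}")
        seen_targets.add(target)
        # Inline sources carry their payload in "content"; the others use "source".
        required = "content" if kind == "inline" else "source"
        if not src.get(required):
            raise ValueError(f"{kind} source for {target} is missing {required!r}")
    return {"type": "remote", "sources": list(sources)}
```

The returned dictionary has the same shape as the environment argument in the example above, so it can be passed straight to client.interactions.create.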

Make It a Real Agent With AGENTS.md and Skills

Until this point you have been calling the stock antigravity-preview-05-2026 agent with one-shot inputs. To get reproducible behavior across teams, promote your configuration to a named, server-side agent. According to Google's launch post for Managed Agents, instead of writing orchestration code, you define behavior in markdown files like AGENTS.md and SKILL.md and register them as a managed agent.

The filesystem layout the platform expects is conventional. An AGENTS.md at the workspace root gives the agent its standing instructions. A .agents/skills/ directory holds composable skill files. Both get mounted into the sandbox at boot. From the SDK, you create a named agent that references a base environment, then call interactions against the agent ID:

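import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Register a named agent on top of the stock Antigravity agent.
const agent = await client.agents.create({
  id: "code-reviewer",
  base_agent: "antigravity-preview-05-2026",
  system_instruction:
    "You are a senior code reviewer. Flag bugs, style issues, and security risks in every diff.",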
  base_environment: {
    type: "remote",
    sources: [{
      type: "repository",
      source: "https://github.com/my-org/backend",
      target: "/workspace/repo",
    }],
  },
});

const result = await client.interactions.create({
  agent: "code-reviewer",
  input: "Review the diff at /workspace/repo/PR-482.",
  environment: "remote",
});

Management primitives are the unglamorous part nobody documents until production: client.agents.list(), client.agents.get(id=...), and client.agents.delete(id=...) are the operations you will live in once more than one agent exists. Treat agent IDs like database tables. Versions matter, and naming things v2_final_final will hurt.
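Once agents start to pile up, the list and delete calls named above combine into a simple cleanup pass. This is a sketch, not a documented recipe: prune_agents is a hypothetical helper, it assumes list() yields objects with an id attribute, and it accepts any client object that exposes agents.list() and agents.delete(id=...).

```python
from typing import Any, Iterable


def prune_agents(client: Any, keep: Iterable[str]) -> list[str]:
    """Delete every registered agent whose ID is not in keep; return deleted IDs."""
    keep_ids = set(keep)
    deleted: list[str] = []
    # Assumption: list() yields objects exposing an `id` attribute.
    # Materialize the listing first so deletions do not disturb iteration.
    for agent in list(client.agents.list()):
        if agent.id not in keep_ids:
            client.agents.delete(id=agent.id)
            deleted.append(agent.id)
    return deleted
```

Deletion is destructive, so print the candidate IDs and review them before running this against a shared project.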

The Cost Trap Nobody Mentions on the Keynote Stage

Here is the single most important thing to understand before you put this in front of a billing alert. Unlike standard Gemini models, the Antigravity agent runs through multiple autonomous loops per interaction and can accumulate a high number of tokens. One badly scoped prompt can rack up tens of thousands of tokens before you see a status update.

Three mitigations actually work. First, stream the interaction with server-sent events and watch the step.delta events, so you can cancel the request if the agent appears stuck or runs longer than expected. Second, lean on caching: Google's docs report that 50 to 70 percent of input tokens are typically cached on repeated runs, so reusing the same environment ID across follow-ups is materially cheaper than provisioning fresh sandboxes. Third, send follow-up work with previous_interaction_id so the next request continues from the same context instead of re-priming the agent.
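The first mitigation comes down to a local guard wrapped around the event stream. The sketch below assumes each streamed event arrives as a dictionary and carries a running total_tokens count; that field name is hypothetical, so map it to whatever usage data your stream actually reports. guard_stream and BudgetExceeded are illustrative names, not SDK symbols.

```python
import time
from typing import Any, Iterable


class BudgetExceeded(RuntimeError):
    """Raised when a streamed run crosses a local token or time limit."""


def guard_stream(
    events: Iterable[dict[str, Any]],
    max_tokens: int,
    max_seconds: float,
) -> list[dict[str, Any]]:
    """Consume streamed events, stopping as soon as a limit is crossed."""
    started = time.monotonic()
    collected: list[dict[str, Any]] = []
    for event in events:
        collected.append(event)
        # Hypothetical field: a running token total reported on each event.
        used = event.get("total_tokens", 0)
        if used > max_tokens:
            raise BudgetExceeded(f"{used} tokens used, limit is {max_tokens}")
        if time.monotonic() - started > max_seconds:
            raise BudgetExceeded(f"run exceeded {max_seconds} seconds")
    return collected
```

Catch BudgetExceeded in the caller and cancel the interaction there. The guard only decides when to stop; how you cancel depends on the client you are using.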

The model choice complicates this. Antigravity supports Claude Sonnet 4.5 and GPT-OSS in addition to Gemini, but as ofox.ai notes, the platform is optimized for Gemini and using non-Gemini models incurs a real latency and cost penalty. If you want model portability, you will pay for it in wall-clock time. Independent benchmarking cited by buildfastwithai.com clocks Gemini 3.5 Flash at 289 tokens per second inside the Antigravity harness, which is the speed you forfeit when you swap models.

Antigravity 2.0 CLI vs Managed Agents: Pick the Right Surface

The agy CLI and the Managed Agents API solve adjacent but distinct problems, and using the wrong one is the second most common mistake new teams make. According to buildfastwithai.com's developer guide, the Antigravity CLI is invoked as agy and replaces the Gemini CLI subagent model with a multi-agent architecture. It runs on your workstation. It is interactive. It is for humans writing code.

Managed Agents are headless. They run inside Google's sandboxes, return structured output to your application, and persist environments by ID. Use the CLI for development and one-off automation; use the SDK for anything that needs to be triggered by a webhook, a cron job, or a user action in your product.

The migration deadline is real. Buildfastwithai.com confirms that Antigravity 2.0 killed Gemini CLI with a shutdown date of June 18, 2026, giving existing users 30 days from the May 19 I/O launch to migrate. If your CI scripts shell out to gemini, that pipeline breaks on June 19.

The One Check That Catches 80% of Failures

The agent runs, the API call returns 200, the output text looks plausible, and your tests still fail. The cause, eight times out of ten, is that the agent never actually saw the files you thought it did. Download the environment snapshot and inspect it before you trust anything.

Schmid's guide shows the pattern: fetch https://generativelanguage.googleapis.com/v1beta/files/environment-{interaction.environment_id}:download with your API key, extract the tarball, and read what the sandbox actually contains. If /workspace/repo is empty or only has a .git directory, your source mount silently failed, usually because the repository is private and the platform had no credentials. The fix is to make the source public for the agent's lifetime, mount it via inline content, or stage it in a GCS bucket the API key can read.
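The snapshot check is easy to automate once the tarball bytes are in hand. In this sketch the URL pattern is the one quoted above. The helper names are hypothetical, and the code assumes that paths inside the archive are relative to the sandbox root, which you should confirm against a real snapshot.

```python
import io
import tarfile


def snapshot_url(environment_id: str) -> str:
    """Download URL for an environment snapshot, following the pattern above."""
    return (
        "https://generativelanguage.googleapis.com/v1beta/files/"
        f"environment-{environment_id}:download"
    )


def mount_is_populated(snapshot: bytes, target: str) -> bool:
    """Return True if the tarball holds a file under target outside .git."""
    prefix = target.strip("/") + "/"
    with tarfile.open(fileobj=io.BytesIO(snapshot), mode="r:*") as archive:
        for member in archive.getmembers():
            # Assumption: member names are paths relative to the sandbox root.
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.lstrip("/")
            if not member.isfile() or not name.startswith(prefix):
                continue
            relative = name[len(prefix):]
            if relative.split("/")[0] != ".git":
                return True
    return False
```

Run mount_is_populated(data, "/workspace/repo") on the downloaded bytes before you trust the agent's summary. A False result points at a failed mount rather than at the agent's reasoning.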

A second class of silent failure: the agent returns success because it interpreted "summarize the failures" as "the task is done if there are no failures to summarize." Force structured reporting in your instructions. Ask the agent to enumerate the files it touched and the assumptions it made. The fix lives in the prompt, not the SDK.
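One way to force that reporting is to append a fixed contract to every task and reject replies that ignore it. The section labels and helper names below are hypothetical conventions chosen for illustration. The agent has no built-in notion of them, so the check only works if the prompt actually asks for them.

```python
REPORT_CONTRACT = """
When you finish, end your reply with exactly these sections:
FILES_TOUCHED: one path per line, or NONE
ASSUMPTIONS: one assumption per line, or NONE
STATUS: PASS, FAIL, or NOT_RUN
"""

REQUIRED_SECTIONS = ("FILES_TOUCHED:", "ASSUMPTIONS:", "STATUS:")


def with_report_contract(task: str) -> str:
    """Append the reporting contract to a task prompt."""
    return task.rstrip() + "\n" + REPORT_CONTRACT


def missing_sections(output_text: str) -> list[str]:
    """List the required section labels that the agent's reply left out."""
    return [label for label in REQUIRED_SECTIONS if label not in output_text]
```

Send with_report_contract(task) as the input, then treat a non-empty missing_sections(interaction.output_text) as a failed run, however plausible the prose looks.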

Deploying Beyond Google's Sandbox

The SDK has a feature the desktop app does not advertise. ofox.ai's analysis confirms the SDK lets developers define custom agent behaviors and host them on their own infrastructure, not just on Google's sandboxes, and Google's I/O 2026 post frames the SDK as providing programmatic access to the same agent harness powering Google's products, with the option to host on infrastructure of your choice.

In practice this means you can keep using the Gemini API for inference while running the execution environment inside your own VPC, which is the answer to most enterprise data-residency questions. The trade-off is operational: you inherit sandbox lifecycle, image hardening, and egress filtering, all of which Google's hosted environment does for you. For teams already running multi-tenant Kubernetes, this is cheaper at scale. For everyone else, the hosted sandbox is the default for a reason.

One historical parallel is worth naming. Buildfastwithai.com describes Antigravity 1.0 as a single desktop IDE launched in November 2025 and 2.0 as "a fundamentally different product" centered on the agent rather than the editor. Six months from shipping an IDE to deprecating the free CLI it competed with is fast. If you are building production systems on the SDK, version the agent identifier in your code and budget for at least one breaking change before this leaves preview. Google's own blog.google announcement lists the feature under "preview" in the rollout language.

Next step: stand up a second agent that calls the first one as a tool, and you have the foundation for the parallel agent system Google's Varun Mohan used to build a working OS core live on stage at I/O for under $1,000. That demo and the failure modes above sit closer together than the keynote suggested.

Frequently Asked Questions

What is the difference between the Antigravity SDK and the agy CLI?

The agy CLI is an interactive terminal tool that replaces the Gemini CLI subagent model with a multi-agent architecture and runs on your local machine. The Antigravity SDK is a programmatic client for the Managed Agents API that runs headless inside Google-hosted Linux sandboxes, intended for production applications triggered by webhooks, cron jobs, or user actions.

When does Gemini CLI shut down and what replaces it?

Gemini CLI has a confirmed shutdown date of June 18, 2026, giving users 30 days from the I/O 2026 launch on May 19 to migrate. Google directs existing Gemini CLI users to the new Antigravity CLI, invoked as agy, which replaces the previous subagent model with parallel multi-agent workflows.

Can Antigravity managed agents use models other than Gemini 3.5 Flash?

Yes, Antigravity supports Claude Sonnet 4.5 and GPT-OSS in addition to Gemini models. The platform is optimized for Gemini, so non-Gemini models incur a real latency and cost penalty, and independent benchmarks place Gemini 3.5 Flash at 289 tokens per second inside the Antigravity harness.

How much does the Google AI Ultra plan cost for agent usage?

Google AI Ultra starts at $100 per month and provides 5x the usage limits of Google AI Pro. The top Ultra Premium tier launched at $200 per month, reduced from $250, with 20x limits, and the Managed Agents API itself follows a pay-as-you-go model based on the underlying Gemini tokens and tool calls.

What multimodal inputs does the Antigravity agent currently accept?

Only text and image inputs are supported at launch. Audio, video, and document inputs are explicitly not supported in the preview, and the agent also does not support structured outputs, background execution, or the file_search, computer_use, google_maps, function_calling, and MCP tools.
