Gemini 3.1 Ultra and the 2-Million-Token Context Window: What Service Businesses Can Actually Build With This

Google's Gemini 3.1 Ultra launched in early May 2026 with a native 2-million-token context window across text, image, audio, and video. The capability shift unlocks new operational AI use cases for service businesses.

Ido Cohen · Published 2026-05-08 · AI News

Google launched Gemini 3.1 Ultra in early May 2026, its most significant model release of the year. The headline capability: a native 2-million-token context window that works across text, image, audio, and video without transcription intermediaries. The technical achievement is impressive. The practical implications for service businesses are more interesting than the spec sheet suggests.

Here is what 2 million tokens means in real terms, what becomes possible that was impossible before, and the specific use cases service businesses can build today.

What 2 Million Tokens Actually Means

Token counts are abstract until you translate them into recognizable units. At typical tokenization rates, 2 million tokens works out to roughly 1.5 million words of text, several thousand pages of documents, a couple of hours of video, or well over a dozen hours of audio.

For perspective, the context window is now larger than the entire customer interaction history of most small service businesses. Every email, every call transcript, every quote, every job record for a single customer fits comfortably inside one Gemini 3.1 Ultra session.

This is the threshold where AI moves from "smart assistant for one task" to "comprehensive intelligence layer over the entire business."

The Native Multimodal Difference

The other half of the announcement matters as much as the context window. Gemini 3.1 Ultra is natively multimodal — it processes text, images, audio, and video in the same model without separate transcription or vision pipelines.

What this changes practically: call recordings can go in as audio rather than as transcripts, job-site photos and video sit in the same prompt as a customer's written history, and a single query can reason across all of it at once.

The seams between modalities used to require integration work. They no longer do.

Use Cases That Are Now Real for Service Businesses

Three concrete applications that were impossible (or impractical) before and are now feasible:

1. Comprehensive customer-history AI assistant

A service business can now build an AI assistant that holds the entire history of a single customer relationship — every email, every call recording transcript, every job completed, every photo taken on site, every invoice — and answers questions about that customer in context.

A specific scenario: a property manager calls about a recurring HVAC issue. The technician's tablet AI pulls up the entire history of that property in one query: previous service visits, recordings of past calls, photos from earlier work, the original installation specs. The technician walks in already informed.

This level of customer context was previously only possible through expensive custom integrations. Gemini 3.1 Ultra makes it a reasonable build for a service business with engineering capacity or the right SaaS partner.
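
As a rough illustration, here is what assembling that history into one query might look like with the google-generativeai Python SDK. The gemini-3.1-ultra model id, the file names, and the customer folder layout are assumptions made for the sketch, not a published integration.

```python
# Sketch: one customer's entire history assembled into a single query.
# Assumptions: google-generativeai SDK, a hypothetical "gemini-3.1-ultra"
# model id, and an illustrative folder layout for exported records.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3.1-ultra")  # assumed model id

def wait_until_active(f):
    # Uploaded media may need a moment of server-side processing.
    while f.state.name == "PROCESSING":
        time.sleep(5)
        f = genai.get_file(f.name)
    return f

# Text records exported from the CRM and job-management system.
with open("customer_1042/email_history.txt") as fh:
    emails = fh.read()
with open("customer_1042/job_records.txt") as fh:
    jobs = fh.read()

# Audio and images go in as-is; no separate transcription or vision pipeline.
call_recording = wait_until_active(genai.upload_file("customer_1042/2025-11-03_call.mp3"))
site_photo = wait_until_active(genai.upload_file("customer_1042/rooftop_unit.jpg"))

response = model.generate_content([
    "You are preparing a technician for a service visit. Summarize this "
    "customer's history and flag anything relevant to a recurring HVAC "
    "issue at the property.",
    "EMAIL HISTORY:\n" + emails,
    "JOB RECORDS:\n" + jobs,
    call_recording,
    site_photo,
])
print(response.text)
```

The point of the sketch is the shape of the request: text records, an untranscribed call recording, and a site photo all travel in the same prompt, and the model answers against all of them at once.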

2. Visual-input quote generation

Service businesses can now build AI quote generators that take photos or videos of a job site as input and produce structured quotes: a draft scope of work, a materials list, labor estimates, and line-item pricing flagged for an estimator to review.

The quotes are not yet final — human judgment on pricing and scope still matters — but the first-draft quote generation accelerates from hours to minutes.
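
A minimal sketch of the pattern, again assuming the google-generativeai SDK and a hypothetical gemini-3.1-ultra model id. The JSON fields are placeholders that a real estimating workflow would replace with its own schema.

```python
# Sketch: job-site photos and video in, a structured draft quote out.
# Assumptions: google-generativeai SDK, a hypothetical "gemini-3.1-ultra"
# model id, illustrative file paths, and an illustrative JSON schema.
import json
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3.1-ultra")  # assumed model id

walkthrough = genai.upload_file("site_visits/unit_12_walkthrough.mp4")
while walkthrough.state.name == "PROCESSING":  # video needs processing time
    time.sleep(10)
    walkthrough = genai.get_file(walkthrough.name)
panel_photo = genai.upload_file("site_visits/unit_12_electrical_panel.jpg")

prompt = (
    "Draft a quote for the work shown in this walkthrough. Return JSON with "
    "keys: scope_of_work (list of strings), materials (list of objects with "
    "item and quantity), estimated_labor_hours (number), and open_questions "
    "(list of strings for the estimator to resolve)."
)

response = model.generate_content(
    [prompt, walkthrough, panel_photo],
    generation_config={"response_mime_type": "application/json"},
)

draft_quote = json.loads(response.text)
print(json.dumps(draft_quote, indent=2))  # a first draft; a human still prices it
```

Constraining the response to JSON keeps the draft machine-readable, so it can drop straight into whatever quoting tool the business already uses.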

3. Multi-source customer sentiment analysis

A 2-million-token context window can ingest months of customer interactions across all channels — emails, call recordings, social media mentions, review platforms, internal notes — and surface sentiment patterns, recurring issues, and opportunity areas.

For a service business owner who wants to actually understand what customers are saying about the business across all channels, this used to require either a dedicated analyst or expensive enterprise software. Gemini 3.1 Ultra puts the capability within reach of a 20-person company.
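
One way this might be wired up, as a sketch: label every document with its channel, concatenate everything into a single long prompt, and ask for cross-channel patterns. The folder layout and channel names below are illustrative assumptions.

```python
# Sketch: months of multi-channel customer feedback analyzed in one query.
# Assumptions: google-generativeai SDK, a hypothetical "gemini-3.1-ultra"
# model id, and feedback exported as text files grouped by channel.
from pathlib import Path
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3.1-ultra")  # assumed model id

# Tag each document with its channel so the model can compare sources.
corpus = []
for channel in ["emails", "call_transcripts", "reviews", "internal_notes"]:
    for path in sorted(Path(f"feedback/{channel}").glob("*.txt")):
        corpus.append(f"[{channel} | {path.name}]\n{path.read_text()}")

response = model.generate_content([
    "You are reviewing several months of customer feedback for a home-services "
    "company. Identify recurring complaints, recurring praise, and any issue "
    "that shows up in more than one channel. Cite the channel and document "
    "name for each finding.",
    "\n\n".join(corpus),
])
print(response.text)
```

The channel labels matter: they let the model say where a pattern came from instead of producing an unverifiable summary.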

What's Still Hard

Three honest cautions:

1. The cost per query is real. 2-million-token queries are expensive, currently in the range of $5-20 per call depending on output length. Service businesses cannot run this on every customer interaction. Use the long-context capability for periodic analysis (weekly customer review, quarterly sentiment audit) and for high-value individual queries (preparing for an important meeting, building a quote for a large job). A back-of-the-envelope cost calculation follows this list.

2. Long-context attention degrades at the edges. Even the best long-context models perform better on information near the start and end of the context than on information buried in the middle. Structure your inputs accordingly — put the most important context in the first and last 20% of the prompt.

3. Building these tools requires either engineering or specialized SaaS. Most service businesses cannot build long-context AI applications themselves. The right move is to wait for SaaS vendors to ship products built on top of Gemini 3.1 Ultra (and competitive models from Anthropic and OpenAI), then evaluate which ones fit specific business needs.
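
To make the first caution concrete, here is the back-of-the-envelope math referenced above. The per-million-token prices are assumptions chosen to land inside the $5-20 per-call range quoted in this piece, not published Gemini 3.1 Ultra pricing.

```python
# Rough cost math for long-context queries. Prices are assumptions, not
# published Gemini 3.1 Ultra rates; adjust to whatever your vendor charges.
ASSUMED_INPUT_PRICE_PER_M = 5.00    # USD per million input tokens (assumed)
ASSUMED_OUTPUT_PRICE_PER_M = 15.00  # USD per million output tokens (assumed)

def query_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one call in USD."""
    return (
        input_tokens / 1_000_000 * ASSUMED_INPUT_PRICE_PER_M
        + output_tokens / 1_000_000 * ASSUMED_OUTPUT_PRICE_PER_M
    )

# A full 2-million-token customer-history review with a 4,000-token summary out:
print(f"${query_cost(2_000_000, 4_000):.2f} per call")       # about $10
# The same review run weekly for a year:
print(f"${query_cost(2_000_000, 4_000) * 52:.2f} per year")  # about $520
```

At roughly $10 a call, a weekly whole-business review is a sensible spend; running the same query on every inbound phone call is not.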

What to Do This Quarter

Three concrete actions:

1. Identify the workflows where context is the bottleneck. For each workflow your team handles, ask: "would this be dramatically better if the AI knew everything about the customer/property/project?" The workflows where the answer is yes are the ones where long-context AI tools will deliver meaningful value when they ship.

2. Audit your data accessibility. Long-context AI is only as useful as the data you can feed it. Make sure your customer histories, call recordings, photos, and notes are accessible in formats AI can ingest. The data prep is the unsexy work that determines whether long-context AI delivers value when you deploy it. A small inventory script after this list shows one way to take stock.

3. Evaluate one long-context AI tool as a pilot. Several SaaS vendors are shipping tools built on Gemini 3.1 Ultra and equivalent models. Pick one with a free trial or low-commitment pilot, run it on a real workflow for 30 days, and evaluate the lift.
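
For the data-accessibility audit, even a trivial script can tell you where you stand. The sketch below counts files by format under a few assumed folders; the folder names are placeholders for wherever your records actually live.

```python
# Sketch: a quick inventory of what customer data exists and in which formats.
# The folder names are illustrative assumptions, not a required layout.
from collections import Counter
from pathlib import Path

SOURCES = {
    "customer_records": Path("exports/crm"),
    "call_recordings": Path("recordings"),
    "site_photos": Path("photos"),
    "job_notes": Path("notes"),
}

for label, root in SOURCES.items():
    if not root.exists():
        print(f"{label}: missing ({root})")
        continue
    counts = Counter(
        p.suffix.lower() or "(no extension)"
        for p in root.rglob("*") if p.is_file()
    )
    summary = ", ".join(f"{n} x {ext}" for ext, n in counts.most_common())
    print(f"{label}: {summary or 'empty'}")
```

If the output is dominated by proprietary formats, or by folders that do not exist, that is the prep work to do before any long-context tool will deliver value.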

What This Tells Us About 2027

Gemini 3.1 Ultra is one milestone in a clear direction: AI context windows are growing faster than most people expected, and the cost per query is dropping faster than predicted. By mid-2027, expect context windows beyond 2 million tokens to be routine rather than flagship-only, per-query costs 50-80% lower than today's, and a much wider range of SaaS and no-code tools that put long-context capability in front of non-engineers.

The implication: the AI capabilities you can buy in 18 months will be dramatically more powerful than today's. The businesses that built operational competence with current AI tools will be ready to deploy the much better tools when they ship. The businesses that did not will spend 2027 catching up.

The window to develop AI competence is now, while the tools are still rough enough to require thoughtful deployment. Long-context models like Gemini 3.1 Ultra narrow that window with each release.

Frequently Asked Questions

What can a service business actually do with a 2-million-token context window today?

Three primary use cases: comprehensive customer-history AI assistants (every interaction with a customer ingested at once), visual-input quote generation (photos/video as input to quote-building AI), and multi-source customer sentiment analysis (all customer feedback across channels analyzed in one query). Each requires either engineering or a SaaS vendor that has built the tool.

Is Gemini 3.1 Ultra better than Claude Opus 4.7 or GPT-5.5 for service business use?

Different strengths. Gemini 3.1 Ultra has the largest context window and the best native multimodal handling. Claude Opus 4.7 is the best at reliable tool use and structured reasoning. GPT-5.5 has the largest developer ecosystem and the most mature integrations. For most service-business applications, the model differences matter less than the SaaS tool differences built on top of them. Pick the tool, not the model.

How much does it cost to use long-context AI features?

Direct API queries with the full 2-million-token context cost $5-20 each at current pricing. SaaS tools that wrap these capabilities typically charge $50-500 per month per seat depending on query volume. The cost trends down quickly; expect 50-80% reductions over the next 18 months.

Do I need engineering capability to use this?

For now, mostly yes. The capability is best accessed through SaaS tools that wrap the model with a usable interface. As the category matures, no-code tools will emerge that let non-engineers build long-context AI applications, but that maturity is 6-12 months out.

Will long-context AI replace my CRM or project management tools?

No. It augments them. The CRM remains the system of record. Long-context AI is a layer that reads from your systems, processes large amounts of context, and surfaces insights or generates outputs. The CRM and project tools are not going anywhere; they are getting more useful because AI can now reason across all the data they hold.
