Customer Support

Visual product identification with honesty guardrails

The agent uses camera input to identify customer devices and applies explicit SOP rules to ask for a better view instead of guessing when the image is unclear.
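The "ask instead of guess" guardrail can be sketched as a simple confidence gate. This is a minimal illustration, not the product's actual code: `respond_to_frame`, the response dict shape, and the 0.80 cutoff are all assumptions.

```python
# Minimal sketch of the "ask, don't guess" SOP guardrail described above.
# The identification dict and the threshold are hypothetical stand-ins for
# whatever vision model and SOP cutoff the agent actually uses.

CONFIDENCE_THRESHOLD = 0.80  # assumed SOP cutoff, not from the source

def respond_to_frame(identification: dict) -> str:
    """Return a reply that either names the device or asks for a better view."""
    label = identification.get("label")
    confidence = identification.get("confidence", 0.0)
    if label is None or confidence < CONFIDENCE_THRESHOLD:
        return ("I can't identify the device from this angle. "
                "Could you move the camera closer to the model label?")
    return f"This looks like a {label}."

print(respond_to_frame({"label": "Netgear CM500 modem", "confidence": 0.93}))
print(respond_to_frame({"label": "router", "confidence": 0.41}))
```

The key design point is that low confidence produces a concrete next action for the customer (reposition the camera) rather than a forced best guess.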

Why the human is still essential here

Humans are still needed to verify uncertain identifications, handle edge cases, and make final support decisions when visual evidence is incomplete or misleading.

How people use this

Camera-based device recognition

A support bot analyzes a live camera view of a modem, router, or appliance and asks the customer to reposition the camera when the model is not confident.

Google Lens

Screenshot and hardware inspection

An AI assistant reviews customer screenshots or product photos to identify the interface or device model and explicitly flags uncertain matches instead of inventing one.

GPT-4o

Image-based equipment classification

A service intake workflow uses image recognition to classify customer-submitted hardware photos and routes low-confidence identifications to human review.

Amazon Rekognition

Related Prompts (1)

Community stories (1)

Medium

How I Built a Multimodal CX Agent with Just an SOP and Gemini Live API

I wanted to test a simple idea: what if you architected an AI support agent the same way? Give it a training manual instead of a workflow tree. Give it Google Search instead of a RAG pipeline. And use a single multimodal model so you don’t need separate systems for voice, text, and vision.

I built Cortado for the Gemini Live Agent Challenge to explore what that looks like in practice.


...

Vasundra Srinivasan, AI Architecture and Data Strategy
Mar 14, 2026