Every Agency Claims to Do AI. Here's How to Tell Which Ones Actually Can.

With AI washing now drawing FTC enforcement action and a market flooded with agencies that slap 'AI-powered' onto their pitch decks, business owners face a genuine vetting problem. Here's a practical framework for separating legitimate AI development partners from the noise — before you sign anything.


The Problem With 'AI-Powered' Everything

Spend an hour talking to software development agencies right now and you'll notice something: every single one of them does AI. The pitch decks have changed. The websites have changed. The case studies have been updated. 'Custom AI development,' 'AI-powered solutions,' 'machine learning integration' — these phrases appear so frequently they've become nearly meaningless.

Here's what's actually happening behind many of those claims: agencies are calling OpenAI's API, wrapping it in a UI, and presenting the result as a custom AI solution. In some cases, the 'AI' in the workflow is barely more than a prompt template. The FTC brought over twelve AI-washing enforcement cases in 2025 against companies that made inflated claims about AI capabilities to investors and clients. That number is likely to grow in 2026.

None of this means you should avoid working with an AI development agency. The opposite, in fact. Real AI implementation — custom models, retrieval-augmented generation, fine-tuning, agentic workflows, integration with your existing systems — can genuinely transform how your business operates. But getting there requires finding an agency that can actually build what they're describing, not just describe it.

The good news: AI washing is detectable. Agencies that understand this space answer certain questions differently than ones that don't. Here's what to ask.

Six Questions That Separate Real AI Partners From the Noise

1. 'Walk me through how you'd actually build this — not what it would do, but how it would work.' This is the most revealing question you can ask. Agencies with real AI expertise will describe specific architectural decisions: whether your use case calls for RAG versus fine-tuning, how they'd handle data ingestion, what happens when the model is wrong, how they'd build in human oversight. Agencies without it will loop back to benefits — faster, smarter, more efficient — without ever describing a technical approach.

2. 'Do you use proprietary models or third-party APIs, and what are the cost implications at scale?' Many 'AI solutions' are API wrappers around OpenAI or Anthropic. That's not inherently wrong, since these are powerful tools, but you need to understand what you're buying. API-based solutions have ongoing inference costs that scale with usage. If per-user usage holds steady, a solution that costs $500/month at 1,000 users will cost roughly $15,000/month at 30,000 users. Any agency worth hiring will model this for you upfront. An agency that gets evasive about this question is either passing that risk to you without saying so or doesn't understand it itself.

3. 'Can you demo this on real data — specifically messy, incomplete, or edge-case data?' Demo environments are curated. The data is clean, the prompts are tuned, and the failures are hidden. Real-world AI has to work on real-world data — which is inconsistent, incomplete, and full of edge cases your demo won't surface. If an agency can't or won't demonstrate their solution on data that looks like yours, that's a signal worth taking seriously.

4. 'What does your team's AI development background actually look like, and who will be hands-on?' Many agencies have one or two engineers with genuine ML or AI infrastructure experience and a larger team of generalist developers who will handle the bulk of the build. That may be fine for your project, but you need to know. Ask for the relevant experience of the people who will actually touch your code, not the agency's overall positioning.

5. 'How do you handle accuracy, hallucination, and failure cases in production?' This is the question that reveals whether an agency has shipped AI systems or has only built prototypes. Production AI requires monitoring, fallback logic, confidence thresholds, human-in-the-loop escalation paths, and audit logging. If an agency's answer to this question is vague or treats it as something to 'figure out later,' they have probably built demos, not production systems. The answer should be specific, technical, and slightly boring, which is exactly what you want.

6. 'What happens to our data, and who controls the model after the engagement ends?' Your business data — customer interactions, operational records, proprietary processes — is the asset that makes a custom AI system valuable. Understand exactly where it goes, who processes it, and what agreements govern its use. Also ask who owns the model weights, the fine-tuning data, and the inference infrastructure after the project is complete. Some agencies build on their own platforms, which means you're renting access to something you paid to build. Get this in writing before anything else.
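The cost arithmetic behind question 2 can be made concrete with a small model. The Python sketch below is illustrative only: the class, the function, and every number in it (requests per user, token counts, per-token prices) are hypothetical placeholders, not any provider's actual pricing. The values were picked so that 1,000 users lands on the $500/month figure used in question 2, and the model assumes per-user usage stays flat as the user base grows.

```python
from dataclasses import dataclass


@dataclass
class UsageAssumptions:
    """Hypothetical inputs; replace every value with your own figures."""
    requests_per_user_per_month: float
    input_tokens_per_request: float
    output_tokens_per_request: float
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def monthly_inference_cost(users: int, a: UsageAssumptions) -> float:
    """Estimated monthly API spend in USD for a given number of active users."""
    requests = users * a.requests_per_user_per_month
    input_cost = requests * a.input_tokens_per_request * a.usd_per_million_input_tokens / 1_000_000
    output_cost = requests * a.output_tokens_per_request * a.usd_per_million_output_tokens / 1_000_000
    return input_cost + output_cost


# Placeholder numbers, not real pricing: chosen so 1,000 users costs $500/month.
assumptions = UsageAssumptions(
    requests_per_user_per_month=50,
    input_tokens_per_request=2_000,
    output_tokens_per_request=500,
    usd_per_million_input_tokens=2.50,
    usd_per_million_output_tokens=10.00,
)

for users in (1_000, 30_000):
    cost = monthly_inference_cost(users, assumptions)
    print(f"{users:>6,} users: ${cost:,.0f}/month")
```

A model an agency hands you should go further than this one, for example by showing how the curve changes if per-user usage grows or if some requests can be served from a cache. The point of asking is to see whether they have the inputs at all.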

Key Takeaways

  • Technical evasion is the clearest red flag: real AI engineers describe how, not just what
  • API cost scaling at volume should be modeled before any contract is signed
  • Demo performance on curated data tells you almost nothing — request a test on your actual data
  • Ask specifically about the engineers assigned to your project, not the agency's general credentials
  • Production AI and prototype AI are fundamentally different; ask about failure handling to tell them apart
  • Data ownership and model rights should be defined contractually before work begins
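The failure-handling pieces named in question 5 (fallback logic, confidence thresholds, human-in-the-loop escalation, audit logging) fit together roughly as follows. This is a minimal sketch, assuming your system can attach some confidence score to each answer; how that score is produced is a separate design problem. The names handle_request, call_model, ModelResult, and Decision, and the 0.8 threshold, are all hypothetical.

```python
import json
import logging
from dataclasses import dataclass
from typing import Callable, Optional

# Audit trail: every decision is recorded, whichever route it takes.
audit_log = logging.getLogger("ai_audit")

FALLBACK_ANSWER = "We couldn't answer this automatically. A person will follow up."


@dataclass
class ModelResult:
    answer: str
    confidence: float  # assumed score in [0, 1]; how it is computed is up to your system


@dataclass
class Decision:
    route: str  # "auto", "human_review", or "fallback"
    answer: Optional[str]


def handle_request(
    query: str,
    call_model: Callable[[str], ModelResult],
    threshold: float = 0.8,  # hypothetical value; tune it against your own data
) -> Decision:
    """Route one request: answer automatically, escalate to a human, or fall back."""
    try:
        result = call_model(query)
    except Exception as exc:
        # Fallback logic: the model call itself failed (timeout, outage, bad response).
        decision = Decision(route="fallback", answer=FALLBACK_ANSWER)
        audit_log.warning(json.dumps({"query": query, "route": decision.route, "error": repr(exc)}))
        return decision

    if result.confidence >= threshold:
        decision = Decision(route="auto", answer=result.answer)
    else:
        # Human-in-the-loop escalation: a low-confidence answer is held back, not sent.
        decision = Decision(route="human_review", answer=None)

    audit_log.info(json.dumps({"query": query, "route": decision.route, "confidence": result.confidence}))
    return decision


# Stub standing in for a real model call, for illustration only.
def stub_model(query: str) -> ModelResult:
    return ModelResult(answer="Your order ships Tuesday.", confidence=0.55)


print(handle_request("Where is my order?", stub_model).route)  # human_review
```

An agency that has run AI in production should be able to talk through each branch here in its own terms: where its confidence signal comes from, who receives the escalations, and what the audit log is used for afterwards.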

The Red Flags That Signal AI Washing

Beyond the six questions above, certain patterns consistently appear in agencies that are overselling their AI capabilities.

No engineers in the sales conversation. If every meeting is with account executives or business development representatives who defer all technical questions to a follow-up, the engineering depth may not exist. Legitimate AI development shops are comfortable putting technical people in front of clients early — because those people can speak specifically about the work.

Case studies that describe outcomes without process. 'We reduced customer support tickets by 40% with AI' tells you nothing about what was actually built. Ask for a technical breakdown of any case study they present. If they can't describe the architecture, the training approach, the data pipeline, and the failure modes they managed, the case study is marketing, not evidence.

Pricing that doesn't reflect the actual complexity. Real AI development — especially custom model work, fine-tuning, or complex agentic systems — is expensive. If an agency is quoting AI development at rates that feel like standard web development, they're either not building real AI or they'll discover scope issues after you've signed.

Reliance on buzzwords without definitions. 'AI-powered insights,' 'intelligent automation,' 'machine learning optimization' — ask any agency using these phrases to define them specifically for your use case. What model? What data? What does the system do when it's uncertain? Agencies with genuine expertise are happy to answer. Agencies without it will find reasons to stay abstract.

Key Takeaways

  • No engineers in sales calls = potential depth problem
  • Outcome-only case studies without technical detail are not evidence of capability
  • AI development rates that match standard web development rates should prompt follow-up questions
  • Any buzzword should be definable in concrete terms for your specific use case

What a Genuine AI Development Partner Looks Like

The agencies doing real AI work share some recognizable characteristics — and they're different from what you might expect.

They slow the conversation down. Agencies with legitimate AI expertise tend to ask more questions than they answer in early conversations. They want to understand your data infrastructure, your existing systems, your definition of success, and your tolerance for uncertainty before they propose anything. That deliberateness is a feature, not a delay.

They're honest about what AI can and can't do for your specific situation. Genuine AI practitioners know that not every problem is an AI problem. If an agency immediately maps your challenge to an AI solution without asking whether simpler approaches might work, that's a signal they're pattern-matching to what they sell rather than what you need.

They can describe failure modes before you ask. The mark of production AI experience is an unsolicited discussion of edge cases, failure modes, and what happens when the model is wrong. Agencies that have shipped real AI systems have learned these lessons the hard way — and they'll bring them up before you do.

They distinguish between what they build and what they connect. Custom AI development — training or fine-tuning models on your data, building proprietary inference infrastructure, designing novel architectures — is different from AI integration, which connects existing AI tools to your workflows. Both have value; they're not the same thing. A genuine partner will be clear about which one your project actually requires.

The Bottom Line

The AI agency market is noisy right now, and the noise is getting louder. But the underlying opportunity is real: businesses that get AI implementation right are seeing meaningful gains in operational efficiency, customer experience, and product capability. The gap between agencies that can deliver that and agencies that can only describe it is detectable if you know what to look for.

At StepTo, our AI engagements start with a technical discovery process designed to establish exactly what's feasible, what's appropriate for your situation, and what the build would actually involve, all before any commitment is made. If you're evaluating AI development partners and want a candid conversation about what's real and what isn't, we're happy to have it.


Written by

Igor Gazivoda

Co-founder & CEO · StepTo

Igor has 15+ years in software engineering and business development. Former CTO at a Series A fintech startup, he specializes in scaling engineering teams, nearshore strategy, and AI-driven product development. He holds a Master's in Computer Science from the University of Belgrade and has published on distributed systems architecture.
