How to access DeepSeek and Qwen alongside OpenAI without managing separate API keys for everything

This is a problem I spent longer on than I should have.
We run a mixed model stack: DeepSeek V3 for cost-sensitive tasks, Qwen 2.5 for multilingual work, GPT-4o for the things that actually need it (roughly the split sketched below). On paper this sounds fine. In practice, managing the API layer for all of these became a part-time job I didn’t sign up for.
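For context, the split looks roughly like this. A minimal sketch, not our exact production config: the task classes are made up for illustration, and the model IDs (`deepseek-chat`, `qwen2.5-72b-instruct`, `gpt-4o`) are the providers’ public API names at the time of writing, worth verifying against current docs.

```python
# Rough shape of the model split: task class -> (provider, model ID).
# Task classes are illustrative; model IDs should be checked against provider docs.
MODEL_POLICY = {
    "bulk_summarization": ("deepseek", "deepseek-chat"),       # cost-sensitive volume work
    "multilingual_support": ("qwen", "qwen2.5-72b-instruct"),  # CJK-heavy traffic
    "complex_reasoning": ("openai", "gpt-4o"),                 # only where it earns its price
}

def pick_model(task_class: str) -> tuple[str, str]:
    """Return (provider, model) for a task class, defaulting to the cheap tier."""
    return MODEL_POLICY.get(task_class, MODEL_POLICY["bulk_summarization"])
```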
Separate credentials for each provider. Separate rate-limit handling. Separate billing accounts to reconcile at the end of the month. Separate integrations that break independently when providers push updates. And Chinese model providers in particular have a different update cadence than Western ones; I’ve had DeepSeek change something without warning twice in six months.
The thing I kept running into was that solutions that worked well for Western models didn’t work as well for Chinese ones. Latency through certain routing layers was noticeably higher for DeepSeek and Qwen than for GPT and Claude, and pricing wasn’t competitive at the volume we run them.
I spent a while on a DIY approach: a thin routing layer that sat between our application and the provider APIs, roughly the shape sketched below. It gave us a single interface internally, and it worked fine until it didn’t. When a provider API changed, we had to fix the integration ourselves. When we wanted to add a new model, we had to extend the abstraction. The maintenance surface kept growing.
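For the curious, the DIY layer had roughly this shape. This is a reconstruction, not our production code: it assumes DeepSeek’s and Qwen’s OpenAI-compatible endpoints (check each provider’s docs for current base URLs), and the env var names and fallback chain are illustrative.

```python
import os
from openai import OpenAI  # pip install openai

# One client per provider: separate keys, separate base URLs, separate failure modes.
# Base URLs are the providers' OpenAI-compatible endpoints; verify against current docs.
CLIENTS = {
    "openai": OpenAI(api_key=os.environ["OPENAI_API_KEY"]),
    "deepseek": OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    ),
    "qwen": OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    ),
}

# Fallback chain per primary model: on failure, retry on the next entry.
FALLBACKS = {
    ("deepseek", "deepseek-chat"): [("openai", "gpt-4o-mini")],
}

def complete(provider: str, model: str, messages: list[dict]) -> str:
    """The single internal interface; every provider quirk got patched in here."""
    chain = [(provider, model)] + FALLBACKS.get((provider, model), [])
    last_err = None
    for prov, mdl in chain:
        try:
            resp = CLIENTS[prov].chat.completions.create(model=mdl, messages=messages)
            return resp.choices[0].message.content
        except Exception as err:  # rate limits, schema changes, timeouts...
            last_err = err
    raise RuntimeError(f"all providers in chain failed: {chain}") from last_err
```

Writing this wasn’t the hard part. The hard part was that every entry in `CLIENTS` and `FALLBACKS` was a thing we had to babysit.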
What I landed on was Yotta Labs AI Gateway. I want to be accurate about what this is, because it’s a bit different from what I expected going in.
It’s not a pure API aggregator in the way some other tools are. It’s more of an infrastructure layer that also handles model routing: it manages the GPU compute underneath, which is why the latency profile for Chinese models is different. When you route to DeepSeek or Qwen through an infrastructure gateway rather than a proxy layer, the request path is shorter, and latency drops with it.
The practical setup: one API key that routes across Chinese and Western models, with fallback handling built in; the integration ends up looking like the sketch below. Billing is compute-based rather than a per-token markup, which works out cheaper at the volume we’re running DeepSeek.
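To show the shape without reproducing Yotta Labs’ actual endpoint or model naming (which I won’t quote from memory), here’s a sketch assuming an OpenAI-compatible gateway endpoint, a common pattern for tools like this. The base URL and the `GATEWAY_API_KEY` env var are placeholders; check their docs for the real values.

```python
import os
from openai import OpenAI  # pip install openai

# One key, one endpoint; routing and fallback are handled on the gateway side.
# ASSUMPTION: an OpenAI-compatible endpoint. The base URL and model IDs below
# are placeholders, not Yotta Labs' actual values -- check their documentation.
gateway = OpenAI(
    api_key=os.environ["GATEWAY_API_KEY"],
    base_url="https://gateway.example.com/v1",  # placeholder URL
)

# Same client, different model string: no per-provider plumbing.
for model in ("deepseek-chat", "qwen2.5-72b-instruct", "gpt-4o"):
    resp = gateway.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
    )
    print(model, resp.choices[0].message.content)
```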
The thing I had to adjust my mental model on: this isn’t a drop-in replacement for anything I was using before. It’s a different architecture. The unified key is a feature of the infrastructure layer, not the point of it. If you’re mostly running GPT and Claude and only occasionally touching Chinese models, there are simpler options. But if Chinese models are a real part of your stack and you’re running them at meaningful volume, the routing and cost profile is a clear improvement over what I had before.
Setup took about a day. I’ve been running it in production for four months. The Friday-afternoon surprise-API-update scramble hasn’t happened since.