LiteLLM: security history 2026 + alternatives

TL;DR: 2026 was a rough year for LiteLLM security. For small single-provider projects, teams increasingly lean toward direct provider SDKs + Instructor instead of the LiteLLM proxy.

Current state

Stable: 1.90.1 (June 30, 2026)
Pre-release: 1.91.0rc1 (Jun 28)
~daily release cadence

Security incidents 2026

March 2026: Supply chain attack

PyPI packages litellm==1.82.7 and 1.82.8 were compromised with credential-stealing malware
Live for ~40 minutes (10:39 UTC) before being quarantined
Blast radius: LiteLLM is present in ~36% of cloud environments, 95M downloads/month
Downstream affected: CrewAI, Browser-Use, Opik, DSPy, Mem0, Instructor, Guardrails, Agno, Camel-AI
Sources: https://docs.litellm.ai/blog/security-update-march-2026, https://www.herodevs.com/blog-posts/the-litellm-supply-chain-attack-...

April 2026: CVE-2026-42208

SQL injection in LiteLLM Proxy API key verification
Affected: v1.81.16 – v1.83.6
Fixed: v1.83.7
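
A pinned version can be checked against the numbers in this note with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, and the data is limited to the two compromised PyPI releases and the CVE-2026-42208 range listed above (the later vulnerability chain is left out because this note does not record where its affected range starts).

```python
import re

# Data copied from this note; nothing here is queried from an advisory feed.
COMPROMISED_RELEASES = {(1, 82, 7), (1, 82, 8)}   # March 2026 PyPI compromise
SQLI_FIRST_AFFECTED = (1, 81, 16)                 # CVE-2026-42208, first affected
SQLI_FIRST_FIXED = (1, 83, 7)                     # CVE-2026-42208, first fixed


def parse_release(version: str) -> tuple:
    """Extract (major, minor, patch) and ignore suffixes such as 'rc1' or '-stable'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch))


def known_issues(version: str) -> list:
    """Return the issues from this note that apply to the given litellm version."""
    release = parse_release(version)
    issues = []
    if release in COMPROMISED_RELEASES:
        issues.append("compromised PyPI release (March 2026)")
    if SQLI_FIRST_AFFECTED <= release < SQLI_FIRST_FIXED:
        issues.append("CVE-2026-42208")
    return issues


if __name__ == "__main__":
    for candidate in ("1.81.15", "1.82.7", "1.83.6", "1.83.7"):
        print(candidate, known_issues(candidate))
```

Tuple comparison keeps the range check readable, at the cost of treating pre-releases and "-stable" tags as equal to their base version.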

May-June 2026: Vulnerability chain

CVE-2026-47101: authz bypass via allowed_routes
CVE-2026-47102: priv-esc via user_role to proxy_admin
CVE-2026-40217: sandbox escape via exec() in Custom Code Guardrail
Fixed: v1.83.14-stable (May 2, 2026)

Source: https://thehackernews.com/2026/06/litellm-vulnerability-chain...

Memory / Performance

aiohttp memory leak: PR #17388 added connection limits (300 total / 50 per host)
Users still report a 350-400 MB baseline RAM footprint
Issue #15128 (FastAPI memory leak, OOM at 12 GB): closed "not planned" / stale
Issue #12685 "Heavy RAM Usage over time": recurring
Python router breaks down at 300-500 RPS in third-party load tests
Rust migration in progress: targeting 15x throughput, 11x less memory, sub-1ms overhead, 6,782 req/s

Structured output edge cases

Gemini via LiteLLM

Issue #17556: response_format + web_search_options together → returns raw control tokens (<ctrl42>...) instead of JSON. Closed as "not planned" / stale
Issue #31696: the /responses endpoint returns "text": null instead of "" for tool calls with an empty text prefix
Issue #6027: 400 errors on vertex_ai/gemini-1.5-pro with nested Pydantic schemas
OpenAI Agents Issue #1575: "Structured outputs with Gemini via LiteLLM does not work"; it works only without tools

Anthropic via LiteLLM

LiteLLM uses Anthropic tool calling under the hood for structured output
Supports: Claude Sonnet 4.5 + Opus 4.1+
Issue #21016: Pydantic ge/le/gt/lt constraints fail. LiteLLM does not strip minimum/maximum from the JSON schema, and Anthropic rejects it. Closed "not planned"
Issue #20533: Opus 4.5/4.6 were not recognized by the transformation logic (hardcoded substrings); fixed via PR #20548
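
For Issue #21016, a client-side workaround would be to strip the numeric bound keywords from the JSON schema before it is sent. The sketch below is not LiteLLM's code: the function name is hypothetical, the sample schema is hand-written to resemble what Pydantic emits for ge/le, and the set of keywords to drop is an assumption based on the issue description.

```python
from typing import Any

# JSON Schema keywords corresponding to Pydantic's ge/le/gt/lt constraints.
NUMERIC_BOUND_KEYS = {"minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum"}

# Containers whose keys are field or model names rather than schema keywords.
NAME_MAPS = {"properties", "$defs", "definitions"}


def strip_numeric_bounds(schema: Any) -> Any:
    """Return a copy of a JSON schema with numeric bound keywords removed."""
    if isinstance(schema, list):
        return [strip_numeric_bounds(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key in NUMERIC_BOUND_KEYS:
            continue  # drop the constraint keyword itself
        if key in NAME_MAPS and isinstance(value, dict):
            # Keep every field name, even one literally called "minimum".
            cleaned[key] = {name: strip_numeric_bounds(sub) for name, sub in value.items()}
        else:
            cleaned[key] = strip_numeric_bounds(value)
    return cleaned


# Hand-written schema resembling a model with `age: int = Field(ge=0, le=120)`.
sample_schema = {
    "type": "object",
    "properties": {
        "age": {"type": "integer", "minimum": 0, "maximum": 120},
        "minimum": {"type": "number", "exclusiveMinimum": 0},
    },
    "required": ["age", "minimum"],
}

stripped = strip_numeric_bounds(sample_schema)
```

Stripping the bounds means the provider no longer sees them, so the constraints would still need to be enforced locally when the response is validated against the original model.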

Practical impact for us

We use response_format=PydanticModel for Gemini Flash. Per the research:

The common case works
The "tools + response_format" edge case is broken; it does not affect us (we do not use tools)
Our Pydantic models have no numeric constraints, so the ge/le/gt/lt issue does not affect us

But the supply chain attack is a serious red flag. Our intel-collector deps would be:

litellm → transitively 50+ packages
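
The transitive count can be measured in any environment where the package is installed. The following is a rough sketch using only the standard library; the function names are hypothetical, optional extras are skipped, and other environment markers are ignored, so the result is an approximation rather than a resolver-accurate figure.

```python
import re
from importlib import metadata

# Leading distribution name in a requirement string such as "httpx>=0.23.0".
_REQ_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_EXTRA_MARKER = re.compile(r"extra\s*==")


def normalize(name: str) -> str:
    """Normalize a distribution name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def transitive_deps(package: str) -> set:
    """Collect installed-metadata dependencies reachable from `package`."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        try:
            requirements = metadata.requires(current) or []
        except metadata.PackageNotFoundError:
            continue  # not installed here, so its own deps are unknown
        for requirement in requirements:
            if _EXTRA_MARKER.search(requirement):
                continue  # only pulled in by an optional extra
            match = _REQ_NAME.match(requirement)
            if match is None:
                continue
            dep = normalize(match.group(1))
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    seen.discard(normalize(package))  # a cycle back to the root is not a dependency
    return seen


if __name__ == "__main__":
    deps = transitive_deps("litellm")
    print(f"litellm pulls in {len(deps)} packages (0 means it is not installed)")
```

Running the same function on the provider SDK alone gives a like-for-like number for the comparison.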

Alternatives

OpenRouter

Pricing: no markup on per-token costs
Fees: 5.5% platform fee ($0.80 minimum) on credit purchases; BYOK 5% after 1M req/month
Routing: load-balance, automatic failover
Failed requests are not billable
Excellent fit for multi-provider setups
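
The $0.80 minimum makes small credit purchases proportionally more expensive. The sketch below works through the arithmetic using only the two numbers quoted above; it assumes the fee is computed on the purchase amount, and the function names are illustrative.

```python
from decimal import Decimal

PLATFORM_FEE_RATE = Decimal("0.055")  # 5.5% platform fee, from this note
MINIMUM_FEE = Decimal("0.80")         # $0.80 minimum, from this note


def credit_purchase_fee(amount_usd: Decimal) -> Decimal:
    """Fee on a credit purchase: 5.5% of the amount, but never below $0.80."""
    return max(amount_usd * PLATFORM_FEE_RATE, MINIMUM_FEE)


def effective_rate(amount_usd: Decimal) -> Decimal:
    """Fee as a fraction of the purchase amount."""
    return credit_purchase_fee(amount_usd) / amount_usd


# Purchase size at which 5.5% equals the minimum: 0.80 / 0.055, about $14.55.
BREAK_EVEN = MINIMUM_FEE / PLATFORM_FEE_RATE

if __name__ == "__main__":
    for amount in (Decimal("10"), Decimal("100")):
        fee = credit_purchase_fee(amount)
        print(f"${amount}: fee ${fee}, effective rate {effective_rate(amount):.1%}")
```

A $10 purchase pays the $0.80 minimum (an effective 8%), while a $100 purchase pays $5.50, so buying credits in amounts above roughly $14.55 keeps the rate at 5.5%.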

Instructor

Version: 1.15.4 (Jun 28, 2026)
13.3k stars, 1.1k forks, 3M+ monthly downloads
Pydantic validation + auto-retries on top of provider SDKs
Can be layered on top of LiteLLM: pip install "instructor[litellm]" + instructor.from_provider("litellm/...")
Common stack: LiteLLM (routing) + Instructor (validation)
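
The validate-and-retry idea can be illustrated without any SDK. The sketch below is not Instructor's API: it uses plain json checks in place of Pydantic, a fake model function in place of a provider call, and hypothetical names throughout. It only shows the loop of validating output and feeding the error back on retry.

```python
import json
from typing import Callable, Optional


def parse_city(raw: str) -> dict:
    """Validate that `raw` is a JSON object with a str 'name' and an int 'population'."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("'name' must be a string")
    if not isinstance(data.get("population"), int):
        raise ValueError("'population' must be an integer")
    return data


def generate_validated(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], dict],
    max_retries: int = 2,
) -> dict:
    """Call `generate`, validate the output, and retry with error feedback."""
    feedback = None
    for _ in range(max_retries + 1):
        raw = generate(feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            # Pass the validation error back so the next attempt can correct it.
            feedback = f"Previous output was invalid: {exc}"
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts")


def fake_model(feedback: Optional[str]) -> str:
    """Stand-in for a provider call: wrong type first, corrected after feedback."""
    if feedback is None:
        return '{"name": "Paris", "population": "about 2 million"}'
    return '{"name": "Paris", "population": 2100000}'


city = generate_validated(fake_model, parse_city)
```

In a real stack the validator would be a Pydantic model and the generator a provider SDK call, which is the part a library like Instructor takes over.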

Bifrost

Go-based gateway, 50x faster than LiteLLM
~11 µs overhead at 5,000 RPS
4,305 GitHub stars (vs LiteLLM 44,728)
Actively promoted as a post-supply-chain alternative

LangChain `init_chat_model`

Unified initializer for chat models
Requires the provider's langchain-* package to be installed
Caveat: configurable_fields="any" allows api_key/base_url to be changed at runtime, which is risky

Direct `google-generativeai` (for us)

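One provider, one dependency
Less surface area
Native Pydantic structured output via genai.types.GenerateContentConfig(response_schema=Model) (note: GenerateContentConfig belongs to the newer google-genai SDK; the legacy google-generativeai package exposes GenerationConfig instead)
Less dependency bloat → smaller blast radius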