vLLM agents
This page lists every AI agent in the MeshKore directory tagged with the vLLM capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
58 agents in this capability · ranked by popularity
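As a rough sketch of the pipeline described above, the snippet below shows how normalized agent records could be ordered by GitHub stars. It is illustrative only: the AgentRecord fields, the rank_agents helper, and the star counts are assumptions for the example, not MeshKore's actual worker schema or data.

from dataclasses import dataclass

@dataclass
class AgentRecord:
    # Hypothetical normalized record; fields are illustrative,
    # not MeshKore's real schema.
    name: str
    source: str  # e.g. "github", "huggingface", "npm", "pypi"
    stars: int   # GitHub stars; 0 when a source carries no star count

def rank_agents(records: list[AgentRecord]) -> list[AgentRecord]:
    # Popularity ranking: sort by GitHub stars, descending.
    return sorted(records, key=lambda r: r.stars, reverse=True)

# Example records (star counts made up for illustration):
agents = [
    AgentRecord("happy_vllm", "github", 120),
    AgentRecord("OpenLLM", "github", 9000),
    AgentRecord("vllm-metrics-tui", "pypi", 0),
]
for a in rank_agents(agents):
    print(f"{a.stars:>6}  {a.name} ({a.source})")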
Top 58 vLLM agents
AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop
vLLM performance testing actuator for ado
Open-source benchmark for LLM inference on agentic scenarios
KoalaVault Key Provider for CryptoTensors - Secure key management for encrypted model deployment with vLLM
Local-first AI agent framework. Built for models that aren't perfect.
Guidance platform for deploying and managing large language models.
happy_vllm is a production-ready REST API for vLLM
One-command deployment of OpenAI-compatible APIs for open-source LLMs
L'Agent - Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum simplicity.
LangExtract provider plugin for vLLM
Model encryption and authorization extension for vLLM 0.17.0+ on Ascend NPU
Model encryption and authorization extension for vLLM 0.18.0+
Core encryption and license components for vLLM model security
CLI for benchmarking LLM inference servers (vLLM, SGLang, llama.cpp)
One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.
CLI tool for running LLM batch processing jobs on HPC systems
todo
Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum simplicity.
An educational implementation of an inference engine
A FastAPI-based load balancer for vLLM servers with OpenAI-compatible API
Ollama model management, trend viewing, testing & external runner assistant
OpenLLM: Self-hosting LLMs Made Easy.
One-line vLLM wrapper with gorgeous DSPy integration
Block-based PDF extraction MCP server optimized for LLM consumption
High-performance key-value storage engine with Python bindings
High-performance key-value storage engine with Python bindings
SAM — Smart Agentic Model: CLI coding agent for open-source LLMs
From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption
The fork primitive for LLM inference. Snapshot a running session — weights + KV cache + scheduler state — and…
TurboQuant+ compression for vLLM. 4.3x weight compression + 3.7x KV cache, zero calibration.
TurboQuant KV cache compression for vLLM — fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX…
The most comprehensive benchmarking suite for vLLM inference servers
vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCanonForCausalLM from…
Deploy, manage, and monitor vLLM instances across a GPU cluster from a single web dashboard.
A unified interface for efficient LLM inference with vLLM and OpenAI-compatible APIs
LLM-as-a-Judge evaluations for vLLM-hosted models
Multi-instance vLLM cluster orchestration and log management
MCP server for vLLM - expose vLLM capabilities to AI assistants
vLLM hardware plugin for Apple Silicon - unifies MLX and PyTorch under a single lowering path
vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac
Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations
vLLM platform plugin for Moore Threads MUSA GPUs
A framework for efficient model inference with omni-modality models
A web interface for managing and interacting with vLLM servers
A minimal, high-performance large language model (LLM) inference engine reimplementing vLLM in Rust.
Comprehensive benchmark suite for semantic router vs direct vLLM evaluation across multiple reasoning datasets
vLLM Semantic Router - Intelligent routing for Mixture-of-Models
vLLM Semantic Router fleet simulator for capacity planning, SLO validation, and what-if analysis
A monitoring tool for vLLM metrics.
A Python package for tuning vLLM hyperparameters.
vLLM-USF: A high-throughput and memory-efficient inference engine for LLMs (USF Custom Build)
CLI tool for vLLM configuration generation and GPU sizing
OCR using LLMs
LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.
Complete Agentic GPU Infrastructure for Claude Code — 192 MCP tools: Full training lifecycle, inference…
High-throughput parallel LLM agent execution with tool deduplication, structured output, and self-hosted…
REFRACT — Reference-anchored Robust Acid-test for Compressed Transformers. Multi-axis KV-cache fidelity…
vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon