
vLLM agents

This page lists every AI agent in the MeshKore directory tagged with the vLLM capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.

58 agents in this capability · ranked by popularity
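The intake pipeline above (source, normalize, rank) is simple enough to sketch. The Python below is illustrative only: the AgentCard fields and the rank_by_popularity helper are hypothetical stand-ins, not MeshKore's actual worker schema or API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical card schema: fields mirror what the page says each card shows
# (capabilities, framework, language, freshness, source attribution).
@dataclass
class AgentCard:
    name: str                 # e.g. "vllm-top"
    description: str          # one-line summary shown on the card
    source: str               # "github", "huggingface", "npm", "pypi", ...
    capabilities: list[str]   # e.g. ["vllm"]
    framework: Optional[str] = None
    language: Optional[str] = None
    stars: int = 0            # GitHub stars; 0 when the source has none

def rank_by_popularity(cards: list[AgentCard]) -> list[AgentCard]:
    """Order cards by GitHub stars, highest first, as the header describes."""
    return sorted(cards, key=lambda card: card.stars, reverse=True)
```

A second sketch, a client call against the OpenAI-compatible endpoints that several of these agents expose, follows the list.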

Top 58 vLLM agents

open-agents-ai— ★

AI coding agent powered by open-source models (Ollama/vLLM) — interactive TUI with agentic tool-calling loop

ado-vllm-performance— ★

vLLM performance testing actuator for ado

agentic-swarm-bench— ★

Open-source benchmark for LLM inference on agentic scenarios

cryptotensors-koalavault-vllm— ★

KoalaVault Key Provider for CryptoTensors - Secure key management for encrypted model deployment with vLLM

freeagent-sdk— ★

Local-first AI agent framework. Built for models that aren't perfect.

guidellm— ★

Guidance platform for deploying and managing large language models.

happy-vllm— ★

happy_vllm is a production-ready REST API for vLLM

installm— ★

One-command deployment of OpenAI-compatible APIs for open-source LLMs

lagents— ★

L'Agent - Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum…

langextract-vllm— ★

LangExtract provider plugin for vLLM

light-vllm-ascend-security— ★

Model encryption and authorization extension for vLLM 0.17.0+ on Ascend NPU

light-vllm-security— ★

Model encryption and authorization extension for vLLM 0.18.0+

lightr-vllm-core— ★

Core encryption and license components for vLLM model security

llm-grill— ★

CLI for benchmarking LLM inference servers (vLLM, SGLang, llama.cpp)

llm-katan— ★

One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.

llmflux— ★

CLI tool for running LLM batch processing jobs on HPC systems

llmq— ★

todo

local-agents— ★

Minimal experimental framework for building agents with local LLM deployments. Zero bloat, maximum simplicity.

mini-vllm— ★

An educational implementation of an inference engine

mvllm— ★

A FastAPI-based load balancer for vLLM servers with OpenAI-compatible API

ollama-aid— ★

Ollama model management, trends viewing, testing & external runner assistant

openllm— ★

OpenLLM: Self-hosting LLMs Made Easy.

ovllm— ★

One-line vLLM wrapper with gorgeous DSPy integration

pdf4vllm-mcp— ★

Block-based PDF extraction MCP server optimized for LLM consumption

pegaflow-llm— ★

High-performance key-value storage engine with Python bindings

pegaflow-llm-cu13— ★

High-performance key-value storage engine with Python bindings

sam-agent— ★

SAM — Smart Agentic Model: CLI coding agent for open-source LLMs

smol-vllm— ★

From-scratch paged-attention inference engine: paged KV cache, continuous batching, preemption

thaw-vllm— ★

The fork primitive for LLM inference. Snapshot a running session — weights + KV cache + scheduler state — and…

turboquant-plus-vllm— ★

TurboQuant+ compression for vLLM. 4.3x weight compression + 3.7x KV cache, zero calibration.

turboquant-vllm— ★

TurboQuant KV cache compression for vLLM — fused Triton kernels, 3.76x compression, 3.7x faster decode on RTX…

vllm-benchmark-suite— ★

The most comprehensive benchmarking suite for vLLM inference servers

vllm-canon— ★

vLLM plugin: out-of-tree registration of canon-layer architectures (e.g. LlamaCanonForCausalLM from…

vllm-cluster-manager— ★

Deploy, manage, and monitor vLLM instances across a GPU cluster from a single web dashboard.

vllm-efficient-client— ★

A unified interface for efficient LLM inference with vLLM and OpenAI-compatible APIs

vllm-judge— ★

LLM-as-a-Judge evaluations for vLLM hosted models

vllm-manager— ★

Multi-instance vLLM cluster orchestration and log management

vllm-mcp-server— ★

MCP server for vLLM - expose vLLM capabilities to AI assistants

vllm-metal— ★

vLLM hardware plugin for Apple Silicon - unifies MLX and PyTorch under a single lowering path

vllm-mlx— ★

vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac

vllm-mon— ★

Production-grade vLLM metrics monitoring TUI with persistent storage and Grafana-style visualizations

vllm-musa— ★

vLLM platform plugin for Moore Threads MUSA GPUs

vllm-omni— ★

A framework for efficient model inference with omni-modality models

vllm-playground— ★

A web interface for managing and interacting with vLLM servers

vllm-rs— ★

A minimal, high-performance large language model (LLM) inference engine implementing vLLM in Rust.

vllm-semantic-router-bench— ★

Comprehensive benchmark suite for semantic router vs direct vLLM evaluation across multiple reasoning datasets

vllm-sr— ★

vLLM Semantic Router - Intelligent routing for Mixture-of-Models

vllm-sr-sim— ★

vLLM Semantic Router fleet simulator for capacity planning, SLO validation, and what-if analysis

vllm-top— ★

A monitoring tool for vLLM metrics.

vllm-tuner— ★

A Python package for tuning vLLM hyperparameters.

vllm-usf— ★

vLLM-USF: A high-throughput and memory-efficient inference engine for LLMs (USF Custom Build)

vllm-wizard— ★

CLI tool for vLLM configuration generation and GPU sizing

vllmocr— ★

OCR using LLMs

llm-cal— ★

LLM inference hardware calculator — architecture-aware, engine-version-aware, honest-labeled.

terradev-mcp— ★

Complete Agentic GPU Infrastructure for Claude Code — 192 MCP tools: Full training lifecycle, inference…

batch-agent— ★

High-throughput parallel LLM agent execution with tool deduplication, structured output, and self-hosted…

refract-llm— ★

REFRACT — Reference-anchored Robust Acid-test for Compressed Transformers. Multi-axis KV-cache fidelity…

vllm-swift— ★

vLLM Metal plugin powered by mlx-swift — high-performance LLM inference on Apple Silicon
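Several of the agents above (installm, mvllm, happy-vllm, vllm-efficient-client, and others) deploy, proxy, or wrap vLLM's OpenAI-compatible HTTP server. For reference, a minimal client call against such an endpoint uses the standard OpenAI Python SDK. The base URL, model name, and placeholder API key below are assumptions about a typical local deployment, not details of any listed package.

```python
from openai import OpenAI

# Point base_url at whichever vLLM-compatible server one of the agents above
# deploys or load-balances; vLLM's own server defaults to port 8000 and
# accepts a dummy API key such as "EMPTY" unless one is configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the server loaded
    messages=[{"role": "user", "content": "Summarize what vLLM does in one line."}],
)
print(response.choices[0].message.content)
```

Because every server in this family speaks the same protocol, swapping deployments is a one-line change to base_url.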

Browse other capabilities