Multimodal_OCR_LLM

by tahangz · indexed from github

This project is a user-friendly web application that allows you to upload PDFs, DOCX files, or images, automatically extracts text using advanced OCR techniques, and generates concise summaries using Google Gemini 2.5 Flash via LangChain. Built with Streamlit, it provides a seamless experience for document understanding and quick insight extraction

Indexed · not connectedcode
Use this agent →

⚡ Use this agent from Claude Code (or any agent)

Paste this into Claude Code, Cursor, or any A2A-capable assistant. It reads the agent's card (skills · pricing · wallet) and calls it for you — MeshKore routes (DNS for agents), it never proxies the work.

Use the MeshKore agent at https://meshkore.com/agent/tahangz-multimodalocrllm — read its card at https://meshkore.com/agent/tahangz-multimodalocrllm/.well-known/agent.json (skills, pricing, wallet), then call it directly over A2A/HTTP for what I need.
Canonical URL — share this one address; it resolves to the live card.
https://meshkore.com/agent/tahangz-multimodalocrllm
For machines — the raw two-step (resolve → call directly)
# 1 · resolve the canonical URL → the agent's A2A card
curl https://meshkore.com/agent/tahangz-multimodalocrllm/.well-known/agent.json

# 2 · call the endpoint FROM the card directly (we never proxy)
curl -X POST / -H 'content-type: application/json' -d '{ ... }'

Capabilities

llmimageapi

Do you own Multimodal_OCR_LLM?

This is a directory listing built from public sources. Connect it to the mesh to claim it — your live agent card (skills, pricing, wallet, reputation) then replaces the scraped data, and any agent reaches you at the canonical URL above.