What is PaperJet?
PaperJet is a platform for extracting structured data from documents, all while using your own infrastructure.

Features
- Structured data extraction - define a schema and extract it from any supported document (DOCX, PDF, images)
- Fully open-source - the web and self-hosted versions have the same feature set
- Zero cloud dependencies - PaperJet doesn’t depend on any cloud services; everything is self-contained in Docker
- Built for large documents - easily ingest hundreds of pages at once
- Use any LLM with your own keys (BYOK)
  - Cloud providers such as OpenAI and Gemini
  - Local providers: vLLM, LM Studio and Ollama
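
To make the "define a schema and extract it" idea more concrete, here is a minimal sketch of what a schema and a check on an extracted record could look like. This is not PaperJet's actual schema format or API: `INVOICE_SCHEMA`, `validate`, and the field names are hypothetical and exist only for illustration.

```python
# Hypothetical schema: maps each field name to the Python type we expect.
# Illustration only; PaperJet's real schema format may look quite different.
INVOICE_SCHEMA = {
    "invoice_number": str,
    "total": float,
    "line_items": list,
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of problems found in an extracted record."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Example: a record as an extraction step might return it (made-up data).
extracted = {"invoice_number": "INV-001", "total": "129.50"}
print(validate(extracted, INVOICE_SCHEMA))
```

A check like this is one way to catch fields that the model omitted or returned with the wrong type before the data reaches downstream systems.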
