paperless-gpt –Yet another Paperless-ngx AI companion with LLM-based OCR focus
Hey everyone,
I've noticed discussions in other threads about paperless-ai (which is awesome), and some folks asked how it differs from my project, paperless-gpt. Since I’m a newer user here, I’ll keep things concise:
Context
- paperless-ai leans toward doc-based AI chat, letting you converse with your documents.
- paperless-gpt focuses on LLM-based OCR (for more accurate scanning of messy or low-quality docs) and a robust pipeline for auto-generating titles/tags.
Why Another Project?
- I didn't know paperless-ai in Sept. '24: True story :D
- LLM-based OCR: I wanted a solution that does advanced text extraction from scans, harnessing Large Language Models (OpenAI or Ollama).
- Tag & Title Workflows: My main passion is building flexible, automated naming and tagging pipelines for paperless-ngx.
- No Chat (Yet): If you do want doc-based chatting, paperless-ai might be a better fit. Or you can run both—use paperless-gpt for scanning/tags, then pass that cleaned text into paperless-ai for Q&A.
Key Features
- Multiple LLM Support (OpenAI or Ollama).
- Customizable Prompts for specialized docs.
- Auto Document Processing via a “paperless-gpt-auto” tag.
- Vision LLM-based OCR (experimental) that outperforms standard OCR in many tough scenarios.
Combining With paperless-ai?
- Totally possible. You could have paperless-gpt handle the scanning & metadata assignment, then feed those improved text results into paperless-ai for doc-based chat.
- Some folks asked about overlap: we do share the “metadata extraction” idea, but the focus differs.
If You’re Curious
- The project has a short README, Docker Compose snippet, and minimal environment vars.
- I’m grateful to a few early sponsors who donated (thank you so much!). That support motivates me to keep adding features (like multi-language OCR support).
Anyway, just wanted to clarify the difference, since people were asking. If you’re looking for OCR specifically—especially for messy scans—paperless-gpt might fit the bill. If doc-based conversation is your need, paperless-ai is out there. Or combine them both!
Happy to answer any questions or feedback you have. Thanks for reading!
Links (in case you want them):
- paperless-gpt code and docs:
github.com/icereed/paperless-gpt
- paperless-ngx:
github.com/paperless-ngx/paperless-ngx
Cheers!