PDF/Image Text Extractor | Jade IT Services

OCR to JSON API (PDF/Image Text Extractor): Flask + Tesseract + PyMuPDF

A lightweight text extraction tool built with Flask, Tesseract OCR, and PyMuPDF.
It extracts text from scanned images and PDF documents and returns it as structured JSON, through both a web interface and a REST API.

Key Features

Upload Support: Accepts PDF and image files (JPG, PNG, TIFF, BMP, GIF)
OCR + Digital Text Extraction: Runs OCR on scanned pages and images, and reads embedded text directly from digital PDFs
Multilingual Support: English and Hindi, extendable to other languages by installing additional Tesseract language data
JSON Output: Extracted text returned as structured JSON via API
Modern Web UI: Built with Bootstrap for a clean, responsive interface
Cross-Platform: Runs on Windows, macOS, and Linux
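
The combination of OCR and digital text extraction listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (extract_text, extract_pdf, needs_ocr, ocr_image), the 20-character threshold, and the 300 DPI render setting are all assumptions. It also assumes the pytesseract and Pillow packages are installed and that Tesseract has the eng and hin language data available.

```python
from pathlib import Path

# Image types named in the feature list (extension spellings are an assumption).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".gif"}
# Tesseract language codes for English and Hindi, joined with "+".
OCR_LANGUAGES = "eng+hin"


def needs_ocr(page_text, min_chars=20):
    """Treat a page as scanned when its embedded text layer is nearly empty."""
    return len(page_text.strip()) < min_chars


def ocr_image(image):
    """Run Tesseract on a PIL image (assumes pytesseract is installed)."""
    import pytesseract

    return pytesseract.image_to_string(image, lang=OCR_LANGUAGES)


def extract_pdf(path):
    """Return one text string per page, falling back to OCR for scanned pages."""
    import fitz  # PyMuPDF
    from PIL import Image

    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            text = page.get_text()  # embedded (digital) text, if any
            if needs_ocr(text):
                # Render the page to an RGB bitmap and hand it to Tesseract.
                pix = page.get_pixmap(dpi=300)
                image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
                text = ocr_image(image)
            pages.append(text)
    return pages


def extract_text(path):
    """Dispatch on file extension; returns a list of page texts."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return extract_pdf(path)
    if suffix in IMAGE_EXTENSIONS:
        from PIL import Image

        with Image.open(path) as image:
            return [ocr_image(image)]
    raise ValueError(f"Unsupported file type: {suffix}")
```

Checking the text layer first keeps digital PDFs fast, since OCR runs only on pages where PyMuPDF finds little or no embedded text.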

Tech Stack

Backend: Flask (Python)
OCR Engine: Tesseract OCR
PDF Processing: PyMuPDF
Frontend: Bootstrap
API Format: REST (JSON Response)
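
To show how the pieces of this stack might fit together, here is a hedged sketch of a Flask endpoint that returns extracted text as JSON. The route /api/extract, the form field name file, the JSON keys, and the names build_payload and create_app are hypothetical and not taken from the project. The extract_text argument stands for any callable that takes a file path and returns a list of page texts.

```python
def build_payload(filename, pages):
    """Shape extracted page texts into a JSON-serializable dict (assumed layout)."""
    return {
        "filename": filename,
        "page_count": len(pages),
        "pages": [
            {"page": number, "text": text.strip()}
            for number, text in enumerate(pages, start=1)
        ],
    }


def create_app(extract_text):
    """Build a Flask app around a path -> list-of-page-texts callable."""
    # Imported here so build_payload stays usable without Flask installed.
    import os
    import tempfile

    from flask import Flask, jsonify, request
    from werkzeug.utils import secure_filename

    app = Flask(__name__)

    @app.post("/api/extract")  # hypothetical route name
    def extract():
        upload = request.files.get("file")  # hypothetical form field name
        if upload is None or not upload.filename:
            return jsonify({"error": "no file uploaded"}), 400

        # Sanitize the client-supplied name before writing it to disk.
        name = secure_filename(upload.filename) or "upload"
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, name)
            upload.save(path)
            try:
                pages = extract_text(path)
            except ValueError as exc:
                # Raised by the extractor for unsupported file types.
                return jsonify({"error": str(exc)}), 415
        return jsonify(build_payload(name, pages))

    return app
```

With such an app running locally, a client could post a file as multipart form data, for example with curl -F "file=@scan.pdf" against the assumed /api/extract route, and receive one JSON object with an entry per page.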