ExfParse
Release Date: 5 Jan, 2025
Release Version: v1.0
ExfParse is an intelligent document processing platform that extracts and transforms unstructured data from PDF, DOCX, PPT, and scanned files. It combines regex, table-based, and LLM-powered extraction to parse documents into structured formats. Built-in workflow orchestration and integration with storage and processing pipelines reduce manual handling between extraction and downstream use.
Key Features
  • Multi-format Support – Extracts data from file formats such as PDF, DOCX, and PPT.
  • Data Storage Integration – Directly stores parsed data in target tables for further use.
  • Workflow Orchestration – Orchestrates backend processes through run, intermediate, and target tables.
  • Advanced Data Extraction – Leverages regex, table extraction, and LLM-based methods.
  • Review & Refinement – Allows users to review and verify uploaded files.
  • Customizable – Can be tailored to handle specific business extraction needs.
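As an illustration of how regex-based extraction and the run, intermediate, and target tables could fit together, here is a minimal sketch. It is not ExfParse's actual API: the Pipeline class, the FIELD_PATTERNS rules, and the invoice fields are hypothetical, and plain in-memory lists stand in for the database tables.

```python
import re
from dataclasses import dataclass, field

# Hypothetical extraction rules; real patterns depend on the document type.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*:?\s*\$?\s*([0-9][0-9,]*\.?[0-9]*)", re.IGNORECASE
    ),
}


@dataclass
class Pipeline:
    """In-memory stand-in for run, intermediate, and target tables (assumed layout)."""

    run: list = field(default_factory=list)
    intermediate: list = field(default_factory=list)
    target: list = field(default_factory=list)

    def process(self, doc_id: str, text: str) -> dict:
        # Run table: one row per processing attempt, with its status.
        self.run.append({"doc_id": doc_id, "status": "started"})

        # Intermediate table: raw matched strings, kept for review.
        raw = {}
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            raw[name] = match.group(1) if match else None
        self.intermediate.append({"doc_id": doc_id, **raw})

        # Target table: cleaned, typed values ready for downstream use.
        total = float(raw["total"].replace(",", "")) if raw["total"] else None
        record = {
            "doc_id": doc_id,
            "invoice_number": raw["invoice_number"],
            "total": total,
        }
        self.target.append(record)

        self.run[-1]["status"] = "completed"
        return record


if __name__ == "__main__":
    pipeline = Pipeline()
    print(pipeline.process("doc-1", "Invoice No: INV-2025-001\nTotal: $1,250.50"))
```

Keeping the raw matches in an intermediate stage, separate from the typed target records, is one way to support the review step: a reviewer can compare what was matched against what was stored.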
Specifications
Minimum Software Requirements
  • Operating System: Linux (Ubuntu 20.04 or later)
  • Python Version: 3.9+
  • Orchestration: Prefect 1.1.0 or an equivalent tool
  • Containerization: Docker 20.10+ / Kubernetes 1.20+
  • Database: MongoDB and PostgreSQL 12+, or an equivalent database or data warehouse
  • Storage: AWS S3 or equivalent cloud storage for document uploads
Minimum Hardware Requirements
  • CPU: 2 Cores
  • RAM: 4 GB
  • Storage: 50 GB free disk space
Resources
  • Docker Compose File: Link
  • Kubernetes YAML File: Link
