jPDFText is a Java library to extract text from PDF documents. With jPDFText, PDF documents can be processed to extract the textual content for archiving, storage, searching or indexing.
Main Features
Load PDF documents from files, network drives, URLs or input streams
Extract text in the logical reading order
Extract words as a vector of Strings
Works on Windows, Linux, Unix and Mac OS X (100% Java)
No need to install or configure additional drivers or software when deploying
Tested on JDK 1.4.2 and above
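Because jPDFText can return the words of a document as a Vector of Strings, that output can feed a search or indexing step directly. The sketch below counts how often each word occurs. It does not call any jPDFText method. Instead, it assumes the words have already been extracted into a `Vector<String>` and builds the Vector by hand so the example runs on its own. The class name `WordIndex` and the normalization rules (lowercasing, trimming punctuation) are illustrative choices, not part of the library. The sketch uses Java 8 features (generics, `Map.merge`), even though the library itself supports older JDKs.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.Vector;

public class WordIndex {
    // Builds a case-insensitive word-frequency map from a Vector of words.
    // Hypothetical helper: the Vector is assumed to come from a text
    // extractor such as jPDFText, but here it is supplied directly.
    public static Map<String, Integer> countWords(Vector<String> words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String word : words) {
            // Trim leading/trailing characters that are not letters or digits
            // (e.g. trailing commas), then lowercase the result.
            String normalized = word
                    .replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "")
                    .toLowerCase();
            if (normalized.isEmpty()) {
                continue; // skip tokens that consisted only of punctuation
            }
            counts.merge(normalized, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for words extracted from a PDF document.
        Vector<String> words = new Vector<>();
        for (String w : new String[] {"PDF", "text,", "pdf", "Text", "--"}) {
            words.add(w);
        }
        System.out.println(countWords(words)); // prints {pdf=2, text=2}
    }
}
```

A `TreeMap` keeps the words in alphabetical order, which makes the index easy to inspect. A `HashMap` would work equally well if ordering is not needed.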
If you require any additional information, don't hesitate to contact us at info@qoppa.com.
jPDFText extracts existing text content from PDF documents. If you need to recognize text in scanned PDF documents or in PDF documents that contain images, you may be interested in our Java OCR feature.