- Extracts text, metadata and hidden information from Microsoft Office (Word, Excel and PowerPoint, versions 97-2010) and PDF documents
- Identifies, reports and optionally removes or modifies more than 40 metadata and hidden data elements
- Bursts and reassembles slides from multiple PowerPoint presentations
- Provides accurate text offset information to automate native search hit-highlighting of PDFs in Adobe Reader
- Architected for the high document throughput required by the most performance-sensitive environments
- Easy integration via a Java API for Java or any Java-compatible environment such as JSP and J2EE, or via C/C++ or .NET APIs for integration with traditional languages
- No Microsoft Office dependency, which eliminates the reliability, scalability and platform dependency issues that arise when automating Office applications to process files in high volumes
- Available on Windows with Java, C/C++ and .NET interfaces; on Linux x86 with Java and C/C++ interfaces; and on Solaris SPARC with a Java interface. Supported on any compliant JVM running Java 1.5 or above