| Outside In Clean Content
Outside In Clean Content addresses particularly challenging issues in
native file processing. Focusing specifically on widely used formats
(Microsoft Office and PDF), its extended extraction provides all text,
properties, hidden information and system data embedded in native files.
It can also analyze and process malformed documents, which is critical
to accurate text extraction from PDFs. Clean Content can also
programmatically modify native files, enabling features such as
scrubbing, property modification and document assembly. Outside In
Clean Content is a pure Java technology that offers Java, C/C++ and
.NET APIs.
|
- Extracts text, metadata and hidden information from Microsoft Office (Word, Excel and PowerPoint, versions 97-2007) and PDF documents
- Identifies, reports and optionally removes or modifies more than 40 metadata and hidden data elements
- Bursts and reassembles slides from multiple PowerPoint presentations
- Provides accurate text offset information to automate native search hit-highlighting of PDFs in Adobe Reader
- Architected for the high document throughput required by the most performance-sensitive environments
- Easy integration via a Java API for Java or any Java-compatible environment such as JSP and J2EE, or via C/C++ or .NET APIs for integration with traditional languages
- No Microsoft Office dependency, eliminating the reliability, scalability and platform dependency issues that arise when automating Office applications to process files in high volumes
- Available on Windows with Java, C/C++ and .NET interfaces, on Linux x86 with Java and C/C++ interfaces, and on Solaris SPARC with a Java interface. Supported on any JVM compliant with Java 1.5 or above
|