Outside In Search Export
Outside In Search Export extracts the text and metadata of nearly 500 supported file types and converts them into XML, HTML or text specifically designed for search and forensic applications. This SDK offers a rich feature set and a choice of four output formats:
- SearchML: Lightweight XML containing text, embeddings and metadata optimized for search and text extraction;
- SearchHTML: HTML optimized for Web crawlers but with limited display formatting;
- SearchText: Plain text file (UTF-8 encoded Unicode) with properties and body text from the input file;
- PageML: XML which provides paginated text.
The SDK is suited to search, forensics or any application that needs to extract content and convert it into a format conducive to post-processing and analysis.
- Extracts text and metadata from files
- Developers can choose the output format most suitable to their application
- Optional 'metadata only' mode extracts document properties to build metadata repositories or to quickly flag key documents for further processing
- Optimized for performance and designed for high-throughput server environments