The author would like to thank Rakesh Srivastava, COO and co-founder of UniQreate, and Mitesh Bhopale, senior product manager of Oracle for StartUps, for their insight and support.
According to IDC’s Global Datasphere Forecast from May 2020, the amount of data created over the next three years will exceed the data created over the past 30 years, and the world will create more than three times as much data over the next five years as it did in the previous five. As this data grows, so does the need to make sense of it before it becomes redundant.
Organizations hold a treasure trove of data spread across millions of documents that vary in structure, context, layout, and format. However, getting insights from this data can be a daunting task, and many organizations do not know where to start or how to make sense of this massive volume of unstructured data. Those who do know how to make sense of these data sets rely on hundreds of manual resources or technical teams to achieve it.
A lot of time is lost addressing hidden challenges, such as data silos within the organization, the effort of gathering insights from the data, and the limited scalability of a manual approach. Lost time also means a missed opportunity to make an informed business decision. This is where the budding startup UniQreate creates value.
UniQreate’s mission is to help businesses maximize the value of their unstructured data before that data loses value. More simply, UniQreate is a data-extraction automation company. UniQreate uses AI learning systems, intelligent workflows, and web-enabled interfaces to help business users manage their data needs with the most efficient workflows and the least intrusive interactions.
The vision for the company is to constantly evolve three primary dimensions—convenience, cost, and compatibility—to drive continuous innovation.
Figure 1: UniQreate’s three-step process to automate document data extraction
Goals of moving to the cloud
UniQreate’s business model centers on creating value by structuring unstructured data in the most efficient manner. The company aims to automate the process of extracting data and make it easy for its customers to build insights, so that they can make better decisions quickly and with a high level of accuracy and confidence. As a budding AI startup, UniQreate needs a highly performant, highly available infrastructure that can run AI workloads reliably at a lower cost.
Given that the company foundation is based on data, UniQreate was also looking to build a performant, agile, and scalable data management platform with the following technical requirements:
- Off-the-shelf higher compute power at a competitive cost
- Scalable file storage and managed MySQL services
- Object storage that addresses all project storage needs
- Compartment feature that provides a clean way to segregate and manage separate environments
The suite of Oracle products used
UniQreate partnered with Oracle for Startups to work on a cloud solution that best works for their business. UniQreate used the following OCI services to deliver on its business goals:
- Bare metal Compute: Oracle’s bare metal servers provide UniQreate with isolation, visibility, and control by using dedicated compute instances. The servers support applications that require high core counts, large amounts of memory, and high bandwidth, scaling up to 160 cores (the largest in the industry), 2 TB of RAM, and up to 1 PB of block storage. UniQreate is running 16 CPU instances and 3 GPU instances for different client environments. Each of these compute instances can support up to 200 extraction cycles per day, with model training running every 24 hours.
- Virtual cloud network (VCN): UniQreate uses a VCN, a virtual, private network that customers can set up in Oracle data centers. It closely resembles a traditional network, with firewall rules and specific types of communication gateways that customers can choose to use.
- Object Storage: OCI Object Storage service enables UniQreate to securely store any type of data in its native format. With built-in redundancy, Object Storage is ideal for building applications that require scale and flexibility. The service can consolidate multiple data sources for analytics, backup, or archive purposes. UniQreate is efficiently utilizing this service to back up all environments, including images of each virtual machine (VM).
- File Storage: OCI File Storage service provides a durable, scalable, secure, enterprise-grade network file system that a customer can connect to bare metal, VM, or container instances using a VCN. File Storage offered UniQreate a scalable, low latency organization of models and document metadata, effectively decoupling from the web server and database server.
- Load balancer: OCI Load Balancing service enables UniQreate to distribute web requests across an array of servers and automatically route traffic across availability domains, resulting in high availability and fault tolerance for its applications and data sources.
- Identity and Access Management (IAM): IAM service secures access to enterprise applications for both cloud and on-premises deployments. UniQreate uses IAM to implement a strict policy for each of its users.
- Oracle MySQL Database service: UniQreate utilizes this fully managed OCI Database service that lets developers quickly develop and deploy secure, cloud native applications. This service is optimized for and exclusively available in OCI. Oracle MySQL Database service has an integrated, high-performance analytics engine, HeatWave, which allows UniQreate to run sophisticated real-time analytics directly against an operational MySQL database.
Figure 2: UniQreate’s reference architecture
UniQreate has been part of the Oracle for Startups program since 2020. UniQreate has been running 16 OCPU instances and 3 GPU instances for multiple client environments to address different capital market use cases. The company follows a layered approach to build its enterprise architecture, which comprises the following components:
- Web server for the extraction UI and administrative capabilities
- Database server for persistent storage for their web server
- Core Manager helps decide the shape and number of VMs to launch for the prediction engine, which runs on UniQreate’s AI and ML module.
- Process Monitor monitors the health and performance of the entire system.
- File Storage provides scalable, low latency organization of models and document metadata, which effectively decouples from the web server and database server.
To dynamically deploy the desired shape of the core engine, UniQreate uses Ansible scripts. The VCN is segmented into two subnets with two levels of security: network security and application security.
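The sizing decision attributed to the Core Manager can be illustrated with a small calculation. The following is only a sketch under stated assumptions, not UniQreate's implementation: it reuses the article's figure of up to 200 extraction cycles per day per compute instance, while the function name `plan_launch`, the `LaunchPlan` type, and the shape labels `cpu-shape` and `gpu-shape` are hypothetical placeholders rather than real OCI shape names.

```python
import math
from dataclasses import dataclass

# Assumption: per-instance capacity taken from the article's figure of
# up to 200 extraction cycles per day per compute instance.
CYCLES_PER_INSTANCE_PER_DAY = 200


@dataclass(frozen=True)
class LaunchPlan:
    shape: str            # hypothetical label, not a real OCI shape name
    instance_count: int   # number of instances to launch


def plan_launch(expected_cycles_per_day: int, needs_gpu: bool) -> LaunchPlan:
    """Choose a shape label and an instance count for a daily workload."""
    if expected_cycles_per_day < 0:
        raise ValueError("expected_cycles_per_day must be non-negative")
    shape = "gpu-shape" if needs_gpu else "cpu-shape"
    # Round up so that a partially filled instance is still provisioned.
    count = math.ceil(expected_cycles_per_day / CYCLES_PER_INSTANCE_PER_DAY)
    return LaunchPlan(shape=shape, instance_count=count)


if __name__ == "__main__":
    # 450 cycles per day at 200 cycles per instance needs 3 instances.
    print(plan_launch(450, needs_gpu=False))
```

In a setup like the one described, the resulting shape and count would be passed on to the deployment scripts; how that handoff works is not covered in the article.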
To achieve high availability and fault tolerance, UniQreate placed different components in different availability domains to mitigate the risk of one availability domain going down. UniQreate also uses OCI Identity and Access Management to enforce a strict policy for each user. In addition, OCI Object Storage is used for all environment backups, including images of each VM.
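The reasoning behind spreading components across availability domains can be checked with a few lines of code. This is a minimal sketch, assuming a hypothetical placement map: the component names come from the architecture list above, but the domain labels (`AD-1`, `AD-2`, `AD-3`), the placement itself, and the helper `components_lost_if_domain_fails` are illustrative assumptions and do not describe UniQreate's actual layout.

```python
from typing import Dict, List, Set


def components_lost_if_domain_fails(
    placement: Dict[str, Set[str]], failed_domain: str
) -> List[str]:
    """Return the components with no instance left once failed_domain is down."""
    return sorted(
        name
        for name, domains in placement.items()
        if not (domains - {failed_domain})
    )


if __name__ == "__main__":
    # Hypothetical placement: which availability domains host each component.
    placement = {
        "web-server": {"AD-1", "AD-2"},
        "database-server": {"AD-1", "AD-2"},
        "process-monitor": {"AD-3"},
    }
    for domain in ("AD-1", "AD-2", "AD-3"):
        print(domain, components_lost_if_domain_fails(placement, domain))
```

A component hosted in a single availability domain shows up as lost when that domain fails, which is the risk the multi-domain configuration is meant to mitigate.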
Using Oracle’s highly performant, highly available, resilient cloud, UniQreate can easily scale the volume and variety of its use cases. The company migrated its entire setup to OCI in just four days and now provides maximum availability and uptime. UniQreate was able to scale computing bandwidth and storage at a competitive cost, addressing its need for scale and lower costs.
Oracle’s scalable file storage, MySQL managed service, Object Storage, and user-friendly UI made it easy for them to manage operations considering the limited resources they have. With OCI, UniQreate can run 200 extraction cycles per day for client environments with model training running every 24 hours. Applying OCI’s compartment approach, they efficiently maintained separate environments in the same region, helping maintain higher control.
With the features, functionality, and competitive cost provided by Oracle Cloud@Customer, UniQreate has achieved more than 20% monthly savings.
UniQreate’s data extraction involves a massive volume of documents: 5–10 million documents per year per client. Throughout its journey, UniQreate has tried numerous machine learning and deep learning algorithms, including TF-IDF, random forests, text embedding models, neural networks, and many other deep learning models. After running such a wide range of algorithms, the company has built more advanced models that provide better context and representation of the text.
To run these advanced models more efficiently, UniQreate needs a more robust underlying infrastructure to further improve accuracy. The company wants to focus on improving the training time and performance of the models. To that end, it plans to run extensive tests on OCI GPU to select the best possible scaling infrastructure: one that is seamless, cost-efficient, and robust while delivering high performance and throughput.
For more details, see the following resources: