
SQream delivers big data analytics with OCI’s powerful high-compute GPUs and agile cloud native Kubernetes

October 3, 2022 | 6 minute read


With the ever-increasing use of big data and machine learning for critical business functions, companies are finding that processing massive datasets can often take days. Reducing the time to produce insights from their data helps them get results faster and speed up business decisions. Big data is now ubiquitous across a wide range of industries, whether it’s retailers looking to manage inventory effectively or telecommunications organizations analyzing data to keep a pulse on network quality.

While the growing number of edge devices keeps creating petabytes of data, companies are looking for ways to patch security vulnerabilities more efficiently and in a time-sensitive manner. All enterprises face the same challenge: plenty of time-sensitive data that needs processing in a timely and cost-effective way.

Cloud-hosted big data analytics are compute-intensive and data-intensive, mainly because they depend on high-throughput data processing, which demands extremely fast data extraction to support rapid and accurate business decisions. Here, the collaboration of Oracle Cloud Infrastructure (OCI) and SQream comes to the rescue!

SQreaming fast

SQream was founded to meet the challenges that data-driven enterprises face as they deal with data growing at an exponential rate. Ami Gal and Razi Shoshani founded the company in 2010 with the idea of using GPUs to process massive data in SQL for faster outcomes. SQream’s solutions are designed for organizations to get immediate insights, accelerate their cloud data pipelines, and easily scale to analyze massive amounts of data without the burden of building infrastructure to support it. Today, more than 250 organizations worldwide rely on SQream to query petabytes of data and get new insights at exceptional speed.

SQream products and solutions

One of SQream’s key analytics offerings is its software-as-a-service (SaaS) data lakehouse, a data processing engine designed for petabyte-scale analytical workloads. SQream’s SaaS data lakehouse is a cloud native solution for automatically deploying, scaling, and managing data preparation and processing workloads. SQream uses its GPU-based architecture to process petabyte-scale data for your critical business insights. SQream can also manage any size of workload with its linear scalability. Each machine you add provides another 5 TB per hour of ingest capacity.
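The linear-scaling claim lends itself to simple capacity planning. The following is a minimal sketch, assuming perfectly linear scaling at the quoted 5 TB per hour per machine; the function names are hypothetical and are not part of any SQream tooling.

```python
import math

# Ingest rate quoted in the article; real workloads may deviate from it.
TB_PER_HOUR_PER_MACHINE = 5.0


def ingest_hours(dataset_tb: float, machines: int) -> float:
    """Estimated hours to ingest dataset_tb, assuming perfectly linear scaling."""
    if machines < 1:
        raise ValueError("machines must be at least 1")
    return dataset_tb / (TB_PER_HOUR_PER_MACHINE * machines)


def machines_needed(dataset_tb: float, window_hours: float) -> int:
    """Smallest machine count that ingests dataset_tb within window_hours."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    required = dataset_tb / (TB_PER_HOUR_PER_MACHINE * window_hours)
    return max(1, math.ceil(required))


if __name__ == "__main__":
    print(ingest_hours(100, 4))     # hours to ingest 100 TB on four machines
    print(machines_needed(100, 2))  # machines to ingest 100 TB within two hours
```

Under these assumptions, an estimate like this is only a starting point; measured throughput depends on the data, the network, and the storage configuration.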

Architecturally, the SQream cloud offering is a self-contained, containerized solution that you can deploy in several ways. You can run and manage SQream in your public cloud account on OCI or in your virtual private network (VPN), achieving rapid time-to-value from your data.

SQream’s solution operates directly on the customer’s data lake storage, giving you complete control over security and access management for your sensitive data. SQream interacts natively with open-standard formats, such as Apache Parquet, JSON, and Apache Avro, providing you with the interoperability you need. SQream’s solution reduces the need for data duplication and synchronization and eliminates the need to replace any component within your existing architecture. Organizations can use SQream’s platform in the following cases, especially when scaling from terabytes to petabytes:

  • Performing complex data preparation processes: Shortening overall processing time and expediting the critical path of your longest workloads
  • Running heavy analytics as an SQL query engine: Generating higher-quality business insights faster, while improving your competitive price-performance

    OCI products and services supporting SQream

    • Regions: An OCI region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them, across countries or continents. OCI’s global presence allows SQream to scale and work with customers across the globe without compromising availability or performance.
    • GPU-accelerated compute: GPU-powered bare metal provides SQream with a high-performance computing platform for its demanding processing workloads, which run sophisticated algorithms. Because SQream’s solution relies on massively parallel high-performance computing jobs, users benefit from running on GPUs, which allow SQream to solve complex problems. SQream uses multiple BM.GPU2.2 GPU-accelerated compute machines to scale out and communicate efficiently with OCI Object Storage, together delivering outstanding linear performance.
    • OCI Kubernetes Engine (OKE): The open source-based, Oracle-managed container orchestration service enables SQream to autoscale clusters and pods, reducing the time and cost of creating and scaling its apps. The OKE web-based REST API and CLI allow SQream to automate Kubernetes operations, including cluster creation, scaling, and ongoing operations.
    • Object Storage: OCI Object Storage allows SQream to fully utilize its scale-out architecture to deliver linear scaling, achieving ingestion rates of 4–5 GB per second. The following example configurations show the dataset size, machine count, elapsed time, and resulting throughput:
      • 10 TB, two machines: 0:59:26, 2.8 GB/second
      • 10 TB, four machines: 0:35:40, 4.8 GB/second
      • 30 TB, four machines: 1:52:20, 4.4 GB/second
      • 30 TB, five machines: 1:34:07, 5.3 GB/second
      • 100 TB, four machines: 5:51:50, 4.7 GB/second
    • Identity and Access Management (IAM): IAM enables SQream to control who can access their resources in OCI and the operations that they can perform on those resources.
    • Virtual cloud networks (VCNs) and subnets: A VCN is a customizable, software-defined network that you set up in an OCI region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which you can scope to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
    • Security lists: For each subnet, SQream can create security rules that specify the source, destination, and type of traffic allowed in and out of the subnet.
    • Route tables: Virtual route tables contain rules to route traffic from subnets to destinations outside a VCN, typically through gateways.
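The VCN rules described in the list above (subnets must fall inside the VCN's CIDR blocks and must not overlap each other) can be checked before provisioning. The following is a minimal sketch using only Python's standard `ipaddress` module; the function name and the example address ranges are hypothetical and not taken from SQream's deployment, and the sketch assumes IPv4 ranges throughout.

```python
import ipaddress
from itertools import combinations


def validate_subnets(vcn_cidrs, subnet_cidrs):
    """Return a list of problems found; an empty list means the layout is consistent."""
    vcn_nets = [ipaddress.ip_network(c) for c in vcn_cidrs]
    subnets = [ipaddress.ip_network(c) for c in subnet_cidrs]
    problems = []

    # Every subnet must sit entirely inside one of the VCN's CIDR blocks.
    for subnet in subnets:
        if not any(subnet.subnet_of(vcn) for vcn in vcn_nets):
            problems.append(f"{subnet} is outside the VCN CIDR blocks")

    # No two subnets may share addresses.
    for a, b in combinations(subnets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")

    return problems


if __name__ == "__main__":
    # Hypothetical layout: one VCN block with a public and a private subnet.
    print(validate_subnets(["10.0.0.0/16"], ["10.0.1.0/24", "10.0.2.0/24"]))
    # Hypothetical misconfiguration: the second subnet lies inside the first.
    print(validate_subnets(["10.0.0.0/16"], ["10.0.1.0/24", "10.0.1.128/25"]))
```

This checks address planning only. It does not create any OCI resources or validate security lists and route tables.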

    SQream solution on OCI

    SQream’s SaaS data lakehouse on OCI is built to harness the raw power and high-throughput capabilities of the GPU, with MPP-on-chip capabilities and a full SQL analytical interface. SQream’s SQL compiler allows running parallel SQL statements while utilizing thousands of GPU cores simultaneously. OKE support for GPUs, combined with Oracle bare metal GPU machines and OCI Object Storage, allows SQream to fully utilize its scale-out architecture, delivering linear scaling that can achieve 5 TB per hour per machine and keep scaling as needed.

    Separating compute from storage and running multiple compute instances for reads and writes gives SQream flexibility. This separation makes scaling easy and lets compute be used with an existing storage solution. SQream automatically splits up the storage into manageable chunks to reduce demand on the hardware, then spools and caches data for more efficiency.

    The power of the GPU is only part of the story behind SQream’s speed. Beginning with ultra-fast ingest of 5 TB per hour per machine, SQream automatically optimizes and compresses the data. Columnar storage allows efficient access to data, optimizing it for online analytical processing (OLAP), metadata storage, and real-time analytics. SQream also applies a chunking strategy that allows for multidimensional data partitioning, which improves performance and minimizes complex scaling processes.
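To illustrate why chunking a column and keeping metadata per chunk can speed up queries, here is a generic sketch of chunk skipping with per-chunk minimum and maximum values. It is an illustration of the general technique only, not SQream's actual implementation; the `Chunk` class, function names, and chunk size are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    values: List[int]  # one slice of a single column
    min_value: int     # metadata kept alongside the chunk
    max_value: int


def build_chunks(column: List[int], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(column), chunk_size):
        part = column[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def range_scan(chunks: List[Chunk], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of chunks actually read."""
    matches, chunks_read = [], 0
    for chunk in chunks:
        # The metadata alone proves this chunk cannot contain a match.
        if chunk.max_value < low or chunk.min_value > high:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if low <= v <= high)
    return matches, chunks_read


if __name__ == "__main__":
    chunks = build_chunks(list(range(1000)), chunk_size=100)
    matches, chunks_read = range_scan(chunks, 250, 260)
    print(len(chunks), chunks_read, len(matches))
```

In this toy example, a narrow range filter touches only the chunks whose metadata overlaps the filter, which is the general reason chunk-level metadata reduces the amount of data a query has to read.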

    SQream’s default compression mode is adaptive, allowing the system to determine the best compression algorithm based on the actual data, which results in optimized query performance. SQream can compress large datasets by up to 80 percent. Because only about one-fifth of the bytes then cross the network, data can effectively be read up to five times faster than the raw network speed, enabling any customer with a dataset of any size to utilize OCI speed and scalability.
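The link between the 80 percent compression figure and the fivefold read speed is simple arithmetic. The following is a minimal sketch, assuming that "80 percent" means the stored size shrinks by 80 percent and that the network is the bottleneck; the function names are hypothetical.

```python
def compressed_size_tb(raw_tb: float, reduction: float) -> float:
    """Size on storage after removing the given fraction of the raw bytes."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return raw_tb * (1 - reduction)


def effective_read_multiplier(reduction: float) -> float:
    """How much more logical data arrives per byte sent over the network."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1 / (1 - reduction)


if __name__ == "__main__":
    # An 80 percent reduction leaves one-fifth of the bytes to transfer,
    # so each transferred byte carries roughly five bytes of logical data.
    print(round(compressed_size_tb(100, 0.8), 2))
    print(round(effective_read_multiplier(0.8), 2))
```

Actual gains depend on how compressible the data is and on decompression cost, neither of which this sketch models.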

    The results

    SQream proved linear scalability while running data analytics benchmarks on progressively larger datasets. With OCI, SQream’s SaaS data lakehouse scaled from 10 TB to 100 TB and produced a persuasive proof of concept for prospects.
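The benchmark configurations listed under Object Storage can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that derives aggregate throughput and the per-machine ingest rate from the published dataset sizes and elapsed times. It assumes 1 TB = 1000 GB, which is not stated in the article, so computed values may differ slightly from the published, rounded figures.

```python
# (dataset size in TB, machine count, elapsed time h:mm:ss) as listed in the article
BENCHMARKS = [
    (10, 2, "0:59:26"),
    (10, 4, "0:35:40"),
    (30, 4, "1:52:20"),
    (30, 5, "1:34:07"),
    (100, 4, "5:51:50"),
]

GB_PER_TB = 1000  # assumption: decimal units


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput_gb_per_s(dataset_tb: float, elapsed: str) -> float:
    """Aggregate ingest throughput across all machines."""
    return dataset_tb * GB_PER_TB / to_seconds(elapsed)


def tb_per_hour_per_machine(dataset_tb: float, machines: int, elapsed: str) -> float:
    """Per-machine ingest rate, comparable to the quoted 5 TB per hour figure."""
    return dataset_tb / (to_seconds(elapsed) / 3600) / machines


if __name__ == "__main__":
    for size_tb, machines, elapsed in BENCHMARKS:
        print(
            f"{size_tb} TB on {machines} machines: "
            f"{throughput_gb_per_s(size_tb, elapsed):.2f} GB/s, "
            f"{tb_per_hour_per_machine(size_tb, machines, elapsed):.2f} TB/h per machine"
        )
```

Comparing the per-machine rate across rows is one way to judge how close each configuration comes to ideal linear scaling.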

    Now, customers can use SQream to accelerate petabyte-scale analytics workloads on OCI. OCI’s pricing for GPU machines gave SQream more computing power for the same price, ultimately providing cost-effective analytics solutions for OCI big data customers.

    Next steps

    For organizations that base their business on data insights and need to make time-sensitive decisions, OCI and SQream are the perfect combination for big data analytics. To learn more about how to deploy SQream on Oracle Cloud Infrastructure or run a proof of concept, contact David Marom or ask your Oracle Sales Account team.

    By Shweta Bhatia Gupta,
    Product Manager