Data scientists can access and use any data source in any cloud or on-premises. This provides more potential data features that lead to better models.
Oracle Cloud Infrastructure (OCI) Data Labeling is a service for building labeled datasets to more accurately train AI and machine learning models. With OCI Data Labeling, developers and data scientists assemble data, create and browse datasets, and apply labels to data records.
Submit interactive Spark queries to your OCI Data Flow Spark cluster. Or, use Oracle Accelerated Data Science SDK to easily develop a Spark application and then run it at scale on OCI Data Flow, all from within the Data Science environment.
Define feature engineering pipelines and build features with fully managed execution. Version and document both features and feature pipelines. Share, govern, and control access to features. Consume features for both batch and real-time inference scenarios.
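The flow above — engineer features, version them, and consume them later — can be sketched in plain Python. The `FeatureStore` class and its method names are illustrative stand-ins, not the OCI Feature Store API:

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Toy in-memory store: each feature name maps to versioned values."""
    _features: dict = field(default_factory=dict)

    def publish(self, name: str, values: list, version: int) -> None:
        self._features.setdefault(name, {})[version] = values

    def get(self, name: str, version: int) -> list:
        return self._features[name][version]


def engineer_features(raw_rows: list) -> list:
    # Example transformation: min-max scale a raw numeric column to [0, 1].
    amounts = [row["amount"] for row in raw_rows]
    lo, hi = min(amounts), max(amounts)
    return [(a - lo) / (hi - lo) for a in amounts]


store = FeatureStore()
rows = [{"amount": 10.0}, {"amount": 30.0}, {"amount": 20.0}]
store.publish("amount_scaled", engineer_features(rows), version=1)
print(store.get("amount_scaled", version=1))  # [0.0, 1.0, 0.5]
```

Versioning the published values is what makes the same feature reproducible for both training and later inference.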
Built-in, cloud-hosted JupyterLab notebook environments enable data science teams to build and train models using a familiar user interface.
OCI Data Science offers data scientists familiarity and versatility, with hundreds of popular open source tools and frameworks, such as TensorFlow and PyTorch, plus the ability to add frameworks of their choice. A strategic partnership between OCI and Anaconda enables OCI users to download and install packages directly from the Anaconda repository at no cost, making secure open source more accessible than ever.
Oracle Accelerated Data Science SDK is a user-friendly Python toolkit that supports data scientists through the entire data science workflow, end to end.
With NVIDIA GPUs, data scientists can build and train deep learning models in less time. Compared with CPUs, training can be 5 to 10 times faster.
Use Jobs to run repeatable data science tasks in batch mode. Scale up your model training with support for bare metal NVIDIA GPUs and distributed training.
Easily create, edit, and run Data Science job artifacts directly from the OCI Console using the Code Editor. Comes with Git integration, autoversioning, personalization, and more.
Data scientists use the model catalog to preserve and share completed machine learning models. The catalog stores the artifacts and captures metadata around the taxonomy and context of the model, hyperparameters, definitions of the model input and output data schemas, and detailed provenance information about the model origin, including the source code and the training environment.
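An illustrative metadata record shows the kind of context the catalog captures alongside a model artifact. The field names and the conda slug below are assumptions for illustration, not the OCI model catalog schema:

```python
import json

# Hypothetical catalog entry: taxonomy, hyperparameters, I/O schemas,
# and provenance, mirroring the categories the model catalog records.
model_record = {
    "taxonomy": {"use_case": "binary-classification", "framework": "scikit-learn"},
    "hyperparameters": {"max_depth": 6, "n_estimators": 200},
    "input_schema": [{"name": "age", "dtype": "int64"}],
    "output_schema": [{"name": "churn_probability", "dtype": "float64"}],
    "provenance": {
        "git_commit": "<commit-sha>",          # placeholder
        "training_environment": "<conda-slug>",  # placeholder
    },
}

# Serialize for storage alongside the model artifact.
serialized = json.dumps(model_record, indent=2)
```

Capturing the input and output schemas with the artifact is what lets a deployment validate payloads without access to the original training code.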
Automatically generate a comprehensive suite of metrics and visualizations to measure model performance against new data and compare model candidates.
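As a minimal sketch of such an evaluation, the standard classification metrics can be derived from confusion-matrix counts and used to compare two candidates on the same holdout labels:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Compare two model candidates against the same holdout labels.
y_true      = [1, 0, 1, 1, 0, 0, 1, 0]
candidate_a = [1, 0, 1, 0, 0, 1, 1, 0]
candidate_b = [1, 1, 1, 1, 0, 1, 1, 0]
m_a = classification_metrics(y_true, candidate_a)
m_b = classification_metrics(y_true, candidate_b)
```

Candidate B catches every positive (recall 1.0) at the cost of precision, the kind of trade-off these comparisons surface.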
Leverage prebuilt, curated conda environments to address a variety of use cases, such as NLP, computer vision, forecasting, graph analytics, and Spark. Publish custom environments and share with colleagues, ensuring reproducibility of training and inference environments.
Data scientists can connect to their organization’s Git repository to preserve and retrieve machine learning work.
Deploy machine learning models as HTTP endpoints for serving model predictions on new data in real time. Simply click to deploy from the model catalog, and OCI Data Science handles all infrastructure operations, including compute provisioning and load balancing.
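A hedged sketch of what a client sends to such an endpoint: the URL shape and JSON payload below are assumptions for illustration, and a real call would also attach OCI request-signing headers (via the OCI SDK) rather than go unauthenticated:

```python
import json


def build_predict_request(endpoint: str, rows: list):
    """Return the URL and JSON body for a prediction request (illustrative)."""
    url = endpoint.rstrip("/") + "/predict"
    body = json.dumps({"data": rows}).encode("utf-8")
    return url, body


url, body = build_predict_request(
    "https://modeldeployment.example/<deployment-ocid>",  # placeholder host
    [{"age": 42, "tenure_months": 18}],
)
# An actual request would POST `body` to `url` with signed auth headers.
```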
Operationalize and automate your model development, training, and deployment workflows with a fully managed service to author, debug, track, manage, and execute ML pipelines.
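Conceptually, a pipeline is an ordered set of named steps with tracked status; the minimal pure-Python sketch below (not the OCI pipelines API) shows the idea of threading shared state through development, training, and deployment steps:

```python
def run_pipeline(steps, context=None):
    """Execute steps in order, passing a shared context dict between them."""
    context = dict(context or {})
    log = []
    for name, step in steps:
        context = step(context)          # each step reads and extends context
        log.append((name, "SUCCEEDED"))  # a managed service tracks this status
    return context, log


steps = [
    ("prepare", lambda ctx: {**ctx, "rows": [1.0, 2.0, 3.0]}),
    ("train",   lambda ctx: {**ctx, "model_mean": sum(ctx["rows"]) / len(ctx["rows"])}),
    ("deploy",  lambda ctx: {**ctx, "endpoint": "/predict"}),
]
context, log = run_pipeline(steps)
```

A managed pipeline service adds what this sketch omits: per-step infrastructure, retries, logging, and debugging.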
Continuously monitor models in production for data and concept drift. Enables data scientists, site reliability engineers, and DevOps engineers to receive alerts and quickly assess model retraining needs.
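One common data-drift signal behind such monitoring is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against the one seen at training time. A minimal sketch, with an illustrative alert threshold:

```python
import math


def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected_fracs, actual_fracs)
    )


train_dist = [0.25, 0.25, 0.25, 0.25]  # bin fractions at training time
prod_dist  = [0.10, 0.20, 0.30, 0.40]  # bin fractions observed in production
score = psi(train_dist, prod_dist)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 retrain.
alert = score > 0.25
```

Here the score lands in the moderate-drift band, so no retraining alert fires yet, but the feature warrants watching.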
Originally designed so Oracle's own SaaS applications could embed AI features, ML applications now automate the entire MLOps lifecycle, including development, provisioning, and ongoing maintenance and fleet management, for ISVs running hundreds of models for each of their thousands of customers.
Leverage LLMs, such as Llama 2 and Mistral 7B, with one click via seamless integration with Data Science notebooks.
Access support for model deployment using Text Generation Inference (Hugging Face), vLLM (UC Berkeley), and NVIDIA Triton serving, with public examples.
Users get moderation controls, zero-downtime endpoint model swaps, and the ability to activate and deactivate endpoints. Leverage distributed training with PyTorch, Hugging Face Accelerate, and DeepSpeed to fine-tune LLMs for optimal performance. Checkpoint and store fine-tuned weights effortlessly with object storage and file system service mounts. Additionally, service-provided conda environments eliminate the need for custom Docker environments and make sharing faster.