Data scientists can access and use any data source in any cloud or on-premises. This provides more potential data features that lead to better models.
Oracle Cloud Infrastructure (OCI) Data Labeling is a service for building labeled datasets to more accurately train AI and machine learning models. With OCI Data Labeling, developers and data scientists assemble data, create and browse datasets, and apply labels to data records.
Submit interactive Spark queries to your OCI Data Flow Spark cluster. Or, use Oracle Accelerated Data Science SDK to easily develop a Spark application and then run it at scale on OCI Data Flow, all from within the Data Science environment.
Define feature engineering pipelines and build features with fully managed execution. Version and document both features and feature pipelines. Share, govern, and control access to features. Consume features for both batch and real-time inference scenarios.
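The flow above — engineer features, version them, and consume them later — can be sketched in plain Python. The `FeatureStore` class and its method names are illustrative stand-ins, not the OCI Feature Store API:

```python
from dataclasses import dataclass, field


@dataclass
class FeatureStore:
    """Toy in-memory store: each feature name maps to versioned values."""
    _features: dict = field(default_factory=dict)

    def publish(self, name: str, values: list, version: int) -> None:
        self._features.setdefault(name, {})[version] = values

    def get(self, name: str, version: int) -> list:
        return self._features[name][version]


def engineer_features(raw_rows: list) -> list:
    # Example transformation: min-max scale a raw numeric column to [0, 1].
    amounts = [row["amount"] for row in raw_rows]
    lo, hi = min(amounts), max(amounts)
    return [(a - lo) / (hi - lo) for a in amounts]


store = FeatureStore()
rows = [{"amount": 10.0}, {"amount": 30.0}, {"amount": 20.0}]
store.publish("amount_scaled", engineer_features(rows), version=1)
print(store.get("amount_scaled", version=1))  # [0.0, 1.0, 0.5]
```

Versioning the published values is what makes the same feature reproducible for both training and later inference.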
Built-in, cloud-hosted JupyterLab notebook environments enable data science teams to build and train models using a familiar user interface.
OCI Data Science offers data scientists familiarity and versatility, with hundreds of popular open source tools and frameworks, such as TensorFlow and PyTorch, plus the ability to add frameworks of their choice. A strategic partnership between OCI and Anaconda enables OCI users to download and install packages directly from the Anaconda repository at no cost, making secure open source more accessible than ever.
Oracle Accelerated Data Science SDK is a user-friendly Python toolkit that supports data scientists through the entire data science workflow, end to end.
With NVIDIA GPUs, data scientists can build and train deep learning models in less time. Compared with CPUs, training can be 5 to 10 times faster.
Use Jobs to run repeatable data science tasks in batch mode. Scale up your model training with support for bare metal NVIDIA GPUs and distributed training.
Easily create, edit, and run Data Science job artifacts directly from the OCI Console using the Code Editor. Comes with Git integration, autoversioning, personalization, and more.
Data scientists use the model catalog to preserve and share completed machine learning models. The catalog stores the artifacts and captures metadata around the taxonomy and context of the model, hyperparameters, definitions of the model input and output data schemas, and detailed provenance information about the model origin, including the source code and the training environment.
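An illustrative metadata record shows the kind of context the catalog captures alongside a model artifact. The field names and the conda slug below are assumptions for illustration, not the OCI model catalog schema:

```python
import json

# Hypothetical catalog entry: taxonomy, hyperparameters, I/O schemas,
# and provenance, mirroring the categories the model catalog records.
model_record = {
    "taxonomy": {"use_case": "binary-classification", "framework": "scikit-learn"},
    "hyperparameters": {"max_depth": 6, "n_estimators": 200},
    "input_schema": [{"name": "age", "dtype": "int64"}],
    "output_schema": [{"name": "churn_probability", "dtype": "float64"}],
    "provenance": {
        "git_commit": "<commit-sha>",          # placeholder
        "training_environment": "<conda-slug>",  # placeholder
    },
}

# Serialize for storage alongside the model artifact.
serialized = json.dumps(model_record, indent=2)
```

Capturing the input and output schemas with the artifact is what lets a deployment validate payloads without access to the original training code.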
Automatically generate a comprehensive suite of metrics and visualizations to measure model performance against new data and compare model candidates.
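As a minimal sketch of such an evaluation, the standard classification metrics can be derived from confusion-matrix counts and used to compare two candidates on the same holdout labels:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Compare two model candidates against the same holdout labels.
y_true      = [1, 0, 1, 1, 0, 0, 1, 0]
candidate_a = [1, 0, 1, 0, 0, 1, 1, 0]
candidate_b = [1, 1, 1, 1, 0, 1, 1, 0]
m_a = classification_metrics(y_true, candidate_a)
m_b = classification_metrics(y_true, candidate_b)
```

Candidate B catches every positive (recall 1.0) at the cost of precision, the kind of trade-off these comparisons surface.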
Leverage prebuilt, curated conda environments to address a variety of use cases, such as NLP, computer vision, forecasting, graph analytics, and Spark. Publish custom environments and share with colleagues, ensuring reproducibility of training and inference environments.
Data scientists can connect to their organization’s Git repository to preserve and retrieve machine learning work.
Deploy machine learning models as HTTP endpoints for serving model predictions on new data in real time. Simply click to deploy from the model catalog, and OCI Data Science handles all infrastructure operations, including compute provisioning and load balancing.
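A hedged sketch of what a client sends to such an endpoint: the URL shape and JSON payload below are assumptions for illustration, and a real call would also attach OCI request-signing headers (via the OCI SDK) rather than go unauthenticated:

```python
import json


def build_predict_request(endpoint: str, rows: list):
    """Return the URL and JSON body for a prediction request (illustrative)."""
    url = endpoint.rstrip("/") + "/predict"
    body = json.dumps({"data": rows}).encode("utf-8")
    return url, body


url, body = build_predict_request(
    "https://modeldeployment.example/<deployment-ocid>",  # placeholder host
    [{"age": 42, "tenure_months": 18}],
)
# An actual request would POST `body` to `url` with signed auth headers.
```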
Operationalize and automate your model development, training, and deployment workflows with a fully managed service to author, debug, track, manage, and execute ML pipelines.
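Conceptually, a pipeline is an ordered set of named steps with tracked status; the minimal pure-Python sketch below (not the OCI pipelines API) shows the idea of threading shared state through development, training, and deployment steps:

```python
def run_pipeline(steps, context=None):
    """Execute steps in order, passing a shared context dict between them."""
    context = dict(context or {})
    log = []
    for name, step in steps:
        context = step(context)          # each step reads and extends context
        log.append((name, "SUCCEEDED"))  # a managed service tracks this status
    return context, log


steps = [
    ("prepare", lambda ctx: {**ctx, "rows": [1.0, 2.0, 3.0]}),
    ("train",   lambda ctx: {**ctx, "model_mean": sum(ctx["rows"]) / len(ctx["rows"])}),
    ("deploy",  lambda ctx: {**ctx, "endpoint": "/predict"}),
]
context, log = run_pipeline(steps)
```

A managed pipeline service adds what this sketch omits: per-step infrastructure, retries, logging, and debugging.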
Continuously monitor models in production for data and concept drift. Enables data scientists, site reliability engineers, and DevOps engineers to receive alerts and quickly assess model retraining needs.
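One common data-drift signal behind such monitoring is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against the one seen at training time. A minimal sketch, with an illustrative alert threshold:

```python
import math


def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected_fracs, actual_fracs)
    )


train_dist = [0.25, 0.25, 0.25, 0.25]  # bin fractions at training time
prod_dist  = [0.10, 0.20, 0.30, 0.40]  # bin fractions observed in production
score = psi(train_dist, prod_dist)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 retrain.
alert = score > 0.25
```

Here the score lands in the moderate-drift band, so no retraining alert fires yet, but the feature warrants watching.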
Originally designed so Oracle's own SaaS applications could embed AI features, ML applications now automate the entire MLOps lifecycle, including development, provisioning, and ongoing maintenance and fleet management, for ISVs running hundreds of models for each of their thousands of customers.
Leverage LLMs, such as Llama 2 and Mistral 7B, with one click via seamless integration with Data Science notebooks.
Access support for model deployment using Text Generation Inference (Hugging Face), vLLM (UC Berkeley), and NVIDIA Triton serving, with public examples.
Users get moderation controls, zero-downtime endpoint model swaps, and the ability to activate and deactivate endpoints. Leverage distributed training with PyTorch, Hugging Face Accelerate, and DeepSpeed to fine-tune LLMs for optimal performance. Checkpoint and store fine-tuned weights effortlessly with object storage and file system service mounts. Additionally, service-provided conda environments eliminate the need for custom Docker environments and make sharing faster.