Generative AI Service Features

Models

New models available from Meta and Cohere for OCI Generative AI include:

  • Llama 2: Developed by Meta, Llama 2 is a collection of text generation models with up to 70 billion parameters. It’s the leading open source large language model (LLM) that’s free for research and commercial use.
  • Command: Command is Cohere’s flagship text generation model. It comes in two sizes: 6 billion and 52 billion parameters. The smaller model offers lower latency and cost, while the larger one provides better accuracy; a brief calling sketch follows this list.
  • Summarize: Cohere’s Summarize provides high-quality summaries that accurately capture the most important information from your documents.
  • Embed: Cohere’s English and multilingual embedding models (v3) convert text into vector embedding representations. “Light” versions of Embed are smaller and faster (English only).
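
For illustration, here is a minimal sketch that sends a prompt to the Command model through the OCI Python SDK’s generative_ai_inference module. The service endpoint, compartment OCID, and model ID are placeholders, and class or parameter names may differ across SDK versions.

  import oci

  # Minimal sketch: call Cohere's Command model through the OCI Generative AI
  # inference API. The endpoint and OCIDs below are placeholders.
  config = oci.config.from_file()  # reads ~/.oci/config
  client = oci.generative_ai_inference.GenerativeAiInferenceClient(
      config,
      service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
  )

  request = oci.generative_ai_inference.models.GenerateTextDetails(
      compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
      serving_mode=oci.generative_ai_inference.models.OnDemandServingMode(
          model_id="cohere.command",  # flagship text generation model
      ),
      inference_request=oci.generative_ai_inference.models.CohereLlmInferenceRequest(
          prompt="Summarize the benefits of vector embeddings in one sentence.",
          max_tokens=100,
          temperature=0.5,
      ),
  )

  response = client.generate_text(request)
  print(response.data)  # generated text plus request metadata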

Dedicated AI clusters

Dedicated AI clusters let you host foundational models on dedicated GPUs that are private to you. These clusters also provide the stable throughput performance required for production use cases and can support both hosting and fine-tuning workloads. OCI Generative AI lets you scale out your cluster with zero downtime to handle changes in call volume. Up to 50 custom, fine-tuned models can be hosted on the same dedicated hosting cluster, as long as they all share the same base foundational model.
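
As a rough illustration, the sketch below uses the OCI Python SDK’s generative_ai management client to create a hosting cluster and later scale it out by raising its unit count. The OCIDs, unit shape, and field names are illustrative assumptions and may differ in your tenancy or SDK version.

  import oci

  # Sketch: create a dedicated hosting cluster, then scale it out by adding
  # units. The OCIDs and unit shape below are placeholders.
  config = oci.config.from_file()
  client = oci.generative_ai.GenerativeAiClient(config)

  cluster = client.create_dedicated_ai_cluster(
      oci.generative_ai.models.CreateDedicatedAiClusterDetails(
          compartment_id="ocid1.compartment.oc1..example",  # placeholder OCID
          type="HOSTING",             # hosting (serving) rather than fine-tuning
          unit_count=2,               # initial capacity
          unit_shape="LARGE_COHERE",  # illustrative unit shape
      )
  ).data

  # Later, absorb a higher call volume by adding units; the resize is applied
  # without taking the hosted endpoints offline.
  client.update_dedicated_ai_cluster(
      cluster.id,
      oci.generative_ai.models.UpdateDedicatedAiClusterDetails(unit_count=4),
  )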

LangChain integration

OCI Generative AI will be integrated with LangChain, an open source framework for developing generative AI applications powered by language models. LangChain makes it easy to swap out the abstractions and components needed to work with language models.
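
Once that integration is available, wiring the service into a LangChain application could look roughly like the sketch below, which assumes the community OCIGenAI wrapper; the endpoint, compartment OCID, and model ID are placeholders.

  from langchain_community.llms import OCIGenAI

  # Sketch: point LangChain's OCIGenAI wrapper at an OCI Generative AI model.
  # The endpoint, compartment OCID, and model ID are placeholders.
  llm = OCIGenAI(
      model_id="cohere.command",
      service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
      compartment_id="ocid1.compartment.oc1..example",
  )

  # Because LangChain hides the model behind a common interface, swapping in a
  # different backend is a one-line change to the constructor above.
  print(llm.invoke("Explain dedicated AI clusters in one sentence."))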

Generative AI operations

OCI Generative AI provides content moderation controls, endpoint model swaps with zero downtime, and endpoint activation and deactivation. For each model endpoint, OCI Generative AI also captures analytics such as call statistics, tokens processed, and error counts.
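
As an example of these controls, the sketch below creates a model endpoint on a dedicated hosting cluster with content moderation enabled, using the OCI Python SDK’s management client. The OCIDs are placeholders and field names may vary by SDK version.

  import oci

  # Sketch: create an endpoint on a dedicated hosting cluster with content
  # moderation turned on. All OCIDs below are placeholders.
  config = oci.config.from_file()
  client = oci.generative_ai.GenerativeAiClient(config)

  endpoint = client.create_endpoint(
      oci.generative_ai.models.CreateEndpointDetails(
          compartment_id="ocid1.compartment.oc1..example",
          model_id="ocid1.generativeaimodel.oc1..example",  # base or fine-tuned model
          dedicated_ai_cluster_id="ocid1.generativeaidedicatedaicluster.oc1..example",
          content_moderation_config=oci.generative_ai.models.ContentModerationConfig(
              is_enabled=True,  # filter harmful prompts and responses
          ),
      )
  ).data

  print(endpoint.lifecycle_state)  # the endpoint can later be deactivated and reactivated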