To pare AI costs and hold data close, businesses turn to more transparent models

Open-weights generative AI models give companies more control and transparency, but they need refinement to shine.

Aaron Ricadela | July 11, 2024


A crop of large language models that openly furnish information about their inner workings is giving businesses private, efficient alternatives to popular generative AI software from OpenAI, Anthropic, Google, and others. Instead of licensing proprietary AI models and paying for access each time employees or customers seek answers, businesses can download these freely available “open-weights” models, tinker with their statistical weights, then deploy LLM-powered applications on a public cloud service without the additional fees.

Reaping the benefits, though, means having ample generative AI expertise on staff—and a tolerance for the trial and error needed to get models humming and into production.

In AI model building, weights are the strengths of the relationships among the billions of parameters, or variables, that models learn when they’re trained on the internet and other data sources.

With open-weights models, savvy companies can adjust those relationships to suit their applications with more visibility and control than closed systems allow. Since the open models are free, companies that build with them don’t need to pay licensing fees for application programming interfaces levied each time users interact with models such as OpenAI’s GPT-4, Anthropic’s Claude 3, or Google’s Gemini.

What’s more, businesses can fine-tune open large language models without sending their private data through those providers’ APIs—often a no-go in sectors such as banking or healthcare. The result can be smaller models that make it easier for organizations to get started with AI, customized with data that doesn’t leave their clouds.

“Almost every company I’ve talked to that’s developed their own model is starting with open weights,” says Dave Schubmehl, an analyst at market researcher IDC, which found that businesses plan to use the approach for 51% of their generative AI use cases, according to an October 2023 survey of 607 global IT decision-makers. They’re tweaking open models to create applications for delivering customer service, analyzing patents, and designing manufacturing environments by feeding them with specialized data. “Companies can create their own LLMs by adjusting those weights. There’s more ability to steer the output,” he says.

The computing industry is moving into an era in which interactions with software and the web are increasingly mediated by AI agents that can hold conversations, carry out tasks, create presentations, and glean insights from data or code. As more businesses bake AI into their software, they’re prioritizing cost, customization, and the ability to explain what’s happening under the hood. Large language models are summarizing documents, pulling computable data from unstructured text and news stories, and performing retrieval-augmented generation (RAG), which anchors them with specialized knowledge. For workaday AI techniques like these, open models can rival or eclipse proprietary counterparts’ performance—and potentially cost less.

The price of inference, or processing LLM requests, is critical at a time when 29% of respondents to IDC’s survey said they haven’t seen any return on their AI investments thus far.

Many companies may use both open-weights and more proprietary software as they develop AI-powered apps and then put them into production. The very biggest models—GPT-4 has an estimated 1.8 trillion parameters—tend to produce more conversational, natural-sounding responses than smaller ones. But besides inferencing fees charged when users make queries, the supersized LLMs can run more slowly, limiting customers’ cloud server throughput.

“GPT-4 is a great model but also expensive and slow,” says Percy Liang, director of Stanford University’s Center for Research on Foundation Models (CRFM) and a co-founder of Together AI, which makes tools for training, fine-tuning, and serving open-weights models. “If you have some crazy idea to prototype, you can run it on Claude 3 or GPT-4. But for most of what companies are looking at—RAG-style question-answering, summarization, or extracting information—they should be looking at open models,” he says. “The landscape is quickly changing.”

Serious competitors

Meta in April released Llama 3, a free LLM that lets users access its statistical weights, in 8 billion-parameter and 70 billion-parameter sizes, with a 400 billion-parameter version planned. It’s built to excel at executing commands over multiple steps, such as booking a trip with flights and ground transport, and the company is deploying it as an AI assistant in Facebook, Instagram, Messenger, and WhatsApp. It’s available to download or via cloud service providers including Oracle Cloud Infrastructure, which began offering Llama 3 in June.

French startup Mistral publishes the weights for some of its models and ships them under the business-friendly Apache 2 license. Elon Musk’s xAI, which raised $6 billion in May, announced an update in April to its Grok large language model and posts its code and weights for anyone to download.

And the United Arab Emirates’ Technology Innovation Institute in May released its Falcon 2 11B LLM focused on enterprise use cases, such as analyzing X-rays or construction documents—and published both its weights and training data. It offers additional open models in 40 billion- and 80 billion-parameter sizes. Microsoft, Hugging Face, and Databricks also provide open models. Tech companies, universities, and research organizations released 149 general-purpose AI models last year; 98 of them included access to their workings, according to an April report by Stanford University’s Human-Centered Artificial Intelligence research center.

Llama 3 and Mistral’s open Mixtral 8x22B model rank respectively fifth and eighth on a leaderboard from Stanford’s CRFM that tracks performance across 10 AI benchmarks, with GPT-4o at the top.

“There is mounting evidence that open source models can be as good in some ways as proprietary LLMs if they are fine-tuned with specific, reliable data,” Deutsche Bank Research said in an April newsletter to clients. “Open source models are becoming serious competitors.”

Proceed with caution

Businesses have three basic choices about where to run open models, each affording different protections during fine-tuning (when companies put data in) and inferencing (when users pull data out). They can deploy them on GPUs hosted by a cloud service provider and take responsibility for the software stack themselves. They can run the models as a managed service from a cloud provider without worrying about hardware and software configurations. (Oracle’s generative AI service, for example, also furnishes customers with a virtual cloud network that shields data from model vendors.) Or they can run LLMs on specialist AI hosting platforms, such as Together AI, Fireworks AI, Replicate, or Anyscale.

Whatever the approach, open-weights models present pros and cons. Opening AI models’ scaffolding subjects them to more scrutiny and avoids concentrating development power in a handful of companies. The more people contribute to AI, the better the research and the systems will be, Meta’s AI research head and others have argued. On the other hand, because there are no provider APIs through which usage can be tracked, bad actors can more readily try to circumvent built-in guardrails meant to prevent systems from generating offensive content or disclosing private information.

“Enterprises should use [open weights] as a managed service that gives them some ongoing control. It’s too hard to put all the security and toxicity controls in place if you’re trying to do it yourself.”

Greg Pavlik, Senior Vice President for AI, Oracle

“Open models can help us find weaknesses and their antidotes more quickly,” says Kristian Kersting, a computer science professor and head of the AI and ML lab at Germany’s Technical University of Darmstadt, who is also an investor in Aleph Alpha, a German LLM maker. Mid-market industrial companies may also find open models more accessible than their closed counterparts. “We need both.”

Another hurdle: getting the weights right without making the model worse. Companies can sometimes work for months to adjust models’ weights for their applications without successfully getting past testing, while missteps risk degrading pretrained models’ accuracy, says Greg Pavlik, senior vice president for AI at Oracle. “If you screw around with the weights, you can wind up obliterating the behavior,” he says.

“It sounds self-serving to say you should partner with a technology vendor, but you should,” says Pavlik. “Enterprises should use it as a managed service that gives them some ongoing control. It’s too hard to put all the security and toxicity controls in place if you’re trying to do it yourself.” Oracle in June made LLM developer Cohere’s Command R and Command R+ open-weights models, targeted at RAG applications in areas including banking, healthcare, and government, available on its Oracle Cloud Infrastructure (OCI) Generative AI service.

There are also limits on how much of the models’ plumbing users can actually inspect and tweak. Unlike open source software, such as Linux, or the machine learning development framework PyTorch, which let businesses view and modify source code to fit their needs, open LLMs rarely disclose all the data behind their construction. That includes the training data that could let outsiders know why a model is more adept at answering certain questions—or whether it was trained with copyrighted material.

xAI’s Grok makes its code and weights available but not its training data. Llama-powered software products can’t top 700 million monthly active users without a license from Meta, and companies can’t use them to improve other LLMs. Google has published the weights for its open Gemma models but no code. “You actually have limited transparency into these models,” says Stanford’s Liang. “Only a few people can train them, which doesn’t really limit the concentration of power.”

Model providers themselves also benefit from openness. Meta CEO Mark Zuckerberg told investors (PDF) in February that releasing weights for AI models lets the company reap improvements from outside experts that make its software more efficient and less costly to run, while helping cement industry standards.

Evidium is one growing company that embeds Llama, Mistral, and other open models into its software. The San Francisco startup is developing a healthcare AI platform for doctors, pharma companies, and insurers that combines generative AI with knowledge graphs and symbolic AI to foster transparency. That means doctors can always see the reference sources behind the system’s recommendations. The AI platform can update information in real time as drug specifications and other guidelines change, making it more explainable than LLMs alone, says head of machine learning Wian Stipp. “Open-weights models are efficient starting points so we don’t have to pretrain from scratch,” he says.

Mileage may vary

So, what’s the best value for a company deploying an AI model? Even IT advisors are struggling to make recommendations in a market marked by weekly innovations and a panoply of pricing schemes. Published prices in June show that OpenAI charges $5 per 1 million word-fragment “tokens”—about 400,000 words—needed to write prompts or upload documents into GPT-4o. Generating output costs $15 per million tokens. Anthropic’s Claude 3.5 Sonnet model costs $3 per million tokens of input and $15 per million tokens for output. But AI model providers offer other commercial plans that charge by the hour or month for dedicated computing rather than counting tokens.
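The per-token rates above can be turned into a rough bill with simple arithmetic. This sketch uses the June prices cited above, but the monthly token volumes are hypothetical, chosen only to illustrate the comparison:

```python
# Compare monthly API bills under per-token pricing (June published rates).
# The workload below (20M input tokens, 5M output tokens) is hypothetical.
def monthly_cost(input_rate, output_rate, input_tokens, output_tokens):
    """Return the monthly bill in dollars for a token-metered API,
    given rates in dollars per 1 million tokens."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

gpt4o = monthly_cost(5, 15, 20_000_000, 5_000_000)   # GPT-4o: $5 in, $15 out
sonnet = monthly_cost(3, 15, 20_000_000, 5_000_000)  # Claude 3.5 Sonnet: $3 in, $15 out
print(f"GPT-4o: ${gpt4o:,.2f}  Claude 3.5 Sonnet: ${sonnet:,.2f}")
# GPT-4o: $175.00  Claude 3.5 Sonnet: $135.00
```

At these volumes the input rate dominates the difference; a workload skewed toward output generation would narrow the gap, since both providers charge the same $15 per million output tokens.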

“The GenAI pricing landscape is complicated,” IDC writes in a March report, citing “a level of volatility that makes it difficult to plan or budget.” According to Deloitte, it’s “unclear whether open or closed-sourced generative AI will dominate, or whether the two will continue to coexist side-by-side as is the case in several other key areas of tech.”

Each comes with cost considerations based on scale. Running Llama 3 on OCI Generative AI in a multitenant environment costs $0.015 per 10,000 characters of text input or output, according to prices published in June. A dedicated Llama 3 service that reserves computing and network bandwidth for each customer costs $24 per hour. A third option in which businesses manage and run the model themselves on OCI GPU-based servers costs $40 an hour before discounts. According to benchmarks run by Oracle on throughput rates and using published pricing, customers processing more than 9.6 million characters an hour in a generative AI application (about 960,000 words) would find it less expensive to use OCI’s Llama 3 dedicated service than OpenAI’s latest GPT-4o API.
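Oracle’s break-even figure can be roughly reproduced with back-of-the-envelope math. This sketch assumes about 4 characters per token and an even input/output split—both simplifying assumptions, not part of Oracle’s published benchmark methodology:

```python
# Estimate the throughput at which a dedicated Llama 3 endpoint ($24/hour)
# undercuts per-token GPT-4o API pricing.
# Assumptions (not from the benchmark): ~4 chars per token, 50/50 input/output mix.
DEDICATED_PER_HOUR = 24.0    # OCI dedicated Llama 3 service, June prices
GPT4O_INPUT_PER_M = 5.0      # $ per 1M input tokens
GPT4O_OUTPUT_PER_M = 15.0    # $ per 1M output tokens
CHARS_PER_TOKEN = 4          # rough rule of thumb

blended_per_m = (GPT4O_INPUT_PER_M + GPT4O_OUTPUT_PER_M) / 2  # $10 per 1M tokens
break_even_tokens = DEDICATED_PER_HOUR / blended_per_m * 1_000_000
break_even_chars = break_even_tokens * CHARS_PER_TOKEN

print(f"{break_even_tokens:,.0f} tokens/hour ≈ {break_even_chars:,.0f} chars/hour")
```

Under these assumptions the crossover lands at about 2.4 million tokens, or 9.6 million characters, per hour—consistent with the figure from Oracle’s benchmarks cited above. A workload skewed toward cheaper input tokens would push the break-even point higher.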

AI vendors that hold their weights and data close are delivering compelling products for businesses, too. OpenAI in May released GPT-4o, a new LLM that acts as an interactive voice assistant that can instantly describe how computer code works, translate spoken language, coach users through math problems, and make suggestions about what it sees through a smartphone camera—abilities it has released to developers. OpenAI doesn’t release any code or training data from its models. Anthropic in May expanded Claude to Europe, citing its security and data privacy strictures as in line with the European Union’s transparency-promoting AI Act, expected to come into force in 2025.

Nevertheless, businesses are eagerly exploring whether open-weights models yield cost, customization, and control advantages over more proprietary ones. “Because they’re open, you can set up and run the models yourself, potentially at a lower cost than a provider could,” says Jason Peck, a research director in Oracle Labs. “Your mileage may vary. But if you have the skill, you can potentially do it for less.”
