
Tractian taps Oracle Cloud Infrastructure to ramp up AI model training and inference workloads as it fine-tunes its industrial monitoring systems.
United States | High Technology
“With Oracle, our inference costs dropped 15% while we reduced latency by 50%, helping the team deliver timely insights and improve maintenance for our customers’ critical assets.”
Tractian is a leading provider of AI-driven industrial monitoring systems, helping manufacturers prevent equipment failures to reduce downtime and improve overall reliability.
The company is headquartered in the United States and operates a research and development center in São Paulo, Brazil. It builds sensors and pairs them with patented AI algorithms that enable more precise interventions for a factory floor’s most valuable physical assets. Tractian is currently expanding its AI models and device types to support a wide range of industries, including automotive, chemicals, consumer goods, food and beverage, and oil and gas.
To date, Tractian’s systems monitor more than 200,000 machines in over 2,000 plants in the US, Mexico, and Brazil. Amid this rapid business growth, the company is ramping up its AI model training and inference workloads across terabytes of sensor and maintenance data. Tractian chose Oracle Cloud Infrastructure (OCI) for its cutting-edge GPUs, strong support, and cost-effectiveness in scaling the company’s AI model training and real-time inferencing globally.
Tractian’s sophisticated AI models, which require terabytes of sensor data per machine, can take months to train. To accelerate that training, the company needed high-performance bare metal cloud infrastructure with high-throughput storage. Tractian was also looking for low latency and predictable costs at scale to support its inferencing needs.
Tractian chose OCI because of Oracle’s dependable access to scarce NVIDIA GPUs and OCI’s mature Kubernetes engine, block storage, networking, and PostgreSQL integration. It also values Oracle’s responsive sales and engineering support as well as its continuous improvement of OCI. “On OCI, we saw that we could spin up and be successful fast,” says JP Voltani, Tractian’s VP of engineering. “Oracle is now our go-to for model training.”
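As a rough illustration of the kind of setup described above (not Tractian’s actual configuration), the sketch below uses the Kubernetes Python client to submit a GPU training job to a cluster such as one managed by OCI’s Kubernetes engine. The image name, namespace, job name, and node selector label are assumptions made for illustration; BM.GPU.H100.8 is OCI’s 8-GPU H100 bare metal shape.

```python
# Minimal, illustrative sketch: submit a GPU training job to a Kubernetes
# cluster (for example, one managed by OCI's Kubernetes engine).
# Names, image, and labels below are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for the target cluster

container = client.V1Container(
    name="model-training",
    image="example.ocir.io/acme/train:latest",  # hypothetical training image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "8"}  # request all eight GPUs on one node
    ),
)

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="sensor-model-training"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[container],
                # Assumed label: schedule onto the H100 bare metal node pool
                node_selector={"node.kubernetes.io/instance-type": "BM.GPU.H100.8"},
            )
        ),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="training", body=job)
```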
On OCI bare metal instances with NVIDIA H100 and GB300 GPUs, Tractian cut training costs by 20% and reduced training times by an average of 35%, helping it deliver improved AI models to customers.
In production, Tractian reduced inference costs by 15% and latency by 50%, helping its manufacturer customers prioritize maintenance actions for vital assets.
As a result of these improvements, Tractian achieved predictable scale for both training and inference. The company estimates that its systems prevent a customer machine failure roughly every 15 minutes across its installed base, improving uptime and safety.
Tractian builds hardware and software and develops its own AI models to help its manufacturer customers predict failures in their industrial equipment.