YellowDog pioneers superfast HPC rendering jobs using Oracle Cloud
UK startup taps Oracle Cloud Infrastructure to help customers deliver their high-performance computing workloads on time and on budget.
“The main benefit our customers see using Oracle Cloud Infrastructure is the ability to accelerate their workloads, through fast deployment and orchestration at scale. In many cases jobs run more quickly than customers ever imagined possible.”
YellowDog, based in Bristol, England, helps creative studios, biotech companies, engineering firms, and other customers run their most challenging compute workloads.
The company’s cloud-native workflow management and scheduling software lets customers spin up idle cloud data center capacity to produce complex 3D renderings, analyze vast amounts of data, and process other huge jobs quickly and inexpensively.
YellowDog’s customers want to run their CPU- and GPU-intensive workloads on as few cloud nodes as possible to save money, while still completing their jobs by a specific deadline. To help them achieve those goals, YellowDog required the highest-performing, most flexible cloud infrastructure on the market.
Furthermore, the vast majority of customers run commercially sensitive projects, such as rendering advertisements for unreleased products, which require complete data isolation from both YellowDog and its infrastructure provider.
“What we are able to do more than anything is mimic an on-premises data center in Oracle Cloud. This works for us, works for Oracle, and it works for the customer. This is certainly unique within Oracle.”
Why YellowDog Chose Oracle
YellowDog was an early adopter of Oracle Cloud Infrastructure (OCI), drawn to its high performance and relatively low price. Other key reasons for choosing OCI were direct integration with Oracle’s bare metal servers and fast deployment.
“Having access to the raw compute power offered by Oracle bare metal means that customers benefit from the best possible performance,” says YellowDog CTO Simon Ponsford. “Many HPC workloads require InfiniBand networks, such as those offered by Oracle, and options such as disabling hyperthreading, which typically isn’t possible with most cloud providers. The Oracle Cloud is far more akin to being able to run HPC jobs than most of the competition.”
Oracle Cloud Infrastructure has enabled YellowDog to significantly improve the performance of its HPC and rendering jobs. By turning off hyperthreading, for example, and running without a hypervisor that would consume extra resources, YellowDog can access the raw compute power of Oracle bare metal servers directly and deliver faster turnaround times for customers’ applications.
Turning off hyperthreading also yields a substantial cost saving, which YellowDog can pass on to its customers: it halves the number of cores that must be licensed, and licensing can typically amount to 15% of the cost of a job. “This is really where OCI comes into its own, because if you go elsewhere, you generally have to keep hyperthreading on, so it will use double the number of licenses,” Ponsford says. “The price-performance mix of OCI means that our customers are able to run their workloads over fewer servers and still deliver on time. This has a big impact, particularly when some applications are licensed per node.”
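The licensing arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two hardware threads per physical core when hyperthreading is on, software licensed per visible core, a 15% license share measured with hyperthreading enabled, and an unchanged compute bill. The function names and the 36-core server are hypothetical and do not come from YellowDog or Oracle.

```python
# Illustrative only: names and figures below are assumptions, not YellowDog or Oracle values.

def licensed_cores(physical_cores: int, hyperthreading: bool) -> int:
    """Cores visible to per-core licensing, assuming 2 threads per core with hyperthreading on."""
    return physical_cores * (2 if hyperthreading else 1)


def job_cost_without_hyperthreading(baseline_cost: float, license_share: float = 0.15) -> float:
    """Job cost after halving the license bill.

    Assumes `license_share` of `baseline_cost` is licensing when hyperthreading
    is on, and that every other cost stays the same.
    """
    license_cost = baseline_cost * license_share
    other_cost = baseline_cost - license_cost
    return other_cost + license_cost / 2


if __name__ == "__main__":
    print(licensed_cores(36, hyperthreading=True))    # 72 licensed cores
    print(licensed_cores(36, hyperthreading=False))   # 36 licensed cores
    print(job_cost_without_hyperthreading(1000.0))    # about 925, a 7.5% saving
```

Under these assumptions, halving a license bill that makes up 15% of a job trims roughly 7.5% from the total; the saving would differ if the 15% figure were measured with hyperthreading already off.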
Additionally, YellowDog and its customers have benefited from Oracle’s transparent pricing, which, unlike that of most other cloud providers, is the same across all data centers. This simplifies budgeting for YellowDog’s customers, who also don’t have to worry about major price increases if they need to add a few nodes in another data center to complete a job on time or ahead of schedule. With OCI’s flat pricing model and YellowDog’s Best Source of Compute technology, workloads can be orchestrated across data centers according to geography (to comply with data protection regulations), environmental considerations, availability, and price.
That kind of flexibility also extends to setup and configuration—YellowDog customers can mix and match OCI bare metal and virtual machines, as well as different compute shapes. “For other cloud providers, there is often only one HPC shape that you can work with, but with Oracle there are several, which means you’ve got options,” Ponsford says. “Also, the way that Oracle’s RDMA networks work is great. To be able to get dedicated interfaces to shift traffic around is brilliant. It’s really difficult to get that from other providers.”
The fast deployment times on OCI have enabled YellowDog to differentiate itself from its competitors on price, because customers pay from the moment they request the machines rather than from when they’re up and running. In one case, YellowDog demonstrated to a customer in the satellite business that it could build the customer’s 20,000-core environment in less than 30 minutes, and after the contract was signed it frequently did so in just 18 minutes. This enabled the customer to achieve its goal of running its two-hour jobs four times per day while paying for a total of only 10 hours. With all the other competitors the customer had engaged, it needed to keep the machines running 24/7, which cost more than double.
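The 10-hour figure is consistent with four two-hour runs plus up to 30 minutes of provisioning per run. The short Python sketch below works through that arithmetic. The split between job time and provisioning time is an inference from the numbers in the text, not a breakdown YellowDog published, and the function name is hypothetical.

```python
# Illustrative only: assumes billing starts when machines are requested and
# stops when each job finishes, so each run is billed for provisioning + job time.

def billed_hours_per_day(runs_per_day: int, job_hours: float, provisioning_minutes: float) -> float:
    """Hours billed per day when the environment is rebuilt for every run."""
    return runs_per_day * (job_hours + provisioning_minutes / 60)


if __name__ == "__main__":
    on_demand = billed_hours_per_day(runs_per_day=4, job_hours=2, provisioning_minutes=30)
    always_on = 24.0  # machines kept running around the clock

    print(on_demand)              # 10.0 hours per day
    print(always_on / on_demand)  # 2.4, i.e. more than double

    # With the 18-minute build time reported after the contract was signed,
    # the billed total drops to roughly 9.2 hours per day.
    print(billed_hours_per_day(4, 2, 18))
```

The faster the environment can be built, the closer the bill gets to the eight hours of actual job time, which is why deployment speed translates directly into price.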
Because Oracle Cloud Infrastructure’s physical network is designed for complete customer and service isolation, YellowDog’s customers are assured that their data is kept private, which is essential for the commercially sensitive projects that make up 99% of YellowDog’s business.
For the future, YellowDog is keen to explore the latest Nvidia, AMD, and Intel chipsets on OCI, which will enable it to run even more ambitious workloads in less time. “We're always looking for that hardware refresh that will improve the delivery times to customers,” Ponsford says. “And we like the technical choices that Oracle has made around that.”