Data lake use cases
To deliver the full advantages of a data lake, a solution should provide better ways to:
- Ingest and transform: Move and convert different kinds and formats of data
- Persist and access: Ensure data is secure, readily discoverable, able to scale easily, and accessible across products as needed
- Analyze and use data science: Uncover insights and trends within data
A data lake is more useful when it is part of a broader data management platform, so it should integrate well with existing data and tools.
Omnichannel marketing data lake
Using a data lake to extend the data warehouse is common in omnichannel marketing, sometimes called multichannel marketing. In the marketing data ecosystem, every channel can be its own database, and so can every touchpoint. On top of that, many marketers also buy data from third parties.
For example, a marketer might buy data with additional demographic and consumer-preference information about customers and prospects. That data helps the marketer fill out a complete view of each customer, which in turn supports more personalized and targeted marketing campaigns.
The result is a complex data ecosystem that keeps growing in both volume and complexity. A data lake is often brought in to capture the data arriving from multiple channels and touchpoints, some of which are streaming sources.
Companies that offer a smartphone app to their customers may receive app data in real time, or close to it, as customers use the app. Often the company does not need true real-time data; data that is an hour or two old is good enough. Even at that latency, the data lets the marketing department monitor the business at a very granular level and create specials, incentives, discounts, and micro-campaigns.
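Because an hour or two of latency is acceptable, app events can be landed in the lake as hourly micro-batches rather than processed one at a time. The sketch below groups events into Hive-style `date=.../hour=...` partitions. The event fields (`user_id`, `action`, `ts`) and the function names are hypothetical, and it assumes every timestamp is an ISO-8601 string with an explicit UTC offset.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts: datetime) -> str:
    """Return a Hive-style partition path such as 'date=2024-05-01/hour=09'."""
    # Assumption: timestamps carry a UTC offset; naive times are rejected
    # because converting them would silently depend on the server's local zone.
    if ts.tzinfo is None:
        raise ValueError("timestamp must include a UTC offset")
    ts = ts.astimezone(timezone.utc)
    return f"date={ts:%Y-%m-%d}/hour={ts:%H}"


def group_into_hourly_batches(events):
    """Group hypothetical app events (dicts with an ISO-8601 'ts') by UTC hour."""
    batches = defaultdict(list)
    for event in events:
        ts = datetime.fromisoformat(event["ts"])
        batches[partition_key(ts)].append(event)
    return dict(batches)


events = [
    {"user_id": 1, "action": "open_app", "ts": "2024-05-01T09:15:00+00:00"},
    {"user_id": 2, "action": "view_offer", "ts": "2024-05-01T09:47:00+00:00"},
    {"user_id": 1, "action": "redeem_coupon", "ts": "2024-05-01T10:02:00+00:00"},
]
batches = group_into_hourly_batches(events)
```

Each key names the folder an hourly batch would be written to, so a marketing query over "the last two hours" only needs to read two partitions.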
Digital supply chain data lake
The digital supply chain is an equally diverse data environment, and the data lake can help there too, especially when the lake runs on Hadoop. Hadoop is largely file-based because it was originally designed to handle large numbers of very large log files from web servers. The supply chain also produces a large quantity of file-based data: think of the file- and document-based data from EDI systems, XML, and, increasingly, JSON, which is coming on very strong in the digital supply chain. That is very diverse information.
There is also internal information to consider. Manufacturers often have data from the shop floor and from shipping and billing that is highly relevant to the supply chain. The lake can help manufacturers bring that data together and manage it as files.
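To make that diversity concrete, here is a minimal sketch of normalizing supply chain documents in two of the formats mentioned (XML and JSON) into one common record shape before they are stored side by side in the lake. The document layouts (`Shipment`/`Line` elements, `shipmentId`/`sku`/`qty` keys) and the function names are hypothetical; EDI is not covered, since parsing it usually calls for a dedicated EDI tool.

```python
import json
import os
import xml.etree.ElementTree as ET
from typing import Dict, List

Record = Dict[str, object]


def records_from_json(text: str) -> List[Record]:
    """Parse a hypothetical JSON array of shipment lines."""
    return [
        {
            "shipment_id": str(item["shipmentId"]),
            "sku": item["sku"],
            "quantity": int(item["qty"]),
            "source_format": "json",
        }
        for item in json.loads(text)
    ]


def records_from_xml(text: str) -> List[Record]:
    """Parse a hypothetical <Shipment id=...><Line sku=... qty=.../></Shipment> document."""
    root = ET.fromstring(text)
    return [
        {
            "shipment_id": root.get("id"),
            "sku": line.get("sku"),
            "quantity": int(line.get("qty")),
            "source_format": "xml",
        }
        for line in root.iter("Line")
    ]


def normalize(filename: str, text: str) -> List[Record]:
    """Pick a parser by file extension; unknown formats are rejected."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".json":
        return records_from_json(text)
    if ext == ".xml":
        return records_from_xml(text)
    raise ValueError(f"unsupported format: {ext}")


xml_doc = '<Shipment id="S-100"><Line sku="A-1" qty="5"/><Line sku="B-2" qty="3"/></Shipment>'
json_doc = '[{"shipmentId": "S-200", "sku": "A-1", "qty": 2}]'
records = normalize("s100.xml", xml_doc) + normalize("s200.json", json_doc)
```

Keeping a `source_format` field on each record is one way to preserve lineage, so analysts can still trace a combined record back to the kind of file it came from.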
The Internet of Things data lake
The Internet of Things is creating new data sources almost daily in some companies, and as those sources diversify, they create even more data. Sensors are being added to more machinery all the time. For example, a rail or truck freight vehicle typically carries a long list of sensors so the company can track the vehicle through space and time, as well as how it is operated. Is it operated safely? Is it operated optimally with respect to fuel consumption? Enormous amounts of information come from these sources, and the data lake is popular because it provides a repository for all of that data.
A single data lake
Those are examples of fairly targeted uses of the data lake in particular departments or IT programs. A different approach is for centralized IT to provide a single large, multitenant data lake that many departments, business units, and technology programs can use. As people get used to the lake, they figure out how to optimize it for diverse uses, including operations, analytics, and even compliance.