What is data mining?

Data mining definition

What is data mining? Simply put, it is the process of discovering insights in large volumes of data. This data can come from many sources or a single database, and insights may be generated through manual discovery or automation. Many different paths exist to produce insights, often depending on variables such as resources, machine learning/artificial intelligence capabilities, data complexity, volume of data, and the training and experience of the staff. This process involves deep analysis of data to discover patterns and underlying factors, all to draw conclusions and support informed decisions.

Data mining in big data

The use of data mining rose significantly over the past twenty years as more data sources provided a big data environment. Big data refers to massive volumes of data, often in continuous streams from multiple sources and at high velocity. In the early days of business intelligence, data tables were often exported from devices and manually prepared for insight. But as the world has become increasingly connected, data can arrive in volumes too massive for manual dissection, especially when it comes in a mix of both structured and unstructured data.

Data mining is a process that makes big data functional. Without data mining, enterprises would wind up sitting on terabytes of data from a wide range of sources: Internet of Things (IoT) devices, databases, corporate social media, marketing emails, sensors, website usage, and much more, each with its own set of metadata. Combing through such expansive volumes of data by hand is not practical. Data mining techniques employ algorithms to identify patterns in this massive set of records, then output a set of recommendations for teams to act on.

A simple example comes from online retail. Here, customer purchase histories are compiled into a massive database. An algorithm sifts through that data to look for correlations, for example, among people who purchase only a certain brand of dog food. The algorithm looks for related purchases by those customers, such as supplements or treat brands. As patterns emerge, this information can be fed to the marketing team to create promotions that are triggered by purchases of this specific brand.
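The related-purchase search described above can be sketched as a co-occurrence count. This is a minimal illustration rather than a production algorithm: the function name `related_purchases`, the item names, and the order data are all hypothetical, and real systems typically use association-rule or recommendation methods over far larger datasets.

```python
from collections import Counter

def related_purchases(orders, anchor_item):
    """For each other item, return the share of anchor-item orders that also contain it."""
    co_counts = Counter()
    anchor_orders = 0
    for order in orders:
        items = set(order)  # ignore duplicate line items within one order
        if anchor_item in items:
            anchor_orders += 1
            co_counts.update(items - {anchor_item})
    if anchor_orders == 0:
        return {}
    return {item: count / anchor_orders for item, count in co_counts.items()}

# Hypothetical order histories (illustrative data only)
orders = [
    ["brand_a_dog_food", "joint_supplement", "chew_treats"],
    ["brand_a_dog_food", "chew_treats"],
    ["cat_litter", "chew_treats"],
    ["brand_a_dog_food", "joint_supplement"],
    ["brand_b_dog_food", "leash"],
]

confidence = related_purchases(orders, "brand_a_dog_food")
for item, share in sorted(confidence.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {share:.2f}")
```

Items with a high share are candidates for promotions tied to the anchor brand; in practice a minimum number of supporting orders would also be required before acting on a pattern.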

How data mining works

The above section explains data mining on a big-picture level, but let’s explore the actual process of data mining. Both automated processing and human analysis are used in getting the most out of data mining, with staff establishing the guidelines while machine learning and artificial intelligence sift through large volumes of data. In general, the following workflow is used:

  1. Goals: What is the goal of your data mining? Establishing this between all stakeholders is the most important part of the process. If the goal isn’t clearly and thoughtfully established, the entire effort may have to be scrapped and restarted.
  2. Data preparation: Data preparation can involve a wide range of processes, including culling data sources, establishing formats, and cleaning datasets of anomalies and noise.
  3. Building the model: Data scientists will then build the model and develop and train it through iteration. In many cases, multiple models will be built and tested to find the most appropriate path to the goal. Evaluating these models requires a broad approach to validation, using techniques such as cross-validation and receiver operating characteristic (ROC) curve analysis.
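The two validation techniques named in step 3 can be sketched in plain Python. This is an assumption-laden toy: the one-feature "centroid" model, the helper names (`k_fold_indices`, `fit_centroids`, `score`, `roc_auc`), and the data are hypothetical stand-ins for whatever model and dataset a team actually uses. The area under the ROC curve is computed here from its pairwise-ranking interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i tests every k-th row starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test

def fit_centroids(xs, ys):
    """Toy model: the mean feature value of each class."""
    neg = [x for x, y in zip(xs, ys) if y == 0]
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)

def score(model, x):
    """Higher when x is closer to the positive centroid than to the negative one."""
    c0, c1 = model
    return abs(x - c0) - abs(x - c1)

# Hypothetical one-feature dataset with alternating class labels
xs = [1.0, 3.0, 1.2, 2.8, 0.9, 3.1, 1.4, 2.6, 1.1, 3.3, 0.8, 2.9]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

fold_aucs = []
for train, test in k_fold_indices(len(xs), k=3):
    model = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
    fold_scores = [score(model, xs[i]) for i in test]
    fold_aucs.append(roc_auc([ys[i] for i in test], fold_scores))

mean_auc = sum(fold_aucs) / len(fold_aucs)
print(fold_aucs, mean_auc)
```

Because each fold is scored on rows the model never saw during fitting, comparing the per-fold values across candidate models gives a fairer picture than a single train-and-test pass.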

Once the data mining model has been built, it is time to deploy it across datasets. Active monitoring is required to ensure there aren’t any surprises or reasons to tweak and refine the model. If everything works as planned, the resulting data should clear standards for validity and usefulness, and as a result be ready for business users to review for data-driven decisions.

Data mining use cases

In addition to the retail example above, data mining can be a transformative process for a number of industries. The examples below highlight how data mining can be applied to industry-specific needs.

Data mining for healthcare

Data mining can transform the healthcare industry by improving and accelerating experiences for both providers and patients. Providers can use data mining to accelerate research, understand operational data to best support staffing needs, and identify red flags for insurance and record fraud. For patients, data mining identifies patterns that drive preventative care options, so that conversations can begin before treatments are necessary. It can also identify hidden patterns in areas such as side effects, opening the door to a better sense of how treatments might be affected by a patient’s specific condition.

Data mining for manufacturing

For the manufacturing industry, data is generated across the entire process: procurement of materials, assembly logistics, quality control, shipping dates, and returns due to manufacturing defects. Data mining can examine both individual steps in the process and the bigger picture. This enables teams to address issues at both the micro and macro levels.

For example, data mining may identify that one particular vendor has longer ship times but shows fewer overall defects, so managers can decide the risk is worth it because steps can be run in parallel to mitigate the impact of delays. On the other hand, it can also show that one vendor delivers consistently but their higher defect rate creates a greater impact on the process. Data mining can create these connections so that decisions optimize the entire manufacturing process rather than being made in a vacuum.
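The vendor trade-off described above comes down to summarizing ship times and defect rates side by side. The sketch below assumes a hypothetical record layout (`vendor`, `ship_days`, `units`, `defects`) and made-up figures, and it assumes every vendor has shipped at least one unit.

```python
from collections import defaultdict
from statistics import mean

def vendor_summary(shipments):
    """Average ship time and overall defect rate per vendor."""
    grouped = defaultdict(list)
    for row in shipments:
        grouped[row["vendor"]].append(row)

    summary = {}
    for vendor, rows in grouped.items():
        units = sum(r["units"] for r in rows)      # assumed to be > 0
        defects = sum(r["defects"] for r in rows)
        summary[vendor] = {
            "avg_ship_days": mean(r["ship_days"] for r in rows),
            "defect_rate": defects / units,
        }
    return summary

# Hypothetical shipment records (illustrative data only)
shipments = [
    {"vendor": "vendor_x", "ship_days": 9, "units": 1000, "defects": 5},
    {"vendor": "vendor_x", "ship_days": 11, "units": 1000, "defects": 7},
    {"vendor": "vendor_y", "ship_days": 4, "units": 1000, "defects": 30},
    {"vendor": "vendor_y", "ship_days": 4, "units": 1000, "defects": 34},
]

summary = vendor_summary(shipments)
for vendor, stats in summary.items():
    print(vendor, stats)
```

In this made-up data, `vendor_x` ships more slowly but with a much lower defect rate, which is the kind of pattern a manager would weigh against how much of the delay can be absorbed by running steps in parallel.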

Data mining for financial services

Data mining offers numerous benefits for financial service providers, both for internal operations and for customer experience. On the operations front, data mining can impact everything from human resources to marketing. Specific to this industry, though, data mining can help minimize IT risks, as availability and security are the highest priorities for anything involving finance.

On the customer side, data mining offers both protection and a better customer experience. Data mining across transaction patterns can identify and flag items that seem unusual by geography, time of day, category of purchase, or all of these together. The results can then be forwarded to fraud teams to see if they require follow-up. For the end user, data mining patterns can create marketing triggers for specialized promotions, such as refinancing or home equity lines of credit (HELOCs).
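One simple way to picture the flagging step is a rarity check against a customer's own history. This is only a sketch under stated assumptions: the field names (`country`, `hour_band`, `category`), the 5% threshold, and the data are hypothetical, and real fraud detection relies on far richer models.

```python
from collections import Counter

def unusual_fields(history, txn, fields=("country", "hour_band", "category"),
                   min_share=0.05):
    """Return the fields whose value in txn is rare in this customer's history."""
    if not history:
        return list(fields)  # nothing to compare against, so flag everything
    flagged = []
    for field in fields:
        counts = Counter(t[field] for t in history)
        share = counts[txn[field]] / len(history)  # unseen values have share 0
        if share < min_share:
            flagged.append(field)
    return flagged

# Hypothetical transaction history for one customer (illustrative data only)
history = (
    [{"country": "US", "hour_band": "day", "category": "groceries"}] * 15
    + [{"country": "US", "hour_band": "evening", "category": "fuel"}] * 5
)

odd_txn = {"country": "BR", "hour_band": "night", "category": "electronics"}
normal_txn = {"country": "US", "hour_band": "day", "category": "fuel"}

print(unusual_fields(history, odd_txn))
print(unusual_fields(history, normal_txn))
```

A transaction flagged on several fields at once is a stronger candidate for review than one flagged on a single field, which mirrors the "all of these together" case in the text.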

Data mining for the enterprise

Every department in a company, from internal operations to customer service, can benefit from data mining. Successful data mining starts with a strong infrastructure that can take advantage of multiple, high-velocity data sources. Try Oracle Cloud Infrastructure for free to learn how it builds the foundation for data mining.