Graph analytics is an emerging form of data analysis, one that works particularly well with complex relationships. It involves moving data points and relationships between data points into a graph format (also known as nodes and links, or vertices and edges). When querying complex relationships or distant connections between data, graph analytics offers a solution that codes queries more efficiently and can output results in an easy-to-digest visual format.
The graph format provides a much more flexible platform for finding distant connections or for analyzing data based on attributes such as the strength or quality of a relationship. When searching for an indirect relationship between two data points, the traversal logic of a graph makes the task more efficient. That is why graph analytics is often the most effective, and in many cases the preferred, way to explore complex relationships in data.
Graph Analytics vs. Graph Database
It’s easy to confuse the terms “graph analytics” and “graph database”—and in fact, sometimes they are mistakenly used interchangeably. But while there is some overlap between the two terms, they represent two distinctly separate ideas.
A graph database is a specialized, single-purpose platform for creating and manipulating graphs, which are sets of nodes and the relationships between them. Because of this singular focus, it has limitations and cannot handle certain workloads as well as, say, a relational database.
Graph analytics is the process of analyzing data in a graph format using data points as nodes and relationships as edges. Graph analytics does not necessarily require a graph database to be employed as long as the tool can retrieve data; a multi-model configuration (such as Oracle Database) can provide users with flexibility over how they query and manage their data.
Graph Analytics Fundamentals
Graph analytics brings data into a graph format, regardless of the data model it draws from. In a graph format, the key assets are records (nodes or vertices) and the connections between the records (edges, links, or relationships). Since it’s possible to create connections (edges) between two nodes or many nodes, this in turn opens the door to all sorts of dimensional analyses.
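As a rough illustration of the node-and-edge structure described above, here is a minimal Python sketch. The records, the relationship labels, and the helper name build_adjacency are all hypothetical and chosen only for illustration; a real graph platform would store and index this very differently.

```python
from collections import defaultdict

# Hypothetical records: each node is keyed by an id and carries properties.
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "acme": {"type": "company"},
}

# Each edge links two node ids and carries a label describing the relationship.
edges = [
    ("alice", "bob", "knows"),
    ("alice", "acme", "works_at"),
    ("bob", "acme", "works_at"),
]

def build_adjacency(edge_list):
    """Index edges by node so a node's neighbors can be looked up directly."""
    adjacency = defaultdict(list)
    for source, target, label in edge_list:
        adjacency[source].append((target, label))
        adjacency[target].append((source, label))  # treat edges as undirected
    return adjacency

adjacency = build_adjacency(edges)

# Two-hop question: who works at the same company as alice?
coworkers_of_alice = sorted(
    other
    for company, label in adjacency["alice"] if label == "works_at"
    for other, other_label in adjacency[company]
    if other_label == "works_at" and other != "alice"
)
print(coworkers_of_alice)
```

The point of the sketch is that a multi-hop question becomes a walk from neighbor to neighbor once the edges are indexed by node.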
The image below gives a visual representation of an example query using graph analytics. In this example, all of the records are represented by dots. By default, all of the dots are blue. When a query is made, the resulting records and their respective connections are shown in red. Graph databases allow for nearly limitless connections to help identify patterns and detect anomalies.
A simple real-world example of graph analytics is determining directions from start to finish on a map. You can imagine that every intersection is a node and every street is an edge. The query, then, is to determine the best path to get from A to B. In addition, strength/quality of connection can be factored in as traffic. All of this can be included to process the query (see the example below):
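The map scenario can be sketched in a few lines of Python using Dijkstra's algorithm, a standard shortest-path technique for weighted graphs. The intersections, the travel times (which stand in for traffic), and the function name shortest_route are assumptions made for this example and are not taken from any particular product.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # a cheaper route to this node was already processed
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []

# Hypothetical road network: intersections are nodes, streets are edges,
# and each weight is a travel time in minutes that already includes traffic.
roads = {
    "A": {"C": 4, "D": 2},
    "C": {"A": 4, "B": 3, "D": 1},
    "D": {"A": 2, "C": 1, "B": 7},
    "B": {"C": 3, "D": 7},
}

cost, path = shortest_route(roads, "A", "B")
print(cost, path)  # 6 ['A', 'D', 'C', 'B']
```

Raising the weight on a congested street changes the answer without changing the query, which is how traffic is factored in.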
How Do You Use Graph Analytics?
Graph analytics is an extremely flexible, extremely powerful tool. Because of its graph format, complex relationships can be determined for deeper insights with much less effort. Graph analytics generally runs queries in languages such as Property Graph Query Language (PGQL). The example below shows the same query in PGQL and SQL.
As seen in the above example, the PGQL code is simpler and much more efficient. Because graph analytics emphasizes relationships between data, it is ideal for several different types of analyses. In particular, graph analytics excels at:
Finding the shortest path between two nodes
Determining the nodes that create the most activity/influence
Analyzing connectivity to identify the weakest points of a network
Analyzing the state of the network or community based on connection distance/density in a group
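Two of the analyses listed above, finding weak points and measuring connection density, can be sketched as follows. This is a deliberately simple brute-force version (remove each node in turn and test whether the rest stays connected), not the linear-time algorithm a production library would use. The toy network and the function names are assumptions for illustration.

```python
def is_connected(adjacency, removed=None):
    """Return True if all nodes other than `removed` can still reach each other."""
    remaining = [n for n in adjacency if n != removed]
    if not remaining:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(remaining)

def weak_points(adjacency):
    """Nodes whose removal splits the network (brute-force check)."""
    return sorted(n for n in adjacency if not is_connected(adjacency, removed=n))

def density(adjacency):
    """Share of all possible undirected edges that actually exist."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)

# Hypothetical undirected network: a triangle (a, b, c) with a tail (d, e).
network = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(weak_points(network))  # ['c', 'd']
print(density(network))      # 0.5
```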
A simple example of graph analytics in action is the image below, which shows a visual representation of the popular party game “Six Degrees of Kevin Bacon.” For those new to it, this game involves coming up with connections between Kevin Bacon and another actor based on a chain of mutual films. This emphasis on relationships makes it the ideal way to demonstrate graph analytics.
Imagine a data set with two categories of nodes: every film ever made and every actor that has been in those films. Then, using graph analytics, we run a query asking to connect Kevin Bacon to Muppet icon Miss Piggy. The result would be as follows:
In this example, the available nodes (vertices) are both actors and films and the relationships (edges) are the status of “acted in.” From here, the query returns the following results:
Kevin Bacon acted in The River Wild with Meryl Streep.
Meryl Streep acted in Lemony Snicket’s A Series of Unfortunate Events with Billy Connolly.
Billy Connolly acted in Muppet Treasure Island with Miss Piggy.
Graph analytics can query many different relationships for this Kevin Bacon example, such as:
“What is the shortest chain to connect Kevin Bacon to Miss Piggy?” (shortest path analysis, as used in the Six Degrees game above)
“Who has worked with the largest number of actors?” (degree centrality)
“What is the average distance between Kevin Bacon and all other actors?” (closeness centrality)
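The three questions above can be sketched in Python on the tiny data set the example itself provides. As a simplifying assumption, this sketch collapses films into direct actor-to-actor links instead of keeping films as their own nodes, and it uses only the three films named in the article, so the numbers it prints say nothing about the real film industry.

```python
from collections import deque

# Cast pairs taken from the three films named in the example above.
films = {
    "The River Wild": ["Kevin Bacon", "Meryl Streep"],
    "Lemony Snicket's A Series of Unfortunate Events": ["Meryl Streep", "Billy Connolly"],
    "Muppet Treasure Island": ["Billy Connolly", "Miss Piggy"],
}

def costar_graph(films):
    """Link every pair of actors who share a film."""
    graph = {}
    for cast in films.values():
        for actor in cast:
            graph.setdefault(actor, set()).update(a for a in cast if a != actor)
    return graph

def distances_from(graph, start):
    """Breadth-first search: hop count from `start` to every reachable actor."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        for costar in graph[actor]:
            if costar not in dist:
                dist[costar] = dist[actor] + 1
                queue.append(costar)
    return dist

graph = costar_graph(films)
dist = distances_from(graph, "Kevin Bacon")

# Shortest path length from Kevin Bacon to Miss Piggy.
print(dist["Miss Piggy"])  # 3

# Degree centrality: number of distinct co-stars per actor.
degrees = {actor: len(costars) for actor, costars in graph.items()}
print(degrees)

# Average distance from Kevin Bacon to every other actor.
others = [d for actor, d in dist.items() if actor != "Kevin Bacon"]
average_distance = sum(others) / len(others)
print(average_distance)  # 2.0
```

Closeness centrality is usually reported as the reciprocal of this average distance, so a smaller average means a more central actor.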
This is, of course, a more amusing example than most uses of graph analytics. But this approach works in nearly all big data scenarios, that is, any situation where large numbers of records show a natural connectivity with each other. Some of the most popular uses of graph analytics are analyzing social networks, communication networks, website traffic and usage, real-world road data, and financial transactions and accounts.
Use Case: Filtering Out Disinformation and Bots on Social Media
Graph analytics can be applied in many different scenarios, but it is commonly used to analyze social networks. In fact, social networks make the ideal use case as they involve a heavy volume of nodes (user accounts) and multi-dimensional connections (engagements in many different directions). A graph analysis for a social network can determine:
How active are users? (number of nodes)
Which users have the most influence? (density of connections)
Who has the most two-way engagement? (direction and density of connections)
However, this information is useless if it has been unnaturally skewed by bots. Fortunately, graph analytics can provide an excellent means for identifying and filtering out bots.
In a real-world use case, the Oracle team used Oracle Marketing Cloud to evaluate social media advertising and traction; specifically, the goal was to identify fake bot accounts that skewed data. The most common behavior by these bots involved retweeting target accounts, thus artificially inflating their popularity. A simple pattern analysis compared each account's retweet count with the density of connections among its neighbors. Naturally popular accounts showed different relationships with neighbors compared to bot-driven accounts.
This image shows naturally popular accounts.
And this image shows the behavior of a bot-driven account.
The key here is using the power of graph analytics to identify a natural pattern versus a bot pattern. From there, it’s as simple as filtering out those accounts, though it’s also possible to dig deeper to examine, say, the relationship between bots and retweeted accounts.
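The two signals mentioned in this use case, retweet count and density of connections among neighbors, can be computed along the following lines. This is only a sketch under stated assumptions: the account names, the follow links, and the function names are hypothetical, and the article does not say which density values indicate a bot, so the code computes the features without applying a threshold.

```python
from itertools import combinations

def retweet_count(target, retweets):
    """Number of retweets received by `target`."""
    return sum(1 for _, dst in retweets if dst == target)

def neighbor_density(target, retweets, follows):
    """Fraction of possible links that exist among accounts that retweeted `target`."""
    neighbors = sorted({src for src, dst in retweets if dst == target})
    pairs = list(combinations(neighbors, 2))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if (a, b) in follows or (b, a) in follows)
    return linked / len(pairs)

# Hypothetical toy data: (retweeter, retweeted account) pairs.
retweets = [
    ("u1", "x"), ("u2", "x"), ("u3", "x"),
    ("b1", "y"), ("b2", "y"), ("b3", "y"),
]
# Hypothetical follow links between the retweeting accounts.
follows = {("u1", "u2"), ("b1", "b2"), ("b2", "b3"), ("b1", "b3")}

for account in ("x", "y"):
    print(account,
          retweet_count(account, retweets),
          round(neighbor_density(account, retweets, follows), 2))
```

Both toy accounts receive the same number of retweets, yet their neighborhoods look very different. A difference in shape of this kind is what a pattern analysis would examine before deciding which accounts to filter.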
Social media networks do their best to eliminate bot accounts, as they impact their overall user base experience. To verify that this process of bot detection was accurate, flagged accounts were checked after a month. The results were as follows:
Still active: 8.8%
This extremely high percentage of punished accounts (91.2%) showed the accuracy of both pattern identification and the cleansing process. This would have taken significantly longer in a standard tabular database, but with graph analytics, it’s possible to identify complex patterns quickly.
Use Case: Credit Card Fraud
Analytics have become a powerful tool in the finance industry as a means of detecting fraud. Despite advances in anti-fraud technology, such as the use of embedded chips in cards, fraud can still occur in a number of ways. Skimming devices can steal details from magnetic strips—a technique commonly used in locations that haven’t installed chip readers yet. Once those details are stored, they can be loaded onto a counterfeit card to make purchases or withdraw money.
As a means of fraud detection, pattern identification is often the first line of defense. Expected purchase patterns are based on location, frequency, types of stores, and other attributes that fit a user profile. When something appears totally anomalous (for example, a person who stays within the San Francisco Bay Area most of the time suddenly making late-night purchases in Florida), the system flags the activity as potentially fraudulent.
The computing power needed for this is simplified significantly with graph analytics. Graph analytics excels at establishing patterns between nodes—in this case, the categories of nodes are defined as accounts (cardholders), purchase locations, purchase category, transactions, and terminals. It’s easy to identify natural behavior patterns; for example, in a given month, a person could:
Buy pet food (purchase category) at different pet stores (terminals)
Pay for restaurants on weekends (transaction metadata) in the region (purchase locations)
Buy repair hardware (purchase category) at a local hardware store (account location, purchase location)
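One way to picture how such behavior patterns support fraud flagging is to treat each past purchase as a set of links from an account to location and category nodes, and then ask which links in a new transaction have never been seen before. The sketch below assumes a hypothetical transaction layout and hypothetical function names; a real system would combine many more signals and would not flag on a single unfamiliar link.

```python
from collections import defaultdict

def build_profile(history):
    """Map each account to the (kind, value) nodes it is already linked to."""
    profile = defaultdict(set)
    for txn in history:
        profile[txn["account"]].add(("location", txn["location"]))
        profile[txn["account"]].add(("category", txn["category"]))
    return profile

def unfamiliar_links(profile, txn):
    """Return the nodes in a new transaction the account has never been linked to."""
    known = profile.get(txn["account"], set())
    candidates = [("location", txn["location"]), ("category", txn["category"])]
    return [node for node in candidates if node not in known]

# Hypothetical purchase history for one cardholder.
history = [
    {"account": "acct-1", "location": "San Francisco Bay Area", "category": "pet food"},
    {"account": "acct-1", "location": "San Francisco Bay Area", "category": "restaurant"},
    {"account": "acct-1", "location": "San Francisco Bay Area", "category": "hardware"},
]
profile = build_profile(history)

# A new purchase far from the cardholder's usual region.
new_txn = {"account": "acct-1", "location": "Florida", "category": "restaurant"}
flags = unfamiliar_links(profile, new_txn)
print(flags)  # [('location', 'Florida')]
```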
Fraud detection is typically handled with machine learning, but graph analytics can supplement this effort to create a more accurate, more efficient process. Thanks to the focus on relationships, the results have become effective predictors in determining and flagging fraudulent records.
The Future of Graph Analytics
Graph analytics and graph techniques have been evolving as compute power and big data have increased over the past decade. In fact, it’s become increasingly clear that they will become the standard tools for analyzing a brave new world of complex data relationships. As businesses and organizations continue pushing the capabilities of big data and analysis, the ability to derive insights in increasingly complex ways makes graph analytics a must-have for today’s needs and tomorrow’s successes.