A graph database is a specialized, single-purpose platform for creating and manipulating graphs. Graphs contain nodes, edges, and properties, all of which are used to represent and store data in a way that relational databases are not equipped to do.
Graph analytics is another commonly used term, and it refers specifically to the process of analyzing data in a graph format using data points as nodes and relationships as edges. Graph analytics requires a database that can support graph formats; this could be a dedicated graph database, or a multi-model database that supports multiple data models, including graph.
What Is a Graph?
A graph is a collection of points (vertices) and lines between those points (edges). Graphs allow users to model data based on relationships in a more natural, intuitive way than relational databases usually allow for.
In the example below, the vertices are Melli, Jean, John, Lucy, and Sophie, and the edges, which denote the relationships, are “collaborates with” and “feuds with.”
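To make the vertex and edge vocabulary concrete, here is a minimal Python sketch of a labeled graph. The five names and the two relationship labels come from the example; which person is connected to which is not stated in the text, so the specific edges below are assumptions chosen only for illustration, as are the helper names `build_adjacency` and `neighbors_by_label`.

```python
from collections import defaultdict

# Hypothetical edges: the article names the people and the two relationship
# types, but not who is connected to whom.
edges = [
    ("Melli", "collaborates with", "Jean"),
    ("Jean", "collaborates with", "John"),
    ("John", "feuds with", "Lucy"),
    ("Lucy", "collaborates with", "Sophie"),
    ("Sophie", "feuds with", "Melli"),
]


def build_adjacency(edge_list):
    """Map each vertex to (label, neighbor) pairs, treating edges as undirected."""
    adjacency = defaultdict(list)
    for source, label, target in edge_list:
        adjacency[source].append((label, target))
        adjacency[target].append((label, source))
    return dict(adjacency)


def neighbors_by_label(graph, vertex, label):
    """Return the sorted neighbors of a vertex reached over edges with this label."""
    return sorted(n for edge_label, n in graph.get(vertex, []) if edge_label == label)


graph = build_adjacency(edges)
print(neighbors_by_label(graph, "Melli", "collaborates with"))
print(neighbors_by_label(graph, "Melli", "feuds with"))
```

Storing the label on each edge is what lets a query ask about one kind of relationship at a time without touching the others.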
The graph format provides a much more flexible platform for finding distant connections or analyzing data based on things like the strength or quality of a relationship. When searching for an indirect relationship between two data points, a graph query can follow edges directly from node to node, which is typically more efficient than the repeated joins a relational model requires.
Graphs let you explore and discover connections and patterns in social networks, IoT, big data, data warehouses, and complex transaction data for business use cases such as fraud detection in banking, social network analysis, and customer 360. Today, graphs are increasingly used in data science as a way to make relationships in the data clearer.
Graph algorithms—operations specifically designed to analyze relationships and behaviors among data in graphs—make it possible to understand things that are difficult to see with other methods. For example, graph algorithms can identify what individual or item is most connected to others in social networks or business processes. The algorithms can identify communities, anomalies, common patterns, and paths that connect individuals or related transactions.
Graph Database Fundamentals
Graph databases bring data into a graph format, regardless of the data model they draw from. In a graph format, the key assets are records (nodes or vertices) and the connections between the records (edges, links, or relationships). Since it’s possible to create connections (edges) between two nodes or many nodes, this in turn opens the door to all sorts of dimensional analyses.
The image below gives a visual representation of an example query using graph analytics. In this example, all of the records are represented by dots. By default, all of the dots are blue. When a query is made, the resulting records and their respective connections are shown in red. Graph databases allow for nearly limitless connections to help identify patterns and detect anomalies.
A simple real-world example of what can be accomplished with a graph database is determining directions from start to finish on a map. You can imagine that every intersection is a node and every street is an edge. The query, then, is to determine the best path to get from A to B. In addition, strength/quality of connection can be factored in as traffic. All of this can be included to process the query (see the example below):
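As a rough illustration of the same idea in code (this is not the original figure), the sketch below treats intersections as nodes and streets as weighted edges, then finds the cheapest route with Dijkstra's algorithm. The intersection names, the travel times, and the function name `shortest_path` are all assumptions; the weights stand in for travel minutes with traffic already factored in.

```python
import heapq


def shortest_path(roads, start, goal):
    """Dijkstra's algorithm. Returns (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route to this node was found
        for neighbor, minutes in roads.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical street map: each weight is travel time in minutes, including traffic.
roads = {
    "A": {"C": 4, "D": 2},
    "B": {"C": 3, "D": 9},
    "C": {"A": 4, "B": 3, "D": 1},
    "D": {"A": 2, "B": 9, "C": 1},
}

cost, path = shortest_path(roads, "A", "B")
print(cost, path)  # 6 ['A', 'D', 'C', 'B']
```

The direct-looking route A to C to B costs 7 minutes, so the detour through D wins once edge weights are taken into account, which is the effect traffic has in the map example.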
How Can You Use a Graph Database?
Graph databases are extremely flexible and powerful tools. Because of the graph format, complex relationships can be determined for deeper insights with much less effort. Graph databases generally run queries in graph query languages such as Property Graph Query Language (PGQL). The example below shows the same query in PGQL and SQL.
As seen in the above example, the PGQL code is simpler and much more concise than its SQL equivalent. Because graphs emphasize relationships between data, they are ideal for several different types of analyses. In particular, graph databases excel at:
Finding the shortest path between two nodes
Determining the nodes that create the most activity/influence
Analyzing connectivity to identify the weakest points of a network
Analyzing the state of the network or community based on connection distance/density in a group
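One of the analyses listed above, finding the weakest points of a network, can be sketched in a few lines. The version below is a deliberately simple brute-force check rather than a production algorithm: it removes each node in turn and tests whether the rest of the network is still connected. The network itself and the function names are assumptions, and the code assumes an undirected graph that starts out connected.

```python
def reachable(graph, start, removed=None):
    """Return every node reachable from start, ignoring the removed node."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def single_points_of_failure(graph):
    """Nodes whose removal disconnects the remaining network (brute force)."""
    critical = []
    for candidate in graph:
        others = [node for node in graph if node != candidate]
        if not others:
            continue
        if len(reachable(graph, others[0], removed=candidate)) < len(others):
            critical.append(candidate)
    return sorted(critical)


# Hypothetical undirected network: a triangle (a, b, c) with a tail c - d - e.
network = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(single_points_of_failure(network))  # ['c', 'd']
```

This check runs one traversal per node, which is fine for small graphs; larger networks would normally use a linear-time articulation point algorithm instead.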
A simple example of graph analytics in action is the image below, which shows a visual representation of the popular party game “Six Degrees of Kevin Bacon.” For those new to it, this game involves coming up with connections between Kevin Bacon and another actor based on a chain of mutual films. This emphasis on relationships makes it the ideal way to demonstrate graph analytics.
Imagine a data set with two categories of nodes: every film ever made and every actor that has been in those films. Then, using graph, we run a query asking to connect Kevin Bacon to Muppet icon Miss Piggy. The result would be as follows:
In this example, the available nodes (vertices) are both actors and films and the relationships (edges) are the status of “acted in.” From here, the query returns the following results:
Kevin Bacon acted in The River Wild with Meryl Streep.
Meryl Streep acted in Lemony Snicket’s A Series of Unfortunate Events with Billy Connolly.
Billy Connolly acted in Muppet Treasure Island with Miss Piggy.
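The chain above can be reproduced with a breadth-first search over a graph whose nodes are both actors and films. The sketch below is only an illustration: it contains just the three films named in the result rather than a full film database, and the function names `build_graph` and `shortest_chain` are assumptions.

```python
from collections import deque

# Only the three films from the example; a real data set would hold every film.
films = {
    "The River Wild": ["Kevin Bacon", "Meryl Streep"],
    "Lemony Snicket's A Series of Unfortunate Events": ["Meryl Streep", "Billy Connolly"],
    "Muppet Treasure Island": ["Billy Connolly", "Miss Piggy"],
}


def build_graph(films):
    """Create an undirected 'acted in' graph containing actor and film nodes."""
    graph = {}
    for film, cast in films.items():
        for actor in cast:
            graph.setdefault(film, set()).add(actor)
            graph.setdefault(actor, set()).add(film)
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search; returns the alternating actor/film chain or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


chain = shortest_chain(build_graph(films), "Kevin Bacon", "Miss Piggy")
print(" -> ".join(chain))
```

Because breadth-first search explores the graph one hop at a time, the first chain it finds is guaranteed to use the fewest "acted in" edges.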
Graph databases can query many different relationships for this Kevin Bacon example, such as:
“What is the shortest chain to connect Kevin Bacon to Miss Piggy?” (shortest path analysis, as used in the Six Degrees game above)
“Who has worked with the largest number of actors?” (degree centrality)
“What is the average distance between Kevin Bacon and all other actors?” (closeness centrality)
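The last two questions can be sketched on a small co-star graph, where two actors are connected if they appeared in a film together. The graph below holds only the four actors from the example, so the numbers are illustrative rather than real answers, and the function names are assumptions. Degree centrality is taken here as a raw neighbor count, and closeness centrality as the reciprocal of the average distance.

```python
from collections import deque

# Co-star graph derived from the three films in the example (not a full data set).
costars = {
    "Kevin Bacon": {"Meryl Streep"},
    "Meryl Streep": {"Kevin Bacon", "Billy Connolly"},
    "Billy Connolly": {"Meryl Streep", "Miss Piggy"},
    "Miss Piggy": {"Billy Connolly"},
}


def degree(graph):
    """Degree centrality as a raw count: how many actors each actor has worked with."""
    return {actor: len(neighbors) for actor, neighbors in graph.items()}


def distances_from(graph, start):
    """Hop counts from start to every reachable actor, via breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_distance(graph, start):
    """Mean hop count from start to all other reachable actors."""
    others = [d for actor, d in distances_from(graph, start).items() if actor != start]
    return sum(others) / len(others)


print(degree(costars))
avg = average_distance(costars, "Kevin Bacon")
print(avg, 1 / avg)  # average distance and its reciprocal (closeness)
```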
This is, of course, a more amusing example than most uses of graph analytics. But this approach works for nearly all big data, that is, any situation where large numbers of records show a natural connectivity with each other. Some of the most popular uses of graph analytics are analyzing social networks, communication networks, website traffic and usage, real-world road data, and financial transactions and accounts.
Graph Database Use Case: Filtering Out Disinformation and Bots on Social Media
Graph databases can be used in many different scenarios, but they are commonly used to analyze social networks. In fact, social networks make the ideal use case, as they involve a heavy volume of nodes (user accounts) and multi-dimensional connections (engagements in many different directions). A graph analysis for a social network can determine:
How active are users? (number of nodes)
Which users have the most influence? (density of connections)
Who has the most two-way engagement? (direction and density of connections)
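Two of the questions above, influence and two-way engagement, can be sketched with directed edges. In this hypothetical example, each pair means "the first account engaged with the second"; the account names, the sample data, and the choice to measure influence as the number of distinct accounts engaging with a user are all assumptions.

```python
from collections import Counter

# Hypothetical engagements: (who engaged, with whom).
engagements = [
    ("ana", "ben"),
    ("ben", "ana"),
    ("cy", "ana"),
    ("dee", "ana"),
    ("dee", "ben"),
]


def influence(edges):
    """Count the distinct accounts engaging with each user (in-degree)."""
    return Counter(target for _, target in set(edges))


def mutual_pairs(edges):
    """Pairs of accounts that engage with each other in both directions."""
    edge_set = set(edges)
    return sorted(
        {
            tuple(sorted((source, target)))
            for source, target in edge_set
            if source != target and (target, source) in edge_set
        }
    )


print(influence(engagements).most_common(1))  # [('ana', 3)]
print(mutual_pairs(engagements))              # [('ana', 'ben')]
```

Edge direction matters here: "ana" is influential because three accounts point at her, while only the "ana" and "ben" pair has edges running both ways.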
However, this information is useless if it has been unnaturally skewed by bots. Fortunately, graph analytics can provide an excellent means for identifying and filtering out bots.
In a real-world use case, the Oracle team used Oracle Marketing Cloud to evaluate social media advertising and traction, specifically to identify fake bot accounts that skewed data. The most common behavior by these bots involved retweeting target accounts, thus artificially inflating their popularity. A simple pattern analysis examined each account's retweet count and the density of connections among its neighbors. Naturally popular accounts showed different relationships with neighbors compared to bot-driven accounts.
This image shows naturally popular accounts.
And this image shows the behavior of a bot-driven account.
The key here is using the power of graph analytics to identify a natural pattern versus a bot pattern. From there, it’s as simple as filtering out those accounts, though it’s also possible to dig deeper to examine, say, the relationship between bots and retweeted accounts.
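The two signals mentioned above, retweet count and the density of connections among an account's neighbors, can be computed with a short sketch like the one below. This is not the Oracle team's implementation. The account names, the sample data, and the function names are assumptions, and the code only computes the features: the text does not say which values indicate a bot, so no threshold is applied.

```python
from itertools import combinations


def neighbor_density(connections, neighbors):
    """Fraction of neighbor pairs that are connected to each other (0.0 to 1.0)."""
    unique = sorted(set(neighbors))
    pairs = list(combinations(unique, 2))
    if not pairs:
        return 0.0
    linked = sum(
        1
        for a, b in pairs
        if b in connections.get(a, set()) or a in connections.get(b, set())
    )
    return linked / len(pairs)


def account_features(retweets, connections, account):
    """Retweet count and retweeter density for one target account."""
    retweeters = [source for source, target in retweets if target == account]
    return {
        "retweet_count": len(retweeters),
        "neighbor_density": neighbor_density(connections, retweeters),
    }


# Hypothetical data: (retweeter, retweeted account) and who is connected to whom.
retweets = [
    ("u1", "star"), ("u2", "star"), ("u3", "star"),
    ("b1", "promo"), ("b2", "promo"), ("b3", "promo"), ("b1", "promo"),
]
connections = {"u1": {"u2"}, "b1": {"b2", "b3"}, "b2": {"b3"}}

for name in ("star", "promo"):
    print(name, account_features(retweets, connections, name))
```

Comparing these features across many accounts is what would let an analyst separate the two patterns described above.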
Social media networks do their best to eliminate bot accounts, as they impact their overall user base experience. To verify that this process of bot detection was accurate, flagged accounts were checked after a month. The results were as follows:
Still active: 8.8%
This extremely high percentage of punished accounts (91.2%) showed the accuracy of both pattern identification and the cleansing process. This would have taken significantly longer in a standard tabular database, but with graph analytics, it’s possible to identify complex patterns quickly.
Graph Database Use Case: Credit Card Fraud
Graphs have become a powerful tool in the finance industry as a means of detecting fraud. Despite advances in anti-fraud technology, such as the use of embedded chips in cards, fraud can still occur in a number of ways. Skimming devices can steal details from magnetic strips—a technique commonly used in locations that haven’t installed chip readers yet. Once those details are stored, they can be loaded onto a counterfeit card to make purchases or withdraw money.
As a means of fraud detection, pattern identification is often the first line of defense. Expected purchase patterns are based on location, frequency, types of stores, and other things that fit a user profile. When something appears totally anomalous (for example, a person who stays within the San Francisco Bay Area most of the time suddenly making late-night purchases in Florida), the transaction is flagged as potentially fraudulent.
Graph analytics significantly reduces the computing effort this requires. It excels at establishing patterns between nodes; in this case, the categories of nodes are defined as accounts (cardholders), purchase locations, purchase categories, transactions, and terminals. It’s easy to identify natural behavior patterns; for example, in a given month, a person could:
Buy pet food (purchase category) at different pet stores (terminals)
Pay for restaurants on weekends (transaction metadata) in the region (purchase locations)
Buy repair hardware (purchase category) at a local hardware store (account location, purchase location)
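A behavior profile like the one above can be sketched as a small graph that links each account to the locations and purchase categories it has used before. A new transaction that would create a link to a never-seen node is then worth a closer look. Everything in this sketch is an assumption for illustration: the account ID, the purchase history, the function names, and the simple "never seen before" rule, which is far cruder than a real fraud model.

```python
from collections import defaultdict


def build_profile(history):
    """Count how often each account is linked to each location and category node."""
    profile = defaultdict(lambda: defaultdict(int))
    for account, location, category in history:
        profile[account][("location", location)] += 1
        profile[account][("category", category)] += 1
    return profile


def unfamiliar_links(profile, transaction):
    """Return the nodes in this transaction that the account has never touched."""
    account, location, category = transaction
    known = profile.get(account, {})
    candidates = (("location", location), ("category", category))
    return [node for node in candidates if node not in known]


# Hypothetical purchase history: (account, purchase location, purchase category).
history = [
    ("acct-1", "San Francisco", "pet food"),
    ("acct-1", "Oakland", "restaurant"),
    ("acct-1", "San Francisco", "hardware"),
]
profile = build_profile(history)

print(unfamiliar_links(profile, ("acct-1", "Oakland", "pet food")))  # []
print(unfamiliar_links(profile, ("acct-1", "Miami", "restaurant")))  # [('location', 'Miami')]
```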
Fraud detection is typically handled with machine learning, but graph analytics can supplement this effort to create a more accurate, more efficient process. Thanks to the focus on relationships, the results have become effective predictors in determining and flagging fraudulent records.
The Future of Graph Databases
Graph databases and graph techniques have been evolving as compute power and big data have increased over the past decade. In fact, it’s become increasingly clear that they will become the standard tool for analyzing a brave new world of complex data relationships. As businesses and organizations continue pushing the capabilities of big data and analysis, the ability to derive insights in increasingly complex ways makes graph databases a must-have for today’s needs and tomorrow’s successes.