A graph database is defined as a specialized, single-purpose platform for creating and manipulating graphs. Graphs contain nodes, edges, and properties, all of which are used to represent and store data in a way that relational databases are not equipped to do.
Graph analytics is another commonly used term, and it refers specifically to the process of analyzing data in a graph format using data points as nodes and relationships as edges. Graph analytics requires a database that can support graph formats; this could be a dedicated graph database, or a converged database that supports multiple data models, including graph.
There are two popular models of graph databases: property graphs and RDF graphs. The property graph focuses on analytics and querying, while the RDF graph emphasizes data integration. Both types of graphs consist of a collection of points (vertices) and the connections between those points (edges). But there are differences as well.
Property graphs are used to model relationships among data, and they enable query and data analytics based on these relationships. A property graph has vertices that can contain detailed information about a subject, and edges that denote the relationship between the vertices. The vertices and edges can have attributes, called properties, with which they are associated.
In this example, a set of colleagues and their relationships are represented as a property graph.
Because they are so versatile, property graphs are used in a broad range of industries and sectors, such as finance, manufacturing, public safety, retail, and many others.
RDF graphs (RDF stands for Resource Description Framework) conform to a set of W3C (Worldwide Web Consortium) standards designed to represent statements and are best for representing complex metadata and master data. They are often used for linked data, data integration, and knowledge graphs. They can represent complex concepts in a domain, or provide rich semantics and inferencing on data.
In the RDF model a statement is represented by three elements: two vertices connected by an edge reflecting the subject, predicate and object of a sentence—this is known as an RDF triple. Every vertex and edge is identified by a unique URI, or Unique Resource Identifier. The RDF model provides a way to publish data in a standard format with well-defined semantics, enabling information exchange. Government statistics agencies, pharmaceutical companies, and healthcare organizations have adopted RDF graphs widely.
Graphs and graph databases provide graph models to represent relationships in data. They allow users to perform “traversal queries” based on connections and apply graph algorithms to find patterns, paths, communities, influencers, single points of failure, and other relationships, which enable more efficient analysis at scale against massive amounts of data. The power of graphs is in analytics, the insights they provide, and their ability to link disparate data sources.
When it comes to analyzing graphs, algorithms explore the paths and distance between the vertices, the importance of the vertices, and clustering of the vertices. For example, to determine importance algorithms will often look at incoming edges, importance of neighboring vertices, and other indicators.
Graph algorithms—operations specifically designed to analyze relationships and behaviors among data in graphs—make it possible to understand things that are difficult to see with other methods. When it comes to analyzing graphs, algorithms explore the paths and distance between the vertices, the importance of the vertices, and clustering of the vertices. The algorithms will often look at incoming edges, importance of neighboring vertices, and other indicators to help determine importance. For example, graph algorithms can identify what individual or item is most connected to others in social networks or business processes. The algorithms can identify communities, anomalies, common patterns, and paths that connect individuals or related transactions.
Because graph databases explicitly store relationships, queries and algorithms utilizing the connectivity between vertices can be run in sub-seconds rather than hours or days. Users don’t need to execute countless joins and the data can more easily be used for analysis and machine learning to discover more about the world around us.
The graph format provides a more flexible platform for finding distant connections or analyzing data based on things like strength or quality of relationship. Graphs let you explore and discover connections and patterns in social networks, IoT, big data, data warehouses, and also complex transaction data for multiple business use cases including fraud detection in banking, discovering connections in social networks, and customer 360. Today, graph databases are increasingly being used as a part of data science as a way to make connections in relationships clearer.
Because graph databases explicitly store the relationships, queries and algorithms utilizing the connectivity between vertices can be run in subseconds rather than hours or days. Users don’t need to execute countless joins and the data can more easily be used for analysis and machine learning to discover more about the world around us.
Graph databases are an extremely flexible, extremely powerful tool. Because of the graph format, complex relationships can be determined for deeper insights with much less effort. Graph databases generally run queries in languages such as Property Graph Query Language (PGQL). The example below shows the same query in PGQL and SQL.
As seen in the above example, the PGQL code is simpler and much more efficient. Because graphs emphasize relationships between data, they are ideal for several different types of analyses. In particular, graph databases excel at:
A simple example of graph databases in action is the image below, which shows a visual representation of the popular party game “Six Degrees of Kevin Bacon.” For those new to it, this game involves coming up with connections between Kevin Bacon and another actor based on a chain of mutual films. This emphasis on relationships makes it the ideal way to demonstrate graph analytics.
Imagine a data set with two categories of nodes: every film ever made and every actor that has been in those films. Then, using graph, we run a query asking to connect Kevin Bacon to Muppet icon Miss Piggy. The result would be as follows:
In this example, the available nodes (vertices) are both actors and films and the relationships (edges) are the status of “acted in.” From here, the query returns the following results:
Graph databases can query many different relationships for this Kevin Bacon example, such as:
This is, of course, a more amusing example than most uses of graph analytics. But this approach works in nearly all big data—any situation where large numbers of records show a natural connectivity with each other. Some of the most popular ways to use graph analytics is for analyzing social networks, communication networks, website traffic and usage, real-world road data, and financial transactions and accounts.
Conceptually, money laundering is simple. Dirty money is passed around to blend it with legitimate funds and then turned into hard assets. This is the kind of process that was used in the Panama Papers analysis.
More specifically, a circular money transfer involves a criminal who sends large amounts of fraudulently obtained money to himself or herself—but hides it through a long and complex series of valid transfers between “normal” accounts. These “normal” accounts are actually accounts created with synthetic identities. They typically share certain similar information because they are generated from stolen identities (email addresses, addresses, etc.) and it’s this related information that makes graph analysis such a good fit to make them reveal their fraudulent origins.
To make fraud detection simpler, users can create a graph from transactions between entities as well as entities that share some information, including the email addresses, passwords, addresses, and more. Once a graph is created, running a simple query will find all customers with accounts who have similar information, and reveal which accounts are sending money to each other.
Graph databases can be used in many different scenarios, but it is commonly used to analyze social networks. In fact, social networks make the ideal use case as they involve a heavy volume of nodes (user accounts) and multi-dimensional connections (engagements in many different directions). A graph analysis for a social network can determine:
However, this information is useless if it has been unnaturally skewed by bots. Fortunately, graph analytics can provide an excellent means for identifying and filtering out bots.
In a real-world use case, the Oracle team used Oracle Marketing Cloud to evaluate social media advertising and traction—specifically, to identify fake bot accounts that skewed data. The most common behavior by these bots involved retweet target accounts, thus artificially inflating their popularity. A simple pattern analysis allowed for a look using retweet count and density of connections to neighbors. Naturally popular accounts showed different relationships with neighbors compared to bot-driven accounts.
This image shows naturally popular accounts.
And this image shows the behavior of a bot-driven account.
The key here is using the power of graph analytics to identify a natural pattern versus a bot pattern. From there, it’s as simple as filtering out those accounts, though it’s also possible to dig deeper to examine, say, the relationship between bots and retweeted accounts.
Social media networks do their best to eliminate bot accounts, as they impact their overall user base experience. To verify that this process of bot detection was accurate, flagged accounts were checked after a month. The results were as follows:
This extremely high percentage of punished accounts (91.2%) showed the accuracy of both pattern identification and the cleansing process. This would have taken significantly longer in a standard tabular database, but with graph analytics, it&’s possible to identify complex patterns quickly.
Graph databases have become a powerful tool in the finance industry as a means of detecting fraud. Despite advances in anti-fraud technology, such as the use of embedded chips in cards, fraud can still occur in a number of ways. Skimming devices can steal details from magnetic strips—a technique commonly used in locations that haven’t installed chip readers yet. Once those details are stored, they can be loaded onto a counterfeit card to make purchases or withdraw money.
As a means of fraud detection, pattern identification is often the first line of defense. Expected purchase patterns are based on location, frequency, types of stores, and other things that fit a user profile. When something appears totally anomalous—for example, a person who stays within the San Francisco Bay Area most of the time suddenly making late-night purchases in Florida—it flags it as potentially fraudulent.
The computing power needed for this is simplified significantly with graph analytics. Graph analytics excels at establishing patterns between nodes—in this case, the categories of nodes are defined as accounts (cardholders), purchase locations, purchase category, transactions, and terminals. It’s easy to identify natural behavior patterns; for example, in a given month, a person could:
Fraud detection is typically handled with machine learning but graph analytics can supplement this effort to create a more accurate, more efficient process. Thanks to the focus on relationships, the results have become effective predictors in determining and flagging fraudulent records. curating and preparing data before it can actually be used.
Graph databases and graph techniques have been evolving as compute power and big data have increased over the past decade. In fact, it’s become increasingly clear that they will become the standard tool for analyzing a brave new world of complex data relationships. As businesses and organizations continue pushing the capabilities of big data and analysis, the ability to derive insights in increasingly complex ways makes graph databases a must-have for today’s needs and tomorrow’s successes.
Oracle makes it easy to adopt graph technologies. Oracle Database and Oracle Autonomous Database include a graph database and graph analytics engine so users can discover more insights in their data by using the power of graph algorithms, pattern matching queries, and visualization. Graphs are part of Oracle’s converged database, which supports multimodel, multiworkload, and multi-tenant requirements–all in a single database engine.
Although all graph databases claim they are high-performance, Oracle’s graph offerings are performant both in query performance and algorithms, as well as tightly integrated with Oracle database. This makes it easy for developers to add graph analytics to existing applications and make use of the scalability, consistency, recovery, access control, and security that the database provides by default.