Graph analysis lets you reveal latent information that is encoded not as fields in your data, but as direct and indirect relationships (metadata) between elements of your data. This information is not obvious to the naked eye, but it can have tremendous value once uncovered.
PGX is a toolkit for graph analysis. It supports both running algorithms such as PageRank against graphs and performing SQL-like pattern matching against graphs, which can make use of the results of algorithmic analysis. Algorithms are parallelized for high performance. The PGX toolkit includes both a single-node in-memory engine and a distributed engine for extremely large graphs. Graphs can be loaded from a variety of sources, including flat files, SQL and NoSQL databases, and Apache Spark and Hadoop; incremental updates are supported.
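To make "running an algorithm such as PageRank against a graph" concrete, the following is a minimal plain-Java sketch of PageRank by power iteration. It does not use the PGX API. The class name PageRankSketch, the adjacency-list representation, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java PageRank sketch; illustrative only, NOT the PGX API. */
final class PageRankSketch {

    /**
     * Computes PageRank by power iteration.
     *
     * @param outEdges   outEdges.get(v) lists the vertices that vertex v links to
     * @param damping    probability of following a link (0.85 is a common choice)
     * @param iterations number of power-iteration rounds (assumed fixed here)
     * @return one rank per vertex; the ranks sum to 1
     */
    static double[] pageRank(List<List<Integer>> outEdges, double damping, int iterations) {
        int n = outEdges.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0; // rank held by vertices without out-edges
            for (int v = 0; v < n; v++) {
                List<Integer> targets = outEdges.get(v);
                if (targets.isEmpty()) {
                    danglingMass += rank[v];
                } else {
                    double share = rank[v] / targets.size();
                    for (int t : targets) {
                        next[t] += share; // v passes an equal share to each neighbor
                    }
                }
            }
            // Random-jump term plus the dangling mass spread evenly over all vertices.
            double base = (1.0 - damping) / n + damping * danglingMass / n;
            for (int v = 0; v < n; v++) {
                next[v] = base + damping * next[v];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny example graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
        List<List<Integer>> graph = List.of(List.of(1, 2), List.of(2), List.of(0));
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 50)));
    }
}
```

In the example graph, vertex 2 receives links from both other vertices and therefore ends up ranked above vertex 1, which is linked only from vertex 0.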
The tools included as part of the PGX distribution include:
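The typical usage pattern in PGX is to load a graph, run algorithms or pattern-matching queries against it, and then browse or export the results.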
In addition, there are features for filtering graphs, extracting subgraphs and much more, and graphs can be saved for later use.
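As an illustration of what filtering a graph and extracting a subgraph means, here is a minimal plain-Java sketch that keeps only the vertices accepted by a filter, together with the edges between them. It is not the PGX filtering API. The class name SubgraphSketch, the method inducedSubgraph and the map-based graph representation (every vertex is a key, mapped to its out-neighbors) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.IntPredicate;

/** Plain-Java subgraph extraction sketch; illustrative only, NOT the PGX API. */
final class SubgraphSketch {

    /**
     * Returns the subgraph induced by the vertices that pass the filter:
     * an edge survives only if both of its endpoints are kept.
     *
     * @param graph      adjacency lists; every vertex is assumed to appear as a key
     * @param keepVertex filter deciding which vertices stay in the subgraph
     */
    static Map<Integer, List<Integer>> inducedSubgraph(
            Map<Integer, List<Integer>> graph, IntPredicate keepVertex) {
        Map<Integer, List<Integer>> sub = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<Integer, List<Integer>> entry : graph.entrySet()) {
            int src = entry.getKey();
            if (!keepVertex.test(src)) {
                continue; // drop the vertex and all of its outgoing edges
            }
            List<Integer> kept = new ArrayList<>();
            for (int dst : entry.getValue()) {
                if (keepVertex.test(dst)) {
                    kept.add(dst); // keep edges whose target also survives
                }
            }
            sub.put(src, kept);
        }
        return sub;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = Map.of(
                1, List.of(2, 3),
                2, List.of(3),
                3, List.of(1, 4),
                4, List.<Integer>of());
        // Keep only vertices with id <= 3; the edge 3 -> 4 disappears with vertex 4.
        System.out.println(inducedSubgraph(graph, v -> v <= 3));
        // prints {1=[2, 3], 2=[3], 3=[1]}
    }
}
```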
The latest PGX version adds Apache Spark support, the ability to export compiled Green-Marl programs as Java JAR files, and more. See the what's new page for the full list of new features.
Loading graphs from a variety of sources: PGX can load graphs from relational databases, NoSQL databases, Apache Spark / Hadoop, and flat files.
Applying graph pattern matching: PGX includes an SQL-like query language for pattern-matching subgraphs based on their connections, properties or both. Matched subgraphs can have further analytics run against them.
Running parallel, high-performance graph algorithms: PGX provides built-in implementations of many popular graph algorithms. Users can apply these algorithms to their graph data sets by invoking the appropriate methods.
Browsing and exporting results: Once the analysis is finished, users can browse the results and export them to the file system.
Fast, parallel, in-memory execution: PGX is a fast, parallel, in-memory graph analytic framework. It uses lightweight in-memory data structures that allow fast execution of graph algorithms, and it exploits the multiple CPUs of modern computer systems by running parallelized graph algorithms. Not only are the built-in algorithms parallelized; custom graph algorithms are also parallelized automatically with the help of a DSL compiler.
Rich built-in algorithms: PGX provides built-in implementations of many popular graph algorithms, including algorithms for computing various centrality measures, finding shortest paths, finding and evaluating clusters and components, and predicting future edges. (Note: The OTN public release contains only a small subset of these algorithms. See the documentation and contact us if you want to remove this limitation.)
Easy implementation and efficient execution of custom algorithms: PGX adopts the Green-Marl DSL so that custom algorithms are both easy to implement and efficient to execute. Users can program their own graph algorithms intuitively with the high-level, graph-specific data types and operators in Green-Marl. PGX executes a Green-Marl program efficiently by parallelizing it and mapping it onto the PGX-internal API.
Interactive shell: PGX provides a shell application with which the user can exercise the PGX features interactively. That is, the user can simply start the shell and type commands at the shell command line instead of writing a whole Java application for the analysis.
Deploy as a web service: PGX ships with a web application that can be deployed in a container such as WebLogic, Jetty or Tomcat. This allows you to use the interactive shell and other APIs against a remote instance. You can deploy PGX on a server-class machine and have multiple clients share access to the resources of that machine.
Hadoop support: You can use PGX to analyze graphs on a Hadoop cluster. You can run PGX as a YARN application and connect to it from the interactive shell or other APIs. PGX also supports loading graphs from HDFS and storing graphs to HDFS.
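Two of the features above, graph pattern matching and parallel execution, can be illustrated together with a small plain-Java sketch. It is not the PGX query language or API. It finds every match of a fixed pattern (a) -> (b) -> (c), where b carries a given label and a differs from c, and uses a Java parallel stream so that start vertices may be processed on several CPU cores. The class name PatternMatchSketch, the method twoHopMatches, the labels and the map-based graph representation are assumptions made for this example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Plain-Java pattern-matching sketch; illustrative only, NOT the PGX API. */
final class PatternMatchSketch {

    /** Orders matches by their three vertex ids so the output is deterministic. */
    private static final Comparator<List<Integer>> BY_VERTEX_IDS =
            Comparator.<List<Integer>, Integer>comparing(m -> m.get(0))
                    .thenComparing(m -> m.get(1))
                    .thenComparing(m -> m.get(2));

    /**
     * Finds all matches [a, b, c] of the pattern (a) -> (b) -> (c) where
     * vertex b has the given label and a != c.
     *
     * @param out         adjacency lists; every vertex is assumed to appear as a key
     * @param label       a single string property per vertex
     * @param middleLabel label required on the middle vertex b
     */
    static List<List<Integer>> twoHopMatches(
            Map<Integer, List<Integer>> out, Map<Integer, String> label, String middleLabel) {
        return out.keySet().parallelStream() // start vertices may run in parallel
                .flatMap(a -> out.get(a).stream()
                        .filter(b -> middleLabel.equals(label.get(b))) // property constraint
                        .flatMap(b -> out.getOrDefault(b, List.of()).stream()
                                .filter(c -> !c.equals(a)) // exclude a -> b -> a
                                .map(c -> List.of(a, b, c))))
                .sorted(BY_VERTEX_IDS)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> out = Map.of(
                1, List.of(2),
                2, List.of(3, 4),
                3, List.of(2),
                4, List.<Integer>of());
        Map<Integer, String> label = Map.of(
                1, "person", 2, "product", 3, "person", 4, "person");
        System.out.println(twoHopMatches(out, label, "product"));
        // prints [[1, 2, 3], [1, 2, 4], [3, 2, 4]]
    }
}
```

The sketch only reads from immutable maps inside the parallel stream, which is why no explicit synchronization is needed here; a real engine would additionally use compact data structures and a query planner.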
PGX can be used in several ways:
In a Java (or Scala, Groovy or other JVM language) application: Either the entire PGX runtime or the PGX client (which talks to a remote PGX server) can be used as a library embedded in the application.
Interactively from the shell: The user can also use PGX as if it were a separate application by using the PGX shell. Once the PGX shell is started, the user can load graphs, invoke algorithms, and browse or export results with simple shell commands.
In an Apache Zeppelin notebook: A Zeppelin interpreter is available for download. It embeds the PGX shell in Zeppelin and can talk to an embedded or remote PGX server instance. Analyses can be developed collaboratively and interactively in a web browser and formatted as reports.
Remote usage: For all of the use cases above, you can use PGX either locally or remotely. In the remote case, you need to start PGX on a web server and provide the client with a hostname and port to connect to. If you use PGX locally, it simply spins up a local PGX instance on which you can work without any HTTP overhead.
See the tutorials for more information on how to use PGX.
This version of PGX is released under the OTN license. Please see the documentation for more details about the OTN release and its limitations.
Please see the installation documentation, which explains how to install PGX.