Welcome to Parallel Graph Analytics (PGX)

What is PGX?

PGX is a fast, parallel, in-memory graph analytics framework. With PGX, users can load graphs into main memory, run graph algorithms on them efficiently, explore the results, and export them back to the file system.


What’s new in PGX 1.2.0?

The latest version of PGX adds PGQL, a new query language for graph pattern matching, as well as a new algorithm and APIs that help you build a recommendation engine on top of your graph, and more. See our What's new page or read the more detailed Changelog.


What can I do with PGX?

  • Applying graph pattern matching: PGX includes an SQL-like query language for matching subgraphs based on their connections, their properties, or both. Users can then run further analytics on the matched subgraphs.

  • Running parallel, high-performance graph algorithms: PGX provides built-in implementations of many popular graph algorithms. Users can apply these algorithms to their graph data sets by invoking the appropriate methods.

  • Running custom graph algorithms: PGX can also execute custom (i.e. user-provided) graph algorithms. Users write their own graph algorithms in the Green-Marl DSL and feed them to PGX. A parallelizing compiler then transforms the Green-Marl program into a form that PGX can execute.

  • Mutating graphs: Complicated graph analyses often consist of multiple steps, some of which require graph-mutating operations. For example, one may want to create an undirected version of the graph, renumber the nodes in the graph, or remove repeated edges between nodes. PGX provides fast, parallel built-in implementations of such operations.

  • Browsing and exporting results: Once the analysis is finished, users can browse its results and export them to the file system.
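
To make the graph-mutating operations mentioned above more concrete, here is a minimal plain-Java sketch of two of them: removing repeated edges and creating an undirected version of a graph. It is not the PGX API. The class name GraphMutationSketch, the Edge type, and both method names are hypothetical and exist only for this illustration, and the code is sequential, whereas the PGX built-ins are described as parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical names, not part of the PGX API.
public class GraphMutationSketch {

    /** A directed edge with value-based equality, so duplicates can be detected. */
    public static final class Edge {
        final int src;
        final int dst;

        public Edge(int src, int dst) {
            this.src = src;
            this.dst = dst;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).src == src && ((Edge) o).dst == dst;
        }

        @Override
        public int hashCode() {
            return 31 * src + dst;
        }

        @Override
        public String toString() {
            return "(" + src + ", " + dst + ")";
        }
    }

    /** Removes repeated edges, keeping the first occurrence of each (src, dst) pair. */
    public static List<Edge> removeRepeatedEdges(List<Edge> edges) {
        return new ArrayList<>(new LinkedHashSet<>(edges));
    }

    /**
     * Builds an undirected version of the graph: each edge is stored once with the
     * smaller endpoint first, so (1, 0) and (0, 1) collapse into the same edge.
     */
    public static List<Edge> undirect(List<Edge> edges) {
        Set<Edge> unique = new LinkedHashSet<>();
        for (Edge e : edges) {
            unique.add(new Edge(Math.min(e.src, e.dst), Math.max(e.src, e.dst)));
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge(0, 1), new Edge(1, 0), new Edge(0, 1), new Edge(2, 1));
        System.out.println("Without repeated edges: " + removeRepeatedEdges(edges));
        System.out.println("Undirected version:     " + undirect(edges));
    }
}
```

Both methods return a new list and leave the input untouched, which keeps the original graph available for later analysis steps.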

What are the key benefits of PGX?

  • Fast, parallel, in-memory execution: PGX adopts light-weight in-memory data structures that allow fast execution of graph algorithms. Moreover, PGX exploits the multiple CPUs of modern computer systems by running parallelized graph algorithms. Both the built-in algorithms and custom graph algorithms are parallelized; custom algorithms are parallelized automatically by a DSL compiler.

  • Rich built-in algorithms: PGX provides built-in implementations of many popular graph algorithms, including computing various centrality measures, finding shortest paths, finding and evaluating clusters and components, and predicting future edges. (Note: The OTN public release contains only a small subset of these algorithms. See the documentation, and contact us if you want this limitation removed.)

  • Easy implementation and efficient execution of custom algorithms: PGX adopts the Green-Marl DSL so that custom algorithms are both easy to implement and efficient to execute. Users can program their own graph algorithms intuitively with the high-level, graph-specific data types and operators in Green-Marl. PGX executes a Green-Marl program efficiently by parallelizing it and mapping it onto the PGX-internal API.

  • Interactive shell: PGX provides a shell application with which users can exercise the PGX features interactively. That is, a user can simply start the shell and type commands at its command line instead of writing a whole Java application for the analysis.

  • Deploy as a web service: PGX ships with a web application that can be deployed in a container such as WebLogic, Jetty or Tomcat. This allows you to use the interactive shell and other APIs against a remote instance. You can deploy PGX on a server-class machine and have multiple clients share access to the resources of that machine.

  • Hadoop support: You can use PGX to analyze graphs on a Hadoop cluster. You can run PGX as a YARN application and connect to it from the interactive shell or other APIs. PGX also supports loading graphs from and storing graphs to HDFS.
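
As an illustration of the first benefit above (compact in-memory data structures combined with parallel execution), the following plain-Java sketch computes PageRank over a graph stored in flat integer arrays. It is not PGX code and makes no claim about how PGX is implemented: the class name ParallelPageRankSketch, the array layout, and all parameter names are assumptions made for this example. It also assumes every vertex has at least one outgoing edge.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustration only: hypothetical names and layout, not the PGX implementation.
public class ParallelPageRankSketch {

    /**
     * Pull-based PageRank over a compressed adjacency layout of incoming edges.
     *
     * @param inBegin    inBegin[v] .. inBegin[v + 1] is the range in inSrc that holds
     *                   the sources of all edges pointing to vertex v
     * @param inSrc      source vertices of the incoming edges, grouped by target
     * @param outDegree  number of outgoing edges per vertex (assumed to be at least 1)
     * @param damping    damping factor, commonly 0.85
     * @param iterations number of iterations to run
     */
    public static double[] pageRank(int[] inBegin, int[] inSrc, int[] outDegree,
                                    double damping, int iterations) {
        int n = outDegree.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int it = 0; it < iterations; it++) {
            final double[] prev = rank;
            // Every vertex only reads the previous ranks and produces its own new
            // value, so the vertices can be processed in parallel without locks.
            rank = IntStream.range(0, n).parallel().mapToDouble(v -> {
                double sum = 0.0;
                for (int i = inBegin[v]; i < inBegin[v + 1]; i++) {
                    int u = inSrc[i];
                    sum += prev[u] / outDegree[u];
                }
                return (1.0 - damping) / n + damping * sum;
            }).toArray();
        }
        return rank;
    }

    public static void main(String[] args) {
        // Edges: 0->1, 0->2, 1->2, 2->0
        int[] inBegin = {0, 1, 2, 4};
        int[] inSrc = {2, 0, 0, 1};
        int[] outDegree = {2, 1, 1};
        System.out.println(Arrays.toString(pageRank(inBegin, inSrc, outDegree, 0.85, 50)));
    }
}
```

The pull-based formulation is chosen here because each vertex writes only its own result, which makes the loop safe to parallelize without any synchronization.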

How can I use PGX? What does the PGX API look like?

PGX can be used in two ways.

  1. In a Java application: Since PGX is implemented as a set of Java classes, users can embed PGX into their Java application as a library. The application must, however, start up PGX appropriately before it invokes any PGX methods.

  2. Interactively from the shell: Users can also work with PGX as if it were a separate application by using the PGX shell. Once the shell is started, they can load graphs, invoke algorithms, and browse or export results with simple shell commands.

Remote usage: In both of the use cases above, you can use PGX either locally or remotely. In the remote case, you need to start PGX on a web server and provide the client with a hostname and port to connect to. If you use PGX locally, it simply spins up a local PGX instance on which you can work without any HTTP overhead.

Check our tutorials for more information on how to use PGX.


What is the license of PGX?

This version of PGX is released under the OTN license. Please see the documentation for more details about the OTN release and its limitations.

How can I install PGX on my system?

Please see the installation documentation, which explains how to install PGX.