3.3. Using the Python Library
The xGT client is a Python library, xgt, that connects to a running xGT server.
This section gives a brief overview of using xgt to perform graph analytics with xGT.
Section Graph Thinking provides a tutorial on graph analysis with xGT, including how to think of and model your data as a graph.
The first step is to connect to the server, which returns a Connection object. This object is used to drive the server to perform graph analysis.
To learn more about connecting to a server, see Connecting to a Server:
import xgt
server = xgt.Connection(host = 'localhost')
The second step is to create a new graph by creating one or more vertex and edge frames. The example below defines an empty graph on the server:
vertex_frame = server.create_vertex_frame(name = 'VertexFrame',
                                          schema = [['id', xgt.INT]],
                                          key = 'id')
edge_frame = server.create_edge_frame(name = 'EdgeFrame',
                                      source = 'VertexFrame',
                                      target = 'VertexFrame',
                                      schema = [['source_id', xgt.INT],
                                                ['target_id', xgt.INT],
                                                ['value', xgt.TEXT]],
                                      source_key = 'source_id',
                                      target_key = 'target_id')
In this example, vertex_frame and edge_frame are frame proxy objects that can be used to load data into the graph and obtain basic information about it.
To read more about frame types in xGT as well as the API for creating and modifying frames, see Graph Data Model: Frames and Namespaces and Frame Management.
The third step is to load data into the graph. Data Movement discusses loading data in detail.
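As a rough illustration of what this step involves, the sketch below builds rows that match the two schemas defined above from a small in-memory edge list. The helper build_rows and the sample data are hypothetical and not part of xgt. The commented-out insert calls assume that the frame proxies accept a list of rows; check that against Data Movement before relying on it.

```python
def build_rows(edges):
    """Split (source_id, target_id, value) triples into vertex and edge rows.

    Hypothetical helper for illustration; it is not part of xgt.
    """
    # One row per distinct vertex id, matching the schema [['id', xgt.INT]].
    vertex_ids = sorted({v for s, t, _ in edges for v in (s, t)})
    vertex_rows = [[v] for v in vertex_ids]
    # One row per edge, in the column order source_id, target_id, value.
    edge_rows = [[s, t, value] for s, t, value in edges]
    return vertex_rows, edge_rows


sample_edges = [(1, 3, 'x'), (2, 3, 'x'), (4, 3, 'y')]
vertex_rows, edge_rows = build_rows(sample_edges)

# Assumption: the frame proxies expose an insert method taking a list of rows.
# vertex_frame.insert(vertex_rows)
# edge_frame.insert(edge_rows)
```

Vertices are derived from the edge list here so that every source_id and target_id refers to an existing key in VertexFrame.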
The fourth step is to run queries to search for graph patterns. Queries are Python strings following the Trovares Query Language (TQL) syntax:
query = """
MATCH (a)-[e1:EdgeFrame]->(b)<-[e2:EdgeFrame]-(c)
WHERE e1.value = e2.value
RETURN COUNT(*)
"""
job = server.run_job(query)
For more information on running queries, see Job Management and History. Sections The Trovares Query Language (TQL) and TQL for Cypher Users discuss TQL.
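To make the pattern in the query above concrete, the following sketch counts the same thing in plain Python, without a server: pairs of edges that point at the same target vertex and carry the same value. It assumes that e1 and e2 must be distinct edges, as in Cypher; the text above does not confirm this for TQL. The function count_matching_pairs and the sample data are hypothetical.

```python
from collections import Counter


def count_matching_pairs(edges):
    """Count ordered pairs of distinct edges that share a target and a value.

    Each edge is a (source_id, target_id, value) triple.
    """
    # Group edges by (target, value); only edges in the same group can match.
    groups = Counter((target, value) for _, target, value in edges)
    # A group of n edges yields n * (n - 1) ordered pairs (e1, e2) with e1 != e2.
    return sum(n * (n - 1) for n in groups.values())


sample_edges = [(1, 3, 'x'), (2, 3, 'x'), (4, 3, 'y'), (1, 5, 'x')]
pair_count = count_matching_pairs(sample_edges)
```

In the sample, only the two edges into vertex 3 with value 'x' match each other, and they are counted once in each order.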
3.3.1. Multiprocessing
There are some limitations on using multiple processes in Python with xgt. Within any given process, xgt requires that the connection be created only after that process has finished forking. Creating a connection and then forking will not work correctly. This leaves two common scenarios that work:
1. Fork processes and create a connection within each newly forked process (so long as those processes do not fork again).
2. Fork processes from the parent process and then create a connection in the parent.
An example of how to use multiprocessing with xgt:
from multiprocessing import Process
import xgt

def process(query):
    # Connect inside the child, after the fork has happened.
    server = xgt.Connection(host = 'localhost')
    job = server.run_job(query)

query = """
MATCH (a)-[e1:EdgeFrame]->(b)<-[e2:EdgeFrame]-(c)
WHERE e1.value = e2.value
RETURN COUNT(*) INTO result
"""

p = Process(target = process, args = (query,))
p.start()
# The parent connects only after it has forked the child.
server = xgt.Connection(host = 'localhost')
p.join()
result = server.get_table_frame('result')
An example of an incorrect way to use multiprocessing:
from multiprocessing import Process
import xgt
# Incorrect: the connection is created before the fork and then used in the child.
server = xgt.Connection(host = 'localhost')
def process(query):
    job = server.run_job(query)
query = """
MATCH (a)-[e1:EdgeFrame]->(b)<-[e2:EdgeFrame]-(c)
WHERE e1.value = e2.value
RETURN COUNT(*) INTO result
"""
p = Process(target = process, args = (query,))
p.start()
p.join()
result = server.get_table_frame('result')
These limitations only apply to libraries that use fork. Libraries that use multithreading work as expected.
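A thread-based alternative might look like the sketch below, which runs several queries through a thread pool from the standard library. It assumes that one connection object may be shared between threads; the text above only says that multithreading works as expected. The helper run_jobs_threaded and the class StubConnection are hypothetical: the stub stands in for xgt.Connection so that the sketch runs without a server.

```python
from concurrent.futures import ThreadPoolExecutor


def run_jobs_threaded(server, queries, max_workers=4):
    """Call server.run_job on each query from a thread pool.

    Results are returned in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(server.run_job, queries))


class StubConnection:
    """Hypothetical stand-in for xgt.Connection, used only for illustration."""

    def run_job(self, query):
        return 'ran: ' + query.strip()


jobs = run_jobs_threaded(StubConnection(), ['RETURN 1', 'RETURN 2'])
```

With a real server, the stub would be replaced by a connection created with xgt.Connection(host = 'localhost').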