The Python interface to the Trovares xGT graph analytics engine.

Main Features

Data loading

xGT is a strongly-typed graph system. Loading data is a two-step process:

  1. Describe the structure and data types of your graph.
Define the vertex and edge frame structure with Connection.create_vertex_frame() and Connection.create_edge_frame(). Once the type structure is set, VertexFrame and EdgeFrame objects provide access to the server-side structures.
  2. Load your edge and vertex data.
The VertexFrame and EdgeFrame objects provide high-performance, parallel load() methods to ingest data, as well as a direct insert() method to add small amounts of data piecewise.
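
As a hedged illustration of preparing input for the bulk path, the sketch below writes rows to a CSV file using only the Python standard library. The helper name write_rows_csv and the file name employees.csv are hypothetical, and the column order is assumed to match the frame's schema. The commented-out load() call at the end is shown without verified arguments, so check the xGT reference before relying on it.

```python
import csv
import os
import tempfile


def write_rows_csv(path, rows):
    """Write rows (lists of values) to a CSV file and return the row count.

    Hypothetical helper, not part of xgt. Columns are assumed to be in the
    same order as the schema of the frame the file will be loaded into.
    """
    count = 0
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count


# Sample rows laid out as PersonID, Name, PostalCode.
employees = [
    [111111101, 'Manny', 98103],
    [111111102, 'Trish', 98108],
]

csv_path = os.path.join(tempfile.mkdtemp(), 'employees.csv')
written = write_rows_csv(csv_path, employees)
print('wrote {0} rows to {1}'.format(written, csv_path))

# Assumption: with a live connection and an existing frame, the file would
# then be handed to the frame's load() method, for example:
#   emp.load(csv_path)
```

Writing the file with the csv module rather than by string concatenation keeps quoting correct if a text column ever contains a comma.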

Query processing

Queries are expressed as strings written in TQL.

>>> query = '''
      MATCH  (emp:Employee)-[edge1:ReportsTo]->(boss:Employee)
      RETURN emp.PersonID AS EmployeeID,
             boss.PersonID AS BossID
      INTO   ResultTable
      '''

A query runs in the context of a Job, which can be run, scheduled, and canceled. The run_job() method runs the query and blocks until the job finishes successfully, terminates with an error, or is canceled.

Example

The following Python script creates a graph, loads data into it, runs a query, accesses the results, and finally removes the graph from the system.

import xgt

#-- Connect to xgtd --
conn = xgt.Connection()

#-- Define and create the graph --
emp = conn.create_vertex_frame(
        name = 'Employee',
        schema = [['PersonID', xgt.INT],
                  ['Name', xgt.TEXT],
                  ['PostalCode', xgt.INT]],
        key = 'PersonID')

rep = conn.create_edge_frame(
        name = 'ReportsTo',
        schema = [['EmpID', xgt.INT],
                  ['BossID', xgt.INT],
                  ['StartDate', xgt.DATE],
                  ['EndDate', xgt.DATE]],
        source = 'Employee',
        target = 'Employee',
        source_key = 'EmpID',
        target_key = 'BossID')

#-- Load data to the graph in xgtd --
# Use the insert() method for data of a few hundred rows or fewer;
# for larger amounts of data, use the load() method with CSV files.
emp.insert(
  [[111111101, 'Manny', 98103],
   [111111102, 'Trish', 98108],
   [911111501, 'Frank', 98101],
   [911111502, 'Alice', 98102]
  ])
rep.insert(
  [[111111101, 911111501, '2015-01-03', '2017-04-14'],
   [111111102, 911111501, '2016-04-02', '2017-04-14'],
   [911111502, 911111501, '2016-07-07', '2017-04-14'],
   [111111101, 911111502, '2017-04-15', '3000-12-31'],
   [111111102, 911111502, '2017-04-15', '3000-12-31'],
   [911111501, 911111502, '2017-04-15', '3000-12-31']
  ])

#-- Query data --
# Drop any Result1 table left over from a previous run.
conn.drop_frame('Result1')
cmd = '''
  MATCH
    (emp:Employee)-[edge1:ReportsTo]->
    (boss:Employee)-[edge2:ReportsTo]->
    (emp)
  WHERE
    edge1.EndDate <= edge2.StartDate
  RETURN
    emp.PersonID AS Employee1ID,
    boss.PersonID AS Employee2ID,
    edge1.StartDate AS FirstStart,
    edge1.EndDate AS FirstEnd,
    edge2.EndDate AS SecondEnd,
    edge2.StartDate AS SecondStart
  INTO
    Result1
  '''
conn.run_job(cmd)

#-- Results extraction --
ncols = len(emp.schema)
nrows = emp.num_vertices
print('Employee columns: {0} rows: {1} '.format(ncols, nrows))

ncols = len(rep.schema)
nrows = rep.num_edges
print('ReportsTo columns: {0} rows: {1} '.format(ncols, nrows))

r1 = conn.get_table_frame('Result1')
ncols = len(r1.schema)
nrows = r1.num_rows
print('Result columns: {0} rows: {1} '.format(ncols, nrows))
print('')

print('--- Result1 ---')
r1dat = r1.get_data(0, 100)
for row in r1dat:
  print(', '.join([str(c) for c in row]))
print('')

#-- Drop all objects --
conn.drop_frame('ReportsTo')
conn.drop_frame('Employee')
conn.drop_frame('Result1')
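
To make the query's semantics concrete without a running server, the following plain-Python sketch reproduces what the MATCH pattern and WHERE clause select from the sample edges: two-step cycles from emp to boss and back to emp in which the first edge ends no later than the second one starts. The function name find_reporting_swaps is hypothetical, and the nested loop only illustrates the matching logic; it is not how xGT evaluates queries. It also assumes that a single edge is never bound to both edge1 and edge2.

```python
from datetime import date

# ReportsTo edges from the example: (EmpID, BossID, StartDate, EndDate).
REPORTS_TO = [
    (111111101, 911111501, '2015-01-03', '2017-04-14'),
    (111111102, 911111501, '2016-04-02', '2017-04-14'),
    (911111502, 911111501, '2016-07-07', '2017-04-14'),
    (111111101, 911111502, '2017-04-15', '3000-12-31'),
    (111111102, 911111502, '2017-04-15', '3000-12-31'),
    (911111501, 911111502, '2017-04-15', '3000-12-31'),
]


def find_reporting_swaps(edges):
    """Return rows shaped like the query's RETURN clause.

    Hypothetical helper: matches emp -> boss -> emp cycles where
    edge1.EndDate <= edge2.StartDate.
    """
    rows = []
    for i, (emp, boss, start1, end1) in enumerate(edges):
        for j, (src2, tgt2, start2, end2) in enumerate(edges):
            # Assumption: the same edge is not matched twice in one pattern.
            if i == j:
                continue
            # edge2 must lead from boss back to emp.
            if src2 != boss or tgt2 != emp:
                continue
            if date.fromisoformat(end1) <= date.fromisoformat(start2):
                # Column order follows the RETURN clause of the query.
                rows.append((emp, boss, start1, end1, end2, start2))
    return rows


swaps = find_reporting_swaps(REPORTS_TO)
for row in swaps:
    print(', '.join(str(c) for c in row))
```

On the sample data this selects a single row: employee 911111502 reported to 911111501 until 2017-04-14, after which the reporting relationship reversed.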