Using the xgt package

Introduction

xGT is a tool that loads very large data sets into RAM and runs fast pattern-search queries against them. It works best on data that describes relationships between objects. The xgt library is a Python module and is the preferred interface between a Python script and xGT.

Installing the Python xgt package

The xGT Python interface can be used in several ways; see the AWS documentation for details.

If you want to run Python scripts on a local machine, you need to install the xgt Python package on that client machine. Trovares distributes xgt as a pip package, available for download from our developer site. You can install it directly from the command line with pip; the first command removes any previously installed version of xgt:

python -m pip uninstall xgt
python -m pip install http://developer.trovares.com/download/python/awslive/xgt-awslive.tar.gz
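
After the install finishes, a quick check from Python can confirm that the package is importable by the interpreter you plan to use. The sketch below is a minimal example using only the standard library. The helper name is_installed is hypothetical, and the check only tests whether a module named xgt can be found; it does not test whether a server can be reached.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can locate a module with the given name."""
    return importlib.util.find_spec(module_name) is not None


# Report whether the xgt package is visible to this interpreter.
if is_installed("xgt"):
    print("xgt is installed")
else:
    print("xgt was not found; rerun the pip install command")
```

Run the check with the same python executable that you used for the pip commands, since each interpreter has its own set of installed packages.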

Using the xgt package

The following code is an example of a script that takes advantage of the xgt library to create a graph, load data to it, run a query, extract results, and finally remove the graph.

The full API reference for the xgt package is available in the Python package manual.

import xgt

#-- Connect to xgtd --
conn = xgt.Connection()

#-- Define and create the graph --
emp = conn.create_vertex_frame(name   = 'Employee',
                               schema = [['PersonID', xgt.INT],
                                         ['Name', xgt.TEXT],
                                         ['PostalCode', xgt.INT]],
                               key    = 'PersonID')

rep = conn.create_edge_frame(name   = 'ReportsTo',
                             schema = [['EmpID', xgt.INT],
                                       ['BossID', xgt.INT],
                                       ['StartDate', xgt.DATE],
                                       ['EndDate', xgt.DATE]],
                             source = 'Employee',
                             target = 'Employee',
                             source_key = 'EmpID',
                             target_key = 'BossID')

#-- Load data to the graph in xgtd --
# Use the insert() method for data of a few hundred rows or less;
# for bigger amounts of data, use the load() method with csv files.
emp.insert(
  [[111111101, 'Manny', 98103],
   [111111102, 'Trish', 98108],
   [911111501, 'Frank', 98101],
   [911111502, 'Alice', 98102]
  ])
rep.insert(
  [[111111101, 911111501, '2015-01-03', '2017-04-14'],
   [111111102, 911111501, '2016-04-02', '2017-04-14'],
   [911111502, 911111501, '2016-07-07', '2017-04-14'],
   [111111101, 911111502, '2017-04-15', '3000-12-31'],
   [111111102, 911111502, '2017-04-15', '3000-12-31'],
   [911111501, 911111502, '2017-04-15', '3000-12-31']
  ])


#-- Query data --
cmd = '''
  MATCH
    (emp:Employee)-[edge1:ReportsTo]->
    (boss:Employee)-[edge2:ReportsTo]->
    (emp)
  WHERE
    edge1.EndDate <= edge2.StartDate
  RETURN
    emp.PersonID AS Employee1ID,
    boss.PersonID AS Employee2ID,
    edge1.StartDate AS FirstStart,
    edge1.EndDate AS FirstEnd,
    edge2.EndDate AS SecondEnd,
    edge2.StartDate AS SecondStart
  INTO
    Result1
  '''
# Drop any Result1 frame left over from an earlier run before the query recreates it.
conn.drop_frame('Result1')
conn.run_job(cmd)

#-- Results extraction --
ncols = len(emp.schema)
nrows = emp.num_vertices
print('Employee columns: {0} rows: {1} '.format(ncols, nrows))

ncols = len(rep.schema)
nrows = rep.num_edges
print('ReportsTo columns: {0} rows: {1} '.format(ncols, nrows))

r1 = conn.get_table_frame('Result1')
ncols = len(r1.schema)
nrows = r1.num_rows
print('Result columns: {0} rows: {1} '.format(ncols, nrows))
print('')

print('--- Result1 ---')
r1dat = r1.get_data(0, 100)
for row in r1dat:
    print(', '.join([str(c) for c in row]))

#-- Drop all objects --
conn.drop_frame('ReportsTo')
conn.drop_frame('Employee')
conn.drop_frame('Result1')
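
To make the query's logic concrete, the following sketch reproduces the same pattern in plain Python, without xGT, over the sample ReportsTo rows from the script above. It is an illustration under stated assumptions, not a description of how xGT evaluates queries: the function name find_role_reversals is hypothetical, dates are compared as datetime.date values, and the nested loop is a brute-force stand-in for the MATCH clause.

```python
from datetime import date

# Sample ReportsTo rows from the script: (EmpID, BossID, StartDate, EndDate).
REPORTS_TO = [
    (111111101, 911111501, date(2015, 1, 3), date(2017, 4, 14)),
    (111111102, 911111501, date(2016, 4, 2), date(2017, 4, 14)),
    (911111502, 911111501, date(2016, 7, 7), date(2017, 4, 14)),
    (111111101, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (111111102, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (911111501, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
]


def find_role_reversals(edges):
    """Find pairs of edges emp->boss and boss->emp where the first ends
    on or before the day the second starts (hypothetical helper)."""
    rows = []
    for i, (emp, boss, start1, end1) in enumerate(edges):
        for j, (src2, dst2, start2, end2) in enumerate(edges):
            if i == j:
                continue  # assumption: edge1 and edge2 must be different edges
            # Second edge must run back from boss to emp (closing the cycle),
            # and the WHERE condition edge1.EndDate <= edge2.StartDate must hold.
            if src2 == boss and dst2 == emp and end1 <= start2:
                # Same column order as the RETURN clause in the query.
                rows.append((emp, boss, start1, end1, end2, start2))
    return rows


for row in find_role_reversals(REPORTS_TO):
    print(", ".join(str(c) for c in row))
```

On the sample data the sketch finds one row: employee 911111502 reported to 911111501 until 2017-04-14, and from 2017-04-15 the reporting direction is reversed.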