Using the xGT Python interface

Introduction

xGT is a two-part tool: a client and a server. The xGT server is an engine for analyzing large-scale graph datasets; it typically runs on a powerful machine with a large amount of memory. The client is a thin Python library that can control the server from anywhere. This makes it easy to integrate xGT into existing data analysis pipelines, wherever they run.

Installing the xgt package

Trovares distributes xgt as a pip-installable package, available for download from our developer site. You can install it directly from the command line; the first command removes any previously installed version:

python -m pip uninstall xgt
python -m pip install http://developer.trovares.com/download/python/latest/xgt-latest.tar.gz
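
After installing, it can help to confirm that the package is visible to the Python interpreter you plan to use. The following is a minimal sketch that relies only on the standard library (Python 3.8 or later is assumed); the helper name package_status is hypothetical and is not part of xgt.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return (installed, version) for a top-level package name.

    Hypothetical helper, not part of xgt: it only inspects the local
    Python environment and never contacts a server.
    """
    if importlib.util.find_spec(name) is None:
        return (False, None)
    try:
        return (True, metadata.version(name))
    except metadata.PackageNotFoundError:
        # Importable, but not installed as a pip distribution
        # (for example, a standard-library module).
        return (True, None)


print(package_status('xgt'))
```

If the first element of the printed tuple is False, the install most likely went to a different interpreter than the one running the check.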

Using the xgt package

In a Python interpreter or script, run import xgt. The xgt library is a client interface that lets you talk to an xGT server, which may be running on the same machine or on a remote host. Connect by creating a Connection object that points to the host where your xGT server is running (the default is localhost).

import xgt
print('Client version: ' + xgt.__version__)

conn = xgt.Connection(host='127.0.0.1')

print('Server version: ' + conn.server_version)
print(str(conn.free_user_memory_size) + ' bytes available in xGT')
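
Because the client and server may sit on different machines, a failed connection is often a plain network problem. The sketch below checks whether anything accepts TCP connections on a given host and port before you create a Connection. It uses only the standard library; the helper name server_reachable is hypothetical, and the default port shown is an assumption that should be replaced with the port your xGT server actually listens on.

```python
import socket


def server_reachable(host='127.0.0.1', port=4367, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of xgt. The default port is an
    assumption; use whatever port your xGT server listens on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if server_reachable('127.0.0.1'):
    print('Something is listening; try xgt.Connection(host=...) next.')
else:
    print('No listener found; check the host, port, and firewall rules.')
```

A successful check only shows that some process is listening; it does not prove that the process is an xGT server.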

A complete example

The following script uses the xgt library to create a graph, load data into it, run a query, extract the results, and finally remove the graph.

The full API reference for the xgt package is available in the Python package manual.

import xgt

#-- Connect to xgtd --
conn = xgt.Connection()

#-- Define and create the graph --
emp = conn.create_vertex_frame(name   = 'Employee',
                               schema = [['PersonID', xgt.INT],
                                         ['Name', xgt.TEXT],
                                         ['PostalCode', xgt.INT]],
                               key    = 'PersonID')

rep = conn.create_edge_frame(name   = 'ReportsTo',
                             schema = [['EmpID', xgt.INT],
                                       ['BossID', xgt.INT],
                                       ['StartDate', xgt.DATE],
                                       ['EndDate', xgt.DATE]],
                             source = 'Employee',
                             target = 'Employee',
                             source_key = 'EmpID',
                             target_key = 'BossID')

#-- Load data into the graph in xgtd --
# Use the insert() method for a few hundred rows or fewer;
# for larger amounts of data, use the load() method with CSV files.
emp.insert(
  [[111111101, 'Manny', 98103],
   [111111102, 'Trish', 98108],
   [911111501, 'Frank', 98101],
   [911111502, 'Alice', 98102]
  ])
rep.insert(
  [[111111101, 911111501, '2015-01-03', '2017-04-14'],
   [111111102, 911111501, '2016-04-02', '2017-04-14'],
   [911111502, 911111501, '2016-07-07', '2017-04-14'],
   [111111101, 911111502, '2017-04-15', '3000-12-31'],
   [111111102, 911111502, '2017-04-15', '3000-12-31'],
   [911111501, 911111502, '2017-04-15', '3000-12-31']
  ])


#-- Query data --
cmd = '''
  MATCH
    (emp:Employee)-[edge1:ReportsTo]->
    (boss:Employee)-[edge2:ReportsTo]->
    (emp)
  WHERE
    edge1.EndDate <= edge2.StartDate
  RETURN
    emp.PersonID AS Employee1ID,
    boss.PersonID AS Employee2ID,
    edge1.StartDate AS FirstStart,
    edge1.EndDate AS FirstEnd,
    edge2.EndDate AS SecondEnd,
    edge2.StartDate AS SecondStart
  INTO
    Result1
  '''
# Remove any Result1 frame left over from an earlier run.
conn.drop_frame('Result1')
conn.run_job(cmd)

#-- Results extraction --
ncols = len(emp.schema)
nrows = emp.num_vertices
print('Employee columns: {0} rows: {1} '.format(ncols, nrows))

ncols = len(rep.schema)
nrows = rep.num_edges
print('ReportsTo columns: {0} rows: {1} '.format(ncols, nrows))

r1 = conn.get_table_frame('Result1')
ncols = len(r1.schema)
nrows = r1.num_rows
print('Result columns: {0} rows: {1} '.format(ncols, nrows))
print('')

print('--- Result1 ---')
# Fetch up to 100 rows, starting at offset 0.
r1dat = r1.get_data(0, 100)
for row in r1dat:
    print(', '.join([str(c) for c in row]))

#-- Drop all objects --
# The edge frame is dropped before the vertex frame it refers to.
conn.drop_frame('ReportsTo')
conn.drop_frame('Employee')
conn.drop_frame('Result1')
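
To make the intent of the query in the script above more concrete, here is a plain-Python sketch of the same pattern, run over the same ReportsTo rows without a server. It looks for two employees who each reported to the other, where the first reporting relationship ended on or before the day the second one began. This is only an illustration of the matching logic, not how xGT evaluates the query; the function name reciprocal_reports is hypothetical.

```python
from datetime import date

# Same ReportsTo rows as in the script: (EmpID, BossID, StartDate, EndDate).
reports_to = [
    (111111101, 911111501, date(2015, 1, 3), date(2017, 4, 14)),
    (111111102, 911111501, date(2016, 4, 2), date(2017, 4, 14)),
    (911111502, 911111501, date(2016, 7, 7), date(2017, 4, 14)),
    (111111101, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (111111102, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (911111501, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
]


def reciprocal_reports(edges):
    """Find edge pairs a->b and b->a where the first edge ended on or
    before the day the second edge started."""
    rows = []
    for i, (emp1, boss1, start1, end1) in enumerate(edges):
        for j, (emp2, boss2, start2, end2) in enumerate(edges):
            if i == j:
                continue  # one edge cannot fill both positions in the pattern
            if boss1 == emp2 and boss2 == emp1 and end1 <= start2:
                # Same column order as the RETURN clause in the query.
                rows.append((emp1, boss1, start1, end1, end2, start2))
    return rows


for row in reciprocal_reports(reports_to):
    print(', '.join(str(c) for c in row))
```

For the sample data, the sketch prints a single row: 911111502, 911111501, 2016-07-07, 2017-04-14, 3000-12-31, 2017-04-15. Alice reported to Frank until 2017-04-14, and Frank began reporting to Alice the next day.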