Using the xgt package
Introduction
xGT is a tool for reading massive amounts of data into RAM and running fast pattern search operations on it.
This analytic approach works best when the data describes relationships between data objects.
The xgt library is a Python module and is the preferred interface between a Python script and xGT.
Installing the Python xgt package
The xGT Python interface can be used in a variety of ways; see the AWS documentation for more information.
To run Python scripts on a local machine, install the xgt Python package on your client.
Trovares distributes xgt as a pip package, available for download from our developer site.
You can install it directly with pip. The first command below removes any previously installed copy:
python -m pip uninstall xgt
python -m pip install http://developer.trovares.com/download/python/awslive/xgt-awslive.tar.gz
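After the install finishes, it can be useful to confirm that the package is visible to the interpreter you plan to run scripts with. The following is a minimal sketch using only the Python standard library (Python 3.8 or later is assumed for importlib.metadata). The helper name describe_package is hypothetical and is not part of xgt.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Return 'name version' if the package is importable, else None.

    Hypothetical helper for illustration; not part of the xgt package.
    """
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return "{0} {1}".format(name, metadata.version(name))
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. stdlib modules).
        return "{0} (version unknown)".format(name)


status = describe_package("xgt")
print(status if status else "xgt is not installed in this environment")
```

If the message says the package is not installed, check that pip and the script are using the same Python interpreter.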
Using the xgt package
The following script uses the xgt library to create a graph, load data into it, run a query, extract the results, and finally remove the graph.
The full API reference for the xgt package is available in the Python package manual.
import xgt
#-- Connect to xgtd --
conn = xgt.Connection()
#-- Define and create the graph --
emp = conn.create_vertex_frame(
    name='Employee',
    schema=[['PersonID', xgt.INT],
            ['Name', xgt.TEXT],
            ['PostalCode', xgt.INT]],
    key='PersonID')
rep = conn.create_edge_frame(
    name='ReportsTo',
    schema=[['EmpID', xgt.INT],
            ['BossID', xgt.INT],
            ['StartDate', xgt.DATE],
            ['EndDate', xgt.DATE]],
    source='Employee',
    target='Employee',
    source_key='EmpID',
    target_key='BossID')
#-- Load data to the graph in xgtd --
# Use the insert() method for a few hundred rows or fewer;
# for larger amounts of data, use the load() method with CSV files.
emp.insert([
    [111111101, 'Manny', 98103],
    [111111102, 'Trish', 98108],
    [911111501, 'Frank', 98101],
    [911111502, 'Alice', 98102],
])
rep.insert([
    [111111101, 911111501, '2015-01-03', '2017-04-14'],
    [111111102, 911111501, '2016-04-02', '2017-04-14'],
    [911111502, 911111501, '2016-07-07', '2017-04-14'],
    [111111101, 911111502, '2017-04-15', '3000-12-31'],
    [111111102, 911111502, '2017-04-15', '3000-12-31'],
    [911111501, 911111502, '2017-04-15', '3000-12-31'],
])
#-- Query data --
cmd = '''
  MATCH
    (emp:Employee)-[edge1:ReportsTo]->
    (boss:Employee)-[edge2:ReportsTo]->
    (emp)
  WHERE
    edge1.EndDate <= edge2.StartDate
  RETURN
    emp.PersonID AS Employee1ID,
    boss.PersonID AS Employee2ID,
    edge1.StartDate AS FirstStart,
    edge1.EndDate AS FirstEnd,
    edge2.EndDate AS SecondEnd,
    edge2.StartDate AS SecondStart
  INTO
    Result1
'''
# Drop any Result1 frame left over from an earlier run before the query recreates it.
conn.drop_frame('Result1')
conn.run_job(cmd)
#-- Results extraction --
ncols = len(emp.schema)
nrows = emp.num_vertices
print('Employee columns: {0} rows: {1} '.format(ncols, nrows))
ncols = len(rep.schema)
nrows = rep.num_edges
print('ReportsTo columns: {0} rows: {1} '.format(ncols, nrows))
r1 = conn.get_table_frame('Result1')
ncols = len(r1.schema)
nrows = r1.num_rows
print('Result columns: {0} rows: {1} '.format(ncols, nrows))
print('')
print('--- Result1 ---')
r1dat = r1.get_data(0, 100)
for row in r1dat:
    print(', '.join([str(c) for c in row]))
#-- Drop all objects --
conn.drop_frame('ReportsTo')
conn.drop_frame('Employee')
conn.drop_frame('Result1')
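To see what the query in the script is looking for, the same pattern can be checked by hand against the sample ReportsTo rows. The sketch below is plain Python and does not use xgt or a running server. It emulates the MATCH and WHERE clauses with nested loops: find an employee who reported to a boss, where that boss later reported to the same employee. The function name find_role_swaps is hypothetical, and the way xGT itself formats returned values (dates in particular) may differ from the strings used here.

```python
from datetime import date

# Same rows as the rep.insert() call: (EmpID, BossID, StartDate, EndDate).
reports_to = [
    (111111101, 911111501, '2015-01-03', '2017-04-14'),
    (111111102, 911111501, '2016-04-02', '2017-04-14'),
    (911111502, 911111501, '2016-07-07', '2017-04-14'),
    (111111101, 911111502, '2017-04-15', '3000-12-31'),
    (111111102, 911111502, '2017-04-15', '3000-12-31'),
    (911111501, 911111502, '2017-04-15', '3000-12-31'),
]


def find_role_swaps(edges):
    """Emulate the query's MATCH/WHERE clauses with nested loops.

    Hypothetical helper for illustration; not part of the xgt package.
    """
    rows = []
    for emp, boss, start1, end1 in edges:          # edge1: emp -> boss
        for src2, dst2, start2, end2 in edges:     # edge2: boss -> emp
            # edge2 must lead from boss back to emp, closing the cycle.
            if src2 != boss or dst2 != emp:
                continue
            # WHERE edge1.EndDate <= edge2.StartDate
            if date.fromisoformat(end1) <= date.fromisoformat(start2):
                # Column order follows the RETURN clause of the query.
                rows.append((emp, boss, start1, end1, end2, start2))
    return rows


for row in find_role_swaps(reports_to):
    print(', '.join(str(c) for c in row))
```

On the sample data, this sketch finds a single match: employee 911111502 reported to 911111501 until 2017-04-14, and 911111501 began reporting to 911111502 on 2017-04-15. The reverse pairing is rejected because the second relationship ends on 3000-12-31, which is not on or before 2016-07-07.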