Using the xgt package
Introduction
xGT is a tool for loading massive amounts of data into RAM and running fast
pattern-search operations on it.
This analytic approach works best on data that describes relationships
between data objects.
The xgt library is a module written in Python and is the preferred interface
between a Python script and xGT.
Installing the Python xgt package
The xGT Python interface can be used in a variety of ways. (Check the AWS documentation for more info.)
To run Python scripts on a local machine, you need to install the xgt Python package on that client machine.
Trovares distributes xgt as a pip package, available for download from our developer site.
You can install it directly from the command line with pip. The first command below removes any previously installed version:
python -m pip uninstall xgt
python -m pip install http://developer.trovares.com/download/python/awslive/xgt-awslive.tar.gz
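After installing, it can be useful to confirm that the interpreter you plan to use can actually find the package. The following is a minimal sketch using only the Python standard library; the helper name is_importable is hypothetical and is not part of xgt.

```python
import importlib.util


def is_importable(module_name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Reports whether the xgt package is visible to this Python installation.
    print("xgt importable:", is_importable("xgt"))
```

If this reports False, the package was probably installed for a different Python interpreter than the one running the script; running pip as python -m pip, as in the commands above, avoids that mismatch.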
Using the xgt package
The following example script uses the xgt library to create a graph, load
data into it, run a query, extract the results, and finally remove the graph.
import xgt
#-- Define the graph in python --
ng = xgt.Graph('Company')
v1 = xgt.Vertex(name = 'Employee',
                schema = [['PersonID', xgt.INT],
                          ['Name', xgt.TEXT],
                          ['PostalCode', xgt.INT]],
                key = ['PersonID'])
e1 = xgt.Edge(name = 'ReportsTo',
              schema = [['EmpID', xgt.INT],
                        ['BossID', xgt.INT],
                        ['StartDate', xgt.DATE],
                        ['EndDate', xgt.DATE]],
              source = [['EmpID', v1.key.PersonID]],
              target = [['BossID', v1.key.PersonID]])
ng.add(v1).add(e1)
#-- Connect to the xGT server --
conn = xgt.connect()
#-- Create the graph --
# Drop any earlier 'Company' graph first so the script can be rerun.
conn.drop_graph('Company')
conn.create(ng)
#-- Load data into the graph --
cg = conn.get_graph('Company')
emp = cg.vertices.Employee
emp.insert(
  [
    [111111101, 'Manny', 98103],
    [111111102, 'Trish', 98108],
    [911111501, 'Frank', 98101],
    [911111502, 'Alice', 98102]
  ])
rep = cg.edges.ReportsTo
rep.insert(
  [
    [111111101, 911111501, '2015-01-03', '2017-04-14'],
    [111111102, 911111501, '2016-04-02', '2017-04-14'],
    [911111502, 911111501, '2016-07-07', '2017-04-14'],
    [111111101, 911111502, '2017-04-15', '3000-12-31'],
    [111111102, 911111502, '2017-04-15', '3000-12-31'],
    [911111501, 911111502, '2017-04-15', '3000-12-31']
  ])
#-- Query data --
conn.drop_table('Result1')
cmd = """
  MATCH
    (emp:Employee)-[edge1:ReportsTo]->
    (boss:Employee)-[edge2:ReportsTo]->
    (emp)
  WHERE
    edge1.EndDate <= edge2.StartDate
  RETURN
    emp.Name AS EmpName,
    emp.PersonID AS Employee1ID,
    boss.PersonID AS Employee2ID,
    edge1.StartDate AS FirstStart,
    edge1.EndDate AS FirstEnd,
    edge2.EndDate AS SecondEnd,
    edge2.StartDate AS SecondStart
  INTO
    Result1
"""
conn.run_job(cmd)
#-- Create a table --
conn.drop_table('Table01')
nt = xgt.Table(name = 'Table01',
               schema = [['col1', xgt.INT],
                         ['col2', xgt.TEXT],
                         ['col3', xgt.DATE]])
conn.create(nt)
r3 = conn.get_table('Table01')
#-- Results extraction --
r1 = conn.get_table('Result1')
r1dat = r1.get_data(0, 100)
for row in r1dat:
  # Quote string values and convert everything else with str().
  r = [('"' + c + '"' if isinstance(c, str) else str(c)) for c in row]
  print(', '.join(r))
print('')
#-- Drop all objects --
conn.drop_graph('Company')
conn.drop_table('Result1')
conn.drop_table('Table01')
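
To make the intent of the MATCH query easier to follow, here is a rough pure-Python sketch of the same pattern: an employee reports to a boss, and that boss later reports back to the same employee. It runs without an xGT server and is only an illustration of the pattern, not a description of how xGT evaluates queries. The names REPORTS_TO, NAMES, find_role_reversals, and format_row are hypothetical helpers introduced here. The sketch also assumes dates are kept as ISO 'YYYY-MM-DD' strings, which compare correctly as plain text.

```python
# Same ReportsTo rows as the script above: (EmpID, BossID, StartDate, EndDate).
REPORTS_TO = [
    (111111101, 911111501, '2015-01-03', '2017-04-14'),
    (111111102, 911111501, '2016-04-02', '2017-04-14'),
    (911111502, 911111501, '2016-07-07', '2017-04-14'),
    (111111101, 911111502, '2017-04-15', '3000-12-31'),
    (111111102, 911111502, '2017-04-15', '3000-12-31'),
    (911111501, 911111502, '2017-04-15', '3000-12-31'),
]

# PersonID -> Name, taken from the Employee rows above.
NAMES = {
    111111101: 'Manny',
    111111102: 'Trish',
    911111501: 'Frank',
    911111502: 'Alice',
}


def find_role_reversals(edges, names):
    """Return one row per (edge1, edge2) pair forming emp -> boss -> emp,
    where edge1 ended no later than edge2 started."""
    rows = []
    for emp, boss, start1, end1 in edges:        # edge1: emp reports to boss
        for src, dst, start2, end2 in edges:     # edge2: boss reports to emp
            if src == boss and dst == emp and end1 <= start2:
                # Column order follows the RETURN clause of the query.
                rows.append([names[emp], emp, boss, start1, end1, end2, start2])
    return rows


def format_row(row):
    """Quote strings and join the values with commas, as in the extraction loop."""
    return ', '.join('"' + c + '"' if isinstance(c, str) else str(c) for c in row)


if __name__ == "__main__":
    for row in find_role_reversals(REPORTS_TO, NAMES):
        print(format_row(row))
```

On the sample data this sketch finds a single pair: Alice (911111502) reported to Frank (911111501) until 2017-04-14, and Frank has reported to Alice since 2017-04-15.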