Using the xGT Python interface
Introduction
xGT is a two-part tool: a client and a server. The xGT server is an engine for analyzing large-scale graph datasets; it typically runs on a powerful machine with a large amount of memory. The client is a thin Python library that can control the server from anywhere, which makes it easy to integrate xGT into existing data analysis pipelines regardless of where they run.
Installing the xgt package
Trovares distributes xgt as a pip package, available to download from our developer site.
You can install it directly from the command line:
python -m pip install --upgrade xgt
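To confirm that the install succeeded before writing any client code, a small check like the one below can help. It is a minimal sketch that uses only the Python standard library; the helper name installed_version is made up for this example and is not part of xgt.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent.

    Hypothetical helper for this sketch; it assumes the import name and the
    pip distribution name are the same, as they are for xgt.
    """
    # find_spec() locates a top-level module without importing it.
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version('xgt')
if version is None:
    print('xgt is not installed; run: python -m pip install --upgrade xgt')
else:
    print('xgt ' + version + ' is installed')
```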
Using the xgt package
In a Python interpreter or script, simply run import xgt.
The xgt library is a client interface that lets you talk to an xGT server running somewhere else.
Connect by creating a Connection object pointing to the host where your xGT server is running (by default, this is localhost).
import xgt
print('Client version: ' + xgt.__version__)
conn = xgt.Connection(host='127.0.0.1')
print('Server version: ' + conn.server_version)
print(str(conn.free_user_memory_size) + ' bytes available in xGT')
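Because the client can run on a different machine from the server, it is often convenient to keep the host name out of the script itself. The sketch below reads it from an environment variable and falls back to the local address used above. The variable name XGT_HOST and the helper names resolve_host and connect are assumptions made for this example; the xgt package does not read that variable on its own.

```python
import os


def resolve_host(default='127.0.0.1'):
    """Return the server host from the environment, or the local default.

    XGT_HOST is a hypothetical variable name chosen for this sketch.
    """
    return os.environ.get('XGT_HOST') or default


def connect():
    """Open a connection to whichever host resolve_host() selects."""
    # Imported here so resolve_host() can be used without the package installed.
    import xgt
    return xgt.Connection(host=resolve_host())


print('xGT host: ' + resolve_host())
```

Calling connect() then behaves like the xgt.Connection(host='127.0.0.1') call above, except that the target can be changed per environment without editing the script.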
A complete example
The following script uses the xgt library to create a graph, load data into it, run a query, extract the results, and finally remove the graph.
The full API reference for the xgt package is available in the Python package manual.
import xgt
# Connect to xgtd
conn = xgt.Connection()
# Define and create the graph
employees = conn.create_vertex_frame(
    name='Employees',
    schema=[['person_id', xgt.INT],
            ['name', xgt.TEXT],
            ['postal_code', xgt.INT]],
    key='person_id')
reports_to = conn.create_edge_frame(
    name='ReportsTo',
    schema=[['employee_id', xgt.INT],
            ['boss_id', xgt.INT],
            ['start_date', xgt.DATE],
            ['end_date', xgt.DATE]],
    source=employees,
    target=employees,
    source_key='employee_id',
    target_key='boss_id')
# Load data into the graph in xgtd
# Use the insert() method for a few hundred rows or fewer;
# for larger amounts of data, use the load() method with CSV files.
employees.insert(
    [[111111101, 'Manny', 98103],
     [111111102, 'Trish', 98108],
     [911111501, 'Frank', 98101],
     [911111502, 'Alice', 98102]
    ])
reports_to.insert(
    [[111111101, 911111501, '2015-01-03', '2017-04-14'],
     [111111102, 911111501, '2016-04-02', '2017-04-14'],
     [911111502, 911111501, '2016-07-07', '2017-04-14'],
     [111111101, 911111502, '2017-04-15', '3000-12-31'],
     [111111102, 911111502, '2017-04-15', '3000-12-31'],
     [911111501, 911111502, '2017-04-15', '3000-12-31']
    ])
# Query data
cmd = '''
  MATCH
    (employee:Employees)-[edge1:ReportsTo]->
    (boss:Employees)-[edge2:ReportsTo]->
    (employee)
  WHERE
    edge1.end_date <= edge2.start_date
  RETURN
    employee.person_id AS employee1_id,
    boss.person_id AS employee2_id,
    edge1.start_date AS start1,
    edge1.end_date AS end1,
    edge2.start_date AS start2,
    edge2.end_date AS end2
  INTO
    Results
'''
# Remove any Results frame left over from a previous run
conn.drop_frame('Results')
conn.run_job(cmd)
results = conn.get_table_frame('Results')
# Results extraction
for frame in [employees, reports_to, results]:
    print('{name}: {n_cols} columns, {n_rows} rows'.format(
        name=frame.name,
        n_cols=len(frame.schema),
        n_rows=frame.num_rows))
print('\n--- Results ---')
for row in results.get_data(0, 100):
    print(', '.join([str(c) for c in row]))
# Drop all objects
for frame in [reports_to, employees, results]:
    conn.drop_frame(frame)
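To see what the query in the script is looking for, it can help to reproduce its logic on the same ReportsTo rows in plain Python, with no server involved. The sketch below is only an illustration of the pattern (an employee reported to a boss, and that boss later reported to the same employee); the function name mutual_reports is made up for this example, and the nested loop is not how xGT evaluates the query.

```python
from datetime import date

# Same rows as the ReportsTo insert above: (employee_id, boss_id, start, end)
reports_to_rows = [
    (111111101, 911111501, date(2015, 1, 3), date(2017, 4, 14)),
    (111111102, 911111501, date(2016, 4, 2), date(2017, 4, 14)),
    (911111502, 911111501, date(2016, 7, 7), date(2017, 4, 14)),
    (111111101, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (111111102, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
    (911111501, 911111502, date(2017, 4, 15), date(3000, 12, 31)),
]


def mutual_reports(rows):
    """Find pairs where edge1 (employee -> boss) ended no later than
    edge2 (boss -> employee) started."""
    matches = []
    for emp, boss, start1, end1 in rows:        # candidate edge1
        for src, dst, start2, end2 in rows:     # candidate edge2
            # edge2 must run back from the boss to the original employee
            if src == boss and dst == emp and end1 <= start2:
                matches.append((emp, boss, start1, end1, start2, end2))
    return matches


for row in mutual_reports(reports_to_rows):
    print(', '.join(str(c) for c in row))
# prints: 911111502, 911111501, 2016-07-07, 2017-04-14, 2017-04-15, 3000-12-31
```

On this sample data the only match is Alice (911111502), who reported to Frank (911111501) until 2017-04-14, after which Frank reported to Alice.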