7.1. Main Features

The Python interface to the Trovares xGT graph analytics engine.

7.1.1. Data Loading

xGT is a strongly typed graph system. Loading data is a three-step process:

  1. Create a namespace, or use an existing one.

Frame names use a two-level naming scheme: the first part is the namespace in which the frame is stored, and the second part is the frame's name within that namespace. In the examples in this section, the two parts are joined by a double underscore, as in career__Employees. Users can create namespaces themselves, or system administrators can create them ahead of time. Create a namespace with Connection.create_namespace(), and get a list of existing namespaces with Connection.get_namespaces().
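The two-level naming scheme can be sketched in plain Python. This assumes, based on the examples in this section, that the separator is a double underscore and that a name is split at its first separator. The helpers qualify() and split_frame_name() are hypothetical and are not part of the xgt package.

```python
from typing import Tuple

# Assumption: namespace and frame name are joined by a double underscore,
# as in 'career__Employees'.
SEPARATOR = '__'


def qualify(namespace: str, name: str) -> str:
    """Hypothetical helper: build a fully qualified frame name."""
    return f'{namespace}{SEPARATOR}{name}'


def split_frame_name(qualified: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'namespace__name' at the first separator."""
    namespace, sep, name = qualified.partition(SEPARATOR)
    if not sep or not namespace or not name:
        raise ValueError(f'not a qualified frame name: {qualified!r}')
    return namespace, name


print(qualify('career', 'Employees'))
print(split_frame_name('career__ReportsTo'))
```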

  2. Describe the structure and data types of your graph.

Define the vertex and edge frame structure with Connection.create_vertex_frame() and Connection.create_edge_frame(). These methods return VertexFrame and EdgeFrame objects, which provide access to the server-side structures.
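To illustrate what strong typing means for a frame, the sketch below checks rows against a schema laid out like the one the example script passes to create_vertex_frame(): a list of [column_name, column_type] pairs. The type constants here are plain Python types used as hypothetical stand-ins for xgt.INT and xgt.TEXT, and check_row() is a hypothetical helper. It performs a strict isinstance() check and does not model any conversions the server may apply (the example script, for instance, inserts dates as ISO strings).

```python
# Hypothetical stand-ins for xgt.INT and xgt.TEXT; the real library
# defines its own type constants.
INT, TEXT = int, str

# Same layout as the career__Employees schema in the example script.
employee_schema = [['person_id', INT],
                   ['name', TEXT],
                   ['postal_code', INT]]


def check_row(schema, row):
    """Raise TypeError if row does not match the schema's columns and types."""
    if len(row) != len(schema):
        raise TypeError(f'expected {len(schema)} columns, got {len(row)}')
    for (column, column_type), value in zip(schema, row):
        if not isinstance(value, column_type):
            raise TypeError(f'column {column!r}: expected '
                            f'{column_type.__name__}, got {type(value).__name__}')


check_row(employee_schema, [111111101, 'Manny', 98103])  # matches the schema
```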

  3. Load your edge and vertex data.

VertexFrame and EdgeFrame objects provide a high-performance, parallel load() method for ingesting data in bulk, and an insert() method for adding small amounts of data directly.
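For bulk loading, data is typically prepared as CSV files first. The sketch below writes the example script's employee rows to a headerless CSV file using Python's standard csv module. Passing the resulting path to load() is an assumption shown only in a comment; header handling and where the file must reside (client or server side) should be checked against the load() reference.

```python
import csv
import os
import tempfile

# Rows in the column order of the career__Employees schema
# (person_id, name, postal_code), taken from the example script.
rows = [[111111101, 'Manny', 98103],
        [111111102, 'Trish', 98108],
        [911111501, 'Frank', 98101],
        [911111502, 'Alice', 98102]]

# Write the rows to a CSV file with no header line.
csv_path = os.path.join(tempfile.mkdtemp(), 'employees.csv')
with open(csv_path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Assumption: the file would then be ingested with the frame's load() method,
# e.g. employees.load(csv_path).
print(csv_path)
```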

7.1.2. Query Processing

Queries are expressed as strings written in TQL.

>>> query = '''
... MATCH  (emp:career__Employees)-[edge1:career__ReportsTo]->(boss:career__Employees)
... RETURN emp.person_id AS employee_id,
...        boss.person_id AS boss_id
... INTO   results__ResultTable
... '''
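Conceptually, the query binds emp and boss to the source and target vertices of each ReportsTo edge, then writes one row per match into the results table using the column names from RETURN ... AS .... The loop below models that reading with Python dictionaries on a subset of the data from the example script in 7.1.3. It is a sketch of the query's meaning only and says nothing about how xGT evaluates queries.

```python
# Vertices keyed by person_id, mirroring the Employees frame.
employees = {
    111111101: {'person_id': 111111101, 'name': 'Manny'},
    911111501: {'person_id': 911111501, 'name': 'Frank'},
    911111502: {'person_id': 911111502, 'name': 'Alice'},
}

# ReportsTo edges; 'employee_id' is the source key and 'boss_id' the
# target key, as in the example script's create_edge_frame() call.
reports_to = [
    {'employee_id': 111111101, 'boss_id': 911111501},
    {'employee_id': 911111502, 'boss_id': 911111501},
]

# One output row per matched edge, named as in the RETURN clause.
result_table = []
for edge in reports_to:
    emp = employees[edge['employee_id']]  # bind emp to the edge's source vertex
    boss = employees[edge['boss_id']]     # bind boss to the edge's target vertex
    result_table.append({'employee_id': emp['person_id'],
                         'boss_id': boss['person_id']})

print(result_table)
```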

A query runs in the context of a Job, which can be run, scheduled, or canceled. Connection.run_job() runs the query and blocks until the job finishes successfully, terminates with an error, or is canceled.
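The difference between running and scheduling a job resembles blocking on a call versus submitting work and waiting for it later. The sketch below uses Python's standard concurrent.futures module purely as an analogy; fake_query() is a hypothetical stand-in for a query, and none of this is xGT's API. The Connection reference documents the actual methods for scheduling and canceling jobs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query():
    """Hypothetical stand-in for a query: sleep briefly, then return a row count."""
    time.sleep(0.1)
    return 1


# One worker, so a second submission waits in the queue until the first is done.
with ThreadPoolExecutor(max_workers=1) as pool:
    # Blocking style (like run_job()): returns only once the work has finished.
    rows = pool.submit(fake_query).result()

    # Scheduling style: submit() returns a handle immediately...
    job = pool.submit(fake_query)
    queued = pool.submit(fake_query)
    # ...a job that has not started yet can still be canceled...
    cancelled = queued.cancel()
    # ...and the caller decides when to wait for the result.
    job_rows = job.result()

print(rows, job_rows, cancelled)
```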

7.1.3. Example Script

The following Python script shows some of the functions used to create a graph, load data into it, run a query and access the results, and finally remove the graph from the system.

import xgt

# Connect to xgtd.
conn = xgt.Connection()

# Create a namespace.
conn.create_namespace('career')

# Define and create the graph.
employees = conn.create_vertex_frame(
  name='career__Employees',
  schema=[['person_id', xgt.INT],
          ['name', xgt.TEXT],
          ['postal_code', xgt.INT]],
  key='person_id')

reports_to = conn.create_edge_frame(
  name='career__ReportsTo',
  schema=[['employee_id', xgt.INT],
          ['boss_id', xgt.INT],
          ['start_date', xgt.DATE],
          ['end_date', xgt.DATE]],
  source=employees,
  target=employees,
  source_key='employee_id',
  target_key='boss_id')

# Load data into the graph in xgtd.
# Use the insert() method for a few hundred rows or fewer;
# for larger amounts of data, use the load() method with CSV files.
employees.insert(
  [[111111101, 'Manny', 98103],
   [111111102, 'Trish', 98108],
   [911111501, 'Frank', 98101],
   [911111502, 'Alice', 98102]
  ])
reports_to.insert(
  [[111111101, 911111501, '2015-01-03', '2017-04-14'],
   [111111102, 911111501, '2016-04-02', '2017-04-14'],
   [911111502, 911111501, '2016-07-07', '2017-04-14'],
   [111111101, 911111502, '2017-04-15', '3000-12-31'],
   [111111102, 911111502, '2017-04-15', '3000-12-31'],
   [911111501, 911111502, '2017-04-15', '3000-12-31']
  ])

# Query data
cmd = '''
  MATCH
    (employee:career__Employees)-[edge1:career__ReportsTo]->
    (boss:career__Employees)-[edge2:career__ReportsTo]->
    (employee)
  WHERE
    edge1.end_date <= edge2.start_date
  RETURN
    employee.person_id AS employee1_id,
    boss.person_id AS employee2_id,
    edge1.start_date AS start1,
    edge1.end_date AS end1,
    edge2.start_date AS start2,
    edge2.end_date AS end2
  INTO
    results__Results
  '''

conn.run_job(cmd)
results = conn.get_table_frame('results__Results')

# Print a summary of each frame.
for frame in [employees, reports_to, results]:
  print('{name}: {n_cols} columns, {n_rows} rows'.format(
    name=frame.name,
    n_cols=len(frame.schema),
    n_rows=frame.num_rows))

print('\n--- Results ---')
# Fetch up to 100 rows, starting at row 0.
for row in results.get_data(0, 100):
  print(', '.join([str(c) for c in row]))

# Drop all frames; the edge frame is dropped before the vertex frame it references.
for frame in [reports_to, employees, results]:
  conn.drop_frame(frame)
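To see what the script's query should find on the sample data without a running server, its pattern can be modeled with nested loops: edge1 goes from an employee to a boss, edge2 goes from that boss back to the same employee, and edge1 must end no later than edge2 starts. This is a plain-Python sketch of the query's semantics as described above, not how xGT evaluates it. On the sample data the sketch finds a single row: Alice (911111502) reported to Frank (911111501) until 2017-04-14, and Frank has reported to Alice since 2017-04-15.

```python
from datetime import date

# The ReportsTo rows inserted by the example script:
# (employee_id, boss_id, start_date, end_date)
edge_rows = [
    (111111101, 911111501, '2015-01-03', '2017-04-14'),
    (111111102, 911111501, '2016-04-02', '2017-04-14'),
    (911111502, 911111501, '2016-07-07', '2017-04-14'),
    (111111101, 911111502, '2017-04-15', '3000-12-31'),
    (111111102, 911111502, '2017-04-15', '3000-12-31'),
    (911111501, 911111502, '2017-04-15', '3000-12-31'),
]
edges = [(src, dst, date.fromisoformat(s), date.fromisoformat(e))
         for src, dst, s, e in edge_rows]

matches = []
for emp, boss, start1, end1 in edges:           # edge1: employee -> boss
    for src2, dst2, start2, end2 in edges:      # edge2: boss -> same employee
        if src2 == boss and dst2 == emp and end1 <= start2:
            matches.append((emp, boss, start1, end1, start2, end2))

for row in matches:
    print(', '.join(str(c) for c in row))
```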