Cypher support in TQL

xGT TQL queries (MATCH)

xGT's TQL language includes a subset of the Cypher language. This document describes the details of TQL's supported Cypher subset for users already familiar with Cypher. The Cypher Reference Card may be a helpful companion.

xGT supports a restricted form of the Cypher query language to enable the exploration of property graph datasets. The principal Cypher command supported by xGT is the MATCH command with constraints. We recommend that you familiarize yourself with the Cypher language and in particular the syntactic components of the MATCH command.

A query in TQL typically consists of four parts:

  1. The graph pattern description.
  2. Constraints on properties & object identities.
  3. Data augmentation and modifications to the graph.
  4. Specification of how the answer set should be produced.

The graph pattern description consists of a list of steps over graph objects in xGT. Each step in the graph pattern corresponds to either a vertex match or an edge match. A pattern can be a single sequence of steps that is fully connected in the graph, or it can be several comma-separated sequences that are joined into one graph pattern by shared intermediate vertices.

Constraints on properties and object identities are expressed as part of a MATCH query using a WHERE clause. The WHERE clause can contain conditions based on property values of vertices and/or edges (for instance, a.year > 1980), as well as identity comparison between vertices or edges (for instance, a <> b).

Data augmentation and modifications to the graph let the query modify properties of existing matched entities in the graph (vertices or edges), as well as augment the graph with new vertex and edge instances (with their own new properties).

The answer set specification consists of what results should be produced from the MATCH query, including which properties of which entities should be reported back to the user. Solution modifiers, such as sorting the results or reporting only unique results, may also be used to reduce the size of results.

Fundamental concepts in TQL's subset of Cypher

The fundamental concepts used in the Cypher subset supported by TQL are the following:

Property and topology modifications as part of MATCH queries

Aggregation functions and solution modifiers

Examples of TQL queries

We illustrate TQL's subset of Cypher with several examples:

Example 1

MATCH (a:People)-[b:WorksFor]->(c:Companies)
WHERE a.person_id >= 100 AND a.person_id <= 150
RETURN a.name, c.name

In this simple example, we match people with an id between 100 and 150 who work for any company in the graph. Labels (the textual identifier after the ":" in each step) identify which xGT frame to use for that graph step. Variables capture matched information, both to constrain the results (a.person_id <= 150) and to indicate which values should be inserted into the results table. Cypher graph steps use different syntax for vertices ("( )") and edges ("[ ]"). Before execution, the query is type checked against the declared xGT objects present on the server. In this case, the xGT server must contain a WorksFor edge frame that connects vertices in the People frame to vertices in the Companies frame.

This particular query example would also be valid in Neo4j's Cypher, given the existence of appropriate data with the indicated Cypher labels.
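
To make the matching semantics concrete, the following Python sketch evaluates the query described above (people with an id between 100 and 150, together with the company each works for) over a tiny in-memory dataset. The data and the helper `match_works_for` are hypothetical and only model the query; they are not part of xGT's API.

```python
# Hypothetical in-memory stand-ins for the People and Companies vertex frames
# and the WorksFor edge frame. None of this is xGT API.
people = {99: "Ann", 100: "Bob", 125: "Cho", 151: "Dee"}
companies = {1: "Acme", 2: "Globex"}
works_for = [(99, 1), (100, 1), (125, 2), (151, 2)]  # (person_id, company_id)


def match_works_for(lo, hi):
    """Model MATCH (a:People)-[b:WorksFor]->(c:Companies)
    WHERE a.person_id >= lo AND a.person_id <= hi
    RETURN a.name, c.name."""
    rows = []
    for person_id, company_id in works_for:
        # The WHERE clause keeps only people whose id lies in [lo, hi].
        if lo <= person_id <= hi:
            rows.append((people[person_id], companies[company_id]))
    return rows


result = match_works_for(100, 150)
```

Both bounds are inclusive here, mirroring the `>=` and `<=` comparisons in the WHERE clause.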

Example 2

MATCH (a)-[:Edge]->(b)-[]->(c)-[:Edge]->(a)
WHERE a <> b AND b <> c AND c <> a
RETURN a.unique_id, b.unique_id, c.unique_id

This more complex example finds triangles in the graph. The frame label for the middle edge step is not given and must be inferred. The RETURN clause indicates that we want the unique_id values of the three endpoints of every triangle found in the graph. Note that we check that the identities of the three endpoints are different by requiring the variables a, b, and c to differ from each other (without specifying a property). We guarantee that the matched shape is a triangle by specifying a as both the beginning and the end of the pattern.
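
As a rough illustration of what this pattern asks for, the Python sketch below enumerates directed triangles in a small edge list. The edge list and the function `find_triangles` are hypothetical; the sketch models the pattern's meaning only and says nothing about how xGT executes the query.

```python
# Hypothetical directed edge list standing in for an edge frame.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 4)]


def find_triangles(edges):
    """Return every (a, b, c) with edges a->b, b->c, c->a and a, b, c distinct."""
    out_neighbors = {}
    for src, dst in edges:
        out_neighbors.setdefault(src, []).append(dst)
    triangles = []
    for a, b in edges:
        for c in out_neighbors.get(b, []):
            # a appears at both ends of the pattern, so the path must close on a.
            closes = a in out_neighbors.get(c, [])
            # Identity checks correspond to: a <> b AND b <> c AND c <> a.
            if closes and a != b and b != c and c != a:
                triangles.append((a, b, c))
    return triangles


triangles = find_triangles(edges)
```

In this sketch each triangle is reported once per starting vertex, so the single triangle over vertices 1, 2, and 3 yields three rows. The self-loop on vertex 4 is rejected by the identity checks.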

Example 3

MATCH (source:Node)-[e:Event]->(target:Node)
WHERE source.value < target.value
SET e.duration = e.end_time - e.start_time
RETURN count(*)

This example modifies a property in one (or more) matched edges. In this case, the property duration is set to the difference between start and end times on each edge that satisfies the criteria given in the where clause. The total count of modified edges is returned.

Note that the property modification occurs after the matching and the recording of results have happened. Modifications in a query are not readable within that same query, so if a query returns matched values, they will contain the values from before the query's changes.
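
The ordering described above (match, record results, then modify) can be sketched in Python as follows. The edge data and the function `match_then_set` are hypothetical and only model the read-before-write behavior; here the function returns the pre-update duration values rather than a count so that the ordering is visible.

```python
# Hypothetical Event edges; each dict models one edge plus the value
# property of its source and target vertices.
events = [
    {"src_value": 1, "tgt_value": 5, "start_time": 10, "end_time": 25, "duration": 0},
    {"src_value": 9, "tgt_value": 2, "start_time": 3, "end_time": 4, "duration": 0},
]


def match_then_set(events):
    """Record results first, then apply the SET-style update to matched edges."""
    # WHERE source.value < target.value
    matched = [e for e in events if e["src_value"] < e["tgt_value"]]
    # Results are captured before any modification happens.
    returned = [e["duration"] for e in matched]
    # SET e.duration = e.end_time - e.start_time
    for e in matched:
        e["duration"] = e["end_time"] - e["start_time"]
    return returned


returned = match_then_set(events)
```

After the call, the matched edge carries the new duration, while the returned values still show the old one.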

Example 4

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
CREATE (v:VertexFrame { id: e.sid+55, data: "test" })
RETURN e

This example creates a new vertex as part of the vertex frame VertexFrame. The CREATE command sets the key property of the vertex (id, in this case) to be equal to a computed value based on the matched edge e. Additionally, the property data is set to a constant string value. As can be noticed, this query has two side effects: returning the matched edge e and also creating a new vertex v in the vertex frame VertexFrame. Note that if this query results in multiple edges being matched for the criteria indicated in the WHERE clause, then there would be an error trying to insert multiple vertices with the same key value. Vertices must be unique for a particular key value.
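
The key-uniqueness constraint can be illustrated with a small Python sketch. The dictionary-based frame, the `DuplicateKeyError` class, and `create_vertices` are hypothetical names introduced for illustration; the sketch does not model how xGT reports the error or whether earlier insertions in the same query are rolled back.

```python
class DuplicateKeyError(Exception):
    """Raised by this sketch when a vertex key is inserted twice."""


def create_vertices(vertex_frame, matched_edges):
    """Model CREATE (v:VertexFrame { id: e.sid+55, data: "test" }) per matched edge."""
    for edge in matched_edges:
        key = edge["sid"] + 55  # key computed from the matched edge
        if key in vertex_frame:
            raise DuplicateKeyError(f"vertex key {key} already exists")
        vertex_frame[key] = {"data": "test"}


frame = {}
create_vertices(frame, [{"sid": 0, "tid": -1}])  # one match: succeeds

try:
    # Two matched edges with the same sid compute the same key.
    create_vertices({}, [{"sid": 0, "tid": -1}, {"sid": 0, "tid": -1}])
    failed = False
except DuplicateKeyError:
    failed = True
```

Because every edge matched by the WHERE clause has sid equal to 0, every match computes the key 55, which is why a second match triggers the conflict.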

Example 5

MATCH (a:VertexFrame)-[e:EdgeFrame]->(b:VertexFrame)
WHERE e.sid = 0 AND e.tid = -1
CREATE (a)-[new_edge:EdgeFrame { float_count: -0.1, int_count: 0 }]->(b)
RETURN e

In this example, a new edge new_edge is inserted into the edge frame EdgeFrame. Note that the end points (a and b) of the new edge must be matched as part of the query. The CREATE command for an edge can set values for any non-key properties of the edge. The key properties are automatically derived from the key values of the two vertex end points. As with vertex creation, this query has two side effects: the return of the matched edges and the creation of new edges. Note that in contrast to vertices, multiple edges with identical key values are allowed in xGT.

Example 6

MATCH (a:VertexFrame)
WHERE a.id = 0
MERGE (b:VertexFrame { id: -1, data: "negative_vertex" })
CREATE (a)-[new_edge:EdgeFrame { float_count: -0.1, int_count: 0 }]->(b)
RETURN a

This example uses the MERGE keyword to either match an existing vertex or create a new one (b, with an id key value of -1). In addition, an edge new_edge is created from the matched vertex a (whose id is 0) to the merged vertex b. Note that MERGE is not supported for edges: because multiple edges with the same key values are allowed, matching existing edges can be ambiguous.
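
A minimal Python sketch of the match-or-create behavior follows. The names `merge_vertex`, `vertex_frame`, and `edge_frame` are hypothetical, and the sketch makes the simplifying assumption that an existing vertex is matched by its key alone; how xGT treats the non-key properties in a MERGE is not modeled.

```python
def merge_vertex(vertex_frame, key, props):
    """Return (key, created): reuse the vertex with this key or create it."""
    if key in vertex_frame:
        return key, False
    vertex_frame[key] = dict(props)
    return key, True


vertex_frame = {0: {"data": "origin"}}
edge_frame = []  # list of (source_key, target_key, props); duplicates allowed

for _ in range(2):  # model running the query twice
    b, created = merge_vertex(vertex_frame, -1, {"data": "negative_vertex"})
    # CREATE always adds a new edge, even if an identical one exists.
    edge_frame.append((0, b, {"float_count": -0.1, "int_count": 0}))
```

Running the modeled query twice leaves a single vertex with key -1 but two parallel edges from 0 to -1, which shows why merging is well defined for vertices and ambiguous for edges.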

Example 7

MATCH (a:VertexFrame)
WHERE a.id = 0
DETACH DELETE a
RETURN a

This example shows the removal of a vertex from the vertex frame VertexFrame and all of its connected edges (if any). While this appears fairly simple, under the covers xGT has to analyze all edge frames that have VertexFrame as one of their end point types and has to make sure all edges incident on the deleted vertex a are removed from those edge frames as well. Note that it is possible to return the matched vertex that is to be deleted (a) since the matching (and result recording) occurs before the deletion itself.
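
The bookkeeping described above can be sketched in Python. The function `detach_delete` and the frame names are hypothetical, and the sketch assumes for simplicity that every listed edge frame has VertexFrame as both its source and target type.

```python
def detach_delete(vertex_frame, edge_frames, key):
    """Remove vertex `key` and every edge touching it in any edge frame."""
    # The matched vertex is recorded before deletion so it can be returned.
    deleted = vertex_frame.pop(key)
    for name in list(edge_frames):
        edge_frames[name] = [
            (src, dst) for src, dst in edge_frames[name]
            if src != key and dst != key
        ]
    return deleted


vertex_frame = {0: {"data": "zero"}, 1: {"data": "one"}, 2: {"data": "two"}}
edge_frames = {
    "EdgeFrame": [(0, 1), (1, 2), (2, 0)],
    "OtherFrame": [(1, 0), (1, 2)],
}
returned = detach_delete(vertex_frame, edge_frames, 0)
```

Only the edges between vertices 1 and 2 survive in either frame, and the deleted vertex's properties are still available as the returned value.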

Example 8

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
DELETE e
RETURN e

In contrast to deleting a vertex, deleting an edge is a fairly straightforward and localized operation: only a single edge is affected, no vertices are modified. As is the case for vertex deletion, the deleted edge can be returned since matching occurs before the deletion itself.

Type inference in TQL queries

TQL interprets Cypher labels as a type annotation corresponding to the unique name of a vertex or edge frame. The principal difference between Cypher and TQL is that TQL supports only a single label per graph step and it must correspond to a known xGT graph object.

We can say that a vertex belonging to the People vertex frame has type People and an edge belonging to the EdgeFrame edge frame has type EdgeFrame.

Our TQL compiler uses type inference to automatically deduce types in many cases so that TQL queries can be written in a manner similar to their Cypher analogues. In particular, given a type annotation for an edge step [:Edge], the compiler can always deduce the types of the two vertex steps on either side of the edge step: (a)-[:Edge]->(b). In this case, the type annotation for the a and b vertex steps is not needed.

Edge frame annotations can be inferred for edge steps when the preceding and succeeding vertex steps have type annotations and exactly one edge frame connects those vertex frames. If vertices are connected by edges belonging to multiple edge frames, then an unannotated edge step is ambiguous and must have its own type annotation. For example, if we have edge frames WorksFor and ConsumerOf with vertex frames People and Companies as source and target for both, then the unannotated query (a:People)-[b]->(c:Companies) is ambiguous for the edge b.

Multiple unannotated vertex and edge steps are supported in a query because type inferences are propagated from one step to another by the compiler, e.g. (a)-[:Edge]->(b)-[]->(c). Type inference continues until types for all elements of the query have been deduced correctly or the compiler has detected an error.
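
The propagation idea can be sketched in Python as a fixed-point loop over a directed path pattern. This is an illustrative model under stated assumptions, not xGT's compiler: the schema, the `Owns` and `Products` names, and the function `infer_types` are hypothetical, and the sketch does not check that user-supplied annotations are consistent with each other.

```python
# Hypothetical schema: edge frame name -> (source vertex frame, target vertex frame).
SCHEMA = {
    "WorksFor": ("People", "Companies"),
    "ConsumerOf": ("People", "Companies"),
    "Owns": ("Companies", "Products"),
}


def infer_types(vertices, edges, schema=SCHEMA):
    """Fill in missing (None) labels for the path
    vertices[i] -[edges[i]]-> vertices[i + 1].
    Raises ValueError when some label cannot be deduced."""
    vertices, edges = list(vertices), list(edges)
    changed = True
    while changed:  # repeat until a pass deduces nothing new
        changed = False
        for i, label in enumerate(edges):
            src, dst = vertices[i], vertices[i + 1]
            if label is not None:
                # An annotated edge fixes the types of both endpoints.
                for j, vtype in ((i, schema[label][0]), (i + 1, schema[label][1])):
                    if vertices[j] is None:
                        vertices[j] = vtype
                        changed = True
            elif src is not None or dst is not None:
                # Edge frames consistent with whatever endpoint types are known.
                candidates = [name for name, (s, t) in schema.items()
                              if src in (None, s) and dst in (None, t)]
                if not candidates:
                    raise ValueError(f"no edge frame fits edge step {i}")
                if len(candidates) == 1:
                    edges[i] = candidates[0]
                    changed = True
    if None in vertices or None in edges:
        raise ValueError("ambiguous pattern: could not infer all types")
    return vertices, edges


# Models (a)-[:WorksFor]->(b)-[]->(c)
inferred = infer_types([None, None, None], ["WorksFor", None])

try:
    # Models (a:People)-[b]->(c:Companies): two edge frames fit.
    infer_types(["People", "Companies"], [None])
    ambiguous = False
except ValueError:
    ambiguous = True
```

In the first call, the annotated edge determines a and b, the type of b then selects the only edge frame leaving Companies, and that edge frame in turn determines c. The second call fails because both WorksFor and ConsumerOf connect People to Companies.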


Limitations on TQL's Cypher support