4.15. TQL for Cypher Users

4.15.1. xGT TQL Queries

xGT’s TQL language includes a subset of the Cypher language. This section describes the particular details of TQL’s supported Cypher subset for users already familiar with Cypher. The Cypher Reference Card and the Cypher Query Language Reference may be helpful.

xGT supports a restricted form of the Cypher query language to enable the exploration of property graph datasets. We recommend that you familiarize yourself with the Cypher language and in particular the syntactic components of the different Cypher clauses.

TQL is Typed

A fundamental difference between Cypher and TQL is that TQL is strongly typed. Each vertex (node) and each edge (relationship) is assigned to exactly one frame. There are three types of frames. A description of each is given below, with more information in Graph Data Model: Frames and Namespaces.

A VertexFrame is a collection of vertices (nodes), each of which has the same set of properties. For example, a vertex frame may contain a collection of vertices with an integer property named “id”, a date property named “startTime”, and a boolean property named “isActive”. Each vertex that belongs to this frame must have only these properties, and the value of each property must be of the data type specified for the frame, or null if not assigned for that vertex.

Similarly, an EdgeFrame is a collection of edges (relationships), each of which has the same set of properties. Each edge that belongs to this frame must have only these properties and the value of the property must be of the data type specified for the frame or null. Furthermore, each edge frame has a single source vertex frame and a single (same or different) target vertex frame. This means that all edges in a frame have a source vertex that belongs to a single frame and a target vertex that belongs to a single frame.

Finally, a TableFrame is a collection of non-graph rows with the same set of properties. These are usually created to store results of queries.
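The frame rules above can be pictured with a small Python model. This is only an illustrative sketch, not xGT's implementation or API: the FrameSchema class, its validate method, and the mapping of frame data types to Python types are assumptions made for the example. It checks that a row carries exactly the declared properties and that each value is either of the declared type or null (modelled here as None).

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSchema:
    """Hypothetical stand-in for a frame's schema: property name -> Python type."""
    name: str
    properties: dict

    def validate(self, row: dict) -> bool:
        """Return True if row has exactly the declared properties and every
        value is of the declared type or None (standing in for null)."""
        if set(row) != set(self.properties):
            return False
        for prop, value in row.items():
            # type(...) is compared directly so that a bool is not accepted
            # for an int property (bool is a subclass of int in Python).
            if value is not None and type(value) is not self.properties[prop]:
                return False
        return True


# The example frame from the text: id (integer), startTime (date), isActive (boolean).
schema = FrameSchema(
    "ExampleFrame",
    {"id": int, "startTime": datetime.date, "isActive": bool},
)

ok = schema.validate({"id": 1, "startTime": datetime.date(2020, 1, 1), "isActive": None})
extra = schema.validate({"id": 1, "startTime": None, "isActive": True, "nickname": "x"})
wrong_type = schema.validate({"id": "1", "startTime": None, "isActive": True})
print(ok, extra, wrong_type)
```

In this model, the row with an unassigned (null) isActive value is accepted, while the row with an undeclared property and the row with a string id are rejected.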

The frame of a vertex or edge is similar to the type of a relationship in Cypher. It is also somewhat similar to the label of a node in Cypher, except that while nodes in Cypher can have multiple labels, there is exactly one frame per vertex or edge in TQL. The frame that a vertex or edge is assigned to cannot be changed.

Pattern Matching

TQL queries are very similar to Cypher queries, with some key differences. As in Cypher, pattern matching in TQL is done using the MATCH clause.

TQL interprets Cypher labels on vertices and types on edges as a type annotation corresponding to the fully qualified name of a vertex or edge frame. In the example below, VertexFrame would be a label in Cypher, and EdgeFrame would be a type in Cypher. In TQL, VertexFrame is the name of the frame to which the first vertex in the pattern belongs, while EdgeFrame is the name of the frame to which the edge in the pattern belongs. Note that the use of a default namespace (see Default Namespace) affects how the type annotation is expressed for a vertex or edge. If the corresponding frame is in the default namespace, the frame name alone (without the namespace prefix) is enough.

MATCH (a:VertexFrame)-[:EdgeFrame]->()

A key difference between Cypher and TQL is that TQL supports only a single frame per vertex step, which must correspond to a known frame in xGT. Therefore, the following is not supported in TQL:

MATCH (a:VertexFrame:OtherVertexFrame)-[:EdgeFrame]->()

Unlike vertex steps, TQL supports specifying multiple frames for a single edge step in a pattern using the pipe symbol. This means that the edge step can belong to any of the frames specified. Any edge that matches such an edge step in a pattern will still only belong to one of the specified frames. An example of this is shown below:

MATCH (a:VertexFrame)-[:EdgeFrame | :OtherEdgeFrame]->()

TQL supports a limited form of variable-length edge traversal. See section Variable-length Edge Traversal for more discussion.

TQL Query Structure

A query part typically consists of four pieces:

  1. The graph pattern description.

  2. Constraints on properties and graph elements.

  3. Data augmentation and modifications to the graph.

  4. Specification on how the answer set should be produced.

Query parts can be linked to each other using the WITH clause to produce a final composite answer for the overall query.

The graph pattern description consists of a list of matching steps over graph frames in xGT. Each step in the graph pattern corresponds to either a vertex or an edge match. A pattern can consist of a single sequence of steps that are fully connected in the graph, or of multiple comma-separated sequences that are joined into one graph pattern by shared vertices. In TQL, each graph pattern must form a single connected component. Multiple MATCH clauses are allowed in a single query part and are interpreted in the same way as non-linear patterns connected by the comma operator.
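The single-connected-component requirement can be illustrated with a short Python sketch. It is not how the TQL compiler works internally; the is_connected function and the representation of a pattern as lists of vertex variable names are assumptions for illustration, and every vertex step is assumed to be named. Each inner list stands for one comma-separated sequence (or one MATCH clause), and two sequences are connected when they share a variable.

```python
def is_connected(paths):
    """Return True if the sequences form one connected component.

    paths is a list of lists of vertex variable names, one list per
    comma-separated sequence. Sequences are joined when they share a variable.
    """
    if not paths:
        return True
    remaining = [set(p) for p in paths]
    component = remaining.pop(0)
    changed = True
    while changed:
        changed = False
        # Iterate over a copy because sequences are removed once absorbed.
        for p in list(remaining):
            if component & p:
                component |= p
                remaining.remove(p)
                changed = True
    return not remaining


# MATCH (a)-[]->(b), (b)-[]->(c): the sequences share b.
print(is_connected([["a", "b"], ["b", "c"]]))
# MATCH (a)-[]->(b), (c)-[]->(d): no shared vertex, so two components.
print(is_connected([["a", "b"], ["c", "d"]]))
```

The first pattern forms a single component through the shared vertex b; the second does not, so under the rule above it would not be a valid TQL graph pattern.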

Constraints on properties and object identities are expressed as part of a query using a WHERE clause. The WHERE clause can contain conditions based on property values of vertices and/or edges (for instance, a.year > 1980), as well as identity comparison between vertices or edges (for instance, a <> b).

TQL supports searching for optional patterns connected to the main graph pattern with the use of the OPTIONAL MATCH clause.

Data augmentation and modifications to the graph let the query modify properties of existing matched entities in the graph (vertices or edges), as well as augment the graph with new vertex and edge instances (with their own new properties).

The answer set specification consists of what results should be produced from the query, including which properties of which entities should be reported back to the user. Solution modifiers, such as sorting the results or reporting only unique results, may also be used to further process the results for final output to the user.

Type Inference in TQL Queries

We can say that a vertex belonging to the People vertex frame has type People and an edge belonging to the EdgeFrame edge frame has type EdgeFrame.

Our TQL compiler uses type inference to automatically deduce types in many cases so that TQL queries can be written in a manner similar to their Cypher analogues. In particular, given a type annotation for an edge step [:EdgeFrame], the compiler can always deduce the types of the two vertex steps on either side of the edge step: (a)-[:EdgeFrame]->(b). In this case, the type annotation for the a and b vertex steps is not needed.

Edge frame annotations can be inferred for edge steps when the preceding and succeeding vertex steps have type annotations and those vertex frames are only used by a single edge frame. If vertices are connected by edges belonging to multiple edge frames, then an unannotated edge step is ambiguous and must have its own type annotation. For example, if we have edge frames WorksFor and ConsumerOf with vertex frames People and Companies as source and target for both, then the unannotated query (a:People)-[b]->(c:Companies) is ambiguous for the edge b.
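The ambiguity rule for unannotated edge steps can be sketched in Python. This is a toy model under stated assumptions, not the compiler's algorithm: the infer_edge_frame function and the dictionary describing each edge frame's source and target frames are hypothetical, using the WorksFor and ConsumerOf example from the paragraph above.

```python
def infer_edge_frame(source, target, edge_frames):
    """Return the only edge frame connecting source -> target.

    edge_frames maps an edge frame name to its (source, target) vertex
    frame names. Raises ValueError when the choice is ambiguous and
    LookupError when no edge frame connects the two vertex frames.
    """
    candidates = sorted(
        name for name, ends in edge_frames.items() if ends == (source, target)
    )
    if not candidates:
        raise LookupError(f"no edge frame connects {source} to {target}")
    if len(candidates) > 1:
        raise ValueError(f"ambiguous edge step, candidates: {candidates}")
    return candidates[0]


# The example from the text: two edge frames share the same endpoints.
edge_frames = {
    "WorksFor": ("People", "Companies"),
    "ConsumerOf": ("People", "Companies"),
}

try:
    infer_edge_frame("People", "Companies", edge_frames)
except ValueError as err:
    print(err)

# With a single candidate, the edge annotation can be left out.
print(infer_edge_frame("People", "Companies", {"WorksFor": ("People", "Companies")}))
```

With both edge frames declared, the unannotated step is rejected as ambiguous. With only WorksFor declared, the frame is deduced.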

The TQL compiler can also deduce types based on frame properties used in a WHERE clause. If the combination of properties only occurs in a single vertex or edge frame, the type can be deduced. Consider the case where there are only two vertex frames. The frame People has properties name and DOB. The frame Companies has properties name and location. The compiler can correctly deduce the type of v in the following query as Companies because only Companies has the property location.

WHERE v.location = "Seattle"

The compiler cannot deduce the type of v in the following query, however, because multiple frames have the property name.

WHERE v.name = "John"

The compiler also looks at combinations of attached frames and properties to attempt to deduce the type of frame.
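Property-based deduction can be pictured in the same way. The sketch below is an assumption-laden illustration rather than the compiler's implementation: candidate_frames is a hypothetical helper, and the frames and properties are the People and Companies example from above. A type is deducible in this model exactly when a single frame declares every property used on the variable.

```python
def candidate_frames(used_properties, frames):
    """Return, sorted, the frames that declare every property in used_properties."""
    used = set(used_properties)
    return sorted(name for name, props in frames.items() if used <= props)


# The two vertex frames from the text and their properties.
vertex_frames = {
    "People": {"name", "DOB"},
    "Companies": {"name", "location"},
}

# WHERE v.location = "Seattle": only Companies declares location.
print(candidate_frames({"location"}, vertex_frames))
# WHERE v.name = "John": both frames declare name, so the type is ambiguous.
print(candidate_frames({"name"}, vertex_frames))
```

A query that uses both name and DOB on the same variable would again leave a single candidate, People, which is the sense in which combinations of properties help the deduction.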

Multiple unannotated vertex and edge steps are supported in a query because type inferences are propagated from one step to another by the compiler, e.g. (a)-[:EdgeFrame]->(b)-[]->(c). Type inference continues until types for all elements of the query have been deduced correctly or the compiler has detected an error.

Pattern Variables

In a Cypher query, variables are used to propagate matched information to other parts of the query, as well as to the results set. A variable used in two different (but compatible) steps will guarantee that the matched objects are the same. Variables also provide the mechanism for writing query constraints over the graph object’s properties. In the example below, the variable a is assigned to the first and third vertices, the second vertex has variable b, the variable e is assigned to the first edge, and the second edge has no variable assigned.

MATCH (a)-[e:EdgeFrame]->(b)-[]->(a)

By default, TQL’s vertex or edge steps are not restricted to be distinct from other vertex or edge steps. This is in contrast to the semantics for MATCH clauses in Neo4j, which does not allow repeated edges in a single MATCH clause. If two vertex or two edge steps must be unique, then an identity constraint must be added to guarantee it: WHERE a <> b. This means that in the example above, all three vertex steps could match the same vertex even though two variables (a and b) are used.

TQL provides a shorthand form to simplify the addition of these uniqueness constraints for vertices (which is the most common case). The TQL function unique_vertices() can be added to the WHERE clause of a query to guarantee that the vertices specified in the arguments are unique with respect to each other. For example, unique_vertices(a, b, c) generates the constraints a <> b, a <> c, and b <> c. To specify that edges must be unique, constraints such as e1 <> e2 must be added to the query.

Property and Topology Modifications as Part of Queries

  • We support the SET operation to modify existing properties that have been declared as part of the frame’s schema. Only properties that are not part of the frame’s key can be modified by the SET operation. Trying to modify a key property results in an error.

  • Multiple properties can be modified at a time by using a property map: SET a = { name : 'Alice', zip : 90001, state : 'WA' }, where a is a variable that has been matched. TQL differs from Cypher in the treatment of property maps for the SET command because xGT maintains the concept of key properties, which cannot be modified on demand (via SET). For this reason, the SET command with a property map must not include key properties. As a result, the syntax SET a = { map } and SET a += { map } behave in the same manner and only allow modification of non-key properties.

  • Property removal via the REMOVE keyword is not supported, since xGT considers all properties part of a frame’s schema, which cannot be modified after frame creation.

  • Addition of properties that are not specified in the frame’s schema is not supported in xGT. All properties of a frame must be declared as part of its schema.

  • The type of the expression used to SET a property to a particular value must be compatible with the declared type of the property in the frame’s schema. An error is reported otherwise.

  • xGT supports dynamic additions and deletions of vertex and edge instances to compatible frames as part of a running query. In addition to producing a results table, a query can have side effects that modify the topology of the graph by adding new vertex and edge instances.

  • Vertex and edge instance addition is supported via the Cypher/TQL CREATE command. This command requires the user to specify a variable to be bound to the newly created instance, the name of the frame that the instance will be added to, and a property map with the values of the properties of that new instance. All key properties must have values; other properties default to a NULL value if not specified in the map.

  • Vertex creation syntax: CREATE (v0:<vertex frame name> { keyProperty : <value>, otherProperty : <value>, ... }). In this case, the variable v0 must not be bound to any other portion of the underlying query part. It solely binds to the newly created vertex instance. The vertex frame must have been created previously and the CREATE statement must include values for all key properties. The values of the keys will be checked for uniqueness across the entire vertex frame.

  • Edge creation syntax: CREATE (v0)-[e0:<edge frame name> { property1 : <value>, property2 : <value> }]->(v1). The variable e0 must not be bound to any other entity as part of the MATCH query. On the other hand, the two endpoints of the edge, v0 and v1, must be bound as part of the query to vertices belonging to the source and target vertex frames of the edge frame. The property values in the map for the new edge are optional. The values of the key properties are taken directly from the keys of the two endpoints, v0 and v1; specifying them manually is not allowed. For convenience, the CREATE command for an edge can specify the direction in either way: CREATE (source)-[]->(target) or CREATE (target)<-[]-(source).

  • xGT supports the use of the MERGE keyword to indicate the matching of an existing vertex or its creation if it does not exist in the corresponding vertex frame. The user must at least specify the values of the key property of the vertex: MERGE (v: <vertex frame name> { keyProperty : <value>, ... }). The merged vertex can then be used to create a new edge connecting to it. The use of the MERGE keyword is not allowed for edges, since multiple edge instances with the same key values are permitted in xGT. It would be ambiguous which one to retrieve, if they exist.

  • The power of dynamic additions of vertices and edges to the graph comes from the possibility of specifying topology connections and property values programmatically, from matched data in the graph.

  • Removal of vertices is supported via the DETACH DELETE command. The syntax is DETACH DELETE <matched vertex variable>, where the matched vertex variable has been bound to a vertex entity as part of a MATCH statement.

  • Note that the removal of a vertex triggers removal of all incident edges (incoming and outgoing) on that vertex across all related edge frames. The cost of removal could be non-trivial for very high degree vertices. It may be that Access Control will impact this operation. See Topology Deletions for a detailed explanation.

  • Removing an edge is achieved via the DELETE <matched edge variable> command. Removing an edge is simpler than removing a vertex and does not trigger effects beyond the edge frame containing that edge. The only requirement to remove an edge is to match it to a bound variable as part of the query.

Aggregation Functions and Solution Modifiers

  • We support a set of Cypher’s aggregation functions: count(*), sum(), avg(), min(), max(), and collect(), which can be computed over all returns or by a grouping key.

  • We have provided an extension to Cypher to be able to use degree computations over vertices in MATCH queries. We support the following functions: outdegree() and indegree(). Both take one or two parameters, with the first parameter being the name of a vertex variable. The optional second parameter is the name of an edge frame to use for a relative degree calculation. Without it, the functions calculate the absolute degree over all the edges connecting the vertex, regardless of edge frame.

  • We support the DISTINCT construct for returning unique results.

  • We support the ORDER BY construct for sorting results tables.

  • We support the LIMIT and SKIP constructs for reducing the size of the results.

  • As opposed to Cypher, TQL supports the use of both null and not-a-number (NaN) values for floating-point types. NULL is treated as Cypher treats null values, whereas NaN behaves according to the IEEE 754 floating-point standard. NaN may also arise from floating-point operations in xGT.
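The difference between null and NaN in comparisons can be illustrated with Python floats, which follow IEEE 754. This is a sketch of the semantics only, not xGT code: null_aware_equals is a hypothetical helper, None stands in for null, and the null handling shown assumes Cypher's rule that comparing anything with null yields null.

```python
import math


def null_aware_equals(a, b):
    """Equality with None standing in for null (hypothetical helper).

    Any comparison involving null yields null (None), mirroring Cypher's
    three-valued logic. Otherwise Python's float comparison applies, which
    follows IEEE 754: NaN is unequal to every value, including itself.
    """
    if a is None or b is None:
        return None
    return a == b


nan = float("nan")
# inf - inf has no defined value, so the floating-point operation yields NaN.
produced = float("inf") - float("inf")

print(null_aware_equals(None, None))  # null compared with null
print(null_aware_equals(nan, nan))    # NaN compared with NaN
print(math.isnan(produced))           # NaN arising from arithmetic
```

In this model, a comparison against null is unknown, whereas a comparison against NaN has a definite answer, which is always false for equality.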

4.15.2. Examples of TQL Queries

We illustrate TQL’s subset of Cypher with several examples:

Example 1

MATCH (a:People)-[b:WorksFor]->(c:Companies)
WHERE 100 <= a.person_id AND a.person_id <= 150
RETURN a.name, c.name

In this simple example, we match people with an id between 100 and 150 who work for any company in our graph. We use labels (the textual identifier after the “:” on each step) to identify which xGT frame to use for that particular graph step. Note that the labels correspond to frame names, which must be qualified with their namespace unless the frame resides in the default namespace. We use variables to capture matched information in order to constrain the results (a.person_id <= 150), as well as to indicate what values should be inserted into the results table. Cypher graph steps use different syntax, ( ) and [ ], for vertices and edges, respectively. Before execution, the query is type checked against the declared xGT objects present in the server. In this case, the xGT server must contain a WorksFor edge frame in the namespace career (taken here to be the default namespace, since the labels in the query are unqualified). This edge frame must connect vertices belonging to the People frame to vertices belonging to the Companies frame.

This particular query example would also be valid in Neo4j’s Cypher, given the existence of appropriate data with the indicated Cypher labels.

Example 2

MATCH (a)-[:EdgeFrame]->(b)-[]->(c)-[:EdgeFrame]->(a)
WHERE a <> b AND b <> c AND c <> a
RETURN a.unique_id, b.unique_id, c.unique_id

This more complex example finds triangles in the graph. In this case, the frame label for the middle edge step is not given and must be inferred. The RETURN clause indicates that we want the unique_id values (endpoints) of all triangles found in the graph. Note that we check that the identities of the three endpoints are different by requiring that the variables a, b and c are different from each other (without specifying a property). We guarantee that the matched shape is a triangle by specifying a as both the beginning and end of the pattern.

Example 3

MATCH (source:VertexFrame)-[e:Event]->(target:VertexFrame)
WHERE source.value < target.value
SET e.duration = e.end_time - e.start_time

This example modifies a property in one or more matched edges. In this case, the property duration is set to the difference between the end and start times on each edge that satisfies the criteria given in the WHERE clause.

Note that the property modification occurs after the matching and recording of the results has happened. Modifications in a query are not readable within that same query, so if a query returns matched values, they will contain values from before the query’s changes.

Example 4

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
CREATE (v:VertexFrame { id: e.sid+55, data: "test" })
RETURN e

This example creates a new vertex in the vertex frame VertexFrame. The CREATE command sets the key property of the vertex (id, in this case) to a value computed from the matched edge e. Additionally, the property data is set to a constant string value. This query has two effects: it returns the matched edge e and it creates a new vertex v in the vertex frame VertexFrame. Note that if multiple edges match the criteria in the WHERE clause, the query results in an error from trying to insert multiple vertices with the same key value. Vertices must be unique for a particular key value.

Example 5

MATCH (a:VertexFrame)-[e:EdgeFrame]->(b:VertexFrame)
WHERE e.sid = 0 AND e.tid = -1
CREATE (a)-[new_edge:EdgeFrame { float_count: -0.1, int_count: 0 }]->(b)

In this example, a new edge new_edge is inserted into the edge frame EdgeFrame. Note that the end points (a and b) of the new edge must be matched as part of the query. The CREATE command for an edge can set values for any non-key properties of the edge. The key properties are automatically derived from the key values of the two vertex end points. Note that in contrast to vertices, multiple edges with identical key values are allowed in xGT.

Example 6

MATCH (a:VertexFrame)
WHERE a.id = 0
MERGE (b:VertexFrame { id: -1, data: "negative_vertex" })
CREATE (a)-[new_edge:EdgeFrame { float_count: -0.1, int_count: 0 }]->(b)

This example uses the MERGE keyword to either match an existing vertex or create it if it does not exist (b with an id key value of -1). In addition, an edge new_edge is created from the matched vertex a, with an id value of zero, to the merged vertex b. Note that MERGE is not supported for edges since multiple edges with the same key values are allowed, so matching existing edges can potentially be ambiguous.

Note also that this query does not have a return value. Not including the RETURN clause is convenient for graph modification queries as often no return value is needed.

Example 7

MATCH (a:VertexFrame)
WHERE a.id = 0
DETACH DELETE a

This example shows the removal of a vertex from the vertex frame VertexFrame, together with all of its connected edges (if any). While this appears fairly simple, under the covers xGT has to analyze all edge frames that have VertexFrame as their source or target vertex frame and make sure all edges incident on the deleted vertex a are removed from those edge frames as well. Note that it is possible to return the matched vertex that is to be deleted (a), since the matching (and result recording) occurs before the deletion itself.

Access Control may affect this operation. See Topology Deletions for a detailed explanation.

Example 8

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
DELETE e
RETURN e

In contrast to deleting a vertex, deleting an edge is a fairly straightforward and localized operation: only a single edge is affected, no vertices are modified. As is the case for vertex deletion, the deleted edge can be returned since matching occurs before the deletion itself.

If returning the edge is not required the query can be expressed as follows:

MATCH ()-[e:EdgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
DELETE e

In this case, the answer set for the query is empty.

Example 9

MERGE (a:VertexFrame { id: 0, data: "zero vertex" })
MERGE (b:VertexFrame { id: -1, data: "negative vertex" })
CREATE (a)-[new_edge:EdgeFrame { float_count: -0.1, int_count: 0 }]->(b)

This is an example of a query that only has graph modification side effects. It does not produce an answer table or have a MATCH clause with a pattern. Note that creating or merging vertices and edges in this manner is limited to those specified by constant values, in contrast to using MATCH to find them.

Example 10

MATCH (a:VertexFrame)-[:EdgeFrame *10..20]->(b:VertexFrame)
WHERE a <> b
RETURN count(*)

This query uses variable-length edge traversal on an edge frame whose source and target vertex frames are the same. The query counts the pairs of distinct vertex endpoints a and b such that b is reachable from a by traversing at least 10 and no more than 20 edges.

Example 11

MATCH (a:VertexFrame)-[]->(b:VertexFrame)
WHERE a <> b AND (a)-[]->(a)
RETURN a.id, b.id

This example shows how to filter on patterns using the WHERE clause to further constrain the matching graph elements. In this case, the matched vertices a and b will only be reported for source vertices a that have a one-edge cycle (an edge from a to itself).

Example 12

MATCH (a:VertexFrame)-[]->(b:VertexFrame)
OPTIONAL MATCH (b)-[]->(c)
OPTIONAL MATCH (b)<-[]-(d)

This example shows the use of an optional match to find additional optional patterns connected to the pattern in the required match. TQL does not allow the same variable to be shared between multiple optional match patterns in the same query part unless that variable was used in the required match pattern. This means that in the example above, the second optional match could not refer to c.

The optional pattern must connect to the required pattern through a shared vertex.
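The variable-sharing rule for optional patterns can be checked mechanically. The following Python sketch is only an illustration under assumptions: illegally_shared is a hypothetical helper, and each pattern is reduced to the list of variable names it uses. It reports variables that appear in more than one optional pattern without being bound by the required pattern.

```python
def illegally_shared(required_vars, optional_patterns):
    """Return, sorted, the variables used by more than one optional pattern
    that are not bound in the required match pattern."""
    required = set(required_vars)
    seen, offending = set(), set()
    for pattern in optional_patterns:
        # Only variables introduced by the optional pattern itself matter.
        introduced = set(pattern) - required
        offending |= introduced & seen
        seen |= introduced
    return sorted(offending)


# Example 12 as written: the optional patterns share only b, which is required.
print(illegally_shared(["a", "b"], [["b", "c"], ["b", "d"]]))
# If the second optional pattern referred to c instead of d, c would be shared.
print(illegally_shared(["a", "b"], [["b", "c"], ["b", "c"]]))
```

An empty result means the query respects the rule. A non-empty result lists the variables that would have to be renamed or moved into the required pattern.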

4.15.3. Limitations on TQL’s Cypher Support

  • TQL supports a subset of MATCH read-only and read-write queries.

  • LOAD CSV statements are not supported by TQL. xGT provides alternative commands to load data into graph data structures. See Data Movement.

  • The REMOVE statement is not supported by TQL since dynamic properties are not supported by xGT.

  • FOREACH statements are not supported by TQL.

  • Cypher’s pattern comprehension is not supported by TQL.

  • Cypher DDL statements are not supported by TQL. xGT provides alternative frame management commands better suited to property graph manipulation.

  • Maps are not supported in TQL.

  • Path variables cannot be returned as part of the results of a query.