Cypher support in TQL
xGT TQL queries (MATCH)
xGT's TQL language includes a subset of the Cypher language. This document is focused on describing the particular details of TQL's supported Cypher subset for users already familiar with Cypher. This Cypher Reference Card may be helpful.
xGT supports a restricted form of the Cypher query language to enable the exploration of property graph datasets. The principal Cypher command supported by xGT is the read-only MATCH command with constraints. We recommend that you familiarize yourself with the Cypher language and in particular the syntactic components of the MATCH command.
A MATCH query in TQL typically consists of four parts:
- The graph pattern description.
- Constraints on properties & object identities.
- Data augmentation and modifications to the graph.
- Specification on how the answer set should be produced.
The graph pattern description consists of a list of steps over graph objects in xGT. Each step in the graph pattern corresponds to either a vertex or edge match. A pattern can consist of a single sequence of steps that are fully connected in the graph, or of multiple comma-separated sequences that are connected into one graph pattern by intermediate vertices.
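A minimal sketch of a pattern built from two comma-separated sequences joined through one vertex step. The frame names Person, WorksFor and Company and the Name property are hypothetical and assume matching frames have been declared in xGT; comments use Cypher's // syntax, which is assumed here to be accepted by TQL.

```cypher
// Two sequences joined by the vertex step c, which appears in both.
MATCH (a:Person)-[:WorksFor]->(c:Company), (b:Person)-[:WorksFor]->(c)
// Steps are not distinct by default, so rule out a and b being the same person.
WHERE a <> b
RETURN a.Name, b.Name, c.Name
```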
Constraints on properties and object identities are expressed as part of a MATCH query using a WHERE clause. The WHERE clause can contain conditions based on property values of vertices and/or edges (a.year > 1980), as well as identity comparison between vertices or edges (a <> b).
Data augmentation and modifications to the graph let the query modify properties of existing matched entities in the graph (vertices or edges), as well as augment the graph with new vertex and edge instances (with their own new properties).
The answer set specification describes what results should be produced from the MATCH query. This includes which properties of which entities should be reported back to the user, as well as solution modifiers such as sorting the results based on some of the columns, reporting only unique results, or aggregating information over columns.
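As a rough illustration of how the four parts fit together, the sketch below annotates a single query. The frames Person, WorksFor and Company and the properties PersonID, Name and Checks are hypothetical; Checks is assumed to be a non-key integer property of the edge frame. The // comments are standard Cypher and assumed to be accepted by TQL.

```cypher
// 1. Graph pattern: vertex step, edge step, vertex step.
MATCH (a:Person)-[w:WorksFor]->(c:Company)
// 2. Constraints on property values.
WHERE a.PersonID >= 100 AND c.Name = 'Acme'
// 3. Modification of a (hypothetical) non-key property on the matched edge.
SET w.Checks = w.Checks + 1
// 4. Answer set: which properties to report and how to sort them.
RETURN a.Name, c.Name
ORDER BY a.Name
```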
Fundamental concepts in TQL's subset of Cypher
The fundamental concepts used in the Cypher subset supported by TQL are the following:
- Graph pattern steps: each Cypher query consists of a sequence of steps over graph objects in xGT.
- Cypher differentiates between vertex steps and edge steps syntactically.
- Labels: Cypher uses labels to indicate the type of a particular graph object. The label corresponds to the name of a vertex frame or edge frame.
- Our mapping of Cypher to TQL requires the use of at most a single label on each graph step.
- The name of the label must match a previously declared xGT graph object type: a vertex frame or edge frame.
- Variables: in a Cypher query, variables are used to propagate matched information to other parts of the query, as well as to the results set.
- A variable used in two different (but compatible) steps will guarantee that the matched objects are the same.
- Variables also provide the mechanism for writing query constraints over the graph object's properties.
- Graph step semantics: by default, TQL's vertex or edge steps are not restricted to be distinct from other vertex or edge steps. If two vertex or two edge steps must be unique, then an identity constraint must be added to guarantee it: WHERE a <> b.
- We provide a shorthand form to simplify the addition of these unique constraints for vertices (which is the most common case). The function unique_vertices() can be added to the WHERE clause of a query to guarantee that the vertices specified in the arguments must be unique with respect to each other. For example, the syntax unique_vertices(a, b, c) will generate the constraints a <> b, a <> c and b <> c.
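The two queries below sketch the shorthand next to the explicit constraints it stands for. They are meant to be run separately. The Knows edge frame (assumed to connect Person vertices to Person vertices) and the Name property are hypothetical, and the // comments assume TQL accepts standard Cypher comments.

```cypher
// Shorthand: a, b and c must be pairwise distinct vertices.
MATCH (a)-[:Knows]->(b)-[:Knows]->(c)
WHERE unique_vertices(a, b, c)
RETURN a.Name, b.Name, c.Name

// Explicit form with the three pairwise identity constraints written out.
MATCH (a)-[:Knows]->(b)-[:Knows]->(c)
WHERE a <> b AND a <> c AND b <> c
RETURN a.Name, b.Name, c.Name
```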
Property and topology modifications as part of MATCH queries
- We support the SET operation to modify existing properties that have been declared as part of the frame's schema. Only properties that are not part of the frame's key can be modified by the SET operation. Trying to modify a key property results in an error.
- Multiple properties can be modified at a time by using a property map: SET a = { name : 'Alice', zip : 90001, state : 'WA' }, where a is a variable that has been matched. TQL differs from Cypher in the treatment of property maps for the SET command because xGT introduces the concept of key properties, which cannot be modified on demand (via SET). For this reason, the SET command with a property map must not include key properties. As a result, the syntax SET a = { map } and SET a += { map } behave in the same manner and only allow modification of non-key properties.
- Property removal via the REMOVE command is not supported in xGT since all properties of a frame are considered to be schema properties, and the schema cannot be modified after creation of a frame.
- Addition of properties that are NOT specified in the frame's schema is not supported in xGT. All properties of a frame must be declared as part of its schema.
- The type of the expression used to SET a property to a particular value must be compatible with the declared type of the property in the frame's schema. An error is reported otherwise.
- xGT supports dynamic additions and deletions of vertex and edge instances to compatible frames as part of a running query. In addition to producing a results table, a query can have side-effects that modify the topology of the graph by adding new vertex and edge instances.
- Vertex and edge instance addition is supported via the Cypher/TQL CREATE command. This command requires the user to specify a variable to be bound to the newly created instance, the type name of the frame that the instance will be added to, and a property map with the values of the properties of that new instance. All key properties must have values; other properties default to a null value if not specified in the map.
- Vertex creation syntax: CREATE (v0:<vertex frame name> { keyProp1 : <value>, keyProp2 : <value>, ... }). In this case, the variable v0 must not be a part of the underlying MATCH query; it solely binds to the newly created vertex instance. The vertex frame must have been created previously and the CREATE command must include values for all key properties. The values of the keys will be checked for uniqueness across the entire vertex frame.
- Edge creation syntax: CREATE (v0)-[e0:<edge frame name> { property1 : <value>, property2 : <value> }]->(v1). The variable e0 must not be bound to any other entity as part of the MATCH query. On the other hand, the two endpoints of the edge, v0 and v1, must be bound to instances of the vertex frames of the same type as the specified endpoints of the edge frame. v0 and v1 must be bound as part of the MATCH query. The property values in the map for the new edge are optional; the values of the key properties are taken directly from the keys of the two endpoints, v0 and v1. Specifying the key property values manually is not allowed. For convenience, the CREATE command for an edge can specify the direction in either way: CREATE (source)-[]->(target) or CREATE (target)<-[]-(source).
- xGT supports the use of the MERGE keyword to indicate the matching of an existing vertex or its creation if it does not exist in the corresponding vertex frame. The user must at least specify the values of the key property of the vertex: MERGE (v: <vertex frame name> { keyProp : <value>, ... }). The merged vertex can then be used to create a new edge connecting to it. The use of the MERGE keyword is not allowed for edges, since multiple edge instances with the same key values are permitted in xGT. It would be ambiguous which one to retrieve, if they exist.
- The power of dynamic additions of vertices and edges to the graph comes from the possibility of specifying topology connections and values of their properties programmatically, from matched data in the graph.
- Removal of vertices is supported via the DETACH DELETE command. The syntax is DETACH DELETE <matched vertex variable>, where the matched vertex variable has been bound to a vertex entity as part of the MATCH command.
- Note that the removal of a vertex triggers removal of ALL incident edges (incoming and outgoing) on that vertex across all related edge frames. The cost of removal could be non-trivial for very high degree vertices.
- Removing an edge is achieved via the DELETE <matched edge variable> command. Removing an edge is simpler than removing a vertex and does not trigger effects beyond the edge frame containing that edge. The only requirement to remove an edge is to MATCH it into a bound variable as part of the query.
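The property-map form of SET described in the list above might be used as in the following sketch. It assumes a vertex frame vertexFrame whose key property is id and which has a non-key string property data; the // comments assume TQL accepts standard Cypher comments.

```cypher
// id is assumed to be the key property: it may be tested in WHERE,
// but it must not appear in the property map.
MATCH (a:vertexFrame)
WHERE a.id = 0
// In TQL, "=" in place of "+=" is described as behaving the same way.
SET a += { data : "updated" }
RETURN count(*)
```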
Aggregation functions and solution modifiers
- We support a set of Cypher's aggregation functions: count(*), sum(), avg(), min() and max(), which can be computed over all returns or by a grouping key.
- We have provided an extension to Cypher to be able to use degree computations over vertices in MATCH queries. We support the following functions: outdegree() and indegree(). Both take one or two parameters, with the first parameter being the name of a vertex variable. The optional second parameter is the name of an edge frame to use for a relative degree calculation. Without the second parameter, the functions calculate the absolute degree over all the edges connecting the vertex, regardless of edge frame.
- We support the ORDER BY construct for sorting results tables.
- We support the LIMIT and SKIP constructs for reducing the size of the results.
- As opposed to Cypher, xGT supports the use of both null and NaN (Not-A-Number) for floating point types. Data can be ingested either with null values or NaN values. null values are treated as Cypher treats null values, whereas NaN values are treated according to the IEEE 754 standard. NaN values may also arise from the use of floating point operations in xGT.
Examples of TQL queries
We illustrate TQL's subset of Cypher with several examples:
Example 1
MATCH (a:Person)-[b:WorksFor]->(c:Company)
WHERE 100 <= a.PersonID AND a.PersonID <= 150
RETURN a.Name, c.Name
In this simple example, we want to match people with a PersonID between 100 and 150 who work for any company in our graph. We use labels (textual identifier after the ":" on each step) to identify which xGT object to use for that particular graph step. We use variables to capture matched information in order to constrain the results (a.PersonID <= 150), as well as to indicate what values should be inserted into the results table. Notice that the Cypher graph steps use different syntax ("( )") and ("[ ]") for vertices and edges, respectively. Before execution, the query is type checked against the declared xGT objects present in the server. In this case, the xGT server must contain a WorksFor edge frame. This edge frame must connect vertices belonging to the Person frame to vertices belonging to the Company frame.
This particular query example would also be valid in Neo4j's Cypher, given the existence of appropriate data with the indicated Cypher labels.
Example 2
MATCH (a)-[:TrianglesEdge]->(b)-[]->(c)-[:TrianglesEdge]->(a)
WHERE a <> b AND b <> c
AND c <> a
RETURN a.UniqueID, b.UniqueID, c.UniqueID
This more complex example finds triangles in the graph. In this case, the frame label for the middle edge step is not given and must be inferred. The return clause indicates that we want the UniqueID values (endpoints) of all triangles found in the graph. Note that we check that the identities of the three endpoints are different by requiring the variables a, b and c to be different from each other (without specifying a property). Also note that we guarantee that the matched shape is indeed a triangle by specifying a as the beginning AND end of the pattern.
Example 3
MATCH ()-[a:edgeFrame]->()
WHERE a.sid = 0 AND a.tid = 1
SET a.int_count = a.int_count + 1
RETURN count(*)
This example modifies an integer property in one (or more) matched edges. In this case, the integer property int_count is incremented by one on each matched edge that satisfies the criteria specified in the WHERE clause. The total count of modified edges is returned.
Note that the property modification occurs after the matching and recording of the results has happened. If a query returned the property values of the matched edges together with modifications to those properties, then the returned results will contain the previous values rather than the modified ones.
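To make the ordering concrete, the following variant of Example 3 returns the modified property itself. Per the note above, the reported int_count would be the value before the increment. The // comment assumes TQL accepts standard Cypher comments.

```cypher
MATCH ()-[a:edgeFrame]->()
WHERE a.sid = 0 AND a.tid = 1
SET a.int_count = a.int_count + 1
// Results are recorded before SET is applied, so this reports the old int_count.
RETURN a.sid, a.tid, a.int_count
```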
Example 4
MATCH ()-[e:edgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
CREATE (v:vertexFrame { id : e.sid + 55, data : "test" })
RETURN e
This example creates a new vertex as part of the vertex frame vertexFrame. The CREATE command sets the key property of the vertex (id, in this case) to be equal to a computed value based on the matched edge e. Additionally, the property data is set to a constant string value. As can be noticed, this query has two side effects: returning the matched edge e and ALSO creating a new vertex v in the vertex frame vertexFrame. Note that if this query results in multiple edges being matched for the criteria indicated in the WHERE clause, then there would be an error trying to insert multiple vertices with the same key value. Vertices must be unique for a particular key value.
Example 5
MATCH (a:vertexFrame)-[e:edgeFrame]->(b:vertexFrame)
WHERE e.sid = 0 AND e.tid = -1
CREATE (a)-[newEdge:edgeFrame { float_count : -0.1, int_count : 0 }]->(b)
RETURN e
In this example, a new edge newEdge is inserted into the edge frame edgeFrame. Note that the end points (a and b) of the new edge must be matched as part of the query. The CREATE command for an edge can set values for any non-key properties of the edge. The key properties are derived from the key values of the two vertex end points. As with vertex creation, this query has two side effects: the return of the matched edges and the creation of new edges. Note that in contrast to vertices, repeated edges with the same key values are allowed in xGT.
Example 6
MATCH (a:vertexFrame)
WHERE a.id = 0
MERGE (b:vertexFrame { id : -1, data : "negative_vertex" })
CREATE (a)-[newEdge:edgeFrame { float_count : -0.1, int_count : 0 }]->(b)
RETURN a
This example uses the MERGE keyword to either match an existing vertex or create it if it does not exist (b with an id key value of -1). In addition, an edge newEdge is created from the matched vertex a, which has an id value of zero, to the merged vertex b. Note that MERGE is not supported for edges since multiple edges with the same key values are allowed and thus matching existing edges is potentially ambiguous.
Example 7
MATCH (a:vertexFrame)
WHERE a.id = 0
DETACH DELETE a
RETURN a
This example shows the removal of a vertex from the vertex frame vertexFrame. While this appears fairly simple, under the covers xGT has to analyze all edge frames that have vertexFrame as one of their end point types and has to make sure all edges incident on the deleted vertex a are removed from those edge frames as well. Note that it is possible to return the matched vertex that is to be deleted (a) since the matching (and result recording) occurs before the deletion itself.
Example 8
MATCH ()-[e:edgeFrame]->()
WHERE e.sid = 0 AND e.tid = -1
DELETE e
RETURN e
In contrast to deleting a vertex, deleting an edge is a fairly straightforward and localized operation: only the containing edge frame (edgeFrame in this case) is affected. As is the case for vertex deletion, the deleted edge can be returned since matching occurs before the deletion itself.
Type inference in TQL queries
TQL interprets Cypher labels as a type annotation corresponding to the unique name of a vertex or edge frame. The principal difference between Cypher and TQL is that TQL supports only a single label per graph step and it must correspond to a known xGT graph object.
We can say that a vertex belonging to the Person vertex frame is of type Person and an edge belonging to the WorksFor edge frame is of type WorksFor.
Our TQL compiler uses type inference to enable the elision of those annotations in many cases so that TQL queries can be written in a manner similar to their Cypher analogues. The TQL compiler tries to automatically deduce what the type annotation should be for different graph objects. In particular, given a type annotation for an edge step "[ :edge ]", the compiler can deduce the types of the two vertex steps on either side of the edge step: (a)-[:edge]->(b). In this case, the type annotation for the "a" and "b" vertex steps is not needed.
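The two queries below sketch the elision described above and are meant to be run separately. They assume a hypothetical WorksFor edge frame declared from a Person vertex frame to a Company vertex frame, each with a Name property; the // comments assume TQL accepts standard Cypher comments.

```cypher
// Fully annotated form.
MATCH (a:Person)-[:WorksFor]->(c:Company)
RETURN a.Name, c.Name

// Vertex annotations elided: the compiler deduces Person and Company
// from the declaration of the WorksFor edge frame.
MATCH (a)-[:WorksFor]->(c)
RETURN a.Name, c.Name
```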
Edge frame annotations can be inferred for edge steps when the preceding and succeeding vertex steps have type annotations and those vertex frames are only used in the declaration of a single edge frame. If vertices of a frame are connected by edges belonging to multiple edge frames, then an unannotated edge step could be ambiguous and must have its own type annotation: for example, if we have edge frames worksFor and consumerOf with vertex frames Person and Company for both, then the following unannotated query is ambiguous for the edge step "b": (a:Person)-[b]->(c:Company).
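One way to resolve the ambiguity, sketched under the assumption that both worksFor and consumerOf are declared from Person to Company and that both vertex frames have a hypothetical Name property, is to annotate the edge step explicitly (the // comments assume TQL accepts standard Cypher comments):

```cypher
// Ambiguous: the edge step b could belong to worksFor or consumerOf.
//   MATCH (a:Person)-[b]->(c:Company) RETURN a.Name, c.Name

// Unambiguous: the edge step carries its own type annotation.
MATCH (a:Person)-[b:worksFor]->(c:Company)
RETURN a.Name, c.Name
```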
Multiple unannotated vertex and edge steps are supported in a query because types are inferred and propagated from one step to another by the compiler, e.g. (a)-[:edge]->(b)-[]->(c). Type inference continues iteratively until all graph pattern steps in the query have been "typed" correctly or the compiler has detected an error.
Limitations on TQL's Cypher support
- TQL supports a subset of MATCH read-only and read-write queries.
- LOAD CSV statements are not supported by TQL. TQL provides alternative commands to load data into graph data structures.
- WITH and UNION clauses are not yet supported by TQL.
- The REMOVE statement is not supported by TQL since dynamic properties are not supported by xGT.
- FOREACH and CALL statements are not supported by TQL.
- CASE statements and list expressions are not supported by TQL.
- Cypher DDL statements will not be supported by TQL. TQL provides alternative DDL commands better suited to property graph manipulation.
- Cypher WHERE conditions may use constants as well as the properties and frames previously created through TQL DDL commands.
- Cypher's idiom for group by operations is supported by TQL. It can be combined with ORDER BY and LIMIT, but not yet with SKIP.
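For readers less familiar with Cypher's group-by idiom, the sketch below counts matches per company: any non-aggregated expression in RETURN acts as the grouping key. The Person, WorksFor and Company frames and the Name property are hypothetical, and the // comments assume TQL accepts standard Cypher comments.

```cypher
// c.Name is the grouping key because it is returned next to an aggregate.
MATCH (a:Person)-[:WorksFor]->(c:Company)
RETURN c.Name, count(*)
ORDER BY c.Name
LIMIT 10
```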