Cypher support in TQL
xGT TQL Queries (MATCH)
xGT's TQL language includes a subset of the Cypher language. This document describes the particular details of TQL's supported Cypher subset for users familiar with Neo4j's Cypher language.
xGT supports a restricted form of the Cypher query language to enable the exploration of property graph datasets. The principal Cypher command supported by xGT is the read-only MATCH command with constraints. We recommend that you familiarize yourself with the Cypher language and in particular the syntactic components of the MATCH command.
A MATCH query in TQL typically consists of three parts:
- The graph pattern description.
- Constraints on properties & object identities.
- Specification of how the answer set should be produced.
The graph pattern description consists of a list of steps over graph objects in xGT. Each step in the graph pattern matches either a vertex or an edge. A pattern can be a single sequence of steps that are fully connected in the graph, or multiple comma-separated sequences that are joined into one graph pattern through shared intermediate vertices.
Constraints on properties and object identities are expressed as part of a MATCH query using a WHERE clause. The WHERE clause can contain conditions based on property values of vertices and/or edges (a.year > 1980), as well as identity comparisons between vertices or edges (a <> b).
The answer set specification describes what results the MATCH query should produce: which properties of which entities are reported back to the user, as well as solution modifiers such as sorting the results by some of the columns, reporting only unique results, or aggregating information over columns.
Details on supported Cypher constructs in TQL
The fundamental concepts used in the Cypher subset supported by TQL are the following:
- Graph pattern steps: each Cypher query consists of a sequence of steps over graph objects in xGT.
- Cypher differentiates between vertex steps and edge steps syntactically.
- Labels: Cypher uses labels to indicate the type of a particular graph object (vertex or edge).
- Our mapping of Cypher to TQL requires the use of at most a single label on each graph step.
- The name of the label must match a previously declared xGT graph object type: vertex or edge type.
- Variables: in a Cypher query, variables are used to propagate matched information to other parts of the query, as well as to the results set.
- A variable used in two different (but compatible) steps will guarantee that the matched objects are the same.
- Variables also provide the mechanism for writing query constraints over the graph object's properties.
- Graph step semantics: by default, TQL's vertex or edge steps are not required to be distinct from other vertex or edge steps. If two vertex steps or two edge steps must match distinct objects, a constraint must be added to guarantee it: WHERE a <> b.
- We provide a shorthand to simplify adding these distinctness constraints for vertices, which is the most common case. The function unique_vertices() can be added to the WHERE clause of a query to guarantee that the vertices passed as arguments are pairwise distinct. For example, unique_vertices(a, b, c) generates the constraints a <> b, a <> c and b <> c.
- We support a subset of Cypher's aggregation functions: count(*), sum(), avg(), min() and max(). They can be computed over the entire result set or per grouping key.
- We provide an extension to Cypher for using degree computations over vertices in MATCH queries, through two functions: outdegree() and indegree(). Both take one or two parameters. The first parameter is the name of a vertex label; the optional second parameter is the name of an edge type to use for a relative degree calculation. Without the second parameter, the functions compute the absolute degree, counting edges over all the edge types that the vertex belongs to in the current xGT environment.
- We support the ORDER BY construct for sorting results tables.
- We support the LIMIT & SKIP constructs for reducing the size of the results.
- The combination of ORDER BY with LIMIT is supported.
- We do not currently support the combination of ORDER BY & DISTINCT, but will do so in the near future.
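The unique_vertices() shorthand described above expands into one inequality per unordered pair of its arguments, so n vertices yield n(n-1)/2 constraints. The following is a minimal Python sketch of that expansion; the helper name expand_unique_vertices is hypothetical and is not part of xGT or TQL, it only reproduces the rewrite stated in the text.

```python
from itertools import combinations


def expand_unique_vertices(*variables: str) -> str:
    """Expand unique_vertices(v1, ..., vn) into explicit <> constraints.

    Hypothetical helper for illustration only; not an xGT or TQL API.
    """
    # Every unordered pair of variables gets exactly one inequality.
    pairs = combinations(variables, 2)
    return " AND ".join(f"{x} <> {y}" for x, y in pairs)


print(expand_unique_vertices("a", "b", "c"))
# a <> b AND a <> c AND b <> c
```

With four vertices the same expansion already produces six constraints, which is why the shorthand is convenient for larger patterns.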
Examples of TQL queries
We illustrate TQL's subset of Cypher with two examples:
Example 1
MATCH (a:Person)-[b:WorksFor]->(c:Company)
WHERE 100 <= a.PersonID AND a.PersonID <= 150
RETURN a.Name, c.Name
In this simple example, we want to match people with a PersonID between 100 and 150 who work for any company in our graph. We use labels (the identifier after the ":" in each step) to identify which xGT object to use for that graph step. We use variables to capture matched information, both to constrain the results (a.PersonID <= 150) and to indicate which values should be inserted into the results table. Notice that Cypher graph steps use different delimiters for vertices and edges: "( )" for vertices and "[ ]" for edges. Before execution, the query is type checked against the declared xGT objects present in the database. In this case, the xGT database must contain a WorksFor edge type that connects vertices of type Person to vertices of type Company.
This query would also be valid in Neo4j's Cypher, given the existence of appropriate data with the indicated labels.
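The semantics of Example 1 can be mirrored in plain Python over a tiny in-memory graph. This is only a sketch of what the query computes, not of how xGT executes it; the data, the dictionary layout, and the function name example_1 are hypothetical.

```python
# Hypothetical toy data: vertices keyed by their key property.
persons = {
    101: {"PersonID": 101, "Name": "Ada"},
    140: {"PersonID": 140, "Name": "Grace"},
    200: {"PersonID": 200, "Name": "Alan"},
}
companies = {
    1: {"Name": "Acme"},
    2: {"Name": "Globex"},
}
# Hypothetical WorksFor edges as (person key, company key) pairs.
works_for = [(101, 1), (140, 2), (200, 1)]


def example_1(persons, companies, works_for):
    """Mirror: MATCH (a:Person)-[b:WorksFor]->(c:Company)
    WHERE 100 <= a.PersonID AND a.PersonID <= 150
    RETURN a.Name, c.Name
    """
    results = []
    for person_key, company_key in works_for:       # edge step b
        a = persons[person_key]                      # source vertex step a
        c = companies[company_key]                   # target vertex step c
        if 100 <= a["PersonID"] <= 150:              # WHERE clause
            results.append((a["Name"], c["Name"]))   # RETURN clause
    return results


print(example_1(persons, companies, works_for))
# [('Ada', 'Acme'), ('Grace', 'Globex')]
```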
Example 2
MATCH (a)-[:TrianglesEdge]->(b)-[]->(c)-[:TrianglesEdge]->(a)
WHERE a <> b AND b <> c
AND c <> a
RETURN a.UniqueID, b.UniqueID, c.UniqueID
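This more complex example uses a single edge type, TrianglesEdge, to find triangles in the graph. The vertex steps and the middle edge step carry no type annotations; the compiler infers them, as described in the type inference section. The WHERE clause requires the three vertices to be distinct (equivalent to unique_vertices(a, b, c)). In this case, the declaration of the TrianglesEdge type uses the vertex property UniqueID as the integer key of its endpoint vertices, and the RETURN clause requests the UniqueID values of the three vertices of every triangle found in the graph.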
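To make the matching semantics of Example 2 concrete, here is a Python sketch that enumerates every binding of the triangle pattern over a list of directed edges. It assumes, hypothetically, that edges are given as (source UniqueID, target UniqueID) pairs, that there are no parallel edges, and that the unannotated middle edge step resolves to TrianglesEdge as well; find_triangles is an illustrative name, not an xGT function.

```python
from collections import defaultdict


def find_triangles(edges):
    """Return every (a, b, c) binding of the Example 2 pattern.

    edges: (source UniqueID, target UniqueID) pairs for TrianglesEdge.
    Assumes no parallel edges and that the middle edge step is also
    TrianglesEdge.
    """
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)

    matches = []
    for a in list(successors):
        for b in successors[a]:                       # (a)-[:TrianglesEdge]->(b)
            for c in successors.get(b, ()):           # (b)-[]->(c)
                closes = a in successors.get(c, ())   # (c)-[:TrianglesEdge]->(a)
                distinct = a != b and b != c and c != a  # WHERE clause
                if closes and distinct:
                    matches.append((a, b, c))         # RETURN the three UniqueIDs
    return sorted(matches)  # sorted only to make the output deterministic


print(find_triangles([(1, 2), (2, 3), (3, 1), (3, 4)]))
# [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
```

In this sketch each directed triangle is reported three times, once per rotation, because nothing in the pattern or the WHERE clause fixes which vertex is bound to a.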
Type inference in TQL queries
TQL interprets Cypher labels as type annotations corresponding to the unique name of a vertex or edge type. The principal difference between Cypher and TQL is that TQL supports only a single label per graph step, and that label must correspond to a known xGT graph object.
Our TQL compiler uses type inference so that these annotations can be omitted in many cases, allowing TQL queries to be written in a manner similar to their Cypher analogues. The compiler tries to deduce automatically what the type annotation should be for each graph step. In particular, given a type annotation for an edge step "[ :edge ]", the compiler can deduce the types of the two vertex steps on either side of it: (a)-[:edge]->(b). In this case, type annotations for the "a" and "b" vertex steps are not needed.
Edge type annotations can be inferred for edge steps when the preceding and succeeding vertex steps have type annotations and those vertex types are used in the declaration of only a single edge type. If the vertex types are used in multiple edge types, an unannotated edge step can be ambiguous and must carry its own type annotation. For example, if edge types "worksFor" and "consumerOf" both connect vertex types Person and Company, then the edge step "b" in the following unannotated query is ambiguous: (a:Person)-[b]->(c:Company).
Multiple unannotated vertex and edge steps are supported in a query because types are inferred and propagated from one step to another by the compiler, e.g. (a)-[:edge]->(b)-[]->(c). Type inference continues iteratively until all graph pattern steps in the query have been "typed" correctly or the compiler has detected an error.
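The iterative inference described above can be pictured as a fixed-point loop. The following Python sketch is an illustration under stated assumptions, not the xGT compiler's algorithm: every edge step is written left to right with "->", each edge type is modeled as a (source vertex type, target vertex type) pair, and an unannotated edge step is resolved once exactly one declared edge type fits its known endpoint types (an assumed rule that generalizes the one stated in the text). All names, including infer_types, TypeInferenceError, and the vertex type V, are hypothetical.

```python
class TypeInferenceError(Exception):
    """Raised when a step's type conflicts or cannot be determined."""


def infer_types(edge_types, vertex_annotations, edge_steps):
    """Infer step types for a pattern of left-to-right directed edge steps.

    edge_types: {edge type: (source vertex type, target vertex type)}
    vertex_annotations: {vertex variable: vertex type} for annotated steps
    edge_steps: [(source variable, edge type or None, target variable)]
    Returns ({variable: vertex type}, [edge type per edge step]).
    """
    vtypes = dict(vertex_annotations)
    for src, _, dst in edge_steps:
        vtypes.setdefault(src, None)
        vtypes.setdefault(dst, None)
    etypes = [label for _, label, _ in edge_steps]
    for label in etypes:
        if label is not None and label not in edge_types:
            raise TypeInferenceError(f"unknown edge type {label}")

    def set_vertex(var, vtype):
        """Record var's type; return True only if it was newly learned."""
        if vtypes[var] is None:
            vtypes[var] = vtype
            return True
        if vtypes[var] != vtype:
            raise TypeInferenceError(f"{var}: {vtypes[var]} conflicts with {vtype}")
        return False

    changed = True
    while changed:  # repeat until a full pass learns nothing new
        changed = False
        for i, (src, _, dst) in enumerate(edge_steps):
            if etypes[i] is None:
                # Edge types whose endpoints agree with what is known so far.
                candidates = [
                    name for name, (s, t) in edge_types.items()
                    if vtypes[src] in (None, s) and vtypes[dst] in (None, t)
                ]
                if not candidates:
                    raise TypeInferenceError(f"no edge type fits step {i}")
                if len(candidates) > 1:
                    if vtypes[src] is not None and vtypes[dst] is not None:
                        raise TypeInferenceError(
                            f"step {i} is ambiguous: {sorted(candidates)}")
                    continue  # wait for more information from other steps
                etypes[i] = candidates[0]
                changed = True
            # Propagate the edge type's endpoint types to its vertex steps.
            s, t = edge_types[etypes[i]]
            changed |= set_vertex(src, s)
            changed |= set_vertex(dst, t)

    if None in etypes or None in vtypes.values():
        raise TypeInferenceError("could not infer the type of every step")
    return vtypes, etypes


triangle = [("a", "TrianglesEdge", "b"), ("b", None, "c"), ("c", "TrianglesEdge", "a")]
print(infer_types({"TrianglesEdge": ("V", "V")}, {}, triangle))
# ({'a': 'V', 'b': 'V', 'c': 'V'}, ['TrianglesEdge', 'TrianglesEdge', 'TrianglesEdge'])

env = {"WorksFor": ("Person", "Company"), "ConsumerOf": ("Person", "Company")}
try:
    infer_types(env, {"a": "Person", "c": "Company"}, [("a", None, "c")])
except TypeInferenceError as err:
    print(err)  # step 0 is ambiguous: ['ConsumerOf', 'WorksFor']
```

The second call reproduces the ambiguous (a:Person)-[b]->(c:Company) case from the text: both edge types fit the annotated endpoints, so the sketch refuses to pick one.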
Limitations on TQL's Cypher support
- TQL supports only read-only MATCH queries.
- CREATE and SET statements are not supported by TQL, but we provide alternatives to create and set the data of different graph objects.
- LOAD CSV statements are not supported by TQL. TQL provides alternative commands to load data into graph data structures.
- WITH and UNION clauses are not yet supported by TQL.
- The MERGE statement is not supported by TQL.
- DELETE and REMOVE statements are not supported by TQL.
- FOREACH and CALL statements are not supported by TQL.
- CASE expressions and list expressions are not supported by TQL.
- Cypher DDL statements will not be supported by TQL. TQL provides alternative DDL commands better suited to property graph manipulation.
- Cypher WHERE conditions may use constants, as well as the properties and entities previously created through TQL DDL commands.
- Cypher's idiom for group-by operations is supported by TQL. It can be combined with ORDER BY and LIMIT, but not yet with SKIP.
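Cypher's group-by idiom has no explicit GROUP BY keyword: the non-aggregated expressions in RETURN form the grouping key, so a return such as c.Name, count(*) yields one row per company. The Python sketch below mirrors that grouping followed by ORDER BY and LIMIT over rows that a pattern match has already produced. The data and the function name count_per_company are hypothetical, and the sketch says nothing about how xGT evaluates such queries.

```python
from collections import Counter


def count_per_company(rows, limit):
    """Mirror a grouped RETURN followed by ORDER BY and LIMIT.

    rows: (person name, company name) pairs, e.g. the output of a pattern
    such as (a:Person)-[:WorksFor]->(c:Company). The company name plays the
    role of the non-aggregated RETURN column (the implicit grouping key),
    and the count plays the role of count(*).
    """
    counts = Counter(company for _, company in rows)  # group by key, count(*)
    # ORDER BY count descending; ties broken by name only for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:limit]  # LIMIT


rows = [
    ("Ada", "Acme"), ("Alan", "Acme"), ("Grace", "Globex"),
    ("Edsger", "Initech"), ("Barbara", "Initech"), ("Ken", "Initech"),
]
print(count_per_company(rows, 2))
# [('Initech', 3), ('Acme', 2)]
```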