2.9. Constraints¶
We have discussed how to express the topological part of a TQL query using structures. We now discuss how to express constraints on the data stored in the components of the graph.
Recall that each graph component in xGT has an associated schema with named properties. Each property in the schema has an associated data type (see Data Types and Operators).
Constraints on the properties of the graph elements are described in the WHERE clause of a TQL query. The WHERE clause consists of expressions involving the properties of the graph elements, combined with constants (of the appropriate data type) and commonly used comparison, arithmetic, and boolean operators.
The combination of the structure and constraints in a query is called the graph pattern in TQL.
Properties are referred to by using the .
(dot) operator in between the name of a bound variable and the name of a property: a.name
would access the property named name
of the graph element bound to the variable a
.
Property expressions can be combined with other property expressions and constants using TQL’s Cypher operator subset.
Supported operators are as follows:
Arithmetic:
+
,-
,*
,/
,^
(exponentiation),%
(modulus), (unary)-
Boolean:
AND
,OR
,XOR
,NOT
Comparison:
=
(equality),<>
(difference),<
,>
,<=
,>=
,IS NULL
,IS NOT NULL
String:
STARTS WITH
,ENDS WITH
,CONTAINS
,+
(concatenation operator).List:
IN
(membership test),+
(list concatenation),[ ]
(subscripting and slicing operators).
Constants are supported for the following data types:
Integer numbers
Floating-point numbers (32 bit)
Boolean true and false
String constants (surrounded by double or single quotes)
Null constant (
NULL
)Lists
Type constructors are supported to create values for the following data types from either string constants or expressions:
Dates
Times
Datetimes
Durations
IPv4 addresses
Parentheses ()
can be used to indicate precedence when dealing with nested
expressions.
Examples of WHERE constraints are as follows:
WHERE p.name = "John" AND p.age < 40
WHERE c.value > 10.0 OR c.value < 2.5
WHERE (p.name STARTS WITH "D" OR p.name STARTS WITH "F") AND p.address IS NOT NULL
WHERE c.time > 20 AND (c.value > 10.0 OR c.value < d.value) AND c.`co$t` < 10.0
2.9.1. Pattern Filters¶
In addition to constraints on property values, TQL allows for constraints that filter on the existence (or non-existence) of linear graph patterns connected to the main topological structure of the query.
The expression WHERE <linear graph pattern>
evaluates to a boolean true value if the pattern exists in the graph, false if it does not.
The linear graph pattern is expressed as a sequence of vertex-edge-vertex steps that must begin and end with a vertex, subject to the same rules as the expression of the topological structure of the query.
We use the term “pattern filter” to denominate this type of topological pattern checks.
Examples of pattern filters are as follows:
WHERE (a)-[]->(a)
to check that the matched vertexa
has a one edge cycle.WHERE NOT (a)-[]->(a)
to check that the matched vertexa
does NOT have a one edge cycle.WHERE a.year >= 1990 AND (a)-[: WorksWith]->(b)
to check that persona
was born after the year 1990 and also works with personb
, both of which must be matched in the main structure of the query.
In TQL, pattern filters are restricted to one linear chain of graph steps (that is the pattern cannot be a non-linear pattern connected by the comma operator). Each vertex or edge step in the linear chain can either have a variable or not. If it has a variable, it must be a variable that is the same as one attached to an existing topological step in the main query structure. New variables are not allowed to be introduced in pattern filters.
A frame name can be provided by the query to indicate which frame the pattern filter step binds to: e.g. (: Vertex)
or (a : Vertex)
.
In addition to frame information, it is possible to specify equality constraints on the matched properties via the use of a property map. The use of property maps for constraints inside a pattern filter is useful on graph steps that do not bind to a previously matched variable. Examples of the use of property maps in pattern filters are as follows:
WHERE (a)-[{ value : 10 }]->(a)
will check that the matched vertexa
is connected in a one-step cycle to itself via an edge that has the propertyvalue = 10
.WHERE ({ key : b.key + 10 })-[]->()-[]->(a)
will check the existence of a two edge path that ends in the matched vertexa
and begins on a vertex with a key property greater by 10 than the key property of a matched vertexb
.
Note that adding property map constraints on a step with an existing variable could easily lead to empty answers.
For example, in the pattern WHERE (a { a.key : 10 })-[]->()
if the matched key property for a
is not 10, then the constraint will be false.
This happens because property maps only allow the expression of equality constraints.
A full example of the use of a pattern filter is as follows:
MATCH (a)-[]->(c)<-[]-(b)
WHERE a <> c AND ({ key : b.key + 10 })-[]->(a)
In this case, the topological structure indicates that vertices a and b each have an edge connecting them to vertex c. In addition, the constraints indicates that a and c must be different and that there must be another vertex with a key value of the key of b plus ten that has an edge connecting it to a. As can be seen in this case, the pattern filter check is an extension of the topological structure represented in the match pattern.
Edge components of the pattern can specify multiple edge frames to match, as well as variable-length traversals.
An example of multiple edge frames and variable-length traversal in a pattern filter is as follows:
MATCH (a)-[]->(b)
WHERE NOT (a)-[:edge0 | :edge1 *2..10]->(b)
In this case, the query tries to match vertices a
and b
such that they are connected via one edge, but there is no path between them using 2 to 10 edges from frames edge0
and edge1
.
See sections Variable-length Edge Traversal and Multiple Edge Frame Traversal for more details.
2.9.2. Functions¶
TQL provides a set of functions that can be used in queries. These functions include standard Cypher functions and a few that are unique to TQL.
2.9.2.1. Aggregation Functions¶
Sometimes it is useful to know cumulative information about results. Aggregation functions provide this ability by combining results from multiple rows into a single result. For instance, the following query returns a single row that has the total number of executives in all the companies in the graph:
MATCH (c:Companies)
RETURN sum(c.num_executives)
In addition to a single cumulative result, rows can be grouped by user-specified keys. The grouping keys are non-aggregated expressions that are given alongside the aggregation function expressions. Input rows with the same value for the grouping keys are combined.
Consider the example where the Companies
frame has a column state
indicating which state the company headquarters is located in.
The following query returns a single row for each unique value of state along with the total number of executives in each state:
MATCH (c:Companies)
RETURN c.state, sum(c.num_executives)
There can be multiple grouping keys and aggregation functions. Consider this example that groups results by the unique combinations of city and state:
MATCH (c:Companies)
RETURN c.state, c.city, sum(c.num_executives), avg(c.num_executives)
The DISTINCT
keyword can be used in conjunction with an aggregation function.
Using DISTINCT
with an aggregation function generates the unique set of input values for the aggregation function, and the function is applied to the set of distinct values instead of all the values.
For example, consider this query that returns the number of unique cities that company headquarters are located in:
MATCH (c:Companies)
RETURN count(DISTINCT c.city)
The following aggregation functions are supported by TQL:
Name |
Description |
---|---|
count(*) |
Returns the number of elements in the results set. |
count(expr) |
Returns the number of non-null values of an expression in the results set. |
avg(expr) |
Returns the average of all values of a numerical expression in the results set. |
sum(expr) |
Returns the sum of all values of a numerical expression in the results set. |
min(expr) |
Returns the smallest of all values of an expression in the results set. |
max(expr) |
Returns the largest of all values of an expression in the results set. |
collect(expr) |
Returns a list containing all values of an expression in the results set. |
The variant count(*)
includes null
values in the count while the variant count(expr)
excludes expressions that evaluate to null
.
For all other aggregation functions, expressions that evaluate to null
are excluded from the result.
The functions avg()
and sum()
take the argument expr
that is an expression that evaluates to either integer or float type and return an aggregate value of the same type as expr
.
Expressions that evaluate to any other type result in an error.
The functions min()
and max()
take the argument expr
that is an expression that evaluates to any type and return an aggregate value of the same type as expr
.
The aggregation functions can only be used in RETURN and WITH expressions.
Examples of their usage are as follows:
RETURN count(*) AS total
RETURN avg(a.age)
WITH sum(p.sale_price - p.production_cost) AS profit
WITH max(a.dob) AS youngest, min(a.dob) AS oldest, count(DISTINCT a.dob) AS unique_dobs
2.9.2.2. Conversion Functions¶
Conversion functions allow converting one type into a different type, for instance converting an integer into a string. TQL supports the following conversion functions:
Name |
Description |
---|---|
toBoolean(expr) |
Converts the given expression to a boolean. |
toInteger(expr) |
Converts the given expression to an integer. |
toFloat(expr) |
Converts the given expression to an IEEE 754 float representation. |
toString(expr) |
Converts the given expression to a string. |
The conversion functions take the single argument expr
and can be applied to properties and expressions of any type.
They can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
The toBoolean()
function will convert strings to boolean.
Strings can be 0, 1, true, or false.
True and false are case-insensitive.
Other types used in conversion besides boolean or string will result in an error.
The toInteger()
function will truncate floats and convert booleans to 0 or 1 for false and true, respectively.
IPv4 addresses are converted to their integer representation.
Date and datetime types are converted to seconds since the epoch where a date type is considered to occur at midnight.
The function converts time types to seconds since midnight.
Strings are parsed and converted to integers, if possible.
When parsing a string, the function expects a format similar to Neo4j where the string value can be written as positive or negative integer with an optional decimal fractional part that will be truncated.
The toFloat()
function works similarly to toInteger()
, but will use 32-bit IEEE 754 float precision.
For time and datetime types, the fractional precision of the seconds will be represented in the float.
Some considerations may need to be taken given that 32-bit floats have precision relative to magnitude.
For more recent dates this precision can be 10 to 10^2 given the magnitude because the seconds since the epoch can be fairly large.
This means the precision will be seconds to tens of seconds.
Similar precision may be expected from IP addresses given that each value in the upper range represents 256^3.
If toBoolean()
, toFloat()
, or toInteger()
fails at parsing a string, they will return a null.
The conversion functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
Some examples are:
WHERE toString(a.property) = "5"
WITH toString(10.5 + a.float_property) AS aString
RETURN toInteger(10.5 + a.float_property) AS anInt
2.9.2.3. Duration Functions¶
Duration functions allow returning a part of a duration, for instance retrieving the number of days stored in a duration.
Name |
Description |
---|---|
getDay(expr) |
Gets the day component of a duration. |
getHour(expr) |
Gets the hour component of a duration. |
getMinute(expr) |
Gets the minute component of a duration. |
getSecond(expr) |
Gets the second component of a duration. |
getMicrosecond(expr) |
Get the microsecond component of a duration. |
The duration functions take the single argument expr
that must be an expression which evaluates to a duration.
The functions return just the component specified and not the total.
For example, getSecond()
will return just the seconds component, not the total number of seconds contained within the duration.
RETURN getSecond(duration({ minute : 1, second : 30 })
will return an integer of 30, and not an integer of 90.
2.9.2.4. Mathematical Functions¶
TQL supports the following mathematical functions:
Name |
Description |
---|---|
abs(expr) |
Returns the absolute value of a numerical expression. |
ceil(expr) |
Returns the ceiling of a numerical expression. |
floor(expr) |
Returns the floor of a numerical expression. |
rand() |
Returns a random floating point number in the range of 0 (inclusive) to 1 (exclusive). |
round(expr) |
Returns the rounded value of a numerical expression. |
sign(expr) |
Returns 0 if the value of the numerical expression is zero, 1 if positive, and -1 if negative. |
The function rand()
takes no arguments.
The other mathematical functions take the argument expr
that must be an expression that evaluates to either an integer or float type and return a value of the same type as the input expression.
If expr
evaluates to null
, null
is returned.
Using these functions on any non-numeric type results in an error.
The mathematical functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
2.9.2.5. String Functions¶
TQL supports the following string functions:
Name |
Description |
---|---|
reverse(string) |
Returns a string with the elements of the passed-in parameter in reverse order. |
size(string) |
Returns the number of characters in the string. |
substring(orig, start [, length]) |
Returns a substring of the original string. |
The substring()
function has two required arguments and a third optional argument.
The argument orig
is an expression that evaluates to a string to operate on.
The argument start
is an expression that evaluates to a 0-based starting position in the original string.
The optional argument length
is an expression that evaluates to the length of the substring to extract.
If length
is not given, the function returns the substring beginning at start
through the end of orig
.
If orig
evaluates to null
, null
is returned.
If orig
evaluates to any non-string type, an error is thrown.
If start
or length
evaluates to either a negative integer or null
, an error is thrown.
If start
or length
evaluates to any non-integer type, an error is thrown.
The string functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
2.9.2.6. List Functions¶
TQL supports the following list functions:
Name |
Description |
---|---|
range(start, stop [, step]) |
Returns a list containing a range of integer values. |
reverse(list) |
Returns a list with the elements of the passed-in parameter in reverse order. |
size(list) |
Returns the number of elements in the list. |
tail(list) |
Returns a list that does not include the first element in the passed-in parameter. |
keys(variable) |
Returns a list of the properties associated with the bound variable passed in as a parameter. |
The range()
function generates an inclusive range.
The function has two required arguments and a third optional argument all of which must be integer expressions.
The argument start
is the first integer included in the resulting range.
The argument stop
is the last value potentially included in the resulting range.
The argument step
indicates the difference between subsequent elements in the list.
If step
is not given, its default is 1
.
Note that stop
may not be included in the list because of the step size.
The keys()
function must be passed a bound variable associated with a frame.
The following Cypher functions for list are not currently supported by xGT: labels()
, reduce()
, toBooleanList()
, toFloatList()
, toIntegerList()
, and toStringList()
.
2.9.2.7. Path Functions¶
TQL supports the following functions that can be applied to path variables (see Path Variable Capture):
Name |
Description |
---|---|
length(path) |
Returns the number of edges contained in a path. |
nodes(path) |
Returns a list of the vertices in a path. |
relationships(path) |
Returns a list of the edges in a path. |
Path variables and lists are not compatible with each other. Functions that can be applied to lists cannot be applied to path variables and vice versa. The path functions nodes() and relationships() return lists of vertices and edges, but those lists are not considered paths.
Both the nodes() and relationships() return the list of vertices and edges in the order in which they appear in the path. The path itself stores vertices and edges in the order in which they appear in the TQL query written by the user, independent of query reordering and optimization.
2.9.2.8. Degree Functions¶
TQL supports the following degree functions:
Name |
Description |
---|---|
indegree(vertex [, edge_frame]) |
Returns an integer value with the number of incoming edges to the specified bound vertex variable. Can be global or relative to a particular edge frame. |
outdegree(vertex [, edge_frame]) |
Return an integer value with the number of outgoing edges from the specified bound vertex variable. Can be global or relative to a particular edge frame. |
The degree functions have one required and one optional argument.
The argument vertex
must be a bound variable to a vertex component in the structure.
The optional second argument edge_frame
must be the qualified name of an edge frame in xGT.
Any other values for vertex
or edge_frame
results in an error.
Degree computations without the optional edge frame name are global, in the sense that the degree of the bound vertex is computed across edge frames incident on the owning vertex frame in xGT. When using the optional edge frame name argument, the degree computation becomes relative to that particular edge frame. That is, the degree of the vertex is computed only for edges of the named frame.
The degree functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
Examples are as follows:
WHERE indegree(a) = 10
WHERE outdegree(b) > 10
WHERE (outdegree(c, FriendOf) + outdegree(c, WorksFor)) < 5
2.9.2.9. Distinct Vertices Function¶
By default xGT and TQL do not impose restrictions on the identity of the vertices and edges in a structure. In particular, cycles are allowed (graph paths from one vertex to the same vertex) and will be reported if present in the data. There are cases where the identity of the vertices in a query is not important, but there are also cases in which at least some of the vertices must be different from each other.
It is easy to express distinct vertices with just two: a <> b
, but if we have more – say a
, b
, c
and d
– then expressing all the pairwise constraints becomes tedious and error-prone: a <> b AND a <> c AND a <> d AND b <> c AND b <> d AND c <> d
.
For this reason, TQL provides a shortcut:
Name |
Description |
---|---|
unique_vertices(vertex_list) |
Enforces that a list of vertices are all distinct. |
The function unique_vertices()
takes a list of bound vertex variables.
Any other input value results in an error.
It can only be used as part of a WHERE clause with its arguments being the bound variables for vertices that must be distinct from each other.
xGT automatically generates all the necessary constraints and adds them to any other user-provided constraints in the query.
2.9.2.10. Type Constructors¶
TQL provides mechanisms for expressing constraints based on date, time, datetime, duration, and IPv4 address types.
The functions date()
, time()
, datetime()
, duration()
and ipaddress()
let the TQL query represent values of the corresponding types built from string expressions in an appropriate format.
The strings could be literal constants written in the TQL query or string expressions constructed as part of the query.
See Encoding Text to Types for a description of valid strings for each type.
A duration can also be constructed using a map of values for each time component.
The format is as follows: duration({ day : [n], hour : [n], minute : [n], second : [n], microsecond : [n]})
.
Each component is optional and the value being mapped must be an integer.
Examples of the use of these type constructors are as follows:
WHERE a.date > date("2018-01-01")
WHERE a.date > date(b.date_as_string)
WHERE a.time = time("16:00:00.000000")
WHERE b.datetime <> datetime("2017-12-31T00:00:00")
WHERE a.date - b.date < duration("P4D")
WHERE a.time - b.time < duration({ day = 2 })
WHERE b.ipaddr = ipaddress("192.168.1.1")
WHERE b.ipaddr = ipaddress(c.ip_as_string)
2.9.3. CASE Expressions¶
xGT and TQL support the CASE clause which provides conditional expressions. There are two types of CASE expressions: simple and generic. The simple form compares values to a test expression while the generic form evaluates a series of conditional expressions.
CASE expressions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.
2.9.3.1. Simple CASE Expressions¶
The simple CASE expression has the following form:
CASE test
WHEN value THEN result
[WHEN value THEN result]
[ELSE default]
END
Each of test, value, result, and default are expressions. The list of values is compared against the test expression in order until one is found that is equal to the test expression. If a value is found equal to the test expression, the result associated with that value is returned. If no value is equal to the test expression, the default is returned. If no value is equal to the test expression and no default is given, null is returned.
All the result expressions and the default expression must be of the same type. The test expression and all the value expressions must be of the same type.
2.9.3.2. Generic CASE Expressions¶
The generic CASE expression has the following form:
CASE
WHEN predicate THEN result
[WHEN predicate THEN result]
[ELSE default]
END
Each of test, predicate, result, and default are expressions. The list of predicates is evaluated in order until one is found true. If a predicate is found true, the result associated with that predicate is returned. If no predicate is true, the default is returned. If no predicate is true and no default is given, null is returned.
All the result expressions and the default expression must be of the same type. All the predicate expressions must be of boolean type.
2.9.3.3. Examples¶
This example uses a simple case expression in a RETURN clause:
MATCH (v:VertexFrame)
WHERE v.ID < 10
RETURN v.ID,
CASE v.ID
WHEN 2 THEN 1
ELSE 0
END AS result
The example returns a row for each vertex with an ID < 10 containing two columns. The first column is the ID. The second column is 1 if the ID is 2 and 0 otherwise.
This example uses a generic case expression in a WITH clause:
MATCH (v:VertexFrame)
WHERE v.ID < 10
WITH v,
CASE
WHEN v.ID % 2 = 1 THEN "odd"
ELSE "even"
END AS type
WHERE type = "even"
RETURN v.ID
The example returns a row for each vertex with an even ID less than 10 with a single column holding the ID.
2.9.4. Cypher Parameters¶
xGT and TQL support user-defined parameters indicated in a query by a variable name starting with $
.
The values of the parameter are supplied via a Python map at the time of running or scheduling a job via run_job()
or schedule_job()
.
Python boolean, float, integer, decimal, date, time, datetime, timedelta, ip address, and string types are automatically converted to the appropriate Cypher type. Non-nested lists of these types are also supported.
Cypher parameters can be used in the following ways:
For literals and expressions:
MATCH (c:Companies)
WHERE c.num_executives >= $param
conn.run_job(query, parameters = { "param": 10 })
String pattern matching:
MATCH (c:Companies)
WHERE c.company_name STARTS WITH $name
conn.run_job(query, parameters = { "name": "Trov" })
In the creation of an object:
CREATE (c:Companies { company_name : $name, date_founded : $date })
conn.run_job(query, parameters = { "name": "NewCompany", "date": date(2021, 1, 1) })
In the setting of object properties:
MATCH (c:Companies)
WHERE c.company_name = "NewCompany"
SET c.num_executives = $param
conn.run_job(query, parameters = { "param": 10 })
For skip and limit values:
MATCH (c:companies)
WHERE c.num_executives >= 10
RETURN c
SKIP $s
LIMIT $l
conn.run_job(query, parameters = { "s": 10, "l": 5 }