2.9. Constraints

We have discussed how to express the topological part of a TQL query using structures. We now discuss how to express constraints on the data stored in the components of the graph.

Recall that each graph component in xGT has an associated schema with named properties. Each property in the schema has an associated data type (see Data Types and Operators).

Constraints on the properties of the graph elements are described in the WHERE clause of a TQL query. The WHERE clause consists of expressions involving the properties of the graph elements, combined with constants (of the appropriate data type) and commonly used comparison, arithmetic, and boolean operators.

The combination of the structure and constraints in a query is called the graph pattern in TQL.

Properties are referred to by using the . (dot) operator in between the name of a bound variable and the name of a property: a.name would access the property named name of the graph element bound to the variable a.

Property expressions can be combined with other property expressions and constants using TQL’s Cypher operator subset.

Supported operators are as follows:

  • Arithmetic: +, -, *, /, ^ (exponentiation), % (modulus), (unary) -

  • Boolean: AND, OR, XOR, NOT

  • Comparison: = (equality), <> (difference), <, >, <=, >=, IS NULL, IS NOT NULL

  • String: STARTS WITH, ENDS WITH, CONTAINS, + (concatenation operator).

  • List: IN (membership test), + (list concatenation), [ ] (subscripting and slicing operators).

Constants are supported for the following data types:

  • Integer numbers

  • Floating-point numbers (32 bit)

  • Boolean true and false

  • String constants (surrounded by double or single quotes)

  • Null constant (NULL)

  • Lists

Type constructors are supported to create values for the following data types from either string constants or expressions:

  • Dates

  • Times

  • Datetimes

  • Durations

  • IP addresses

Parentheses () can be used to indicate precedence when dealing with nested expressions.

Examples of WHERE constraints are as follows:

WHERE p.name = "John" AND p.age < 40
WHERE c.value > 10.0 OR c.value < 2.5
WHERE (p.name STARTS WITH "D" OR p.name STARTS WITH "F") AND p.address IS NOT NULL
WHERE c.time > 20 AND (c.value > 10.0 OR c.value < d.value) AND c.`co$t` < 10.0

2.9.1. Pattern Filters

In addition to constraints on property values, TQL allows for constraints that filter on the existence (or non-existence) of linear graph patterns connected to the main topological structure of the query. The expression WHERE <linear graph pattern> evaluates to a boolean true value if the pattern exists in the graph, false if it does not. The linear graph pattern is expressed as a sequence of vertex-edge-vertex steps that must begin and end with a vertex, subject to the same rules as the expression of the topological structure of the query. We use the term pattern filter to denominate this type of topological pattern checks. Examples of pattern filters are as follows:

  • WHERE (a)-[]->(a) to check that the matched vertex a has a one edge cycle.

  • WHERE NOT (a)-[]->(a) to check that the matched vertex a does NOT have a one edge cycle.

  • WHERE a.year >= 1990 AND (a)-[: WorksWith]->(b) to check that person a was born after the year 1990 and also works with person b, both of which must be matched in the main structure of the query.

In TQL, pattern filters are restricted to one linear chain of graph steps (that is the pattern cannot be a non-linear pattern connected by the comma operator). Each vertex or edge step in the linear chain can either have a variable or not. If it has a variable, it must be a variable that is the same as one attached to an existing topological step in the main query structure. New variables are not allowed to be introduced in pattern filters.

A frame name can be provided by the query to indicate which frame the pattern filter step binds to: e.g. (: Vertex) or (a : Vertex).

In addition to frame information, it is possible to specify equality constraints on the matched properties via the use of a property map. The use of property maps for constraints inside a pattern filter is useful on graph steps that do not bind to a previously matched variable. Examples of the use of property maps in pattern filters are as follows:

  • WHERE (a)-[{ value : 10 }]->(a) will check that the matched vertex a is connected in a one-step cycle to itself via an edge that has the property value = 10.

  • WHERE ({ key : b.key + 10 })-[]->()-[]->(a) will check the existence of a two edge path that ends in the matched vertex a and begins on a vertex with a key property greater by 10 than the key property of a matched vertex b.

Note that adding property map constraints on a step with an existing variable could easily lead to empty answers. For example, in the pattern WHERE (a { a.key : 10 })-[]->() if the matched key property for a is not 10, then the constraint will be false. This happens because property maps only allow the expression of equality constraints.

A full example of the use of a pattern filter is as follows:

MATCH (a)-[]->(c)<-[]-(b)
WHERE a <> c AND ({ key : b.key + 10 })-[]->(a)

In this case, the topological structure indicates that vertices a and b each have an edge connecting them to vertex c. In addition, the constraints indicates that a and c must be different and that there must be another vertex with a key value of the key of b plus ten that has an edge connecting it to a. As can be seen in this case, the pattern filter check is an extension of the topological structure represented in the match pattern.

Edge components of the pattern can specify multiple edge frames to match, as well as variable-length traversals.

An example of multiple edge frames and variable-length traversal in a pattern filter is as follows:

MATCH (a)-[]->(b)
WHERE NOT (a)-[:Edge0 | :Edge1 *2..10]->(b)

In this case, the query tries to match vertices a and b such that they are connected via one edge, but there is no path between them using 2 to 10 edges from frames Edge0 and Edge1.

See sections Variable-length Edge Traversal and Multiple Edge Frame Traversal for more details.

2.9.2. Functions

TQL provides a set of functions that can be used in queries. These functions include standard Cypher functions and a few that are unique to TQL.

2.9.2.1. Aggregation Functions

Sometimes it is useful to know cumulative information about results. Aggregation functions provide this ability by combining results from multiple rows into a single result. For instance, the following query returns a single row that has the total number of executives in all the companies in the graph:

MATCH (c:Companies)
RETURN sum(c.num_executives)

In addition to a single cumulative result, rows can be grouped by user-specified keys. The grouping keys are non-aggregated expressions that are given alongside the aggregation function expressions. Input rows with the same value for the grouping keys are combined.

Consider the example where the Companies frame has a column state indicating which state the company headquarters is located in. The following query returns a single row for each unique value of state along with the total number of executives in each state:

MATCH (c:Companies)
RETURN c.state, sum(c.num_executives)

There can be multiple grouping keys and aggregation functions. Consider this example that groups results by the unique combinations of city and state:

MATCH (c:Companies)
RETURN c.state, c.city, sum(c.num_executives), avg(c.num_executives)

The DISTINCT keyword can be used in conjunction with an aggregation function. Using DISTINCT with an aggregation function generates the unique set of input values for the aggregation function, and the function is applied to the set of distinct values instead of all the values.

For example, consider this query that returns the number of unique cities that company headquarters are located in:

MATCH (c:Companies)
RETURN count(DISTINCT c.city)

The following aggregation functions are supported by TQL:

Aggregation Functions

Name

Description

count(*)

Returns the number of elements in the results set.

count(expr)

Returns the number of non-null values of an expression in the results set.

avg(expr)

Returns the average of all values of a numerical expression in the results set.

sum(expr)

Returns the sum of all values of a numerical expression in the results set.

min(expr)

Returns the smallest of all values of an expression in the results set.

max(expr)

Returns the largest of all values of an expression in the results set.

collect(expr)

Returns a list containing all values of an expression in the results set.

The variant count(*) includes null values in the count while the variant count(expr) excludes expressions that evaluate to null. For all other aggregation functions, expressions that evaluate to null are excluded from the result. The functions avg() and sum() take the argument expr that is an expression that evaluates to either integer or float type and return an aggregate value of the same type as expr. Expressions that evaluate to any other type result in an error. The functions min() and max() take the argument expr that is an expression that evaluates to any type and return an aggregate value of the same type as expr.

The aggregation functions can only be used in RETURN and WITH expressions.

Examples of their usage are as follows:

RETURN count(*) AS total
RETURN avg(a.age)
WITH sum(p.sale_price - p.production_cost) AS profit
WITH max(a.dob) AS youngest, min(a.dob) AS oldest, count(DISTINCT a.dob) AS unique_dobs

2.9.2.2. Conversion Functions

Conversion functions allow converting one type into a different type, for instance converting an integer into a string. TQL supports the following conversion functions:

Conversion Functions

Name

Description

toBoolean(expr)

Converts the given expression to a boolean.

toInteger(expr)

Converts the given expression to an integer.

toUInteger(expr)

Converts the given expression to an unsigned integer.

toFloat(expr)

Converts the given expression to an IEEE 754 float representation.

toString(expr)

Converts the given expression to a string.

The conversion functions take the single argument expr and can be applied to properties and expressions of any type. They can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

The toBoolean() function will convert strings to boolean. Strings can be 0, 1, true, or false. True and false are case-insensitive. Other types used in conversion besides boolean or string will result in an error.

The toInteger() function will truncate floats and convert booleans to 0 or 1 for false and true, respectively. IPv4 addresses are converted to their integer representation. Date and datetime types are converted to seconds since the epoch where a date type is considered to occur at midnight. The function converts time types to seconds since midnight. Strings are parsed and converted to integers, if possible. When parsing a string, the function expects a format similar to Neo4j where the string value can be written as a positive or negative integer with an optional decimal fractional part that will be truncated.

The toUInteger() function will truncate floats and convert booleans to 0 or 1 for false and true, respectively. IPv4 addresses are converted to their integer representation. Date and datetime types are converted to seconds since the epoch where a date type is considered to occur at midnight. The function converts time types to seconds since midnight. Strings are parsed and converted to integers, if possible. When parsing a string, the function expects a format similar to Neo4j where the string value can be written as an integer with an optional decimal fractional part that will be truncated. Negative values will be converted to positive values using two’s complement rules.

The toFloat() function works similarly to toInteger(), but will use 32-bit IEEE 754 float precision. For time and datetime types, the fractional precision of the seconds will be represented in the float. Some considerations may need to be taken given that 32-bit floats have precision relative to magnitude. For more recent dates this precision can be 10 to 10^2 given the magnitude because the seconds since the epoch can be fairly large. This means the precision will be seconds to tens of seconds. Similar precision may be expected from IP addresses given that each value in the upper range represents 256^3.

If toBoolean(), toFloat(), or toInteger() fails at parsing a string, they will return a null.

The conversion functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

Some examples are:

WHERE toString(a.property) = "5"
WITH toString(10.5 + a.float_property) AS a_string
RETURN toInteger(10.5 + a.float_property) AS an_int

2.9.2.3. Mathematical Functions

TQL supports the following mathematical functions:

Mathematical Functions

Name

Description

abs(expr)

Returns the absolute value of a numerical expression.

ceil(expr)

Returns the ceiling of a numerical expression.

floor(expr)

Returns the floor of a numerical expression.

rand()

Returns a random floating point number in the range of 0 (inclusive) to 1 (exclusive).

round(expr)

Returns the rounded value of a numerical expression.

sign(expr)

Returns 0 if the value of the numerical expression is zero, 1 if positive, and -1 if negative.

The function rand() takes no arguments. The other mathematical functions take the argument expr that must be an expression that evaluates to either an integer or float type and return a value of the same type as the input expression. If expr evaluates to null, null is returned. Using these functions on any non-numeric type results in an error.

The mathematical functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

2.9.2.4. String Functions

TQL supports the following string functions:

String Functions

Name

Description

reverse(string)

Returns a string with the elements of the passed-in parameter in reverse order.

size(string)

Returns the number of characters in the string.

substring(orig, start [, length])

Returns a substring of the original string.

The substring() function has two required arguments and a third optional argument. The argument orig is an expression that evaluates to a string to operate on. The argument start is an expression that evaluates to a 0-based starting position in the original string. The optional argument length is an expression that evaluates to the length of the substring to extract. If length is not given, the function returns the substring beginning at start through the end of orig. If orig evaluates to null, null is returned. If orig evaluates to any non-string type, an error is thrown. If start or length evaluates to either a negative integer or null, an error is thrown. If start or length evaluates to any non-integer type, an error is thrown.

The string functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

2.9.2.5. List Functions

TQL supports the following list functions:

List Functions

Name

Description

range(start, stop [, step])

Returns a list containing a range of integer values.

reverse(list)

Returns a list with the elements of the passed-in parameter in reverse order.

size(list)

Returns the number of elements in the list.

tail(list)

Returns a list that does not include the first element in the passed-in parameter.

keys(variable)

Returns a list of the properties associated with the bound variable passed in as a parameter.

The range() function generates an inclusive range. The function has two required arguments and a third optional argument all of which must be integer expressions. The argument start is the first integer included in the resulting range. The argument stop is the last value potentially included in the resulting range. The argument step indicates the difference between subsequent elements in the list. If step is not given, its default is 1. Note that stop may not be included in the list because of the step size.

The keys() function must be passed a bound variable associated with a frame.

The following Cypher functions for list are not currently supported by xGT: labels(), reduce(), toBooleanList(), toFloatList(), toIntegerList(), and toStringList().

2.9.2.6. Path Functions

TQL supports the following functions that can be applied to path variables (see Path Variable Capture):

Path Functions

Name

Description

length(path)

Returns the number of edges contained in a path.

nodes(path)

Returns a list of the vertices in a path.

relationships(path)

Returns a list of the edges in a path.

Path variables and lists are not compatible with each other. Functions that can be applied to lists cannot be applied to path variables and vice versa. The path functions nodes() and relationships() return lists of vertices and edges, but those lists are not considered paths.

Both the nodes() and relationships() return the list of vertices and edges in the order in which they appear in the path. The path itself stores vertices and edges in the order in which they appear in the TQL query written by the user, independent of query reordering and optimization.

2.9.2.7. Degree Functions

TQL supports the following degree functions:

Degree Functions

Name

Description

indegree(vertex [, edge_frame])

Returns an integer value with the number of incoming edges to the specified bound vertex variable. Can be global or relative to a particular edge frame.

outdegree(vertex [, edge_frame])

Return an integer value with the number of outgoing edges from the specified bound vertex variable. Can be global or relative to a particular edge frame.

The degree functions have one required and one optional argument. The argument vertex must be a bound variable to a vertex component in the structure. The optional second argument edge_frame must be the qualified name of an edge frame in xGT. Any other values for vertex or edge_frame results in an error.

Degree computations without the optional edge frame name are global, in the sense that the degree of the bound vertex is computed across edge frames incident on the owning vertex frame in xGT. When using the optional edge frame name argument, the degree computation becomes relative to that particular edge frame. That is, the degree of the vertex is computed only for edges of the named frame.

The degree functions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

Examples are as follows:

WHERE indegree(a) = 10
WHERE outdegree(b) > 10
WHERE (outdegree(c, FriendOf) + outdegree(c, WorksFor)) < 5

2.9.2.8. Distinct Vertices Function

By default xGT and TQL do not impose restrictions on the identity of the vertices and edges in a structure. In particular, cycles are allowed (graph paths from one vertex to the same vertex) and will be reported if present in the data. There are cases where the identity of the vertices in a query is not important, but there are also cases in which at least some of the vertices must be different from each other.

It is easy to express distinct vertices with just two: a <> b, but if we have more – say a, b, c and d – then expressing all the pairwise constraints becomes tedious and error-prone: a <> b AND a <> c AND a <> d AND b <> c AND b <> d AND c <> d. For this reason, TQL provides a shortcut:

Unique Vertices Function

Name

Description

unique_vertices(vertex_list)

Enforces that a list of vertices are all distinct.

The function unique_vertices() takes a list of bound vertex variables. Any other input value results in an error. It can only be used as part of a WHERE clause with its arguments being the bound variables for vertices that must be distinct from each other. xGT automatically generates all the necessary constraints and adds them to any other user-provided constraints in the query.

2.9.2.9. Type Construction Functions

TQL provides functions to construct constant values of some types. The supported types are date, time, datetime, duration, IP address, and row IDs.

Type Construction Functions

Name

Description

date(string)

Converts a string representation of a date into a constant date value.

time(string)

Converts a string representation of a time into a constant time value.

datetime(string)

Converts a string representation of a datetime into a constant datetime value.

duration(string)
duration(map)

Constructs a constant duration value from either a string or component map.

ipaddress(string)
ipaddress(int1 [, int2])

Constructs a constant IP address from a string, one integer, or two integers.

id(graph_element)

Turns off the expansion of a graph element to all its properties in a WITH or RETURN clause.

id(frame, position_or_key [, is_key])

Constructs a constant row ID from the given frame and position or key.

The parameter string to all the functions can be literal constants written in the TQL query or string expressions constructed as part of the query. See Encoding Text to Types for a description of valid strings for each type. If the parameter string, map, or graph_element to any of the functions evaluates to null, null is returned.

A duration can be constructed using a map of component values. The format is duration({ day : [n], hour : [n], minute : [n], second : [n], microsecond : [n]}). Each component is optional and the value being mapped must be an integer.

An IP address can be constructed using one or two unsigned 64 bit integers. The first integer is the low bits, and the second integer is the high bits.

The id() function has two usages. The first is to prevent the expansion of a graph element to all its properties in a RETURN or WITH clause. Instead, a row ID (see Frame Row References (Row IDs)) corresponding to the graph element is returned. The second usage is to construct a row id from a frame and either a row position or key. For table and edge frames, the position_or_key parameter can only be an integer indicating a row position. When the frame parameter is a vertex frame, a valid value from the key column of the frame can also be used. The compiler assumes the position_or_key parameter is a position if it is an integer. To indicate a vertex key when the key column is an integer type, pass true to the optional third parameter is_key.

Examples of the use of these type constructors are as follows:

WHERE a.date > date("2018-01-01")
WHERE a.date > date(b.date_as_string)
WHERE a.time = time("16:00:00.000000")
WHERE b.datetime <> datetime("2017-12-31T00:00:00")
WHERE a.date - b.date < duration("P4D")
WHERE a.time - b.time < duration({ day = 2 })
WHERE b.ipaddr = ipaddress("192.168.1.1")
WHERE b.ipaddr = ipaddress(c.ip_as_string)
WHERE b.ipaddr = ipaddress(c.ipv4_as_int)
WHERE b.ipaddr = ipaddress(c.ip_as_string)
WHERE b.ipaddr = ipaddress(c.ipv6_as_int0, c.ipv6_as_int1)
MATCH (v:VertexFrame)
WHERE v = id(VertexFrame, "key_val")

2.9.3. Temporal Components

xGT supports accessing the components of the temporal types: date, time, datetime and duration. The syntax to access a component of a temporal type is expression.component. expression can be any valid expression of a temporal type and component is the name of a temporal component valid for that type.

Each type provides access to individual temporal components as described in the table below:

Components for Temporal Types

Component Name

Date

Time

Datetime

Duration

microsecond

second

minute

hour

day

month

year

time

date

These components can be applied to any expression of the right type. They return just the component specified and not the total. For example, .second will return just the seconds component, not the total number of seconds contained within the type.

RETURN duration({ minute : 1, second : 30 }).second

will return an integer of 30, and not an integer of 90.

These components can be nested for datetime values as follows:

RETURN a.datetime_column.date.year, a.datetime_column.time.minute

2.9.4. CASE Expressions

xGT and TQL support the CASE clause which provides conditional expressions. There are two types of CASE expressions: simple and generic. The simple form compares values to a test expression while the generic form evaluates a series of conditional expressions.

CASE expressions can be used in any context where an expression is valid, including WHERE, RETURN, WITH, CASE, ORDER BY, CREATE, MERGE and SET expressions.

2.9.4.1. Simple CASE Expressions

The simple CASE expression has the following form:

CASE test
  WHEN value THEN result
  [WHEN value THEN result]
  [ELSE default]
END

Each of test, value, result, and default are expressions. The list of values is compared against the test expression in order until one is found that is equal to the test expression. If a value is found equal to the test expression, the result associated with that value is returned. If no value is equal to the test expression, the default is returned. If no value is equal to the test expression and no default is given, null is returned.

All the result expressions and the default expression must be of the same type. The test expression and all the value expressions must be of the same type.

2.9.4.2. Generic CASE Expressions

The generic CASE expression has the following form:

CASE
  WHEN predicate THEN result
  [WHEN predicate THEN result]
  [ELSE default]
END

Each of test, predicate, result, and default are expressions. The list of predicates is evaluated in order until one is found true. If a predicate is found true, the result associated with that predicate is returned. If no predicate is true, the default is returned. If no predicate is true and no default is given, null is returned.

All the result expressions and the default expression must be of the same type. All the predicate expressions must be of boolean type.

2.9.4.3. Examples

This example uses a simple case expression in a RETURN clause:

MATCH (v:VertexFrame)
WHERE v.id < 10
RETURN v.id,
       CASE v.id
         WHEN 2 THEN 1
         ELSE 0
       END AS result

The example returns a row for each vertex with id < 10 containing two columns. The first column is the ID. The second column is 1 if the ID is 2 and 0 otherwise.

This example uses a generic case expression in a WITH clause:

MATCH (v:VertexFrame)
WHERE v.id < 10
WITH v,
     CASE
       WHEN v.id % 2 = 1 THEN "odd"
       ELSE "even"
     END AS type
WHERE type = "even"
RETURN v.id

The example returns a row for each vertex with an even ID less than 10 with a single column holding the ID.

2.9.5. Cypher Parameters

xGT and TQL support user-defined parameters indicated in a query by a variable name starting with $. The values of the parameter are supplied via a Python map at the time of running or scheduling a job via run_job() or schedule_job().

Python boolean, float, integer, decimal, date, time, datetime, timedelta, ip address, and string types are automatically converted to the appropriate Cypher type. xGT RowID types can also be passed as Cypher parameters. Non-nested lists of these types are also supported.

Cypher parameters can be used in the following ways:

  • For literals and expressions:

MATCH (c:Companies)
WHERE c.num_executives >= $param
conn.run_job(query, parameters = { "param": 10 })
  • String pattern matching:

MATCH (c:Companies)
WHERE c.company_name STARTS WITH $name
conn.run_job(query, parameters = { "name": "Trov" })
  • In the creation of an object:

CREATE (c:Companies { company_name : $name, date_founded : $date })
conn.run_job(query, parameters = { "name": "NewCompany", "date": date(2021, 1, 1) })
  • In the setting of object properties:

MATCH (c:Companies)
WHERE c.company_name = "NewCompany"
SET c.num_executives = $param
conn.run_job(query, parameters = { "param": 10 })
  • For skip and limit values:

MATCH (c:Companies)
WHERE c.num_executives >= 10
RETURN c
SKIP $s
LIMIT $l
conn.run_job(query, parameters = { "s": 10, "l": 5 }