2.20. TQL Fragments

TQL fragments provide a mechanism to evaluate TQL expressions as part of another operation. load() and insert() use TQL fragments to transform raw input data and map input columns to frame columns using the row_filter parameter. save() and get_data() use TQL fragments to transform the frame’s data by mapping to output columns using the row_filter parameter.

The fragment must be passed as a Python string:

frame.load(file, row_filter = '<fragment code>')
frame.insert([ [ 0, 1 ], [ 1, 2 ]], row_filter = '<fragment code>')
frame.save(file, row_filter = '<fragment code>')
frame.get_data(row_filter = '<fragment code>')

A TQL fragment supports a subset of TQL’s clauses, expressions and functions that can be evaluated in a stateless manner. A TQL fragment can only contain the clauses WHERE and RETURN to support filtering and output mapping respectively. Other TQL clauses are not allowed in a fragment. For input operations, a TQL fragment is applied independently to each row of the input data to produce the final output data stored in the frame. For output operations, a TQL fragment is applied to each row of the frame (subject to offset and length selections) to produce a row of the output data.

2.20.1. Input TQL Fragments

The example below shows a simple input TQL fragment that filters raw input data based on the value of one of the columns and computes the output columns of the target frame. The first column has ten added to it, while the second column is passed through unmodified.

WHERE a.value > 0
RETURN a.value + 10, a.text

The input to a TQL fragment is the raw data coming from the input operation (file or Python client data) organized in a single row. All columns present in the input are available to the fragment for evaluation. A schema is derived for the raw input data using type inference to map the input data types to xGT’s types. Column names are automatically computed from the schema.

2.20.2. Output TQL Fragments

The example below shows a simple output TQL fragment that filters frame data based on the value of one of the columns and computes the output of the target. The first output column has a string concatenated to it, while the second column is passed through unmodified.

WHERE a.value > 0
RETURN a.text + "_from_fragment" AS mytext, a.value AS myvalue

The input to a TQL fragment, in this case, are the frame’s rows that are within the range of rows selected by the offset and length parameters to the operation. All columns present in the frame are available to the fragment for evaluation. Output column names are automatically computed if not provided by an AS clause.

2.20.3. Bound Variables

TQL requires a bound variable to refer to columns that are part of the row. TQL fragments can use any name for this implied bound variable, but only a single name is allowed. In the example above, the bound variable a is the only name allowed to refer to the input row. An expression b.property in the same TQL fragment will produce an error as it refers to a different bound variable.

2.20.4. Interaction with Different Data Sources

TQL fragments are supported for all of xGT’s input data sources:

  • Parquet files.

  • CSV files with and without headers.

  • Client data as Python list of lists, pandas DataFrames, or pyarrow Tables.

While the behavior of TQL fragments is uniform across all sources, there are slight differences between the input sources with respect to schema data types and column names.

2.20.4.1. Parquet Files

Parquet files are the simplest case, as they contain a schema as part of their metadata. The column names contained in the file’s schema are used to refer to the input data’s columns. All xGT data types are inferred automatically from the native Parquet data types except for the IPADDRESS type. Columns that potentially contain IP address data are presented to the fragment as strings (or lists of strings for list types) and the TQL fragment must treat them as such. xGT supports passing those strings directly to the output frame’s IP address column if no processing is required.

frame = conn.create_table_frame(
  name   = 'graph__Table',
  schema = [['col0', xgt.INT],
            ['col1', xgt.IPADDRESS]])
WHERE input.value > 0
RETURN input.value + 10, input.ipaddress_as_string

2.20.4.2. CSV Files

To process CSV files using a TQL fragment, xGT automatically infers a schema from the raw data. All xGT types (including IP addresses and lists of IP addresses) are inferred from the CSV input.

CSV files that include headers will use the column names declared in the header as the column names available in the TQL fragment. CSV files without headers will use automatically generated column names starting with the prefix f.

This code shows the frame’s schema as declared on creation.

frame = conn.create_table_frame(
  name   = 'graph__Table',
  schema = [['column_a', xgt.INT],
            ['column_b', xgt.IPADDRESS]])

The raw CSV input data is:

0, 128.0.0.1
1, 128.0.0.2
2, 128.0.0.3

Because there are no headers in the CSV file, the inferred schema is:

schema = [['f0', xgt.INT],
          ['f1', xgt.IPADDRESS]]

The fragment must be written using those column names:

WHERE b.f0 > 0
RETURN b.f0 + 10, b.f1

2.20.4.3. Python Data

Python data is treated in a similar manner to Parquet data but without column names (as there is no schema metadata). Columns that contain IP address data will be inferred as columns of strings. Column names are automatically generated in the same manner as for CSV files without headers.

The code below shows the frame’s schema as declared on creation.

frame = conn.create_table_frame(
  name   = 'graph__Table',
  schema = [['column_a', xgt.INT],
            ['column_b', xgt.IPADDRESS],
            ['column_c', xgt.LIST, xgt.TEXT]])

The raw Python input data is:

[[ 1, 128.0.0.1, [ "one" ]],
 [ 2, 128.0.0.2, [ "one", "two" ]],
 [ 3, 128.0.0.3, [ "one", "two", "three" ]]]

The inferred schema is:

schema = [['f0', xgt.INT],
  ['f1', xgt.TEXT],
  ['f2', xgt.LIST, xgt.TEXT]]

The fragment must be written using those column names:

WHERE python_data.f0 > 0
RETURN python_data.f0 + 10, python_data.f1, tail(python_data.f2)

The resulting data inserted into the frame is:

11, 128.0.0.1, [ ]
12, 128.0.0.2, [ "two" ]
13, 128.0.0.3, [ "two", "three" ]

Note that this example also illustrates how to process columns that contain lists. In this case, lists of strings.

2.20.5. Requirements for Output Columns

The RETURN clause in a TQL fragment specifies how the target frame’s columns will be populated for each row of raw input. The RETURN clause must have one entry per target frame column. Each entry must be of a type that is compatible with the exact type of the column via text encoding (see Encoding Text to Types).

If an output column does not matter or the fragment cannot compute it, then use the value NULL for that column. The exception is that key columns in vertex and edge frames cannot be NULL and must have a value.

Consider the following frame schema:

frame = conn.create_vertex_frame(
  name   = 'graph__Table',
  schema = [['col0', xgt.INT],
            ['col1', xgt.IPADDRESS],
            ['col2', xgt.LIST, xgt.TEXT]],
  key = 'col0')

Some examples of acceptable fragment output for this frame are:

RETURN a.f0, a.f1, a.f2
RETURN a.f0 + 2, "128.0.0.1", NULL
RETURN a.f0 / 4, a.f2, [ "one", "two" ]

2.20.6. Putting It All Together

Here is a full example of using TQL fragments to process input data as it is being inserted into a target frame.

Start with creating an empty vertex frame:

frame = conn.create_vertex_frame(
  name   = 'Systems',
  schema = [['id', xgt.INT],
            ['ipaddr', xgt.IPADDRESS],
            ['users', xgt.LIST, xgt.TEXT]],
  key = 'id')

We want to use IPv6 addresses for the entries in the Systems vertex frame. The source data has two 64-bit integer columns that will be used to construct the IPv6 address:

data = [[ 0, 1, 1, [ "alice" ] ],
        [ 1, 2, 2, [ "alice", "bob" ] ],
        [ 2, 3, 3, [ "bob", "charlie", "diana" ] ]]

This code inserts the data into the frame and process it into its target form using a TQL fragment:

frame.insert(data, row_filter = """
  WHERE size(input.f3) > 1
  RETURN input.f0, ipaddress(input.f1, input.f2), tail(input.f3)
  """)

The resulting entries in the Systems frame are:

1, ::2:0:0:0:2, [ "bob" ]
2, ::3:0:0:0:3, [ "charlie", "diana" ]

This example created rows with IPv6 addresses from the numeric entries as well as the tail of the input lists.

Consider another example using the same target frame and a CSV file with headers:

machine_id, ipaddr, users
10, '255.255.255.0', [ "u" ]
11, '255.255.255.1', [ "u", "v", "w" ]
12, '255.255.255.2', [ "x", "y", "z" ]
13, '255.255.255.3', [ "u", "a" ]

This code loads the data into the frame and filters it using a TQL fragment:

frame.load('xgtd://file.csv', row_filter = """
  WHERE "u" IN input.users
  RETURN input.machine_id, input.ipaddr, input.users
  """)

The resulting entries in the Systems frame are:

10, 255.255.255.0, [ "u" ]
11, 255.255.255.1, [ "u", "v", "w" ]
13, 255.255.255.3, [ "u", "a" ]

2.20.7. Interaction with Different Data Targets

All data targets supported by the save() and get_data() operations are supported directly when using TQL fragments.

  • Client CSV and Parquet files

  • Server CSV and Parquet files

  • Client data as Python list of lists, pandas DataFrames, or pyarrow Tables.

The schema for the frame automatically drives the data types of the computed output columns in the RETURN clause. The TQL AS clause can be used to rename output columns for targets like Parquet files, pandas DataFrames, pyarrow Tables or CSV files with headers.