2.4. Frame Management

This section describes how to manage frames, which are used to store data in xGT. A detailed description of the frame data model is found in Frames and Namespaces.

Frames in xGT can be created directly by specifying the frame schema. In this case, frames are empty after creation and data must be loaded as a separate step. The API calls are create_vertex_frame(), create_edge_frame(), and create_table_frame().

Frames can also be created implicitly from a data source. In this case, the schema is inferred from the data, the frame is created, and the data is loaded in one step. The API calls are create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data().

2.4.1. Direct Frame Creation

Frames in xGT can be created by the client API calls create_vertex_frame(), create_edge_frame() and create_table_frame(). Each of these API calls requires passing in the frame name and schema.

If the name parameter is a fully qualified name, the frame is created in the namespace specified. If it contains only the name of the frame, the frame is created in the default namespace. For more information on namespaces, see Frames and Namespaces. If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.

The schema describes the names and data types of each column of a frame. These columns correspond to the properties of each element of the frame. In the example below, the Employees frame has a schema with three columns, so every vertex that belongs to this frame has these three properties (the value of a property can be null unless its column is a key column):

import xgt

server = xgt.Connection()

person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')

A list of supported data types is found in Data Movement.

To create a vertex frame, one column of the schema must be designated as the key with the key parameter. This column uniquely identifies the vertices in the frame.

To create an edge frame, additional parameters specifying the source and target vertices must be provided. The source and target parameters indicate which vertex frames the edge frame is connecting. Each of these is either the name of a vertex frame or a VertexFrame object. They can be the same or different frames. The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge. For each row of the edge frame, the value assigned to the source_key column corresponds to the key column of a row in the vertex frame given by the source parameter. Similarly, for each row of the edge frame, the value assigned to the target_key column corresponds to the key column of a row in the vertex frame given by the target parameter. The remaining columns are properties of the edge.
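The relationship between the key parameters can be pictured with plain Python data, without a running server. The sketch below is only an illustration: the sample rows and the helper dangling_endpoints are hypothetical and are not part of the xGT API. It reports the edge rows whose source or target value has no matching key in the corresponding vertex data.

```python
# Plain-Python illustration; nothing here calls xGT.
# Vertex rows indexed by their key column (hypothetical sample data).
employees = {1: 'Ana', 2: 'Bo'}   # key column: person_id
companies = {10: 'Acme'}          # key column: company_id

# Edge rows laid out as [source_id, target_id, position].
works_for = [[1, 10, 'engineer'], [2, 10, 'analyst'], [3, 11, 'intern']]

def dangling_endpoints(edges, source_keys, target_keys):
    """Return (row index, side) pairs whose endpoint value matches no vertex key."""
    problems = []
    for i, row in enumerate(edges):
        if row[0] not in source_keys:
            problems.append((i, 'source'))
        if row[1] not in target_keys:
            problems.append((i, 'target'))
    return problems

missing = dangling_endpoints(works_for, employees.keys(), companies.keys())
print(missing)
```

In this sample only the third edge row refers to keys that are absent from both vertex collections.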

2.4.1.1. Example

import xgt

server = xgt.Connection()

person_frame = server.create_vertex_frame(name = 'Employees',
                                          schema = [['person_id', xgt.INT],
                                                    ['name', xgt.TEXT],
                                                    ['start_date', xgt.DATE]],
                                          key = 'person_id')

company_frame = server.create_vertex_frame(name = 'Companies',
                                           schema = [['company_id', xgt.INT],
                                                     ['name', xgt.TEXT],
                                                     ['profit', xgt.INT]],
                                           key = 'company_id')

friend_frame = server.create_edge_frame(name = 'FriendsWith',
                                        source = 'Employees',
                                        target = 'Employees',
                                        schema = [['source_id', xgt.INT],
                                                  ['target_id', xgt.INT]],
                                        source_key = 'source_id',
                                        target_key = 'target_id')

work_frame = server.create_edge_frame(name = 'WorksFor',
                                      source = 'Employees',
                                      target = 'Companies',
                                      schema = [['source_id', xgt.INT],
                                                ['target_id', xgt.INT],
                                                ['position', xgt.TEXT],
                                                ['years', xgt.INT]],
                                      source_key = 'source_id',
                                      target_key = 'target_id')

The frame Employees contains vertices representing people. The key column person_id uniquely identifies each vertex, while the columns name and start_date give additional information, which need not be unique. Note that because the frame schema of Employees has three columns, each vertex within it must have three properties, one of them the unique key. The Companies frame contains vertices representing companies, with the company_id column uniquely identifying each vertex.

Frames FriendsWith and WorksFor define two edge frames, each connecting already defined vertex frames. With direct frame creation using create_edge_frame(), the source and target frames must be created before an edge frame is created. Note that an edge frame may connect vertices from the same vertex frame, as FriendsWith does, or may connect vertices from two different frames, as WorksFor does.

2.4.1.2. List Columns

Lists are supported for schema columns in all types of frames. A list column is specified in a schema as follows:

['<column name>', xgt.LIST, <base type>, <depth>]

The column name can be any valid xGT column name. The type of the column must be xgt.LIST with the base type corresponding to any of xGT’s non-list types. The depth specifies how many levels of nesting are to be used. The default depth is 1, so specifying a simple list of integers can be done as follows:

['integer_list', xgt.LIST, xgt.INT]

A list of lists of integers is specified as:

['nested_integer_list', xgt.LIST, xgt.INT, 2]

List columns cannot be used as key columns in vertex frames or edge frames.
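To make the depth value concrete, the sketch below computes the nesting depth of a Python value. The helper nesting_depth is hypothetical and not part of xGT. It only illustrates that a depth of 1 corresponds to a flat list and a depth of 2 to a list of lists.

```python
def nesting_depth(value):
    """Return 0 for a scalar, 1 for a flat list, 2 for a list of lists, and so on."""
    if not isinstance(value, list):
        return 0
    # An empty list still counts as one level of nesting.
    return 1 + max((nesting_depth(item) for item in value), default=0)

flat = [1, 2, 3]         # shape of a value for ['integer_list', xgt.LIST, xgt.INT]
nested = [[1, 2], [3]]   # shape of a value for ['nested_integer_list', xgt.LIST, xgt.INT, 2]

print(nesting_depth(flat), nesting_depth(nested))
```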

2.4.2. Implicit Frame Creation

xGT supports implicit frame creation from input data through the methods create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data(). These methods provide an easier way to create graphs in xGT: they combine the frame creation and data loading steps and automatically infer the schema of the frame.

If the name parameter is a fully qualified name, the frame is created in the namespace specified. Otherwise, it is created in the default namespace. For more information on namespaces, see Frames and Namespaces. If a frame of the specified name already exists in the namespace, an error is thrown and the frame is not created again.

2.4.2.1. Implicit Vertex Frame Creation

To create a vertex frame from data using create_vertex_frame_from_data(), the data source, frame name, and key name must be passed in. The key parameter is the name of the column that contains the unique key identifying each vertex. If the data source is a CSV file with no header, then the key parameter should be an integer representing the position of the key column.

The example below shows creating a vertex frame named my_vertex from a pyarrow table, which must contain a column named id.

a_frame = conn.create_vertex_frame_from_data(pytab, name = 'my_vertex', key = 'id')

2.4.2.2. Implicit Edge Frame Creation

To create an edge frame from data using create_edge_frame_from_data(), the data source, frame name, source vertex frame, target vertex frame, source key, and target key must be passed in.

The source and target parameters indicate which vertex frames the edge frame is connecting. Each of these is either the name of a vertex frame or a VertexFrame object. If either endpoint vertex frame does not already exist, it is created with a single column named “id”. Otherwise, the existing vertex frame is used if its schema is compatible.

The source_key and target_key parameters indicate which columns of the edge frame’s schema are used to identify the source vertex and target vertex, respectively, of the edge. These parameters should be string column names or, in the case of a CSV file with no header, integer column positions. If the source key or target key columns of the data refer to any vertices not already in the source or target vertex frames, those vertices are implicitly inserted into the vertex frames.

The example below shows creating an edge frame named worksFor along with its two endpoint vertex frames. The source vertex frame named Employees is created beforehand, but the target vertex frame named Companies is created automatically. For each edge, the column of data.csv named employee_id identifies the source vertex in the Employees vertex frame and the column of data.csv named department_id identifies the target vertex in the Companies vertex frame.

conn.create_vertex_frame_from_data('employees.csv', name = 'Employees', key = 'person_id')

conn.create_edge_frame_from_data('data.csv', name = 'worksFor',
                                 source = 'Employees', target = 'Companies',
                                 source_key = 'employee_id', target_key = 'department_id')
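The implicit insertion of endpoint vertices can be pictured with sets of keys. The following is a plain-Python sketch, not xGT code: the sample keys and the helper implied_vertices are hypothetical. It computes which keys the edge data refers to that are not yet present in the source and target vertex data.

```python
# Keys already present in the source vertex frame (hypothetical sample data).
employee_keys = {1, 2}
# The target vertex frame does not exist yet, so it starts out empty.
company_keys = set()

# Edge rows as (employee_id, department_id) pairs.
edge_rows = [(1, 10), (2, 10), (3, 20)]

def implied_vertices(rows, source_keys, target_keys):
    """Return the source keys and target keys referenced by rows but not yet present."""
    new_sources = {s for s, _ in rows} - source_keys
    new_targets = {t for _, t in rows} - target_keys
    return new_sources, new_targets

new_employees, new_companies = implied_vertices(edge_rows, employee_keys, company_keys)
print(sorted(new_employees), sorted(new_companies))
```

Here one employee key and two company keys are missing, so those are the vertices that would be added alongside the edges.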

2.4.2.3. Implicit Table Frame Creation

To create a table frame from data using create_table_frame_from_data(), the data source and table frame name must be passed in. The example below shows creating a table frame named my_table from a Parquet file on the server filesystem.

a_frame = conn.create_table_frame_from_data('xgtd://data.parquet', name = 'my_table')

2.4.2.4. Schema Inference

The methods create_vertex_frame_from_data(), create_edge_frame_from_data(), and create_table_frame_from_data() allow the xGT frame schema, including the column names and data types, to be automatically inferred from the source data.

The schema inference can also be done directly with get_schema_from_data(). This returns an xGT frame schema from the data without creating any frame. This method can be used to see what schema would be used before creating a graph frame from the data. It also allows the user to adjust the schema before it is used to create a frame.

While xGT will try to infer data types, users who want more control over specifying the data types or column names of the schema can pass in a schema.

The example below shows using get_schema_from_data() to get a schema from the data, modifying its third column, and then passing the schema into create_vertex_frame_from_data(). The vertex frame is then created with a schema that includes a column of type IPADDRESS named “source_ip”.

# Get a schema from a CSV file.
my_schema = conn.get_schema_from_data('data.csv')
# Change the name of the third column in the schema.
my_schema[2][0] = 'source_ip'
# Change the data type of the third column of the schema.
my_schema[2][1] = xgt.IPADDRESS

# Create a frame from the data, but pass in the manually adjusted schema.
a_frame = conn.create_vertex_frame_from_data('data.csv', name = 'v', key = 'source_ip',
                                             schema = my_schema)

Any schema can be passed in as long as it is compatible with the data source. It does not have to be a schema returned by get_schema_from_data().
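Indexing a schema by position, as in the example above, becomes error-prone for files with many columns. A minimal sketch of a name-based alternative follows. It assumes that a schema is a list of [name, type, ...] lists, as in the examples in this section. The helper retype_column is hypothetical, and the type values are stand-in strings so that the snippet runs without xGT installed; in real use they would be constants such as xgt.IPADDRESS.

```python
def retype_column(schema, name, new_type, new_name=None):
    """Return a copy of schema with the named column's type (and optionally its name) replaced."""
    updated = []
    found = False
    for column in schema:
        column = list(column)  # copy so the input schema is left untouched
        if column[0] == name:
            column[1] = new_type
            if new_name is not None:
                column[0] = new_name
            found = True
        updated.append(column)
    if not found:
        raise KeyError(f'no column named {name!r} in schema')
    return updated

# Stand-in strings replace the xgt type constants here.
inferred = [['id', 'INT'], ['when', 'DATETIME'], ['f2', 'TEXT']]
adjusted = retype_column(inferred, 'f2', 'IPADDRESS', new_name='source_ip')
print(adjusted[2])
```

Returning a copy keeps the inferred schema available for comparison with the adjusted one.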

Scalar and list data types are supported in the input data for automatic schema inference. The following xGT data types can be automatically inferred from data:

  • BOOLEAN

  • INT

  • FLOAT

  • DATE

  • TIME

  • DATETIME

  • DURATION

  • TEXT

  • Nested lists of any of these types.

Schema inference fails with an XgtTypeError if the data contains any unsupported types.

2.4.2.5. Supported Data Sources

Frames can be automatically created from the following data sources:

  • Arrow Table

  • Pandas DataFrame

  • CSV file(s)

  • Parquet file(s)

Arrow Tables

To create an xGT frame from an Arrow Table, use pyarrow. The data_source parameter must be a pyarrow Table.

The pyarrow Table schema must contain only the following data types:

  • boolean: maps to xGT BOOLEAN

  • signed integer: maps to xGT INT

  • unsigned integer: maps to xGT INT

  • float32: maps to xGT FLOAT

  • float64: maps to xGT FLOAT

  • decimal128: maps to xGT FLOAT

  • decimal256: maps to xGT FLOAT

  • time32: maps to xGT TIME

  • time64: maps to xGT TIME

  • date32: maps to xGT DATE

  • date64: maps to xGT DATETIME

  • timestamp: maps to xGT DATETIME

  • duration: maps to xGT DURATION

  • string: maps to xGT TEXT

  • list: maps to an xGT list of the appropriate underlying type.

Pandas DataFrames

To create an xGT frame from pandas, the data_source parameter must be a pandas DataFrame. The data types are first translated to pyarrow types, which then map to xGT types as described above.

CSV files

To create an xGT frame from a CSV file, the data source parameter should be a string with the file path. It can also be a list of file names, possibly with wildcards in them. The files can be on the file system local to the client, local to the server, or on the web. As described in Data Movement, a file on the server filesystem is indicated with the xgtd:// protocol. Files on the web are accessed with standard web addressing (URLs), such as https:// or s3://.

When creating from a CSV file, the delimiter and header_mode parameters can be passed in to parse the file. Note that if the CSV file has no header, the column names of the inferred schema will be named by default: “f0”, “f1”, etc. Otherwise, the header is used to assign schema column names.
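The default naming for headerless files can be reproduced with the standard library. The sketch below illustrates only the naming convention stated above (f0, f1, and so on); the helper default_column_names is hypothetical and does not show how xGT parses the file.

```python
import csv
import io

def default_column_names(csv_text, delimiter=','):
    """Return f0, f1, ... for each field in the first row of headerless CSV text."""
    first_row = next(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    return [f'f{i}' for i in range(len(first_row))]

sample = '1,Ana,2020-01-15\n2,Bo,2021-06-01\n'
names = default_column_names(sample)
print(names)
```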

The data in the file is first translated to pyarrow types, which then map to xGT types as described above. CSV files with non-uniform columns aren’t supported.

List support for CSV files is limited to lists of depth one. That is, lists that have a scalar type as their immediate child element.

Parquet files

To create an xGT frame from a Parquet file, the data source parameter should be a string with the file path. It can also be a list of Parquet file names, possibly with wildcards in them. The files can be on the file system local to the client, local to the server, or on the web. A file on the server filesystem is indicated with the xgtd:// protocol. Files on the web are accessed with standard web addressing (URLs), such as https:// or s3://. For any additional restrictions on loading into xGT from a Parquet file, see Loading Parquet Files.

The data in the file is first translated to pyarrow types, which then map to xGT types as described above.

Wildcards for file names

Wildcards are supported for file names in both the xgt:// and xgtd:// protocols, as described in section Getting Data into xGT. Wildcards are expanded using the normal filesystem expansion rules of the respective operating systems of the client and server machines. An example of wildcards in file names is xgt://mylocal.csv.*, which expands to the files matching the prefix mylocal.csv. in the directory from which the client script is run. Note that with wildcard expansion, all files must have the same column names if column mapping is used. They should also have the same number of columns.
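The effect of such a pattern can be previewed with Python's fnmatch module, which implements shell-style matching. This is only an approximation for illustration: xGT performs the expansion itself, and the directory listing below is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing on the client.
files = ['mylocal.csv.0', 'mylocal.csv.1', 'mylocal.csv', 'other.csv.0']

pattern = 'mylocal.csv.*'
matches = sorted(name for name in files if fnmatchcase(name, pattern))
print(matches)
```

Note that mylocal.csv itself does not match, because the pattern requires a further dot after the csv suffix.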

Lists of file names

Lists of file names are supported as a data source on the client, the server, and the web. An example of this would be [ xgtd://workers.parquet, xgtd://persons.parquet ]. All files in the list must have the same column names if column mapping is used. They should also have the same number of columns. Wildcards can be used in the list elements as well: [ xgtd://myfile.parquet, xgtd://mysource.parquet.* ]. Lists that mix client, server, and web file locations are supported, but each location is loaded in a separate transaction.
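To see which entries of a mixed list share a location, the paths can be grouped by their prefix. The sketch below is illustrative only: the helpers location_of and group_by_location are hypothetical, treating a path with no recognized prefix as a client path is an assumption, and the grouping does not show how xGT schedules its transactions.

```python
def location_of(path):
    """Classify a data source path as 'server', 'web', or 'client' by its prefix."""
    if path.startswith('xgtd://'):
        return 'server'
    if path.startswith(('https://', 's3://')):
        return 'web'
    # xgt:// paths and, by assumption, paths without a prefix are on the client.
    return 'client'

def group_by_location(paths):
    """Group paths by location, preserving their original order within each group."""
    groups = {}
    for path in paths:
        groups.setdefault(location_of(path), []).append(path)
    return groups

sources = ['xgtd://workers.parquet', 'xgt://local.parquet', 'xgtd://persons.parquet']
print(group_by_location(sources))
```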

2.4.3. Retrieving Frames

If a frame already exists on the server, the client can retrieve a proxy object for that frame. This is done with get_vertex_frame(), get_edge_frame(), and get_table_frame(). Additionally, client API calls are provided to list the existing vertex, edge, and table frames in a running xGT instance: get_vertex_frames(), get_edge_frames(), and get_table_frames(). These calls can be restricted to list only the frames in particular namespaces.

2.4.4. Dropping Frames

All frame types can be deleted using the client API call drop_frame(). Note that a frame cannot be dropped if doing so would create an invalid graph. To drop a vertex frame, it must not be the source or target of any edge frame. This is the case even if the frames are empty. Frame Drop provides additional information about dropping frames.
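The constraint implies an ordering: edge frames must be dropped before the vertex frames they connect. The plain-Python sketch below uses the frame names from the example in 2.4.1.1 to list which edge frames would block dropping a given vertex frame. The helper blocking_edges is hypothetical and does not call xGT.

```python
# Edge frame name -> (source vertex frame, target vertex frame).
edge_frames = {
    'FriendsWith': ('Employees', 'Employees'),
    'WorksFor': ('Employees', 'Companies'),
}

def blocking_edges(vertex_frame, edges):
    """Return the edge frames that use vertex_frame as a source or target."""
    return sorted(name for name, ends in edges.items() if vertex_frame in ends)

print(blocking_edges('Companies', edge_frames))
print(blocking_edges('Employees', edge_frames))
```

In this example, WorksFor must be dropped before Companies, and both edge frames must be dropped before Employees.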