5.3.2. xgt.EdgeFrame

class xgt.EdgeFrame(conn, name, schema, source, target, source_key, target_key, container_id, commit_id)

An EdgeFrame object represents a collection of edges held on the xGT server. It can be used to retrieve information about those edges and should not be instantiated directly by the user. Methods that return this object: Connection.get_frame(), Connection.get_frames(), and Connection.create_edge_frame(). Every edge in an EdgeFrame has the same properties, which are described by EdgeFrame.schema.

The source vertex of every edge in an EdgeFrame must belong to the same VertexFrame, whose name is given by EdgeFrame.source_name. Likewise, the target vertex of every edge must belong to the same VertexFrame, whose name is given by EdgeFrame.target_name.

For each edge in the EdgeFrame, the source vertex is identified by the edge property named by EdgeFrame.source_key, which is one of the properties listed in the schema. Likewise, the target vertex is identified by the edge property named by EdgeFrame.target_key.
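To make the key relationship concrete, here is a minimal pure-Python sketch that finds where the source and target key properties sit in a schema shaped like the WorksFor example. The helper key_column_positions is hypothetical and not part of xgt; the type names are plain strings standing in for constants such as xgt.INT so the sketch runs without a server.

```python
# Stand-in schema mirroring the WorksFor example. Real xgt schemas use type
# constants such as xgt.INT and xgt.TEXT; plain strings keep this runnable offline.
schema = [["srcid", "INT"], ["role", "TEXT"], ["trgid", "INT"]]


def key_column_positions(schema, source_key, target_key):
    """Hypothetical helper: locate the source and target key properties in a schema."""
    names = [entry[0] for entry in schema]
    for key in (source_key, target_key):
        if key not in names:
            raise ValueError(f"{key!r} is not a property in the schema")
    return names.index(source_key), names.index(target_key)


print(key_column_positions(schema, "srcid", "trgid"))  # (0, 2)
```

The returned pair corresponds to the positions that EdgeFrame.source_key_column and EdgeFrame.target_key_column describe for a frame with this schema.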

Parameters:
  • conn (Connection) – An open connection to an xGT server.

  • name (str) – Fully qualified name of the edge frame, including the namespace.

  • schema (list of lists) – List of lists associating property names with xGT data types. Each edge in the EdgeFrame will have these properties.

  • source (str or VertexFrame) – The name of a VertexFrame or a VertexFrame object. The source vertex of each edge in this EdgeFrame will belong to this VertexFrame.

  • target (str or VertexFrame) – The name of a VertexFrame or a VertexFrame object. The target vertex of each edge in this EdgeFrame will belong to this VertexFrame.

  • source_key (str) – The edge property name that identifies the source vertex of an edge. This is one of the properties from the schema.

  • target_key (str) – The edge property name that identifies the target vertex of an edge. This is one of the properties from the schema.

Examples

>>> import xgt
>>> conn = xgt.Connection()
>>> e1 = conn.create_edge_frame(
...        name = 'WorksFor',
...        schema = [['srcid', xgt.INT],
...                  ['role', xgt.TEXT],
...                  ['trgid', xgt.INT]],
...        source = 'People',
...        target = 'Companies',
...        source_key = 'srcid',
...        target_key = 'trgid')
>>> e2 = conn.get_frame('RelatedTo') # An existing edge frame
>>> print(e1.name, e2.name)

Methods

__init__(conn, name, schema, source, target, ...)

Constructor for EdgeFrame.

append_columns(new_columns)

Appends columns to the frame's schema.

delete_columns(columns)

Deletes columns from the frame's schema.

get_data([offset, length, rows, columns, ...])

Returns frame data starting at a given offset and spanning a given length.

insert(data[, row_labels, ...])

Inserts data rows.

load(paths[, header_mode, record_history, ...])

Loads data from one or more files specified in the list of paths.

modify_columns(new_columns)

Modifies the frame's columns.

save(path[, offset, length, headers, ...])

Writes the rows from the frame to a file in the location indicated by the path parameter.

update_columns(columns, data[, offset, ...])

Updates the entries for columns in a frame.

Attributes

connection

The connection used when constructing the frame.

name

Name of the frame.

num_edges

The number of edges in the EdgeFrame.

num_rows

The number of rows in the frame.

row_label_universe

The universe of row security labels that can be attached to rows of this frame.

schema

The frame's property names and types.

source_key

The edge property name that identifies the source vertex of an edge.

source_key_column

The column position of the frame's source key.

source_name

The name of the source vertex frame.

target_key

The edge property name that identifies the target vertex of an edge.

target_key_column

The column position of the frame's target key.

target_name

The name of the target vertex frame.

user_permissions

The actions a user is allowed to take on this frame.

__init__(conn, name, schema, source, target, source_key, target_key, container_id, commit_id)

Constructor for EdgeFrame. Called when EdgeFrame is created.

append_columns(new_columns: Iterable[Sequence]) → None

Appends columns to the frame’s schema. The new columns are given as schema entries, and their names must differ from all existing column names. Entries in the new columns are initialized to null values. If new_columns is None or has no entries, the function returns without changing the frame.

Added in version 1.15.0.

Parameters:

new_columns (iterable of sequence) – The columns to append to the frame, given as an iterable over sequences representing valid column entries.

Raises:
  • XgtTypeError – If new_columns is neither an Iterable nor None, or if an entry is not a tuple or list giving a valid schema entry.

  • XgtValueError – If a new column has a duplicate name.
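The rules above can be modeled in a few lines of plain Python. This is a hypothetical sketch of the documented behavior on an in-memory schema and row list, not the xgt implementation; built-in TypeError and ValueError stand in for XgtTypeError and XgtValueError, and type names are plain strings.

```python
def append_columns_model(schema, rows, new_columns):
    """Hypothetical model of append_columns: extend the schema, pad rows with None."""
    if not new_columns:  # None or an empty collection: nothing to do
        return schema, rows
    existing = {entry[0] for entry in schema}
    added = []
    for entry in new_columns:
        if not isinstance(entry, (list, tuple)) or len(entry) < 2:
            raise TypeError(f"not a valid schema entry: {entry!r}")
        if entry[0] in existing:
            raise ValueError(f"duplicate column name: {entry[0]!r}")
        existing.add(entry[0])
        added.append(list(entry))
    new_schema = [list(entry) for entry in schema] + added
    # New columns start out null for every existing row.
    new_rows = [list(row) + [None] * len(added) for row in rows]
    return new_schema, new_rows


schema = [["srcid", "INT"], ["trgid", "INT"]]
print(append_columns_model(schema, [[1, 2]], [("weight", "FLOAT")]))
# ([['srcid', 'INT'], ['trgid', 'INT'], ['weight', 'FLOAT']], [[1, 2, None]])
```

Against a live frame, the corresponding documented call would be e1.append_columns([('weight', xgt.FLOAT)]), assuming xgt.FLOAT is available as a type constant.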

property connection: Connection

The connection used when constructing the frame.

Type:

Connection object

delete_columns(columns: Iterable[int | str]) → None

Deletes columns from the frame’s schema. The columns are given as a mixed list of column positions and schema column names. Duplicates of the same column are accepted and behave as if the column were given once. If columns is None or has no entries, the function just returns.

Added in version 1.15.0.

Parameters:

columns (iterable of int or str) – The columns to delete. Given as an iterable over mixed column positions and schema column names.

Raises:
  • XgtTypeError – If columns is not an Iterable or an entry is not an int or str.

  • XgtValueError – If a position is out-of-bounds, a name is not in the schema, or a key column is deleted.
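As an illustration of how mixed positions and names resolve to a single set of columns, here is a hypothetical pure-Python model of delete_columns. It is not the xgt implementation: it works on an in-memory [[name, type], ...] schema, uses built-in exceptions in place of the Xgt* ones, and assumes negative positions count as out of bounds.

```python
def delete_columns_model(schema, columns, key_names):
    """Hypothetical model of delete_columns on a [[name, type], ...] schema."""
    if not columns:  # None or empty: nothing to do
        return schema
    names = [entry[0] for entry in schema]
    doomed = set()  # a set, so repeated references to one column collapse
    for col in columns:
        if isinstance(col, bool) or not isinstance(col, (int, str)):
            raise TypeError(f"column must be an int or str: {col!r}")
        if isinstance(col, int):
            if not 0 <= col < len(names):  # assumption: no negative indexing
                raise ValueError(f"position out of bounds: {col}")
            pos = col
        elif col in names:
            pos = names.index(col)
        else:
            raise ValueError(f"not in schema: {col!r}")
        if names[pos] in key_names:
            raise ValueError(f"cannot delete key column {names[pos]!r}")
        doomed.add(pos)
    return [entry for i, entry in enumerate(schema) if i not in doomed]


schema = [["srcid", "INT"], ["role", "TEXT"], ["trgid", "INT"], ["weight", "FLOAT"]]
print(delete_columns_model(schema, [1, "role", "weight"], {"srcid", "trgid"}))
# [['srcid', 'INT'], ['trgid', 'INT']]
```

Position 1 and the name "role" refer to the same column, so it is deleted once, matching the documented handling of duplicates.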

get_data(offset: int = 0, length: int = None, rows: Iter[int] = None, columns: Iter[int, str] = None, format: str = 'python', include_row_labels: bool = False, row_label_column_header: str = None, duration_as_interval: bool = False, row_filter: str = None) → Seq[Seq] | pandas.DataFrame | pyarrow.Table

Returns frame data starting at a given offset and spanning a given length.

Parameters:
  • offset (int) – Position (index) of the first row to be retrieved. Cannot be given with rows. Optional. Default=0.

  • length (int) – Maximum number of rows to be retrieved starting from the row indicated by offset. A value of ‘None’ means ‘all rows’ on and after the offset. Cannot be given with rows. Optional. Default=None.

  • rows (Iterable of int) –

    The rows to retrieve. A value of ‘None’ means all rows. Cannot be given with either offset or length. Optional. Default=None.

    Added in version 1.16.0.

  • columns (Iterable of int or str) –

    The columns to retrieve. Given as an iterable over mixed column positions and schema column names. A value of ‘None’ means all columns. Optional. Default=None.

    Added in version 1.14.0.

  • format (str) –

    Selects the data format returned: a Python list of lists, a pandas DataFrame, or an Apache Arrow Table. Must be one of ‘python’, ‘pandas’, or ‘arrow’. Optional. Default=’python’.

    Added in version 1.14.0.

  • include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row. Default=False.

  • row_label_column_header (str) – The header column name to use for all row labels if include_row_labels is true and headers is true. Ignored for python format. Default=None.

  • row_filter (str) –

    TQL fragment used to filter, modify and parameterize the frame’s data to produce the row data returned to the client. Default=None.

    Added in version 1.15.0.

Return type:

list of lists, pandas DataFrame, or Apache Arrow Table

Raises:
  • XgtNameError – If the frame does not exist on the server.

  • XgtSecurityError – If the user does not have required permissions for this action.

  • XgtTransactionError – If a conflict with another transaction occurs.

  • ValueError – If parameter is out of bounds or invalid format given.

  • OverflowError – If data is out of bounds when converting.
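The difference between the offset/length selection and the rows selection can be modeled on an in-memory list. The helper select_rows below is hypothetical and only mirrors the documented parameter rules; it cannot distinguish an explicit offset=0 from the default, and it assumes an offset equal to the row count yields an empty result.

```python
def select_rows(data, offset=0, length=None, rows=None):
    """Hypothetical model of get_data's offset/length versus rows selection."""
    if rows is not None:
        # This sketch cannot tell an explicit offset=0 from the default.
        if offset != 0 or length is not None:
            raise ValueError("rows cannot be combined with offset or length")
        return [data[i] for i in rows]
    if not 0 <= offset <= len(data):
        raise ValueError(f"offset out of bounds: {offset}")
    # length=None means every row on and after the offset.
    end = len(data) if length is None else offset + length
    return data[offset:end]


data = [[0, "a"], [1, "b"], [2, "c"], [3, "d"]]
print(select_rows(data, offset=1, length=2))  # [[1, 'b'], [2, 'c']]
print(select_rows(data, offset=2))            # [[2, 'c'], [3, 'd']]
print(select_rows(data, rows=[3, 0]))         # [[3, 'd'], [0, 'a']]
```

With a live connection, the documented calls corresponding to these three selections would be frame.get_data(offset=1, length=2), frame.get_data(offset=2), and frame.get_data(rows=[3, 0]).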

insert(data: Seq[Seq] | pandas.DataFrame | pyarrow.Table, row_labels: Seq[str] | None = None, row_label_columns: Seq[str] | None = None, source_vertex_row_labels: Seq[str] | None = None, target_vertex_row_labels: Seq[str] | None = None, column_mapping: Map[str, str | int] | None = None, suppress_errors: bool = False, row_filter: str | None = None, chunk_size: int | None = 10000) → Job

Inserts data rows. The properties of the new data must match the schema in both order and type.
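Because list-of-lists data must follow the schema's column order, records that arrive as dictionaries need to be rearranged first. The helper to_schema_rows below is a hypothetical sketch, not part of xgt; it only reorders values by property name and does not check types, which remain the caller's responsibility.

```python
def to_schema_rows(schema, records):
    """Hypothetical helper: order dict records by the frame's schema for insert()."""
    names = [entry[0] for entry in schema]
    rows = []
    for record in records:
        missing = [name for name in names if name not in record]
        if missing:
            raise KeyError(f"record is missing properties: {missing}")
        rows.append([record[name] for name in names])
    return rows


schema = [["srcid", "INT"], ["role", "TEXT"], ["trgid", "INT"]]
records = [{"trgid": 10, "srcid": 1, "role": "engineer"}]
print(to_schema_rows(schema, records))  # [[1, 'engineer', 10]]
```

Since only property names are read, the same helper could be fed the frame's own schema, as in e1.insert(to_schema_rows(e1.schema, records)).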

Parameters:
  • data (list, pandas DataFrame, or pyarrow Table) – Data represented by a list of lists of data items, by a pandas DataFrame or by a pyarrow Table.

  • row_labels (list) – A list of security labels to attach to each row inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. Note: Only one of row_labels and row_label_columns may be passed.

  • row_label_columns (list) – A list of integer column indices indicating which columns in the input data contain security labels to attach to the inserted row. Note: Only one of row_labels and row_label_columns may be passed.

  • source_vertex_row_labels (list) – A list of security labels to attach to each source vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame.

  • target_vertex_row_labels (list) – A list of security labels to attach to each target vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame.

  • column_mapping (dictionary) –

    Maps the frame column names to input columns for the ingest. The key of each element is a frame column name. The value is either the name of an input column (a pandas DataFrame column name, or an xGT schema column name for lists) or an input column index.

    Added in version 1.15.0.

  • suppress_errors (bool) –

    If true, will continue to insert data if an ingest error is encountered, placing the first 1000 errors in the job history. If false, stops on first error and raises. Defaults to False.

    Added in version 1.11.0.

  • row_filter (str) –

    TQL fragment used to filter, modify and parameterize the raw data from the input to produce the row data fed to the frame.

    Added in version 1.15.0.

  • chunk_size (int) –

    Number of rows to transfer in a single Arrow chunk between the client and the server.

    Added in version 1.16.0.

Returns:

A Job object representing the job that has executed the insert.

Return type:

Job

Raises:
  • XgtIOError – If there are errors in the data being inserted or some data could not be inserted into the frame.

  • XgtNameError – If the frame does not exist on the server.

  • XgtSecurityError – If the user does not have required permissions for this action.

  • XgtTransactionError – If a conflict with another transaction occurs.

load(paths: Sequence[str] | str, header_mode: str = 'none', record_history: bool = True, row_labels: Sequence[str] | None = None, row_label_columns: Sequence[str] | None = None, source_vertex_row_labels: Sequence[str] | None = None, target_vertex_row_labels: Sequence[str] | None = None, delimiter: str = ',', column_mapping: Mapping[str, str | int] | None = None, suppress_errors: bool = False, row_filter: str | None = None) → Job

Loads data from one or more files specified in the list of paths. These files may be CSV, Parquet, or compressed CSV. Some limitations exist for compressed CSV. See docs.trovares.com for more details. Each path may have its own protocol as described below.

Parameters:
  • paths (list or str) –

    A single path or a list of paths to files. Local or server paths may contain wildcards. Wildcard expressions can contain *, ?, range sets, and negation. See docs.trovares.com for more details.

    Syntax for one file path:

    Resource type       Path syntax
    local to Python     '<path to file>' or 'xgt://<path to file>'
    xGT server          'xgtd://<path to file>'
    AWS S3              's3://<path to file>'
    https site          'https://<path to file>'
    http site           'http://<path to file>'
    ftps server         'ftps://<path to file>'
    ftp server          'ftp://<path to file>'

  • header_mode (str) –

    Indicates how the file header should be processed:
    • HeaderMode.NONE: No header exists.

    • HeaderMode.IGNORE: Ignore the first line containing the header.

    • HeaderMode.NORMAL: Process the header in non-strict mode. If a schema column is missing, a null value is ingested for that schema column. Any file column whose name does not correspond to a schema column or a security label column is ignored.

    • HeaderMode.STRICT: Process the header in strict mode. The name of each header column should correspond to a schema column, a security label column, or be named IGNORE. Each schema column must appear in the file.

    Optional. Default=HeaderMode.NONE. Only applies to CSV files.

    Added in version 1.11.0.

  • record_history (bool) – If true, records the history of the job.

  • row_labels (list) – A list of security labels to attach to each row inserted with the load. Each label must have been passed in to the row_label_universe parameter when creating the frame. Note: Only one of row_labels and row_label_columns may be passed.

  • row_label_columns (list) – A list of columns indicating which columns in the CSV file contain security labels to attach to the inserted row. If the header mode is NONE or IGNORE, this must be a list of integer column indices. If the header mode is NORMAL or STRICT, this must be a list of string column names. Note: Only one of row_labels and row_label_columns may be passed.

  • source_vertex_row_labels (list) – A list of security labels to attach to each source vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame.

  • target_vertex_row_labels (list) – A list of security labels to attach to each target vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame.

  • delimiter (str) – Single character delimiter for CSV data. Only applies to CSV files.

  • column_mapping (dictionary) –

    Maps the frame column names to file columns for the ingest. The key of each element is a frame column name. The value is either the name of the file column (from the file header) or the file column index. If file column names are used, the header_mode must be NORMAL. If only file column indices are used, the header_mode can be NORMAL, NONE, or IGNORE.

    Added in version 1.15.0.

  • suppress_errors (bool) –

    If true, continues to load data if an ingest error is encountered, placing the first 1000 errors into the job history. If false, stops on first error and raises. Defaults to False.

    Added in version 1.11.0.

  • row_filter (str) –

    TQL fragment used to filter, modify and parameterize the raw data from the input to produce the row data fed to the frame.

    Added in version 1.15.0.
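The column_mapping rules interact with header_mode, so a small pre-flight check can catch mismatches before a load is submitted. The function check_column_mapping below is a hypothetical sketch that mirrors the wording of the column_mapping description literally. It assumes the header modes are the strings 'none', 'ignore', 'normal', and 'strict' (the signature's default of 'none' suggests lowercase strings, but this is an assumption).

```python
def check_column_mapping(column_mapping, header_mode="none"):
    """Hypothetical check of the column_mapping / header_mode rules for load()."""
    # Assumption: modes are the strings 'none', 'ignore', 'normal', 'strict'.
    mode = header_mode.lower()
    for frame_col, file_col in column_mapping.items():
        if isinstance(file_col, str):
            # File column names come from the header, so NORMAL is required.
            if mode != "normal":
                raise ValueError(f"{frame_col!r}: file column names need header_mode 'normal'")
        elif isinstance(file_col, int) and not isinstance(file_col, bool):
            if mode not in ("normal", "none", "ignore"):
                raise ValueError(f"{frame_col!r}: indices need 'normal', 'none', or 'ignore'")
        else:
            raise TypeError(f"{frame_col!r}: expected a str or int, got {file_col!r}")


check_column_mapping({"srcid": 0, "trgid": 2}, "none")        # passes
check_column_mapping({"srcid": "src", "trgid": 2}, "normal")  # passes
```

A mapping that passes could then be handed to the documented call, for example e1.load('edges.csv', header_mode=xgt.HeaderMode.NONE, column_mapping={'srcid': 0, 'trgid': 2}).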

Returns:

A Job object representing the job that has executed the load.

Return type:

Job

Raises:
  • XgtIOError – If a file specified cannot be opened or if there are errors inserting any lines in the file into the frame.

  • XgtNameError – If the frame does not exist on the server.

  • XgtSecurityError – If the user does not have required permissions for this action.

  • XgtTransactionError – If a conflict with another transaction occurs.

modify_columns(new_columns: Iterable[int | str | Sequence]) → None

Modifies the frame’s columns. Can be used to add, delete, or reorder columns. The new columns are given as a list or tuple of mixed column positions, schema column names, or schema entries. Added columns must be given as schema entries. Column positions or names must be valid in the current schema.

Any columns in the current schema that are not in the new columns are deleted. Schema entries in the new columns with names not in the current schema are added. The columns are reordered to the order given in the new columns. Entries in added columns are initialized to null values.

The types of columns in the current schema cannot be changed. Key columns must be in the new columns.

Added in version 1.15.0.

Parameters:

new_columns (iterable of int, str, or sequence) – The new schema to apply to the frame. Given as an iterable over mixed column positions, schema column names, or sequences representing valid column entries.

Raises:
  • XgtTypeError – If new_columns is not an Iterable or is empty, if an entry is not an int, str, or a list or tuple giving a valid schema entry, or if the type of an existing column is changed.

  • XgtValueError – If a position is out-of-bounds, a name is not in the schema, a key column is not included in the new schema, or a column name is duplicated.
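The combined add/delete/reorder semantics are easiest to see on a small example. The function below is a hypothetical pure-Python model of the rules described for modify_columns, operating on an in-memory [[name, type], ...] schema with built-in exceptions standing in for XgtTypeError and XgtValueError; it is not the xgt implementation.

```python
def modify_columns_model(schema, new_columns, key_names):
    """Hypothetical model of modify_columns on a [[name, type], ...] schema."""
    new_columns = list(new_columns)  # raises TypeError if not iterable
    if not new_columns:
        raise TypeError("new_columns must not be empty")
    names = [entry[0] for entry in schema]
    result = []
    for col in new_columns:
        if isinstance(col, bool):
            raise TypeError(f"invalid column: {col!r}")
        if isinstance(col, int):  # existing column by position
            if not 0 <= col < len(schema):
                raise ValueError(f"position out of bounds: {col}")
            entry = list(schema[col])
        elif isinstance(col, str):  # existing column by name
            if col not in names:
                raise ValueError(f"not in schema: {col!r}")
            entry = list(schema[names.index(col)])
        elif isinstance(col, (list, tuple)) and len(col) >= 2:  # schema entry
            entry = list(col)
            if entry[0] in names and schema[names.index(entry[0])][1] != entry[1]:
                raise TypeError(f"cannot change the type of {entry[0]!r}")
        else:
            raise TypeError(f"invalid column: {col!r}")
        result.append(entry)
    result_names = [entry[0] for entry in result]
    if len(set(result_names)) != len(result_names):
        raise ValueError("a column name is duplicated")
    for key in key_names:
        if key not in result_names:
            raise ValueError(f"key column {key!r} must be kept")
    return result


schema = [["srcid", "INT"], ["role", "TEXT"], ["trgid", "INT"]]
print(modify_columns_model(schema, ["trgid", 0, ("weight", "FLOAT")], {"srcid", "trgid"}))
# [['trgid', 'INT'], ['srcid', 'INT'], ['weight', 'FLOAT']]
```

In this example "role" is deleted because it is not listed, "weight" is added, and the remaining key columns are reordered.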

property name: str

Name of the frame.

Type:

str

property num_edges: int

The number of edges in the EdgeFrame.

Type:

int

property num_rows: int

The number of rows in the frame.

Type:

int

property row_label_universe: list[str]

The universe of row security labels that can be attached to rows of this frame. Only labels that are also in the authenticated user’s label set are returned.

Type:

list of str

save(path: str, offset: int = 0, length: int | None = None, headers: bool = False, record_history: bool = True, include_row_labels: bool = False, row_label_column_header: str | None = None, preserve_order: bool = False, number_of_files: int = 1, duration_as_interval: bool = False, delimiter: str = ',', row_filter: str | None = None, windows_newline: bool = False)

Writes the rows from the frame to a file in the location indicated by the path parameter. The file is saved as Parquet if its extension is .parquet and as CSV otherwise.
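The format decision depends only on the path's extension. The hypothetical helper save_format below sketches that rule for the path styles listed under path; it strips any protocol prefix before reading the suffix and assumes the comparison is case-sensitive, which the description does not state.

```python
from pathlib import PurePosixPath


def save_format(path):
    """Hypothetical helper: predict the file format save() writes for a path."""
    # Drop a protocol prefix such as 'xgtd://' or 's3://' before reading the suffix.
    location = path.split("://", 1)[-1]
    # Assumption: the extension comparison is case-sensitive.
    return "parquet" if PurePosixPath(location).suffix == ".parquet" else "csv"


for p in ["edges.parquet", "xgtd://out/edges.csv", "s3://bucket/edges.txt"]:
    print(p, save_format(p))
```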

Parameters:
  • path (str) –

    Path to a file.

    Syntax for one file path:

    Resource type       Path syntax
    local to Python     '<path to file>' or 'xgt://<path to file>'
    xGT server          'xgtd://<path to file>'
    AWS S3 (Beta)       's3://<path to file>'

  • offset (int) – Position (index) of the first row to be saved. Optional. Default=0.

  • length (int) – Maximum number of rows to be saved. Optional. Default=None.

  • headers (bool) – Indicates if headers should be added. Optional. Default=False.

  • record_history (bool) – If true, records the history of the job.

  • include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row.

  • row_label_column_header (str) – The header column name to use for all row labels if include_row_labels is true and headers is true.

  • preserve_order (bool) – Indicates if the output should keep the order the frame is stored in. Optional. Default=False.

  • number_of_files (int) – Number of files to save. Only works with the xgtd:// protocol. Optional. Default=1.

  • duration_as_interval (bool) –

    For Parquet files, durations are saved as the logical Interval type instead of the default 64-bit unsigned integer type. Only works with the xgtd:// protocol. Optional. Default=False.

    Added in version 1.13.0.

  • delimiter (str) –

    Single character delimiter for CSV data. Only applies to CSV files.

    Added in version 1.13.1.

  • row_filter (str) –

    TQL fragment used to filter, modify and parameterize the frame’s data to produce the row data saved in the file.

    Added in version 1.15.0.

  • windows_newline (bool) –

    False indicates CSV files should use a Unix newline of line feed (LF). True indicates a Windows newline of carriage return (CR), line feed (LF). Only applies to CSV files. Optional. Default=False.

    Added in version 2.0.4.

Returns:

A Job object representing the job that has executed the save.

Return type:

Job

Raises:

property schema: list[list]

The frame’s property names and types.

Type:

list of lists

property source_key: str

The edge property name that identifies the source vertex of an edge.

Type:

str

property source_key_column: str

The column position of the frame’s source key.

Type:

str

property source_name: str

The name of the source vertex frame.

Type:

str

property target_key: str

The edge property name that identifies the target vertex of an edge.

Type:

str

property target_key_column: str

The column position of the frame’s target key.

Type:

str

property target_name: str

The name of the target vertex frame.

Type:

str

update_columns(columns: Iter[int | str], data: Seq[Seq] | pandas.DataFrame | pyarrow.Table, offset: int | None = 0, chunk_size: int | None = 10000) → None

Updates the entries for columns in a frame. The columns are specified by position or name. The data used for the update can be Python data, a pandas DataFrame, or a pyarrow Table (Beta).

Added in version 1.16.0.

Parameters:
  • columns (iterable of str or int) – The columns to update. Given as the columns’ names or positions.

  • data (list, pandas DataFrame, or pyarrow Table) – Data represented by a list of lists of data items, by a pandas DataFrame or by a pyarrow Table.

  • offset (int) – Position (index) of the first row to update. Optional. Default = 0.

  • chunk_size (int) – Number of rows to transfer in a single Arrow chunk between the client and the server.

Raises:
  • XgtTypeError – If a column is not given as a str or int.

  • XgtValueError – If the column’s name or position is not in the schema or if the offset is invalid.
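To show how columns, data, and offset line up, here is a hypothetical model of update_columns on in-memory rows. It is not the xgt implementation; it uses built-in exceptions, and it assumes that every updated row must already exist (that is, offset plus the number of data rows cannot exceed the frame's row count).

```python
def update_columns_model(schema, rows, columns, data, offset=0):
    """Hypothetical model of update_columns on in-memory rows."""
    names = [entry[0] for entry in schema]
    positions = []
    for col in columns:
        if isinstance(col, bool) or not isinstance(col, (int, str)):
            raise TypeError(f"column must be a str or int: {col!r}")
        if isinstance(col, str):
            if col not in names:
                raise ValueError(f"not in schema: {col!r}")
            positions.append(names.index(col))
        elif 0 <= col < len(names):
            positions.append(col)
        else:
            raise ValueError(f"position out of bounds: {col}")
    # Assumption: the updated rows must already exist in the frame.
    if offset < 0 or offset + len(data) > len(rows):
        raise ValueError(f"invalid offset: {offset}")
    updated = [list(row) for row in rows]
    for i, values in enumerate(data):
        # Each data row supplies one value per requested column, in order.
        for pos, value in zip(positions, values):
            updated[offset + i][pos] = value
    return updated


schema = [["srcid", "INT"], ["role", "TEXT"], ["trgid", "INT"]]
rows = [[1, "a", 10], [2, "b", 20], [3, "c", 30]]
print(update_columns_model(schema, rows, ["role"], [["x"], ["y"]], offset=1))
# [[1, 'a', 10], [2, 'x', 20], [3, 'y', 30]]
```

The matching documented call on a live frame would be e1.update_columns(['role'], [['x'], ['y']], offset=1).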

property user_permissions: Mapping[str, bool]

The actions a user is allowed to take on this frame.

The actions are:

Key             Description
create_rows     True if the user can add rows to the frame.
update_rows     True if the user can update columns/properties of rows.
delete_rows     True if the user can delete rows of the frame.
delete_frame    True if the user can delete the frame.

Added in version 2.0.1.

Type:

map of [str, bool]
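A caller can consult this mapping before attempting an operation. The helper missing_permissions below is a hypothetical sketch, and the dictionary literal is a stand-in for the value EdgeFrame.user_permissions returns, built from the keys listed above; treating an absent key as not granted is an assumption.

```python
def missing_permissions(permissions, needed):
    """Hypothetical helper: list the needed actions a permissions mapping does not grant."""
    # Assumption: a key absent from the mapping counts as not granted.
    return [action for action in needed if not permissions.get(action, False)]


# Stand-in for the mapping returned by EdgeFrame.user_permissions.
perms = {"create_rows": True, "update_rows": False,
         "delete_rows": False, "delete_frame": False}
print(missing_permissions(perms, ["create_rows", "update_rows"]))  # ['update_rows']
```

For example, missing_permissions(e1.user_permissions, ['create_rows']) returning an empty list would indicate that inserting rows is permitted.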