7.3.2. xgt.EdgeFrame
- class xgt.EdgeFrame(conn, name, schema, source, target, source_key, target_key)
An EdgeFrame object represents a collection of edges held on the xGT server and can be used to retrieve information about them. It should not be instantiated directly by the user; instead, it is returned by Connection.get_edge_frame(), Connection.get_edge_frames() and Connection.create_edge_frame(). Every edge in an EdgeFrame has the same properties, described in EdgeFrame.schema.
The source vertices of all edges in an EdgeFrame belong to a single VertexFrame, whose name is given by EdgeFrame.source_name. Likewise, the target vertices of all edges in an EdgeFrame belong to a single VertexFrame, whose name is given by EdgeFrame.target_name.
For each edge in the EdgeFrame, the source vertex is identified by the value of the edge property named by EdgeFrame.source_key, and the target vertex by the value of the edge property named by EdgeFrame.target_key. Both are properties listed in the schema.
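To make the roles of source_key and target_key concrete, the sketch below pulls the source and target vertex keys out of a single edge row. It assumes that schema is a list of [name, type] pairs in column order and that the row lists its values in that same order (as rows from get_data() are assumed to); the helper name edge_endpoints is hypothetical and not part of the xgt API.

```python
def edge_endpoints(frame, row):
    """Return (source key value, target key value) for one edge row.

    Hypothetical helper: assumes `row` lists property values in the same
    order as `frame.schema`, e.g. a row returned by frame.get_data().
    """
    # frame.schema is a list of [property_name, xgt_type] pairs.
    names = [name for name, _type in frame.schema]
    # source_key / target_key name the properties that identify the endpoints.
    source_value = row[names.index(frame.source_key)]
    target_value = row[names.index(frame.target_key)]
    return source_value, target_value
```

For the WorksFor frame from the example below, edge_endpoints(e1, row) would return the (srcid, trgid) pair, that is, a key into the People frame and a key into the Companies frame.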
- Parameters:
conn (Connection) – An open connection to an xGT server.
name (str) – Fully qualified name of the edge frame, including the namespace.
schema (list of pairs) – List of pairs associating property names with xGT data types. Each edge in the EdgeFrame will have these properties.
source (str or VertexFrame) – The name of a VertexFrame or a VertexFrame object. The source vertex of each edge in this EdgeFrame will belong to this VertexFrame.
target (str or VertexFrame) – The name of a VertexFrame or a VertexFrame object. The target vertex of each edge in this EdgeFrame will belong to this VertexFrame.
source_key (str) – The edge property name that identifies the source vertex of an edge. This is one of the properties from the schema.
target_key (str) – The edge property name that identifies the target vertex of an edge. This is one of the properties from the schema.
Examples
>>> import xgt
>>> conn = xgt.Connection()
>>> e1 = conn.create_edge_frame(
...     name='WorksFor',
...     schema=[['srcid', xgt.INT],
...             ['role', xgt.TEXT],
...             ['trgid', xgt.INT]],
...     source='People',
...     target='Companies',
...     source_key='srcid',
...     target_key='trgid')
>>> e2 = conn.get_edge_frame('RelatedTo')  # An existing edge frame
>>> print(e1.name, e2.name)
Methods
__init__(conn, name, schema, source, target, ...) – Constructor for EdgeFrame.
get_data([offset, length, include_row_labels]) – Returns frame data starting at a given offset and spanning a given length.
get_data_arrow([offset, length, ...]) – Returns an Apache Arrow Table containing frame data starting at a given offset and spanning a given length.
get_data_pandas([offset, length, ...]) – Returns a Pandas DataFrame containing frame data starting at a given offset and spanning a given length.
insert(data[, row_labels, ...]) – Inserts data rows.
load(paths[, header_mode, headerMode, ...]) – Loads data from one or more files specified in the list of paths.
save(path[, offset, length, headers, ...]) – Writes the rows from the frame to a file in the location indicated by the path parameter.
Attributes
connection – The connection used when constructing the frame.
name – Name of the frame.
num_edges – Gets the number of edges in the EdgeFrame.
num_rows – Gets the number of rows in the frame.
row_label_universe – Gets the universe of row security labels that can be attached to rows of this frame.
schema – Gets the property names and types of the frame.
source_key – The edge property name that identifies the source vertex of an edge.
source_name – Gets the name of the source vertex frame.
target_key – The edge property name that identifies the target vertex of an edge.
target_name – Gets the name of the target vertex frame.
- __init__(conn, name, schema, source, target, source_key, target_key)
Constructor for EdgeFrame. Called when EdgeFrame is created.
- property connection
The connection used when constructing the frame.
- Type:
Connection object
- get_data(offset=0, length=None, include_row_labels=False)
Returns frame data starting at a given offset and spanning a given length.
- Parameters:
offset (int) – Position (index) of the first row to be retrieved. Optional. Default=0.
length (int) – Maximum number of rows to be retrieved starting from the row indicated by offset. A value of ‘None’ means ‘all rows’ on and after the offset. Optional. Default=None.
include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row. (since version 1.5.0)
- Return type:
list of lists
- Raises:
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
ValueError – If a parameter is out of bounds.
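Because get_data() takes an offset and a maximum length, a large frame can be read in fixed-size pages instead of one request. The sketch below is a minimal, hypothetical helper (iter_rows is not part of the xgt API) that uses only get_data() and the num_rows property; it assumes the frame is not modified while it is being read, since pages taken at different times could otherwise overlap or skip rows.

```python
def iter_rows(frame, page_size=10_000):
    """Yield the rows of an xgt frame one page at a time.

    Hypothetical helper: pages through the frame with get_data() so that
    no single request asks the server for more than `page_size` rows.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = frame.num_rows  # row count taken once, before paging starts
    for offset in range(0, total, page_size):
        # length is a maximum, so the final page may be shorter.
        yield from frame.get_data(offset=offset, length=page_size)
```

For example, `for row in iter_rows(e1, page_size=50_000): ...` processes the WorksFor edges without materializing them all at once on the client.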
- get_data_arrow(offset=0, length=None, include_row_labels=False, row_label_column_header=None)
Returns an Apache Arrow Table containing frame data starting at a given offset and spanning a given length. (since version 1.11.0)
- Parameters:
offset (int) – Position (index) of the first row to be retrieved. Optional. Default=0.
length (int) – Maximum number of rows to be retrieved starting from the row indicated by offset. A value of ‘None’ means ‘all rows’ on and after the offset. Optional. Default=None.
include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row.
row_label_column_header (str) – The name of the column that holds the row labels when include_row_labels is true.
- Return type:
Apache Arrow Table
- Raises:
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
ValueError – If a parameter is out of bounds.
- get_data_pandas(offset=0, length=None, include_row_labels=False, row_label_column_header=None)
Returns a Pandas DataFrame containing frame data starting at a given offset and spanning a given length.
- Parameters:
offset (int) – Position (index) of the first row to be retrieved. Optional. Default=0.
length (int) – Maximum number of rows to be retrieved starting from the row indicated by offset. A value of ‘None’ means ‘all rows’ on and after the offset. Optional. Default=None.
include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row. (since version 1.5.0)
row_label_column_header (str) – The name of the column that holds the row labels when include_row_labels is true. (since version 1.5.0)
- Return type:
Pandas DataFrame
- Raises:
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
ValueError – If a parameter is out of bounds.
- insert(data, row_labels=None, row_label_columns=None, source_vertex_row_labels=None, target_vertex_row_labels=None, suppress_errors=False)
Inserts data rows. The properties of the new data must match the schema in both order and type.
- Parameters:
data (list or Pandas dataframe) – Data represented by a list of lists of data items or by a Pandas Dataframe.
row_labels (list) – A list of security labels to attach to each row inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. Note: At most one of row_labels and row_label_columns may be passed. (since version 1.5.0)
row_label_columns (list) – A list of integer column indices indicating which columns in the input data contain security labels to attach to the inserted row. Note: At most one of row_labels and row_label_columns may be passed. (since version 1.5.0)
source_vertex_row_labels (list) – A list of security labels to attach to each source vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. (since version 1.5.0)
target_vertex_row_labels (list) – A list of security labels to attach to each target vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. (since version 1.5.0)
suppress_errors (bool) – If true, continues to load data if an ingest error is encountered, placing the first 1000 errors into the job history. If false, stops on first error and raises. Defaults to False. (since version 1.11.0)
- Returns:
A Job object representing the job that has executed the insert.
- Return type:
Job
- Raises:
XgtIOError – If there are errors in the data being inserted or some data could not be inserted into the frame.
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
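Since insert() requires each row's values to follow the schema in both order and type, data held as dictionaries has to be reordered first. The sketch below is a hypothetical helper (insert_records is not part of the xgt API) that builds schema-ordered rows from dicts keyed by property name and inserts them in batches; it checks order only, not types, and raises KeyError if a record is missing a property.

```python
def insert_records(frame, records, batch_size=1000):
    """Insert dict records into an xgt frame, in schema order, in batches.

    Hypothetical helper: insert() expects each row's values in the same
    order as frame.schema, so each dict is reordered by property name first.
    Returns the list of Job objects produced by the insert() calls.
    """
    # frame.schema is a list of [property_name, xgt_type] pairs.
    names = [name for name, _type in frame.schema]
    rows = [[record[name] for name in names] for record in records]
    jobs = []
    for start in range(0, len(rows), batch_size):
        jobs.append(frame.insert(rows[start:start + batch_size]))
    return jobs
```

With the WorksFor frame from the example above, `insert_records(e1, [{'trgid': 10, 'srcid': 1, 'role': 'engineer'}])` would insert the row [1, 'engineer', 10].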
- load(paths, header_mode='none', headerMode=None, record_history=True, row_labels=None, row_label_columns=None, source_vertex_row_labels=None, target_vertex_row_labels=None, delimiter=',', frame_to_file_column_mapping=None, suppress_errors=False)
Loads data from one or more files specified in the list of paths. These files may be CSV, Parquet, or compressed CSV. Some limitations exist for Parquet and compressed CSV. See docs.trovares.com for more details. Each path may have its own protocol as described below.
- Parameters:
paths (list or string) –
A single path or a list of paths to files.
Syntax for one file path (resource type: path syntax):
local to Python: '<path to file>' or 'xgt://<path to file>'
xGT server: 'xgtd://<path to file>'
AWS s3: 's3://<path to file>'
https site: 'https://<path to file>'
http site: 'http://<path to file>'
ftps server: 'ftps://<path to file>'
ftp server: 'ftp://<path to file>'
header_mode (str) –
- Indicates how the file header should be processed:
HeaderMode.NONE: No header exists.
HeaderMode.IGNORE: Ignore the first line containing the header.
HeaderMode.NORMAL: Process the header in non-strict mode. If a schema column is missing, a null value is ingested for that schema column. Any file column whose name does not correspond to a schema column or a security label column is ignored.
HeaderMode.STRICT: Process the header in strict mode. The name of each header column should correspond to a schema column, a security label column, or be named IGNORE. Each schema column must appear in the file.
Optional. Default=HeaderMode.NONE. Only applies to CSV files. (since version 1.11.0)
headerMode (str) –
- Indicates if the CSV files contain headers:
HeaderMode.NONE
HeaderMode.IGNORE
HeaderMode.NORMAL
HeaderMode.STRICT
Optional. Default=HeaderMode.NONE. Only applies to CSV files. Same as header_mode. (DEPRECATED)
record_history (bool) – If true, records the history of the job. (since version 1.4.0)
row_labels (list) – A list of security labels to attach to each row inserted with the load. Each label must have been passed in to the row_label_universe parameter when creating the frame. Note: At most one of row_labels and row_label_columns may be passed. (since version 1.5.0)
row_label_columns (list) – A list indicating which columns in the CSV file contain security labels to attach to the inserted row. If the header mode is NONE or IGNORE, this must be a list of integer column indices. If the header mode is NORMAL or STRICT, this must be a list of string column names. Note: At most one of row_labels and row_label_columns may be passed. (since version 1.5.0)
source_vertex_row_labels (list) – A list of security labels to attach to each source vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. (since version 1.5.0)
target_vertex_row_labels (list) – A list of security labels to attach to each target vertex that is implicitly inserted. Each label must have been passed in to the row_label_universe parameter when creating the frame. (since version 1.5.0)
delimiter (str) – Delimiter for CSV data. Only applies to CSV files. (since version 1.5.1)
frame_to_file_column_mapping (dictionary) – Maps the frame column names to file columns for the ingest. The key of each element is the frame's column name. The value is either the name of the file column (from the file header) or the file column index. If file column names are used, the header_mode must be NORMAL. If only file column indices are used, the header_mode can be NORMAL, NONE, or IGNORE. (Beta since version 1.10)
suppress_errors (bool) – If true, continues to load data if an ingest error is encountered, placing the first 1000 errors into the job history. If false, stops on first error and raises. Defaults to False. (since version 1.11.0)
- Returns:
A Job object representing the job that has executed the load.
- Return type:
Job
- Raises:
XgtIOError – If a file specified cannot be opened or if there are errors inserting any lines in the file into the frame.
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
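When a CSV file's columns are in a different order than the frame's schema, frame_to_file_column_mapping can map each frame column to a file column index. Index-only mappings are accepted with the NONE, IGNORE and NORMAL header modes, so the load below works whether or not the file has a header line. This is a sketch under assumptions: load_reordered_csv is a hypothetical helper, the caller is assumed to know the file's column order, and requiring every schema property to appear in the file is a design choice of this sketch rather than a documented rule of load().

```python
def load_reordered_csv(frame, paths, file_columns, **load_options):
    """Load CSV files whose columns are in a different order than the frame.

    Hypothetical helper: `file_columns` lists the file's column names in
    file order (they need not be present as a header line). Each frame
    property is mapped to the index of the same-named file column, and the
    mapping is passed to load() as frame_to_file_column_mapping.
    """
    mapping = {}
    for name, _type in frame.schema:
        if name not in file_columns:
            # Design choice of this sketch: refuse partial mappings.
            raise ValueError(f"file has no column for frame property {name!r}")
        mapping[name] = file_columns.index(name)
    return frame.load(paths, frame_to_file_column_mapping=mapping, **load_options)
```

For a server-side file with a header line that should be skipped, a call might look like `load_reordered_csv(e1, ['xgtd://works_for.csv'], ['trgid', 'srcid', 'role'], header_mode=xgt.HeaderMode.IGNORE)`, where the file name is illustrative.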
- property name
Name of the frame.
- Type:
str
- property num_edges
Gets the number of edges in the EdgeFrame.
- Type:
int
- property num_rows
Gets the number of rows in the frame.
- Type:
int
- property row_label_universe
Gets the universe of row security labels that can be attached to rows of this frame. Only labels that are also in the authenticated user’s label set are returned.
- Type:
list of strings
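The row_labels arguments of insert() and load() must come from the frame's row label universe, so a client-side check against row_label_universe can catch a misspelled label before a job is submitted. The sketch below is a hypothetical helper (unknown_labels is not part of the xgt API). Because row_label_universe only returns labels that are also in the authenticated user's label set, a label it reports may be outside the frame's universe or simply not visible to the current user.

```python
def unknown_labels(frame, labels):
    """Return, sorted, the labels that frame.row_label_universe does not list.

    Hypothetical helper for checking row_labels before insert() or load().
    An empty result means every label is visible in the frame's universe.
    """
    universe = set(frame.row_label_universe)
    return sorted(set(labels) - universe)
```

A caller might refuse to call e1.insert(rows, row_labels=labels) whenever unknown_labels(e1, labels) is non-empty.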
- save(path, offset=0, length=None, headers=False, record_history=True, include_row_labels=False, row_label_column_header=None, preserve_order=False, number_of_files=1)
Writes the rows from the frame to a file in the location indicated by the path parameter. Saves as a Parquet file if the extension is .parquet; otherwise saves as CSV.
- Parameters:
path (str) –
Path to a file.
Syntax for one file path (resource type: path syntax):
local to Python: '<path to file>' or 'xgt://<path to file>'
xGT server: 'xgtd://<path to file>'
offset (int) – Position (index) of the first row to be saved. Optional. Default=0.
length (int) – Maximum number of rows to be saved starting from the row indicated by offset. Optional. Default=None.
headers (boolean) – Indicates if headers should be added. Optional. Default=False.
record_history (bool) – If true, records the history of the job. (since version 1.4.0)
include_row_labels (bool) – Indicates whether the security labels for each row should be egested along with the row. (since version 1.5.0)
row_label_column_header (str) – The header column name to use for all row labels if include_row_labels is true and headers is true. (since version 1.5.0)
preserve_order (boolean) – Indicates if the output should keep the order the frame is stored in. Optional. Default=False. (since version 1.5.1)
number_of_files (int) – Number of files to save. Only works with the xgtd:// protocol. Optional. Default=1. (since version 1.10.0)
- Returns:
A Job object representing the job that has executed the save.
- Return type:
Job
- Raises:
XgtIOError – If a file specified cannot be opened.
XgtNameError – If the frame does not exist on the server.
XgtSecurityError – If the user does not have required permissions for this action.
XgtTransactionError – If a conflict with another transaction occurs.
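The number_of_files parameter only works with the xgtd:// protocol. For other destinations, save()'s offset and length parameters can split an export across several files from the client side. The sketch below is a hypothetical helper (save_in_parts is not part of the xgt API); it assumes path_template contains one '{}' placeholder for the part number and that the frame is not modified during the export.

```python
def save_in_parts(frame, path_template, rows_per_file, **save_options):
    """Write a frame to several files, each holding at most rows_per_file rows.

    Hypothetical helper: uses save()'s offset and length parameters to split
    the export. `path_template` must contain one '{}' placeholder, which is
    filled with the part number (0, 1, 2, ...).
    Returns the list of Job objects produced by the save() calls.
    """
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    jobs = []
    for part, offset in enumerate(range(0, frame.num_rows, rows_per_file)):
        # length is a maximum, so the last file may hold fewer rows.
        jobs.append(frame.save(path_template.format(part), offset=offset,
                               length=rows_per_file, **save_options))
    return jobs
```

For example, `save_in_parts(e1, 'works_for_{}.csv', 1_000_000, headers=True, preserve_order=True)` writes local CSV files with headers; preserve_order is passed on the assumption that stored row order should be kept within each file.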
- property schema
Gets the property names and types of the frame.
- Type:
list of lists
- property source_key
The edge property name that identifies the source vertex of an edge.
- Type:
str
- property source_name
Gets the name of the source vertex frame.
- Type:
str
- property target_key
The edge property name that identifies the target vertex of an edge.
- Type:
str
- property target_name
Gets the name of the target vertex frame.
- Type:
str