matgraphdb.materials.nodes.materials.MaterialStore
- class MaterialStore(storage_path: str, initialize_kwargs: dict | None = None)
A node store for material records, persisted as ParquetDB files; inherits from NodeStore.
- __init__(storage_path: str, initialize_kwargs: dict | None = None)
  - Parameters:
    storage_path (str) – The path where ParquetDB files for this node type are stored.
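A minimal construction sketch, assuming only what the signature above shows. The import path follows this page's module path; the storage location is a placeholder.

```python
from matgraphdb.materials.nodes.materials import MaterialStore

# Open a MaterialStore backed by ParquetDB files at the given path.
# That a missing directory is initialized as an empty dataset is an
# assumption based on NodeStore's role, not stated on this page.
store = MaterialStore(storage_path="data/materials")
```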
Methods

- __init__(storage_path[, initialize_kwargs])
- backup_database(backup_path): Creates a complete backup of the current dataset (see the maintenance sketch after this list).
- construct_table(data[, schema, metadata, ...]): Constructs a PyArrow Table from various input data formats.
- copy_dataset(dest_name[, overwrite]): Creates a complete copy of the current dataset under a new name.
- create(data[, schema, metadata, ...]): Adds new data to the database.
- create_material([structure, coords, ...]): Adds a material to the database with optional symmetry and calculated properties (see the usage sketch after this list).
- create_materials(materials[, schema, ...]): Adds multiple materials to the database in a single transaction.
- create_nodes(data[, schema, metadata, ...]): Adds new data to the database.
- dataset_exists([dataset_name]): Check if a dataset exists and contains data.
- delete([ids, filters, columns, normalize_config]): Deletes records or columns from the database.
- delete_materials([ids, columns, ...]): Deletes records from the database by ID.
- delete_nodes([ids, columns, normalize_config]): Deletes records from the database.
- drop_dataset(): Removes the current dataset directory and reinitializes it with an empty table.
- export_dataset(file_path[, format]): Exports the entire dataset to a single file in the specified format.
- export_partitioned_dataset(export_dir, ...): Exports the dataset to a partitioned format in the specified directory.
- get_current_files(): Get a list of all Parquet files in the current dataset.
- get_field_metadata([field_names, return_bytes]): Retrieves metadata for specified fields/columns in the dataset.
- get_field_names([columns, include_cols]): Get the names of fields/columns in the dataset schema.
- get_file_sizes([verbose]): Get the size of each file in the dataset in MB.
- get_metadata([return_bytes]): Retrieves the metadata of the dataset table.
- get_n_rows_per_row_group_per_file([as_dict]): Get the number of rows in each row group for each file.
- get_number_of_row_groups_per_file(): Get the number of row groups in each Parquet file in the dataset.
- get_number_of_rows_per_file(): Get the number of rows in each Parquet file in the dataset.
- get_parquet_column_metadata_per_file([as_dict]): Get detailed metadata for each column in each row group in each file.
- get_parquet_file_metadata_per_file([as_dict]): Get the metadata for each Parquet file in the dataset.
- get_parquet_file_row_group_metadata_per_file([...]): Get detailed metadata for each row group in each Parquet file.
- get_row_group_sizes_per_file([verbose]): Get the size of each row group for each file.
- get_schema(): Get the PyArrow schema of the dataset.
- get_serialized_metadata_size_per_file(): Get the serialized metadata size for each Parquet file in the dataset.
- import_dataset(file_path[, format]): Imports data from a file into the dataset, supporting multiple file formats.
- initialize(**kwargs)
- is_empty(): Check if the dataset is empty.
- merge_datasets(source_tables, dest_table)
- normalize([normalize_config]): Normalize the dataset by restructuring files for optimal performance.
- normalize_nodes([normalize_config]): Normalize the dataset by restructuring files for consistent row distribution.
- preprocess_table(table[, ...]): Preprocesses a PyArrow table by flattening nested structures and handling special field types.
- process_data_with_python_objects(data[, ...]): Processes input data and handles Python object serialization.
- read([ids, columns, filters, load_format, ...]): Reads data from the database with flexible filtering and formatting options.
- read_materials([ids, columns, filters, ...]): Reads data from the MaterialStore.
- read_nodes([ids, columns, filters, ...]): Reads data from the database.
- rename_dataset(new_name[, remove_dest]): Renames the current dataset directory and all contained files.
- rename_fields(name_map[, normalize_config]): Rename fields/columns in the dataset using a mapping dictionary.
- restore_database(backup_path): Restores the dataset from a previous backup.
- set_field_metadata(fields_metadata[, update]): Sets or updates metadata for specific fields/columns in the dataset.
- set_metadata(metadata[, update]): Sets or updates the metadata of the dataset table.
- sort_fields([normalize_config]): Sort the fields/columns of the dataset alphabetically by name.
- summary([show_column_names]): Generate a formatted summary string containing database information and metadata.
- to_nested([nested_dataset_dir, ...]): Converts the current dataset to a nested structure optimized for querying nested data.
- transform(transform_callable[, new_db_path, ...]): Transform the entire dataset using a user-provided callable.
- update(data[, schema, metadata, ...]): Updates existing records in the database by matching on specified key fields.
- update_materials(data[, schema, metadata, ...]): Updates existing records in the database.
- update_nodes(data[, schema, metadata, ...]): Updates existing records in the database.
- update_schema([field_dict, schema, ...]): Updates the schema of the table in the dataset.
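A minimal end-to-end sketch of the create/read/update/delete methods above. Only the method names and parameter names come from this page; the storage path, the pymatgen Structure input to create_material, the auto-assigned integer "id" column, and the "note" field are assumptions for illustration.

```python
from matgraphdb.materials.nodes.materials import MaterialStore
from pymatgen.core import Lattice, Structure

store = MaterialStore(storage_path="data/materials")  # placeholder path

# create_material: `structure` appears in the signature above; that it
# accepts a pymatgen Structure is an assumption about the expected type.
si = Structure(
    Lattice.cubic(5.43),
    ["Si", "Si"],
    [[0.0, 0.0, 0.0], [0.25, 0.25, 0.25]],
)
store.create_material(structure=si)

# read_materials takes ids, columns, and filters. Selecting an "id" column
# assumes ParquetDB's auto-assigned integer id; adjust to the real schema.
table = store.read_materials(ids=[0], columns=["id"])
print(table)

# update matches records on key fields (per the `update` summary above);
# "note" is a hypothetical extra field used only for illustration.
store.update_materials(data=[{"id": 0, "note": "relaxed structure"}])

# delete_materials removes records by ID.
store.delete_materials(ids=[0])
```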
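And a short maintenance sketch using the backup, normalization, and export methods, continuing from the `store` above. The backup and export paths are placeholders, and passing "parquet" as the export format is an assumption based on the ParquetDB backend.

```python
# Back up the dataset, then restore it from the same location.
store.backup_database(backup_path="backups/materials")
store.restore_database(backup_path="backups/materials")

# Restructure the Parquet files for performance; calling normalize() with
# no arguments is assumed to apply the default normalize_config.
store.normalize()

# Export the whole dataset to a single file.
store.export_dataset(file_path="exports/materials.parquet", format="parquet")
```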
Attributes

- basename_template: Get the template for parquet file basenames.
- columns: Get the column names in the database.
- dataset_name: Get the dataset name.
- db_path: Get the database path.
- n_columns: Get the number of columns in the database.
- n_features
- n_files: Get the number of parquet files in the database.
- n_nodes
- n_row_groups_per_file: Get the number of row groups in each parquet file.
- n_rows: Get the total number of rows in the database.
- n_rows_per_file: Get the number of rows in each parquet file.
- n_rows_per_row_group_per_file: Get the number of rows in each row group for each file.
- name_column
- node_metadata_keys
- serialized_metadata_size_per_file: Get the size of serialized metadata for each file.
- storage_path
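A quick inspection sketch using the read-only attributes above, plus the summary() method, continuing from the `store` created earlier.

```python
# Size and layout of the backing ParquetDB dataset.
print(store.n_rows)     # total number of rows
print(store.n_files)    # number of parquet files
print(store.n_columns)  # number of columns
print(store.columns)    # column names

# summary() is a method (not an attribute) returning a formatted string.
print(store.summary(show_column_names=True))
```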