matgraphdb.materials.nodes.materials.MaterialStore
- class MaterialStore(storage_path: str, initialize_kwargs: dict | None = None)
A node store for material records, persisted as ParquetDB files; inherits from NodeStore.
- __init__(storage_path: str, initialize_kwargs: dict | None = None)
  - Parameters:
    storage_path (str) – The path where ParquetDB files for this node type are stored.
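A minimal construction sketch, assuming only what the signature above shows. The import path follows this page's module path; the storage location is a placeholder.

```python
from matgraphdb.materials.nodes.materials import MaterialStore

# Open a MaterialStore backed by ParquetDB files at the given path.
# That a missing directory is initialized as an empty dataset is an
# assumption based on NodeStore's role, not stated on this page.
store = MaterialStore(storage_path="data/materials")
```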
Methods

- __init__(storage_path[, initialize_kwargs])
- backup_database(backup_path): Creates a complete backup of the current dataset (see the maintenance sketch after this list).
- construct_table(data[, schema, metadata, ...]): Constructs a PyArrow Table from various input data formats.
- copy_dataset(dest_name[, overwrite]): Creates a complete copy of the current dataset under a new name.
- create(data[, schema, metadata, ...]): Adds new data to the database.
- create_material([structure, coords, ...]): Adds a material to the database with optional symmetry and calculated properties (see the usage sketch after this list).
- create_materials(materials[, schema, ...]): Adds multiple materials to the database in a single transaction.
- create_nodes(data[, schema, metadata, ...]): Adds new data to the database.
- dataset_exists([dataset_name]): Check if a dataset exists and contains data.
- delete([ids, filters, columns, normalize_config]): Deletes records or columns from the database.
- delete_materials([ids, columns, ...]): Deletes records from the database by ID.
- delete_nodes([ids, columns, normalize_config]): Deletes records from the database.
- drop_dataset(): Removes the current dataset directory and reinitializes it with an empty table.
- export_dataset(file_path[, format]): Exports the entire dataset to a single file in the specified format.
- export_partitioned_dataset(export_dir, ...): Exports the dataset to a partitioned format in the specified directory.
- get_current_files(): Get a list of all Parquet files in the current dataset.
- get_field_metadata([field_names, return_bytes]): Retrieves metadata for specified fields/columns in the dataset.
- get_field_names([columns, include_cols]): Get the names of fields/columns in the dataset schema.
- get_file_sizes([verbose]): Get the size of each file in the dataset in MB.
- get_metadata([return_bytes]): Retrieves the metadata of the dataset table.
- get_n_rows_per_row_group_per_file([as_dict]): Get the number of rows in each row group for each file.
- get_number_of_row_groups_per_file(): Get the number of row groups in each Parquet file in the dataset.
- get_number_of_rows_per_file(): Get the number of rows in each Parquet file in the dataset.
- get_parquet_column_metadata_per_file([as_dict]): Get detailed metadata for each column in each row group in each file.
- get_parquet_file_metadata_per_file([as_dict]): Get the metadata for each Parquet file in the dataset.
- get_parquet_file_row_group_metadata_per_file([...]): Get detailed metadata for each row group in each Parquet file.
- get_row_group_sizes_per_file([verbose]): Get the size of each row group for each file.
- get_schema(): Get the PyArrow schema of the dataset.
- get_serialized_metadata_size_per_file(): Get the serialized metadata size for each Parquet file in the dataset.
- import_dataset(file_path[, format]): Imports data from a file into the dataset, supporting multiple file formats.
- initialize(**kwargs)
- is_empty(): Check if the dataset is empty.
- merge_datasets(source_tables, dest_table)
- normalize([normalize_config]): Normalize the dataset by restructuring files for optimal performance.
- normalize_nodes([normalize_config]): Normalize the dataset by restructuring files for consistent row distribution.
- preprocess_table(table[, ...]): Preprocesses a PyArrow table by flattening nested structures and handling special field types.
- process_data_with_python_objects(data[, ...]): Processes input data and handles Python object serialization.
- read([ids, columns, filters, load_format, ...]): Reads data from the database with flexible filtering and formatting options.
- read_materials([ids, columns, filters, ...]): Reads data from the MaterialStore.
- read_nodes([ids, columns, filters, ...]): Reads data from the database.
- rename_dataset(new_name[, remove_dest]): Renames the current dataset directory and all contained files.
- rename_fields(name_map[, normalize_config]): Rename fields/columns in the dataset using a mapping dictionary.
- restore_database(backup_path): Restores the dataset from a previous backup.
- set_field_metadata(fields_metadata[, update]): Sets or updates metadata for specific fields/columns in the dataset.
- set_metadata(metadata[, update]): Sets or updates the metadata of the dataset table.
- sort_fields([normalize_config]): Sort the fields/columns of the dataset alphabetically by name.
- summary([show_column_names]): Generate a formatted summary string containing database information and metadata.
- to_nested([nested_dataset_dir, ...]): Converts the current dataset to a nested structure optimized for querying nested data.
- transform(transform_callable[, new_db_path, ...]): Transform the entire dataset using a user-provided callable.
- update(data[, schema, metadata, ...]): Updates existing records in the database by matching on specified key fields.
- update_materials(data[, schema, metadata, ...]): Updates existing records in the database.
- update_nodes(data[, schema, metadata, ...]): Updates existing records in the database.
- update_schema([field_dict, schema, ...]): Updates the schema of the table in the dataset.
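A minimal end-to-end sketch of the create/read/update/delete methods above. Only the method names and parameter names come from this page; the storage path, the pymatgen Structure input to create_material, the auto-assigned integer "id" column, and the "note" field are assumptions for illustration.

```python
from matgraphdb.materials.nodes.materials import MaterialStore
from pymatgen.core import Lattice, Structure

store = MaterialStore(storage_path="data/materials")  # placeholder path

# create_material: `structure` appears in the signature above; that it
# accepts a pymatgen Structure is an assumption about the expected type.
si = Structure(
    Lattice.cubic(5.43),
    ["Si", "Si"],
    [[0.0, 0.0, 0.0], [0.25, 0.25, 0.25]],
)
store.create_material(structure=si)

# read_materials takes ids, columns, and filters. Selecting an "id" column
# assumes ParquetDB's auto-assigned integer id; adjust to the real schema.
table = store.read_materials(ids=[0], columns=["id"])
print(table)

# update matches records on key fields (per the `update` summary above);
# "note" is a hypothetical extra field used only for illustration.
store.update_materials(data=[{"id": 0, "note": "relaxed structure"}])

# delete_materials removes records by ID.
store.delete_materials(ids=[0])
```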
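And a short maintenance sketch using the backup, normalization, and export methods, continuing from the `store` above. The backup and export paths are placeholders, and passing "parquet" as the export format is an assumption based on the ParquetDB backend.

```python
# Back up the dataset, then restore it from the same location.
store.backup_database(backup_path="backups/materials")
store.restore_database(backup_path="backups/materials")

# Restructure the Parquet files for performance; calling normalize() with
# no arguments is assumed to apply the default normalize_config.
store.normalize()

# Export the whole dataset to a single file.
store.export_dataset(file_path="exports/materials.parquet", format="parquet")
```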
Attributes

- basename_template: Get the template for parquet file basenames.
- columns: Get the column names in the database.
- dataset_name: Get the dataset name.
- db_path: Get the database path.
- n_columns: Get the number of columns in the database.
- n_features
- n_files: Get the number of parquet files in the database.
- n_nodes
- n_row_groups_per_file: Get the number of row groups in each parquet file.
- n_rows: Get the total number of rows in the database.
- n_rows_per_file: Get the number of rows in each parquet file.
- n_rows_per_row_group_per_file: Get the number of rows in each row group for each file.
- name_column
- node_metadata_keys
- serialized_metadata_size_per_file: Get the size of serialized metadata for each file.
- storage_path
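A quick inspection sketch using the read-only attributes above, plus the summary() method, continuing from the `store` created earlier.

```python
# Size and layout of the backing ParquetDB dataset.
print(store.n_rows)     # total number of rows
print(store.n_files)    # number of parquet files
print(store.n_columns)  # number of columns
print(store.columns)    # column names

# summary() is a method (not an attribute) returning a formatted string.
print(store.summary(show_column_names=True))
```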