deepdrivemd.data.utils
Data utility functions for handling HDF5 files.
Functions
concatenate_virtual_h5 – Concatenate HDF5 files into a virtual HDF5 file.
get_virtual_h5_file – Create and return a virtual HDF5 file.
parse_h5 – Helper function for accessing data fields in a HDF5 file.
- deepdrivemd.data.utils.concatenate_virtual_h5(input_file_names: List[str], output_name: str, fields: Optional[List[str]] = None) -> None
Concatenate HDF5 files into a virtual HDF5 file.
Concatenates a list input_file_names of HDF5 files that share the same format into a single virtual dataset.
- Parameters
input_file_names (List[str]) – List of HDF5 file names to concatenate.
output_name (str) – Name of output virtual HDF5 file.
fields (Optional[List[str]], default=None) – Which dataset fields to concatenate. Will concatenate all fields by default.
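The idea of concatenating files into a virtual dataset can be sketched with h5py's virtual dataset API. This is a minimal illustration under stated assumptions, not the deepdrivemd implementation: the helper name concat_virtual_sketch, the single field "x", and stacking along the first axis are all assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import List

import h5py
import numpy as np


def concat_virtual_sketch(input_file_names: List[str], output_name: str, field: str) -> None:
    """Hypothetical helper: stack one dataset from several files along axis 0."""
    # Collect the shape and dtype of the dataset in every source file.
    shapes = []
    for name in input_file_names:
        with h5py.File(name, "r") as f:
            shapes.append(f[field].shape)
            dtype = f[field].dtype

    # The virtual layout spans the total number of rows of all sources.
    total_rows = sum(shape[0] for shape in shapes)
    layout = h5py.VirtualLayout(shape=(total_rows, *shapes[0][1:]), dtype=dtype)

    # Map each source dataset onto its slice of the layout; no data is copied.
    start = 0
    for name, shape in zip(input_file_names, shapes):
        layout[start:start + shape[0]] = h5py.VirtualSource(name, field, shape=shape)
        start += shape[0]

    with h5py.File(output_name, "w", libver="latest") as f:
        f.create_virtual_dataset(field, layout)


# Usage: two small files with a dataset "x" (assumed field name).
tmp = Path(tempfile.mkdtemp())
a = np.arange(6).reshape(2, 3)
b = np.arange(9).reshape(3, 3)
sources = []
for i, arr in enumerate((a, b)):
    path = tmp / f"part{i}.h5"
    with h5py.File(path, "w") as f:
        f.create_dataset("x", data=arr)
    sources.append(str(path))

virtual_path = tmp / "virtual.h5"
concat_virtual_sketch(sources, str(virtual_path), "x")

with h5py.File(virtual_path, "r") as f:
    merged = f["x"][...]
```

The virtual file only stores references to the source files, so the sources must remain readable at the recorded paths for as long as the virtual file is used.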
- deepdrivemd.data.utils.get_virtual_h5_file(output_path: pathlib.Path, all_h5_files: List[str], last_n: int = 0, k_random_old: int = 0, virtual_name: str = 'virtual', node_local_path: Optional[pathlib.Path] = None) -> Tuple[pathlib.Path, List[str]]
Create and return a virtual HDF5 file.
Create a virtual HDF5 file from the last_n files in all_h5_files plus a random selection of k_random_old of the older files.
- Parameters
output_path (Path) – Directory to write virtual HDF5 file to.
all_h5_files (List[str]) – List of HDF5 files to select from.
last_n (int, optional) – Chooses the last n files in all_h5_files to concatenate into a virtual HDF5 file. Defaults to all the files.
k_random_old (int, default=0) – Chooses k random files not in the last_n files to concatenate into the virtual HDF5 file. Defaults to choosing no random old files.
virtual_name (str, default='virtual') – The name of the virtual HDF5 file to be written, e.g. virtual_name == 'virtual' implies the file will be written to output_path/virtual.h5.
node_local_path (Optional[Path], default=None) – An optional path to write the virtual file to, for example node-local storage. The selected HDF5 files from all_h5_files are also copied to the same directory.
- Returns
Path – The path to the created virtual HDF5 file.
List[str] – The selected HDF5 files from last_n and k_random_old used to make the virtual HDF5 file.
- Raises
ValueError – If all_h5_files is empty, or if last_n is greater than len(all_h5_files).
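The selection rule described above (the newest last_n files plus k_random_old randomly chosen older ones, with the documented ValueError cases) can be sketched in plain Python. This is an illustration only: the function name select_h5_files_sketch, the seed parameter, the ordering of the returned list, and capping k_random_old at the number of available older files are assumptions, not documented behavior.

```python
import random
from typing import List, Optional


def select_h5_files_sketch(
    all_h5_files: List[str],
    last_n: int = 0,
    k_random_old: int = 0,
    seed: Optional[int] = None,  # hypothetical, only for reproducible demos
) -> List[str]:
    """Hypothetical helper mirroring the documented selection rule."""
    if not all_h5_files:
        raise ValueError("all_h5_files is empty")
    if last_n > len(all_h5_files):
        raise ValueError("last_n is greater than len(all_h5_files)")

    # last_n == 0 is documented as "use all the files".
    if last_n == 0:
        last_n = len(all_h5_files)

    newest = all_h5_files[-last_n:]
    older = all_h5_files[:-last_n]

    # Assumption: never ask for more old files than exist.
    k = min(k_random_old, len(older))
    picked_old = random.Random(seed).sample(older, k)

    return picked_old + newest


files = ["a.h5", "b.h5", "c.h5", "d.h5", "e.h5"]
selected = select_h5_files_sketch(files, last_n=2, k_random_old=1, seed=0)
everything = select_h5_files_sketch(files)
```

With last_n=2 and k_random_old=1, the result always ends with d.h5 and e.h5 and starts with one of the three older files.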
- deepdrivemd.data.utils.parse_h5(path: Union[str, pathlib.Path], fields: List[str]) -> Dict[str, npt.ArrayLike]
Helper function for accessing data fields in a HDF5 file.
- Parameters
path (Union[Path, str]) – Path to HDF5 file.
fields (List[str]) – List of dataset field names inside of the HDF5 file.
- Returns
Dict[str, npt.ArrayLike] – A dictionary mapping each field name in fields to a numpy array containing the data from the associated HDF5 dataset.
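A helper with this contract might look roughly like the sketch below, written directly against h5py. It is an assumption about the general shape of such a function, not the library's source; the name parse_h5_sketch and the field names "rmsd" and "contact_map" are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union

import h5py
import numpy as np


def parse_h5_sketch(path: Union[str, Path], fields: List[str]) -> Dict[str, np.ndarray]:
    """Hypothetical helper: read the requested datasets fully into memory."""
    data = {}
    with h5py.File(path, "r") as f:
        for field in fields:
            # Indexing with [...] copies the whole dataset into a numpy array,
            # so the result stays valid after the file is closed.
            data[field] = f[field][...]
    return data


# Usage with a small throwaway file; the field names are made up.
demo_path = Path(tempfile.mkdtemp()) / "demo.h5"
with h5py.File(demo_path, "w") as f:
    f.create_dataset("rmsd", data=np.array([0.5, 1.0, 1.5]))
    f.create_dataset("contact_map", data=np.eye(2))

parsed = parse_h5_sketch(demo_path, ["rmsd", "contact_map"])
```

Requesting a field that is not in the file raises a KeyError from h5py in this sketch.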