deepdrivemd.data.utils

Data utility functions for handling HDF5 files.

Functions

concatenate_virtual_h5(input_file_names, ...)

Concatenate HDF5 files into a virtual HDF5 file.

get_virtual_h5_file(output_path, all_h5_files)

Create and return a virtual HDF5 file.

parse_h5(path, fields)

Helper function for accessing data fields in an HDF5 file.

deepdrivemd.data.utils.concatenate_virtual_h5(input_file_names: List[str], output_name: str, fields: Optional[List[str]] = None) → None

Concatenate HDF5 files into a virtual HDF5 file.

Concatenates the HDF5 files listed in input_file_names, which must share the same format, into a single virtual dataset.

Parameters
  • input_file_names (List[str]) – List of HDF5 file names to concatenate.

  • output_name (str) – Name of output virtual HDF5 file.

  • fields (Optional[List[str]], default=None) – Which dataset fields to concatenate. Will concatenate all fields by default.
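Virtual concatenation of this kind is built on h5py's VirtualLayout/VirtualSource API. The sketch below shows the mechanics for a single dataset field; the helper name concatenate_one_field and the single-field restriction are simplifications for illustration, not the deepdrivemd implementation:

```python
from typing import List

import h5py


def concatenate_one_field(input_file_names: List[str], output_name: str, field: str) -> None:
    # Gather the shape and dtype of the field in each source file.
    shapes, dtype = [], None
    for fname in input_file_names:
        with h5py.File(fname, "r") as f:
            shapes.append(f[field].shape)
            dtype = f[field].dtype

    # The virtual dataset stacks all sources along the first axis.
    total = sum(s[0] for s in shapes)
    layout = h5py.VirtualLayout(shape=(total, *shapes[0][1:]), dtype=dtype)

    # Map each slice of the layout onto the corresponding source dataset.
    start = 0
    for fname, shape in zip(input_file_names, shapes):
        layout[start : start + shape[0]] = h5py.VirtualSource(fname, field, shape=shape)
        start += shape[0]

    # libver="latest" opts into the HDF5 1.10+ file format that virtual datasets need.
    with h5py.File(output_name, "w", libver="latest") as f:
        f.create_virtual_dataset(field, layout)
```

Reading the output file then behaves as if the sources were one contiguous dataset, without copying any data.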

deepdrivemd.data.utils.get_virtual_h5_file(output_path: pathlib.Path, all_h5_files: List[str], last_n: int = 0, k_random_old: int = 0, virtual_name: str = 'virtual', node_local_path: Optional[pathlib.Path] = None) → Tuple[pathlib.Path, List[str]]

Create and return a virtual HDF5 file.

Create a virtual HDF5 file from the last last_n files in all_h5_files and a random selection of k_random_old older files.

Parameters
  • output_path (Path) – Directory to write virtual HDF5 file to.

  • all_h5_files (List[str]) – List of HDF5 files to select from.

  • last_n (int, default=0) – Chooses the last n files in all_h5_files to concatenate into the virtual HDF5 file. Defaults to using all of the files.

  • k_random_old (int, default=0) – Chooses k random files not in the last_n files to concatenate into the virtual HDF5 file. Defaults to choosing no random old files.

  • virtual_name (str, default="virtual") – The name of the virtual HDF5 file to be written, e.g. virtual_name="virtual" means the file is written to output_path/virtual.h5.

  • node_local_path (Optional[Path], default=None) – An optional directory, e.g. on node-local storage, to write the virtual file to. All selected HDF5 files from all_h5_files are also copied to this directory.

Returns

  • Path – The path to the created virtual HDF5 file.

  • List[str] – The selected HDF5 files from last_n and k_random_old used to make the virtual HDF5 file.

Raises

ValueError – If all_h5_files is empty, or if last_n is greater than len(all_h5_files).
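The selection behavior described above (the newest last_n files plus k_random_old random older ones) can be sketched in plain Python. select_h5_files is a hypothetical helper written for illustration, not part of deepdrivemd:

```python
import random
from typing import List


def select_h5_files(
    all_h5_files: List[str], last_n: int = 0, k_random_old: int = 0
) -> List[str]:
    # Mirror the documented error conditions.
    if not all_h5_files:
        raise ValueError("all_h5_files is empty")
    if last_n > len(all_h5_files):
        raise ValueError("last_n is greater than len(all_h5_files)")

    if last_n:
        newest = all_h5_files[-last_n:]  # the last_n most recent files
        older = all_h5_files[:-last_n]   # candidates for random selection
    else:
        newest, older = list(all_h5_files), []  # default: take everything

    # Sample k_random_old files from those not already chosen.
    return newest + random.sample(older, k_random_old)
```

Note that random.sample raises ValueError if k_random_old exceeds the number of remaining older files, which is a sensible failure mode for this sketch.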

deepdrivemd.data.utils.parse_h5(path: Union[str, pathlib.Path], fields: List[str]) → Dict[str, npt.ArrayLike]

Helper function for accessing data fields in an HDF5 file.

Parameters
  • path (Union[Path, str]) – Path to HDF5 file.

  • fields (List[str]) – List of dataset field names inside of the HDF5 file.

Returns

Dict[str, npt.ArrayLike] – A dictionary mapping each field name in fields to a NumPy array containing the data from the associated HDF5 dataset.
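A helper with this signature can be sketched with h5py by reading each requested field eagerly into memory; this is an illustration of the behavior described above, not necessarily the exact deepdrivemd implementation:

```python
from pathlib import Path
from typing import Dict, List, Union

import h5py
import numpy as np


def parse_h5(path: Union[str, Path], fields: List[str]) -> Dict[str, np.ndarray]:
    # f[field][...] copies the full dataset into a numpy array,
    # so the returned dict remains valid after the file is closed.
    with h5py.File(path, "r") as f:
        return {field: f[field][...] for field in fields}
```

Used on a virtual HDF5 file produced by the functions above, this reads the concatenated data transparently across all source files.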