deepdrivemd.data.api

Functions

glob_file_from_dirs(dirs, pattern)

Return a list of all items matching pattern in multiple dirs.

Classes

DeepDriveMD_API(experiment_directory)

Stage_API(experiment_dir, stage_dir_name)

class deepdrivemd.data.api.DeepDriveMD_API(experiment_directory: Union[str, pathlib.Path])
AGENT_DIR = 'agent_runs'
AGGREGATE_DIR = 'aggregation_runs'
MACHINE_LEARNING_DIR = 'machine_learning_runs'
MODEL_SELECTION_DIR = 'model_selection_runs'
MOLECULAR_DYNAMICS_DIR = 'molecular_dynamics_runs'
static get_initial_pdbs(initial_pdb_dir: Union[str, pathlib.Path]) List[pathlib.Path]

Return a list of PDB paths from the initial_pdb_dir.

Parameters

initial_pdb_dir (Union[str, Path]) – Initial data directory containing PDBs and optional topologies.

Returns

List[Path] – List of paths to initial PDB files.

Raises

ValueError – If any of the PDB file names contain a double underscore __.
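The double-underscore check can be illustrated with a minimal sketch. This is not the library's implementation: the function name initial_pdbs_sketch is hypothetical, and the directory layout (one subdirectory per system, each holding PDB files) is an assumption taken from the get_topology description.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def initial_pdbs_sketch(initial_pdb_dir: Union[str, Path]) -> List[Path]:
    """Hypothetical stand-in for get_initial_pdbs."""
    # Assumed layout: <initial_pdb_dir>/<system_name>/<file>.pdb
    pdbs = sorted(Path(initial_pdb_dir).glob("*/*.pdb"))
    for pdb in pdbs:
        # "__" appears to be reserved as the system-name separator,
        # so an input file name containing it is rejected.
        if "__" in pdb.name:
            raise ValueError(f"PDB file name contains '__': {pdb}")
    return pdbs


with tempfile.TemporaryDirectory() as root:
    (Path(root) / "sys1").mkdir()
    (Path(root) / "sys1" / "start.pdb").touch()
    found = [p.name for p in initial_pdbs_sketch(root)]

    # A file name with a double underscore triggers the error.
    (Path(root) / "sys1" / "bad__name.pdb").touch()
    try:
        initial_pdbs_sketch(root)
        rejected = False
    except ValueError:
        rejected = True
```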

get_last_n_md_runs(n: Optional[int] = None, data_file_suffix: str = '.h5', traj_file_suffix: str = '.dcd', structure_file_suffix: str = '.pdb') Dict[str, List[str]]

Get the data file paths from the last n MD run directories.

Return a dictionary of data file paths for the last n MD runs including the training data files, the trajectory files, and the coordinate files.

Parameters
  • n (int, optional) – Number of latest MD run directories to glob data files from. Defaults to all MD run directories.

  • data_file_suffix (str, optional) – The suffix of the training data file. Defaults to “.h5”.

  • traj_file_suffix (str, optional) – The suffix of the traj file. Defaults to “.dcd”.

  • structure_file_suffix (str, optional) – The suffix of the structure file. Defaults to “.pdb”.

Returns

Dict[str, List[str]] – A dictionary with keys “data_files”, “traj_files” and “structure_files”, each containing a list of n paths globbed from the latest n MD run directories.
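The shape of the returned dictionary can be sketched as follows. This is an illustration under stated assumptions, not the library's code: last_n_md_runs_sketch is a hypothetical name, the run directories are assumed to sort lexicographically in chronological order, and each run directory is assumed to hold one file per suffix.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Union


def last_n_md_runs_sketch(
    md_runs_dir: Union[str, Path],
    n: Optional[int] = None,
    data_file_suffix: str = ".h5",
    traj_file_suffix: str = ".dcd",
    structure_file_suffix: str = ".pdb",
) -> Dict[str, List[str]]:
    """Hypothetical stand-in for get_last_n_md_runs."""
    run_dirs = sorted(p for p in Path(md_runs_dir).iterdir() if p.is_dir())
    if n is not None:
        # Keep only the n most recent run directories.
        run_dirs = run_dirs[max(len(run_dirs) - n, 0):]

    def collect(suffix: str) -> List[str]:
        return [str(f) for d in run_dirs for f in sorted(d.glob(f"*{suffix}"))]

    return {
        "data_files": collect(data_file_suffix),
        "traj_files": collect(traj_file_suffix),
        "structure_files": collect(structure_file_suffix),
    }


with tempfile.TemporaryDirectory() as root:
    # Three made-up run directories, each with one file per suffix.
    for i in range(3):
        run = Path(root) / f"run{i:04d}"
        run.mkdir()
        for suffix in (".h5", ".dcd", ".pdb"):
            (run / f"sim{suffix}").touch()
    latest = last_n_md_runs_sketch(root, n=2)
    latest_run_names = [Path(p).parent.name for p in latest["data_files"]]
    all_runs = last_n_md_runs_sketch(root)
```

Because the three lists are built from the same ordered run directories, entries at the same position refer to the same run under the one-file-per-suffix assumption.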

get_restart_pdb(index: int, stage_idx: int = - 1, task_idx: int = 0) Dict[str, Any]

Get a single datum from the restart points JSON file.

Parameters

index (int) – Index into the agent_{}.json file of the latest DeepDriveMD iteration.

Returns

Dict[str, Any] – Dictionary entry written by the outlier detector.

static get_system_name(pdb_file: Union[str, pathlib.Path]) str

Parse the system name from a PDB file.

Parameters

pdb_file (Union[str, Path]) – The PDB file to parse. Can be absolute path, relative path, or filename.

Returns

str – The system name used to identify system topology.

Examples

>>> pdb_file = "/path/to/system_name__anything.pdb"
>>> DeepDriveMD_API.get_system_name(pdb_file)
'system_name'
>>> pdb_file = "/path/to/system_name/anything.pdb"
>>> DeepDriveMD_API.get_system_name(pdb_file)
'system_name'
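The two naming conventions in the examples above can be reproduced with a short sketch. It is a hypothetical reimplementation inferred only from those examples (the name system_name_sketch is not part of the API), so the real method may handle edge cases differently.

```python
from pathlib import Path
from typing import Union


def system_name_sketch(pdb_file: Union[str, Path]) -> str:
    """Hypothetical stand-in for get_system_name."""
    pdb_file = Path(pdb_file)
    if "__" in pdb_file.name:
        # "system_name__anything.pdb": the name precedes the separator.
        return pdb_file.name.split("__")[0]
    # "system_name/anything.pdb": the name is the parent directory.
    return pdb_file.parent.name


name_from_file = system_name_sketch("/path/to/system_name__anything.pdb")
name_from_dir = system_name_sketch("/path/to/system_name/anything.pdb")
```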
static get_system_pdb_name(pdb_file: Union[str, pathlib.Path]) str

Generate PDB file name with correct system name.

Parse pdb_file for the system name and generate a PDB file name that is parseable by DeepDriveMD. If pdb_file name is already compatible with DeepDriveMD, the returned name will be the same.

Parameters

pdb_file (Union[str, Path]) – The PDB file to parse. Can be absolute path, relative path, or filename.

Returns

str – The new PDB file name. File is not created.

Raises

ValueError – If pdb_file contains more than one __.

Examples

>>> pdb_file = "/path/to/system_name__anything.pdb"
>>> DeepDriveMD_API.get_system_pdb_name(pdb_file)
'system_name__anything.pdb'
>>> pdb_file = "/path/to/system_name/anything.pdb"
>>> DeepDriveMD_API.get_system_pdb_name(pdb_file)
'system_name__anything.pdb'
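A minimal sketch of the renaming rule, inferred from the examples and the documented ValueError. The name system_pdb_name_sketch is hypothetical, and counting the separator in the file name only (rather than the full path) is an assumption.

```python
from pathlib import Path
from typing import Union


def system_pdb_name_sketch(pdb_file: Union[str, Path]) -> str:
    """Hypothetical stand-in for get_system_pdb_name."""
    pdb_file = Path(pdb_file)
    separators = pdb_file.name.count("__")
    if separators > 1:
        raise ValueError(f"More than one '__' in file name: {pdb_file.name}")
    if separators == 1:
        # Already in <system_name>__<anything>.pdb form: keep as is.
        return pdb_file.name
    # Otherwise prefix the file name with the parent directory name.
    return f"{pdb_file.parent.name}__{pdb_file.name}"


unchanged = system_pdb_name_sketch("/path/to/system_name__anything.pdb")
prefixed = system_pdb_name_sketch("/path/to/system_name/anything.pdb")

try:
    system_pdb_name_sketch("/path/to/a__b__c.pdb")
    raised = False
except ValueError:
    raised = True
```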
static get_topology(initial_pdb_dir: Union[str, pathlib.Path], pdb_file: Union[str, pathlib.Path], suffix: str = '.top') Optional[pathlib.Path]

Get the topology file for the system.

Parse pdb_file for the system name and then retrieve the topology file from the correct subdirectory, given by the system name, in the initial_pdb_dir directory or return None if the system doesn’t have a topology.

Parameters
  • initial_pdb_dir (Union[str, Path]) – Initial data directory containing system subdirectories with PDBs and optional topologies.

  • pdb_file (Union[str, Path]) – The PDB file to parse. Can be absolute path, relative path, or filename.

  • suffix (str) – Suffix of the topology file (.top, .prmtop, etc).

Returns

Optional[Path] – The path to the topology file, or None if system has no topology.
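The lookup described above might be sketched as follows. This is an illustration, not the library's code: topology_sketch is a hypothetical name, and returning the first match when several files share the suffix is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def topology_sketch(
    initial_pdb_dir: Union[str, Path],
    pdb_file: Union[str, Path],
    suffix: str = ".top",
) -> Optional[Path]:
    """Hypothetical stand-in for get_topology."""
    pdb_file = Path(pdb_file)
    # Resolve the system name from the file name or the parent directory.
    if "__" in pdb_file.name:
        system = pdb_file.name.split("__")[0]
    else:
        system = pdb_file.parent.name
    # Look for a topology file in the system's subdirectory.
    matches = sorted((Path(initial_pdb_dir) / system).glob(f"*{suffix}"))
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as root:
    for system in ("sys1", "sys2"):
        (Path(root) / system).mkdir()
        (Path(root) / system / "start.pdb").touch()
    (Path(root) / "sys1" / "sys1.top").touch()  # only sys1 has a topology

    found = topology_sketch(root, Path(root) / "sys1" / "start.pdb")
    found_name = found.name if found else None
    via_filename = topology_sketch(root, "sys1__frame.pdb")
    via_filename_name = via_filename.name if via_filename else None
    missing = topology_sketch(root, Path(root) / "sys2" / "start.pdb")
```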

get_total_iterations() int
static write_pdb(output_pdb_file: Union[str, pathlib.Path], input_pdb_file: Union[str, pathlib.Path], traj_file: Union[str, pathlib.Path], frame: int, in_memory: bool = False) None

Write a PDB file.

Writes output_pdb_file to disk containing the coordinates of a single frame, read from the trajectory file traj_file using the input PDB file input_pdb_file.

Parameters
  • output_pdb_file (Union[str, Path]) – The path of the output PDB file to be written to.

  • input_pdb_file (Union[str, Path]) – The path of the input PDB file used to open traj_file in MDAnalysis.Universe().

  • traj_file (Union[str, Path]) – The path of the trajectory file to be read from.

  • frame (int) – The frame index into traj_file used to write output_pdb_file.

  • in_memory (bool, optional) – If true, will load the MDAnalysis.Universe() trajectory into memory.

Examples

>>> output_pdb_file = "/path/to/output.pdb"
>>> input_pdb_file = "/path/to/input.pdb"
>>> traj_file = "/path/to/traj.dcd"
>>> frame = 10
>>> DeepDriveMD_API.write_pdb(output_pdb_file, input_pdb_file, traj_file, frame)
class deepdrivemd.data.api.Stage_API(experiment_dir: pathlib.Path, stage_dir_name: str)
config_path(stage_idx: int = - 1, task_idx: int = 0) Optional[pathlib.Path]
static get_count(path: pathlib.Path, pattern: str, is_dir: bool = False) int
static get_latest(path: pathlib.Path, pattern: str, is_dir: bool = False, key: typing.Callable[[pathlib.Path], pathlib.Path] = <function Stage_API.<lambda>>) Optional[pathlib.Path]
json_path(stage_idx: int = - 1, task_idx: int = 0) Optional[pathlib.Path]
read_task_json(stage_idx: int = - 1, task_idx: int = 0) Optional[List[Dict[str, Any]]]
property runs_dir: pathlib.Path
stage_dir(stage_idx: int = - 1) Optional[pathlib.Path]

Return the stage directory containing task subdirectories.

Each stage type has a directory containing subdirectories stageXXXX. In each stageXXXX there are several task directories labeled taskXXXX. This function returns a particular stageXXXX directory selected with stage_idx. Each iteration of DeepDriveMD corresponds to a stageXXXX directory; the directories are labeled in increasing order.
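The directory scheme can be sketched as below. Several points are assumptions not confirmed by this page: that XXXX denotes a four-digit zero-padded index, that stage_idx = -1 selects the latest stage, and that a missing directory yields None (suggested by the Optional return type). All function names ending in _sketch are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def stage_name_sketch(stage_idx: int) -> str:
    # Assumption: XXXX is a four-digit, zero-padded index.
    return f"stage{stage_idx:04d}"


def task_name_sketch(task_idx: int) -> str:
    return f"task{task_idx:04d}"


def stage_dir_sketch(runs_dir: Path, stage_idx: int = -1) -> Optional[Path]:
    """Hypothetical stand-in for Stage_API.stage_dir."""
    if stage_idx == -1:
        # Assumption: -1 means the most recent stage directory.
        stages = sorted(p for p in runs_dir.glob("stage*") if p.is_dir())
        return stages[-1] if stages else None
    candidate = runs_dir / stage_name_sketch(stage_idx)
    return candidate if candidate.is_dir() else None


with tempfile.TemporaryDirectory() as root:
    runs_dir = Path(root)
    # Two iterations, each with a single task directory.
    (runs_dir / stage_name_sketch(0) / task_name_sketch(0)).mkdir(parents=True)
    (runs_dir / stage_name_sketch(1) / task_name_sketch(0)).mkdir(parents=True)
    latest_stage = stage_dir_sketch(runs_dir).name
    first_stage = stage_dir_sketch(runs_dir, 0).name
    missing_stage = stage_dir_sketch(runs_dir, 5)
```

Zero-padding keeps lexicographic and numeric order identical, which is what lets a plain sort find the latest stage.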

stage_dir_count() int

Return the number of stage directories.

static stage_name(stage_idx: int) str
task_dir(stage_idx: int = - 1, task_idx: int = 0, mkdir: bool = False) Optional[pathlib.Path]
static task_name(task_idx: int) str
static unique_name(task_path: pathlib.Path) str
write_task_json(data: List[Dict[str, Any]], stage_idx: int = - 1, task_idx: int = 0) None

Dump data to a new JSON file for the agent.

Dump data to a JSON file written to the directory specified by stage_idx and task_idx.

Parameters

data (List[Dict[str, Any]]) – List of dictionaries to pass to json.dump(). Values in the dictionaries must be JSON serializable.
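The JSON-serializable requirement can be demonstrated with the standard json module alone. In this sketch the helper write_json_sketch, the file name example.json, and the record keys are all hypothetical; the real method chooses the output path from stage_idx and task_idx.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_json_sketch(data: List[Dict[str, Any]], path: Path) -> None:
    """Hypothetical stand-in: dump a list of dictionaries to a JSON file."""
    with open(path, "w") as f:
        json.dump(data, f)


with tempfile.TemporaryDirectory() as root:
    out = Path(root) / "example.json"
    # Made-up records; only plain JSON types are used as values.
    records = [{"frame": 3, "label": "a"}, {"frame": 7, "label": "b"}]
    write_json_sketch(records, out)
    with open(out) as f:
        loaded = json.load(f)
    second = loaded[1]  # index into the list, as get_restart_pdb does

# A pathlib.Path value is not JSON serializable; convert it with str() first.
try:
    json.dumps([{"path": Path("/tmp/x.pdb")}])
    path_is_serializable = True
except TypeError:
    path_is_serializable = False
```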

deepdrivemd.data.api.glob_file_from_dirs(dirs: List[str], pattern: str) List[str]

Return a list of all items matching pattern in multiple dirs.
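A minimal sketch of the behavior described, built on the standard glob module. The name glob_files_sketch is hypothetical, and the ordering of results in the real function is not documented here; this version sorts within each directory to keep the output deterministic.

```python
import glob
import os
import tempfile
from typing import List


def glob_files_sketch(dirs: List[str], pattern: str) -> List[str]:
    """Hypothetical stand-in for glob_file_from_dirs."""
    matches: List[str] = []
    for d in dirs:
        # Apply the same pattern inside every directory in turn.
        matches.extend(sorted(glob.glob(os.path.join(d, pattern))))
    return matches


with tempfile.TemporaryDirectory() as root:
    run_dirs = []
    for name in ("run_a", "run_b"):
        d = os.path.join(root, name)
        os.makedirs(d)
        open(os.path.join(d, "data.h5"), "w").close()
        open(os.path.join(d, "notes.txt"), "w").close()
        run_dirs.append(d)
    h5_files = glob_files_sketch(run_dirs, "*.h5")
    h5_names = [os.path.relpath(p, root) for p in h5_files]
```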