easygraph.datasets.ppi module
PPIDataset for inductive learning.
- class easygraph.datasets.ppi.LegacyPPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)
Bases: PPIDataset
Legacy version of the PPI dataset.
- Attributes:
hash
Hash value for the dataset and the setting.
name
Name of the dataset.
num_labels
Number of labels for each node.
raw_dir
Raw file directory that contains the input data folder.
raw_path
Directory that contains the input data files.
save_dir
Directory in which the processed dataset is saved.
save_path
Path at which the processed dataset is saved.
url
URL from which the raw dataset is downloaded.
verbose
Whether to print information.
Methods
download()
Automatically download the data and extract it.
has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
load()
Override to implement your own logic for loading the saved dataset from files.
process()
Override to implement your own logic for processing the input data.
save()
Override to implement your own logic for saving the processed dataset to files.
- class easygraph.datasets.ppi.PPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)
Bases: EasyGraphBuiltinDataset
Protein-Protein Interaction dataset for inductive node classification
A toy Protein-Protein Interaction network dataset. The dataset contains 24 graphs, with an average of 2372 nodes per graph. Each node has 50 features and 121 labels. Of the 24 graphs, 20 are used for training, 2 for validation, and 2 for testing.
Reference: http://snap.stanford.edu/graphsage/
Statistics:
Train examples: 20
Valid examples: 2
Test examples: 2
- Parameters:
mode (str) – Must be one of (‘train’, ‘valid’, ‘test’). Default: ‘train’
raw_dir (str) – Directory into which the raw data is downloaded, or which already contains the input data directory. Default: ~/.eg/
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: False
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
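The "transformed before every access" behaviour of the transform parameter can be illustrated with a small stand-alone sketch. ToyGraph, ToyDataset and scale_features below are hypothetical stand-ins written for this illustration; they are not EasyGraph classes, and the real dataset may apply the transform differently.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph object; not an EasyGraph class."""

    def __init__(self, feat):
        self.ndata = {"feat": list(feat)}


class ToyDataset:
    """Hypothetical dataset that applies `transform` on every access."""

    def __init__(self, graphs, transform=None):
        self._graphs = graphs
        self._transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        g = self._graphs[idx]
        # The stored graph is left untouched; the transform runs per access.
        return g if self._transform is None else self._transform(g)


def scale_features(g, factor=2.0):
    """Example transform: return a new graph with scaled node features."""
    return ToyGraph(x * factor for x in g.ndata["feat"])


dataset = ToyDataset(
    [ToyGraph([1.0, 2.0]), ToyGraph([3.0])],
    transform=scale_features,
)
first = dataset[0]            # transformed copy
stored = dataset._graphs[0]   # original, unchanged
```

Because the transform returns a new object here, the cached graphs stay unmodified and the same dataset can be reused with a different transform.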
- num_labels
Number of labels for each node
Type: int
- labels
Node labels
Type: Tensor
- features
Node features
Type: Tensor
Examples
>>> from easygraph.datasets.ppi import PPIDataset
>>> dataset = PPIDataset(mode='valid')
>>> num_labels = dataset.num_labels
>>> for g in dataset:
...     feat = g.ndata['feat']
...     label = g.ndata['label']
...     # your code here
- Attributes:
hash
Hash value for the dataset and the setting.
name
Name of the dataset.
num_labels
Number of labels for each node.
raw_dir
Raw file directory that contains the input data folder.
raw_path
Directory that contains the input data files.
save_dir
Directory in which the processed dataset is saved.
save_path
Path at which the processed dataset is saved.
url
URL from which the raw dataset is downloaded.
verbose
Whether to print information.
Methods
download()
Automatically download the data and extract it.
has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
load()
Override to implement your own logic for loading the saved dataset from files.
process()
Override to implement your own logic for processing the input data.
save()
Override to implement your own logic for saving the processed dataset to files.
- has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
Returns False by default.
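As a rough illustration of how has_cache() can interact with process(), save() and load(), the sketch below overrides it with a file-existence check. ToyBuiltinDataset and SquaresDataset are hypothetical classes written for this example, and the call order in _load() is an assumption about how such hooks are typically wired together, not a copy of EasyGraph's implementation. The download step is left out.

```python
import os
import pickle
import tempfile


class ToyBuiltinDataset:
    """Hypothetical base class mirroring the documented hooks."""

    def __init__(self, save_dir, force_reload=False):
        self.save_dir = save_dir
        self.force_reload = force_reload
        self.data = None
        self.loaded_from_cache = False
        self._load()

    def _load(self):
        # Assumed order: reuse the cache if allowed, else process and save.
        if not self.force_reload and self.has_cache():
            self.load()
            self.loaded_from_cache = True
        else:
            self.process()
            self.save()

    def has_cache(self):
        return False  # documented default

    def process(self):
        raise NotImplementedError

    def save(self):
        pass

    def load(self):
        pass


class SquaresDataset(ToyBuiltinDataset):
    """Toy subclass that caches its processed data as a pickle file."""

    @property
    def save_path(self):
        return os.path.join(self.save_dir, "squares.pkl")

    def has_cache(self):
        # A cache exists if the processed file is already on disk.
        return os.path.exists(self.save_path)

    def process(self):
        self.data = [i * i for i in range(5)]

    def save(self):
        with open(self.save_path, "wb") as f:
            pickle.dump(self.data, f)

    def load(self):
        with open(self.save_path, "rb") as f:
            self.data = pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    first = SquaresDataset(tmp)                     # no cache: process, save
    second = SquaresDataset(tmp)                    # cache found: load
    third = SquaresDataset(tmp, force_reload=True)  # cache ignored
```

With the default has_cache() returning False, every instantiation would take the process-and-save path, which is why a subclass that wants caching needs to override it.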
- property num_labels