easygraph.datasets.ppi module

PPIDataset for inductive learning.

class easygraph.datasets.ppi.LegacyPPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: PPIDataset

Legacy version of PPI Dataset

class easygraph.datasets.ppi.PPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: EasyGraphBuiltinDataset

Protein-Protein Interaction dataset for inductive node classification

A toy Protein-Protein Interaction network dataset. The dataset contains 24 graphs, with an average of 2372 nodes per graph. Each node has 50 features and 121 labels. Of the 24 graphs, 20 are used for training, 2 for validation, and 2 for testing.

Reference: http://snap.stanford.edu/graphsage/

Statistics:

  • Train examples: 20

  • Valid examples: 2

  • Test examples: 2

Parameters:
  • mode (str) – Must be one of (‘train’, ‘valid’, ‘test’). Default: ‘train’

  • raw_dir (str) – Directory into which the raw data is downloaded, or which already contains the input data directory. Default: ~/.eg/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
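The statement that the transform is applied "before every access" can be illustrated without downloading anything. The sketch below is not EasyGraph code: ToyDataset and add_degree_feature are hypothetical stand-ins, and a plain dict plays the role of the graph object. It assumes only what the parameter list states, namely that mode must be one of three strings and that transform is a callable applied each time an item is fetched.

```python
from typing import Callable, Optional

VALID_MODES = ("train", "valid", "test")


class ToyDataset:
    """Hypothetical stand-in for a dataset class; not part of EasyGraph."""

    def __init__(self, graphs, mode: str = "train",
                 transform: Optional[Callable[[dict], dict]] = None):
        if mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
        self.mode = mode
        self._graphs = list(graphs)
        self._transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        g = self._graphs[idx]
        # The transform runs on every access; the stored graph is untouched.
        return g if self._transform is None else self._transform(g)


def add_degree_feature(g: dict) -> dict:
    """Hypothetical transform: return a copy annotated with node degrees."""
    degree = {n: 0 for n in g["nodes"]}
    for u, v in g["edges"]:
        degree[u] += 1
        degree[v] += 1
    return {**g, "degree": degree}


toy = ToyDataset([{"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]}],
                 mode="valid", transform=add_degree_feature)
first = toy[0]  # transformed copy; toy._graphs[0] has no "degree" key
```

Because the transform returns a new object instead of mutating its input, repeated accesses give the same result and the stored data stays in its original form.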

num_labels

Number of labels for each node.

Type: int

labels

Node labels.

Type: Tensor

features

Node features.

Type: Tensor
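To make the shapes concrete: with 50 features and 121 labels per node, the feature data has one row of 50 values per node and the label data one row of 121 entries per node. The sketch below builds placeholder data of that shape using plain Python lists. The node count of 4 and the random values are made up for illustration, and treating each label as a 0/1 indicator is an assumption that this page does not state.

```python
import random

NUM_FEATURES = 50   # per-node feature dimension stated above
NUM_LABELS = 121    # per-node label count stated above
num_nodes = 4       # made-up size; the real graphs average about 2372 nodes

rng = random.Random(0)
# Placeholder values only; these are not real PPI features or labels.
features = [[rng.random() for _ in range(NUM_FEATURES)] for _ in range(num_nodes)]
labels = [[rng.randint(0, 1) for _ in range(NUM_LABELS)] for _ in range(num_nodes)]


def shape(matrix):
    """Return (rows, columns) of a rectangular list of lists."""
    return (len(matrix), len(matrix[0]))


print(shape(features))  # (4, 50)
print(shape(labels))    # (4, 121)
```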

Examples

>>> from easygraph.datasets.ppi import PPIDataset
>>> dataset = PPIDataset(mode='valid')
>>> num_labels = dataset.num_labels
>>> for g in dataset:
...     feat = g.ndata['feat']
...     label = g.ndata['label']
...     # your code here

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

Returns False by default.

property num_labels

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.

It is recommended to use dgl.data.utils.save_graphs to save DGL graphs to files and dgl.data.utils.save_info to save extra information to files.
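The three hooks above (has_cache, process, save) are typically called by a base class during construction. The following is a minimal sketch of one way that wiring might look, under stated assumptions: ToyCachedDataset and its load method are hypothetical, JSON stands in for the real graph serialization, and the call order shown is an illustration, not a description of EasyGraphBuiltinDataset internals.

```python
import json
import os
import tempfile


class ToyCachedDataset:
    """Hypothetical base class showing one way the hooks might be wired."""

    def __init__(self, save_dir, force_reload=False):
        self.save_path = os.path.join(save_dir, "toy.json")
        self.data = None
        self.loaded_from_cache = False
        if not force_reload and self.has_cache():
            self.load()
            self.loaded_from_cache = True
        else:
            self.process()
            self.save()

    def has_cache(self):
        # Cached if the processed file already exists on disk.
        return os.path.exists(self.save_path)

    def process(self):
        # Stand-in for parsing raw files into in-memory structures.
        self.data = {"edges": [[0, 1], [1, 2]], "num_labels": 121}

    def save(self):
        with open(self.save_path, "w") as fh:
            json.dump(self.data, fh)

    def load(self):
        # Hypothetical counterpart to save(); not documented on this page.
        with open(self.save_path) as fh:
            self.data = json.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    first = ToyCachedDataset(tmp)                      # processes and saves
    second = ToyCachedDataset(tmp)                     # reads the cached file
    third = ToyCachedDataset(tmp, force_reload=True)   # ignores the cache
    results = (first.loaded_from_cache, second.loaded_from_cache,
               third.loaded_from_cache)
```

In this sketch, force_reload=True skips the cache check entirely, which matches the parameter's description as "whether to reload the dataset".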