easygraph.datasets.ppi module

PPIDataset for inductive learning.

class easygraph.datasets.ppi.LegacyPPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: PPIDataset

Legacy version of the PPI dataset.

Attributes:
hash

Hash value for the dataset and the setting.

name

Name of the dataset.

num_labels

Number of labels for each node.

raw_dir

Raw file directory that contains the input data folder.

raw_path

Directory that contains the input data files.

save_dir

Directory to save the processed dataset.

save_path

Path to save the processed dataset.

url

URL from which to download the raw dataset.

verbose

Whether to print information.
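The hash, save_dir, and save_path attributes suggest that processed data is cached separately for each setting. A minimal sketch of one way to derive such a path, assuming a SHA-1 digest of the (name, mode) pair truncated to 8 hex digits; settings_hash and make_save_path are hypothetical helpers, not EasyGraph's API:

```python
import hashlib
import os


def settings_hash(name, mode):
    """Return a short, stable hash for a dataset name and its settings.

    Assumption: SHA-1 over the settings tuple, truncated to 8 hex digits.
    This is an illustrative scheme, not EasyGraph's confirmed one.
    """
    key = repr((name, mode)).encode("utf-8")
    return hashlib.sha1(key).hexdigest()[:8]


def make_save_path(save_dir, name, mode):
    # Each (name, mode) setting gets its own processed-data directory,
    # so a cached 'train' split is never mistaken for a cached 'test' split.
    return os.path.join(save_dir, f"{name}_{settings_hash(name, mode)}")


train_path = make_save_path("/tmp/eg", "ppi", "train")
test_path = make_save_path("/tmp/eg", "ppi", "test")
print(train_path)
print(test_path)
```

Unlike Python's built-in hash() on strings, a hashlib digest is the same across interpreter runs, which matters when the value names an on-disk cache.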

Methods

download()

Automatically download data and extract it.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

load()

Override to implement your own logic for loading the saved dataset from files.

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.
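These hooks read like a template-method pattern, similar to the DGL dataset classes these docstrings appear to be adapted from. The sketch below assumes the constructor calls them in this order; BuiltinDatasetSketch and CachedSketch are hypothetical classes that only record which hooks run:

```python
class BuiltinDatasetSketch:
    """Hypothetical base class showing how the five hooks might be wired."""

    def __init__(self, force_reload=False, verbose=False):
        self._force_reload = force_reload
        self._verbose = verbose
        self.calls = []  # records hook order, for illustration only
        self._load_or_build()

    def _load_or_build(self):
        # Reuse the cache unless the caller forces a rebuild.
        if not self._force_reload and self.has_cache():
            self.load()
        else:
            self.download()
            self.process()
            self.save()

    # Hooks a subclass is expected to override.
    def download(self):
        self.calls.append("download")

    def has_cache(self):
        return False

    def load(self):
        self.calls.append("load")

    def process(self):
        self.calls.append("process")

    def save(self):
        self.calls.append("save")


class CachedSketch(BuiltinDatasetSketch):
    # Pretend a processed copy is already on disk.
    def has_cache(self):
        return True


print(BuiltinDatasetSketch().calls)           # ['download', 'process', 'save']
print(CachedSketch().calls)                   # ['load']
print(CachedSketch(force_reload=True).calls)  # ['download', 'process', 'save']
```

A real implementation would typically also skip download() when the raw files are already present; that check is omitted here for brevity.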

class easygraph.datasets.ppi.PPIDataset(mode='train', raw_dir=None, force_reload=False, verbose=False, transform=None)

Bases: EasyGraphBuiltinDataset

Protein-Protein Interaction dataset for inductive node classification.

A toy Protein-Protein Interaction (PPI) network dataset. It contains 24 graphs with an average of 2372 nodes per graph. Each node has 50 features and 121 labels. Of the 24 graphs, 20 are used for training, 2 for validation, and 2 for testing.

Reference: http://snap.stanford.edu/graphsage/

Statistics:

  • Train examples: 20

  • Valid examples: 2

  • Test examples: 2
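The split sizes above fully determine how the 24 graphs are divided among the three modes. A minimal sketch of that bookkeeping, assuming (hypothetically) that graphs are numbered 0..23 in train, valid, test order; graph_ids_for is an illustrative helper, and the raw files may number graphs differently:

```python
# Split sizes taken from the Statistics list above.
SPLIT_SIZES = {"train": 20, "valid": 2, "test": 2}


def graph_ids_for(mode, sizes=SPLIT_SIZES):
    """Return the graph IDs belonging to one split.

    Hypothetical numbering: graphs 0..23 are assumed to be laid out
    in train, valid, test order.
    """
    if mode not in sizes:
        raise ValueError(f"mode must be one of {tuple(sizes)}, got {mode!r}")
    start = 0
    for name, size in sizes.items():
        if name == mode:
            return list(range(start, start + size))
        start += size


print(sum(SPLIT_SIZES.values()))  # 24
print(graph_ids_for("valid"))     # [20, 21]
```

Because the splits share no graphs, a model is evaluated on graphs it never saw during training, which is what makes the task inductive.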

Parameters:
  • mode (str) – Which split to load. Must be one of (‘train’, ‘valid’, ‘test’). Default: ‘train’

  • raw_dir (str) – Directory into which the raw data is downloaded, or which already contains the input data directory. Default: ~/.eg/

  • force_reload (bool) – Whether to rebuild the dataset instead of loading a cached copy. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
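A minimal sketch of how the raw_dir default and the per-access transform might behave. resolve_raw_dir and TransformingView are hypothetical, and plain dicts stand in for graph objects:

```python
import os


def resolve_raw_dir(raw_dir=None):
    # Fall back to the documented default, ~/.eg/, with "~" expanded.
    return os.path.expanduser("~/.eg/" if raw_dir is None else raw_dir)


class TransformingView:
    """Hypothetical wrapper that applies `transform` on every access."""

    def __init__(self, graphs, transform=None):
        self._graphs = graphs
        self._transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        g = self._graphs[idx]
        # The stored graph is never modified; each access returns a new result.
        return g if self._transform is None else self._transform(g)


def tag(g):
    # Stand-in transform: copy the dict-based "graph" and add a flag.
    return {**g, "tagged": True}


graphs = [{"id": 0}, {"id": 1}]
view = TransformingView(graphs, transform=tag)
print(view[1])    # {'id': 1, 'tagged': True}
print(graphs[1])  # {'id': 1}
```

Applying the transform lazily keeps the cached data untouched, at the cost of repeating the work on every access.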

num_labels

Number of labels for each node

Type:

int

labels

Node labels

Type:

Tensor

features

Node features

Type:

Tensor
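Each node's 121 labels are most naturally read as a multi-hot row, so one protein can carry several labels at once. The sketch below assumes that encoding (one 0/1 value per label) and uses plain lists in place of tensors; positive_labels is a hypothetical helper:

```python
NUM_FEATURES = 50  # per-node feature width, from the dataset description
NUM_LABELS = 121   # per-node label width, from the dataset description


def positive_labels(multi_hot):
    """Return the indices of the labels set to 1 in one node's label row."""
    return [i for i, v in enumerate(multi_hot) if v == 1]


# Two made-up nodes: the first carries labels 0 and 5, the second none.
row = [0] * NUM_LABELS
row[0] = row[5] = 1
labels = [row, [0] * NUM_LABELS]
features = [[0.0] * NUM_FEATURES for _ in labels]

num_labels = len(labels[0])  # label width, i.e. what num_labels reports
print(num_labels)                  # 121
print(positive_labels(labels[0]))  # [0, 5]
```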

Examples

>>> from easygraph.datasets.ppi import PPIDataset
>>> dataset = PPIDataset(mode='valid')
>>> num_labels = dataset.num_labels
>>> for g in dataset:
...     feat = g.ndata['feat']
...     label = g.ndata['label']
...     # your code here

Attributes:
hash

Hash value for the dataset and the setting.

name

Name of the dataset.

num_labels

Number of labels for each node.

raw_dir

Raw file directory that contains the input data folder.

raw_path

Directory that contains the input data files.

save_dir

Directory to save the processed dataset.

save_path

Path to save the processed dataset.

url

URL from which to download the raw dataset.

verbose

Whether to print information.

Methods

download()

Automatically download data and extract it.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

load()

Override to implement your own logic for loading the saved dataset from files.

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

By default, returns False.
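An override usually replaces the base-class False with a check for the files that save() writes. A minimal sketch, assuming one graph file and one info file per split; processed_files and its file names are hypothetical:

```python
import os
import tempfile


def processed_files(mode):
    # Hypothetical naming scheme: one graph file and one info file per split.
    return (f"{mode}_graphs.pkl", f"{mode}_info.json")


def has_cache(save_path, mode):
    # Report a cache only when every processed file for this split exists.
    return all(
        os.path.exists(os.path.join(save_path, name))
        for name in processed_files(mode)
    )


with tempfile.TemporaryDirectory() as d:
    print(has_cache(d, "train"))  # False: nothing saved yet
    for name in processed_files("train"):
        open(os.path.join(d, name), "w").close()
    print(has_cache(d, "train"))  # True
    print(has_cache(d, "test"))   # False: only the train split is cached
```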

property num_labels

Number of labels for each node.

process()

Override to implement your own logic for processing the input data.
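The raw PPI data is often packaged as one large graph whose nodes are tagged with the ID of the graph they belong to, so processing plausibly includes splitting it back into separate graphs. The sketch below assumes a per-node graph_id list and a global edge list as input; split_by_graph_id is a hypothetical helper, and EasyGraph's actual process() may work differently:

```python
from collections import defaultdict


def split_by_graph_id(graph_id, edges):
    """Partition one disjoint-union graph into per-graph pieces.

    Assumes no edge connects nodes from two different graphs.
    """
    # Group global node IDs by the graph they belong to.
    nodes = defaultdict(list)
    for node, gid in enumerate(graph_id):
        nodes[gid].append(node)

    # Map each global node ID to its local index inside its own graph.
    local = {}
    for members in nodes.values():
        for i, node in enumerate(members):
            local[node] = i

    graphs = {gid: {"num_nodes": len(m), "edges": []} for gid, m in nodes.items()}
    for u, v in edges:
        gid = graph_id[u]
        if graph_id[v] != gid:
            raise ValueError(f"edge ({u}, {v}) crosses graphs")
        graphs[gid]["edges"].append((local[u], local[v]))
    return graphs


parts = split_by_graph_id([1, 1, 1, 2, 2], [(0, 1), (1, 2), (3, 4)])
print(parts[1])  # {'num_nodes': 3, 'edges': [(0, 1), (1, 2)]}
print(parts[2])  # {'num_nodes': 2, 'edges': [(0, 1)]}
```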

save()

Override to implement your own logic for saving the processed dataset to files.

It is recommended to use dgl.data.utils.save_graphs to save DGL graphs to files and dgl.data.utils.save_info to save extra information.
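The dgl.data.utils helpers named above are DGL functions. As a dependency-free stand-in, the sketch below swaps in the standard library, using pickle for the graphs and json for the small metadata dictionary; save and load here are hypothetical module-level functions, and the file names are assumptions:

```python
import json
import os
import pickle
import tempfile


def save(save_path, graphs, info):
    # pickle stands in for a graph serializer; json holds small metadata.
    os.makedirs(save_path, exist_ok=True)
    with open(os.path.join(save_path, "graphs.pkl"), "wb") as f:
        pickle.dump(graphs, f)
    with open(os.path.join(save_path, "info.json"), "w") as f:
        json.dump(info, f)


def load(save_path):
    # Mirror of save(): read the graphs back, then the metadata.
    with open(os.path.join(save_path, "graphs.pkl"), "rb") as f:
        graphs = pickle.load(f)
    with open(os.path.join(save_path, "info.json")) as f:
        info = json.load(f)
    return graphs, info


with tempfile.TemporaryDirectory() as d:
    save(d, [{"num_nodes": 3, "edges": [(0, 1)]}], {"num_labels": 121})
    graphs, info = load(d)
    print(info)  # {'num_labels': 121}
```

Only unpickle files you created yourself, since loading an untrusted pickle can execute arbitrary code.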