easygraph.datasets.gnn_benchmark module

class easygraph.datasets.gnn_benchmark.AmazonCoBuyComputerDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: GNNBenchmarkDataset

The ‘Computer’ part of the AmazonCoBuy dataset, for the node classification task.

Amazon Computers and Amazon Photo are segments of the Amazon co-purchase graph [McAuley et al., 2015], where nodes represent goods, edges indicate that two goods are frequently bought together, node features are bag-of-words encoded product reviews, and class labels are given by the product category.

Reference: shchur/gnn-benchmark

Statistics:

  • Nodes: 13,752

  • Edges: 491,722 (the original dataset has 245,778 edges, but DGL adds the reverse edges and removes duplicates, so the count differs)

  • Number of classes: 10

  • Node feature size: 767
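The edge note above describes a common preprocessing step: turning an edge list into a directed graph that contains both directions of every edge, with duplicates dropped. The plain-Python sketch below uses a hypothetical helper `symmetrize_edges` (not part of EasyGraph or DGL) to show why the resulting count is generally not exactly twice the input count: a pair listed in both directions collapses, and a self-loop is its own reverse. It does not reproduce the exact figures in the statistics.

```python
def symmetrize_edges(edges):
    """Hypothetical helper: add every reverse edge and drop duplicates.

    `edges` is a list of (src, dst) pairs; the result is sorted for determinism.
    """
    directed = set()
    for u, v in edges:
        directed.add((u, v))  # keep the original direction
        directed.add((v, u))  # add the reverse direction
    return sorted(directed)


# Toy edge list: (0, 1) and (1, 0) describe the same undirected edge,
# and (2, 2) is a self-loop whose reverse is itself.
toy_edges = [(0, 1), (1, 0), (1, 2), (2, 2)]
result = symmetrize_edges(toy_edges)
print(len(toy_edges), "->", len(result))  # 4 -> 5
```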

Parameters:
  • raw_dir (str) – Directory to download the raw data into, or that already contains the input data directory. Default: ~/.dgl/

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True.

  • transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
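The transform parameter is applied on every access rather than once at load time, so the stored graph stays unchanged. A minimal sketch of that behaviour, assuming a hypothetical `TransformedDataset` wrapper and plain dicts standing in for DGLGraph objects (this is an illustration, not the EasyGraph implementation):

```python
class TransformedDataset:
    """Hypothetical wrapper: applies `transform` on every item access."""

    def __init__(self, graphs, transform=None):
        self._graphs = list(graphs)
        self._transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        graph = self._graphs[idx]
        # The stored graph is never modified; the transform runs on each access.
        if self._transform is not None:
            graph = self._transform(graph)
        return graph


calls = []


def tag_graph(graph):
    """Hypothetical transform: records the call and returns a tagged copy."""
    calls.append(graph["name"])
    return {**graph, "tagged": True}


data = TransformedDataset([{"name": "g0"}], transform=tag_graph)
first = data[0]
second = data[0]
print(first, len(calls))  # {'name': 'g0', 'tagged': True} 2
```

Because the transform is re-run each time, an expensive transform is paid for on every access; caching its result is left to the caller in this sketch.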

num_classes

Number of node classes.

Type:

int

Examples

>>> from easygraph.datasets.gnn_benchmark import AmazonCoBuyComputerDataset
>>> data = AmazonCoBuyComputerDataset()
>>> g = data[0]
>>> num_class = data.num_classes
>>> feat = g.ndata['feat']  # get node features
>>> label = g.ndata['label']  # get node labels

Attributes:
hash

Hash value for the dataset and the setting.

name

Name of the dataset.

num_classes

Number of classes.

raw_dir

Raw file directory that contains the input data folder.

raw_path

Directory that contains the input data files.

save_dir

Directory to save the processed dataset.

save_path

Path to save the processed dataset.

url

URL from which to download the raw dataset.

verbose

Whether to print information.
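The hash, save_dir, and save_path attributes together let processed data be cached per setting. The page does not say how the hash is computed; the sketch below is one plausible scheme, using hypothetical helpers `settings_hash` and `processed_path` and a hypothetical `reverse_edge` setting, where a stable short hash of the settings is appended to the dataset name so that different settings never share a cache file.

```python
import hashlib
import os


def settings_hash(settings):
    """Hypothetical: short, stable hash of a settings dict."""
    # Sort the items so the same settings always produce the same string.
    key = repr(sorted(settings.items()))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]


def processed_path(save_dir, name, settings):
    """Hypothetical layout: <save_dir>/<name>_<hash>."""
    return os.path.join(save_dir, f"{name}_{settings_hash(settings)}")


a = processed_path("/tmp/data", "my_dataset", {"reverse_edge": True})
b = processed_path("/tmp/data", "my_dataset", {"reverse_edge": True})
c = processed_path("/tmp/data", "my_dataset", {"reverse_edge": False})
print(a == b)  # True
```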

Methods

download()

Automatically download data and extract it.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

load()

Override to implement your own logic for loading the saved dataset from files.

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.
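The methods above are hooks that a dataset subclass overrides. A simplified sketch of how they might be orchestrated, assuming a hypothetical `CachedDataset` class modeled loosely on the DGL dataset pattern (the real base class also handles URLs, raw paths, and hashing): load from the cache when one exists and force_reload is False, otherwise download, process, and save.

```python
import os
import pickle
import tempfile


class CachedDataset:
    """Hypothetical base class sketching a download/process/save/load cycle."""

    def __init__(self, save_dir, force_reload=False):
        self.save_dir = save_dir
        self.force_reload = force_reload
        self.log = []  # records which hooks ran, for illustration
        # Reuse the cache when allowed; otherwise rebuild and save it.
        if not force_reload and self.has_cache():
            self.load()
        else:
            self.download()
            self.process()
            self.save()

    @property
    def save_path(self):
        return os.path.join(self.save_dir, "processed.pkl")

    def download(self):
        self.log.append("download")  # a real subclass would fetch raw files here

    def process(self):
        self.log.append("process")
        self.data = {"num_nodes": 3, "edges": [(0, 1), (1, 2)]}

    def has_cache(self):
        return os.path.exists(self.save_path)

    def save(self):
        self.log.append("save")
        with open(self.save_path, "wb") as f:
            pickle.dump(self.data, f)

    def load(self):
        self.log.append("load")
        with open(self.save_path, "rb") as f:
            self.data = pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    first = CachedDataset(tmp)                      # builds and saves
    second = CachedDataset(tmp)                     # loads from cache
    third = CachedDataset(tmp, force_reload=True)   # rebuilds anyway
    print(first.log, second.log, third.log)
```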

property num_classes

Number of classes.

Return type:

int
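The page does not state how num_classes is derived from the labels. As a sketch, two common conventions for computing a class count from integer node labels are shown below with hypothetical helpers; they agree only when every class id from 0 to the maximum occurs at least once.

```python
def count_distinct_labels(labels):
    """Hypothetical: number of label values that actually occur."""
    return len(set(labels))


def count_by_max_label(labels):
    """Hypothetical: assumes labels are ids 0..C-1, so C = max id + 1."""
    return max(labels) + 1


labels = [0, 3, 3, 1, 9, 0]
print(count_distinct_labels(labels))  # 4
print(count_by_max_label(labels))     # 10
```

The max-id convention matches the width of a classifier's output layer even when some classes are absent from a subset of nodes, which is why it is often preferred for node classification.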