easygraph.datasets.graph_dataset_base module

Basic EasyGraph Dataset

class easygraph.datasets.graph_dataset_base.EasyGraphBuiltinDataset(name, url, raw_dir=None, hash_key=(), force_reload=False, verbose=True, transform=None, save_dir=None)

Bases: EasyGraphDataset

The basic EasyGraph built-in dataset.

Parameters:
  • name (str) – Name of the dataset.

  • url (str) – URL from which to download the raw dataset.

  • raw_dir (str) – The directory that will store the downloaded data, or the directory that already stores the input data. Default: ~/.EasyGraphData/

  • hash_key (tuple) – A tuple of values used as the input to the hash function. Users can distinguish instances (and their caches on disk) of the same dataset class by comparing their hash values. Default: ()

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: True

  • transform (callable, optional) – A transform that takes in a graph object and returns a transformed version. The graph is transformed before every access. Default: None

  • save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir
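The transform parameter is applied each time the graph is accessed, not once when the dataset is built. The sketch below illustrates that contract with a hypothetical TransformedDataset class that uses a plain adjacency dict as its graph; the `__getitem__` access path and the `add_self_loops` helper are illustrative assumptions, not EasyGraph code.

```python
from typing import Callable, Dict, List, Optional

# A graph here is a plain adjacency dict (hypothetical stand-in for a real graph object).
Graph = Dict[int, List[int]]


class TransformedDataset:
    """Hypothetical stand-in: stores one graph and applies `transform` on every access."""

    def __init__(self, graph: Graph, transform: Optional[Callable[[Graph], Graph]] = None):
        self._graph = graph
        self._transform = transform

    def __getitem__(self, idx: int) -> Graph:
        if idx != 0:
            raise IndexError("this toy dataset holds a single graph")
        if self._transform is None:
            return self._graph
        # The transform runs on each access, so the stored graph itself is never modified.
        return self._transform(self._graph)

    def __len__(self) -> int:
        return 1


def add_self_loops(g: Graph) -> Graph:
    """Return a copy of `g` in which every node is also its own neighbor."""
    return {u: sorted(set(nbrs) | {u}) for u, nbrs in g.items()}


ds = TransformedDataset({0: [1], 1: [0]}, transform=add_self_loops)
print(ds[0])  # {0: [0, 1], 1: [0, 1]}
```

Because the transform is re-applied on every access, it should be cheap and side-effect free; an expensive, fixed preprocessing step belongs in process() instead.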

Attributes:
hash

Hash value for the dataset and the setting.

name

Name of the dataset.

raw_dir

Raw file directory that contains the input data folder.

raw_path

Directory that contains the input data files.

save_dir

Directory to save the processed dataset.

save_path

Path to save the processed dataset.

url

URL for downloading the raw dataset.

verbose

Whether to print progress information.

Methods

download()

Automatically download data and extract it.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

load()

Override to implement your own logic for loading the saved dataset from files.

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.

download()

Automatically download data and extract it.
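The built-in download step can be pictured as fetching an archive from url and unpacking it under raw_dir. This minimal sketch assumes a ZIP archive and extracts it into raw_dir/name; the helper `download_and_extract` and that directory layout are hypothetical, and EasyGraph's actual download() may handle formats and file placement differently.

```python
import os
import tempfile
import urllib.request
import zipfile


def download_and_extract(url: str, raw_dir: str, name: str) -> str:
    """Download a ZIP archive from `url` and extract it into raw_dir/name.

    Hypothetical helper; returns the extraction directory.
    """
    target = os.path.join(raw_dir, name)
    os.makedirs(target, exist_ok=True)  # also creates raw_dir if needed
    # Download into a temporary archive file that is always removed afterwards.
    fd, archive_path = tempfile.mkstemp(suffix=".zip", dir=raw_dir)
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, archive_path)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(target)
    finally:
        os.remove(archive_path)
    return target
```

In a real subclass, url, raw_dir and name would come from the dataset's own url, raw_dir and name attributes.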

class easygraph.datasets.graph_dataset_base.EasyGraphDataset(name, url=None, raw_dir=None, save_dir=None, hash_key=(), force_reload=False, verbose=False, transform=None)

Bases: object

The base class for creating EasyGraph graph datasets. It defines a template whose steps are executed automatically in the following order:

  1. Check whether a dataset cache exists on disk (already processed and stored) by invoking has_cache(). If it does, go to step 5.

  2. Call download() to download the data if url is not None.

  3. Call process() to process the data.

  4. Call save() to save the processed dataset to disk, then go to step 6.

  5. Call load() to load the processed dataset from disk.

  6. Done.

Users can override these methods with their own data-processing logic.
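The six steps follow the template-method pattern: the base class fixes the order, and subclasses fill in the hooks. The stand-in below is a simplified, hypothetical sketch of that control flow, not EasyGraph's source. The class names, the pickle cache file, the `calls` log, and the way force_reload bypasses step 1 are all illustrative assumptions.

```python
import os
import pickle
import tempfile
from typing import List, Optional


class TemplateDataset:
    """Hypothetical, simplified stand-in for the documented six-step flow."""

    def __init__(self, name: str, save_dir: str, url: Optional[str] = None,
                 force_reload: bool = False):
        self.name = name
        self.url = url
        self.save_dir = save_dir
        self.force_reload = force_reload
        self.calls: List[str] = []  # records which hooks ran (illustration only)
        self._run()

    @property
    def save_path(self) -> str:
        # Hypothetical cache file name: <save_dir>/<name>.pkl
        return os.path.join(self.save_dir, self.name + ".pkl")

    def _run(self) -> None:
        # Step 1: use the cache on disk unless a reload is forced.
        if not self.force_reload and self.has_cache():
            self.load()        # step 5
            return             # step 6
        if self.url is not None:
            self.download()    # step 2
        self.process()         # step 3
        self.save()            # step 4, then step 6

    # Default hooks, meant to be overridden.
    def has_cache(self) -> bool:
        return False           # documented default: no cache

    def download(self) -> None:
        pass

    def process(self) -> None:
        raise NotImplementedError

    def save(self) -> None:
        pass

    def load(self) -> None:
        pass


class SquaresDataset(TemplateDataset):
    """Toy subclass whose 'processing' step builds a list of squares."""

    def has_cache(self) -> bool:
        self.calls.append("has_cache")
        return os.path.exists(self.save_path)

    def process(self) -> None:
        self.calls.append("process")
        self.data = [i * i for i in range(5)]

    def save(self) -> None:
        self.calls.append("save")
        with open(self.save_path, "wb") as f:
            pickle.dump(self.data, f)

    def load(self) -> None:
        self.calls.append("load")
        with open(self.save_path, "rb") as f:
            self.data = pickle.load(f)


cache_dir = tempfile.mkdtemp()
first = SquaresDataset("squares", cache_dir)   # no cache yet: process + save
second = SquaresDataset("squares", cache_dir)  # cache found: load only
print(first.calls)   # ['has_cache', 'process', 'save']
print(second.calls)  # ['has_cache', 'load']
```

The second construction skips process() entirely, which is the point of the cache: expensive preprocessing runs once per setting.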

Parameters:
  • name (str) – Name of the dataset.

  • url (str) – URL from which to download the raw dataset. Default: None

  • raw_dir (str) – The directory that will store the downloaded data, or the directory that already stores the input data. Default: ~/.EasyGraphData/

  • save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir

  • hash_key (tuple) – A tuple of values used as the input to the hash function. Users can distinguish instances (and their caches on disk) of the same dataset class by comparing their hash values. Default: (), whose hash value is 'f9065fa7'.

  • force_reload (bool) – Whether to reload the dataset. Default: False

  • verbose (bool) – Whether to print out progress information. Default: False

  • transform (callable, optional) – A transform that takes in a graph object and returns a transformed version. The graph is transformed before every access. Default: None
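The raw_dir and save_dir defaults above amount to a small resolution rule. This sketch encodes only the documented defaults (~/.EasyGraphData/ for raw_dir, raw_dir for save_dir) in a hypothetical `resolve_dirs` helper; it is not the library's code.

```python
import os
from typing import Optional, Tuple


def resolve_dirs(raw_dir: Optional[str] = None,
                 save_dir: Optional[str] = None) -> Tuple[str, str]:
    """Apply the documented defaults: raw_dir -> ~/.EasyGraphData, save_dir -> raw_dir."""
    if raw_dir is None:
        raw_dir = os.path.join(os.path.expanduser("~"), ".EasyGraphData")
    if save_dir is None:
        save_dir = raw_dir  # processed files land next to the raw data by default
    return raw_dir, save_dir


print(resolve_dirs())                            # both paths end in .EasyGraphData
print(resolve_dirs("/data/raw"))                 # ('/data/raw', '/data/raw')
print(resolve_dirs("/data/raw", "/data/cache"))  # ('/data/raw', '/data/cache')
```

Passing a separate save_dir is useful when the raw data lives on a read-only or shared location.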

Attributes:
hash

Hash value for the dataset and the setting.

name

Name of the dataset.

raw_dir

Raw file directory that contains the input data folder.

raw_path

Directory that contains the input data files.

save_dir

Directory to save the processed dataset.

save_path

Path to save the processed dataset.

url

URL for downloading the raw dataset.

verbose

Whether to print progress information.

Methods

download()

Override to implement your own logic for downloading data.

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

load()

Override to implement your own logic for loading the saved dataset from files.

process()

Override to implement your own logic for processing the input data.

save()

Override to implement your own logic for saving the processed dataset to files.

download()

Override to implement your own logic for downloading data.

It is recommended to download the data into the self.raw_dir folder. Downloading can be skipped if the dataset is already in self.raw_dir.
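A download() override that follows this advice writes into raw_dir and returns early when the data is already present. In this sketch the `MyDownloader` class, the injected `fetch` callable, and the raw_path layout (raw_dir joined with the dataset name) are hypothetical; a real override would fetch from self.url, and returning a bool is only for illustration.

```python
import os
import tempfile
from typing import Callable


class MyDownloader:
    """Hypothetical dataset fragment that shows only the download() hook."""

    def __init__(self, name: str, raw_dir: str, fetch: Callable[[str], None]):
        self.name = name
        self.raw_dir = raw_dir
        self._fetch = fetch  # writes the raw data into the directory it is given

    @property
    def raw_path(self) -> str:
        # Assumed layout: raw data lives in a sub-directory named after the dataset.
        return os.path.join(self.raw_dir, self.name)

    def download(self) -> bool:
        """Fetch into raw_dir unless the data is already there; True means it fetched."""
        if os.path.exists(self.raw_path):
            return False  # already in raw_dir: skip the download
        os.makedirs(self.raw_path)
        self._fetch(self.raw_path)
        return True


def fake_fetch(dest: str) -> None:
    """Stand-in for a network download: writes a one-edge edge list."""
    with open(os.path.join(dest, "edges.txt"), "w", encoding="utf-8") as f:
        f.write("0 1\n")


d = MyDownloader("toy", tempfile.mkdtemp(), fake_fetch)
print(d.download())  # True: first call fetches
print(d.download())  # False: data already present, so it is skipped
```

Checking only that the directory exists means a fetch that failed halfway would also be skipped next time; checking for specific expected files is more robust.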

has_cache()

Override to implement your own logic for deciding whether a cached dataset exists.

Returns False by default.
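An override usually checks for the files that save() writes. This sketch assumes a hypothetical layout of one graph file and one info file per dataset name in save_dir; the `has_cache` helper and its file names are illustrative, not EasyGraph's.

```python
import os
import tempfile


def has_cache(save_dir: str, name: str) -> bool:
    """Return True only if every file the (hypothetical) save step writes exists."""
    expected = [
        os.path.join(save_dir, f"{name}_graph.pkl"),
        os.path.join(save_dir, f"{name}_info.json"),
    ]
    return all(os.path.exists(p) for p in expected)


cache = tempfile.mkdtemp()
print(has_cache(cache, "toy"))  # False: nothing saved yet
for suffix in ("_graph.pkl", "_info.json"):
    open(os.path.join(cache, "toy" + suffix), "w").close()
print(has_cache(cache, "toy"))  # True
```

Requiring every file, rather than any one of them, avoids treating a half-written cache as valid.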

property hash

Hash value for the dataset and the setting.
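The hash lets instances of the same dataset class with different settings keep separate caches. The sketch assumes one plausible scheme, SHA-1 over the string form of hash_key truncated to eight hex digits; EasyGraph's exact function is not stated here, so the digests below (including the one for ()) may differ from the library's. The `settings_hash` helper and the cache file naming are hypothetical.

```python
import hashlib
from typing import Tuple


def settings_hash(hash_key: Tuple = ()) -> str:
    """Hypothetical scheme: first 8 hex digits of SHA-1 over str(hash_key)."""
    digest = hashlib.sha1(str(hash_key).encode("utf-8")).hexdigest()
    return digest[:8]


# Same settings give the same hash; different settings give a different one,
# so each configuration can be cached under its own file name.
a = settings_hash((True, 0.1))
b = settings_hash((True, 0.1))
c = settings_hash((False, 0.1))
print(a == b, a == c)  # True False
cache_name = f"mydataset_{a}.pkl"  # hypothetical cache file naming
```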

load()

Override to implement your own logic for loading the saved dataset from files.

It is recommended to use dgl.data.utils.load_graphs to load DGL graphs from files and dgl.data.utils.load_info to load extra information into a Python dict.
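To stay dependency-free, the sketch below swaps in the standard library's pickle and json for the DGL utilities named above. It assumes a hypothetical file layout (<name>_graph.pkl and <name>_info.json in save_dir) and a hypothetical helper name `load_cached`.

```python
import json
import os
import pickle
from typing import Any, Dict, Tuple


def load_cached(save_dir: str, name: str) -> Tuple[Any, Dict[str, Any]]:
    """Load a pickled graph and a JSON info dict written by a matching save step.

    Hypothetical file layout: <name>_graph.pkl and <name>_info.json in save_dir.
    """
    with open(os.path.join(save_dir, f"{name}_graph.pkl"), "rb") as f:
        graph = pickle.load(f)
    with open(os.path.join(save_dir, f"{name}_info.json"), encoding="utf-8") as f:
        info = json.load(f)
    return graph, info
```

Unpickling can execute arbitrary code, so only load cache files you created or otherwise trust.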

property name

Name of the dataset.

abstract process()

Override to implement your own logic for processing the input data.
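process() is where raw files become the in-memory dataset. The sketch below uses a hypothetical `EdgeListDataset` fragment that parses a whitespace-separated edge list (raw_path/edges.txt, an assumed file name) into an undirected adjacency dict. A real EasyGraph subclass would more likely build an EasyGraph graph object, which this standard-library sketch does not do.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, List, Set


class EdgeListDataset:
    """Hypothetical fragment: only the process() hook, reading raw_path/edges.txt."""

    def __init__(self, raw_path: str):
        self.raw_path = raw_path
        self.graph: Dict[int, List[int]] = {}

    def process(self) -> None:
        adj: Dict[int, Set[int]] = defaultdict(set)
        with open(os.path.join(self.raw_path, "edges.txt"), encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                u, v = map(int, line.split()[:2])
                adj[u].add(v)
                adj[v].add(u)  # treat every edge as undirected
        # Store sorted neighbor lists for deterministic output.
        self.graph = {u: sorted(nbrs) for u, nbrs in sorted(adj.items())}


raw = tempfile.mkdtemp()
with open(os.path.join(raw, "edges.txt"), "w", encoding="utf-8") as f:
    f.write("# toy graph\n0 1\n1 2\n")
ds = EdgeListDataset(raw)
ds.process()
print(ds.graph)  # {0: [1], 1: [0, 2], 2: [1]}
```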

property raw_dir

Raw file directory that contains the input data folder.

property raw_path

Directory that contains the input data files.

save()

Override to implement your own logic for saving the processed dataset to files.

It is recommended to use dgl.data.utils.save_graphs to save DGL graphs to files and dgl.data.utils.save_info to save extra information to files.
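As a standard-library alternative to the DGL utilities named above, a save step can pickle the graph and write the extra information as JSON. The helper `save_cached` and its file layout (<name>_graph.pkl and <name>_info.json in save_dir) are hypothetical.

```python
import json
import os
import pickle
import tempfile
from typing import Any, Dict


def save_cached(save_dir: str, name: str, graph: Any, info: Dict[str, Any]) -> None:
    """Write the graph with pickle and the extra info as JSON (hypothetical layout)."""
    os.makedirs(save_dir, exist_ok=True)
    with open(os.path.join(save_dir, f"{name}_graph.pkl"), "wb") as f:
        pickle.dump(graph, f)
    with open(os.path.join(save_dir, f"{name}_info.json"), "w", encoding="utf-8") as f:
        json.dump(info, f)


out = tempfile.mkdtemp()
save_cached(out, "toy", {0: [1], 1: [0]}, {"num_nodes": 2})
print(sorted(os.listdir(out)))  # ['toy_graph.pkl', 'toy_info.json']
```

JSON keeps the info file human-readable, but JSON object keys are always strings and only basic types are supported, so anything richer belongs in the pickled part.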

property save_dir

Directory to save the processed dataset.

property save_path

Path to save the processed dataset.

property url

URL for downloading the raw dataset.

property verbose

Whether to print progress information.