easygraph.datasets.graph_dataset_base module
Basic EasyGraph Dataset
- class easygraph.datasets.graph_dataset_base.EasyGraphBuiltinDataset(name, url, raw_dir=None, hash_key=(), force_reload=False, verbose=True, transform=None, save_dir=None)
Bases: EasyGraphDataset
The Basic EasyGraph Builtin Dataset.
- Parameters:
name (str) – Name of the dataset.
url (str) – Url to download the raw dataset.
raw_dir (str) – Directory that will store the downloaded data, or that already stores the input data. Default: ~/.EasyGraphData/
hash_key (tuple) – A tuple of values as the input for the hash function. Users can distinguish instances (and their caches on the disk) from the same dataset class by comparing the hash values.
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: True
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir
- Attributes:
hash
Hash value for the dataset and the setting.
name
Name of the dataset.
raw_dir
Raw file directory that contains the input data folder.
raw_path
Directory that contains the input data files.
save_dir
Directory to save the processed dataset.
save_path
Path to save the processed dataset.
url
URL to download the raw dataset.
verbose
Whether to print information.
Methods
download()
Automatically download data and extract it.
has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
load()
Override to implement your own logic for loading the saved dataset from files.
process()
Override to implement your own logic for processing the input data.
save()
Override to implement your own logic for saving the processed dataset to files.
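The built-in dataset's download() is described as downloading the data and then extracting it. The extraction half might look like the sketch below. It assumes the raw file is a zip archive, which this page does not state, and `extract_archive` is a hypothetical helper name, not part of EasyGraph.

```python
import os
import zipfile


def extract_archive(archive_path, target_dir):
    """Unpack a zip archive into target_dir and return the member names.

    Hypothetical helper: the archive format (zip) is an assumption.
    """
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

In a dataset class, `target_dir` would plausibly be the directory that raw_path points to, so that process() finds the unpacked files there.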
- class easygraph.datasets.graph_dataset_base.EasyGraphDataset(name, url=None, raw_dir=None, save_dir=None, hash_key=(), force_reload=False, verbose=False, transform=None)
Bases: object
The basic EasyGraph dataset for creating graph datasets. This class defines a basic template class for EasyGraph Dataset. The following steps will be executed automatically:
1. Check whether there is a dataset cache on disk (already processed and stored on the disk) by invoking has_cache(). If true, go to step 5.
2. Call download() to download the data if url is not None.
3. Call process() to process the data.
4. Call save() to save the processed dataset on disk and go to step 6.
5. Call load() to load the processed dataset from disk.
6. Done.
Users can override these methods with their own data processing logic.
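The automatic steps above can be sketched with a small stand-in class. This is not EasyGraph's implementation: `LifecycleSketch`, `CachedSketch`, `_run`, and `calls` are hypothetical names used only to show the order in which the hooks fire, and the example URL is never contacted. The sketch also assumes that `force_reload=True` skips the cache check, which this page implies but does not spell out.

```python
class LifecycleSketch:
    """Hypothetical stand-in that mirrors the documented call order."""

    def __init__(self, url=None, force_reload=False):
        self.url = url
        self.force_reload = force_reload
        self.calls = []  # records which hooks ran, for illustration only
        self._run()

    # Hooks that a subclass would override.
    def has_cache(self):
        return False

    def download(self):
        self.calls.append("download")

    def process(self):
        self.calls.append("process")

    def save(self):
        self.calls.append("save")

    def load(self):
        self.calls.append("load")

    def _run(self):
        # Step 1: use the cache unless a reload is forced (assumption).
        if not self.force_reload and self.has_cache():
            self.load()      # step 5
            return           # step 6
        if self.url is not None:
            self.download()  # step 2
        self.process()       # step 3
        self.save()          # step 4


class CachedSketch(LifecycleSketch):
    """Pretends a processed copy is already on disk."""

    def has_cache(self):
        return True


fresh = LifecycleSketch(url="https://example.com/data.zip")
cached = CachedSketch()
print(fresh.calls)   # ['download', 'process', 'save']
print(cached.calls)  # ['load']
```

Keeping the orchestration in one private method and exposing only the hooks is what lets a subclass change a single stage, such as has_cache(), without touching the rest.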
- Parameters:
name (str) – Name of the dataset
url (str) – Url to download the raw dataset. Default: None
raw_dir (str) – Specifying the directory that will store the downloaded data or the directory that already stores the input data. Default: ~/.EasyGraphData/
save_dir (str) – Directory to save the processed dataset. Default: same as raw_dir
hash_key (tuple) – A tuple of values as the input for the hash function. Users can distinguish instances (and their caches on the disk) from the same dataset class by comparing the hash values. Default: (), the corresponding hash value is 'f9065fa7'.
force_reload (bool) – Whether to reload the dataset. Default: False
verbose (bool) – Whether to print out progress information. Default: False
transform (callable, optional) – A transform that takes in a DGLGraph object and returns a transformed version. The DGLGraph object will be transformed before every access.
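The transform parameter is described as being applied before every access. A minimal sketch of that behaviour follows, using plain edge lists as stand-ins for graph objects. `TransformSketch` and `add_reverse_edges` are hypothetical names, and applying the transform inside `__getitem__` is an assumption about where a subclass would hook it in.

```python
class TransformSketch:
    """Hypothetical container that applies a transform on each access."""

    def __init__(self, graphs, transform=None):
        self._graphs = list(graphs)
        self._transform = transform

    def __len__(self):
        return len(self._graphs)

    def __getitem__(self, idx):
        graph = self._graphs[idx]
        # The stored object is left untouched; only the returned copy changes.
        if self._transform is None:
            return graph
        return self._transform(graph)


def add_reverse_edges(edges):
    """Example transform: return a new edge list with each edge mirrored."""
    return edges + [(v, u) for u, v in edges]


ds = TransformSketch([[(0, 1)], [(1, 2), (2, 3)]], transform=add_reverse_edges)
print(ds[0])  # [(0, 1), (1, 0)]
```

Because the transform runs on access rather than at save time, the cached dataset on disk stays independent of whichever transform is passed in.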
- Attributes:
hash
Hash value for the dataset and the setting.
name
Name of the dataset.
raw_dir
Raw file directory that contains the input data folder.
raw_path
Directory that contains the input data files.
save_dir
Directory to save the processed dataset.
save_path
Path to save the processed dataset.
url
URL to download the raw dataset.
verbose
Whether to print information.
Methods
download()
Override to implement your own logic for downloading data.
has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
load()
Override to implement your own logic for loading the saved dataset from files.
process()
Override to implement your own logic for processing the input data.
save()
Override to implement your own logic for saving the processed dataset to files.
- download()
Override to implement your own logic for downloading data.
It is recommended to download the data to the self.raw_dir folder. This step can be skipped if the dataset is already in self.raw_dir.
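A download() override might follow the pattern below: fetch into the raw directory only when the file is not already there. `download_if_missing` and its `fetch` parameter are hypothetical. The default fetcher is the standard library's `urllib.request.urlretrieve`, and deriving the file name from the last URL segment is an assumption.

```python
import os
import urllib.request


def download_if_missing(url, raw_dir, fetch=urllib.request.urlretrieve):
    """Fetch url into raw_dir unless the target file already exists.

    `fetch` is injectable so the logic can be exercised without a network.
    """
    os.makedirs(raw_dir, exist_ok=True)
    target = os.path.join(raw_dir, os.path.basename(url))
    if not os.path.exists(target):
        fetch(url, target)  # urlretrieve(url, filename) writes the file to disk
    return target
```

Inside a dataset class, the body of download() would then reduce to a call such as `download_if_missing(self.url, self.raw_dir)`.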
- has_cache()
Override to implement your own logic for deciding whether a cached dataset exists.
Returns False by default.
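One simple way to implement the cache check is to test for the files that save() is expected to write. The sketch below assumes a single hypothetical file name, `graph.json`. Both the function signature and the file name are illustrative, not taken from EasyGraph.

```python
import os


def has_cache(save_path, expected_files=("graph.json",)):
    """Return True only if every expected processed file exists."""
    return all(
        os.path.exists(os.path.join(save_path, name)) for name in expected_files
    )
```

Checking every expected file, rather than only the directory, avoids treating a half-written cache as valid.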
- property hash
Hash value for the dataset and the setting.
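To illustrate how a hash_key tuple can become a short identifier like the eight-character value quoted in the parameter list, here is a sketch that hashes the tuple's string form with SHA-1 and keeps the first eight hex digits. The scheme is an assumption, `short_hash` is a hypothetical name, and the sketch is not claimed to reproduce the library's actual hash values.

```python
import hashlib


def short_hash(hash_key=()):
    """Return an 8-character hex digest of the tuple's string form.

    Assumed scheme: SHA-1 over str(hash_key), truncated to 8 hex digits.
    """
    digest = hashlib.sha1(str(hash_key).encode("utf-8")).hexdigest()
    return digest[:8]


a = short_hash(("cora", 0.5))
b = short_hash(("cora", 0.8))
print(a != b)  # different settings give different cache identities
```

Appending such a value to the cache file name keeps processed copies built with different settings from overwriting each other.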
- load()
Override to implement your own logic for loading the saved dataset from files.
It is recommended to use dgl.data.utils.load_graphs to load DGL graphs from files and dgl.data.utils.load_info to load extra information into a Python dict object.
- property name
Name of the dataset.
- property raw_dir
Raw file directory that contains the input data folder.
- property raw_path
Directory that contains the input data files.
- save()
Override to implement your own logic for saving the processed dataset to files.
It is recommended to use dgl.data.utils.save_graphs to save DGL graphs to files and dgl.data.utils.save_info to save extra information to files.
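save() and load() are meant to be inverses of each other. The round trip below sketches that contract with the standard-library json module instead of the dgl utilities named above, so it runs without extra dependencies. `save_processed`, `load_processed`, the `graph.json` file name, and the edge-list representation are all hypothetical.

```python
import json
import os


def save_processed(save_path, edges, info):
    """Write the processed edge list and extra information to save_path."""
    os.makedirs(save_path, exist_ok=True)
    with open(os.path.join(save_path, "graph.json"), "w") as f:
        json.dump({"edges": edges, "info": info}, f)


def load_processed(save_path):
    """Read back what save_processed wrote."""
    with open(os.path.join(save_path, "graph.json")) as f:
        payload = json.load(f)
    # JSON has no tuple type, so edges come back as lists; convert them back.
    edges = [tuple(edge) for edge in payload["edges"]]
    return edges, payload["info"]
```

Whatever format is chosen, load() should restore exactly the state that process() followed by save() would leave behind, since the cached path skips process() entirely.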
- property save_dir
Directory to save the processed dataset.
- property save_path
Path to save the processed dataset.
- property url
URL to download the raw dataset.
- property verbose
Whether to print information.
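The relationship between raw_dir, raw_path, save_dir, and save_path can be sketched as follows. The defaults come from the parameter descriptions above (raw_dir defaults to ~/.EasyGraphData/, save_dir defaults to raw_dir). Joining each directory with the dataset name to obtain the corresponding path is an assumption, and `resolve_paths` is a hypothetical helper.

```python
import os


def resolve_paths(name, raw_dir=None, save_dir=None):
    """Derive the four directory attributes from the constructor arguments."""
    raw_dir = os.path.expanduser(raw_dir or "~/.EasyGraphData/")
    save_dir = os.path.expanduser(save_dir) if save_dir else raw_dir
    return {
        "raw_dir": raw_dir,
        "raw_path": os.path.join(raw_dir, name),    # assumed: raw_dir/name
        "save_dir": save_dir,
        "save_path": os.path.join(save_dir, name),  # assumed: save_dir/name
    }


paths = resolve_paths("cora", raw_dir="/tmp/eg")
print(paths["raw_path"])
print(paths["save_path"])
```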