easygraph.datasets.web_google module

Web-Google Dataset

This dataset is a directed web graph of web pages and the hyperlinks between them, released by Google in 2002 as part of the Google Programming Contest.

Each node represents a web page, and a directed edge from u to v indicates a hyperlink from page u to page v.

Statistics:
  • Nodes: 875713

  • Edges: 5105039

  • Features: None

  • Labels: None
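The node and edge counts above can be reproduced from a plain edge list. The following is a minimal sketch, not the library's own loader. It assumes the SNAP text layout (comment lines starting with `#`, then one whitespace-separated `FromNodeId ToNodeId` pair per line), and `parse_edge_list` and the four-page sample are hypothetical names and data used only for illustration.

```python
from collections import defaultdict


def parse_edge_list(lines):
    """Parse SNAP-style edge-list lines into (successors, nodes).

    Hypothetical helper for illustration; assumes '#' comment lines and
    whitespace-separated integer pairs "u v" meaning a hyperlink u -> v.
    """
    successors = defaultdict(set)  # page u -> set of pages that u links to
    nodes = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        u_str, v_str = line.split()[:2]
        u, v = int(u_str), int(v_str)
        successors[u].add(v)  # directed edge: only u gains a successor
        nodes.update((u, v))
    return dict(successors), nodes


# Made-up four-page sample, not taken from the real dataset.
sample = """\
# Directed graph (hypothetical sample)
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
3\t0
"""

successors, nodes = parse_edge_list(sample.splitlines())
num_edges = sum(len(targets) for targets in successors.values())
print("Nodes:", len(nodes))
print("Edges:", num_edges)
```

Storing successors in sets means a repeated `u v` line is counted once, which is one possible convention; a loader that keeps parallel edges would count differently.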

Reference: J. Leskovec, A. Rajaraman, J. Ullman, “Mining of Massive Datasets.” Dataset from SNAP: https://snap.stanford.edu/data/web-Google.html

class easygraph.datasets.web_google.WebGoogleDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

Web-Google hyperlink network dataset.

Parameters:
  • raw_dir (str, optional) – Directory to store the raw downloaded files. Default: None

  • force_reload (bool, optional) – Whether to re-download and process the dataset. Default: False

  • verbose (bool, optional) – Whether to print detailed processing logs. Default: True

  • transform (callable, optional) – Optional transform to apply to the graph. Default: None

Examples

>>> from easygraph.datasets import WebGoogleDataset
>>> dataset = WebGoogleDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())

download()

Download and decompress the .txt.gz file.
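The decompression half of this step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual implementation: `decompress_gz` is a hypothetical helper, the network download is left out, and the archive is a small file created locally so the example runs offline.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_gz(gz_path: Path, out_path: Path) -> Path:
    """Stream-decompress a .gz file to out_path (hypothetical helper)."""
    # copyfileobj streams in chunks, so a large archive is never fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in archive instead of downloading the real file.
    payload = b"# FromNodeId\tToNodeId\n0\t1\n"
    gz_path = Path(tmp) / "sample.txt.gz"
    with gzip.open(gz_path, "wb") as f:
        f.write(payload)

    out_path = decompress_gz(gz_path, Path(tmp) / "sample.txt")
    restored = out_path.read_bytes()

print(restored == payload)
```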

process()

Override this method to implement your own logic for processing the input data.