easygraph.datasets.arxiv module
Arxiv HEP-TH Citation Network
This dataset represents the citation network of preprints from the High Energy Physics - Theory (HEP-TH) category on arXiv, covering the period from January 1993 to April 2003.
Each node corresponds to a paper, and a directed edge from paper A to paper B indicates that A cites B.
No features or labels are included in this dataset.
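To make the edge direction concrete, here is a minimal sketch in plain Python that does not use EasyGraph. It assumes a whitespace-separated "source target" edge list with `#` comment lines; the sample IDs are invented for illustration and are not taken from the dataset, and `parse_edges` is a hypothetical helper.

```python
from collections import Counter

# Made-up sample, NOT real dataset content. Assumed layout: one
# "citing_paper<TAB>cited_paper" pair per line, '#' lines are comments.
SAMPLE = """\
# FromNodeId\tToNodeId
1\t2
1\t3
2\t3
4\t3
"""


def parse_edges(text):
    """Return a list of (citing, cited) integer pairs from an edge-list string."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        src, dst = line.split()
        edges.append((int(src), int(dst)))
    return edges


edges = parse_edges(SAMPLE)

# An edge (A, B) means "A cites B", so the in-degree of a node is the number
# of citations it receives and the out-degree is the length of its reference list.
citations_received = Counter(dst for _, dst in edges)
references_made = Counter(src for src, _ in edges)

print("Citations received:", dict(citations_received))
print("References made:", dict(references_made))
```

Because the direction is citing to cited, in-degree is the usual proxy for a paper's impact, while out-degree only reflects how many HEP-TH papers it references.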
Statistics:
- Nodes: 27,770
- Edges: 352,807
- Features: None
- Labels: None
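As a rough sanity check on these figures, the following sketch derives two summary quantities from the node and edge counts alone. The formulas are the standard ones for a directed graph without self-loops; whether the dataset actually contains no self-loops is an assumption here.

```python
# Counts taken from the statistics listed above.
num_nodes = 27_770
num_edges = 352_807

# Each directed edge contributes one outgoing reference, so this is the
# mean number of in-network references per paper.
avg_out_degree = num_edges / num_nodes

# Directed density: edges present divided by the N * (N - 1) possible
# ordered pairs (assumes self-loops are excluded).
density = num_edges / (num_nodes * (num_nodes - 1))

print(f"Average out-degree: {avg_out_degree:.2f}")
print(f"Density: {density:.2e}")
```

The result is roughly 12.7 in-network references per paper, with a density well below 0.1%, which is typical of a sparse citation graph.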
Reference: J. Leskovec, J. Kleinberg and C. Faloutsos, “Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations,” in KDD 2005.
Dataset: https://snap.stanford.edu/data/cit-HepTh.html
- class easygraph.datasets.arxiv.ArxivHEPTHDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: EasyGraphBuiltinDataset
Arxiv HEP-TH citation network dataset.
Parameters:
- raw_dir (str, optional) – Directory to store the raw downloaded files. Default: None
- force_reload (bool, optional) – Whether to re-download and re-process the dataset. Default: False
- verbose (bool, optional) – Whether to print detailed processing logs. Default: True
- transform (callable, optional) – Optional transform to apply to the graph. Default: None
Examples
>>> from easygraph.datasets import ArxivHEPTHDataset
>>> dataset = ArxivHEPTHDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())