easygraph.datasets.arxiv module

Arxiv HEP-TH Citation Network

This dataset represents the citation network of preprints from the High Energy Physics - Theory (HEP-TH) category on arXiv, covering the period from January 1993 to April 2003.

Each node corresponds to a paper, and a directed edge from paper A to paper B indicates that A cites B.
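As a minimal sketch of what the edge direction means in practice, the snippet below counts incoming and outgoing edges on a small, made-up citation list. The paper names and the `citations` list are hypothetical and do not come from the dataset; only the standard library is used, not the EasyGraph API.

```python
from collections import Counter

# Hypothetical toy data: each pair is (citing_paper, cited_paper),
# i.e. a directed edge from the citing paper to the cited paper.
citations = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]

# In-degree: how many times a paper is cited by others.
cited_by_count = Counter(cited for _, cited in citations)

# Out-degree: how many references a paper makes.
references_count = Counter(citing for citing, _ in citations)

most_cited, times_cited = cited_by_count.most_common(1)[0]
print("Most cited paper:", most_cited, "with", times_cited, "citations")
```

Under this convention, a highly cited paper is a node with high in-degree, while a paper with a long reference list has high out-degree.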

No features or labels are included in this dataset.

Statistics:
  • Nodes: 27,770

  • Edges: 352,807

  • Features: None

  • Labels: None
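The node and edge counts above are enough to derive a couple of summary figures. The following is a rough sketch that computes the average number of references per paper and the directed-graph density from those two numbers alone; the variable names are illustrative and nothing here is loaded from the dataset.

```python
# Counts taken from the statistics listed above.
num_nodes = 27_770
num_edges = 352_807

# Each directed edge is one reference, so edges / nodes is the
# mean out-degree (equal to the mean in-degree).
avg_references = num_edges / num_nodes

# Density of a directed graph without self-loops:
# actual edges divided by the N * (N - 1) possible ordered pairs.
density = num_edges / (num_nodes * (num_nodes - 1))

print(f"Average references per paper: {avg_references:.2f}")
print(f"Density: {density:.2e}")
```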

Reference: J. Leskovec, J. Kleinberg and C. Faloutsos, “Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations,” in KDD 2005.

Dataset: https://snap.stanford.edu/data/cit-HepTh.html

class easygraph.datasets.arxiv.ArxivHEPTHDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

Arxiv HEP-TH citation network dataset.

Parameters:
  • raw_dir (str, optional) – Directory to store the raw downloaded files. Default: None

  • force_reload (bool, optional) – Whether to re-download and process the dataset. Default: False

  • verbose (bool, optional) – Whether to print detailed processing logs. Default: True

  • transform (callable, optional) – Optional transform to apply on the graph. Default: None

Examples

>>> from easygraph.datasets import ArxivHEPTHDataset
>>> dataset = ArxivHEPTHDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())

download()

Download and decompress the .txt.gz file.
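The decompression step can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the method's actual implementation: `decompress_gz` is a hypothetical helper, and the in-memory `archive` stands in for the downloaded file so that the example runs without network access.

```python
import gzip


def decompress_gz(data: bytes) -> str:
    """Decompress gzip-compressed bytes and decode them as UTF-8 text."""
    return gzip.decompress(data).decode("utf-8")


# Stand-in for a downloaded .txt.gz archive (illustrative content only).
sample_text = "# FromNodeId\tToNodeId\n1001\t9304045\n"
archive = gzip.compress(sample_text.encode("utf-8"))

restored = decompress_gz(archive)
print(restored)
```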

process()

Override this method to implement your own logic for processing the input data.
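For orientation, one plausible shape of such processing logic is parsing a plain-text edge list into directed edges. The sketch below assumes the layout commonly used by SNAP edge-list files (comment lines starting with `#`, then one whitespace-separated `FromNodeId ToNodeId` pair per line); `parse_edge_list` and the sample text are hypothetical and are not the library's own code.

```python
def parse_edge_list(text):
    """Parse an edge list into (source, target) integer pairs.

    Assumes '#' starts a comment line and each data line holds two
    whitespace-separated node IDs.
    """
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        source, target = line.split()
        edges.append((int(source), int(target)))
    return edges


# Illustrative sample, not real dataset content.
sample = (
    "# Directed graph\n"
    "# FromNodeId\tToNodeId\n"
    "1\t2\n"
    "1\t3\n"
    "2\t3\n"
)

edges = parse_edge_list(sample)
nodes = {node for edge in edges for node in edge}
print("Nodes:", len(nodes), "Edges:", len(edges))
```

Collecting the node set from the edge endpoints, as done here, only finds papers that appear in at least one citation; isolated papers would need a separate source.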