easygraph.datasets.wiki_topcats module
Wikipedia Top Categories Dataset (wiki-topcats)
This dataset is a directed graph of hyperlinks between Wikipedia articles. It is restricted to articles in top categories (those containing at least 100 articles) and then reduced to its largest strongly connected component.
Statistics:
- Nodes: 1,791,489
- Edges: 28,511,807
- Categories: 17,364
- Overlapping labels per node
Source: H. Yin, A. Benson, J. Leskovec, D. Gleich. “Local Higher-order Graph Clustering”, KDD 2017.
Data: https://snap.stanford.edu/data/wiki-topcats.html
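To make "overlapping labels per node" concrete, here is a minimal sketch of turning the raw files into an edge list and a node-to-categories mapping. It is not EasyGraph's loader. It assumes the SNAP edge file holds one whitespace-separated `source target` pair per line and the category file holds lines of the form `Category:Name; id id ...`. The function names `parse_edges` and `parse_categories` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_edges(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse 'source target' lines into directed (source, target) pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges


def parse_categories(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Map each node ID to every category that lists it."""
    node_labels = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: 'Category:Name; id id id ...'
        name, _, members = line.partition(";")
        name = name.strip()
        if name.startswith("Category:"):
            name = name[len("Category:"):]
        for node_id in members.split():
            node_labels[int(node_id)].append(name)
    return dict(node_labels)


# Tiny made-up sample in the assumed formats (not real dataset rows).
sample_edges = ["0 1", "1 2", "2 0"]
sample_categories = ["Category:Physics; 0 1", "Category:Science; 1 2"]

edges = parse_edges(sample_edges)
node_labels = parse_categories(sample_categories)

print(edges)
print(node_labels[1])  # node 1 appears in both sample categories
```

Because a node can appear in several category lines, the mapping stores a list of labels per node instead of a single label.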
class easygraph.datasets.wiki_topcats.WikiTopCatsDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: EasyGraphBuiltinDataset
Wikipedia Top Categories Snapshot from 2011 (SNAP)