easygraph.datasets.wiki_topcats module

Wikipedia Top Categories Dataset (wiki-topcats)

This dataset is a directed graph of hyperlinks between Wikipedia articles. It is taken from the largest strongly connected component of the hyperlink graph and restricted to articles in top categories, meaning categories with at least 100 articles.

Statistics:

- Nodes: 1,791,489
- Edges: 28,511,807
- Categories: 17,364
- Overlapping labels per node
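Because labels overlap, a node can belong to several categories at once. The sketch below illustrates that idea on a tiny in-memory sample. It assumes the SNAP categories file uses one line per category in the form `Category:<name>; <node id> <node id> ...`; the sample lines, the category names, and the helper `parse_category_lines` are hypothetical and are not part of EasyGraph.

```python
from collections import defaultdict

# Hypothetical sample in the layout assumed for the SNAP categories file:
# "Category:<name>; <node id> <node id> ..."
SAMPLE_CATEGORY_LINES = [
    "Category:Example_A; 0 1 2",
    "Category:Example_B; 1 2 3",
    "Category:Example_C; 2",
]


def parse_category_lines(lines):
    """Map each node ID to the set of category names that list it."""
    node_to_categories = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split "Category:<name>" from the space-separated member IDs.
        name, _, members = line.partition(";")
        name = name.strip()
        if name.startswith("Category:"):
            name = name[len("Category:"):]
        for token in members.split():
            node_to_categories[int(token)].add(name)
    return dict(node_to_categories)


labels = parse_category_lines(SAMPLE_CATEGORY_LINES)

# Nodes that carry more than one label, i.e. the overlapping case.
overlapping = {node: cats for node, cats in labels.items() if len(cats) > 1}
print(sorted(overlapping))  # prints [1, 2]
```

Storing a set per node keeps membership checks cheap and makes the overlap explicit. For the full dataset the same mapping would hold up to 1,791,489 keys, so memory use should be considered.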

Source: H. Yin, A. Benson, J. Leskovec, D. Gleich. “Local Higher-order Graph Clustering”, KDD 2017

Data: https://snap.stanford.edu/data/wiki-topcats.html

class easygraph.datasets.wiki_topcats.WikiTopCatsDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

Wikipedia Top Categories Snapshot from 2011 (SNAP)
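As a minimal sketch of how the class might be constructed, the helper below passes through only the keyword arguments listed in the signature above. The helper name `load_wiki_topcats` is hypothetical. It is an assumption, not confirmed by this page, that `raw_dir` names the directory for downloaded files and that constructing the object triggers `download()` and `process()`.

```python
def load_wiki_topcats(raw_dir=None, force_reload=False, verbose=True):
    """Build the dataset object using the documented constructor arguments."""
    # Imported lazily so that defining this helper does not require EasyGraph
    # to be installed; the import only runs when the helper is called.
    from easygraph.datasets.wiki_topcats import WikiTopCatsDataset

    return WikiTopCatsDataset(
        raw_dir=raw_dir,            # assumed: where raw files are stored
        force_reload=force_reload,  # assumed: ignore any cached copy
        verbose=verbose,
        transform=None,
    )
```

Calling `load_wiki_topcats()` requires EasyGraph and, presumably, network access. With 28,511,807 edges, the first call may need considerable time and disk space.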

download()

Automatically download data and extract it.

process()

Override this method to implement your own logic for processing the input data.