easygraph.datasets.reddit module

class easygraph.datasets.reddit.RedditDataset(self_loop=False, raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

Reddit posts graph (Sept 2014) for community (subreddit) classification.

Statistics:
  • Nodes: ~232,965

  • Edges: ~114 million

  • Features per node: 602

  • Classes: number of subreddit communities

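Posts are split by the day they were made: posts from the first 20 days form the training set; of the posts from the remaining days, 30% are used for validation and the rest for testing.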
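The day-based split can be illustrated with a small standalone sketch. It does not use EasyGraph: `split_by_day`, `post_days`, and the seeded random choice of validation posts are assumptions made for illustration, since the text above does not say how the 30% is selected.

```python
import random


def split_by_day(post_days, train_days=20, val_fraction=0.3, seed=0):
    """Return (train, val, test) lists of post indices.

    post_days[i] is the 0-based day on which post i was made
    (a hypothetical input format, not the dataset's actual layout).
    """
    # Posts from the first `train_days` days go to the training set.
    train = [i for i, day in enumerate(post_days) if day < train_days]
    later = [i for i, day in enumerate(post_days) if day >= train_days]

    # Assumption: validation posts are drawn at random from the later days.
    rng = random.Random(seed)
    rng.shuffle(later)
    n_val = int(len(later) * val_fraction)
    val = sorted(later[:n_val])
    test = sorted(later[n_val:])
    return train, val, test


# Toy example: 30 days with 10 posts per day.
post_days = [day for day in range(30) for _ in range(10)]
train_idx, val_idx, test_idx = split_by_day(post_days)
print(len(train_idx), len(val_idx), len(test_idx))  # 200 30 70
```

The three index lists are disjoint and together cover every post, which is the property the split described above relies on.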

Parameters:
  • self_loop (bool) – If True, add self-loop edges. Default: False.

  • raw_dir (same as EasyGraphBuiltinDataset)

  • force_reload (same as EasyGraphBuiltinDataset)

  • verbose (same as EasyGraphBuiltinDataset)

  • transform (same as EasyGraphBuiltinDataset)
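A minimal usage sketch follows, assuming EasyGraph is installed. The import path and keyword defaults are taken from the class signature above; `REDDIT_DEFAULTS` and `load_reddit` are hypothetical helper names introduced only for this example.

```python
# Defaults copied from the constructor signature documented above.
REDDIT_DEFAULTS = {
    "self_loop": False,
    "raw_dir": None,
    "force_reload": False,
    "verbose": True,
    "transform": None,
}


def load_reddit(**overrides):
    """Construct a RedditDataset, overriding any of the documented defaults."""
    # Reject keyword arguments that the documented signature does not list.
    unknown = set(overrides) - set(REDDIT_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")

    # Imported lazily so the helper can be defined without EasyGraph present.
    from easygraph.datasets.reddit import RedditDataset

    kwargs = {**REDDIT_DEFAULTS, **overrides}
    return RedditDataset(**kwargs)
```

Calling `load_reddit(self_loop=True)` requires EasyGraph and, presumably, access to the raw data on first use; the returned object's `num_classes` property then reports the number of subreddit classes.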

property num_classes

process()

Override this method to implement your own logic for processing the input data.
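The override pattern can be sketched without EasyGraph. `SketchDataset` below is a hypothetical stand-in, not the real EasyGraphBuiltinDataset, and the assumption that the base class calls `process()` during construction is made only to keep the example runnable.

```python
class SketchDataset:
    """Hypothetical stand-in for a dataset base class (not EasyGraph code)."""

    def __init__(self, raw_rows):
        self.raw_rows = raw_rows
        self.labels = []
        # Assumption: the base class triggers processing when data is loaded.
        self.process()

    def process(self):
        raise NotImplementedError("subclasses override process()")


class ToyPosts(SketchDataset):
    """Toy subclass that maps subreddit names to integer class labels."""

    def process(self):
        # Turn raw (post_id, subreddit) rows into integer labels.
        names = sorted({subreddit for _, subreddit in self.raw_rows})
        index = {name: i for i, name in enumerate(names)}
        self.labels = [index[subreddit] for _, subreddit in self.raw_rows]
        self._num_classes = len(names)

    @property
    def num_classes(self):
        return self._num_classes


toy = ToyPosts([(0, "python"), (1, "askscience"), (2, "python")])
print(toy.labels, toy.num_classes)  # [1, 0, 1] 2
```

Keeping all parsing inside `process()` lets the base class decide when processing runs, while the subclass only defines what it produces.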