easygraph.datasets.github module

GitHub Users Social Network Dataset (musae_git)

This dataset represents a directed social network of GitHub users collected in 2019. Nodes represent GitHub developers, and a directed edge from user A to user B indicates that A follows B.

Each node also includes:
  • Features: User profile and activity-based features.

  • Labels: The developer’s project area, one of two classes (e.g., machine learning or web development).

Statistics:
  • Nodes: 37,700

  • Edges: 289,003

  • Feature dim: 5,575

  • Classes: 2
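As a rough sizing aid, the figures listed under Statistics can be turned into an edges-per-node ratio and an estimate of how much memory a dense node-feature matrix would need. This is only a back-of-the-envelope sketch: the 4 bytes per entry (float32) is an assumption, and the library may store features differently.

```python
# Figures copied from the Statistics list.
num_nodes = 37_700
num_edges = 289_003
feature_dim = 5_575

# Ratio of edges to nodes, independent of how edge direction is interpreted.
edges_per_node = num_edges / num_nodes

# Size of a dense (num_nodes x feature_dim) feature matrix.
# Assumption: each entry is stored as float32 (4 bytes).
bytes_per_entry = 4
dense_entries = num_nodes * feature_dim
dense_bytes = dense_entries * bytes_per_entry

print(f"Edges per node: {edges_per_node:.2f}")
print(f"Dense feature entries: {dense_entries:,}")
print(f"Dense float32 size: {dense_bytes / 1e6:.1f} MB")
```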

Reference: J. Leskovec et al. “SNAP Datasets: Stanford Large Network Dataset Collection”, https://snap.stanford.edu/data/github-social.html

class easygraph.datasets.github.GitHubUsersDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

GitHub developers social graph (musae_git).

Parameters:
  • raw_dir (str, optional) – Directory to store raw data. Default: None

  • force_reload (bool, optional) – If True, re-download and re-process the data. Default: False

  • verbose (bool, optional) – Print processing information. Default: True

  • transform (callable, optional) – Transform to apply to the graph on load. Default: None
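To illustrate the transform parameter, here is a minimal sketch of a callable that takes the loaded graph and returns it after rescaling each node's feature vector to unit L2 norm. The function name l2_normalize_features is hypothetical. The sketch assumes that g.nodes is a dict-like mapping from node to attribute dict and that features live under the 'feat' key, as the Examples section suggests; the exact calling convention for transforms is not specified here.

```python
import math


def l2_normalize_features(g):
    """Hypothetical transform: scale each node's 'feat' vector to unit L2 norm.

    Assumes g.nodes maps node -> attribute dict and that an attribute dict
    may hold a 'feat' sequence of numbers.
    """
    for attrs in g.nodes.values():
        feat = attrs.get("feat")
        if feat is None:
            continue  # node carries no features
        values = [float(x) for x in feat]
        norm = math.sqrt(sum(x * x for x in values))
        # Leave all-zero vectors untouched to avoid dividing by zero.
        if norm > 0:
            # Note: the result is a plain Python list, not an array.
            attrs["feat"] = [x / norm for x in values]
    return g


# Intended use (left as a comment because it triggers a download):
# dataset = GitHubUsersDataset(transform=l2_normalize_features)
# g = dataset[0]
```

Because the sketch stores the normalized features as a plain list, code that relies on array attributes such as .shape would need to convert the result back to the array type in use.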

Examples

>>> from easygraph.datasets import GitHubUsersDataset
>>> dataset = GitHubUsersDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())
>>> print("Feature shape:", g.nodes[0]['feat'].shape)
>>> print("Label:", g.nodes[0]['label'])

download()

Automatically download data and extract it.

property num_classes

process()

Override this method to implement your own logic for processing the input data.