easygraph.datasets.github module
GitHub Users Social Network Dataset (musae_git)
This dataset represents a directed social network of GitHub users collected in 2019. Nodes represent GitHub developers, and a directed edge from user A to user B indicates that A follows B.
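Under the edge convention described above (an edge A → B means A follows B), a user's follower count is their in-degree. The number of accounts they follow is their out-degree. The minimal stdlib sketch below illustrates this on a hypothetical toy edge list. The user names are made up and nothing here calls the EasyGraph API.

```python
from collections import Counter

# Hypothetical toy edge list: (follower, followee) pairs, i.e. A -> B means A follows B.
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("dave", "bob"),
]

# In-degree = number of followers; out-degree = number of accounts followed.
followers = Counter(dst for _, dst in edges)
following = Counter(src for src, _ in edges)

# Counter returns 0 for users that never appear on that side of an edge.
for user in sorted(set(followers) | set(following)):
    print(user, "followers:", followers[user], "following:", following[user])
```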
Each node also includes:
- Features: user profile and activity-based features.
- Labels: the developer’s project area, either machine learning or web development.
Statistics:
- Nodes: 37,700
- Edges: 289,003
- Feature dimension: 5,575
- Classes: 2
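The listed counts also show how sparse the graph is. Treating the graph as directed, as described above, the mean out-degree is |E| / |V| and the density is |E| / (|V| (|V| - 1)), assuming no self-loops. The quick check below uses only the numbers in the statistics list:

```python
# Counts taken from the statistics list above.
num_nodes = 37_700
num_edges = 289_003

# In a directed graph each edge adds one to exactly one node's out-degree.
mean_out_degree = num_edges / num_nodes

# Density: fraction of the N * (N - 1) possible directed edges (no self-loops) that exist.
density = num_edges / (num_nodes * (num_nodes - 1))

print(f"mean out-degree: {mean_out_degree:.2f}")
print(f"density: {density:.2e}")
```

A density on the order of 1e-4 means an adjacency list or sparse matrix is far more practical than a dense 37,700 × 37,700 adjacency matrix.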
Reference: J. Leskovec and A. Krevl, “SNAP Datasets: Stanford Large Network Dataset Collection”, https://snap.stanford.edu/data/github-social.html
- class easygraph.datasets.github.GitHubUsersDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)
Bases: EasyGraphBuiltinDataset
GitHub developers social graph (musae_git).
- Parameters:
raw_dir (str, optional) – Directory in which to store the downloaded raw data. Default: None
force_reload (bool, optional) – Whether to force re-downloading and re-processing the dataset. Default: False
verbose (bool, optional) – Whether to print processing information. Default: True
transform (callable, optional) – Transform applied to the graph when it is loaded. Default: None
Examples
>>> from easygraph.datasets import GitHubUsersDataset
>>> dataset = GitHubUsersDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())
>>> print("Feature shape:", g.nodes[0]['feat'].shape)
>>> print("Label:", g.nodes[0]['label'])
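For node classification, the per-node `feat` and `label` attributes accessed in the example are typically gathered into a feature matrix and a label vector. The following is a minimal sketch. It assumes, as the indexing `g.nodes[0]['feat']` suggests, that `g.nodes` behaves like a mapping from node IDs to attribute dictionaries. `to_xy` is a hypothetical helper, not part of EasyGraph, and the toy `nodes` dict stands in for the real graph.

```python
from collections import Counter


def to_xy(nodes):
    """Hypothetical helper: stack per-node 'feat' and 'label' attributes.

    `nodes` is assumed to be a mapping node_id -> attribute dict, the shape
    implied by `g.nodes[0]['feat']` in the example above.
    """
    order = sorted(nodes)  # fix a deterministic node order
    X = [list(nodes[n]["feat"]) for n in order]  # one feature row per node
    y = [int(nodes[n]["label"]) for n in order]  # one integer label per node
    return order, X, y


# Toy stand-in for g.nodes: 3 nodes with 4-dimensional features and binary labels.
nodes = {
    0: {"feat": [1.0, 0.0, 0.0, 1.0], "label": 0},
    1: {"feat": [0.0, 1.0, 0.0, 0.0], "label": 1},
    2: {"feat": [0.0, 0.0, 1.0, 1.0], "label": 1},
}

order, X, y = to_xy(nodes)
print(len(X), "x", len(X[0]), "feature matrix")
print("class counts:", Counter(y))
```

On the real dataset the call would be `to_xy(g.nodes)` after `g = dataset[0]`. With 37,700 nodes and 5,575 features, a dense matrix has about 210 million entries, so a NumPy array or a sparse representation is likely a better fit than Python lists.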
- property num_classes