easygraph.datasets.coauthor module

CoauthorCS Dataset

This dataset contains a co-authorship network of authors who submitted papers to the CS category. Each node represents an author, and each edge represents a co-authorship between two authors. Node features are bag-of-words representations of the keywords in each author’s papers. The task is node classification, with labels indicating each author’s primary field of study.

Statistics:
  • Nodes: 18333

  • Edges: 81894

  • Feature Dim: 6805

  • Classes: 15

Source: dmlc/dgl
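
To make the description above concrete, the following is a minimal sketch of how bag-of-words node features and co-authorship edges can be represented. The author names, keywords, and the `bag_of_words` helper are hypothetical toy values chosen for illustration; they are not taken from the dataset, and the real features may use a different encoding (for example binary indicators rather than counts).

```python
from collections import Counter

# Hypothetical toy data: keywords per author (not from the real dataset).
author_keywords = {
    "alice": ["graph", "learning", "graph"],
    "bob": ["database", "query"],
    "carol": ["learning", "query", "graph"],
}

# Fixed vocabulary: one feature dimension per distinct keyword.
vocab = sorted({kw for kws in author_keywords.values() for kw in kws})


def bag_of_words(keywords, vocab):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(keywords)
    return [counts[word] for word in vocab]


# One feature vector per node (author).
features = {a: bag_of_words(kws, vocab) for a, kws in author_keywords.items()}

# Undirected co-authorship edges between authors.
edges = [("alice", "bob"), ("alice", "carol")]

print(vocab)
print(features["alice"])
```

In this toy setting the feature dimension equals the vocabulary size, which corresponds to the "Feature Dim" statistic listed above.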

class easygraph.datasets.coauthor.CoauthorCSDataset(raw_dir=None, force_reload=False, verbose=True, transform=None)

Bases: EasyGraphBuiltinDataset

CoauthorCS co-authorship network dataset.

Nodes are authors, and edges indicate co-authorship relationships. Each node has a bag-of-words feature vector and a label denoting the primary research field.

Parameters:
  • raw_dir (str, optional) – Directory to store the raw downloaded files. Default: None

  • force_reload (bool, optional) – Whether to re-download and process the dataset. Default: False

  • verbose (bool, optional) – Whether to print detailed processing logs. Default: True

  • transform (callable, optional) – Transform to apply to the graph on access. Default: None
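
The "on access" behaviour of transform can be pictured with the sketch below. `TinyDataset` and `add_self_loops` are hypothetical stand-ins written for this illustration, with a plain dict in place of a graph object; this is not EasyGraph's implementation, only an assumption about how a transform callable is typically applied when an item is indexed.

```python
class TinyDataset:
    """Hypothetical stand-in for a single-graph dataset; not EasyGraph code."""

    def __init__(self, transform=None):
        # The "graph" is a plain dict here to keep the sketch dependency-free.
        self._graph = {"nodes": [0, 1, 2], "edges": [(0, 1), (1, 2)]}
        self._transform = transform

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError("this dataset holds exactly one graph")
        if self._transform is None:
            return self._graph
        # The transform runs on every access; the stored graph stays untouched
        # as long as the transform returns a new object.
        return self._transform(self._graph)

    def __len__(self):
        return 1


def add_self_loops(graph):
    """Example transform: return a copy of the graph with one self-loop per node."""
    loops = [(n, n) for n in graph["nodes"]]
    return {"nodes": list(graph["nodes"]), "edges": graph["edges"] + loops}


plain = TinyDataset()[0]
looped = TinyDataset(transform=add_self_loops)[0]
print(len(plain["edges"]), len(looped["edges"]))
```

Writing the transform so that it returns a new object rather than mutating its input keeps repeated accesses from compounding the change.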

Examples

>>> from easygraph.datasets import CoauthorCSDataset
>>> dataset = CoauthorCSDataset()
>>> g = dataset[0]
>>> print("Nodes:", g.number_of_nodes())
>>> print("Edges:", g.number_of_edges())
>>> print("Feature shape:", g.nodes[0]['feat'].shape)
>>> print("Label:", g.nodes[0]['label'])
>>> print("Number of classes:", dataset.num_classes)

property num_classes

Number of label classes in the dataset.

process()

Override this method to implement your own logic for processing the input data.
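
As a rough illustration of this override pattern, the sketch below uses a hypothetical stand-in base class, `BuiltinDatasetSketch`, that calls `process()` during construction. It is an assumption about the general shape of such a hook, not the actual EasyGraphBuiltinDataset code, and the dict-based graph is a placeholder.

```python
class BuiltinDatasetSketch:
    """Hypothetical stand-in for a dataset base class; not EasyGraph's code."""

    def __init__(self):
        self.graph = None
        # The base class invokes the hook while loading the dataset.
        self.process()

    def process(self):
        raise NotImplementedError("subclasses provide the processing logic")


class TriangleDataset(BuiltinDatasetSketch):
    def process(self):
        # Turn raw input (here a hard-coded edge list) into the stored graph.
        raw_edges = [(0, 1), (1, 2), (2, 0)]
        nodes = sorted({n for edge in raw_edges for n in edge})
        self.graph = {"nodes": nodes, "edges": raw_edges}


ds = TriangleDataset()
print(ds.graph["nodes"])
```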