EasyHypergraph
==============

Overview
++++++++

We present a comprehensive, computationally efficient, and storage-saving hypergraph library designed not only for in-depth hypergraph analysis but also for the growing field of hypergraph learning. The library, called EasyHypergraph, allows users from multiple disciplines to input hypergraph data, efficiently perform hypergraph analysis or hypergraph learning tasks on large-scale datasets through its computational workflow, and gain insights from higher-order structures. EasyHypergraph's broad capabilities and computational efficiency empower researchers and practitioners from various fields to effectively harness the potential of higher-order networks, sparking innovative solutions for practical problems. Its architecture is shown below:

.. image:: function_dynamic.png

Examples
++++++++

We provide a brief introduction to the fundamental properties, basic operations, and node classification tasks on hypergraphs using **EasyGraph**.

Basic Properties and Operations of a Hypergraph
-----------------------------------------------

Please refer to the related source code `here `_.

.. important::

    The nodes in our ``eg.Hypergraph`` are numbered from 0 to n - 1 (where n is the number of nodes). Each hyperedge in the hypergraph is represented as an unordered set of vertices. Therefore, hyperedges such as ``(0, 1, 2)``, ``(0, 2, 1)``, and ``(2, 1, 0)`` are considered identical.

.. code-block:: python

    hg = eg.Hypergraph(num_v=5, e_list=[(0, 1, 2), (2, 3), (2, 3), (0, 4)], merge_op="sum")
    print("Sparse format:", hg.incidence_matrix)
    """
    Sparse format:   (0, 0)	1
      (0, 2)	1
      (1, 0)	1
      (2, 0)	1
      (2, 1)	1
      (3, 1)	1
      (4, 2)	1
    """
    print("Dense format:", hg.incidence_matrix.todense())
    """
    Dense format: [[1 0 1]
     [1 0 0]
     [1 1 0]
     [0 1 0]
     [0 0 1]]
    """
    print("hg.e:", hg.e)
    # ([(0, 1, 2), (2, 3), (0, 4)], [1.0, 2.0, 1.0], [{}, {}, {}])

    e_line_width = [1, 2, 1]
    hg.draw(v_label=[0, 1, 2, 3, 4], v_color='#e6928f', e_color='#4e9595', e_line_width=e_line_width)

.. image:: hypergraph_example1.png

.. code-block:: python

    # Add hyperedges. (0, 2, 1) duplicates the existing hyperedge (0, 1, 2),
    # so the two are merged: with merge_op="mean" the merged weight stays 1.0,
    # whereas merge_op="sum" would make it 2.0.
    hg.add_hyperedges(e_list=[(0, 2, 1), (2, 4)], merge_op="mean")
    print("hg.e:", hg.e)
    # ([(0, 1, 2), (2, 3), (0, 4), (2, 4)], [1.0, 2.0, 1.0, 1.0], [{}, {}, {}, {}])

    # (2, 4) already exists, so merge_op="sum" doubles its weight to 2.0.
    hg.add_hyperedges(e_list=[(2, 4)], merge_op="sum")
    print("hg.e:", hg.e)
    # ([(0, 1, 2), (2, 3), (0, 4), (2, 4)], [1.0, 2.0, 1.0, 2.0], [{}, {}, {}, {}])

    hg.remove_hyperedges(e_list=[(2, 3)])
    print("hg.e:", hg.e)
    # ([(0, 1, 2), (0, 4), (2, 4)], [1.0, 1.0, 2.0], [{}, {}, {}])

.. image:: hypergraph_example2.png

.. note::

    If an added hyperedge duplicates an existing one, the two are automatically merged with the specified ``merge_op``. With ``merge_op='sum'`` the merged weight is the sum of the duplicates' weights; with ``merge_op='mean'`` it is their average.
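Vertex and hyperedge degrees can be read directly off the incidence matrix: row sums give the number of hyperedges each vertex belongs to, and column sums give each hyperedge's size. A minimal sketch using only the ``incidence_matrix`` shown above (the variable names here are ours, not part of the library API):

.. code-block:: python

    import numpy as np

    # Dense copy of the (num_vertices x num_hyperedges) incidence matrix.
    H = np.asarray(hg.incidence_matrix.todense())

    vertex_degrees = H.sum(axis=1)   # hyperedges incident to each vertex
    hyperedge_sizes = H.sum(axis=0)  # vertices contained in each hyperedge

    print("vertex degrees:", vertex_degrees)
    print("hyperedge sizes:", hyperedge_sizes)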
Create a hypergraph based on the k-nearest neighbors of the features:

.. code-block:: python

    X = torch.tensor([[0.0658, 0.3191, 0.0204, 0.6955],
                      [0.1144, 0.7131, 0.3643, 0.4707],
                      [0.2250, 0.0620, 0.0379, 0.2848],
                      [0.0619, 0.4898, 0.9368, 0.7433],
                      [0.5380, 0.3119, 0.6462, 0.4311]])
    hg = eg.Hypergraph.from_feature_kNN(X, k=3)
    print(f"hg: {hg}")
    print(f"hg.e: {hg.e}")
    '''
    hg: Hypergraph(num_vertex=5, num_hyperedge=4)
    hg.e: ([(0, 1, 2), (0, 1, 4), (0, 2, 4), (1, 3, 4)], [1.0, 1.0, 1.0, 1.0], [{}, {}, {}, {}])
    '''
    print("H:", hg.H.to_dense())
    '''
    H: [[1 1 1 0]
        [1 1 0 1]
        [1 0 1 0]
        [0 0 0 1]
        [0 1 1 1]]
    '''
    hg.draw(v_label=[0, 1, 2, 3, 4], v_color='#e6928f', e_color='#4e9595')

.. image:: hypergraph_example3.png
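For intuition, ``from_feature_kNN`` groups every vertex with its nearest neighbors in feature space into one hyperedge and merges duplicates, which is why 5 vertices yield only 4 hyperedges here. A rough sketch of that construction (our own re-implementation of the general idea, not the library's actual code; it reproduces the hyperedges above for this ``X``):

.. code-block:: python

    import torch

    def knn_hyperedges(X: torch.Tensor, k: int):
        # Pairwise Euclidean distances between feature rows.
        dist = torch.cdist(X, X)
        # Each vertex is at distance 0 from itself, so it lands in its
        # own hyperedge together with its k - 1 nearest neighbors.
        nbrs = dist.topk(k, largest=False).indices
        # Hyperedges are unordered sets: sort each one and deduplicate.
        return sorted({tuple(sorted(row.tolist())) for row in nbrs})

    print(knn_hyperedges(X, k=3))
    # [(0, 1, 2), (0, 1, 4), (0, 2, 4), (1, 3, 4)] -- matching hg.e above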
Construct a hypergraph from a graph:

.. code-block:: python

    g = eg.Graph()
    g.add_edges([(0, 1), (1, 2), (2, 3), (1, 4)])
    hg = eg.Hypergraph.from_graph(g)
    print(f"hg.e: {hg.e}")
    # ([(0, 1), (1, 2), (1, 4), (2, 3)], [1.0, 1.0, 1.0, 1.0], [{}, {}, {}, {}])
    print(f"H: {hg.H.to_dense()}")
    '''
    H: [[1 0 0 0]
        [1 1 1 0]
        [0 1 0 1]
        [0 0 0 1]
        [0 0 1 0]]
    '''

Train a Hypergraph Neural Network (HGNN) model on the trivago-clicks dataset
-----------------------------------------------------------------------------

The related source code can be found in the following repositories: `hypergraph model `_ and `hypergraph datasets `_.

We demonstrate a node classification task on the trivago-clicks dataset using a Hypergraph Neural Network (HGNN).

**Model:** Hypergraph Neural Networks (HGNN) (Feng, Y., You, H., Zhang, Z., Ji, R., & Gao, Y. (2019). Hypergraph Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 3558-3565.).

**Dataset:** trivago-clicks (``eg.trivago_clicks``): sets of hotels clicked on within a Web browsing session, where labels are the countries of the accommodations.

**Import Libraries:**

.. code-block:: python

    import torch
    import torch.nn as nn
    import numpy as np
    import easygraph as eg
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt

**Dataset preparation and model definition:**

.. code-block:: python

    def preprocess():
        """Prepare the trivago-clicks dataset and build the HGNN model.

        Returns:
            tuple: the dataset dict and the model
        """
        # The dataset ships without node features, so users can generate
        # their own. Here we draw random 100-dimensional feature vectors.
        trivago_clicks = eg.trivago_clicks()
        node_labels = trivago_clicks["labels"]
        hyperedges = trivago_clicks["edge_list"]
        num_classes = trivago_clicks["num_classes"]
        num_vertices = trivago_clicks["num_vertices"]

        input_feature_dim = 100
        hidden_dim = 64
        node_features = {}
        for i in range(len(node_labels)):
            node_features[i] = np.random.randn(input_feature_dim)

        # The dataset has no default split, so we split it into train,
        # validation, and test sets (56.25% / 18.75% / 25%).
        train_nodes, test_nodes = train_test_split(list(range(num_vertices)), test_size=0.25, random_state=42)
        train_nodes, val_nodes = train_test_split(train_nodes, test_size=0.25, random_state=42)
        train_mask = train_nodes
        val_mask = val_nodes
        test_mask = test_nodes

        X = np.array([node_features[node] for node in range(len(node_labels))])
        X = torch.from_numpy(X).float()
        y = np.array([node_labels[node] for node in range(len(node_labels))])
        y = torch.from_numpy(y)

        dataset = {}
        dataset["structure"] = eg.Hypergraph(num_v=len(node_labels), e_list=hyperedges)
        dataset["features"] = X
        dataset["labels"] = y
        dataset["train_mask"] = train_mask
        dataset["val_mask"] = val_mask
        dataset["test_mask"] = test_mask
        dataset["num_classes"] = num_classes

        model = eg.HGNN(in_channels=input_feature_dim, hid_channels=hidden_dim, num_classes=num_classes)
        return dataset, model

**Train, valid, test:**

.. code-block:: python

    def train(
        data: dict,
        model: nn.Module,
        optimizer: torch.optim.Optimizer,
        criterion: nn.Module,
    ):
        features, structure = data["features"], data["structure"]
        train_mask, labels = data["train_mask"], data["labels"]
        model.train()  # valid() switches to eval mode, so switch back here
        optimizer.zero_grad()
        outputs = model(features, structure)
        loss = criterion(outputs[train_mask], labels[train_mask])
        loss.backward()
        optimizer.step()
        return loss


    @torch.no_grad()
    def valid(model: nn.Module, data: dict):
        features, structure = data["features"], data["structure"]
        val_mask, labels = data["val_mask"], data["labels"]
        model.eval()
        outputs = model(features, structure).argmax(dim=1)
        correct = (outputs[val_mask] == labels[val_mask]).sum()
        acc = int(correct) / len(val_mask)
        return acc


    @torch.no_grad()
    def test(model: nn.Module, data: dict):
        features, structure = data["features"], data["structure"]
        test_mask, labels = data["test_mask"], data["labels"]
        model.eval()
        outputs = model(features, structure).argmax(dim=1)
        correct = (outputs[test_mask] == labels[test_mask]).sum()
        acc = int(correct) / len(test_mask)
        return acc

**Loss visualization:**

.. code-block:: python

    def draw_loss_curve(loss1, save_path="loss_pic.png"):
        plt.clf()
        epochs = range(1, len(loss1) + 1)
        plt.plot(epochs, loss1, 'b', label='EG Training loss')
        plt.title('Training Loss Comparison')
        plt.xlabel('Epochs')
        plt.ylabel('Loss')
        plt.legend()
        plt.grid(True)
        if save_path is not None:
            plt.savefig(save_path)
        plt.show()

**Main:**

.. code-block:: python

    if __name__ == "__main__":
        dataset, model = preprocess()
        loss_lst = []
        epochs = 500  # the output below was produced with 500 epochs
        lr = 0.01
        loss_fn = nn.CrossEntropyLoss()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for i in range(epochs):
            loss = train(data=dataset, model=model, optimizer=optimizer, criterion=loss_fn)
            loss_lst.append(loss.detach().numpy())
            val_acc = valid(model=model, data=dataset)
            print(f"epoch: {i}, valid accuracy : {val_acc}, loss : {loss}")
        print("Training finished!")
        test_acc = test(model=model, data=dataset)
        print("test accuracy:", test_acc)
        draw_loss_curve(loss_lst)
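Overall accuracy can hide per-class behavior on an imbalanced dataset such as trivago-clicks. An optional diagnostic, using only the objects defined above (our own addition, not part of the original tutorial):

.. code-block:: python

    @torch.no_grad()
    def per_class_accuracy(model: nn.Module, data: dict):
        # Accuracy per label on the test split; classes absent from the
        # split are simply not reported.
        model.eval()
        outputs = model(data["features"], data["structure"]).argmax(dim=1)
        test_mask, labels = data["test_mask"], data["labels"]
        preds, truth = outputs[test_mask], labels[test_mask]
        return {
            int(c): (preds[truth == c] == c).float().mean().item()
            for c in truth.unique().tolist()
        }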
**Output:**

.. code-block:: text

    epoch: 0, valid accuracy : 0.2997239226983555, loss : 2.3755440711975098
    epoch: 1, valid accuracy : 0.30524546873124475, loss : 2.3317201137542725
    epoch: 2, valid accuracy : 0.30806625855239467, loss : 2.2897789478302
    epoch: 3, valid accuracy : 0.3118473172488297, loss : 2.248279571533203
    epoch: 4, valid accuracy : 0.31736886328171887, loss : 2.2077314853668213
    epoch: 5, valid accuracy : 0.32355059416636656, loss : 2.169461727142334
    ......
    epoch: 495, valid accuracy : 0.44616492617933023, loss : 1.5331683158874512
    epoch: 496, valid accuracy : 0.445744808546393, loss : 1.5331355333328247
    epoch: 497, valid accuracy : 0.44616492617933023, loss : 1.533029317855835
    epoch: 498, valid accuracy : 0.4456247749369824, loss : 1.5328493118286133
    epoch: 499, valid accuracy : 0.4461049093746249, loss : 1.5326640605926514
    Training finished!
    test accuracy: 0.4464550979068197

.. image:: loss_pic.png
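Because the node features come from ``np.random.randn`` and the model weights are randomly initialized, the numbers above are indicative and will vary between runs. Seeding both RNGs before calling ``preprocess()`` makes a run repeatable; a minimal sketch:

.. code-block:: python

    import numpy as np
    import torch

    # Fix the NumPy RNG (feature generation) and the PyTorch RNG
    # (model weight initialization) so repeated runs match.
    np.random.seed(42)
    torch.manual_seed(42)

    dataset, model = preprocess()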