data package

data.sbm module

data.sbm.sbm_processing.get_sbm_data()[source]

Generate a Stochastic Block Model (SBM) dataset as described in https://arxiv.org/abs/2405.19230.

This dataset represents a three-community Dynamic Stochastic Block Model (DSBM) with the community-level edge probability matrix:

\[\begin{split}B(t) = \begin{bmatrix} s_1 & 0.02 & 0.02 \\ 0.02 & s_2 & 0.02 \\ 0.02 & 0.02 & s_3 \end{bmatrix}\end{split}\]

where \(s_1\), \(s_2\), and \(s_3\) are the within-community edge probabilities, each taking one of two values, 0.08 or 0.16. Every between-community edge probability is fixed at 0.02.

We simulate a dynamic network over \(T = 8\) time points, corresponding to the \(2^3 = 8\) possible combinations of \(s_1\), \(s_2\), and \(s_3\). For each time point, the adjacency matrix \(A(t)\) is drawn from the corresponding probability matrix \(B(t)\). The ordering of these time points is random.
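The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the implementation behind `get_sbm_data()`: the function name `sample_dsbm`, the community size of 50 nodes, the fixed seed, and the choice of an undirected graph without self-loops are all assumptions made for the example.

```python
import itertools

import numpy as np


def sample_dsbm(n_per_community=50, p_between=0.02, states=(0.08, 0.16), seed=0):
    """Sample one adjacency matrix per combination of (s_1, s_2, s_3).

    Hypothetical helper: community sizes, seed, and undirectedness are
    assumptions, not taken from the documented function.
    """
    rng = np.random.default_rng(seed)
    n_communities = 3
    labels = np.repeat(np.arange(n_communities), n_per_community)
    n = labels.size

    # All 2^3 = 8 combinations of within-community probabilities.
    combos = list(itertools.product(states, repeat=n_communities))
    # Random ordering of the time points.
    order = rng.permutation(len(combos))

    As, Bs = [], []
    for idx in order:
        # B(t): p_between off the diagonal, (s_1, s_2, s_3) on the diagonal.
        B = np.full((n_communities, n_communities), p_between)
        np.fill_diagonal(B, np.asarray(combos[idx]))

        # Node-level edge probabilities, looked up by community label.
        P = B[np.ix_(labels, labels)]

        # Draw the strict upper triangle, then mirror it (undirected, no self-loops).
        upper = np.triu(rng.random((n, n)) < P, k=1)
        As.append((upper | upper.T).astype(int))
        Bs.append(B)
    return As, Bs, labels


As, Bs, labels = sample_dsbm()
print(len(As), As[0].shape)
```

Drawing only the upper triangle and mirroring it keeps each adjacency matrix symmetric; a directed variant would instead draw every off-diagonal entry independently.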

The task is to predict the community label of each node.

Returns:

A tuple containing:
  • As (list of np.ndarray): List of adjacency matrices for each time point.

  • node_labels (np.ndarray): Array of node labels for each time point.

Return type:

tuple

data.school module

data.school.school_processing.get_school_data(return_all_labels=False)[source]

A dynamic social network between pupils at a primary school in Lyon, France (Stehlé et al., 2011).

Each of the 232 pupils wore a radio-frequency identification (RFID) device, so that each interaction could be recorded with its timestamp, forming a dynamic network. An interaction was defined as close proximity for 20 seconds.

The task is to predict the classroom allocation of each pupil. The dataset's temporal structure distinguishes two regimes:

  • Class time: Pupils cluster together based on their class (easier).

  • Lunchtime: The cluster structure breaks down (harder).

The data cover two full school days, so the temporal pattern roughly repeats from one day to the next.
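To illustrate how timestamped interactions can be turned into one adjacency matrix per time window, here is a small sketch on toy data. It is not the code behind `get_school_data()`: the helper name `contacts_to_adjacency`, the `(time, pupil_i, pupil_j)` record layout, the one-hour window, and the symmetric counting are assumptions made for the example.

```python
import numpy as np


def contacts_to_adjacency(times, src, dst, n_nodes, window):
    """Count contacts between each pair of nodes within fixed-length time windows.

    Hypothetical helper: times are in seconds, node ids are integers in
    [0, n_nodes), and each contact is counted in both directions.
    """
    times = np.asarray(times)
    src = np.asarray(src)
    dst = np.asarray(dst)

    # Window index of each contact, measured from the first timestamp.
    bins = ((times - times.min()) // window).astype(int)
    n_windows = bins.max() + 1

    As = np.zeros((n_windows, n_nodes, n_nodes), dtype=int)
    # np.add.at accumulates repeated (window, i, j) triples correctly.
    np.add.at(As, (bins, src, dst), 1)
    np.add.at(As, (bins, dst, src), 1)
    return As


# Toy contact list: five contacts among four pupils over two hours.
times = [0, 20, 40, 3600, 3620]
src = [0, 0, 1, 2, 0]
dst = [1, 1, 2, 3, 3]
As_toy = contacts_to_adjacency(times, src, dst, n_nodes=4, window=3600)
print(As_toy.shape)
```

The window length trades temporal resolution against sparsity: shorter windows separate class time from lunchtime more sharply but leave fewer contacts in each matrix.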

Returns:

A tuple containing:
  • As (np.ndarray): Adjacency matrices for each time window.

  • node_labels (np.ndarray): Labels for each node at each time window.

Return type:

tuple

data.flight module

data.flight.flight_processing.get_flight_data()[source]

The OpenSky dataset tracks the number of flights (edges) between airports (nodes) in each month from the start of 2019 to the end of 2021 (Olive et al., 2022).

The task is to predict the country of a given (European-only) airport. The network exhibits seasonal and periodic patterns and features a structural change when the COVID-19 pandemic hit Europe around March 2020.
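The monthly windowing can be made concrete with a short sketch. It assumes that time windows are indexed from January 2019 and that flights are counted as directed origin-to-destination edges; neither is confirmed by the documentation above, and the helper `month_index` and the toy records are hypothetical.

```python
import numpy as np


def month_index(year, month, start_year=2019):
    """Zero-based index of a calendar month, counting from January of start_year."""
    return (year - start_year) * 12 + (month - 1)


# January 2019 to December 2021 inclusive.
n_months = month_index(2021, 12) + 1
# Window in which the structural change is expected (March 2020).
covid_idx = month_index(2020, 3)

# Toy flight records: (year, month, origin airport id, destination airport id).
flights = [
    (2019, 1, 0, 1),
    (2019, 1, 0, 1),
    (2020, 3, 1, 2),
    (2021, 12, 2, 0),
]
n_airports = 3

# One weighted adjacency matrix per month; entries are flight counts.
As_toy = np.zeros((n_months, n_airports, n_airports), dtype=int)
for year, month, origin, destination in flights:
    As_toy[month_index(year, month), origin, destination] += 1

print(n_months, covid_idx)
```

Under these assumptions the series has 36 monthly windows, with March 2020 at index 14.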

Returns:

A tuple containing:
  • As_euro (np.ndarray): Adjacency matrices for each time window, filtered for European airports.

  • node_labels (np.ndarray): Labels for each node at each time window.

Return type:

tuple