<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.5">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2026-02-17T14:02:53+00:00</updated><id>/feed.xml</id><title type="html">Network Measurements</title><subtitle></subtitle><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><entry><title type="html">Michele Ferrero joins the group</title><link href="/2025/04/01/Michele.html" rel="alternate" type="text/html" title="Michele Ferrero joins the group" /><published>2025-04-01T00:00:00+00:00</published><updated>2025-04-01T00:00:00+00:00</updated><id>/2025/04/01/Michele</id><content type="html" xml:base="/2025/04/01/Michele.html"><![CDATA[<p>Michele Ferrero is a PhD student funded by a CIFRE scholarship granted by Huawei Technologies Co., Ltd. in Paris, with the support of EURECOM (Prof. Roberto Morabito).</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Michele Ferrero is a PhD student funded by a CIFRE scholarship granted by Huawei Technologies Co., Ltd. in Paris, with the support of EURECOM (Prof. Roberto Morabito).]]></summary></entry><entry><title type="html">Wang Chao’s MITUNE paper accepted at ICLR 2025</title><link href="/2025/01/27/ICLR.html" rel="alternate" type="text/html" title="Wang Chao’s MITUNE paper accepted at ICLR 2025" /><published>2025-01-27T00:00:00+00:00</published><updated>2025-01-27T00:00:00+00:00</updated><id>/2025/01/27/ICLR</id><content type="html" xml:base="/2025/01/27/ICLR.html"><![CDATA[<p>Wang Chao’s paper titled <em>“Information Theoretic Text-to-Image Alignment”</em> will be presented at ICLR 2025</p>

<p>Congrats to Chao and the team.</p>

<p><em>Abstract:</em> Diffusion models for Text-to-Image (T2I) conditional generation have recently achieved tremendous success. Yet, aligning these models with users’ intentions still involves a laborious trial-and-error process, and this challenging alignment problem has attracted considerable attention from the research community. In this work, instead of relying on fine-grained linguistic analyses of prompts, human annotation, or auxiliary vision-language models, we use Mutual Information (MI) to guide model alignment. In brief, our method uses self-supervised fine-tuning and relies on a point-wise MI estimation between prompts and images to create a synthetic fine-tuning set for improving model alignment. Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI, and a simple fine-tuning strategy that improves alignment while maintaining image quality.</p>

<p>The paper pre-print is available here: <a href="https://arxiv.org/abs/2405.20759">MITUNE paper</a></p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Wang Chao’s paper titled “Information Theoretic Text-to-Image Alignment” will be presented at ICLR 2025]]></summary></entry><entry><title type="html">DUMBO accepted at Conext 2024</title><link href="/2024/01/15/Conext.html" rel="alternate" type="text/html" title="DUMBO accepted at Conext 2024" /><published>2024-01-15T00:00:00+00:00</published><updated>2024-01-15T00:00:00+00:00</updated><id>/2024/01/15/Conext</id><content type="html" xml:base="/2024/01/15/Conext.html"><![CDATA[<p>Our paper titled <em>“Taming the Elephants: Affordable Flow Length Prediction in the Data Plane”</em> will be presented at Conext 2024</p>

<p>Congrats to the team, especially Raphael, Andrea, and Gabriele.</p>

<p><em>Abstract:</em> Machine Learning (ML) shows promising potential for enhancing networking tasks. In particular, early flow size prediction would be beneficial for a wide range of use cases. However, implementing an ML-enabled system is a challenging task due to network devices’ limited resources. Previous works have demonstrated the feasibility of running simple ML models in the data plane, yet their integration in a practical end-to-end system is not trivial. Additional challenges in resource management and model maintenance need to be addressed to ensure that the performance improvement on the network task(s) justifies the system overhead. In this work, we propose DUMBO, a versatile end-to-end system to generate and exploit flow size hints at line rate. Our system seamlessly integrates and maintains a simple ML model that offers early coarse-grained flow size prediction in the data plane. We evaluate the proposed system on flow scheduling, per-flow packet inter-arrival time distribution, and flow size estimation using real traffic traces, and perform experiments using an FPGA prototype running on an AMD(R)-Xilinx(R) Alveo U280 SmartNIC. Our results show that DUMBO outperforms traditional state-of-the-art approaches by equipping network devices’ data planes with a lightweight ML model.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our paper titled “Taming the Elephants: Affordable Flow Length Prediction in the Data Plane” will be presented at Conext 2024 Congrats to the team, especially Raphael, Andrea, and Gabriele. Abstract: Machine Learning (ML) shows promising potential for enhancing networking tasks. In particular, early flow size prediction would be beneficial for a wide range of use cases. However, implementing an ML-enabled system is a challenging task due to network devices’ limited resources.
Previous works have demonstrated the feasibility of running simple ML models in the data plane, yet their integration in a practical end-to-end system is not trivial. Additional challenges in resource management and model maintenance need to be addressed to ensure that the performance improvement on the network task(s) justifies the system overhead. In this work, we propose DUMBO, a versatile end-to-end system to generate and exploit flow size hints at line rate. Our system seamlessly integrates and maintains a simple ML model that offers early coarse-grained flow size prediction in the data plane. We evaluate the proposed system on flow scheduling, per-flow packet inter-arrival time distribution, and flow size estimation using real traffic traces, and perform experiments using an FPGA prototype running on an AMD(R)-Xilinx(R) Alveo U280 SmartNIC. Our results show that DUMBO outperforms traditional state-of-the-art approaches by equipping network devices’ data planes with a lightweight ML model.]]></summary></entry><entry><title type="html">SPADA accepted at Conext 2023</title><link href="/2023/10/18/Conext.html" rel="alternate" type="text/html" title="SPADA accepted at Conext 2023" /><published>2023-10-18T00:00:00+00:00</published><updated>2023-10-18T00:00:00+00:00</updated><id>/2023/10/18/Conext</id><content type="html" xml:base="/2023/10/18/Conext.html"><![CDATA[<p>Our abstract titled <em>“SPADA: A Sparse Approximate Data Structure representation for data plane per-flow monitoring”</em> will be presented at Conext 2023</p>

<p>Congrats to the team, especially Andrea, Raphael, and Gabriele.</p>

<p><em>Abstract:</em> Accurate per-flow monitoring is critical for precise network diagnosis, performance analysis, and network operation and management in general. However, the limited amount of memory available on modern programmable devices and the large number of active flows force practitioners to monitor only the most relevant flows with approximate data structures, limiting their view of network traffic. We argue that, due to the skewed nature of network traffic, such data structures are, in practice, heavily underutilized, i.e., sparse, thus wasting a significant amount of memory.</p>

<p>This paper proposes a Sparse Approximate Data Structure (SPADA) representation that leverages sparsity to reduce the memory footprint of per-flow monitoring systems in the data plane while preserving their original accuracy. The SPADA representation can be integrated into a generic per-flow monitoring system and is suitable for several measurement use cases. We prototype SPADA in P4 for a commercial FPGA target and test our approach with a custom simulator, which we make publicly available, on four real network traces over three different monitoring tasks. Our results show that SPADA achieves 2× to 11× memory footprint reduction with respect to the state-of-the-art while maintaining the same accuracy, or even improving it.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our abstract titled “SPADA: A Sparse Approximate Data Structure representation for data plane per-flow monitoring” will be presented at Conext 2023 Congrats to the team, especially Andrea, Raphael, and Gabriele. Abstract: Accurate per-flow monitoring is critical for precise network diagnosis, performance analysis, and network operation and management in general. However, the limited amount of memory available on modern programmable devices and the large number of active flows force practitioners to monitor only the most relevant flows with approximate data structures, limiting their view of network traffic. We argue that, due to the skewed nature of network traffic, such data structures are, in practice, heavily underutilized, i.e., sparse, thus wasting a significant amount of memory. This paper proposes a Sparse Approximate Data Structure (SPADA) representation that leverages sparsity to reduce the memory footprint of per-flow monitoring systems in the data plane while preserving their original accuracy.
The SPADA representation can be integrated into a generic per-flow monitoring system and is suitable for several measurement use cases. We prototype SPADA in P4 for a commercial FPGA target and test our approach with a custom simulator, which we make publicly available, on four real network traces over three different monitoring tasks. Our results show that SPADA achieves 2× to 11× memory footprint reduction with respect to the state-of-the-art while maintaining the same accuracy, or even improving it.]]></summary></entry><entry><title type="html">Preliminary work on generative data augmentation for traffic classification accepted at Conext SW 2023</title><link href="/2023/10/16/ConextSW.html" rel="alternate" type="text/html" title="Preliminary work on generative data augmentation for traffic classification accepted at Conext SW 2023" /><published>2023-10-16T00:00:00+00:00</published><updated>2023-10-16T00:00:00+00:00</updated><id>/2023/10/16/ConextSW</id><content type="html" xml:base="/2023/10/16/ConextSW.html"><![CDATA[<p>Our abstract titled <em>“Toward Generative Data Augmentation for Traffic Classification”</em> will be presented at Conext Student Workshop 2023</p>

<p>Congrats to Chao Wang.</p>

<p><em>Abstract:</em> Data Augmentation (DA)—augmenting training data with synthetic samples—is widely adopted in Computer Vision (CV) to improve model performance. Conversely, DA has not yet been popularized in networking use cases, including Traffic Classification (TC). In this work, we present a preliminary study of 14 hand-crafted DAs applied on the MIRAGE19 dataset. Our results (𝑖) show that DA can reap benefits previously unexplored in TC and (𝑖𝑖) foster a research agenda on the use of generative models to automate DA design.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our abstract titled “Toward Generative Data Augmentation for Traffic Classification” will be presented at Conext Student Workshop 2023 Congrats to Chao Wang. Abstract: Data Augmentation (DA)—augmenting training data with synthetic samples—is widely adopted in Computer Vision (CV) to improve model performance. Conversely, DA has not yet been popularized in networking use cases, including Traffic Classification (TC). In this work, we present a preliminary study of 14 hand-crafted DAs applied on the MIRAGE19 dataset. Our results (𝑖) show that DA can reap benefits previously unexplored in TC and (𝑖𝑖) foster a research agenda on the use of generative models to automate DA design.]]></summary></entry><entry><title type="html">Preliminary work on learned data structures presented at Conext SW 2022</title><link href="/2022/12/06/ConextSW.html" rel="alternate" type="text/html" title="Preliminary work on learned data structures presented at Conext SW 2022" /><published>2022-12-06T00:00:00+00:00</published><updated>2022-12-06T00:00:00+00:00</updated><id>/2022/12/06/ConextSW</id><content type="html" xml:base="/2022/12/06/ConextSW.html"><![CDATA[<p>Our abstract titled <em>“Learned data structures for per-flow measurements”</em> has been presented at Conext Student Workshop 2022</p>

<p><em>Abstract:</em> This work presents a generic framework that exploits learning to improve the quality of network measurements. The main idea is to reuse the measurements collected by network monitoring tasks to train an ML model that learns some per-flow characteristics and improves the measurement quality by re-configuring the memory according to the learned information. We apply this idea to two different monitoring tasks, identify the main issues related to this approach, and present some preliminary results.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our abstract titled “Learned data structures for per-flow measurements” has been presented at Conext Student Workshop 2022 Abstract: This work presents a generic framework that exploits learning to improve the quality of network measurements. The main idea is to reuse the measurements collected by network monitoring tasks to train an ML model that learns some per-flow characteristics and improves the measurement quality by re-configuring the memory according to the learned information. We apply this idea to two different monitoring tasks, identify the main issues related to this approach, and present some preliminary results.]]></summary></entry><entry><title type="html">Paper accepted at HotNets 2022</title><link href="/2022/09/21/HotNets.html" rel="alternate" type="text/html" title="Paper accepted at HotNets 2022" /><published>2022-09-21T00:00:00+00:00</published><updated>2022-09-21T00:00:00+00:00</updated><id>/2022/09/21/HotNets</id><content type="html" xml:base="/2022/09/21/HotNets.html"><![CDATA[<p>Our paper titled <em>“Towards a systematic multi-modal representation learning for network data”</em> has been accepted at HotNets 2022</p>

<p><em>Abstract:</em> Learning the right representations from complex input data is the key ability of successful machine learning (ML) models. The latter are often tailored to a specific data modality. For example, recurrent neural networks (RNNs) were designed with the processing of sequential data in mind, while convolutional neural networks (CNNs) were designed to exploit spatial correlation naturally present in images. Unlike computer vision (CV) and natural language processing (NLP), each of which targets a single well-defined modality, network ML problems often have a mixture of data modalities as input. Yet, instead of exploiting such abundance, practitioners tend to rely on sub-features thereof, reducing the problem to a single modality for the sake of simplicity.</p>

<p>In this paper, we advocate for exploiting all the modalities naturally present in network data. As a first step, we observe that network data systematically exhibits a mixture of quantities (e.g., measurements) and entities (e.g., IP addresses, names, etc.). Whereas the former are generally well exploited, the latter are often underused or poorly represented (e.g., with one-hot encoding). We propose to systematically leverage state-of-the-art embedding techniques to learn entity representations, whenever significant sequences of such entities are historically observed. Through two diverse use cases, we show that such entity encoding can benefit and naturally augment classic quantity-based features.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our paper titled “Towards a systematic multi-modal representation learning for network data” has been accepted at HotNets 2022 Abstract: Learning the right representations from complex input data is the key ability of successful machine learning (ML) models. The latter are often tailored to a specific data modality. For example, recurrent neural networks (RNNs) were designed with the processing of sequential data in mind, while convolutional neural networks (CNNs) were designed to exploit spatial correlation naturally present in images. Unlike computer vision (CV) and natural language processing (NLP), each of which targets a single well-defined modality, network ML problems often have a mixture of data modalities as input. Yet, instead of exploiting such abundance, practitioners tend to rely on sub-features thereof, reducing the problem to a single modality for the sake of simplicity. In this paper, we advocate for exploiting all the modalities naturally present in network data.
As a first step, we observe that network data systematically exhibits a mixture of quantities (e.g., measurements) and entities (e.g., IP addresses, names, etc.). Whereas the former are generally well exploited, the latter are often underused or poorly represented (e.g., with one-hot encoding). We propose to systematically leverage state-of-the-art embedding techniques to learn entity representations, whenever significant sequences of such entities are historically observed. Through two diverse use cases, we show that such entity encoding can benefit and naturally augment classic quantity-based features.]]></summary></entry><entry><title type="html">Paper accepted at INFOCOM 2022</title><link href="/2021/12/03/INFOCOM2022.html" rel="alternate" type="text/html" title="Paper accepted at INFOCOM 2022" /><published>2021-12-03T00:00:00+00:00</published><updated>2021-12-03T00:00:00+00:00</updated><id>/2021/12/03/INFOCOM2022</id><content type="html" xml:base="/2021/12/03/INFOCOM2022.html"><![CDATA[<p>Our paper titled <em>“Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching”</em> will be presented at <a href="https://infocom2022.ieee-infocom.org/">INFOCOM 2022</a>. A preliminary version of the paper is available <a href="https://arxiv.org/abs/2112.06671">here</a>, and the full version <a href="https://gallomassimo.github.io/docs/2022Infocom.pdf">here</a>.</p>

<p><em>Abstract:</em> While Deep Learning (DL) technologies are a promising tool to solve networking problems that map to classification tasks, their computational complexity is still too high with respect to real-time traffic measurement requirements. To reduce the DL inference cost, we propose a novel caching paradigm, which we name approximate-key caching, that returns approximate results for lookups of selected inputs based on cached DL inference results. While approximate cache hits alleviate the DL inference workload and increase the system throughput, they also introduce an approximation error. As such, we couple approximate-key caching with a principled error-correction algorithm, which we name auto-refresh. We analytically model the performance of our caching system for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our proposed approach with state-of-the-art similarity caching – testifying to the practical interest of our proposal.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our paper titled “Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching” will be presented at INFOCOM 2022. A preliminary version of the paper is available here, and the full version here.]]></summary></entry><entry><title type="html">Hiring at Huawei Paris, Fall 2021</title><link href="/2021/09/01/Hiring.html" rel="alternate" type="text/html" title="Hiring at Huawei Paris, Fall 2021" /><published>2021-09-01T00:00:00+00:00</published><updated>2021-09-01T00:00:00+00:00</updated><id>/2021/09/01/Hiring</id><content type="html" xml:base="/2021/09/01/Hiring.html"><![CDATA[<p>My group has an opening for a permanent researcher position:</p>

<p>The Network Measurements research team of the Mathematical and Algorithmic Sciences Lab is looking for candidates for a permanent research position on performance analysis, advanced data structures, and network programmability, applied in the context of network measurements. The opening is in the Huawei Research Center, located in the Paris area. The position focuses on developing novel algorithms and mechanisms and/or providing accurate models for understanding and improving network measurement efficiency.</p>

<p>The position has been filled. Stay tuned for more.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[My group has an opening for a permanent researcher position:]]></summary></entry><entry><title type="html">Paper accepted at SEC 2021</title><link href="/2021/04/03/SEC2021.html" rel="alternate" type="text/html" title="Paper accepted at SEC 2021" /><published>2021-04-03T00:00:00+00:00</published><updated>2021-04-03T00:00:00+00:00</updated><id>/2021/04/03/SEC2021</id><content type="html" xml:base="/2021/04/03/SEC2021.html"><![CDATA[<p>Our paper titled “FENXI: Fast in-network analytics” will be presented at <a href="http://acm-ieee-sec.org/2021/">SEC 2021</a>. A preliminary version of the paper is available <a href="https://arxiv.org/abs/2105.11738">here</a>, and the full version <a href="https://gallomassimo.github.io/docs/2021sec.pdf">here</a>.</p>

<p><em>Abstract:</em> Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by the scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators, i.e., the Tensor Processing Unit (TPU), offers the opportunity to enhance the processing capabilities of network devices at the edge. Yet, to date, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data plane without interfering with network operations.
In this paper, we present FENXI, a system to run complex analytics by leveraging the TPU. The design of FENXI decouples forwarding operations from traffic analytics, which operate at different granularities, i.e., packet and flow level. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and design data structures to extract flow-level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line-rate traffic processing while requiring only limited resources, and also dynamically adapts to variable network conditions.</p>]]></content><author><name>Network Measurements</name><email>netmeasurements.hwp@gmail.com</email></author><summary type="html"><![CDATA[Our paper titled “FENXI: Fast in-network analytics” will be presented at SEC 2021. A preliminary version of the paper is available here, and the full version here.]]></summary></entry></feed>