Hi there!

I graduated in Aerospace Engineering from the Technical University of Lisbon and worked in guidance, navigation, and control systems before pursuing my interests in machine learning.

Currently, I work as an Advanced Research Data Scientist at Feedzai, where I've spent the past six years tackling problems in fraud detection and anti-money laundering. I have worked on a range of topics including decision theory, rule systems, and deep-learning architectures for structured transaction data. Recently, my research has focused on RNNs and state-space models, as well as dynamic graph models.

Outside of work, I enjoy building personal projects. Lately I've been interested in generative models for discrete and mixed-type structured data.

Personal Projects

🌳 NRGBoost

Tree-based generative modeling for tabular data.

🌱 seed = 123

My blog about machine learning

Selected Publications

ICLR 2025

NRGBoost: Energy-Based Generative Boosted Trees

João Bravo

Abstract

Despite the rise to dominance of deep learning in unstructured data domains, tree-based methods such as Random Forests (RF) and Gradient Boosted Decision Trees (GBDT) are still the workhorses for handling discriminative tasks on tabular data. We explore generative extensions of these popular algorithms with a focus on explicitly modeling the data density (up to a normalization constant), thus enabling other applications besides sampling. As our main contribution, we propose an energy-based generative boosting algorithm that is analogous to the second-order boosting implemented in popular libraries like XGBoost. We show that, despite producing a generative model capable of handling inference tasks over any input variable, our proposed algorithm can achieve similar discriminative performance to GBDT on a number of real-world tabular datasets, outperforming alternative generative approaches. At the same time, we show that it is also competitive with neural-network-based models for sampling.

TMLR 2024

Mind the truncation gap: challenges of learning on dynamic graphs with recurrent architectures

João Bravo, Jacopo Bono, Pedro Saleiro, Hugo Ferreira, Pedro Bizarro

Abstract

Systems characterized by evolving interactions, prevalent in social, financial, and biological domains, are effectively modeled as continuous-time dynamic graphs (CTDGs). To manage the scale and complexity of these graph datasets, machine learning (ML) approaches have become essential. However, CTDGs pose challenges for ML because traditional static graph methods do not naturally account for event timings. Newer approaches, such as graph recurrent neural networks (GRNNs), are inherently time-aware and offer advantages over static methods for CTDGs. However, GRNNs face another issue: the short truncation of backpropagation-through-time (BPTT), whose impact has not been properly examined until now. In this work, we demonstrate that this truncation can limit the learning of dependencies beyond a single hop, resulting in reduced performance. Through experiments on a novel synthetic task and real-world datasets, we reveal a performance gap between full backpropagation-through-time (F-BPTT) and the truncated backpropagation-through-time (T-BPTT) commonly used to train GRNN models. We term this gap the "truncation gap" and argue that understanding and addressing it is essential as the importance of CTDGs grows, discussing potential future directions for research in this area.

Honors and Awards

Top reviewer at ICML

2025

Top reviewer at NeurIPS

2024

Winner of both tracks of the NeurIPS Learning By Doing competition

In each track, participants had to find controls or policies that interact optimally with a target dynamical system, using only historical interaction data.

2021

Merit Scholarship from the University of Lisbon

Awarded for the best GPA in each of these academic years among all Aerospace students in the same Bologna cycle.

2007, 2008, 2010