In recent years, the Optimal Transport (OT) based Gromov-Wasserstein (GW) divergence has been investigated as a similarity measure between structured data such as graphs, viewed as distributions that typically lie in different metric spaces. In this talk, we discuss the optimization problem inherent in the computation of GW and some of its recent extensions, such as the Entropic, Fused, and semi-relaxed GW divergences. We then illustrate how these OT problems can be used in machine learning applications to learn graph representations for graph compression, clustering, classification, and structured prediction.
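To make the underlying optimization problem concrete, here is a minimal sketch (not from the talk) of the GW objective with squared loss: given two intra-space distance matrices C1 and C2 and a coupling T between the two distributions, the objective sums (C1[i][k] - C2[j][l])^2 weighted by T[i][j]*T[k][l]. The function name `gw_loss` and the toy matrices below are illustrative; dedicated solvers such as those in the POT library optimize this objective over couplings.

```python
def gw_loss(C1, C2, T):
    """Gromov-Wasserstein objective with squared loss:
    sum over i, k, j, l of (C1[i][k] - C2[j][l])**2 * T[i][j] * T[k][l],
    where C1 (n x n) and C2 (m x m) are intra-space distance matrices
    and T (n x m) is a coupling between the two distributions.
    Illustrative evaluation only; GW solvers minimize this over T."""
    n, m = len(C1), len(C2)
    return sum(
        (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
        for i in range(n) for k in range(n)
        for j in range(m) for l in range(m)
    )

# Two 2-point metric spaces and a diagonal coupling (uniform weights 1/2).
C1 = [[0.0, 1.0], [1.0, 0.0]]   # space 1: two points at distance 1
C2 = [[0.0, 2.0], [2.0, 0.0]]   # space 2: two points at distance 2
T = [[0.5, 0.0], [0.0, 0.5]]    # couple point i in space 1 to point i in space 2

print(gw_loss(C1, C1, T))  # identical spaces: objective is 0
print(gw_loss(C1, C2, T))  # mismatched distances: objective is positive
```

Because GW only compares distances *within* each space, it is invariant to isometries (e.g. relabeling the points of either space leaves the objective unchanged), which is what makes it suitable for comparing graphs living in different spaces.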