This paper discusses the importance of detecting breaking events in real time to support emergency response workers, and how social media can serve as a source of large data volumes that can be processed quickly. Most event detection techniques focus on either images or text, but combining the two modalities can improve performance. The authors present lessons learned from the Flood-related Multimedia Task at MediaEval 2020, provide a dataset for reproducibility, and propose a new multimodal fusion method that uses Graph Neural Networks to combine image, text, and time information. Their method outperforms state-of-the-art approaches and remains effective when labelled samples are scarce.
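To make the fusion idea concrete, below is a minimal PyTorch sketch of GNN-based multimodal fusion, not the authors' actual architecture: each modality embedding (image, text, time) becomes a node in a small fully connected graph, one graph-convolution step lets modalities exchange information, and pooled node features feed a binary flood classifier. The class name, embedding dimensions, single-layer design, and classifier head are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultimodalGNNFusion(nn.Module):
    """Hypothetical sketch: fuse image, text, and time embeddings via a GNN."""

    def __init__(self, img_dim=2048, txt_dim=768, time_dim=16, hidden=256):
        super().__init__()
        # Project each modality into a shared node-feature space.
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        self.time_proj = nn.Linear(time_dim, hidden)
        # One graph-convolution step: a linear map over neighbour-averaged features.
        self.gcn_weight = nn.Linear(hidden, hidden)
        self.classifier = nn.Linear(hidden, 2)  # flood vs. not flood

    def forward(self, img_feat, txt_feat, time_feat):
        # Stack the three modality embeddings as graph nodes: (batch, 3, hidden).
        nodes = torch.stack(
            [self.img_proj(img_feat),
             self.txt_proj(txt_feat),
             self.time_proj(time_feat)], dim=1)
        # Fully connected graph with self-loops; the normalised adjacency
        # reduces to a uniform average over the three nodes.
        adj = torch.full((3, 3), 1.0 / 3, device=nodes.device)
        nodes = torch.relu(self.gcn_weight(adj @ nodes))
        # Mean-pool the updated node features into one fused representation.
        fused = nodes.mean(dim=1)
        return self.classifier(fused)


# Usage with a dummy batch of 4 samples:
model = MultimodalGNNFusion()
logits = model(torch.randn(4, 2048), torch.randn(4, 768), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```

The graph structure is what distinguishes this from simple feature concatenation: each modality's representation is updated in the context of the others before pooling, which is one plausible way a GNN can mediate cross-modal interactions.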