Adel Bibi is a senior researcher in machine learning and computer vision at the Department of Engineering Science of the University of Oxford, a Research Fellow (JRF) at Kellogg College, and a member of the ELLIS Society. Bibi is an R&D Distinguished Advisor with Softserve. Previously, Bibi was a senior research associate and a postdoctoral researcher with Philip H.S. Torr since October 2020. He received his MSc and PhD degrees from King Abdullah University of Science & Technology (KAUST) in 2016 and 2020, respectively, advised by Bernard Ghanem. Bibi was awarded an Amazon Research Award in 2022 in the Machine Learning Algorithms and Theory track in addition to being awarded the Google Gemma 2 Academic Award in 2024. Bibi received four best paper awards; a NeurIPS23 workshop, an ICML23 workshop, a 2022 CVPR workshop, and one at the Optimization and Big Data Conference in 2018. His contributions include over 30 papers published in top machine learning and computer vision conferences. He also received four outstanding reviewer awards (CVPR18, CVPR19, ICCV19, ICLR22) and a Notable Area Chair Award in NeurIPS23.
Currently, Bibi is leading a group in Oxford focusing on the intersection between AI safety of large foundational models in both vision and language (covering topics such as robustness, certification, alignment, adversarial elicitation, etc.) and the efficient continual update of these models.
Download my resume
[Note!] I am always looking for strong self-motivated PhD students. If you are interested in AI Safety, Trustworthy, and Security of AI models and Agentic AI, reach out!
[Consulting Expertise] I have consulted in the past on projects spanning core machine learning and data science, computer vision, certification and AI safety, optimization formulations for matching and resource allocation problems, among other areas.
PhD in Electrical Engineering (4.0/4.0); Machine Learning and Optimization Track, 2020
King Abdullah University of Science and Technology (KAUST)
MSc in Electrical Engineering (4.0/4.0); Computer Vision Track, 2016
King Abdullah University of Science and Technology (KAUST)
BSc in Electrical Engineering (3.99/4.0), 2014
Kuwait University
~~ End of 2023 ~~
~~ End of 2022 ~~
~~ End of 2021 ~~
~~ End of 2020 ~~
~~ End of 2019 ~~
~~ End of 2018 ~~
~~ End of 2017 ~~
~~ End of 2016 ~~
~~ End of 2015 ~~
Recent research shows that fine-tuning on benign instruction-following data can inadvertently undo the safety alignment process and increase a model’s propensity to comply with harmful queries. While instruction-following fine-tuning is important, task-specific fine-tuning - where models are trained on datasets with clear ground truth answers (e.g., multiple choice questions) - can enhance model performance on specialized downstream tasks. Understanding and mitigating safety risks in the task-specific setting remains distinct from the instruction-following context due to structural differences in the data. Our work demonstrates how malicious actors can subtly manipulate the structure of almost any task-specific dataset to foster significantly more dangerous model behaviors, while maintaining an appearance of innocuity and reasonable downstream task performance. To address this issue, we propose a novel mitigation strategy that mixes in safety data which mimics the task format and prompting style of the user data, showing this is significantly more effective and efficient than existing baselines at re-establishing safety alignment while maintaining similar task performance.
Since neural classifiers are known to be sensitive to adversarial perturbations that alter their accuracy, \textit{certification methods} have been developed to provide provable guarantees on the insensitivity of their predictions to such perturbations. Furthermore, in safety-critical applications, the frequentist interpretation of the confidence of a classifier (also known as model calibration) can be of utmost importance. This property can be measured via the Brier score or the expected calibration error. We show that attacks can significantly harm calibration, and thus propose certified calibration as worst-case bounds on calibration under adversarial perturbations. Specifically, we produce analytic bounds for the Brier score and approximate bounds via the solution of a mixed-integer program on the expected calibration error. Finally, we propose novel calibration attacks and demonstrate how they can improve model calibration through adversarial calibration training.