Reducing hurdles to clinical trials without compromising the therapeutic promise of peptide candidates is an essential step in peptide-based drug design. Machine learning models provide an informed and efficient strategy for creating novel peptide and protein sequences with desired profiles. However, most models predict or generate novel peptides and proteins from sequence representations alone and lack structural information. It remains unclear what impact structural factors have on biological prediction or sequence generation.1-3
Antimicrobial peptides (AMPs) are a rich and structurally diverse class of sequences with potential applications against human, livestock and crop infections. Machine learning pipelines combine predictive and generative models to design optimal AMP sequences rationally. Here, we present different approaches to detect sequence or structural bias early in the design process.4,5 For example, we recently benchmarked four protein structure predictors to estimate the structural landscape of medium-to-large training sets.5 Using the best-performing predictor, we evaluated 13 state-of-the-art AMP predictive models and showed that their results are sensitive to structural class imbalance. Current efforts focus on mitigating these imbalances to build fairer and more generalizable models for discovering and designing safe AMPs.