Repeat Proteins


Tandem Repeat Proteins (TRPs) are proteins characterized by repetitive amino acid sequences, and they have been the subject of extensive research for over two decades. Various authors have attributed distinct features related to sequence, structure, function, and evolution to these proteins. However, many prominent characteristics become apparent only when examining specific subclasses of tandem repeats.
Our research is focused on a particular subset of repeat proteins that we have defined as Structured Tandem Repeat Proteins (STRPs). STRPs are tandem repeat proteins whose structures can be determined through structural biology techniques such as X-ray crystallography or electron microscopy, or predicted accurately using advanced methods like AlphaFold (Jumper et al., 2021).
These proteins exhibit clear secondary structure propensities and form regular tertiary structures, which can be components of large molecular assemblies. Typically, STRPs fall into Classes III (elongated) and IV (closed) of Kajava’s classification of protein tandem repeats.
Their repeat units do not fold independently of one another, but beyond this shared characteristic their folding patterns cannot be generalized. Consequently, there is little room for flexibility or intrinsic disorder, which is limited to the partial folding of certain units, as seen in ankyrin repeat proteins (e.g., IκBα). In terms of sequence, STRPs have repeat units of at least five residues with high sequence complexity.

STRPs can be highly degenerate in sequence while maintaining a similar structure and exhibit a variable number of repeat units, indicating a decoupling between structural size and protein function.
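
The sequence criteria mentioned above (repeat units of at least five residues with high sequence complexity) can be illustrated with a minimal Python sketch. It uses Shannon entropy as a stand-in measure of complexity. The entropy cutoff, the function names, and the example strings are assumptions made for illustration only; they are not taken from RepeatsDB or from any published definition of STRPs.

```python
import math
from collections import Counter

MIN_UNIT_LENGTH = 5      # from the text: repeat units of at least five residues
MIN_ENTROPY_BITS = 2.0   # hypothetical cutoff for "high" complexity (assumption)


def shannon_entropy(sequence: str) -> float:
    """Return the Shannon entropy (bits per residue) of a sequence."""
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_sequence_criteria(unit: str) -> bool:
    """Check unit length and complexity against the illustrative cutoffs."""
    return (
        len(unit) >= MIN_UNIT_LENGTH
        and shannon_entropy(unit) >= MIN_ENTROPY_BITS
    )


if __name__ == "__main__":
    # Arbitrary strings, not real repeat units.
    for unit in ("QQQQQQQQ", "ACDE", "ACDEFGH"):
        print(unit, round(shannon_entropy(unit), 2), passes_sequence_criteria(unit))
```

A homopolymeric stretch has zero entropy and fails the complexity check, while a very short unit fails on length alone. A real detection method would also need structural information, since STRPs can be highly degenerate in sequence.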

Our main aim is to understand the function and evolution of STRPs by applying computational approaches to detect and classify them. To address this challenge, we developed RepeatsDB, a comprehensive database that provides manually curated annotations and predictions of STRPs from the Protein Data Bank (PDB) and AlphaFoldDB, as well as their classification into Class, Topology, Fold, and Family according to Kajava’s classification.
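
As a rough illustration of the four-level hierarchy (Class, Topology, Fold, Family), the Python sketch below models a classification as a small immutable record. The dotted label format, the field names, and the `parse` helper are hypothetical and are not the RepeatsDB API or data model. Only the two class names stated in the text (III, elongated; IV, closed) are included.

```python
from dataclasses import dataclass
from typing import Optional

# Class names taken from the text; other classes are deliberately omitted.
TYPICAL_STRP_CLASSES = {3: "elongated", 4: "closed"}


@dataclass(frozen=True)
class Classification:
    """Hypothetical record for a Class/Topology/Fold/Family assignment."""
    class_id: int
    topology_id: Optional[int] = None
    fold_id: Optional[int] = None
    family_id: Optional[int] = None

    @classmethod
    def parse(cls, label: str) -> "Classification":
        """Parse an assumed dotted label such as '4.1' (Class 4, Topology 1)."""
        parts = [int(p) for p in label.split(".")]
        if not 1 <= len(parts) <= 4:
            raise ValueError(f"expected 1 to 4 levels, got {len(parts)}")
        return cls(*parts)

    def is_typical_strp_class(self) -> bool:
        """True for Classes III and IV, where STRPs typically fall."""
        return self.class_id in TYPICAL_STRP_CLASSES


if __name__ == "__main__":
    c = Classification.parse("4.1")
    print(c, TYPICAL_STRP_CLASSES.get(c.class_id), c.is_typical_strp_class())
```

Keeping the deeper levels optional lets the same record describe an entry that has only been assigned a class, as well as one classified down to the family level.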