Data Resources from the MMseqs2 Family
Our data resources in one place.
Tara Oceans metagenomic datasets assembled with MEGAHIT. Proteins predicted with MetaEuk.
Proteins predicted with Prodigal from >1800 metagenomes and >400 metatranscriptomes. Clustered to 50% and 95% sequence identity with Linclust.
Clustered to 30%, 50% and 90% sequence identity with MMseqs2 and Linclust.
SRC: 2B proteins from soil metagenomics. MERC: 300M proteins from marine metatranscriptomics. Assembled with Plass.
Over 300M HMMs containing over 2.6B proteins for HHblits. Clustered with MMseqs2 and Linclust.
Built with MMseqs2 and HH-suite3.
Environmental protein databases built for ColabFold, an accessible AlphaFold2 implementation.
Mirdita, Schütze, Moriwaki, Lim, Ovchinnikov, Steinegger, Nature Methods, 2022
Compressed protein structure databases for Foldcomp: a library and format for compressing and indexing large protein structure sets.