Benchmark of Species196-L


Supervised-learning Benchmark

The slash (/) in the table separates two results: the value before the slash is the accuracy when the model is trained from scratch, and the value after the slash is the accuracy when ImageNet-1K pretrained weights are loaded. Marked entries denote models pretrained with ImageNet-22K. A minimal sketch of how the reported metrics can be computed is given after the table.

| Model | Resolution | # Params | # FLOPs | Top-1 ACC | Top-5 ACC | F1-Macro |
|---|---|---|---|---|---|---|
| MobileViT-XS | 224x224 | 2.3 M | 0.7 G | 64.11 / 78.55 | 83.51 / 91.92 | 53.52 / 69.01 |
| GhostNet 1.0 | 224x224 | 5.2 M | 0.1 G | 62.75 / 76.02 | 82.58 / 90.77 | 51.30 / 64.93 |
| EfficientNet-B0 | 224x224 | 5.3 M | 0.4 G | 62.88 / 78.26 | 81.66 / 91.60 | 53.13 / 66.91 |
| MobileNetV3 Large 1.0 | 224x224 | 5.4 M | 0.2 G | 62.75 / 77.83 | 81.46 / 90.77 | 49.99 / 66.50 |
| RegNetY-4GF | 224x224 | 20.6 M | 4.0 G | 43.01 / 82.25 | 69.02 / 93.71 | 28.99 / 71.24 |
| DeiT-S | 224x224 | 22 M | 4.6 G | 36.89 / 77.21 | 56.79 / 91.52 | 29.35 / 65.25 |
| TNT-S | 224x224 | 23.8 M | 5.2 G | 38.66 / 80.67 | 59.14 / 93.17 | 30.67 / 69.34 |
| CMT-S | 224x224 | 25.1 M | 4.0 G | 40.86 / 81.12 | 60.10 / 93.32 | 33.25 / 70.40 |
| ResNet50 | 224x224 | 25.6 M | 4.1 G | 64.32 / 78.11 | 81.70 / 91.91 | 53.31 / 67.29 |
| Swin-T | 224x224 | 28 M | 4.5 G | 46.88 / 81.66 | 68.57 / 93.52 | 37.30 / 71.20 |
| ConvNeXt-T | 224x224 | 29 M | 4.5 G | 46.36 / 78.94 | 68.59 / 92.44 | 37.16 / 70.43 |
| MaxViT-T | 224x224 | 31 M | 5.6 G | 52.19 / 83.35 | 72.12 / 94.16 | 42.40 / 62.56 |
| MViTv2-B | 224x224 | 52 M | 10.2 G | 46.22 / 83.79 | 66.21 / 94.81 | 35.83 / 72.94 |
| ResNet200-D | 224x224 | 65 M | 26 G | 51.35 / 82.11 | 73.07 / 94.76 | 37.70 / 70.61 |
| ViT-B/32 | 224x224 | 86 M | 8.6 G | 32.59 / 74.68 | 53.76 / 89.76 | 25.20 / 63.38 |
| Swin-B | 224x224 | 88 M | 15.4 G | 48.72 / 82.88 | 69.71 / 94.30 | 39.28 / 72.04 |
| MetaFormer-2 | 384x384 | 81 M | - | 88.69 | - | - |
| TransFG | 224x224 | 85.8 M | - | 84.42 | - | - |
| IELT | 448x448 | 93.5 M | - | 81.92 | - | - |
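
The metrics above (Top-1/Top-5 accuracy and macro F1) can be reproduced from a model's raw predictions. The following is a minimal sketch, assuming the predictions are available as a NumPy logits array over the 196 categories and that scikit-learn is used for the macro F1; the official evaluation script may differ.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate(logits: np.ndarray, labels: np.ndarray, k: int = 5):
    """Top-1 / Top-k accuracy and macro F1 from raw class scores.

    logits: (N, 196) scores, labels: (N,) ground-truth class indices.
    Illustrative only; not the released evaluation code.
    """
    top1 = logits.argmax(axis=1)
    topk = np.argsort(logits, axis=1)[:, -k:]          # k highest-scoring classes per sample
    top1_acc = 100 * float((top1 == labels).mean())
    topk_acc = 100 * float((topk == labels[:, None]).any(axis=1).mean())
    f1_macro = 100 * f1_score(labels, top1, average="macro")
    return top1_acc, topk_acc, f1_macro

# Example with random scores for the 196 Species196-L categories
rng = np.random.default_rng(0)
print(evaluate(rng.standard_normal((32, 196)), rng.integers(0, 196, size=32)))
```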


Multimodal Large Models Benchmark

Leaderboard for the true-or-false questions of the Species196-L multimodal benchmark. All values are percentages; a scoring sketch is given after the table.

| Model | Phylum ACC | Phylum ACC+ | Class ACC | Class ACC+ | Order ACC | Order ACC+ | Family ACC | Family ACC+ | Genus ACC | Genus ACC+ | Species ACC | Species ACC+ | Avg ACC | Avg ACC+ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InstructBLIP | 59.0 | 19.2 | 64.9 | 31.0 | 60.5 | 24.1 | 54.3 | 12.2 | 47.7 | 15.5 | 50.25 | 17.0 | 56.1 | 19.8 |
| LLaVA | 50.0 | 0.0 | 50.0 | 0.0 | 50.0 | 0.1 | 50.1 | 0.3 | 50.5 | 6.2 | 53.8 | 12.1 | 50.7 | 3.1 |
| PandaGPT | 50.2 | 0.4 | 51.6 | 3.2 | 50.0 | 0.0 | 50.2 | 0.4 | 50.2 | 0.4 | 50.0 | 0.3 | 50.4 | 0.8 |
| mPLUG-Owl | 52.7 | 11.6 | 52.8 | 12.6 | 48.6 | 7.5 | 50.0 | 9.1 | 47.8 | 11.5 | 45.4 | 10.3 | 49.6 | 10.4 |
| Visual-GLM6B | 47.4 | 5.1 | 45.7 | 2.8 | 46.8 | 5.7 | 48.5 | 6.5 | 48.2 | 7.0 | 47.3 | 5.0 | 47.3 | 5.4 |
| Otter | 48.5 | 0.0 | 49.4 | 0.0 | 48.3 | 0.0 | 49.0 | 0.6 | 43.3 | 0.1 | 40.6 | 1.9 | 46.5 | 0.4 |
| Multimodal-GPT | 39.4 | 9.1 | 38.0 | 9.7 | 32.7 | 8.1 | 34.0 | 9.5 | 35.1 | 9.3 | 39.2 | 15.0 | 36.4 | 10.1 |
| MiniGPT4 | 22.4 | 7.7 | 23.4 | 7.0 | 24.1 | 7.1 | 23.5 | 8.0 | 20.2 | 6.3 | 22.5 | 8.4 | 22.7 | 7.4 |
| Blip2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
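
ACC is per-question accuracy on the yes/no answers, and ACC+ is a stricter score over question groups. The sketch below is an assumption-laden illustration: it uses a hypothetical record layout of (rank, image_id, predicted_answer, ground_truth) and treats ACC+ as the fraction of images for which every question at that rank is answered correctly (an MME-style convention); consult the paper for the exact definition.

```python
from collections import defaultdict

def score_true_false(records):
    """records: iterable of (rank, image_id, predicted_answer, ground_truth).
    Hypothetical layout; answers are the strings "yes" / "no"."""
    per_question = defaultdict(list)                    # rank -> [correct?]
    per_image = defaultdict(lambda: defaultdict(list))  # rank -> image_id -> [correct?]
    for rank, image_id, pred, gt in records:
        correct = pred.strip().lower() == gt.strip().lower()
        per_question[rank].append(correct)
        per_image[rank][image_id].append(correct)

    scores = {}
    for rank, results in per_question.items():
        acc = 100 * sum(results) / len(results)
        groups = per_image[rank].values()
        acc_plus = 100 * sum(all(g) for g in groups) / len(groups)  # every answer for an image correct
        scores[rank] = (round(acc, 1), round(acc_plus, 1))
    return scores

records = [
    ("Species", "img_001", "yes", "yes"),
    ("Species", "img_001", "no", "yes"),   # second question for the same image: wrong
    ("Genus",   "img_001", "no", "no"),
    ("Genus",   "img_001", "yes", "yes"),
]
print(score_true_false(records))  # {'Species': (50.0, 0.0), 'Genus': (100.0, 100.0)}
```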


Leaderboard for the multiple-choice questions of the Species196-L multimodal benchmark. All values are accuracy in percent; a scoring sketch is given after the table.

| Model | Phylum | Class | Order | Family | Genus | Species | Avg ACC |
|---|---|---|---|---|---|---|---|
| Multimodal-GPT | 51.8 | 71.6 | 60.6 | 56.6 | 57.9 | 63.2 | 60.3 |
| InstructBLIP | 47.8 | 58.7 | 56.3 | 57.5 | 45.3 | 39.8 | 50.9 |
| PandaGPT | 53.0 | 44.1 | 42.6 | 52.8 | 38.6 | 34.6 | 44.3 |
| mPLUG-Owl | 34.1 | 32.1 | 43.0 | 39.2 | 31.0 | 24.9 | 34.1 |
| MiniGPT4 | 28.6 | 32.7 | 32.1 | 28.2 | 29.9 | 32.7 | 30.7 |
| LLaVA | 38.1 | 34.2 | 17.3 | 33.4 | 22.2 | 23.4 | 28.1 |
| Blip2 | 26.7 | 30.3 | 23.3 | 27.9 | 23.9 | 24.5 | 26.1 |
| Visual-GLM6B | 23.0 | 12.2 | 13.9 | 30.5 | 15.7 | 11.6 | 17.8 |
| Otter | 0.0 | 6.8 | 20.8 | 8.3 | 6.7 | 0.3 | 7.15 |
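
For the multiple-choice questions, the score per taxonomic rank is plain accuracy over the selected options. A minimal scorer is sketched below; the record format and the `extract_choice` helper for pulling an option letter out of a free-form model response are hypothetical, and real model outputs typically need more robust parsing.

```python
import re
from collections import defaultdict

def extract_choice(response: str):
    """Return the first standalone option letter (A-D) in a free-form answer,
    or None. Hypothetical parser for illustration only."""
    match = re.search(r"\b([A-D])\b", response.upper())
    return match.group(1) if match else None

def score_multiple_choice(records):
    """records: iterable of (rank, model_response, correct_letter)."""
    per_rank = defaultdict(list)
    for rank, response, answer in records:
        per_rank[rank].append(extract_choice(response) == answer.upper())
    return {rank: round(100 * sum(r) / len(r), 1) for rank, r in per_rank.items()}

records = [
    ("Order",  "The answer is B.", "B"),
    ("Order",  "I think (C) fits best.", "B"),
    ("Family", "A", "A"),
]
print(score_multiple_choice(records))  # {'Order': 50.0, 'Family': 100.0}
```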