Benchmark of Species196-L
Supervised-learning Benchmark
For entries containing a slash (/), the value before the slash is the accuracy when training the model from scratch, and the value after the slash is the accuracy when ImageNet-1K pretrained weights are loaded. ✝ denotes models pretrained on ImageNet-22K. A minimal metric-computation sketch follows the table.
Model | Resolution | # Params | # FLOPs | Top-1 ACC(%) | Top-5 ACC(%) | F1-Macro(%) |
---|---|---|---|---|---|---|
MobileViT-XS | 224x224 | 2.3 M | 0.7 G | 64.11 / 78.55 | 83.51 / 91.92 | 53.52 / 69.01 |
GhostNet 1.0 | 224x224 | 5.2 M | 0.1 G | 62.75 / 76.02 | 82.58 / 90.77 | 51.30 / 64.93 |
EfficientNet-B0 | 224x224 | 5.3 M | 0.4 G | 62.88 / 78.26 | 81.66 / 91.60 | 53.13 / 66.91 |
MobileNetV3 Large 1.0 | 224x224 | 5.4 M | 0.2 G | 62.75 / 77.83 | 81.46 / 90.77 | 49.99 / 66.50 |
RegNetY-4GF | 224x224 | 20.6 M | 4.0 G | 43.01 / 82.25 | 69.02 / 93.71 | 28.99 / 71.24 |
DeiT-S | 224x224 | 22 M | 4.6 G | 36.89 / 77.21 | 56.79 / 91.52 | 29.35 / 65.25 |
TNT-S | 224x224 | 23.8 M | 5.2 G | 38.66 / 80.67 | 59.14 / 93.17 | 30.67 / 69.34 |
CMT-S | 224x224 | 25.1 M | 4.0 G | 40.86 / 81.12 | 60.10 / 93.32 | 33.25 / 70.40 |
ResNet-50 | 224x224 | 25.6 M | 4.1 G | 64.32 / 78.11 | 81.70 / 91.91 | 53.31 / 67.29 |
Swin-T | 224x224 | 28 M | 4.5 G | 46.88 / 81.66 | 68.57 / 93.52 | 37.30 / 71.20 |
ConvNeXt-T | 224x224 | 29 M | 4.5 G | 46.36 / 78.94 | 68.59 / 92.44 | 37.16 / 70.43 |
MaxViT-T | 224x224 | 31 M | 5.6 G | 52.19 / 83.35 | 72.12 / 94.16 | 42.40 / 62.56 |
MViTv2-B | 224x224 | 52 M | 10.2 G | 46.22 / 83.79 | 66.21 / 94.81 | 35.83 / 72.94 |
ResNet200-D | 224x224 | 65 M | 26 G | 51.35 / 82.11 | 73.07 / 94.76 | 37.70 / 70.61 |
ViT-B/32 | 224x224 | 86 M | 8.6 G | 32.59 / 74.68 | 53.76 / 89.76 | 25.20 / 63.38 |
Swin-B | 224x224 | 88 M | 15.4 G | 48.72 / 82.88 | 69.71 / 94.30 | 39.28 / 72.04 |
MetaFormer-2 ✝ | 384x384 | 81 M | - | 88.69 | - | - |
TransFG ✝ | 224x224 | 85.8 M | - | 84.42 | - | - |
IELT ✝ | 448x448 | 93.5 M | - | 81.92 | - | - |
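The three reported metrics can be reproduced with a standard evaluation loop. Below is a minimal, hedged sketch (PyTorch + timm + scikit-learn) of how Top-1, Top-5, and macro-F1 could be computed for a fine-tuned backbone; the dataset path `data/species196l/val`, the ImageFolder layout, and the choice of ResNet-50 are illustrative assumptions rather than the official evaluation code, and the fine-tuning step itself is omitted.

```python
# Hedged sketch: computing the three metrics reported above (Top-1, Top-5,
# F1-Macro) for a timm backbone. The dataset root "data/species196l/val"
# and the 196-class ImageFolder layout are assumptions, not the official loader.
import timm
import torch
from sklearn.metrics import f1_score
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def evaluate(model, loader, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model.eval().to(device)
    top1 = top5 = total = 0
    all_preds, all_targets = [], []
    with torch.no_grad():
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            logits = model(images)
            # Top-5: correct if the true label appears among the 5 highest logits.
            _, top5_idx = logits.topk(5, dim=1)
            top1 += (top5_idx[:, 0] == targets).sum().item()
            top5 += (top5_idx == targets.unsqueeze(1)).any(dim=1).sum().item()
            total += targets.size(0)
            all_preds += top5_idx[:, 0].cpu().tolist()
            all_targets += targets.cpu().tolist()
    return (100 * top1 / total,
            100 * top5 / total,
            100 * f1_score(all_targets, all_preds, average="macro"))

if __name__ == "__main__":
    # pretrained=True corresponds to the numbers after the slash (ImageNet-1K
    # weights); pretrained=False (followed by training) to the from-scratch
    # numbers before the slash.
    model = timm.create_model("resnet50", pretrained=True, num_classes=196)
    tfm = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    val = datasets.ImageFolder("data/species196l/val", transform=tfm)  # assumed path
    top1, top5, f1m = evaluate(model, DataLoader(val, batch_size=64, num_workers=4))
    print(f"Top-1 {top1:.2f} | Top-5 {top5:.2f} | F1-Macro {f1m:.2f}")
```

Swapping `pretrained=True` for `pretrained=False` and training the model before evaluation corresponds to the from-scratch numbers before each slash.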
Multimodal Large Models Benchmark
Leaderboard for the true-or-false questions of the Species196-L multimodal benchmark. A hedged scoring sketch follows the table.
Models | Phylum ACC(%) | Phylum ACC+(%) | Class ACC(%) | Class ACC+(%) | Order ACC(%) | Order ACC+(%) | Family ACC(%) | Family ACC+(%) | Genus ACC(%) | Genus ACC+(%) | Species ACC(%) | Species ACC+(%) | Avg ACC(%) | Avg ACC+(%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
InstructBLIP | 59.0 | 19.2 | 64.9 | 31.0 | 60.5 | 24.1 | 54.3 | 12.2 | 47.7 | 15.5 | 50.25 | 17.0 | 56.1 | 19.8 |
LLaVA | 50.0 | 0.0 | 50.0 | 0.0 | 50.0 | 0.1 | 50.1 | 0.3 | 50.5 | 6.2 | 53.8 | 12.1 | 50.7 | 3.1 |
PandaGPT | 50.2 | 0.4 | 51.6 | 3.2 | 50.0 | 0.0 | 50.2 | 0.4 | 50.2 | 0.4 | 50.0 | 0.3 | 50.4 | 0.8 |
mPLUG-Owl | 52.7 | 11.6 | 52.8 | 12.6 | 48.6 | 7.5 | 50.0 | 9.1 | 47.8 | 11.5 | 45.4 | 10.3 | 49.6 | 10.4 |
Visual-GLM6B | 47.4 | 5.1 | 45.7 | 2.8 | 46.8 | 5.7 | 48.5 | 6.5 | 48.2 | 7.0 | 47.3 | 5.0 | 47.3 | 5.4 |
Otter | 48.5 | 0.0 | 49.4 | 0.0 | 48.3 | 0.0 | 49.0 | 0.6 | 43.3 | 0.1 | 40.6 | 1.9 | 46.5 | 0.4 |
Multimodal-GPT | 39.4 | 9.1 | 38.0 | 9.7 | 32.7 | 8.1 | 34.0 | 9.5 | 35.1 | 9.3 | 39.2 | 15.0 | 36.4 | 10.1 |
MiniGPT4 | 22.4 | 7.7 | 23.4 | 7.0 | 24.1 | 7.1 | 23.5 | 8.0 | 20.2 | 6.3 | 22.5 | 8.4 | 22.7 | 7.4 |
Blip2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
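As a reading aid, here is a hedged sketch of how the ACC and ACC+ columns could be computed. It assumes the MME-style convention that ACC is per-question accuracy while ACC+ counts an image as correct only when both of its paired yes/no questions are answered correctly; the record format is hypothetical and this is not the benchmark's official scoring script.

```python
# Hedged sketch of ACC / ACC+ computation for the true-or-false split.
# Assumption (not confirmed by the table above): ACC is per-question accuracy,
# while ACC+ credits an image only when BOTH of its paired yes/no questions
# are answered correctly. The record format below is hypothetical.
from collections import defaultdict

def tf_scores(records):
    """records: list of dicts with keys
    'rank' (e.g. 'Species'), 'image_id', 'prediction' ('yes'/'no'), 'answer'."""
    per_q = defaultdict(list)                           # rank -> [question correct?]
    per_img = defaultdict(lambda: defaultdict(list))    # rank -> image -> [correct?]
    for r in records:
        ok = r["prediction"].strip().lower() == r["answer"].strip().lower()
        per_q[r["rank"]].append(ok)
        per_img[r["rank"]][r["image_id"]].append(ok)
    scores = {}
    for rank, flags in per_q.items():
        acc = 100 * sum(flags) / len(flags)
        imgs = per_img[rank]
        acc_plus = 100 * sum(all(v) for v in imgs.values()) / len(imgs)
        scores[rank] = (round(acc, 1), round(acc_plus, 1))
    return scores

# Example: one image at the Species rank with its two paired questions.
demo = [
    {"rank": "Species", "image_id": "img_001", "prediction": "yes", "answer": "yes"},
    {"rank": "Species", "image_id": "img_001", "prediction": "yes", "answer": "no"},
]
print(tf_scores(demo))   # {'Species': (50.0, 0.0)} -- one question right, so ACC+ = 0
```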
Leaderboard for the multiple-choice questions of the Species196-L multimodal benchmark. An answer-parsing sketch follows the table.
Models | Phylum ACC(%) | Class ACC(%) | Order ACC(%) | Family ACC(%) | Genus ACC(%) | Species ACC(%) | Avg ACC(%) |
---|---|---|---|---|---|---|---|
Multimodal-GPT | 51.8 | 71.6 | 60.6 | 56.6 | 57.9 | 63.2 | 60.3 |
InstructBLIP | 47.8 | 58.7 | 56.3 | 57.5 | 45.3 | 39.8 | 50.9 |
PandaGPT | 53.0 | 44.1 | 42.6 | 52.8 | 38.6 | 34.6 | 44.3 |
mPLUG-Owl | 34.1 | 32.1 | 43.0 | 39.2 | 31.0 | 24.9 | 34.1 |
MiniGPT4 | 28.6 | 32.7 | 32.1 | 28.2 | 29.9 | 32.7 | 30.7 |
LLaVA | 38.1 | 34.2 | 17.3 | 33.4 | 22.2 | 23.4 | 28.1 |
Blip2 | 26.7 | 30.3 | 23.3 | 27.9 | 23.9 | 24.5 | 26.1 |
Visual-GLM6B | 23.0 | 12.2 | 13.9 | 30.5 | 15.7 | 11.6 | 17.8 |
Otter | 0.0 | 6.8 | 20.8 | 8.3 | 6.7 | 0.3 | 7.15 |
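For the multiple-choice split, per-rank accuracy reduces to extracting the chosen option letter from each model reply and comparing it with the ground truth. The sketch below uses a simple regex-based parser; both the parsing rule and the record fields are illustrative assumptions, not the benchmark's official answer-extraction logic.

```python
# Hedged sketch for the multiple-choice split: extract the chosen option letter
# from a model's free-form reply and score per taxonomic rank. The regex rule
# and the record fields are assumptions for illustration only.
import re
from collections import defaultdict

OPTION = re.compile(r"\b([A-D])\b")

def parse_choice(reply: str):
    """Return the first standalone option letter A-D in the reply, if any."""
    m = OPTION.search(reply.upper())
    return m.group(1) if m else None

def mc_accuracy(records):
    """records: dicts with 'rank', 'reply' (model output), and 'answer' ('A'-'D')."""
    hit, tot = defaultdict(int), defaultdict(int)
    for r in records:
        tot[r["rank"]] += 1
        if parse_choice(r["reply"]) == r["answer"]:
            hit[r["rank"]] += 1
    return {rank: round(100 * hit[rank] / tot[rank], 1) for rank in tot}

demo = [
    {"rank": "Genus", "reply": "The correct option is B.", "answer": "B"},
    {"rank": "Genus", "reply": "I think the answer is (C).", "answer": "D"},
]
print(mc_accuracy(demo))   # {'Genus': 50.0}
```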