Geometry > Scale
#ai
#deeplearning
#machinelearning
#showdev
Bootstraptor
[P] Lila-E8: 40M Parameter Transformer Outperforms 60M Baselines via Geometric E8 Attention (0.37 Train Loss)
Bootstraptor ・ Feb 25