May 10, 2023
While recent releases of language models have emphasized the "large" in Large Language Models, most everyday NLP work uses smaller language models, finetuned on custom or task-specific datasets. In this post, I will show how to achieve fast finetuning performance on modern GPUs using tools like PyTorch 2.0’s torch.compile and FlashAttention.
Jan 20, 2023
Last weekend, the paper Growing Cosine Unit: A Novel Oscillatory Activation Function That Can Speedup Training and Reduce Parameters in Convolutional Neural Networks by Noel et al. surfaced on my social feed. The paper proposes a new oscillatory activation function, called the Growing Cosine Unit (GCU), which is supposed to outperform other activation functions such as SiLU, Mish, and ReLU. This immediately drew my attention, and I decided to see if I could replicate the results.
Aug 31, 2022
While working through Unit 3 of the Hugging Face Reinforcement Learning course, I grew impatient with how long the suggested DQN configuration took to finish training. I decided to investigate the lethargic performance and succeeded in increasing the training speed of Atari DQN agents by a factor of three to fourteen using EnvPool and a custom PyTorch GPU replay memory buffer.