Stop Blaming Your Data. Your BERT Fine-Tuning Strategy Is the Problem.
I Fine-Tuned BERT 47 Times Before I Realized I Was the Problem

Fine-tuning BERT looks simple on Hugging Face. Running it in production looks like a different universe.

Attempt number 47. Surely the learning rate is the only variable left to change.

It was 1:47 AM. The sprint demo was in six hours. I had a BERT model fine-tuned on our customer support ticket dataset. I'd done everything by the book. Pre-trained weights from bert-base-uncased. Hugging Face Transformers. AdamW optimizer. Learning rate […]
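For context, the "by the book" setup above looks roughly like the sketch below. It is an assumption-laden illustration, not the post's exact training script: the learning rate (2e-5 is a common default, the post's value is cut off above), batch, and label count are placeholders, and to keep it runnable offline it builds a tiny randomly initialized BERT via `BertConfig` where the post would call `BertForSequenceClassification.from_pretrained("bert-base-uncased")`.

```python
import torch
from torch.optim import AdamW
from transformers import BertConfig, BertForSequenceClassification

# Tiny random-init BERT so the sketch runs without downloading weights.
# The post's actual setup loads pre-trained weights instead:
#   model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
config = BertConfig(
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=3,  # placeholder: e.g. three support-ticket categories
)
model = BertForSequenceClassification(config)

# AdamW, as in the post; 2e-5 is a commonly used fine-tuning LR, not the post's.
optimizer = AdamW(model.parameters(), lr=2e-5)

# One dummy training step on a fake batch of token ids (batch of 4, length 16).
input_ids = torch.randint(0, config.vocab_size, (4, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, config.num_labels, (4,))

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()   # cross-entropy loss is computed because labels were passed
optimizer.step()
optimizer.zero_grad()
print(tuple(outputs.logits.shape))  # (batch_size, num_labels)
```

The point of the post is precisely that a script like this is the easy part; everything after it is where production fine-tuning falls apart.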