Enhancing Large Language Models: The Power of Continued Pre-Training

Ali Issa
2 min read · Nov 17, 2024



What is continued pre-training?

Pre-training an LLM does not have to start from randomly initialized weights. It can instead start from the weights of a model that has already been pre-trained on different data, so the new run builds on what the model learned earlier. This can be an efficient way to keep a model current and to adapt it to new domains without retraining from scratch, which matters whenever newer information has to be incorporated.

Experiment

A recent paper compared three training setups:
1. Regular pre-training on an initial dataset (D1).
2. Continued pre-training: further training the model already pre-trained on D1 on a new dataset (D2).
3. Retraining from scratch on the combined datasets (D1 + D2).
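
To make the difference between setups 2 and 3 concrete, the sketch below counts how many tokens each one has to process once a model trained on D1 already exists. The function name and the token counts are hypothetical (the post gives no dataset sizes), and the sketch assumes that training cost scales with the number of tokens processed.

```python
def training_token_budgets(d1_tokens: int, d2_tokens: int) -> dict[str, int]:
    """Tokens to process to obtain a model covering D1 and D2,
    assuming a model pre-trained on D1 is already available."""
    return {
        # Setup 2: reuse the D1 checkpoint and train on D2 only.
        "continued_pretraining": d2_tokens,
        # Setup 3: discard the checkpoint and train on everything again.
        "retrain_from_scratch": d1_tokens + d2_tokens,
    }


# Hypothetical sizes: two datasets of 300 billion tokens each.
budgets = training_token_budgets(300_000_000_000, 300_000_000_000)
ratio = budgets["retrain_from_scratch"] / budgets["continued_pretraining"]
print(f"Retraining processes {ratio:.1f}x as many tokens")  # 2.0x for equal sizes
```

Under these assumptions the ratio is 1 + D1/D2, so it is exactly 2 when both datasets are the same size and grows when D2 is smaller than D1.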

Key Findings

- Continued pre-training is about 2× cheaper than retraining from scratch on the combined data, yet achieves comparable results.
- Mixing a small fraction of the original data (as little as 0.5–5%) into the new data helps prevent catastrophic forgetting, keeping past knowledge intact.
- Re-warming and re-decaying the learning rate: continued pre-training starts with a new warmup phase followed by decay, mirroring the schedule used in the original pre-training. This helps keep training stable while the model adapts to the new data.
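
The last two findings can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the function names, the cosine shape of the decay, the learning-rate values, and the document-level sampling are all assumptions made for the example.

```python
import math
import random


def rewarmed_lr(step, total_steps, warmup_steps, max_lr, min_lr):
    """Learning rate for the continued run: linear warmup to max_lr,
    then cosine decay to min_lr (the cosine shape is an assumption)."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


def mix_with_replay(new_docs, old_docs, replay_fraction, seed=0):
    """Add enough randomly sampled old documents that they make up
    roughly replay_fraction of the final training mix."""
    rng = random.Random(seed)
    n_replay = round(len(new_docs) * replay_fraction / (1 - replay_fraction))
    n_replay = min(n_replay, len(old_docs))  # cannot replay more than exists
    mixed = list(new_docs) + rng.sample(list(old_docs), n_replay)
    rng.shuffle(mixed)
    return mixed


# Hypothetical run: 1,000 steps, 100 of them warmup.
for step in (0, 100, 1000):
    print(step, rewarmed_lr(step, 1000, 100, max_lr=3e-4, min_lr=3e-5))

# Hypothetical corpora: 950 new documents plus a 5% replay of old ones.
new_docs = [f"new-{i}" for i in range(950)]
old_docs = [f"old-{i}" for i in range(5000)]
mixed = mix_with_replay(new_docs, old_docs, replay_fraction=0.05)
print(len(mixed), sum(doc.startswith("old-") for doc in mixed))
```

In a real training loop the schedule would normally come from the training framework's own scheduler, and replay would be applied at the token or batch level rather than per document; the sketch only shows the shape of the two ideas.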

This summary was inspired by Sebastian Raschka, PhD, whose newsletter offers valuable insights presented in a clear, organised way. He also recently published a book titled Build a Large Language Model (From Scratch).

There's plenty to explore and learn, but taking small steps each day to absorb a bit of information can lead to significant progress over weeks and months.

If you like what you see, hit the follow button! You can also find me on LinkedIn, and we can follow each other there too. 😊
