ColossalAI

data	[example] add palm pytorch version (#2172 )	2022-12-22 10:15:34 +08:00
palm_pytorch	[misc] update pre-commit and run all files (#4752 )	2023-09-19 14:20:26 +08:00
README.md	[nfc] fix minor typo in README (#4846 )	2023-10-07 17:51:11 +08:00
requirements.txt	[example] add example requirement (#2345 )	2023-01-06 09:03:29 +08:00
run.sh	[bug] fix get_default_parser in examples (#4764 )	2023-09-21 10:42:25 +08:00
test_ci.sh	[bug] fix get_default_parser in examples (#4764 )	2023-09-21 10:42:25 +08:00
train.py	[npu] change device to accelerator api (#5239 )	2024-01-09 10:20:05 +08:00

README.md

PaLM - Pytorch

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways, in fewer than 200 lines of code.

When it was published, PaLM reported state-of-the-art results across a wide range of language benchmarks.

This implementation will not scale to that size. It is for educational purposes only, to show how simple the architecture really is.

Install

$ pip install PaLM-pytorch

Usage

import torch
from palm_pytorch import PaLM

palm = PaLM(
    num_tokens = 20000,
    dim = 512,
    depth = 12,
    heads = 8,
    dim_head = 64,
)

tokens = torch.randint(0, 20000, (1, 2048))
logits = palm(tokens) # (1, 2048, 20000)

The PaLM 540B configuration from the paper would be

palm = PaLM(
    num_tokens = 256000,
    dim = 18432,
    depth = 118,
    heads = 48,
    dim_head = 256
)
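As a rough sanity check on the configuration above, the parameter count can be estimated directly from the hyperparameters. The sketch below is not part of this repository; the helper name `palm_param_count` is hypothetical. It assumes the design choices described in the PaLM paper: multi-query attention (one shared key/value head), a SwiGLU feedforward with a 4x expansion, no bias terms, and input/output embeddings that share weights. The small layer-norm scale vectors are ignored.

```python
def palm_param_count(num_tokens, dim, depth, heads, dim_head, ff_mult=4):
    """Estimate the weight count of a PaLM-style decoder (assumptions in the text above)."""
    attn_inner = heads * dim_head

    # Attention: per-head queries, but a single shared key/value head (multi-query).
    q_proj = dim * attn_inner
    kv_proj = 2 * dim * dim_head
    out_proj = attn_inner * dim

    # SwiGLU feedforward: the input projection produces both the value and the gate.
    ff_inner = ff_mult * dim
    ff_in = dim * 2 * ff_inner
    ff_out = ff_inner * dim

    per_layer = q_proj + kv_proj + out_proj + ff_in + ff_out

    # Token embedding, assumed to be shared with the output projection.
    embedding = num_tokens * dim

    return depth * per_layer + embedding


total = palm_param_count(
    num_tokens=256000, dim=18432, depth=118, heads=48, dim_head=256
)
print(f"{total:,}")  # 540,354,281,472
```

Under these assumptions the estimate comes to roughly 540 billion weights, which matches the model's name.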

New API

We have ported our previous implementation of PaLM to the new Booster API, which offers a more flexible and efficient way to train your model and is easier to use. The updated training code is in train.py. We also provide a shell script, test_ci.sh, which runs through all of our plugins for the booster. For more information about the Booster API, see https://colossalai.org/docs/basics/booster_api/.

Test on Enwik8

$ python train.py
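Enwik8 is a byte-level corpus, so a common way to train on it is to draw random fixed-length windows of bytes and predict each next byte. The sketch below illustrates that idea using only the standard library. It is an assumption about typical data handling, not a description of what train.py does, and the helper name `sample_window` and the stand-in corpus are hypothetical.

```python
import random


def sample_window(data: bytes, seq_len: int, rng: random.Random):
    """Return (inputs, targets) for next-byte prediction from a random window."""
    if len(data) < seq_len + 1:
        raise ValueError("data must contain at least seq_len + 1 bytes")
    # Pick a start so that seq_len + 1 bytes fit inside the data.
    start = rng.randrange(0, len(data) - seq_len)
    window = list(data[start:start + seq_len + 1])
    # Targets are the inputs shifted left by one position.
    return window[:-1], window[1:]


# Stand-in bytes; a real run would read the enwik8 file instead.
corpus = b"stand-in text for the enwik8 byte stream " * 4
inputs, targets = sample_window(corpus, seq_len=16, rng=random.Random(0))
```

With byte-level tokens, every id falls in the range 0 to 255, so a vocabulary of 256 entries is enough for such a setup.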

Todo

Offer a Triton-optimized version of PaLM, bringing in https://github.com/lucidrains/triton-transformer

Citations

@article{chowdhery2022PaLM,
  title   = {PaLM: Scaling Language Modeling with Pathways},
  author  = {Chowdhery, Aakanksha et al},
  year    = {2022}
}