# PaLM - Pytorch
Implementation of the specific Transformer architecture from [PaLM - Scaling Language Modeling with Pathways](https://arxiv.org/abs/2204.02311), in less than 200 lines of code.

At full scale, PaLM is at or near state-of-the-art on a broad range of language tasks. This implementation obviously will not scale to that size; it is meant for educational purposes, to show how simple the architecture really is.
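
The defining architectural choice in PaLM is the "parallel" Transformer block, where the attention and feedforward branches read the same pre-normalized input and their outputs are added to the residual together, i.e. `y = x + Attn(LN(x)) + FF(LN(x))`. The sketch below illustrates that idea only and is not code from this repository: the class name is made up, it uses standard multi-head attention and a plain SiLU MLP in place of PaLM's multi-query attention, rotary embeddings, and SwiGLU, and it omits the causal mask.

```python
import torch
from torch import nn

class ParallelBlockSketch(nn.Module):
    """Simplified illustration of PaLM's parallel attention + feedforward block."""

    def __init__(self, dim, heads=8, ff_mult=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        # stand-in for PaLM's multi-query attention with rotary embeddings
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # stand-in for PaLM's SwiGLU feedforward
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * ff_mult),
            nn.SiLU(),
            nn.Linear(dim * ff_mult, dim),
        )

    def forward(self, x):
        h = self.norm(x)                  # one shared pre-norm feeds both branches
        attn_out, _ = self.attn(h, h, h)  # attention branch (causal mask omitted here)
        return x + attn_out + self.ff(h)  # parallel residual: x + Attn + FF

# e.g. ParallelBlockSketch(512)(torch.randn(1, 16, 512)).shape -> (1, 16, 512)
```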
## Install

```bash
$ pip install PaLM-pytorch
```
## Usage

```python
import torch
from palm_pytorch import PaLM

palm = PaLM(
    num_tokens = 20000,
    dim = 512,
    depth = 12,
    heads = 8,
    dim_head = 64,
)

tokens = torch.randint(0, 20000, (1, 2048))
logits = palm(tokens) # (1, 2048, 20000)
```
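
For a quick training-style sanity check, the logits can be compared against next-token targets with a standard cross-entropy loss. The snippet below is an illustrative addition, not taken from this repository's `train.py`:

```python
import torch
import torch.nn.functional as F

tokens = torch.randint(0, 20000, (1, 2049))      # one extra token for the shift
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token

logits = palm(inputs)                            # (1, 2048, 20000)
loss = F.cross_entropy(
    logits.reshape(-1, logits.size(-1)),         # (batch * seq, vocab)
    targets.reshape(-1),                         # (batch * seq,)
)
loss.backward()
```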
The 540B-parameter PaLM from the paper would be configured as:
```python
palm = PaLM(
    num_tokens = 256000,
    dim = 18432,
    depth = 118,
    heads = 48,
    dim_head = 256,
)
```
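
A configuration can be sanity-checked by reading the parameter count straight off the instantiated module. Actually instantiating the 540B configuration would need far more memory than a single machine provides, so the snippet below is meant for small configurations like the one in the Usage section:

```python
# count parameters of an instantiated PaLM module (use a small config in practice)
num_params = sum(p.numel() for p in palm.parameters())
print(f"{num_params / 1e6:.1f}M parameters")
```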
## Test on Enwik8

```bash
$ python train.py
```
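
`train.py` drives the actual training. Purely as an illustration of what a byte-level enwik8 training loop involves (the data path, hyperparameters, and loop below are placeholder assumptions, not the contents of `train.py`):

```python
import gzip
import numpy as np
import torch
import torch.nn.functional as F
from palm_pytorch import PaLM

SEQ_LEN, BATCH_SIZE, STEPS = 1024, 4, 1000

# enwik8 is byte-level text, so a 256-token vocabulary suffices
# (the exact archive path under data/ is an assumption)
with gzip.open("./data/enwik8.gz") as f:
    data = torch.from_numpy(np.frombuffer(f.read(int(95e6)), dtype=np.uint8).copy()).long()

model = PaLM(num_tokens=256, dim=512, depth=8, heads=8, dim_head=64)
optim = torch.optim.Adam(model.parameters(), lr=2e-4)

for step in range(STEPS):
    # sample random contiguous chunks of SEQ_LEN + 1 bytes
    starts = torch.randint(0, data.size(0) - SEQ_LEN - 1, (BATCH_SIZE,))
    batch = torch.stack([data[s : s + SEQ_LEN + 1] for s in starts])
    inputs, targets = batch[:, :-1], batch[:, 1:]

    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    optim.step()
    optim.zero_grad()
```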
## Todo
- offer a Triton optimized version of PaLM, bringing in https://github.com/lucidrains/triton-transformer
## Citations

```bibtex
@article{chowdhery2022PaLM,
    title   = {PaLM: Scaling Language Modeling with Pathways},
    author  = {Chowdhery, Aakanksha and others},
    journal = {arXiv preprint arXiv:2204.02311},
    year    = {2022}
}
```