## Model architecture experiments

These experiments cover several design choices in modern LMs: whether to use normalization at all, whether to use pre-norm (versus post-norm), whether to use RMSNorm (versus LayerNorm), and whether to use a gated FFN.

### Script

```bash
#!/bin/bash
set -e

# Resolve the directory that contains this script
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

echo
echo "Starting all ablation and baseline experiments"
echo "Scripts directory: ${SCRIPT_DIR}"
echo

EXPERIMENT_SCRIPTS=(
  "baseline.sh"
  "norm.sh"
  "layer_norm.sh"
  "post_norm.sh"
  "gated_ffn.sh"
  "rope.sh"
)

for SCRIPT in "${EXPERIMENT_SCRIPTS[@]}"; do
  SCRIPT_PATH="${SCRIPT_DIR}/${SCRIPT}"

  # Abort early if an experiment script is missing
  if [ ! -f "${SCRIPT_PATH}" ]; then
    echo "ERROR: Script not found: ${SCRIPT_PATH}"
    exit 1
  fi

  echo
  echo "------------------------------------------"
  echo "Running experiment: ${SCRIPT}"
  echo "------------------------------------------"
  bash "${SCRIPT_PATH}"
  echo "Finished experiment: ${SCRIPT}"
done

echo
echo
echo "All experiments completed successfully"
echo
```

### Results

The code has some bugs, so these results are for entertainment only.

## Learning rate experiments

### Script

```bash
#!/bin/bash
set -e

# tokenizer
VOCAB_PATH="data/TinyStoriesV2-GPT4-train/vocab.json"
MERGES_PATH="data/TinyStoriesV2-GPT4-train/merges.txt"
SPECIAL_TOKENS="<|endoftext|>"
EOS_TOKEN="<|endoftext|>"

# data
TRAIN_DATA_PATH="data/TinyStoriesV2-GPT4-train.bin"
VAL_DATA_PATH="data/TinyStoriesV2-GPT4-valid.bin"

# model
VOCAB_SIZE=10000
MAX_SEQ_LEN=256
D_MODEL=512
D_FF=1344
BIGO=10000
NUM_LAYERS=4
NUM_HEADS=16
EPS=1e-5

# training
BATCH_SIZE=64
MIN_LR=6e-5
WARMUP_ITERS=1000
COSINE_SCHEDULE_ITERS=9500
MAX_ITERS=10000
MAX_NORM=1.0

# eval / checkpoint
EVAL_INTERVAL=100
SAVE_INTERVAL=1000
OUT_DIR="checkpoints"
CHECKPOINT_PATH=""

# system
DEVICE="cuda"

# logging
RUN_NAME_PREFIX="lr"

# maximum learning rates to sweep
MAX_LRS=(1e-4 3e-4 6e-4 1e-3)

# run
for LR in "${MAX_LRS[@]}"; do
  echo "Running training with max_lr=${LR}"

  uv run cs336_basics/train.py \
    --vocab_path "${VOCAB_PATH}" \
    --merges_path "${MERGES_PATH}" \
    --special_tokens "${SPECIAL_TOKENS}" \
    --eos_token "${EOS_TOKEN}" \
    --train_data_path "${TRAIN_DATA_PATH}" \
    --val_data_path "${VAL_DATA_PATH}" \
    --vocab_size "${VOCAB_SIZE}" \
    --max_seq_len "${MAX_SEQ_LEN}" \
    --d_model "${D_MODEL}" \
    --d_ff "${D_FF}" \
    --bigo "${BIGO}" \
    --num_layers "${NUM_LAYERS}" \
    --num_heads "${NUM_HEADS}" \
    --eps "${EPS}" \
    --batch_size "${BATCH_SIZE}" \
    --max_lr "${LR}" \
    --min_lr "${MIN_LR}" \
    --warmup_iters "${WARMUP_ITERS}" \
    --cosine_schedule_iters "${COSINE_SCHEDULE_ITERS}" \
    --max_iters "${MAX_ITERS}" \
    --max_norm "${MAX_NORM}" \
    --eval_interval "${EVAL_INTERVAL}" \
    --save_interval "${SAVE_INTERVAL}" \
    --out_dir "${OUT_DIR}" \
    --checkpoint_path "${CHECKPOINT_PATH}" \
    --device "${DEVICE}" \
    --run_name "${RUN_NAME_PREFIX}_${LR}"

  echo "Finished run with max_lr=${LR}"
  echo "-----------------------------------"
done
```

### Results

Under the current experimental conditions, the best maximum learning rate was 6e-3 (this value is not in the `MAX_LRS` sweep above, so 6e-4 may have been intended).

## Inference

### Outputs from different decoding methods

Prompt: `Once upon a time`

**Greedy decoding**

> Once upon a time, there was a little girl named Lily. She loved to play with her toys and eat yummy food. One day, she found a big, red apple in her kitchen. She was very happy and wanted to eat it all. Lily’s mom saw her and said, “Lily, you can have the apple if you promise to share it with your brother, Tim.” Lily thought about it and said, “I promise, Mom.” She took the apple and went to Tim’s house. When Lily got to Tim’s house, she said, “Tim, I have a surprise for you!” Tim was excited and said, “What is it, Lily?” Lily showed him the apple and said, “I promise to share it with you.” Tim was very happy and they both ate the apple together. <|endoftext|>

**Top-k decoding (k=50, T=1)**

> Once upon a time, there was a boy named Tim and his dog. They loved to play in the yard. One day, while they were playing, Tim got a letter down. The letter said, “Be careful and do not go near the big drain.” Tim and his dog did not listen. They wanted to see what was in the big drain. They jumped over it and ran away. Suddenly, they had to hide behind a tree. Just as Tim was about to catch the big drain, a little bird came out of the tree. The bird said, “Hello, I saw you and wanted to play too!” Tim was surprised, but he was happy to have a new friend to play with. They all played together and had lots of fun. The moral of the story is to always listen to your friends and be kind to everyone. <|endoftext|>

**Top-p decoding (p=0.8, T=1)**

> Once upon a time, there was a small house with a chimney. The chimney was very old and old. It was very old and sad because it was always dirty. The house was sad because it could not be clean and pretty like the chimney. One day, a little girl named Mia came to the house. She saw the sad house and wanted to help. Mia had an idea to make the house look new and pretty. She picked up the house and put it in a vase. She was very happy with her work. The next day, Mia saw that the house was very clean and pretty. She decided to wash it. She scrubbed and scrubbed until the house was clean and shiny. The house looked very pretty, and Mia was happy too. Now, the house was clean and shiny. The house was happy too, because it could finally be clean and shiny. <|endoftext|>

### Inference code

```python
import torch
import torch.nn as nn

from cs336_basics.transformer_block import TransformerBlock, TransformerLM
from cs336_basics.optimizer import AdamW
from cs336_basics.checkpoint import load_checkpoint
from cs336_basics.tokenizer import BPETokenizer
from cs336_basics.preprocess import bytes_to_unicode, load_trained_tokenizer

# load tokenizer
vocab_path = "data/TinyStoriesV2-GPT4-train/vocab.json"
merges_path = "data/TinyStoriesV2-GPT4-train/merges.txt"
special_tokens = ["<|endoftext|>"]
tokenizer = load_trained_tokenizer(
    vocab_path=vocab_path,
    merges_path=merges_path,
    special_tokens=special_tokens,
)
eos_token_id = tokenizer.encode(special_tokens[0])[0]

# load model (hyperparameters must match the training script)
device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint_path = "cs336_basics/checkpoints/checkpoint_9999.pt"
model = TransformerLM(
    vocab_size=10000,
    n_layers=4,
    d_model=512,
    d_ff=1344,
    num_heads=16,
    bigo=10000,
    max_seq_len=256,
    is_norm=True,
    norm_type="RMSNorm",
    pre_norm=True,
    is_gate=True,
    eps=1e-5,
    device=device,
    dtype=torch.float32,
    tokenizer=tokenizer,
)
# load_checkpoint restores optimizer state too, so an optimizer is created here
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
iteration = load_checkpoint(model, optimizer, checkpoint_path, device)
print(
    "load model successfully from checkpoint: {}, iteration: {}, device: {}".format(
        checkpoint_path, iteration, device
    )
)

prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt)
# print("input_ids: ", input_ids)


def greedy_text():
    generate_text = model.generate(
        inputs_id=input_ids,
        max_new_tokens=256,
        eos_token_id=eos_token_id,
        temperature=1.0,
        greedy=True,
    )
    return generate_text


def topk_text():
    generate_text = model.generate(
        inputs_id=input_ids,
        max_new_tokens=256,
        eos_token_id=eos_token_id,
        temperature=1.0,
        greedy=False,
        top_k=50,
    )
    return generate_text


def topp_text():
    generate_text = model.generate(
        inputs_id=input_ids,
        max_new_tokens=256,
        eos_token_id=eos_token_id,
        temperature=1.0,
        greedy=False,
        top_p=0.8,
    )
    return generate_text


greedy_generated_text = greedy_text()
topk_generated_text = topk_text()
topp_generated_text = topp_text()

print("Greedy Generated Text:\n", greedy_generated_text)
print("Top-k Generated Text:\n", topk_generated_text)
print("Top-p Generated Text:\n", topp_generated_text)
```
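
The internals of `model.generate` are not shown here, so the following is only a minimal sketch of how the three decoding modes (greedy, top-k, top-p) can choose the next token from a vector of logits. It uses plain Python rather than PyTorch so it runs anywhere. The helper names `softmax`, `filter_top_k`, `filter_top_p`, and `pick_token` are hypothetical and are not part of `cs336_basics`; the actual implementation may differ, for example in how it combines top-k with top-p.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    """Keep the k most probable tokens, zero out the rest, renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = {i: probs[i] for i in keep}
    total = sum(kept.values())
    return [kept.get(i, 0.0) / total for i in range(len(probs))]


def filter_top_p(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = {}, 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(kept.values())
    return [kept.get(i, 0.0) / total for i in range(len(probs))]


def pick_token(logits, temperature=1.0, greedy=False, top_k=None, top_p=None, rng=random):
    """Choose the next token id (hypothetical helper, not the author's API)."""
    if greedy:
        # Greedy decoding: always take the highest-scoring token
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    # Sample one token id from the filtered distribution
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With this sketch, `pick_token(logits, top_k=50)` and `pick_token(logits, top_p=0.8)` correspond to the settings used for the samples above, and `pick_token(logits, greedy=True)` corresponds to greedy decoding. Greedy decoding is deterministic, which fits the more conventional greedy story, while the sampled stories wander more.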