Computer Science Fundamentals · CS336 · Experiments and Inference
Model architecture experiments

The model architecture experiments cover several design details of modern LMs: whether to normalize at all, whether to use pre-norm, whether to use RMSNorm, and whether to use a gated FFN.

Script

    #!/bin/bash
    set -e

    # Directory that contains this script and the per-experiment scripts
    SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

    echo "=========================================="
    echo "Starting all ablation and baseline experiments"
    echo "Scripts directory: ${SCRIPT_DIR}"
    echo "=========================================="

    EXPERIMENT_SCRIPTS=(
      baseline.sh
      norm.sh
      layer_norm.sh
      post_norm.sh
      gated_ffn.sh
      rope.sh
    )

    for SCRIPT in "${EXPERIMENT_SCRIPTS[@]}"; do
      SCRIPT_PATH="${SCRIPT_DIR}/${SCRIPT}"

      # Abort early if an experiment script is missing
      if [ ! -f "${SCRIPT_PATH}" ]; then
        echo "ERROR: Script not found: ${SCRIPT_PATH}"
        exit 1
      fi

      echo
      echo "------------------------------------------"
      echo "Running experiment: ${SCRIPT}"
      echo "------------------------------------------"
      bash "${SCRIPT_PATH}"
      echo "Finished experiment: ${SCRIPT}"
    done

    echo
    echo "=========================================="
    echo "All experiments completed successfully"
    echo "=========================================="

Experiment results

The code still has some bugs, so treat these results as entertainment only.

Learning rate experiments

Script

    #!/bin/bash
    set -e

    # tokenizer
    VOCAB_PATH="data/TinyStoriesV2-GPT4-train/vocab.json"
    MERGES_PATH="data/TinyStoriesV2-GPT4-train/merges.txt"
    SPECIAL_TOKENS="<|endoftext|>"
    EOS_TOKEN="<|endoftext|>"

    # data
    TRAIN_DATA_PATH="data/TinyStoriesV2-GPT4-train.bin"
    VAL_DATA_PATH="data/TinyStoriesV2-GPT4-valid.bin"

    # model
    VOCAB_SIZE=10000
    MAX_SEQ_LEN=256
    D_MODEL=512
    D_FF=1344
    BIGO=10000
    NUM_LAYERS=4
    NUM_HEADS=16
    EPS=1e-5

    # training
    BATCH_SIZE=64
    MIN_LR=6e-5
    WARMUP_ITERS=1000
    COSINE_SCHEDULE_ITERS=9500
    MAX_ITERS=10000
    MAX_NORM=1.0

    # eval / checkpoint
    EVAL_INTERVAL=100
    SAVE_INTERVAL=1000
    OUT_DIR="checkpoints"
    CHECKPOINT_PATH=""

    # system
    DEVICE="cuda"

    # logging
    RUN_NAME_PREFIX="lr"

    # maximum learning rates to sweep
    MAX_LRS=(1e-4 3e-4 6e-4 1e-3)

    # run
    for LR in "${MAX_LRS[@]}"; do
      echo "Running training with max_lr=${LR}"
      uv run cs336_basics/train.py \
        --vocab_path "${VOCAB_PATH}" \
        --merges_path "${MERGES_PATH}" \
        --special_tokens "${SPECIAL_TOKENS}" \
        --eos_token "${EOS_TOKEN}" \
        --train_data_path "${TRAIN_DATA_PATH}" \
        --val_data_path "${VAL_DATA_PATH}" \
        --vocab_size "${VOCAB_SIZE}" \
        --max_seq_len "${MAX_SEQ_LEN}" \
        --d_model "${D_MODEL}" \
        --d_ff "${D_FF}" \
        --bigo "${BIGO}" \
        --num_layers "${NUM_LAYERS}" \
        --num_heads "${NUM_HEADS}" \
        --eps "${EPS}" \
        --batch_size "${BATCH_SIZE}" \
        --max_lr "${LR}" \
        --min_lr "${MIN_LR}" \
        --warmup_iters "${WARMUP_ITERS}" \
        --cosine_schedule_iters "${COSINE_SCHEDULE_ITERS}" \
        --max_iters "${MAX_ITERS}" \
        --max_norm "${MAX_NORM}" \
        --eval_interval "${EVAL_INTERVAL}" \
        --save_interval "${SAVE_INTERVAL}" \
        --out_dir "${OUT_DIR}" \
        --checkpoint_path "${CHECKPOINT_PATH}" \
        --device "${DEVICE}" \
        --run_name "${RUN_NAME_PREFIX}_${LR}"
      echo "Finished run with max_lr=${LR}"
      echo "-----------------------------------"
    done

Results

Among the four values swept (1e-4, 3e-4, 6e-4, 1e-3), 6e-4 is the best maximum learning rate under the current experimental conditions.

Inference

Results of different decoding methods

Prompt: Once upon a time

Greedy decoding

Once upon a time, there was a little girl named Lily. She loved to play with her toys and eat yummy food. One day, she found a big, red apple in her kitchen. She was very happy and wanted to eat it all. Lily’s mom saw her and said, “Lily, you can have the apple if you promise to share it with your brother, Tim.” Lily thought about it and said, “I promise, Mom.” She took the apple and went to Tim’s house. When Lily got to Tim’s house, she said, “Tim, I have a surprise for you!” Tim was excited and said, “What is it, Lily?” Lily showed him the apple and said, “I promise to share it with you.” Tim was very happy and they both ate the apple together. <|endoftext|>

Top-k decoding (k = 50, T = 1)

Once upon a time, there was a boy named Tim and his dog. They loved to play in the yard. One day, while they were playing, Tim got a letter down. The letter said, “Be careful and do not go near the big drain.” Tim and his dog did not listen. They wanted to see what was in the big drain. They jumped over it and ran away. Suddenly, they had to hide behind a tree. Just as Tim was about to catch the big drain, a little bird came out of the tree. The bird said, “Hello, I saw you and wanted to play too!” Tim was surprised, but he was happy to have a new friend to play with. They all played together and had lots of fun. The moral of the story is to always listen to your friends and be kind to everyone.
<|endoftext|>

Top-p decoding (p = 0.8, T = 1)

Once upon a time, there was a small house with a chimney. The chimney was very old and old. It was very old and sad because it was always dirty. The house was sad because it could not be clean and pretty like the chimney. One day, a little girl named Mia came to the house. She saw the sad house and wanted to help. Mia had an idea to make the house look new and pretty. She picked up the house and put it in a vase. She was very happy with her work. The next day, Mia saw that the house was very clean and pretty. She decided to wash it. She scrubbed and scrubbed until the house was clean and shiny. The house looked very pretty, and Mia was happy too. Now, the house was clean and shiny. The house was happy too, because it could finally be clean and shiny.
<|endoftext|>

Inference code

    import torch

    from cs336_basics.transformer_block import TransformerLM
    from cs336_basics.optimizer import AdamW
    from cs336_basics.checkpoint import load_checkpoint
    from cs336_basics.preprocess import load_trained_tokenizer

    # Load the tokenizer
    vocab_path = "data/TinyStoriesV2-GPT4-train/vocab.json"
    merges_path = "data/TinyStoriesV2-GPT4-train/merges.txt"
    special_tokens = ["<|endoftext|>"]
    tokenizer = load_trained_tokenizer(
        vocab_path=vocab_path,
        merges_path=merges_path,
        special_tokens=special_tokens,
    )
    eos_token_id = tokenizer.encode(special_tokens[0])[0]

    # Load the model; hyperparameters match the training script
    device = "cuda" if torch.cuda.is_available() else "cpu"
    checkpoint_path = "cs336_basics/checkpoints/checkpoint_9999.pt"
    model = TransformerLM(
        vocab_size=10000,
        n_layers=4,
        d_model=512,
        d_ff=1344,
        num_heads=16,
        bigo=10000,
        max_seq_len=256,
        is_norm=True,
        norm_type="RMSNorm",
        pre_norm=True,
        is_gate=True,
        eps=1e-5,
        device=device,
        dtype=torch.float32,
        tokenizer=tokenizer,
    )
    # load_checkpoint also restores optimizer state, so an optimizer is required
    optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
    iteration = load_checkpoint(model, optimizer, checkpoint_path, device)
    print(
        "load model successfully from checkpoint: {}, iteration: {}, device: {}".format(
            checkpoint_path, iteration, device
        )
    )

    prompt = "Once upon a time"
    input_ids = tokenizer.encode(prompt)
    # print("input_ids: ", input_ids)


    def greedy_text():
        # Always pick the most probable next token
        return model.generate(
            inputs_id=input_ids,
            max_new_tokens=256,
            eos_token_id=eos_token_id,
            temperature=1.0,
            greedy=True,
        )


    def topk_text():
        # Sample from the 50 most probable tokens
        return model.generate(
            inputs_id=input_ids,
            max_new_tokens=256,
            eos_token_id=eos_token_id,
            temperature=1.0,
            greedy=False,
            top_k=50,
        )


    def topp_text():
        # Sample from the smallest token set whose cumulative probability reaches 0.8
        return model.generate(
            inputs_id=input_ids,
            max_new_tokens=256,
            eos_token_id=eos_token_id,
            temperature=1.0,
            greedy=False,
            top_p=0.8,
        )


    greedy_generated_text = greedy_text()
    topk_generated_text = topk_text()
    topp_generated_text = topp_text()

    print("Greedy Generated Text:\n ", greedy_generated_text)
    print("Top-k Generated Text:\n ", topk_generated_text)
    print("Top-p Generated Text:\n ", topp_generated_text)
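The body of `model.generate` is not shown above, so the following is only a minimal sketch of how the `greedy`, `top_k`, `top_p`, and `temperature` options are commonly implemented for a single decoding step. All function names here (`softmax`, `filter_top_k`, `filter_top_p`, `sample_next_token`) are hypothetical and are not taken from `cs336_basics`. It uses plain Python lists instead of tensors to keep the logic easy to follow.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def filter_top_p(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}


def sample_next_token(logits, temperature=1.0, greedy=False,
                      top_k=None, top_p=None, rng=random):
    """Pick the next token id from one step's logits."""
    if greedy:
        # Greedy decoding: the argmax, with no randomness involved
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    if top_k is not None:
        dist = filter_top_k(probs, top_k)
    elif top_p is not None:
        dist = filter_top_p(probs, top_p)
    else:
        dist = dict(enumerate(probs))
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

In a full generation loop, a function like this would be called once per new token on the logits at the last position, stopping when the sampled id equals `eos_token_id` or `max_new_tokens` is reached. Greedy decoding is deterministic, which is consistent with the more repetitive but coherent greedy story above, while top-k and top-p trade some coherence for variety.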
