[Unified Checkpoint] Support peft model #7691


Merged
merged 26 commits into from
Feb 29, 2024

Conversation

DesmonDay
Contributor

PR types

Function optimization

PR changes

Others

Description

Support peft model save and load in unified checkpoint.


paddle-bot bot commented Dec 20, 2023

Thanks for your contribution!


codecov bot commented Dec 20, 2023

Codecov Report

Attention: Patch coverage is 12.10375%, with 305 lines in your changes missing coverage. Please review.

Project coverage is 56.42%. Comparing base (e8d6233) to head (1812b75).
Report is 4 commits behind head on develop.

Files Patch % Lines
paddlenlp/trainer/plugins/unified_checkpoint.py 4.56% 209 Missing ⚠️
paddlenlp/peft/lora/lora_model.py 14.49% 59 Missing ⚠️
paddlenlp/peft/prefix/prefix_model.py 30.30% 23 Missing ⚠️
paddlenlp/trainer/trainer.py 39.13% 14 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7691      +/-   ##
===========================================
- Coverage    56.56%   56.42%   -0.15%     
===========================================
  Files          589      589              
  Lines        89964    90252     +288     
===========================================
+ Hits         50889    50921      +32     
- Misses       39075    39331     +256     


@DesmonDay DesmonDay marked this pull request as draft December 21, 2023 06:58
@DesmonDay DesmonDay marked this pull request as ready for review January 4, 2024 06:52
Contributor Author

DesmonDay commented Jan 6, 2024

We need to confirm whether the peft model can correctly load the safetensors format in from_pretrained.
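One way to reason about this check: from_pretrained has to pick the right loader based on which weight file is actually present in the checkpoint directory. A minimal, hypothetical sketch of that dispatch (the file names follow the peft_model constants discussed later in this PR; the helper function itself is an illustration, not PaddleNLP API):

```python
import os

# Hypothetical helper: decide which weight format a peft checkpoint directory
# holds, by probing for the known file names. Illustrative only.
def detect_peft_weight_format(checkpoint_dir):
    if os.path.isfile(os.path.join(checkpoint_dir, "peft_model.safetensors")):
        return "safetensors"
    if os.path.isfile(os.path.join(checkpoint_dir, "peft_model.pdparams")):
        return "paddle"
    return None
```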

@lugimzzz lugimzzz force-pushed the add_uc_peft branch 2 times, most recently from 82e6c32 to 7b2038c Compare January 18, 2024 06:41
@DesmonDay DesmonDay force-pushed the add_uc_peft branch 2 times, most recently from 03a7a10 to 4961ba8 Compare February 19, 2024 12:08
if self.args.unified_checkpoint and "skip_save_model_weight" in self.args.unified_checkpoint_config:
    raise ValueError(
        "We do not support skip_save_model_weight in peft model when using unified checkpoint."
    )
Collaborator

Let's place some restrictions on which peft features are supported here. For example, things like dynamic expansion/shrinking don't need to be supported for now.

Looking at this, how about simply dropping the skip_save_model_weight field and turning this into a warning instead?
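One way to realize that suggestion (a sketch of the idea, not the code that was merged): strip the unsupported option from the config and warn instead of raising. Here `unified_checkpoint_config` is modeled as a list of option strings; the real field's type in PaddleNLP may differ.

```python
import warnings

# Sketch of the reviewer's suggestion: for peft models, ignore the
# unsupported "skip_save_model_weight" option with a warning rather than
# raising a ValueError. Illustrative, not actual PaddleNLP code.
def sanitize_peft_uc_config(unified_checkpoint_config):
    if "skip_save_model_weight" in unified_checkpoint_config:
        warnings.warn(
            "skip_save_model_weight is not supported for peft models with "
            "unified checkpoint; ignoring this option."
        )
        unified_checkpoint_config = [
            opt for opt in unified_checkpoint_config
            if opt != "skip_save_model_weight"
        ]
    return unified_checkpoint_config
```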

@@ -2401,6 +2421,8 @@ def _load_optimizer_and_scheduler(self, checkpoint):
opt_state_dict = tmp

# broadcast optimizer state in dp group
if self.args.local_rank != -1:
    dist.barrier()
Collaborator

why?

weights_index_name = PADDLE_WEIGHTS_INDEX_NAME if not safe_serialization else SAFE_WEIGHTS_INDEX_NAME
if isinstance(self.model, LoRAModel):
    weights_index_name = (
        PADDLE_LORA_WEIGHTS_INDEX_NAME if not safe_serialization else SAFE_LORA_WEIGHTS_INDEX_NAME
    )
Collaborator

As discussed earlier, I suggest saving PEFT weights under a single name.

Contributor Author

@DesmonDay DesmonDay Feb 21, 2024


I'd advise against this now: the original implementation deliberately uses different model file names to distinguish lora and ptuning, and we should stay consistent with that.

Collaborator


I'd advise against this now: the original implementation deliberately uses different model file names to distinguish lora and ptuning, and we should stay consistent with that.

How large is the impact of unifying the names, and which cases cannot stay compatible?

Contributor Author


Then let's do it this way: keep the original pdparams file names unchanged, and on the unified checkpoint side unify everything under a single name, peft_model.xxx.
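The naming scheme agreed above can be sketched as: legacy pdparams files keep their per-method names, while the unified checkpoint path always uses the single "peft_model" prefix and only switches extension by serialization format. The constant values below match the suggested change later in this thread; the selection function itself is illustrative.

```python
# Constants as proposed in this PR's suggested change.
SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
PADDLE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.pdparams.index.json"

# Illustrative selector: the unified checkpoint path needs no per-method
# (lora / ptuning) branching, only a safe_serialization switch.
def peft_weights_index_name(safe_serialization):
    return SAFE_PEFT_WEIGHTS_INDEX_NAME if safe_serialization else PADDLE_PEFT_WEIGHTS_INDEX_NAME
```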

Comment on lines 3080 to 3081
if distributed_isfile(weights_index_file) or distributed_isfile(master_weights_index_file):
    is_unified_checkpoint_type = True
Collaborator


The old distributed_isfile didn't support single-card, right? Let's just modify that function to support single-card.
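A minimal sketch of what a single-card-friendly distributed_isfile could look like. Here the `all_gather` callable stands in for the actual collective (e.g. paddle.distributed.all_gather), so the sketch stays framework-free; on a single card it simply falls back to os.path.isfile. This is an illustration of the idea, not the function's actual implementation in PaddleNLP.

```python
import os

def distributed_isfile(path, world_size=1, all_gather=None):
    # On a single card there is nothing to synchronize: plain isfile suffices.
    found = os.path.isfile(path)
    if world_size <= 1 or all_gather is None:
        return found
    # Multi-card: the file may only be visible on some ranks (e.g. local vs
    # shared filesystems), so gather every rank's result and succeed if any
    # rank sees the file.
    return any(all_gather(found))
```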

lugimzzz
lugimzzz previously approved these changes Feb 26, 2024
Contributor

@lugimzzz lugimzzz left a comment


lgtm

Comment on lines +106 to +107
SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
Collaborator


Suggested change
-SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
-SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
+SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
+SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
+PADDLE_PEFT_WEIGHTS_NAME = "peft_model.pdparams"
+PADDLE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.pdparams.index.json"

Contributor Author


done

ZHUI
ZHUI previously approved these changes Feb 27, 2024
Collaborator

@ZHUI ZHUI left a comment


LGTM

Collaborator

@wawltor wawltor left a comment


LGTM

@wawltor wawltor merged commit 6f45e95 into PaddlePaddle:develop Feb 29, 2024
4 participants