[Unified Checkpoint] Support peft model #7691


Merged
merged 26 commits into from
Feb 29, 2024

Conversation

DesmonDay
Contributor

PR types

Function optimization

PR changes

Others

Description

Support peft model save and load in unified checkpoint.


paddle-bot bot commented Dec 20, 2023

Thanks for your contribution!


codecov bot commented Dec 20, 2023

Codecov Report

Attention: Patch coverage is 12.10375%, with 305 lines in your changes missing coverage. Please review.

Project coverage is 56.42%. Comparing base (e8d6233) to head (1812b75).
Report is 4 commits behind head on develop.

Files Patch % Lines
paddlenlp/trainer/plugins/unified_checkpoint.py 4.56% 209 Missing ⚠️
paddlenlp/peft/lora/lora_model.py 14.49% 59 Missing ⚠️
paddlenlp/peft/prefix/prefix_model.py 30.30% 23 Missing ⚠️
paddlenlp/trainer/trainer.py 39.13% 14 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7691      +/-   ##
===========================================
- Coverage    56.56%   56.42%   -0.15%     
===========================================
  Files          589      589              
  Lines        89964    90252     +288     
===========================================
+ Hits         50889    50921      +32     
- Misses       39075    39331     +256     


@DesmonDay DesmonDay marked this pull request as draft December 21, 2023 06:58
@DesmonDay DesmonDay marked this pull request as ready for review January 4, 2024 06:52
Contributor Author

DesmonDay commented Jan 6, 2024

We need to confirm whether the peft model can correctly load the safetensors format in from_pretrained.
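One way to reason about this check: from_pretrained has to pick the right loader based on which weight file is actually present in the checkpoint directory. A minimal, hypothetical sketch of that dispatch (the file names follow the peft_model constants discussed later in this PR; the helper function itself is an illustration, not PaddleNLP API):

```python
import os

# Hypothetical helper: decide which weight format a peft checkpoint directory
# holds, by probing for the known file names. Illustrative only.
def detect_peft_weight_format(checkpoint_dir):
    if os.path.isfile(os.path.join(checkpoint_dir, "peft_model.safetensors")):
        return "safetensors"
    if os.path.isfile(os.path.join(checkpoint_dir, "peft_model.pdparams")):
        return "paddle"
    return None
```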

@lugimzzz lugimzzz force-pushed the add_uc_peft branch 2 times, most recently from 82e6c32 to 7b2038c Compare January 18, 2024 06:41
@DesmonDay DesmonDay force-pushed the add_uc_peft branch 2 times, most recently from 03a7a10 to 4961ba8 Compare February 19, 2024 12:08
if self.args.unified_checkpoint and "skip_save_model_weight" in self.args.unified_checkpoint_config:
    raise ValueError(
        "We do not support skip_save_model_weight in peft model when using unified checkpoint."
    )
Collaborator

Let's place some restrictions on which peft features are supported here. For example, things like dynamic expansion/shrinking don't need to be supported for now.

Looking at this, how about simply dropping the skip_save_model_weight field and turning this into a warning instead?
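One way to realize that suggestion (a sketch of the idea, not the code that was merged): strip the unsupported option from the config and warn instead of raising. Here `unified_checkpoint_config` is modeled as a list of option strings; the real field's type in PaddleNLP may differ.

```python
import warnings

# Sketch of the reviewer's suggestion: for peft models, ignore the
# unsupported "skip_save_model_weight" option with a warning rather than
# raising a ValueError. Illustrative, not actual PaddleNLP code.
def sanitize_peft_uc_config(unified_checkpoint_config):
    if "skip_save_model_weight" in unified_checkpoint_config:
        warnings.warn(
            "skip_save_model_weight is not supported for peft models with "
            "unified checkpoint; ignoring this option."
        )
        unified_checkpoint_config = [
            opt for opt in unified_checkpoint_config
            if opt != "skip_save_model_weight"
        ]
    return unified_checkpoint_config
```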

@@ -2401,6 +2421,8 @@ def _load_optimizer_and_scheduler(self, checkpoint):
opt_state_dict = tmp

# broadcast optimizer state in dp group
if self.args.local_rank != -1:
    dist.barrier()
Collaborator

why?

weights_index_name = PADDLE_WEIGHTS_INDEX_NAME if not safe_serialization else SAFE_WEIGHTS_INDEX_NAME
if isinstance(self.model, LoRAModel):
    weights_index_name = (
        PADDLE_LORA_WEIGHTS_INDEX_NAME if not safe_serialization else SAFE_LORA_WEIGHTS_INDEX_NAME
    )
Collaborator

As discussed earlier, I suggest saving PEFT weights under a single name.

Contributor Author

@DesmonDay DesmonDay Feb 21, 2024


I'd advise against this now: the original implementation deliberately uses different model file names to distinguish lora and ptuning, and we should stay consistent with that.

Collaborator


I'd advise against this now: the original implementation deliberately uses different model file names to distinguish lora and ptuning, and we should stay consistent with that.

How large is the impact of unifying the names, and which cases cannot stay compatible?

Contributor Author


Then let's do it this way: keep the original pdparams file names unchanged, and on the unified checkpoint side unify everything under a single name, peft_model.xxx.
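The naming scheme agreed above can be sketched as: legacy pdparams files keep their per-method names, while the unified checkpoint path always uses the single "peft_model" prefix and only switches extension by serialization format. The constant values below match the suggested change later in this thread; the selection function itself is illustrative.

```python
# Constants as proposed in this PR's suggested change.
SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
PADDLE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.pdparams.index.json"

# Illustrative selector: the unified checkpoint path needs no per-method
# (lora / ptuning) branching, only a safe_serialization switch.
def peft_weights_index_name(safe_serialization):
    return SAFE_PEFT_WEIGHTS_INDEX_NAME if safe_serialization else PADDLE_PEFT_WEIGHTS_INDEX_NAME
```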

Comment on lines 3080 to 3081
if distributed_isfile(weights_index_file) or distributed_isfile(master_weights_index_file):
    is_unified_checkpoint_type = True
Collaborator


The old distributed_isfile didn't support single-card, right? Let's just modify that function to support single-card.
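A minimal sketch of what a single-card-friendly distributed_isfile could look like. Here the `all_gather` callable stands in for the actual collective (e.g. paddle.distributed.all_gather), so the sketch stays framework-free; on a single card it simply falls back to os.path.isfile. This is an illustration of the idea, not the function's actual implementation in PaddleNLP.

```python
import os

def distributed_isfile(path, world_size=1, all_gather=None):
    # On a single card there is nothing to synchronize: plain isfile suffices.
    found = os.path.isfile(path)
    if world_size <= 1 or all_gather is None:
        return found
    # Multi-card: the file may only be visible on some ranks (e.g. local vs
    # shared filesystems), so gather every rank's result and succeed if any
    # rank sees the file.
    return any(all_gather(found))
```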

lugimzzz
lugimzzz previously approved these changes Feb 26, 2024
Contributor

@lugimzzz lugimzzz left a comment


lgtm

Comment on lines +106 to +107
SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
Collaborator


Suggested change
-SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
-SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
+SAFE_PEFT_WEIGHTS_NAME = "peft_model.safetensors"
+SAFE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.safetensors.index.json"
+PADDLE_PEFT_WEIGHTS_NAME = "peft_model.pdparams"
+PADDLE_PEFT_WEIGHTS_INDEX_NAME = "peft_model.pdparams.index.json"

Contributor Author


done

ZHUI
ZHUI previously approved these changes Feb 27, 2024
Collaborator

@ZHUI ZHUI left a comment


LGTM

Collaborator

@wawltor wawltor left a comment


LGTM

@wawltor wawltor merged commit 6f45e95 into PaddlePaddle:develop Feb 29, 2024
4 participants