You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
HuangHai a72954a90e
'commit'
1 month ago
..
custom_op 'commit' 1 month ago
slim 'commit' 1 month ago
test_tipc 'commit' 1 month ago
__init__.py 'commit' 1 month ago
config.py 'commit' 1 month ago
data.py 'commit' 1 month ago
data_loader.py 'commit' 1 month ago
load_cifar.py 'commit' 1 month ago
loss.py 'commit' 1 month ago
metric.py 'commit' 1 month ago
mv3.py 'commit' 1 month ago
mv3_distill.yml 'commit' 1 month ago
mv3_large_x0_5.yml 'commit' 1 month ago
optimizer.py 'commit' 1 month ago
readme.md 'commit' 1 month ago
requirements.txt 'commit' 1 month ago
train.py 'commit' 1 month ago
train.sh 'commit' 1 month ago
utils.py 'commit' 1 month ago

readme.md

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

TIPC Linux端补充训练功能测试

Linux端基础训练预测功能测试的主程序为test_train_python.sh可以测试基于Python的模型训练、评估等基本功能包括裁剪、量化、蒸馏训练。

测试链条如上图所示主要测试内容有带共享权重自定义OP的模型的正常训练和slim相关功能训练流程是否正常。

2. 测试流程

本节介绍补充链条的测试流程

2.1 安装依赖

  • 安装PaddlePaddle >= 2.2
  • 安装其他依赖
pip3 install -r requirements.txt

2.2 功能测试

test_train_python.sh包含2种运行模式每种模式的运行数据不同分别用于测试训练是否正常分别是

  • 模式1lite_train_lite_infer使用少量数据训练用于快速验证训练到预测的走通流程不验证精度和速度
bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python.txt 'lite_train_lite_infer'
  • 模式2whole_train_whole_infer使用全量数据训练用于快速验证训练到预测的走通流程验证模型最终训练精度
bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python.txt 'whole_train_whole_infer'

如果是运行量化裁剪等训练方式,需要使用不同的配置文件。量化训练的测试指令如下:

bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python_PACT.txt 'lite_train_lite_infer'

同理FPGM裁剪的运行方式如下

bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python_FPGM.txt 'lite_train_lite_infer'

多机多卡的运行配置文件分别为 train_infer_python_fleet.txt, train_infer_python_FPGM_fleet.txttrain_infer_python_PACT_fleet.txt。 运行时,需要修改配置文件中的 gpu_list:xx.xx.xx.xx,yy.yy.yy.yy;0,1。 将 xx.xx.xx.xx 替换为具体的 ip 地址,各个ip地址之间用,分隔。 另外,和单机训练 不同,启动多机多卡训练需要在多机的每个节点上分别运行命令。以多机多卡量化训练为例,指令如下:

bash test_tipc/test_train_python.sh ./test_tipc/train_infer_python_PACT_fleet.txt 'lite_train_lite_infer'

运行相应指令后,在test_tipc/output文件夹下自动会保存运行日志。如'lite_train_lite_infer'模式运行后在test_tipc/extra_output文件夹有以下文件

test_tipc/output/
|- results_python.log    # 运行指令状态的日志

其中results_python.log中包含了每条指令的运行状态如果运行成功会输出

Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=20       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls MODEL.siamese=False  !
Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=2       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls MODEL.siamese=False  !
Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=2       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls MODEL.siamese=True  !
Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=2       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls_distill MODEL.siamese=False  !
Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=2       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls_distill MODEL.siamese=True  !
Run successfully with command - python3.7 train.py -c mv3_large_x0_5.yml -o  use_gpu=True     epoch=2       AMP.use_amp=True TRAIN.batch_size=1280  use_custom_relu=False model_type=cls_distill_multiopt MODEL.siamese=False  !