Installing the extensions
Install the Remote-SSH and Python extensions locally
Install the Python extension on the remote server
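- Difficulty: installing the Python extension failed on both the host and the remote server, possibly because of network or proxy problems
- Solution: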
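- On the host, download the Python extension from the official site: https://marketplace.visualstudio.com/items?itemName=ms-python.python
On the host, put the file directly into VS Code's bin directory and run the command code --install-extension ms-python.python-2022.9.11681004.vsix
and that is all.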
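(For details, see https://www.hangge.com/blog/cache/detail_3191.html)
- For the server, first upload the Python extension from the local machine with scp, then grant it execute permission. My first attempts failed precisely because I had not granted the permission. After running chmod 777 on the file, Install from VSIX worked (chmod +x should also be enough).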
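After that, the environment shows up:
Now I can pick my own conda environment on the server for debugging:
The "permission access" puzzle that cost me a day and a half is finally cracked! Right now I can't help wanting to listen to 越权访问 ("Unauthorized Access") a hundred times to make the lesson stick…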
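The offline install steps above can be sketched as a small script that only builds the commands, so they can be reviewed before being run by hand. This is a minimal sketch: the SSH target user@server and the upload directory /tmp are hypothetical placeholders, and the final "Install from VSIX" step still happens in the VS Code remote window.

```python
import shlex


def offline_install_commands(vsix, remote, remote_dir):
    """Return the commands for the offline install, without running them."""
    remote_path = remote_dir.rstrip("/") + "/" + vsix
    return [
        # Local install from the downloaded .vsix, as described in the text.
        ["code", "--install-extension", vsix],
        # Copy the same file to the server.
        ["scp", vsix, f"{remote}:{remote_path}"],
        # Grant execute permission on the server before "Install from VSIX".
        ["ssh", remote, "chmod", "+x", remote_path],
    ]


commands = offline_install_commands(
    "ms-python.python-2022.9.11681004.vsix",
    "user@server",  # hypothetical SSH target
    "/tmp",         # hypothetical upload directory
)
for command in commands:
    print(shlex.join(command))
```

Printing instead of executing keeps the sketch safe to try; each printed line can be pasted into a terminal once the placeholders are replaced.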
Next, go to Run -> Add Configuration.
The launch.json looks like this:
{
  "version": "0.2",
  "configurations": [
    {
      "name": "Python: Launch",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
      "args": [
        "--do_train",
        "--num_thread_reader=0",
        "--epochs=5",
        "--batch_size=128",
        "--n_display=50",
        "--train_csv", "${env:DATA_PATH}/MSRVTT_train.9k.csv",
        "--val_csv", "${env:DATA_PATH}/MSRVTT_JSFUSION_test.csv",
        "--data_path", "${env:DATA_PATH}/MSRVTT_data.json",
        "--features_path", "${env:DATA_PATH}/MSRVTT_Videos",
        "--output_dir", "ckpts/ckpt_msrvtt_retrieval_looseType",
        "--lr", "1e-4",
        "--max_words", "32",
        "--max_frames", "12",
        "--batch_size_val", "16",
        "--datatype", "msrvtt",
        "--expand_msrvtt_sentences",
        "--feature_framerate", "1",
        "--coef_lr", "1e-3",
        "--freeze_layer_num", "0",
        "--slice_framepos", "2",
        "--loose_type",
        "--linear_patch", "2d",
        "--sim_header", "meanP",
        "--pretrained_clip_name", "ViT-B/32"
      ],
      "env": {"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"},
      "console": "integratedTerminal"
    }
  ]
}
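This ran into a problem: the referenced env variable showed up empty on the command line. This is probably because VS Code resolves ${env:...} from the environment it was started in, while the env block only sets variables for the debugged process. So for now this kind of reference does not work, and I had to fall back on the clumsy approach of copying and pasting the full path into each argument.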
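To avoid pasting the path by hand, the substitution can be done once in a small script whose output is then copied into launch.json. The following is a minimal sketch, not part of VS Code: expand_env_refs is a hypothetical helper, and it only handles the ${env:NAME} form used in this configuration.

```python
import json
import re

# Matches placeholders such as ${env:DATA_PATH}.
ENV_REF = re.compile(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(args, env):
    """Replace every ${env:NAME} in args with the value from the env dict."""
    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"{name} is not defined in env")
        return env[name]

    return [ENV_REF.sub(substitute, arg) for arg in args]


env = {"DATA_PATH": "/mnt/cloud_disk/wf/msrvtt_data"}
args = [
    "--train_csv", "${env:DATA_PATH}/MSRVTT_train.9k.csv",
    "--features_path", "${env:DATA_PATH}/MSRVTT_Videos",
    "--lr", "1e-4",
]
expanded = expand_env_refs(args, env)
print(json.dumps(expanded, indent=2))
```

Raising on an undefined name, rather than substituting an empty string, makes the failure visible instead of silently producing a broken path.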
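In addition, python -m has to be expressed through the module field. Since module conflicts with program, the script path moves into args and the configuration has to be adjusted: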
{
  "version": "0.2",
  "configurations": [
    {
      "name": "Python: Launch",
      "type": "python",
      "request": "launch",
      "module": "torch.distributed.launch",
      "args": [
        "${workspaceFolder}/CLIP4Clip/main_task_retrieval.py",
        "--do_train",
        "--num_thread_reader=0",
        "--epochs=5",
        "--batch_size=128",
        "--n_display=50",
        "--train_csv", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_train.9k.csv",
        "--val_csv", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_JSFUSION_test.csv",
        "--data_path", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_data.json",
        "--features_path", "/mnt/cloud_disk/wf/msrvtt_data/MSRVTT_Videos",
        "--output_dir", "ckpts/ckpt_msrvtt_retrieval_looseType",
        "--lr", "1e-4",
        "--max_words", "32",
        "--max_frames", "12",
        "--batch_size_val", "16",
        "--datatype", "msrvtt",
        "--expand_msrvtt_sentences",
        "--feature_framerate", "1",
        "--coef_lr", "1e-3",
        "--freeze_layer_num", "0",
        "--slice_framepos", "2",
        "--loose_type",
        "--linear_patch", "2d",
        "--sim_header", "meanP",
        "--pretrained_clip_name", "ViT-B/32"
      ],
      "console": "integratedTerminal"
    }
  ]
}
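The mapping from a shell command to this configuration is mechanical, so it can be sketched in a few lines. command_to_launch_config below is a hypothetical helper, not a VS Code API: it assumes the command starts with python and treats everything after the module (or script) as args.

```python
import json
import shlex


def command_to_launch_config(command, name="Python: Launch"):
    """Turn 'python [-m module] ...' into a launch.json configuration dict."""
    tokens = shlex.split(command)
    if not tokens or not tokens[0].startswith("python"):
        raise ValueError("command must start with python")
    rest = tokens[1:]
    config = {"name": name, "type": "python", "request": "launch"}
    if rest[:1] == ["-m"]:
        if len(rest) < 2:
            raise ValueError("-m needs a module name")
        # With -m, the module goes into "module" and the script becomes an arg.
        config["module"] = rest[1]
        config["args"] = rest[2:]
    else:
        if not rest:
            raise ValueError("no script given")
        # Without -m, the script goes into "program".
        config["program"] = rest[0]
        config["args"] = rest[1:]
    config["console"] = "integratedTerminal"
    return config


config = command_to_launch_config(
    "python -m torch.distributed.launch main_task_retrieval.py --do_train --epochs=5"
)
print(json.dumps(config, indent=2))
```

Only one of module and program is ever set, which mirrors the conflict described above.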
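After setting breakpoints and starting the debugger, I ran into this problem:
Stepping through statement by statement showed that it occurs where a model is loaded: the model file had been placed in the wrong location. Remote debugging is really handy, since the call stack shows the whole chain of calls clearly.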
I then found the following problem:
In this part of the program, frameCount came out as 0 and fps was also 0, which triggered a division-by-zero error.
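A guard in front of the division turns this into a readable error instead of a ZeroDivisionError deep in the data loader. The sketch below is an assumption about how such a check could look, not the project's own code: duration_in_seconds is a hypothetical helper, and rounding the duration up to whole seconds is my choice for illustration.

```python
def duration_in_seconds(frame_count, fps):
    """Return the video length in whole seconds, rounded up."""
    if frame_count <= 0 or fps <= 0:
        # Zero values usually mean the video could not be read at all,
        # so report that instead of dividing by zero.
        raise ValueError(
            f"invalid video metadata: frame_count={frame_count}, fps={fps}"
        )
    # Ceiling division without floating point.
    return (frame_count + fps - 1) // fps


duration = duration_in_seconds(250, 25)
print(duration)

try:
    duration_in_seconds(0, 0)
except ValueError as error:
    print(error)
```

When both values are 0, the more likely culprit is the video path or the decoder rather than the arithmetic, so the path being passed in is worth checking first.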