Configuring PyTorch on Ubuntu 18.04 and training an FCN network: Deep Learning (1)
Preface
Ubuntu 18.04, CPU version of PyTorch
Ubuntu 18.04, GPU version of PyTorch
1. Configuring the CPU environment
I used Python 3.6 and created a Python 3.6 environment for FCN with Anaconda (the commands below name it py36). Reference: https://github.com/wkentaro/pytorch-fcn
1.1 Install the fcn package:
# Create and activate the virtual environment
conda create -n py36 python=3.6
source activate py36
pip install fcn
#pip install --default-timeout=100 -i https://pypi.tuna.tsinghua.edu.cn/simple fcn
1.2 Install PyTorch:
Go to the PyTorch website and get the CPU version:
Start Locally | PyTorch https://pytorch.org/get-started/locally/
Copy the command generated on the page; mine was:
conda install pytorch torchvision torchaudio cpuonly -c pytorch
# or with pip:
pip3 install torch==1.10.2+cpu torchvision==0.11.3+cpu torchaudio==0.10.2+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
Verify the installation (a CPU-only build reports False for torch.cuda.is_available()):
clash$ conda activate py36
(py36) clash$ python
Python 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>>
1.3 Install pillow, scipy and tqdm
pip install pillow
pip install scipy
pip install tqdm
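A quick way to confirm that these packages (and torch) are importable in the active environment is a small Python check. This is a minimal sketch; note that pillow is imported as PIL, and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names for the packages installed above (pillow is imported as "PIL").
REQUIRED = ["torch", "PIL", "scipy", "tqdm"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("all found" if not missing else "missing: " + ", ".join(missing))
```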
1.4 Verify the environment
Download the code from https://github.com/wkentaro/pytorch-fcn and extract it. Running pip install . in the extracted directory should end with a series of "Successfully ..." lines like the following:
(py36) paper1$ cd pytorch-fcn-main/
(py36) pytorch-fcn-main$ pip install .    # installs torchfcn
Processing /home/elfoot/paper1/pytorch-fcn-main
Preparing metadata (setup.py) ... done
--------------------------------
Requirement already satisfied: idna<4,>=2.5 in /home/elfoot/anaconda3/envs/py36/lib/python3.6/site-packages (from requests[socks]->gdown->fcn>=6.1.5->torchfcn==1.9.7) (3.3)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /home/elfoot/anaconda3/envs/py36/lib/python3.6/site-packages (from requests[socks]->gdown->fcn>=6.1.5->torchfcn==1.9.7) (1.7.1)
Building wheels for collected packages: torchfcn
Building wheel for torchfcn (setup.py) ... done
Created wheel for torchfcn: filename=torchfcn-1.9.7-py3-none-any.whl size=137110 sha256=0e0a02e7459ab0c07e029ccefb4d80959a61ee28a9d4a052ea8574855f7c488f
Stored in directory: /home/elfoot/.cache/pip/wheels/c9/60/99/c1bd09fc67e214cb878410d34a27c1a3ac13a0e4f22bddbadf
Successfully built torchfcn
Installing collected packages: torchfcn
Successfully installed torchfcn-1.9.7
2. Training the example on the VOC dataset
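2.1 Download the data
Run the script xxx/paper1/pytorch-fcn-main/examples/voc/download_dataset.sh to download the dataset. Its contents are shown below; it downloads two archives (the Berkeley benchmark.tgz and VOCtrainval_11-May-2012.tar) and extracts them into the DIR directory: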
#!/bin/bash
DIR=~/data/datasets/VOC
mkdir -p $DIR
cd $DIR
if [ ! -e benchmark_RELEASE ]; then
wget http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz -O benchmark.tar
tar -xvf benchmark.tar
fi
if [ ! -e VOCdevkit/VOC2012 ]; then
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
fi
Downloading in the terminal was very slow. Since I had a proxy running, I opened the links in a browser instead, which was much faster.
Create the folder ~/data/datasets/VOC and extract each downloaded archive into it.
Then move benchmark_RELEASE (from the extracted benchmark folder) and VOCdevkit (from VOCtrainval_11-May-2012) up into the VOC directory.
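To check the result, here is a minimal sketch that looks for the same two directories the download script tests for (benchmark_RELEASE and VOCdevkit/VOC2012 under ~/data/datasets/VOC). The helper name missing_voc_dirs is hypothetical.

```python
import os

# Sub-directories the download script checks for, relative to DIR.
EXPECTED = ["benchmark_RELEASE", os.path.join("VOCdevkit", "VOC2012")]


def missing_voc_dirs(root="~/data/datasets/VOC"):
    """Return the expected sub-directories that do not exist under root."""
    root = os.path.expanduser(root)
    return [p for p in EXPECTED if not os.path.isdir(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_voc_dirs()
    print("layout OK" if not missing else "missing: " + ", ".join(missing))
```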
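2.2 Configure git
xxx/pytorch-fcn-main/examples/voc/train_fcn32s.py calls git log (excerpt below), and running the script raised an error, so I set up git first.
# excerpt from xxx/pytorch-fcn-main/examples/voc/train_fcn32s.py
import shlex
import subprocess

def git_hash():
    # short hash of the latest commit in the current repository
    cmd = 'git log -n 1 --pretty="%h"'
    ret = subprocess.check_output(shlex.split(cmd)).strip()
    if isinstance(ret, bytes):
        ret = ret.decode()
    return ret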
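git_hash() raises when git is not installed or when the working directory is not inside a repository with at least one commit (git log in a freshly initialised repository with no commits also exits with an error). For git_hash() itself, a local commit is enough; git log does not contact the remote. If you would rather not set up a repository, a hypothetical fallback wrapper like the one below (my own sketch, not part of pytorch-fcn) returns None instead of raising.

```python
import shlex
import subprocess


def git_hash_or_none(cwd=None):
    """Like git_hash(), but return None instead of raising.

    Hypothetical helper: returns None when git is missing, cwd is invalid,
    or cwd is not inside a repository that has at least one commit.
    """
    cmd = 'git log -n 1 --pretty="%h"'
    try:
        ret = subprocess.check_output(
            shlex.split(cmd), cwd=cwd, stderr=subprocess.DEVNULL
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        # OSError covers a missing git binary or a nonexistent cwd
        return None
    return ret.decode() if isinstance(ret, bytes) else ret
```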
First create a repository on your own GitHub account; mine is https://github.com/menghxz/fcn-pytorch-cpu.git
Set the proxy in ~/.bashrc (this may or may not be necessary; I haven't worked that out yet), for example:
export HTTP_PROXY="http://127.0.0.1:7890"
export HTTPS_PROXY="http://127.0.0.1:7890"
Initialise git in the terminal:
cd /home/elfoot/paper1/pytorch-fcn-main/examples/voc
git init
git add README.md
git commit -m "first commit"
git branch -M main
git remote add origin https://github.com/menghxz/fcn-pytorch-cpu.git  # your repository URL
git push -u origin main
2.3 Training
In the terminal, change into the voc directory and start training:
cd /home/elfoot/paper1/pytorch-fcn-main/examples/voc
./train_fcn32s.py
This is extremely slow: after three hours of training, epoch 1 was only 53% complete.
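As a rough number, assuming a constant iteration speed, 3 h for 53% of an epoch extrapolates to 3 / 0.53 ≈ 5.7 h per epoch on this CPU. A minimal sketch of that linear extrapolation (the helper name is hypothetical):

```python
def hours_per_epoch(elapsed_hours, fraction_done):
    """Linear extrapolation of the time for one full epoch, assuming constant speed."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


if __name__ == "__main__":
    per_epoch = hours_per_epoch(3.0, 0.53)
    print(f"~{per_epoch:.1f} h per epoch, ~{per_epoch - 3.0:.1f} h left in epoch 1")
```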
3 Configuring the GPU version
3.1 Installing with the conda command from the PyTorch website: failed
# Create and activate the virtual environment
conda create -n fcn36 python=3.6
source activate fcn36
pip install fcn
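Install the GPU version of PyTorch.
Installing with conda did not work, because the required packages are not available from the default Anaconda channels: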
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
(fcn36) meng@meng:~$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- cudatoolkit=11.3
- libgcc-ng[version='>=9.3.0']
- __glibc[version='>=2.17']
- cudatoolkit=11.3
- libstdcxx-ng[version='>=9.3.0']
Current channels:
- https://conda.anaconda.org/pytorch/linux-64
- https://conda.anaconda.org/pytorch/noarch
- https://repo.anaconda.com/pkgs/main/linux-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/free/linux-64
- https://repo.anaconda.com/pkgs/free/noarch
- https://repo.anaconda.com/pkgs/r/linux-64
- https://repo.anaconda.com/pkgs/r/noarch
- https://repo.anaconda.com/pkgs/pro/linux-64
- https://repo.anaconda.com/pkgs/pro/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
3.2 Switching the Anaconda channels to the Tsinghua mirror: failed
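A direct file search only found condarc files that are not the one needed.
That is because the .condarc file is not created automatically.
Create the .condarc file:
conda config --add channels r
Then edit it to use the Anaconda part of the Tsinghua mirror:
# edit .condarc and comment out defaults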
channels:
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/linux-64/
- http://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/linux-64/
# - defaults
ssl_verify: true
show_channel_urls: true
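Turn off the proxy and run the install command again, this time without -c pytorch: the mirror has no packages of the specified versions either.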
conda install pytorch torchvision torchaudio cudatoolkit=11.3
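The reference below is for Windows 10, but the approach carries over:
Creating a new Anaconda environment fails with CondaHTTPError: HTTP 000 CONNECTION FAILED for url ……, how I solved it - tianlang25 - 博客园 (cnblogs)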
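3.3 Official pip command + removing the Tsinghua mirror + proxy + adjusting versions per the error messages: success
To remove the Tsinghua mirror, simply empty the .condarc file.
The pip command from the official site, entered in the terminal: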
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
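Before the proxy was configured, pip kept printing yellow warning text until it finally failed.
With the proxy configured, the official command reported that this torch version could not be found, so, following the error message, I picked the newest version it listed: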
(fcn36) meng@meng:~$ pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.0+cu113, 1.10.1, 1.10.1+cu113, 1.10.2, 1.10.2+cu113)
ERROR: No matching distribution found for torch==1.11.0+cu113
Change the install command to:
pip3 install torch==1.10.2+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
After torch was downloaded, another error: the torchvision version could not be found.
Change it again:
pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
After torchvision was downloaded, the torchaudio version could not be found.
Change it once more:
pip3 install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Everything installed successfully.
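torch==1.11.0+cu113 was most likely not found because PyTorch 1.11 dropped support for Python 3.6, so in this Python 3.6 environment pip only sees releases up to 1.10.2 (matching the version list in the error). The broader lesson is that torch, torchvision and torchaudio have to be installed as a matching set. A minimal sketch that builds the pip command from a small table holding only the triples used in this post (the table and the helper name pip_command are my own, not an official compatibility list):

```python
# (torchvision, torchaudio) versions paired with each torch version in this post.
COMPATIBLE = {
    "1.10.2": ("0.11.3", "0.10.2"),
    "1.11.0": ("0.12.0", "0.11.0"),
}


def pip_command(torch_version, tag):
    """Build a pip3 command for a matching triple and a build tag like 'cpu' or 'cu113'."""
    if torch_version not in COMPATIBLE:
        raise ValueError(f"no known torchvision/torchaudio pair for torch {torch_version}")
    vision, audio = COMPATIBLE[torch_version]
    return (
        f"pip3 install torch=={torch_version}+{tag} "
        f"torchvision=={vision}+{tag} torchaudio=={audio}+{tag} "
        f"-f https://download.pytorch.org/whl/{tag}/torch_stable.html"
    )


if __name__ == "__main__":
    print(pip_command("1.10.2", "cu113"))
```

Keeping the pairs in one table means changing the torch version changes all three pins together, instead of fixing them one error message at a time.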
3.4 Testing PyTorch
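A minimal test of the GPU build: report the installed torch version and its build tag (the "+cu113" part of the version string), then, if CUDA is available, run one small matrix multiplication on the GPU. torch.__version__, torch.cuda.is_available() and torch.cuda.get_device_name() are standard PyTorch calls; the helper names are my own.

```python
import importlib.util


def split_build_tag(version):
    """Split '1.10.2+cu113' into ('1.10.2', 'cu113'); the tag is None without a '+' suffix."""
    base, sep, tag = str(version).partition("+")
    return base, (tag if sep else None)


def gpu_smoke_test():
    """Describe the torch build and, if CUDA works, run a tiny matmul on GPU 0."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    base, tag = split_build_tag(torch.__version__)
    if not torch.cuda.is_available():
        return f"torch {base} (build {tag}): CUDA not available"
    x = torch.ones(2, 2, device="cuda")
    total = (x @ x).sum().item()  # ones(2,2) @ ones(2,2) is all 2s, so the sum is 8
    return f"torch {base} (build {tag}) on {torch.cuda.get_device_name(0)}: sum={total}"


if __name__ == "__main__":
    print(gpu_smoke_test())
```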
4 VOC training errors and reinstalling CUDA + cuDNN
4.1 Error when training on the VOC dataset
(fcn36) meng@meng:~/deeplearning/fcn/pytorch-fcn-main/examples/voc$ ./speedtest.py --gpu 2
==> Benchmark: gpu=2, times=1000, dynamic_input=False
/home/meng/anaconda3/envs/fcn36/lib/python3.6/site-packages/chainer/_environment_check.py:75: UserWarning:
--------------------------------------------------------------------------------
CuPy (cupy-cuda113) version 9.2.0 may not be compatible with this version of Chainer.
Please consider installing the supported version by running:
$ pip install 'cupy-cuda113>=7.7.0,<8.0.0'
See the following page for more details:
https://docs.cupy.dev/en/latest/install.html
--------------------------------------------------------------------------------
requirement=requirement, help=help))
==> Testing FCN32s with Chainer
Traceback (most recent call last):
File "./speedtest.py", line 110, in <module>
main()
File "./speedtest.py", line 105, in main
bench_chainer(args.gpu, args.times, args.dynamic_input)
File "./speedtest.py", line 14, in bench_chainer
chainer.cuda.get_device(gpu).use()
File "cupy/cuda/device.pyx", line 172, in cupy.cuda.device.Device.use
File "cupy/cuda/device.pyx", line 178, in cupy.cuda.device.Device.use
File "cupy_backends/cuda/api/runtime.pyx", line 485, in cupy_backends.cuda.api.runtime.setDevice
File "cupy_backends/cuda/api/runtime.pyx", line 261, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal
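The output warns that the installed CuPy version may not be compatible with Chainer and recommends an older version, cupy-cuda113>=7.7.0,<8.0.0. Note, however, that the exception that actually stops the run is cudaErrorInvalidDevice: invalid device ordinal, which means the requested GPU index (--gpu 2) does not exist on this machine; the CuPy message is only a warning.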
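CUDA device ordinals are 0-based, so --gpu N only works when at least N+1 GPUs are visible; otherwise CUDA reports invalid device ordinal, as in the traceback above. A minimal sketch for checking an index before running speedtest.py, assuming PyTorch is installed in the same environment and sees the same devices as CuPy (the helper names are hypothetical):

```python
import importlib.util


def visible_gpu_count():
    """Number of CUDA devices PyTorch can see (0 if torch is missing or CUDA is unavailable)."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch

    return torch.cuda.device_count() if torch.cuda.is_available() else 0


def validate_gpu_index(index, count):
    """A valid 0-based device index satisfies 0 <= index < count."""
    if count == 0:
        raise ValueError("no CUDA device is visible")
    if not 0 <= index < count:
        raise ValueError(
            f"--gpu {index} is invalid: {count} device(s) visible, valid indices are 0..{count - 1}"
        )
    return index


if __name__ == "__main__":
    n = visible_gpu_count()
    print("visible GPUs:", n)
    if n:
        print("valid --gpu values: 0 ..", n - 1)
```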
4.2 No older cupy-cuda113 versions available
Installing an older cupy-cuda113 directly with pip fails because pip cannot find it:
(fcn36) meng@meng:~/deeplearning/fcn/pytorch-fcn-main/examples/voc$ pip install cupy-cuda113==8.0.0
ERROR: Could not find a version that satisfies the requirement cupy-cuda113==8.0.0 (from versions: 9.2.0, 9.3.0, 9.4.0, 9.5.0, 9.6.0)
ERROR: No matching distribution found for cupy-cuda113==8.0.0
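Search Bing for "cupy-cuda113 download" (use Bing; Baidu may not find it); the package page is the first result.
Open it and look at the release history.
It turns out that no older versions of cupy-cuda113 were ever published, which is why pip install could not find them.
However, cupy-cuda110 does have the older versions needed: cupy-cuda110 · PyPI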
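The Chainer warning asks for >=7.7.0,<8.0.0, a half-open range: 7.8.0 satisfies it, while 8.0.0 and the 9.x releases do not. A minimal sketch of that comparison for plain dotted versions. It is a simplified stand-in for real version parsing (such as the packaging library) and does not handle pre-release tags like 8.0.0rc1; the helper names are hypothetical.

```python
def parse_version(v):
    """'7.8.0' -> (7, 8, 0); only plain dotted integers are handled."""
    return tuple(int(part) for part in v.split("."))


def in_range(version, lower="7.7.0", upper="8.0.0"):
    """True if lower <= version < upper, i.e. the '>=7.7.0,<8.0.0' spec."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for v in ["7.7.0", "7.8.0", "8.0.0", "9.6.0"]:
        print(v, in_range(v))
```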
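4.3 Choosing CUDA and cuDNN versions
Based on 4.2, I chose CUDA 11.0 and a matching cuDNN.
4.3.1 Reinstalling CUDA as CUDA 11.0
How I installed the graphics driver + CUDA 11.3 + cuDNN, including the steps for reinstalling CUDA + cuDNN, is covered in the post below, so I won't repeat it here:
Ubuntu system (8): Ubuntu 18.04 dual boot + ROS installation + various software + complete deep learning environment setup (ubuntu系统(八):ubuntu18.04双系统安装+ros安装+各种软件安装+深度学习环境配置全家桶), biter0088's blog, CSDN
CUDA 11.0 download page: CUDA Toolkit 11.0 Download | NVIDIA Developer
4.3.2 Choosing cuDNN
Official page: cuDNN Archive | NVIDIA Developer
I selected cuDNN v8.1.1 (listed for CUDA 11.0, 11.1 and 11.2), but the downloaded file name says 11.2. Keep track of which file is which so you don't keep re-downloading it:
cudnn-11.2-linux-x64-v8.1.1.33.tgz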
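To confirm which cuDNN version actually ended up installed after copying the archive's files, the version macros can be read from the header. In cuDNN 8 they live in cudnn_version.h (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL). The default path /usr/local/cuda/include/cudnn_version.h is an assumption that depends on where you copied the files, and read_cudnn_version is a hypothetical helper.

```python
import re

# Assumed install location; adjust to wherever the archive's include/ files were copied.
DEFAULT_HEADER = "/usr/local/cuda/include/cudnn_version.h"


def read_cudnn_version(path=DEFAULT_HEADER):
    """Return 'MAJOR.MINOR.PATCH' parsed from the #define lines of a cuDNN version header."""
    with open(path) as f:
        text = f.read()
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+{}\s+(\d+)".format(name), text)
        if match is None:
            raise ValueError(f"{name} not found in {path}")
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    print(read_cudnn_version())
```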
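5 Reconfiguring the Python environment, reinstalling PyTorch and the FCN dependencies
5.1 Reconfiguring the Python environment
I kept the fcn36 environment from above, in case I need CUDA 11.3 at some point.
Create a new Python environment, py36cuda110: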
conda create -n py36cuda110 python=3.6
source activate py36cuda110
5.2 Reinstalling PyTorch
Install PyTorch using the commands from:
Previous PyTorch Versions | PyTorch
On that page of previous versions, scroll down to the command for CUDA 11.0:
# CUDA 11.0
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
5.3 Install the remaining dependencies
cd /home/meng/deeplearning/fcn/pytorch-fcn-main
pip install .
5.4 Install cupy-cuda110-xxx
pip install cupy-cuda110==7.8.0
5.5 Test run 1
cd /home/meng/deeplearning/fcn/pytorch-fcn-main/examples/voc
./speedtest.py --gpu 2
It fails with: CuPy is not correctly installed.
(py36cuda110) meng@meng:~/deeplearning/fcn/pytorch-fcn-main/examples/voc$ ./speedtest.py --gpu 2
==> Benchmark: gpu=2, times=1000, dynamic_input=False
==> Testing FCN32s with Chainer
Traceback (most recent call last):
File "./speedtest.py", line 110, in <module>
main()
File "./speedtest.py", line 105, in main
bench_chainer(args.gpu, args.times, args.dynamic_input)
File "./speedtest.py", line 14, in bench_chainer
chainer.cuda.get_device(gpu).use()
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 354, in get_device
return _get_cuda_device(*args)
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 361, in _get_cuda_device
check_cuda_available()
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 150, in check_cuda_available
raise RuntimeError(msg)
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/chainer/chainer#installation).CuPy is not correctly installed.
If you are using wheel distribution (cupy-cudaXX), make sure that the version of CuPy you installed matches with the version of CUDA on your host.
Also, confirm that only one CuPy package is installed:
$ pip freeze
If you are building CuPy from source, please check your environment, uninstall CuPy and reinstall it with:
$ pip install cupy --no-cache-dir -vvvv
Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html
original error: libcublas.so.11: cannot open shared object file: No such file or directory
Uninstall cupy-cuda110 7.8.0:
pip uninstall cupy-cuda110==7.8.0
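and then run: pip install cupy --no-cache-dir -vvvv
(this command is suggested in the error message above for reinstalling CuPy when building it from source; the terminal printed a great deal of output)
The last lines of the terminal output were: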
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/4a/ca/e72b3b399d7a8cb34311aa8f52924108591c013b09f0268820afb4cd96fb/pip-22.0.tar.gz#sha256=d3fa5c3e42b33de52bddce89de40268c9a263cd6ef7c94c40774808dafb32c82 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/89/a1/2f4e58eda11e591fbfa518233378835679fc5ab766b690b3df85215014d5/pip-22.0.1-py3-none-any.whl#sha256=30739ac5fb973cfa4399b0afff0523d4fe6bed2f7a5229333f64d9c2ce0d1933 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/63/71/5686e51f06fa59da55f7e81c3101844e57434a30f4a0d7456674d1459841/pip-22.0.1.tar.gz#sha256=7fd7a92f2fb1d2ac2ae8c72fb10b1e640560a0361ed4427453509e2bcc18605b (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/83/b5/df8640236faa5a3cb80bfafd68e9fb4b22578208b8398c032ccff803f9e0/pip-22.0.2-py3-none-any.whl#sha256=682eabc4716bfce606aca8dab488e9c7b58b0737e9001004eb858cdafcd8dbdd (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/d9/c1/146b24a7648fdf3f8b4dc6521ab0b26ac151ef903bac0b63a4e1450cb4d1/pip-22.0.2.tar.gz#sha256=27b4b70c34ec35f77947f777070d8331adbb1e444842e98e7150c288dc0caea4 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/6a/df/a6ef77a6574781a668791419ffe366c8acd1c3cf4709d210cb53cd5ce1c2/pip-22.0.3-py3-none-any.whl#sha256=c146f331f0805c77017c6bb9740cec4a49a0d4582d0c3cc8244b057f83eca359 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/88/d9/761f0b1e0551a3559afe4d34bd9bf68fc8de3292363b3775dda39b62ce84/pip-22.0.3.tar.gz#sha256=f29d589df8c8ab99c060e68ad294c4a9ed896624f6368c5349d70aa581b333d0 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/4d/16/0a14ca596f30316efd412a60bdfac02a7259bf8673d4d917dc60b9a21812/pip-22.0.4-py3-none-any.whl#sha256=c6aca0f2f081363f689f041d90dab2a07a9a07fb840284db2218117a52da800b (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Link requires a different Python (3.6.13 not in: '>=3.7'): https://files.pythonhosted.org/packages/33/c9/e2164122d365d8f823213a53970fa3005eb16218edcfc56ca24cb6deba2b/pip-22.0.4.tar.gz#sha256=b3a9de2c6ef801e9247d1527a4b16f92f2cc141cd1489f3fffaf6a9e96729764 (from https://pypi.org/simple/pip/) (requires-python:>=3.7)
Skipping link: not a file: https://pypi.org/simple/pip/
Given no hashes to check 181 links for project 'pip': discarding no candidates
Removed build tracker: '/tmp/pip-req-tracker-83poj6hz'
Checking the installed CuPy version: surprisingly, it was 9.6.0.
5.6 Test run 2
# reconfigure
pip install cupy==7.8.0
pip uninstall cupy==9.6.0
Test:
(py36cuda110) meng@meng:~/deeplearning/fcn/pytorch-fcn-main/examples/voc$ ./speedtest.py --gpu 2
==> Benchmark: gpu=2, times=1000, dynamic_input=False
==> Testing FCN32s with Chainer
Traceback (most recent call last):
File "./speedtest.py", line 110, in <module>
main()
File "./speedtest.py", line 105, in main
bench_chainer(args.gpu, args.times, args.dynamic_input)
File "./speedtest.py", line 14, in bench_chainer
chainer.cuda.get_device(gpu).use()
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 354, in get_device
return _get_cuda_device(*args)
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 361, in _get_cuda_device
check_cuda_available()
File "/home/meng/anaconda3/envs/py36cuda110/lib/python3.6/site-packages/chainer/backends/cuda.py", line 150, in check_cuda_available
raise RuntimeError(msg)
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/chainer/chainer#installation).libcublas.so.11: cannot open shared object file: No such file or directory
So far I have not managed to get the GPU version of the FCN network working. Any suggestions are welcome.
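Both test runs end with the same underlying error, libcublas.so.11: cannot open shared object file, which means the dynamic loader could not find CUDA 11's cuBLAS library. One way to narrow this down independently of Chainer is to try loading the library directly with ctypes; if that fails too, the CUDA 11.0 libraries are probably not installed or their directory (for example /usr/local/cuda-11.0/lib64, if the default install location was used) is not on LD_LIBRARY_PATH. A minimal sketch; the helper name try_load is hypothetical.

```python
import ctypes
import os


def try_load(libname="libcublas.so.11"):
    """Try to dlopen a shared library; return (True, None) or (False, error message)."""
    try:
        ctypes.CDLL(libname)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = try_load()
    print("libcublas.so.11 loaded" if ok else "load failed: " + err)
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "(not set)"))
```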
References:
Installing a CPU PyTorch environment on Ubuntu 18.04 (Ubuntu18.04安装cpu版pytorch环境), 简书 (Jianshu): https://www.jianshu.com/p/43f66c69baa7
PyTorch installation instructions: https://github.com/pytorch/pytorch#installation