Adjusting and Debugging PCIe Link Speed on the NVIDIA Xavier NX Platform

free-jdx 2021-05-26 14:17:15
1. Introduction

How can I raise the maximum PCIe link speed on the Jetson Xavier? The link is limited to 2.5 GT/s, although the Xavier appears to support up to 8 GT/s. I am using JetPack 4.5.

0004:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad1 (rev a1) (prog-if 00 [Normal decode])
        LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
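To compare the two values quickly, a small helper can pull the Speed field out of the LnkCap and LnkSta lines of `lspci -vv` output. This is a minimal sketch. The function name `link_speeds` is hypothetical, and the helper assumes the field layout that lspci prints above.

```shell
#!/bin/bash
# Hypothetical helper: read `lspci -vv` output on stdin and print the
# advertised (LnkCap) and current (LnkSta) link speeds, one per line.
# LnkCap2/LnkSta2/LnkCtl2 lines are not matched.
link_speeds() {
    sed -nE 's/^[[:space:]]*(LnkCap|LnkSta):.*Speed ([0-9.]+GT\/s).*/\1 \2/p'
}

# Example (on the Jetson, as root):
#   lspci -vv -s 0004:00:00.0 | link_speeds
```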

With no device connected to the NX, the full output is:

0004:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad1 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 33
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 00001000-00001fff
        Memory behind bridge: 40000000-400fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
                RootCap: CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                        Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                        Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                        EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=2 offset=00000000
                PBA: BAR=2 offset=00010000
        Capabilities: [100 v2] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] #19
        Capabilities: [168 v1] #26
        Capabilities: [18c v1] #27
        Capabilities: [1ac v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
                        PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                        T_CommonMode=60us
                L1SubCtl2: T_PwrOn=60us
        Capabilities: [1bc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
        Capabilities: [2bc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
        Capabilities: [2f4 v1] #25
        Capabilities: [300 v1] Precision Time Measurement
                PTMCap: Requester:+ Responder:+ Root:+
                PTMClockGranularity: 16ns
                PTMControl: Enabled:- RootSelected:-
                PTMEffectiveGranularity: Unknown
        Capabilities: [30c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
        Kernel driver in use: pcieport
2. Consulting the documentation

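The Jetson Xavier actually supports Gen-4 speed (16 GT/s), and this is the default setting: when a Gen-4 capable device is connected, the link comes up at Gen-4 speed. Otherwise, the link speed depends on what is connected to the root port, and the final speed is determined by the device side.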
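The speed values in these registers are small integer codes rather than GT/s. This applies to Max Link Speed in Link Capabilities, Current Link Speed in Link Status, and Target Link Speed in Link Control 2, and therefore to the numbers the script below prints. The sketch below maps the codes to speeds following the PCIe base specification encoding. The helper name `speed_name` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: translate a PCIe link speed code (as stored in the
# Max Link Speed / Current Link Speed / Target Link Speed fields) to text.
speed_name() {
    case "$1" in
        1) echo "2.5GT/s (Gen1)" ;;
        2) echo "5GT/s (Gen2)" ;;
        3) echo "8GT/s (Gen3)" ;;
        4) echo "16GT/s (Gen4)" ;;
        *) echo "unknown ($1)"; return 1 ;;
    esac
}
```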

The speed can be changed with this script, pcie_set_speed.sh:

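    #!/bin/bash
    # pcie_set_speed.sh <pci-address> [speed]
    #   speed: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s, 4 = 16 GT/s
    #   (defaults to the maximum supported speed). Must be run as root.

    dev=$1
    speed=$2

    if [ -z "$dev" ]; then
        echo "Error: no device specified"
        exit 1
    fi

    if [ "$(id -u)" -ne 0 ]; then
        echo "Error: must be run as root (setpci needs config-space access)"
        exit 1
    fi

    # Accept short addresses such as "01:00.0" by adding the default domain
    if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
        dev="0000:$dev"
    fi

    if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
        echo "Error: device $dev not found"
        exit 1
    fi

    # PCI Express Capabilities register: bits 7:4 hold the device/port type
    pciec=$(setpci -s "$dev" CAP_EXP+02.W)
    pt=$((("0x$pciec" & 0xF0) >> 4))

    # The port above this device in the sysfs hierarchy
    port=$(basename "$(dirname "$(readlink "/sys/bus/pci/devices/$dev")")")

    # Endpoint (0), legacy endpoint (1) or switch upstream port (5): the link
    # must be retrained from the downstream port above it, so use that port
    if (($pt == 0)) || (($pt == 1)) || (($pt == 5)); then
        dev=$port
    fi

    # Link Capabilities (+0x0C) and Link Status (+0x12)
    lc=$(setpci -s "$dev" CAP_EXP+0c.L)
    ls=$(setpci -s "$dev" CAP_EXP+12.W)

    # Bits 3:0 = Max Link Speed / Current Link Speed
    max_speed=$(("0x$lc" & 0xF))

    echo "Link capabilities:" $lc
    echo "Max link speed:" $max_speed
    echo "Link status:" $ls
    echo "Current link speed:" $(("0x$ls" & 0xF))

    if [ -z "$speed" ]; then
        speed=$max_speed
    fi

    # Never request more than the port supports
    if (($speed > $max_speed)); then
        speed=$max_speed
    fi

    echo "Configuring $dev..."

    # Link Control 2 (+0x30): bits 3:0 = Target Link Speed
    lc2=$(setpci -s "$dev" CAP_EXP+30.L)

    echo "Original link control 2:" $lc2
    echo "Original link target speed:" $(("0x$lc2" & 0xF))

    lc2n=$(printf "%08x" $((("0x$lc2" & 0xFFFFFFF0) | $speed)))

    echo "New target link speed:" $speed
    echo "New link control 2:" $lc2n

    setpci -s "$dev" CAP_EXP+30.L=$lc2n

    echo "Triggering link retraining..."

    # Link Control (+0x10): bit 5 (0x20) = Retrain Link
    lc=$(setpci -s "$dev" CAP_EXP+10.L)

    echo "Original link control:" $lc

    lcn=$(printf "%08x" $(("0x$lc" | 0x20)))

    echo "New link control:" $lcn

    setpci -s "$dev" CAP_EXP+10.L=$lcn

    # Give the link a moment to retrain before reading the status again
    sleep 0.1

    ls=$(setpci -s "$dev" CAP_EXP+12.W)

    echo "Link status:" $ls
    echo "Current link speed:" $(("0x$ls" & 0xF))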
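The script decides whether to move from the given device to its parent port by reading bits 7:4 of the PCI Express Capabilities register (`CAP_EXP+02`). The sketch below shows that decode on its own. It uses a hypothetical `port_type_name` helper and the type codes from the PCIe base specification.

```shell
#!/bin/bash
# Hypothetical helper: decode the Device/Port Type field (bits 7:4) of the
# PCI Express Capabilities register, given as hex the way setpci prints it.
port_type_name() {
    local pt=$(( (0x$1 & 0xF0) >> 4 ))
    case $pt in
        0) echo "Endpoint" ;;
        1) echo "Legacy Endpoint" ;;
        4) echo "Root Port" ;;
        5) echo "Switch Upstream Port" ;;
        6) echo "Switch Downstream Port" ;;
        7) echo "PCIe-to-PCI Bridge" ;;
        8) echo "PCI-to-PCIe Bridge" ;;
        9) echo "Root Complex Integrated Endpoint" ;;
        10) echo "Root Complex Event Collector" ;;
        *) echo "Reserved ($pt)" ;;
    esac
}

# Example (as root): port_type_name "$(setpci -s 0005:01:00.0 CAP_EXP+02.W)"
```

Consider the NVMe drive shown in section 3, which lspci lists as an "Express (v2) Endpoint". Passing its address should therefore make the script operate on the root port above it (0005:00:00.0) rather than on the drive itself.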

Is there a more fundamental way to change the PCIe speed, rather than running this script every time?

3. Installing an 8 GT/s device

Without running any script: if nothing is connected, there is nothing to negotiate with, so the link speed stays at 2.5 GT/s.

If an 8 GT/s device is installed, the speed adjusts accordingly. Here is an excerpt for an NVMe device running at 8 GT/s x4:

0005:01:00.0 Non-Volatile memory controller: Micron/Crucial Technology Device 540a (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Micron/Crucial Technology Device 540a
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 35
        IOMMU group: 61
        Region 0: Memory at 1f40000000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

This is the bridge it is connected to:

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 35
        IOMMU group: 60
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 0000f000-00000fff [disabled]
        Memory behind bridge: 40000000-400fffff [size=1M]
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
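The speed and width together determine the bandwidth a link can deliver. Gen1/Gen2 use 8b/10b encoding and Gen3/Gen4 use 128b/130b, which gives a rough per-direction estimate from the speed code and lane count. The sketch below does that arithmetic with a hypothetical `pcie_bw_mbs` helper. It ignores packet and protocol overhead, so real throughput will be lower.

```shell
#!/bin/bash
# Hypothetical helper: estimate raw per-direction bandwidth in MB/s from a
# link speed code (1..4) and a lane count, accounting only for line encoding.
pcie_bw_mbs() {
    awk -v code="$1" -v width="$2" 'BEGIN {
        if (code == 1)      { gt = 2.5; eff = 8 / 10 }
        else if (code == 2) { gt = 5;   eff = 8 / 10 }
        else if (code == 3) { gt = 8;   eff = 128 / 130 }
        else if (code == 4) { gt = 16;  eff = 128 / 130 }
        else exit 1
        # GT/s * encoding efficiency = Gb/s per lane; /8 -> GB/s; *1000 -> MB/s
        printf "%.0f\n", gt * eff / 8 * 1000 * width
    }'
}
```

The helper gives these estimates for the links discussed here:

- `pcie_bw_mbs 3 4` (the NVMe link at 8 GT/s x4): about 3938 MB/s.
- `pcie_bw_mbs 4 8` (the root port's 16 GT/s x8 capability): about 15754 MB/s.
- `pcie_bw_mbs 1 1` (the idle 2.5 GT/s x1 link in section 1): 250 MB/s.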
4. Debugging the mining program

When I start the mining program, it checks the PCIe-based memory space. If I change nothing, I cannot mine anything on the Xavier NX, because I get this message:

cuda-0   Using Pci Id : 00:00.0 Xavier (Compute 7.2) Memory : 2.5 GB

The process needs at least 4.2 GB to generate the DAG. If I change the PCIe speed and then run the mining process, I get the following message:

cuda-0   Using Pci Id : 00:00.0 Xavier (Compute 7.2) Memory : 6.19 GB

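The mining process then runs successfully, because this time it has enough memory to generate the DAG. So, one way or another, there is a link between the PCIe speed and being able to run the mining process on this board.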

5. Adjusting the device tree

There is a device-tree property named "nvidia,init-speed". You can try adding it to the PCIe nodes with a device-tree overlay:

pcie@14160000 {
    nvidia,init-speed = <3>;
};

pcie@141a0000 {
    nvidia,init-speed = <4>;
};

This approach requires building a new DTB that is loaded into the kernel at boot.
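After booting with such a DTB, you can confirm that the property reached the running kernel by reading it back from /proc/device-tree. Each cell there is stored as a big-endian 32-bit value. The sketch below reads one cell with a hypothetical `read_dt_u32` helper. The example path assumes the `pcie@14160000` node name used above, and the file exists only if the property was actually applied.

```shell
#!/bin/bash
# Hypothetical helper: print the first 32-bit big-endian cell of a
# device-tree property file as a decimal number.
read_dt_u32() {
    local hex
    hex=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ ${#hex} -eq 8 ] || return 1   # missing file or shorter than one cell
    echo $((16#$hex))
}

# Example: read_dt_u32 /proc/device-tree/pcie@14160000/nvidia,init-speed
```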

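The simplest way to accomplish this is to run the pcie_set_speed.sh script automatically at boot, which is easy to do with a systemd service. Save the following as "/etc/systemd/system/pcie_set_speed.service":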

[Unit]
Description=Set PCIe Speed

[Service]
Type=oneshot
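# The script exits with an error unless it is given a PCI address (and optionally
# a speed code); 0004:00:00.0 and 3 (8 GT/s) are the values from section 1, adjust as needed
ExecStart=/root/pcie_set_speed.sh 0004:00:00.0 3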

[Install]
WantedBy=sysinit.target

Then copy the pcie_set_speed.sh script to /root/ and make sure it is executable. Now run:

$ sudo systemctl daemon-reload
$ sudo systemctl enable pcie_set_speed
$ sudo systemctl start pcie_set_speed

The configuration is complete.
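After a reboot, you can check the result by reading the Link Status register again. The sketch below decodes Current Link Speed (bits 3:0) and Negotiated Link Width (bits 9:4) from the hex value setpci prints. It uses a hypothetical `decode_lnksta` helper. As section 3 shows, the speed only rises if the connected device supports it.

```shell
#!/bin/bash
# Hypothetical helper: decode a Link Status register value (hex, as printed
# by `setpci -s <dev> CAP_EXP+12.W`) into speed code and lane width.
decode_lnksta() {
    local v=$((0x$1))
    echo "speed=$(( v & 0xF )) width=x$(( (v >> 4) & 0x3F ))"
}

# Example (as root): decode_lnksta "$(setpci -s 0004:00:00.0 CAP_EXP+12.W)"
```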

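Disclaimer: This article was written by an author registered on the Ebaina (易百纳) platform. The views expressed are the author's own and do not represent Ebaina. In case of infringement or other issues, please contact the site to have the content removed.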