GPU status monitoring:
nvidia-smi dmon -s pucvmet
jtop monitoring:
sudo jtop
Stress test:
./gpu_burn -m 100000 300
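Optional: to keep a log of the monitor output while a stress run is in progress, the two commands can be combined. This is only a minimal sketch. The function name run_with_monitor, the variable MONITOR_CMD, and the log file name gpu_dmon.log are hypothetical and not part of any of the tools.

```shell
# Monitor command to run in the background (hypothetical variable name).
# It is word-split on purpose, so keep it free of quoted arguments.
MONITOR_CMD="nvidia-smi dmon -s pucvmet"

# run_with_monitor LOGFILE WORKLOAD [ARGS...]
# Starts the monitor in the background, runs the workload in the
# foreground, stops the monitor, and returns the workload's exit status.
run_with_monitor() {
    log_file=$1
    shift
    # Start the monitor and send its output to the log file.
    $MONITOR_CMD > "$log_file" 2>&1 &
    mon_pid=$!
    # Run the workload and remember its exit status.
    status=0
    "$@" || status=$?
    # Stop the monitor once the workload has finished.
    kill "$mon_pid" 2>/dev/null || true
    wait "$mon_pid" 2>/dev/null || true
    return "$status"
}

# Example call on a machine where both tools are available:
# run_with_monitor gpu_dmon.log ./gpu_burn -m 100000 300
```

The monitor is stopped with a plain kill, so the last lines of its output may be missing from the log if the tool buffers its writes.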
Tool: cuda_memtest
git clone https://github.com/ComputationalRadiationPhysics/cuda_memtest.git
cd cuda_memtest
mkdir build
cd build
cmake -DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc -DCMAKE_CUDA_ARCHITECTURES=90 ..
make
sudo make install
./bin/cuda_memtest --stress