How to Improve Stable Diffusion WebUI Performance

codens 2023. 5. 10. 00:17

How to speed up image generation in the Stable Diffusion Automatic1111 WebUI (a1111)

- Switch to PyTorch 2 + CUDA 11.8

//-------------------------------------
Install CUDA v11.8

- Download
https://developer.nvidia.com/cuda-toolkit
https://developer.nvidia.com/cuda-toolkit-archive

  - Set the environment variables after installing
PATH entries, CUDA_HOME, CUDA_PATH
Point them at the folder where v11.8 is installed (e.g. "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8")

  - Check the CUDA version
nvcc --version
      V11.8.89
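
The two checks above (environment variables and nvcc version) can also be scripted. This is only a minimal sketch and not part of the original setup: the helper names parse_nvcc_release and check_cuda_env are made up for illustration, and it assumes nvcc prints its build number in the "V11.8.89" form shown above.

```python
import os
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the build version (e.g. '11.8.89') found in nvcc output, or None."""
    match = re.search(r"\bV(\d+\.\d+\.\d+)\b", text)
    return match.group(1) if match else None


def check_cuda_env(env, expected="v11.8"):
    """Return the names among CUDA_HOME / CUDA_PATH that do not contain `expected`."""
    return [name for name in ("CUDA_HOME", "CUDA_PATH")
            if expected.lower() not in env.get(name, "").lower()]


if __name__ == "__main__":
    print("variables not pointing at v11.8:", check_cuda_env(os.environ))
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"],
                             capture_output=True, text=True).stdout
        print("nvcc build:", parse_nvcc_release(out))
    else:
        print("nvcc not found on PATH")
```

An empty list from check_cuda_env means both variables mention v11.8. The comparison is a plain substring match, so it is a rough check only.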

//-----------------------------------------------------------------------------
Install PyTorch 2 (+CUDA 11.8)

  - Enter the virtual environment
venv\Scripts\activate

  - Remove the previous installation
pip uninstall torch torchvision torchaudio

  - Install the new version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

  - Check the version

pip show torch
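
Besides pip show, the build can be checked from inside Python. A minimal sketch, run inside the activated venv: split_torch_version is a hypothetical helper, and "2.0.1+cu118" in its docstring is only an example of the version format, not a value taken from this post.

```python
def split_torch_version(version):
    """Split a string like '2.0.1+cu118' into ('2.0.1', 'cu118').

    The second element is None when the version has no '+tag' suffix.
    """
    base, _, tag = version.partition("+")
    return base, (tag or None)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        base, tag = split_torch_version(torch.__version__)
        print("torch:", base, "build tag:", tag)
        print("CUDA version torch was built with:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
```

After the install above, the build tag would be expected to mention cu118 and torch.cuda.is_available() to be True. If not, the old wheel is probably still installed.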

//-------------------------------------
* Performance measured after installation

  RTX 4080 (16 GB), one-word prompt
Before - 512 : 5.2 , 768 :  2.8
After  - 512 : 23.0, 768 : 11.6 , 1024 : 6.0 (SD-2.1 model)

- SDXL 1.0 model

512 : 7.8 , 768 : 7.4 , 1024 : 4.3
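
To put the SD-2.1 before/after figures in proportion, the ratios can be worked out as below. This is a small sketch that assumes a higher number means faster generation (the unit is not stated above); the speedup helper is hypothetical.

```python
# Figures copied from the SD-2.1 measurements above (RTX 4080).
before = {512: 5.2, 768: 2.8}
after = {512: 23.0, 768: 11.6, 1024: 6.0}


def speedup(before, after):
    """Return after/before per image size, for sizes measured in both runs."""
    return {size: round(after[size] / before[size], 2)
            for size in before if size in after}


ratios = speedup(before, after)
for size, ratio in ratios.items():
    print(f"{size}: {ratio}x")
```

1024 is left out because there is no "before" figure for it.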

- Benchmark extension
https://github.com/vladmandic/sd-extension-system-info
Benchmark results :

SD 2.1 - 19.39 / 22.21 / 25.29

SDXL 1.0 - 8.02 / 13.45 / 15.85

- Reference) list of speed measurement results
https://docs.getgrist.com/3mjouqRSdkBY/sdperformance/p/1

//-------------------------------------
Settings that affect performance

webui setting - 'Upcast cross attention layer to float32' : turn off for better performance

Windows Settings - Graphics settings - Hardware-accelerated GPU scheduling : turn off for better performance

//-----------------------------------------------------------------------------

< Reference >
Things that gave me no performance gain

//-----------------------------------------------------------------------------
Replacing cuDNN

- How to apply it to SD WebUI
Copy the DLL files from the bin folder of the extracted cuDNN archive
into the stable-diffusion-webui\venv\Lib\site-packages\torch\lib folder

List of CUDA versions supported by each cuDNN release
https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html

cudnn 8.7.0 : no change
cudnn 8.8.1 : no change
cudnn 8.9.1 : no change

//-----------------------------------------------------------------------------
Changing the launch options
Start with the --opt-sdp-attention option
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations

//-------------------------------------
--opt-sdp-attention

  - Error raised
RuntimeError: Expected query, key, and value to have the same dtype, but got query.dtype: float key.dtype: float and value.dtype: struct c10::Half instead

  - Fix
webui setting - Upcast cross attention layer to float32 : off

Speed : 19.39 / 22.21 / 25.29
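
The error message says query and key were float32 while value was still half precision. The sketch below reproduces that kind of mismatch in isolation with PyTorch 2's scaled_dot_product_attention. It is not the WebUI's own code; the tensor shapes are arbitrary and sdp_dtype_mismatch_message is a hypothetical helper. That the upcast setting is what produces the mismatch is my reading of the message, not something verified in the WebUI source.

```python
def sdp_dtype_mismatch_message():
    """Call scaled_dot_product_attention with float32 q/k and float16 v.

    Returns the error text, an empty string if no error was raised,
    or None when PyTorch 2 is not available in this environment.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None
    if not hasattr(F, "scaled_dot_product_attention"):
        return None  # PyTorch older than 2.0

    # Arbitrary small shapes: (batch, heads, tokens, head_dim)
    q = torch.zeros(1, 1, 4, 8, dtype=torch.float32)
    k = torch.zeros(1, 1, 4, 8, dtype=torch.float32)
    v = torch.zeros(1, 1, 4, 8, dtype=torch.float16)
    try:
        F.scaled_dot_product_attention(q, k, v)
    except RuntimeError as err:
        return str(err)
    return ""


message = sdp_dtype_mismatch_message()
print(message)
```

With all three tensors in the same dtype the call goes through, which would fit with the error disappearing once the upcast setting is off.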

//-------------------------------------
--xformers :
Speed : 18.88 / 22.79 / 25.69 (effectively the same as --opt-sdp-attention)

--opt-split-attention : slower than --xformers
--xformers  --opt-split-attention  : about the same as --xformers

--opt-sdp-attention --opt-channelslast : no change in performance

//-----------------------------------------------------------------------------
* Compiling xformers yourself for your own environment
https://arca.live/b/aiartreal/71609267

  - The C++ build tools must be installed
https://visualstudio.microsoft.com/ko/visual-cpp-build-tools/

  - Virtual environment
venv\Scripts\activate

  - Uninstall xformers
pip uninstall xformers --yes

  - Build and install xformers
pip install ninja setuptools wheel

cd repositories
git clone https://github.com/facebookresearch/xformers.git --recurse-submodules
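setx NVCC_FLAGS "-allow-unsupported-compiler"
set NVCC_FLAGS=-allow-unsupported-compiler
  - setx only takes effect in newly opened command prompts, so set is also needed for the current session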

cd xformers
python setup.py build
  - Takes a long time and prints many warnings, but the build completes normally

python setup.py bdist_wheel

cd dist
pip install "xformers-xx.whl"
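
"xformers-xx.whl" above stands for whatever file name the build produced. A small sketch for finding it, run from inside the dist folder: newest_wheel is a hypothetical helper and assumes the file name starts with "xformers-".

```python
from pathlib import Path


def newest_wheel(dist_dir):
    """Return the most recently modified xformers wheel in dist_dir, or None."""
    wheels = sorted(Path(dist_dir).glob("xformers-*.whl"),
                    key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel(".")
    print(wheel if wheel else "no xformers wheel found in this folder")
```

The printed name is what goes into the pip install command.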

  - Verify the installation
pip list
xformers                0.0.20+6425fd0.d20230509
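
Instead of scanning the whole pip list output, a single package can be queried from Python's standard library. A minimal sketch, run inside the venv; installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("xformers:", installed_version("xformers"))
```

A locally built wheel should show a version with a "+" suffix like the one in the pip list line above.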

- No change in speed after installing it
