
 

https://github.com/ggerganov/whisper.cpp 

 


 

M1 Install

 

1. Installing the latest version via git clone fails on M1 with a .o architecture error, so download the [stable version] instead.

https://github.com/ggerganov/whisper.cpp/releases/tag/v1.2.1

 

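The same download can be scripted; a minimal sketch, assuming GitHub's standard /archive/refs/tags/<tag>.tar.gz release-tarball URL scheme:

```shell
# Build the release-tarball URL for the v1.2.1 tag and download it
tag="v1.2.1"
url="https://github.com/ggerganov/whisper.cpp/archive/refs/tags/${tag}.tar.gz"
out="whisper.cpp-${tag#v}.tar.gz"   # -> whisper.cpp-1.2.1.tar.gz
curl -L -o "$out" "$url"
```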

 

 

2. Extract the downloaded tar.gz archive

tar -xvf whisper.cpp-1.2.1.tar.gz

 

 

3. ํด๋” ์ด๋™ 

cd whisper.cpp-1.2.1

 

 

4. Download a model converted to the whisper.cpp (ggml) format [https://github.com/ggerganov/whisper.cpp/tree/master/models]

bash ./models/download-ggml-model.sh base.en # example: download the base English-only model

# bash ./models/download-ggml-model.sh [model size, e.g. large]
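For reference, the download script takes a single model name; per the models README, sizes include tiny, base, small, medium, and large, most with a .en English-only variant, and the weights are written to ./models/ggml-<name>.bin. A small sketch of picking a name and locating the result:

```shell
# Pick a model name ("base.en" here; ".en" variants are English-only)
model="base.en"
bash ./models/download-ggml-model.sh "$model"
# The script saves the weights as ./models/ggml-<name>.bin
ls -lh "./models/ggml-${model}.bin"
```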

 

 

5. Build with make

# build the main example
make

# transcribe an audio file
./main -f samples/jfk.wav

 

m1์—์„œ make ์‹œ clang ์—๋Ÿฌ๋ฅผ ๋งˆ์ฃผํ•˜๊ฒŒ ๋œ๋‹ค.  (https://github.com/ggerganov/whisper.cpp/issues/570)

clang symbol error

 

 

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -pthread -DGGML_USE_ACCELERATE   -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread examples/main/main.cpp examples/common.cpp ggml.o whisper.o -o main  -framework Accelerate

# the main binary is now built

 

6. Convert the Test File

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav # convert to the 16 kHz mono 16-bit WAV that whisper.cpp expects
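whisper.cpp expects 16-bit, 16 kHz, mono WAV input, which is exactly what the -ar 16000 -ac 1 -c:a pcm_s16le flags produce. The arithmetic also lines up with the result in step 8:

```shell
# An 11-second clip at 16 kHz mono is 16000 * 11 = 176000 samples,
# matching the sample count main reports for samples/jfk.wav in step 8.
rate=16000
secs=11
echo $((rate * secs))   # 176000
# To inspect a converted file's actual format (needs ffmpeg's ffprobe):
#   ffprobe -v error -show_entries stream=codec_name,sample_rate,channels output.wav
```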

 

7. Test

# ./main -m ./models/ggml-[model size].bin -f [file].wav -ml
# ggml-*.en.bin -> English-only model
# -ml = --max-len (maximum segment length in characters)
# -l [lang] -> spoken language (e.g. ko); combine with --translate to translate to English
./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
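For non-English audio, a hypothetical variant (the output.wav filename is carried over from step 6): use a multilingual model, i.e. one without the .en suffix, set the spoken language with -l, and add --translate for English output.

```shell
bash ./models/download-ggml-model.sh base          # multilingual base model
./main -m ./models/ggml-base.bin -f ./output.wav -l ko              # transcribe Korean
./main -m ./models/ggml-base.bin -f ./output.wav -l ko --translate  # translate to English
```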

 

8. Result 

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:00.850]   And so my
[00:00:00.850 --> 00:00:01.590]   fellow
[00:00:01.590 --> 00:00:04.140]   Americans, ask
[00:00:04.140 --> 00:00:05.660]   not what your
[00:00:05.660 --> 00:00:06.840]   country can do
[00:00:06.840 --> 00:00:08.430]   for you, ask
[00:00:08.430 --> 00:00:09.440]   what you can do
[00:00:09.440 --> 00:00:10.020]   for your
[00:00:10.020 --> 00:00:11.000]   country.


whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:     load time =   106.06 ms
whisper_print_timings:      mel time =    15.37 ms
whisper_print_timings:   sample time =    11.49 ms /    27 runs (    0.43 ms per run)
whisper_print_timings:   encode time =   246.60 ms /     1 runs (  246.60 ms per run)
whisper_print_timings:   decode time =    63.65 ms /    27 runs (    2.36 ms per run)
whisper_print_timings:    total time =   455.18 ms
(base) @M1 whisper.cpp-1.2.1 %
๋ฐ˜์‘ํ˜•
Done.