Update README.md
README.md
CHANGED
|
@@ -308,12 +308,76 @@ to highlight words with high or low confidence:
|
|
| 308 |
|
| 309 |
<img width="965" alt="image" src="https://user-images.githubusercontent.com/1991296/197356445-311c8643-9397-4e5e-b46e-0b4b4daa2530.png">
|
| 310 |
|
| 311 |
-
##
|
| 312 |
|
| 313 |
-
|
| 314 |
-
is not great, but might be improved in the future.
|
| 315 |
|
| 316 |
-
## Controlling the length of the generated text segments (experimental)

For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:

```java
./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16

whisper_model_load: loading model from './models/ggml-base.en.bin'
...
system_info: n_threads = 4 / 10 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 |

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:00.850] And so my
[00:00:00.850 --> 00:00:01.590] fellow
[00:00:01.590 --> 00:00:04.140] Americans, ask
[00:00:04.140 --> 00:00:05.660] not what your
[00:00:05.660 --> 00:00:06.840] country can do
[00:00:06.840 --> 00:00:08.430] for you, ask
[00:00:08.430 --> 00:00:09.440] what you can do
[00:00:09.440 --> 00:00:10.020] for your
[00:00:10.020 --> 00:00:11.000] country.
```
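
The segmentation shown above is consistent with a simple greedy rule: keep appending tokens to the current segment until the next token would push it past the limit. The Python sketch below illustrates that rule on the tokens of the same sample. It is not whisper.cpp's implementation; `wrap_tokens` is a hypothetical helper, and it assumes that each token carries a leading space that counts toward the length.

```python
def wrap_tokens(tokens, max_len):
    """Greedily group tokens into segments of at most max_len characters.

    Assumption: each token carries its own leading space, and that space
    counts toward the length. A single token longer than max_len still
    gets a segment of its own.
    """
    segments = []
    current = ""
    for tok in tokens:
        # Start a new segment when the next token would exceed the limit.
        if current and len(current) + len(tok) > max_len:
            segments.append(current)
            current = ""
        current += tok
    if current:
        segments.append(current)
    # Drop the leading space of each segment for display.
    return [s.strip() for s in segments]


tokens = [" And", " so", " my", " fellow", " Americans", ",", " ask",
          " not", " what", " your", " country", " can", " do", " for",
          " you", ",", " ask", " what", " you", " can", " do", " for",
          " your", " country", "."]

for segment in wrap_tokens(tokens, 16):
    print(segment)
```

Under these assumptions, a limit of 16 reproduces the nine segments of the sample, and a limit of 1 places every token in its own segment.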

## Word-level timestamps

The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:

```java
./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1

whisper_model_load: loading model from './models/ggml-base.en.bin'
...
system_info: n_threads = 4 / 10 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 |

main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:00.320]
[00:00:00.320 --> 00:00:00.370] And
[00:00:00.370 --> 00:00:00.690] so
[00:00:00.690 --> 00:00:00.850] my
[00:00:00.850 --> 00:00:01.590] fellow
[00:00:01.590 --> 00:00:02.850] Americans
[00:00:02.850 --> 00:00:03.300] ,
[00:00:03.300 --> 00:00:04.140] ask
[00:00:04.140 --> 00:00:04.990] not
[00:00:04.990 --> 00:00:05.410] what
[00:00:05.410 --> 00:00:05.660] your
[00:00:05.660 --> 00:00:06.260] country
[00:00:06.260 --> 00:00:06.600] can
[00:00:06.600 --> 00:00:06.840] do
[00:00:06.840 --> 00:00:07.010] for
[00:00:07.010 --> 00:00:08.170] you
[00:00:08.170 --> 00:00:08.190] ,
[00:00:08.190 --> 00:00:08.430] ask
[00:00:08.430 --> 00:00:08.910] what
[00:00:08.910 --> 00:00:09.040] you
[00:00:09.040 --> 00:00:09.320] can
[00:00:09.320 --> 00:00:09.440] do
[00:00:09.440 --> 00:00:09.760] for
[00:00:09.760 --> 00:00:10.020] your
[00:00:10.020 --> 00:00:10.510] country
[00:00:10.510 --> 00:00:11.000] .
```
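
Timestamped lines in this format are straightforward to post-process. As a minimal sketch, the Python snippet below parses them into `(start_ms, end_ms, text)` tuples. The helper names `to_ms` and `parse_segments` and the regular expression are illustrative assumptions based only on the output shown above; they are not part of whisper.cpp.

```python
import re

# Matches lines such as "[00:00:00.320 --> 00:00:00.370] And".
LINE_RE = re.compile(
    r"^\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> "
    r"(\d+):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to integer milliseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def parse_segments(output):
    """Return (start_ms, end_ms, text) for each timestamped line."""
    segments = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip log lines that carry no timestamps
        fields = match.groups()
        segments.append((to_ms(*fields[0:4]), to_ms(*fields[4:8]), fields[8]))
    return segments


sample = """\
whisper_model_load: loading model from './models/ggml-base.en.bin'
[00:00:00.000 --> 00:00:00.320]
[00:00:00.320 --> 00:00:00.370] And
[00:00:00.370 --> 00:00:00.690] so
[00:00:00.690 --> 00:00:00.850] my
"""

for start, end, text in parse_segments(sample):
    print(start, end, repr(text))
```

Integer milliseconds are used instead of floating-point seconds so that durations such as `end - start` stay exact.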

## Karaoke-style movie generation (experimental)

The [main](examples/main) example can generate karaoke-style movies, in which the
currently pronounced word is highlighted. Use the `-wts` argument and run the generated bash script.
This requires `ffmpeg` to be installed.
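
Because the generated script depends on `ffmpeg`, it can help to confirm that the tool is on the `PATH` before running it. The following Python sketch does this with the standard library; `ffmpeg_available` is a hypothetical helper name, and the check says nothing about which `ffmpeg` version or build options the script needs.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it before running the generated script")
```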

Here are a few *"typical"* examples: