ggerganov committed on
Commit 2f9eebd · unverified · 1 Parent(s): e4f586b

Update README.md

  1. README.md +68 -4
README.md CHANGED
@@ -308,12 +308,76 @@ to highlight words with high or low confidence:
 
  <img width="965" alt="image" src="https://user-images.githubusercontent.com/1991296/197356445-311c8643-9397-4e5e-b46e-0b4b4daa2530.png">
 
- ## Word-level timestamps (experimental)
 
- The [main](examples/main) example has experimental support for word-level timestamp generation. The accuracy
- is not great, but might be improved in the future.
 
- To use it, simply add the `-owts` command-line argument. There is a free parameter `-wt` that should be around `0.01`.
+ ## Controlling the length of the generated text segments (experimental)
 
+ For example, to limit the line length to a maximum of 16 characters, simply add `-ml 16`:
 
+ ```java
+ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 16
+
+ whisper_model_load: loading model from './models/ggml-base.en.bin'
+ ...
+ system_info: n_threads = 4 / 10 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 |
+
+ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
+
+ [00:00:00.000 --> 00:00:00.850] And so my
+ [00:00:00.850 --> 00:00:01.590] fellow
+ [00:00:01.590 --> 00:00:04.140] Americans, ask
+ [00:00:04.140 --> 00:00:05.660] not what your
+ [00:00:05.660 --> 00:00:06.840] country can do
+ [00:00:06.840 --> 00:00:08.430] for you, ask
+ [00:00:08.430 --> 00:00:09.440] what you can do
+ [00:00:09.440 --> 00:00:10.020] for your
+ [00:00:10.020 --> 00:00:11.000] country.
+ ```
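
The segment lines printed above follow a fixed `[start --> end] text` layout, which makes them straightforward to post-process. The sketch below is a minimal Python example and is not part of whisper.cpp; the names `Segment` and `parse_segment` are hypothetical, and it assumes the `HH:MM:SS.mmm` timestamp layout seen in the sample output.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as "[00:00:00.850 --> 00:00:01.590] fellow".
# Assumption: timestamps always use the HH:MM:SS.mmm layout shown above.
SEGMENT_RE = re.compile(
    r"^\[(\d+):(\d{2}):(\d{2})\.(\d{3}) --> (\d+):(\d{2}):(\d{2})\.(\d{3})\](.*)$"
)


class Segment(NamedTuple):
    """One transcribed segment (hypothetical helper type)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_segment(line: str) -> Optional[Segment]:
    """Return a Segment for a timestamped line, or None for any other line."""
    match = SEGMENT_RE.match(line.strip())
    if match is None:
        return None  # e.g. model-loading or system_info log lines
    fields = match.groups()
    return Segment(_to_ms(*fields[0:4]), _to_ms(*fields[4:8]), fields[8].strip())
```

Lines that are not segments, such as the `whisper_model_load` and `system_info` log lines, yield `None` and can be filtered out by the caller.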
+
+ ## Word-level timestamps
+
+ The `--max-len` argument can be used to obtain word-level timestamps. Simply use `-ml 1`:
+
+ ```java
+ ./main -m ./models/ggml-base.en.bin -f ./samples/jfk.wav -ml 1
+
+ whisper_model_load: loading model from './models/ggml-base.en.bin'
+ ...
+ system_info: n_threads = 4 / 10 | AVX2 = 0 | AVX512 = 0 | NEON = 1 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 |
+
+ main: processing './samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
+
+ [00:00:00.000 --> 00:00:00.320]
+ [00:00:00.320 --> 00:00:00.370] And
+ [00:00:00.370 --> 00:00:00.690] so
+ [00:00:00.690 --> 00:00:00.850] my
+ [00:00:00.850 --> 00:00:01.590] fellow
+ [00:00:01.590 --> 00:00:02.850] Americans
+ [00:00:02.850 --> 00:00:03.300] ,
+ [00:00:03.300 --> 00:00:04.140] ask
+ [00:00:04.140 --> 00:00:04.990] not
+ [00:00:04.990 --> 00:00:05.410] what
+ [00:00:05.410 --> 00:00:05.660] your
+ [00:00:05.660 --> 00:00:06.260] country
+ [00:00:06.260 --> 00:00:06.600] can
+ [00:00:06.600 --> 00:00:06.840] do
+ [00:00:06.840 --> 00:00:07.010] for
+ [00:00:07.010 --> 00:00:08.170] you
+ [00:00:08.170 --> 00:00:08.190] ,
+ [00:00:08.190 --> 00:00:08.430] ask
+ [00:00:08.430 --> 00:00:08.910] what
+ [00:00:08.910 --> 00:00:09.040] you
+ [00:00:09.040 --> 00:00:09.320] can
+ [00:00:09.320 --> 00:00:09.440] do
+ [00:00:09.440 --> 00:00:09.760] for
+ [00:00:09.760 --> 00:00:10.020] your
+ [00:00:10.020 --> 00:00:10.510] country
+ [00:00:10.510 --> 00:00:11.000] .
+ ```
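
To illustrate how a maximum line length relates word-level pieces to longer segments, the Python sketch below greedily merges consecutive pieces while the merged text stays within a character limit. This is an illustration only and not whisper.cpp's actual implementation: the function name `group_words` is hypothetical, and it assumes that each word carries its leading space (punctuation carries none) and that this space counts toward the limit.

```python
from typing import List, Tuple

# (start, end, text) where text is assumed to include the word's leading space,
# e.g. " fellow", while punctuation such as "," has none.
Piece = Tuple[str, str, str]


def group_words(words: List[Piece], max_len: int) -> List[Piece]:
    """Greedily merge consecutive pieces while the text fits in max_len characters."""
    lines: List[Piece] = []
    for start, end, text in words:
        if lines and len(lines[-1][2]) + len(text) <= max_len:
            # Extend the current line: keep its start time, take the new end time.
            prev_start, _, prev_text = lines[-1]
            lines[-1] = (prev_start, end, prev_text + text)
        else:
            # The piece does not fit (or this is the first piece): start a new line.
            lines.append((start, end, text))
    return lines
```

Under those assumptions, applying `group_words(words, 16)` to the word list above gives the same nine lines and time ranges as the `-ml 16` run, and a limit of 1 leaves every piece on its own line.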
+
+ ## Karaoke-style movie generation (experimental)
+
+ The [main](examples/main) example can generate karaoke-style movies, in which the word currently
+ being spoken is highlighted. Use the `-wts` argument and run the generated bash script.
+ This requires `ffmpeg` to be installed.
 
  Here are a few *"typical"* examples: