Title: Speech-based automatic closed caption alignment
Author: Boogaard, J.A.
Contributor: Wiggers, P. (mentor); Geers, H. (mentor); Jongebloed, H. (mentor); Rothkrantz, L.J.M. (mentor)
Faculty: Electrical Engineering, Mathematics and Computer Science
Department: Man-Machine Interaction Group
Programme: Media & Knowledge Engineering
Date: 2010-02-10
Abstract:
In the Netherlands, four million people watch television programs with closed captions because they are hearing impaired or non-native speakers. Closed captions contain Dutch speech transcriptions and non-speech sound descriptions and are displayed as subtitles. Under a government obligation, the share of television programs that must be closed-captioned will rise to at least 95% in 2011. Closed caption alignment means timing the subtitles as closely as possible to the corresponding moments in the video signal. Since alignment is a costly and labor-intensive process that demands high-quality output, an automated solution is desirable. This thesis addresses the application of automatic speech recognition to the task of on-line closed-captioning of television programs. It focuses on the development of an automatic closed caption alignment system for TT888, a company that produces subtitles for Dutch-language television programs. A study of related research, consultation with professional editors, and analyses of a variety of captioned television programs contributed to the development of an automatic closed-captioning system named SETH (Speech Estimating Title Heuristics). The core of the system is an algorithm capable of matching manually produced captions with speech transcriptions produced by a large-vocabulary speech recognizer. The architecture of SETH combines the benefits of modular programming and the pipes-and-filters architecture.
The best results are achieved when the speech is fairly formal and non-spontaneous, is pure Dutch pronounced by a native speaker, and contains neither crosstalk nor background noise. Dissimilarities between the speech and the captions are not a major problem as long as the captions include the most important words. The alignment algorithm is also robust to most of the insertions caused by music. Deviant language use, songs, spontaneous speech, and strong regional accents remain difficult for the speech recognizer and are hence a major problem in automatic closed caption alignment. Since there will always be broadcasts with poor speech quality, manual verification or adaptation of the subtitles remains necessary.
Subject: closed captioning; subtitle alignment; natural language processing; speech recognition
To reference this document use: http://resolver.tudelft.nl/uuid:282b4760-a2a4-42e4-9040-1158ef8327fa
Embargo date: 2013-02-03
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2010 Boogaard, J.A.
Files: report.pdf (PDF, 6.64 MB)