
Course description

Speech recognition is a growing part of many applications in a wide variety of industries, from call centers and mobile internet applications to medical dictation. However, the technology is still far from human performance. This course covers speech recognition and synthesis from both applied and theoretical perspectives. Students will build a speech application using commercial tools, then work through the underlying components and algorithms to understand how the state of the art can be moved forward. Topics include phonetics, Hidden Markov Models, finite state grammars, statistical language models, conversational systems, speech synthesis, and industry standards such as VXML for implementing applications.

Schedule

Topics and assignments for each class are posted on the schedule page. Please check it regularly, as it may change throughout the year.
Details on the assignments are posted on the assignments page. Again, please check it regularly; I'll update it as the assignments get closer.

Grading

There will be the following types of gradable elements in class. Due dates will be posted on the schedule page and announced in class. No extensions will be granted after an assignment's due date for any reason, and extensions requested before the due date will only be considered for well-articulated reasons. If that reason is that you didn't understand the problem, couldn't access the data, etc., the request needs to come well in advance of the actual due date. Bottom line: start early and communicate.
Policy on working together: Unless it is specifically stated in the assignment, all assignments must be done independently. However, when working with third-party toolsets, you may collaborate on getting the tools installed and running. To make this collaboration fair for everyone, you must post questions and answers on the class Latte blog, even if it's just a summary of a hallway conversation. If it was helpful, share it.
  • Programming Assignments (60% of grade): These will include actually building a speech recognition application, using speech tools to build new models to improve performance, and analyzing data and writing short reports on how something might change under different conditions. There will be 4-5 programming assignments over the year.
  • Quizzes and Presentations (30% of grade): Quizzes are in class or take home, with 4-6 questions on the material covered in class, roughly every 2-3 weeks. If you miss a quiz you need to make it up. In addition there will be some readings with presentations.
  • Class Participation (10% of grade): Attendance, paying attention and answering questions, and participation in class Latte discussions. Throughout the semester, I will post questions about the reading or class material; you should make at least one substantive comment per post.

Due dates for 2015 are still under construction

Submitting Assignments:

When submitting assignments:

  • Zip all files together with your name in the file name
  • Include your name in the zip file
  • Submit through Latte

Late submission rules

  • You can only get an extension by asking for permission before the due date
  • No permission will be granted after the due date except for cases of dire emergency
  • While I accept all reasonable excuses, don't overuse the privilege
  • Waiting until the 11th hour to start an assignment is not a reasonable excuse

Links to Assignments

Assignment 1: Review a speech recognition application

Assignment 2: Create and test a speech grammar on the AT&T Mashup

Assignment 1: Speech application review

Review due September 1st, with at least one comment by the 3rd (discussions can continue beyond that, of course)

  • Select a speech application and try it out. Be creative--there are a lot of them out there.
  • Write a short review (2-3 paragraphs)
    • Describe the application (functionality, platform)
    • Report how well it works, including what you tried, what worked and what didn't
    • Describe overall usefulness and limitations
    • Optionally include information about the company or (attributed) quotes from other product reviews
  • Post it to the "Speech Recognition Application Reviews" forum on Latte.
  • Read the reviews from other classmates and comment on how the app compares to the one you reviewed, whether you've used anything similar, or ask for more information about the app or its performance.
  • Throughout the semester, if you run across new applications or interesting information about existing apps, post them to this forum.

Assignment 2: Evaluating Speech Recognizers

  • Target domain: Ordering a pizza
  • Detailed information about the assignment and the data is on Git
  • You will receive "dev" and "real" test sets made up of the audio and transcriptions that each member of the class submits
  • You will receive text data for language modeling, drawn from both previous classes' assignments and automatically generated data

Sept. 9

  • Record 10 sentences.
  • Submit audio at 16 kHz and 8 kHz plus a transcription (.ref file).
  • Follow the formatting guidelines in Git (see the sketch after this list)
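
A minimal sketch of preparing the 8 kHz copies and the .ref file is below. It assumes the SoX command-line tool for resampling and the common sclite "trn" layout (words followed by an utterance id in parentheses) for the reference file; the file names, ids, and sentences are placeholders, and the formatting guidelines in Git are the authoritative spec.

    import subprocess

    # Placeholder utterance ids and sentences -- use your own recordings
    # and follow the naming rules in the Git guidelines.
    utterances = [
        ("sent01", "i would like a large pepperoni pizza"),
        ("sent02", "can i get two medium cheese pizzas delivered"),
    ]

    # Make 8 kHz copies of the 16 kHz recordings (assumes SoX is installed).
    for utt_id, _text in utterances:
        subprocess.run(
            ["sox", f"{utt_id}_16k.wav", "-r", "8000", f"{utt_id}_8k.wav"],
            check=True,
        )

    # Write a trn-style reference file: "<words> (<utterance id>)" per line.
    with open("myname.ref", "w", encoding="utf-8") as ref:
        for utt_id, text in utterances:
            ref.write(f"{text} ({utt_id})\n")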

Sept 11

  • Baseline on your sentences: recognition in the Mashup and scoring using sclite (a small WER sketch follows this list)
  • Receive "dev test" (5 sentences from everyone in the class)
  • Receive LM text data
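
sclite's headline number is the word error rate: the minimum number of substitutions, deletions, and insertions needed to turn the recognizer's hypothesis into your reference, divided by the reference length. Here is a small Python sketch of just that computation (sclite itself also produces alignments and detailed reports), with made-up pizza-domain sentences:

    def wer(ref_words, hyp_words):
        """Word error rate: (substitutions + deletions + insertions) / len(ref)."""
        # Standard dynamic-programming edit distance over words.
        d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
        for i in range(len(ref_words) + 1):
            d[i][0] = i
        for j in range(len(hyp_words) + 1):
            d[0][j] = j
        for i in range(1, len(ref_words) + 1):
            for j in range(1, len(hyp_words) + 1):
                cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + cost)   # substitution or match
        return d[-1][-1] / len(ref_words)

    ref = "i would like a large pepperoni pizza".split()
    hyp = "i would like large pepperoni pizzas".split()
    print(f"WER = {wer(ref, hyp):.2%}")  # one deletion + one substitution -> 2/7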

Sept 18

  • Submit results on the dev test: your best grammar run and a baseline LM run using all the sentences

Sept 22

  • 3 sets of recognition results on the real test:
    • Your best grammar on the Mashup
    • Your best LM on the Mashup
    • Google Chrome
  • NOTE: The Mashup only lets you give it a set of sentences, so improvements come from the selection of data, not any fancy LM tricks (a rough data-selection sketch follows below)
  • A 2-page discussion comparing the results
KEEP ALL YOUR SCRIPTS IN GITHUB
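
Since the Mashup builds its language model from whatever sentences you upload, "improving the LM" really means choosing which sentences to upload. One rough way to do that (a sketch, not the required approach) is to score the pooled LM text by vocabulary overlap with the dev-test sentences and keep the most in-domain portion; the file names below are placeholders.

    # Rough in-domain data selection by vocabulary overlap with the dev test.
    def load_sentences(path):
        with open(path, encoding="utf-8") as f:
            return [line.strip().lower() for line in f if line.strip()]

    dev_vocab = {w for sent in load_sentences("devtest.txt") for w in sent.split()}
    pool = load_sentences("lm_pool.txt")

    def in_domain_score(sentence):
        words = sentence.split()
        return sum(w in dev_vocab for w in words) / len(words)

    # Keep the half of the pool that looks most like pizza ordering.
    selected = sorted(pool, key=in_domain_score, reverse=True)[: len(pool) // 2]
    with open("mashup_lm_sentences.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(selected) + "\n")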

Assignment 3: Perplexity using the CMU and SRI toolkits (due October 13)

  • Step 1: Baseline
    • Install both the CMU and SRI LM toolkits
    • Build a baseline language model using a training dataset you ran through the Mashup
    • Run a baseline perplexity on the dev test data (save the real test for later); see the SRILM sketch after these steps
  • Step 2: Improve the model (or at least change it)
    • Use at least 3 different techniques to change the perplexity
    • Examples include changing the backoff/discounting strategy, interpolation, class grammars, additional data, data selection, ...
    • Keep track of everything you change and the result, whether it got better or worse.
  • Step 3: Write a report describing your experiments and discussing the results
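
For Step 1, here is a hedged sketch of a baseline SRILM run driven from Python. It assumes the SRILM binaries (ngram-count, ngram) are on your PATH and uses placeholder file names; the CMU toolkit has its own equivalent commands, and the discounting options that work will depend on your data, so check each toolkit's documentation.

    import subprocess

    # Train a trigram LM with interpolated Kneser-Ney discounting (SRILM).
    subprocess.run(
        ["ngram-count", "-order", "3", "-text", "train.txt",
         "-lm", "baseline.lm", "-kndiscount", "-interpolate"],
        check=True,
    )

    # Score the dev-test data; SRILM prints logprob, ppl, and ppl1.
    result = subprocess.run(
        ["ngram", "-order", "3", "-lm", "baseline.lm", "-ppl", "devtest.txt"],
        check=True, capture_output=True, text=True,
    )
    print(result.stdout)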