AI634 | Natural Language Processing
- Rating: 4.85 (439)
Course overview
Enabling machines to understand human language is one of the central topics in computer science. A wide range of natural language processing tools and technologies are in everyday use, from simple cases such as spell checkers and grammar checkers to more complex systems such as speech recognition, machine translation, question answering, email categorization, handwriting recognition, and search engines.
What you will learn
Course program
This course introduces the main techniques and applications of natural language processing. It also briefly covers the language modeling and machine learning concepts that these techniques and applications rely on.
Lesson | Name | Description | Duration | Week | Slides |
---|---|---|---|---|---|
1 | Introduction to NLP | NLP Applications and Techniques | 3 hours | 1 | NLPF2101 |
2 | Linguistic Knowledge | Semantics, Pragmatics, Discourse, Ambiguity, Phonetics and Phonology | 3 hours | 2 | NLPF2102 |
3 | Regular Expressions | Named Entity Recognition, Information Extraction, Chatter Bot | 3 hours | 3 | NLPF2103 |
4 | Finite State Automata | D-Recognize Algorithm, Formal Language, DFSA, NFSA, State-Space Search | 3 hours | 4 | NLPF2104 |
5 | Morphology | Morphology Trees, Morphology Parsing, Stemming, Lemmatization, Finite State Lexicon | 3 hours | 5 | NLPF2105 |
6 | Finite State Transducers | FST for Morphological Parsing, Orthographic Rules, WordNet, BioLemmatizer, CRF, LSTM | 3 hours | 6 | NLPF2106 |
7 | Basic Text Processing | Text Normalization, Corpus Datasheets, Tokenization, NLTK | 3 hours | 7 | NLPF2107 |
8 | Byte Pair Encoding | Token Learner Algorithm, Token Segmenter, Word Normalization, Porter Stemmer | 3 hours | 8 | NLPF2108 |
9 | Minimum Edit Distance | The Alignment Game, Longest Common Subsequence, Edit Distance in NLP | 6 hours | 9,10 | NLPF2109 |
10 | Back-Trace for Computing Alignments | Optimal Alignment, Back Tracking, Weighted Edit Distance, Computational Biology | 3 hours | 11 | NLPF2110 |
11 | Language Modelling | Markov Assumption, Unigram, Bigram, N-Gram, Estimating Probabilities, Google N-Gram | 3 hours | 12 | NLPF2111 |
12 | Evaluation and Perplexity | Evaluation of N-Gram Model, Shannon Game, Shakespeare as Corpus, Overfitting | 3 hours | 13 | NLPF2112 |
13 | Smoothing | Add-one (Laplace) Smoothing, Interpolation, Backoff, Web-Scale LMs, Kneser-Ney Smoothing | 6 hours | 14,15 | NLPF2113 |