Course finder

Semester 2

Foundations of Natural Language Processing (INFR10078)

Subject

Informatics

College

SCE

Credits

Normal Year Taken

Delivery Session Year

2023/2024

Pre-requisites

* Understanding of basic probability; e.g., Bayes Rule.
* Familiar with basic computational processes: e.g., recursion, dynamic programming.
* Able to code in Python.
* Basic knowledge of linguistic categories: e.g., Noun, Verb.
* Familiar with first order logic.

Course Summary

This course covers some of the linguistic and algorithmic foundations of natural language processing (NLP). It builds on algorithmic and data science concepts developed in second-year courses, applying these to NLP problems. It also equips students for more advanced NLP courses in year 4. The course is strongly empirical, using corpus data to illustrate both core linguistic concepts and algorithms, including language modelling, part-of-speech tagging, syntactic processing, the syntax-semantics interface, and aspects of semantic and pragmatic processing. The theoretical study of linguistic concepts and the application of algorithms to corpora in the empirical analysis of those concepts will be interleaved throughout the course.

Course Description

An indicative list of topics to be covered includes:

1. Lexicon and lexical processing:
* morphology
* language modelling
* hidden Markov models and associated algorithms
* part-of-speech tagging (e.g., for a language other than English) to illustrate HMMs
* smoothing
* text classification

2. Syntax and syntactic processing:
* the Chomsky hierarchy
* syntactic concepts: constituency (and tests for it), subcategorization, bounded and unbounded dependencies, feature representations
* context-free grammars
* lexicalized grammar formalisms (e.g., dependency grammar)
* chart parsing and dependency parsing (e.g., shift-reduce parsing)
* treebanks: lexicalized grammars and corpus annotation
* statistical parsing

3. Semantics and semantic processing:
* word senses: regular polysemy and the structured lexicon; distributional models; word embeddings (including biases found)
* compositionality, constructing a formal semantic representation from a (disambiguated) sentential syntactic analysis
* predicate argument structure
* word sense disambiguation
* semantic role labelling
* pragmatic phenomena in discourse and dialogue, including anaphora, presuppositions, implicatures and coherence relations
* labelled corpora addressing word senses (e.g., Brown), semantic roles (e.g., Propbank, SemCor), discourse information (e.g., PDTB, STAC, RST Treebank)

4. Data and evaluation (interspersed throughout other topics):
* cross-linguistic similarities and differences
* commonly used datasets
* annotation methods and issues (e.g., crowdsourcing, inter-annotator agreement)
* evaluation methods and issues (e.g., standard metrics, baselines)
* effects of biases in data

Assessment Information

Written Exam 75%, Coursework 25%, Practical Exam 0%

Additional Assessment Information

Tutorials and labs both consist of exercises, on which students receive formative feedback from tutors and demonstrators.

Disclaimer

All course information obtained from this visiting student course finder should be regarded as provisional. We cannot guarantee that places will be available for any particular course. For more information, please see the visiting student disclaimer:

Visiting student disclaimer

This article was published on 27 Apr, 2022