Statistical NLP for Programming Languages

We are seeking to award a Microsoft PhD Scholarship on the topic of "Statistical Machine Learning and Natural Language Processing of Programming Language Text." This is a fully funded three-year PhD scholarship. The project will be supervised by Dr Charles Sutton of the School of Informatics at the University of Edinburgh.

The goal of this project is to apply advanced statistical techniques from natural language processing to a new and very different textual domain: programming language text. Think about how you program when you use a new library or environment for the first time. You "program by search engine": you search for examples of people who have used the same library, and you copy chunks of their code. This project aims to systematize that process and apply it at a large scale.

We have collected a corpus of 1.5 billion lines of source code from 8,000 software projects, and we want to find syntactic patterns that recur across projects. These patterns can then be presented to a programmer as she writes code, providing an autocomplete feature that can suggest entire function bodies. Statistical techniques involved include language modeling, data mining, and Bayesian nonparametrics. The project also raises deep and interesting questions in software engineering, for example: why do recurring syntactic patterns occur in professionally written software when they could be refactored away?

The project is suitable for a student with a top MSc or first-class bachelor's degree in computer science, statistics, physics, or a related numerate discipline. Previous coursework or experience in statistics, machine learning, or statistical natural language processing is desirable, although we do not expect students to have all three. Because of the scale of the data set involved, a strong programming background will be very useful.

This is an opportunity to join a world-leading research group in machine learning. The Research Programme in Machine Learning is hosted by the Institute for Adaptive and Neural Computation (ANC), a research group in the School of Informatics, University of Edinburgh. According to the 2008 Research Assessment Exercise (RAE), the School of Informatics at the University of Edinburgh delivers more world-leading (4*) research than any other institution in the RAE computer science category, and also delivers more internationally excellent or world-leading (3* and 4*) research. ANC is a world leader in machine learning, with six academic teaching staff specialising in developing machine learning methods (Chris Bishop, Chris Williams, Amos Storkey, Charles Sutton, Guido Sanguinetti and Iain Murray).

For more information about the supervisor and the machine learning group at Edinburgh, see the supervisor's Web page:

The Microsoft scholarship consists of an annual bursary for up to three years. During their PhD, the Scholar will be invited to Microsoft Research in Cambridge for an annual PhD Summer School, which includes a series of talks of academic interest and poster sessions. These provide an opportunity to present their work to Microsoft researchers and a number of Cambridge academics.

For informal enquiries about the studentship, please contact, copying in the PhD Secretary.

Formal applications must be made through the School's normal PhD application process. When applying, select the "Informatics: Institute for Adaptive and Neural Computation" research area.

For full consideration, please apply by 13 January. However, we encourage students to apply before 16 December 2011, which is the main application deadline for the School of Informatics. All applications that arrive by 13 January will receive full consideration for this studentship, but students who apply before 16 December will also be considered for other potential funding sources in the School of Informatics. This is especially important for overseas applicants.

Funding Notes: This is a fully funded studentship for UK and EU students. We welcome overseas applicants and can provide funding at the EU fee level, plus maintenance, for overseas students; the remaining fees component will need to come from another source. Overseas applicants are advised to apply before the standard School of Informatics deadline and to apply for other scholarships.
