Our projects, contracts and research interests

Research carried out by SPERTUS has been funded by bodies and institutions such as the Spanish Ministry of Science, Innovation and Universities, the Spanish Ministry of Economy and Competitiveness, the European Union, the Regional Government of Galicia and the European Regional Development Fund, among others. We have also signed several contracts with public institutions such as the Instituto Cervantes, with publishers (e.g. Cambridge Scholars Publishers, CUP, John Benjamins) and with companies (e.g. Construdata).

CMC Project

Communicating in Multilingual Contexts: Awareness and development of academic language skills for mobility students

This project was funded with support from the European Commission. It was completed in 2005 and was awarded the European Language Label in 2006.
The objective of the project was to enable university students to improve the quality of their linguistic knowledge as required in trans-national higher education contexts. Specifically, the project aimed to produce a software tool offering customized multimedia language learning materials that would allow exchange students to develop and improve their academic language skills. All this material was designed by a partnership of six universities located in England (London School of Economics), Poland (Wyzsza Szkola Informatyki, Zarzadzania I Administracji), Italy (Università della Calabria), Portugal (Instituto Politecnico de Castelo Branco), Slovakia (Technická Univerzita v Košiciach) and Spain (Universidade de Santiago de Compostela). The programme developed a multilingual e-learning environment with two principal parts: the academic language learning materials and the information part.
The academic language learning programme contained two academic modules, each divided into four didactic units with tasks, activities and a glossary. All of these are located on the CMC website and follow a common academic template shared by all partners.
Further information can be obtained at: http://www.cmceproject.it/portale/cmc.cfm

CMC_E Project
Communicating in Multilingual Contexts Meets the Enterprises

The CMC_E project aimed to encourage the best use of the innovative materials developed in the context of the CMC project (Socrates Programme, Action Lingua 2) described above. It was completed in 2010. The CMC materials promote the development of academic language skills in English and Spanish as well as in less used and less taught languages such as Dutch, Italian, Portuguese, Slovak and Polish, in order to highlight the need for a broader multilingual community. CMC_E moves forward by offering materials which promote the development of professional language skills.
The objectives of this project were to:

  • Enable university students to improve the quality of their language knowledge as required in trans-national higher education contexts, through the Content and Language Integrated Learning (CLIL) approach.
  • Contribute to the development of academic and professional language skills in six different languages.
  • Create a multilingual network and culturally-diverse environment in accordance with EU policies.

The target groups of this project were:

  • Mobility students who wish to study at a university abroad and thus need to master essential academic language skills.
  • University leavers preparing to enter the labour market and therefore in need of professional language skills that will help them become more competitive.
  • In-service workers who need to develop and reinforce their professional language competences in order to fulfil their job responsibilities.

The partners for this project were the same as those for the CMC Project.
Further information can be obtained at: http://www.cmceproject.it/portale/

Santiago University Learner of English Corpus

This is an English learner corpus whose compilation began in October 2002 with funding from the Galician Department of Education. Our aim is to create a corpus of at least 1,000,000 words of oral and written learner English with materials collected from university and secondary school students of all levels (elementary, intermediate and advanced). The first stage in the compilation and development of this corpus was concluded in 2007. It currently contains half a million words.
Spoken data are collected through semi-structured interviews, short oral presentations and brief story descriptions, all of which are recorded in audio and occasionally also in video format. The written part of the corpus is gathered from compositions or argumentative essays following criteria similar to those of ICLE (International Corpus of Learner English). All the data collected by these research instruments are transcribed and computerised. The corpus also provides a search tool for conducting both simple and more complex searches.
In 2021 we intend to start a new stage in the development of this corpus by compiling new samples, tagging the existing materials and improving the current interface and search tool.
Further information and access to the corpus can be obtained at:

Dictionary of language teaching and learning

This project has been funded by the Xunta de Galicia. This dictionary of language teaching and learning is conceived mainly as a reference work addressed particularly to language teachers in Galicia, Spain and Latin American countries, and to specialists in foreign languages at all educational levels (Primary, Secondary - ESO, Bachillerato - Official Language Schools and University - Philology, Translation, Education, Psychology, Pedagogy, Psychopedagogy). Teachers of Spanish and Galician as L1 and L2 will also find useful information here.
The first printed version of this dictionary was published in Spanish by En CLAVE / ELE in 2007. This edition was followed by a Galician version in 2009, jointly funded by the University of Santiago de Compostela, the University of A Coruña and the University of Vigo.
The current electronic version (DICENLEN 1.0, February 2019) is made up of 1,400 primary entries in Spanish and Galician, organised alphabetically. It contains the basic vocabulary of language learning and teaching and covers topics concerning the design and evaluation of language programmes and courses, language teaching methodology, teacher training, learning factors and theories, and key linguistic terms for language teachers.
This electronic dictionary is organised as follows:

  • Entries in Spanish / Galician, ordered alphabetically. The following information is included for each entry: (i) a definition illustrated with examples in different languages, when necessary; (ii) related entries, if relevant; (iii) a selection of bibliographical references for those who want to search for further information.
  • Equivalent terms in other languages (English, French, German, Polish, Portuguese, Russian and Italian) in addition to Spanish and Galician.
  • Appendices of online learning resources for each of the languages mentioned above (except for Polish, Portuguese and Russian), as well as for Spanish as a foreign language and Galician.

In 2021 and the following years we intend to update the dictionary, add new entries and improve the existing multilingual glossary.
Further details and access to the dictionary can be obtained at: https://www.dicenlen.eu/es

A Corpus of Online Forums in Higher Education

This corpus was compiled by our group between 2014 and 2017. It currently contains almost 600,000 words from the asynchronous interactions of 520 university students of different nationalities and L1s in academic forum discussions. In 2021 and the following years we intend to implement the online search tool and to continue studying and exploiting this material, paying special attention to the following topics:

  • The structure of interactions, including opening and closing conventions.
  • Strategies used to reinforce interpersonal relationships among forum participants, e.g. informal expressions and criticism mitigators.
  • Expressive resources typical of online communication, e.g. emoticons and non-standard spelling.
  • The frequency and type of reference to other participants and to different sources of authority (lecturers, documents, other participants, etc.).
  • Characteristic traits of English as a lingua franca in this communicative context, such as lexical and phraseological creativity and processes of simplification and accommodation.
  • The use of concessive and other subordinate structures (conditional, causal, etc.) in the expression of argumentation and persuasion, considering the role of different variables (gender, participants' L1, period of the academic year, etc.).
  • Differences between the expression of argumentation in online forums and in other argumentative genres, in both native and non-native contexts.

Further details and access to the corpus at: http://www.suncodac.com/


Over the last two decades a new multiethnolect has emerged in London, widely known as Multicultural London English (MLE), but also as New Cockney or even as Jafaican/Jafaikan, that is, fake Jamaican, because a large number of its speakers use an accent and expressions typical of the Caribbean and, more particularly, of Jamaica. MLE has been formed from a feature pool derived from local varieties (namely Cockney), other UK dialects of English, standard English and the speech of an array of speakers from different backgrounds: Caribbean, African-American, Indian, North-African and Asian. Similar developments have taken place in other multilingual European cities and even within the UK, to the extent that some scholars refer to the existence of a Multicultural Urban British English.
Since 2011 we have been studying different lexical, grammatical and discourse aspects of this variety such as intensifiers, quotatives, negatives, pragmatic markers, invariant tags, etc.
In 2021 and the years to come, we intend to investigate all these issues further and to incorporate new areas such as the language attitudes held by the speakers of this variety of English towards their own mode of expression, as well as those shown by speakers who are not users of this sociolect. The impact that these attitudes may have on education and on the labour market will also be an object of study. We will also consider to what extent these attitudes, negative in many cases, can be modified. For that purpose, we will examine data extracted from the media and from social networks such as Twitter, YouTube and Instagram.
Apart from this, we will also continue with the compilation of the MLE lexical dataset, which currently contains over 300 entries, all of them organised according to word class, origin, semantic field, meaning, source and use.


New technologies (NTs) have transformed personal, academic and professional communication in our society. They have given rise to new digital genres that, generally speaking, are still to be defined. This project intends to describe these codification processes in two communities: the speakers of the variety of English known as Multicultural London English (MLE), and university students who use English as a lingua franca (ELF). This choice responds to two main reasons: 1) among MLE speakers there are a great number of young people who are at the vanguard in the use of NTs and who lead language change in this medium of expression; 2) computer-mediated communication (CMC) is playing a relevant role in the current educational setting and, in many cases, owing to the internationalisation of academic life, its users are multicultural communities that use English as a lingua franca.
This proposal thus addresses two important gaps: 1) it gives new life to the research on MLE, which is burdened by the evident obsolescence of the existing corpora and by the fast development of this variety; and 2) it complements the studies on digital communication in the educational domain, traditionally interested in native users of English, with a multicultural approach.
The project is built around three main research lines, two of them theoretical-descriptive and the third applied:
1) The compilation and analysis of innovative data about the written communicative habits of the MLE community on platforms and in genres such as Twitter, Instagram, YouTube and discussion forums, with the aim of describing the characteristics of this variety in the new digital genres. Our main lines of study will be: a) the deviation from Standard English, particularly regarding grammar; b) the presence of characteristic features of digital communication; and c) the processes of oral-written hybridisation.
2) The study of the main characteristics of the genre of online forums of written academic discussion, paying special attention to: a) their structure and rhetorical functions; b) the mechanisms of group cohesion or social presence; and c) the management of the different voices that take part in the confrontation of different points of view.
3) The application of digital genres, particularly discussion forums and social networks (Facebook), together with collaborative writing activities, to the teaching and learning of second languages, with special attention to writing skills.

Multilingual Student Translation Project

Two members of our research group, Mario Cal Varela and Francisco Javier Fernández Polo, are currently participating in the Multilingual Student Translation (MUST) project, which was launched in 2016 by Sylviane Granger and Marie-Aude Lefer at the Université Catholique de Louvain. This is an international project which aims to collect a large multilingual student translation corpus with rich, standardized metadata, combining insights from both Learner Corpus Research and Corpus-Based Translation Studies. More than forty research teams have joined MUST since its creation. The MUST corpus, which is currently available only to the MUST community, covers a wide range of language pairs and amounts to c. 2 million tokens. It is searchable via a web-based interface called Hypal4MUST, a tailor-made version of the Hybrid Parallel Text Aligner (Hypal) software tool developed by Adam Obrusnik. A standardized translation-oriented annotation system called TAS has been designed within the framework of the project and integrated into Hypal4MUST.
Further information at: https://uclouvain.be/en/research-institutes/ilc/cecl/must.html

Corpus de Aprendices de Español

This is a free-access learner corpus of Spanish that is being compiled with funding from the Instituto Cervantes in collaboration with the USC research group “Gramática del Español”, under the supervision of Prof. Guillermo Rojo and Prof. Ignacio Palacios.
CAES is a collection of written texts produced by students of Spanish as a foreign language at different levels, from A1 to C1, according to the Common European Framework of Reference. Six L1s are represented: Arabic, Mandarin Chinese, French, English, Portuguese and Russian. The current CAES version contains material from 1424 students, each of whom produced two or three texts in keeping with their level, amounting to a total of 3881 written tasks integrated in 1424 samples. The size of the corpus is around half a million words.
Participants were asked to complete a number of written tasks in keeping with their certified level of Spanish. These tasks were designed according to the Common European Framework descriptors for each of the levels, the guidelines provided by the Instituto Cervantes regarding the DELE tests (General Certificate of Spanish as a Foreign Language) for each of the three levels (beginner, intermediate and advanced), and the Instituto Cervantes General Curricular Document.
The whole corpus has been fully morpho-syntactically annotated. A research tool has also been designed that allows researchers to retrieve statistical information and examples of textual elements, lemmas, word classes and grammatical categories using the filters included as parameters in the corpus (mainly L1 and level, together with participants' age, gender and country).
In the next few years, we intend to annotate morpho-syntactically the samples from learners of different L1s that have been collected but not yet properly organized and processed. This will allow us to integrate into the corpus new material from learners of L1s other than those mentioned above. The addition of new functions to the search tool is also planned as one of the next actions.
Further details and access to the corpus can be obtained at: https://galvan.usc.es/caes

Corpus de Nativos de Español

This corpus is conceived as a parallel corpus to CAES and will allow us to conduct contrastive studies of native and non-native language use. It has been designed under similar conditions and according to exactly the same criteria and parameters as CAES. In the coming years, we intend to start compiling samples produced by secondary school and university students, who will enter the data directly into the computer tool that has been designed for that purpose. Once this part has been completed, the next stages will focus on the annotation of the corpus and the development of the search tool.