Heritage speakers represent a special category of bilinguals who are exposed to their first language at home in childhood, but later acquire the main language of their society, which becomes dominant. Brazil has numerous communities of heritage speakers of many languages such as Japanese, German, Italian, Polish, and Ukrainian; however, only a few speech corpora are being collected. Here we describe the protocol of the data collection and discuss some points about data management for the BraPoRus (Brazilian Portuguese-Russian) corpus, a spoken corpus of heritage Russian in Brazil. The participants are 26 elderly speakers who were born in Brazil or came to Brazil as children in the 1950s. The protocol of the data collection includes: 1) a brief sociodemographic questionnaire; 2) a working memory test in Russian and Brazilian Portuguese using the Month-Ordering task; 3) a semi-spontaneous narrative about the history of the participants’ family and their immigration to Brazil; 4) the Bilingual Language Profile; 5) a sociolinguistic interview with 139 questions; 6) unscripted dialogues between heritage speakers in Russian; 7) an intonation task; and 8) a reading task. The BraPoRus corpus contains 167.5 hours of speech recordings and represents a unique collection of heritage Russian spoken in Brazil. We expect that the protocol described in this work will be useful both for Brazilian linguists who study other heritage languages and for researchers of heritage Russian in other countries.
Heritage speakers represent a special category of bilinguals who acquire their first language in the family environment in childhood, and their second language, which becomes dominant over time, through their life in society. In Brazil, there are many communities that speak heritage languages such as Japanese, German, Italian, Polish, and Ukrainian. However, few efforts to preserve these languages, with the collection of
The flow of Russophone migrants who searched for a place to start a new life in Brazil has intensified since the end of the 19th century. In the first half of the 20th century, Russia went through the Bolshevik Revolution and two world wars. After the Bolshevik Revolution of 1917, a number of Russophone migrants moved to Brazil, but most of them arrived in Brazil in the 1940s-1950s, after having lived in Europe or China, especially in Harbin. These migrants lived in communities and their children learned Russian in the family environment as their parents' heritage language. However, outside the home, in most cases, they spoke Portuguese. Few went to Russian schools. Thus, the children spoke one language at home and another outside the home, and the older they grew, the less they used Russian, their heritage language. In the current work, we present strategies to study heritage Russian in Brazil within the scope of a project entitled BraPoRus that aims to analyze linguistic, behavioral and cultural aspects of the Russophone migrant community. To date, 26 heritage Russian speakers have participated in the survey, allowing us to record 167.5 hours of speech in Russian.
Heritage speakers represent a special category of bilinguals who are exposed to their first language at home in childhood, but later acquire the main language of their society, which becomes dominant (CUMMINS, 2005; POLINSKY; KAGAN, 2007). Heritage languages are acquired under conditions of variable qualitative and quantitative input, and the linguistic competences of heritage speakers change over the course of the lifespan (D’ALESSANDRO; NATVIG; PUTNAM, 2021; MONTRUL, 2016).
Brazil has numerous communities of heritage speakers of many languages such as Japanese (ABREU MORATO, 2011); German, including a rare dialect Hunsrückisch (
Here we describe the protocol of data collection and discuss some points about data management for the BraPoRus (Brazilian Portuguese-Russian) corpus, a spoken corpus of heritage Russian in Brazil. The participants are 26 elderly speakers who were born in Brazil or came to Brazil in the 1950s as children. The corpus has been recorded since February 2021, when the COVID-19 pandemic restrictions made it mandatory to switch to online data collection for linguistic research projects. The protocol of the data collection contains a variety of tasks developed by researchers from different areas, from phonetics to psycholinguistics. The tasks include: 1) a brief demographic questionnaire; 2) a working memory test in Russian and Brazilian Portuguese using the Month-Ordering task (KEMPLER et al., 1998; GORAL et al., 2011); 3) a semi-spontaneous narrative about the history of the participants’ family and their immigration to Brazil; 4) the Bilingual Language Profile (BLP) (BIRDSONG; GERTKEN; AMENGUAL, 2022); 5) a sociolinguistic interview with 139 questions adapted from the long HLVC (Heritage Language Variation and Change, Toronto) questionnaire (NAGY, 2016); 6) unscripted dialogues between heritage speakers in Russian; 7) an intonation task; and 8) a reading task. In our recent work, we confirmed that the acoustic quality of the BraPoRus remote recordings is satisfactory, making them adequate for basic acoustic analysis and automatic transcription (SEKERINA et al., in revision to resubmit).
The history of Russophone immigration to Brazil is described in a number of dissertations, theses and books (BYTSENKO, 2006; CHNEE, 2016; HIGA, 2015; RUSEISHVILI, 2016; VOROBIEFF, 2006; ZABOLOTSKY, 1998). The first immigrants from the Russian Empire arrived in Brazil at the turn of the 20th century; most of them were Russian Germans and members of religious minorities, such as Old Believers (BYTSENKO, 2006). After the Bolshevik Revolution, four waves of Russian immigration reached Brazil, comprising the following categories of immigrants: 1) 1921–1941: civil and military refugees from the former Russian Empire; 2) the post-Second World War period: Soviet Displaced Persons (DPs) and the remaining families of the “white” Russian community in Europe; 3) the 1950s: Russian “white” stateless refugees from China; 4) since 1991: contemporary immigration, initially driven mainly by economic reasons (RUSEISHVILI, 2016; SMIRNOVA HENRIQUES; RUSEISHVILI, 2019; SKOROBOGATOVA et al., 2021; VOROBYEVA; ALESHKOVSKI; GREBENYUK, 2018).
We presented the relevant IBGE statistics about the number of Russians who had entered Brazil until the 1950s in our recent report (SKOROBOGATOVA et al., 2021). The Russian immigrants who arrived in Brazil from China, mainly from Harbin, in the 1950s are of special interest for our study because the Russian community in China was very strong until the 1950s, which created unique conditions for the preservation of the Russian language there (OGLEZNEVA, 2009). In addition, nearly half of the Chinese Russians who arrived in Brazil were infants and young people under 29 years of age (RUSEISHVILI, 2018). In São Paulo, Russian schools for these children functioned until the 1970s (HIGA, 2015; SKOROBOGATOVA et al., 2021; VOROBIEFF, 2006). We estimate that up to 1,500 of these Chinese Russians could still be alive (SKOROBOGATOVA et al., 2021) and, at least to some degree, continue to speak Russian.
Language dominance is dynamic and changes across the life course in all bilinguals (GROSJEAN; LI, 2013). Heritage speakers pass through a stage of dominance in their heritage language in childhood and a stage of more or less balanced bilingualism, and finally enter a stage in which proficiency in the heritage language dwindles (AALBERSE; BACKUS; MUYSKEN, 2019). When a population of heritage speakers ages, most of its members find themselves at this last stage, using their heritage language mainly for home and family topics.
In addition, the speech of elderly people can show some age-specific characteristics that do not stem from their bilingualism, even when aging is healthy (BOT; MAKONI, 2005). To characterize the effects of age on speech and to study language variation and communication abilities in later life, corpora of the speech of elderly people have recently started to be collected (BOLLY; BOUTET, 2018). The aging lexicon is affected by both environmental exposure and several cognitive mechanisms associated with learning, representation, and retrieval of information (WULFF et al., 2019). Aging in multilingual contexts requires additional factors to be considered (BOT; MAKONI, 2005), and represents an exciting new area to which the BraPoRus corpus can make an important contribution.
A special category, that of moribund (endangered) heritage languages, comprises the languages spoken by elderly individuals who represent the last generation of proficient speakers: in this case, the language dies together with these last speakers (D’ALESSANDRO; NATVIG; PUTNAM, 2021). While Russian is not a moribund language, the descendants of the Russian emigrants who left Russia after the Bolshevik Revolution speak a frozen variety of it that preserves some features of the language of one hundred years ago. Oglezneva (2009) shows that the Russian spoken in Harbin in the first half of the 20th century preserves some archaic words, and notes that the term “archaic” is relative: while some words are considered archaic in Russia, they are currently used abroad by Russian emigrants and can be important for self-identification in the community. She gives examples of the use of old Russian units of measurement such as “verst”, “sazhen’”, “pood” and “funt” instead of the metric system terms established in Russia as mandatory in 1917.
D’Alessandro et al. (2021) highlight three groups of challenges faced by researchers who work with moribund heritage languages. The first is the baseline challenge: a baseline standard spoken by previous generations cannot be established. When a baseline is not established, it can be hard to separate the heritage features of the spoken language from the influence of the society language (AALBERSE; BACKUS; MUYSKEN, 2019). All heritage speakers should be studied in the context of their bilingual or multilingual conditions. Most heritage Russian speakers in Brazil speak Portuguese most of the time. This means that their Russian can show some Portuguese influence at the level of pronunciation and can even acquire some grammatical features typical of Russian as L2. These features can be especially notable in speakers who were born in Brazil and did not go to a Russian school. However, a relevant baseline for BraPoRus can be set up by using the recordings of the speech of elderly Russians in Harbin made by Oglezneva (2009) in 2000-2002. The Russian language spoken in Harbin has some special features and exhibits some influence from Chinese; nonetheless, comparing the recordings made in Harbin in the 21st century with those made in the community of Chinese Russians in Brazil who left China in the 1950s can provide a useful reference.
The second challenge identified by D’Alessandro et al. (2021) is data elicitation: usually, the number of speakers is low, and they are elderly and unfamiliar with current data collection methods. In the case of BraPoRus, the COVID-19 pandemic restrictions favored online data collection: in February 2021, elderly people in Brazil were in lockdown at home and had a lot of time, and many of them had learned how to use Zoom for various purposes, from speaking to their family to participating in Orthodox church meetings.
We consider that the difficulties in determining language dominance and proficiency in a heritage language also belong to the data elicitation challenge. Many heritage speakers are not able to read, so they cannot perform the reading-based tasks commonly used to determine proficiency level (AALBERSE; BACKUS; MUYSKEN, 2019). Although there is a wide variety of proficiency tests for English, few options exist for Russian. We therefore decided to follow the Bilingual Language Profile protocol, which is based on self-evaluation and does not depend on reading (BIRDSONG; GERTKEN; AMENGUAL, 2022).
The third challenge identified by D’Alessandro et al. (2021) is the data amount challenge, which follows from the first two: a limited amount of data does not provide a basis for good statistical power. For BraPoRus, we have so far collected 167.5 hours of speech recordings from 26 participants, and the data collection is ongoing (SMIRNOVA HENRIQUES et al., 2021; 2022).
Data collection practices for corpus construction include collecting spontaneous speech, collecting questionnaires, and data elicitation (AALBERSE; BACKUS; MUYSKEN, 2019). The challenges of data collection begin with the Observer’s Paradox: the knowledge of being recorded alters the way of speaking (LABOV, 1972). However, in online or phone interaction, the recording process can be perceived by participants as less intrusive, because they stay in their homes rather than in artificial laboratory conditions.
In heritage language studies, there is additional pressure coming from the interviewer, because researchers generally speak the contemporary variety of the language. Ideally, the interviewer should be a heritage speaker; however, this is not feasible when the speakers are elderly and their language variety is moribund. To create more natural conditions for conversation and to study everyday language use, it is important to organize some spontaneous dialogue interactions between the heritage speakers (AALBERSE; BACKUS; MUYSKEN, 2019; JOHANNESSEN, 2021).
The common issues in data curation (storage, transcription and annotation) can also look quite different in the context of heritage language studies. First, when the conversations are about family history, the data can be framed in two ways: as an oral corpus, which is generally anonymized, or as a database of oral history interviews (SMIRNOVA HENRIQUES et al., 2021). The question of how to balance participant anonymization and the preservation of memories is very important. For example, the Code of Ethics of TalkBank, a popular database for open sharing of speech corpora, establishes that only the age and location of oral corpus recordings may be made publicly available (MACWHINNEY, 2022). Many participants of the BraPoRus corpus have clearly expressed their desire to preserve their names and the history of their parents, and these data are of special importance for historical and sociological studies of Russian immigration. Columbia University Libraries (2022) made available a number of oral history interviews recorded in the 1960s with Russian immigrants who are identified and talk about their biography, mentioning their personal data; however, recordings made in this format cannot be submitted to the TalkBank database. As far as we know, there are no publicly available annotated heritage speech corpora that contain personal data such as full names and family history. We therefore need to develop a strategy that allows us to share the data but would not be considered by the Ethics Committee an attempt to violate the privacy and confidentiality of research participants who authorized this sharing without foreseeing the consequences. To achieve this goal, we are currently revising the questionnaire of our sociolinguistic interview with the PUC-SP juridical assistance: a part of the recordings will probably have to be kept only for our own use.
Another challenge is the transcription procedure. There are a number of tools that perform automatic transcription in Russian (SEKERINA et al., in revision to resubmit; SMIRNOVA HENRIQUES et al., 2022), but they lose the uncommon words and morphological markers that are especially interesting in heritage language studies, and they do not make it possible to study code-switching.
Now we present the protocol used for the BraPoRus data collection.
The essential criteria for participant selection were: 1) age 59 years and older (range: 59–98); 2) living in Brazil for most of their life or being born in Brazil, and speaking Brazilian Portuguese in a native-like way; 3) high proficiency in Russian as a heritage language, sufficient to maintain a conversation for an hour; 4) no long-term residence in Russia; 5) no documented cognitive impairment (SMIRNOVA HENRIQUES et al., 2021). At present, we have recorded 167.5 hours of speech collected from 26 participants with a mean age of 75.7 years.
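The five selection criteria can be read as a simple screening checklist. The following is a minimal sketch, assuming a hypothetical `Candidate` record whose field names are invented here for illustration; it is not part of the BraPoRus protocol, and in practice criteria such as native-like Portuguese or conversational proficiency are judged by the researchers rather than stored as flags.

```python
from dataclasses import dataclass

MIN_AGE = 59  # lower bound of the age criterion stated in the text


@dataclass
class Candidate:
    """Hypothetical screening record; all field names are illustrative."""
    age: int
    born_in_brazil: bool
    lived_mostly_in_brazil: bool
    nativelike_portuguese: bool
    can_converse_in_russian_for_an_hour: bool
    long_term_residence_in_russia: bool
    documented_cognitive_impairment: bool


def is_eligible(c: Candidate) -> bool:
    """Return True only if all five criteria from the text are met."""
    return (
        c.age >= MIN_AGE                                       # criterion 1
        and (c.born_in_brazil or c.lived_mostly_in_brazil)     # criterion 2
        and c.nativelike_portuguese                            # criterion 2
        and c.can_converse_in_russian_for_an_hour              # criterion 3
        and not c.long_term_residence_in_russia                # criterion 4
        and not c.documented_cognitive_impairment              # criterion 5
    )


# Invented example: a 76-year-old who arrived as a child and meets all criteria.
example = Candidate(76, False, True, True, True, False, False)
print(is_eligible(example))
```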
Data collection for the postdoctoral project of Dr. Smirnova Henriques “Russophones and Brazilian Portuguese: an interdisciplinary study” was approved by the Ethics Committee
Under the COVID-19 pandemic restrictions, the participants were recorded through the video teleconferencing software Zoom (16 participants) or through a phone call (10 participants). The recording sessions were conducted according to the availability of each participant, and lasted between 30 minutes and two hours. The maximum number of sessions per participant so far has been ten. The Zoom conferences were recorded on a Dell Inspiron PC, with the audio recorded in .mp3 format and the video in .mp4. The phone calls were recorded on a Xiaomi Redmi 6 smartphone, using the Android package program, also in .mp3 format. As of now, a total of 167.5 hours of speech has been recorded (Table 1).
Task | Monologue | BLP | Sociolinguistic interview | Intonational phrases | Reading text | Dialogue |
Number of participants | 26 | 21 | 21 | 14 | 10 | 5 |
Duration (min) | 2,011 | 1,910 | 5,636 | 116 | 107 | 272 |
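As a quick consistency check, the per-task durations in Table 1 can be summed and converted to hours. The sketch below uses only the values from the table; the variable names are our own.

```python
# Durations in minutes, copied from Table 1.
durations_min = {
    "Monologue": 2011,
    "BLP": 1910,
    "Sociolinguistic interview": 5636,
    "Intonational phrases": 116,
    "Reading text": 107,
    "Dialogue": 272,
}

total_min = sum(durations_min.values())
total_hours = total_min / 60

print(f"Total: {total_min} min = {total_hours:.1f} h")
for task, minutes in durations_min.items():
    # Share of each task in the total recording time.
    print(f"{task}: {100 * minutes / total_min:.1f}%")
```

The total of 10,052 minutes corresponds to the 167.5 hours reported in the text, with the sociolinguistic interview accounting for more than half of the recorded time.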
The preliminary data on the language dominance scores of 23 BraPoRus participants are shown in Figure 1 (SKOROBOGATOVA; SMIRNOVA HENRIQUES; MADUREIRA, 2021). Most heritage Russian speakers, 18 out of 23, were dominant in their society language, Portuguese (Figure 1A). In this group, seven participants were born in Brazil, and the other eleven arrived at a mean age of 7.8 years. The data obtained from the other five participants are shown in Figure 1B. One of them was a balanced bilingual (final dominance score -1.45, age of arrival 10 years). Four participants were dominant in Russian; their ages of arrival ranged from 14 to 17 years. A strong negative correlation was found between the age of arrival in Brazil and the BLP score (Spearman
As in the monologue task, the interviewer spoke contemporary Russian as L1 and Brazilian Portuguese as L2. The interviewer was instructed to let the participant speak and not to hurry with the formulation of the next question. Only a few participants have completed this task so far because it takes many sessions.
Each of the 17 trials contains a target phrase where a particular Russian melodic pattern is expected. In Table 2 below, we show the expected melodic pattern before each target word (the expected main word, or nucleus) in Russian. The melodic patterns are described in terms of the Bryzgunova intonation system of seven “intonation constructions” (ICs) (BRYZGUNOVA, 1977) as well as in terms of ToRI, the ToBI-like description suggested by Odé (2008). The trials were translated into Portuguese, and the participants recorded the task in both Russian and Portuguese. Only the 14 of the 26 BraPoRus participants who were able to read performed the task. The English translation is provided only for clarity in the current article.
Using this task, we have recently analyzed the intonational interference of Brazilian Portuguese L2, the society language, on heritage Russian L1 in contrastive (elliptical) questions, yes/no questions and echo
We have recently analyzed the acoustic quality of 16 BraPoRus samples (8 Zoom and 8 phone recordings) of 15 minutes each, using a protocol developed by the Group of Studies in Forensic Phonetics (GEFF, 2020) for forensic phonetic analysis. Most Zoom recordings were classified by acoustic quality as AB: F4 was not visible and fricatives were not always identifiable, but the major part of each sample had very good quality for acoustic analysis (SEKERINA et al., in revision to resubmit). Most phone recordings were classified as B or BC: F3 and F4 were not always identifiable, and the sound quality was affected more significantly. Nonetheless, the quality of all this material was sufficient for F0, F1 and F2 measurements, making it adequate for basic acoustic analysis.
The BraPoRus corpus has 167.5 hours of recordings, and manual transcription would be very laborious and time-consuming. We have recently tested three automatic transcription tools, and the commercially available transcription software Sonix was shown to yield the lowest word error rate (WER) (SMIRNOVA HENRIQUES et al., 2022). Next, we tested automatic transcription by Sonix on four Zoom and four phone-recorded 15-minute samples, and found the mean WERs to be 11.51 and 8.21, respectively, which is considered quite low (SEKERINA et al., in revision to resubmit). However, the errors affect the words that are especially interesting for our study: results of code-switching, words subject to L2 interference at the level of pronunciation or grammar, and words considered archaic. Thus, manual curation is still necessary.
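For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the automatic transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates this standard definition with a word-level edit distance; it is not the procedure used in the cited studies, whose text normalization and tooling are not described here, and the function name and example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example (not corpus data): one dropped preposition and one
# altered case ending give 2 errors over 6 reference words.
reference = "мы приехали в бразилию из харбина"
hypothesis = "мы приехали бразилию из харбин"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

The example also shows why a low overall WER can still hide the errors of interest: a single lost case ending counts as one word error among many correct words, although it is precisely the kind of morphological detail that matters for heritage language analysis.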
The annotation will include the assignment of word class labels and code-switching tags (BULLOCK et al., 2018; GUZMAN, 2017). The annotation scheme is currently being developed.
In the current article, we present a protocol used for the BraPoRus corpus construction to study heritage Russian speech in Brazil. The protocol describes both the tasks for corpus recording and complementary tasks planned to obtain additional information on language dominance of the participants and the interference of Brazilian Portuguese on heritage Russian speech at the levels of intonation or pronunciation. At this point, BraPoRus contains 167.5 hours of recordings collected from 26 elderly heritage Russian speakers (Table 1). Here we do not describe the participants’ demographic characteristics in detail because this information has already been provided elsewhere (SMIRNOVA HENRIQUES et al., 2021), and because in this article we focus on the procedure of corpus construction.
The first analyses of the BraPoRus project data addressed working memory in heritage Russian-Brazilian Portuguese bilinguals (SKOROBOGATOVA et al., 2021), language dominance profiles (SKOROBOGATOVA; SMIRNOVA HENRIQUES; MADUREIRA, 2021) and the sociodemographic characterization (SMIRNOVA HENRIQUES et al., 2021) of these speakers. The intonational interference of L2 Brazilian Portuguese on L1 heritage Russian was also described (KACHKOVSKAIA et al., 2022). The analyses of the BraPoRus corpus to date include the evaluation of the acoustic quality of Zoom and phone recordings, and the determination of the WER for automatic transcription by Sonix (SEKERINA et al., in revision to resubmit; SMIRNOVA HENRIQUES et al., 2022). The reasonable acoustic quality and the low WER for automatic transcription show that online data collection for corpus building is feasible even with elderly heritage speakers. However, manual curation after automatic transcription is necessary, and the data are not appropriate for acoustic analyses that involve the evaluation of F3 and F4.
The main challenge during the online BraPoRus data collection is the organization of dialogues between heritage Russian speakers to record their interactions without the observer’s influence, as was done for the CANS corpus (JOHANNESSEN, 2021). The BraPoRus participants are elderly and not very interested in making friends on the internet; the best interaction was observed between two friends who had known one another for more than 60 years. Another challenge is performing tasks that require reading skills, such as the intonation reading task and the reading of a phonetically representative text: only 14 out of 26 participants were able to carry out these activities. The recording of more speakers and new sessions continues while the already recorded speech is automatically transcribed and manually verified. The next steps include annotation and a decision on the format in which the data will be made available.
The most serious challenge for the corpus construction as a whole is to obtain the authorization of the Ethics Committee to publicly share the corpus recordings: they were initially collected within a broader project studying Russian–Brazilian Portuguese bilingualism that did not foresee open sharing (SMIRNOVA HENRIQUES et al., 2020). Although many participants wish to reveal their family history, respect for privacy and confidentiality is a mandatory norm required by Brazilian and foreign Ethics Committees (MACWHINNEY, 2022; PUC-SP, 2022). While in History and Sociology names can be part of the open data (Columbia University Libraries, 2022), this is not commonly applied to voice samples in linguistic research.
Our main reference in the initial step of the research was the Heritage Language Variation and Change project in Toronto (NAGY, 2016), which proposed extensive questionnaires for interviewing heritage speakers. However, the site of the corpus has not been functioning in recent months or, at least, is no longer accessible from Brazil. Another important reference is CANS (Corpus of American Nordic Speech), which prioritized reducing the interviewer’s influence (JOHANNESSEN, 2021). The BraPoRus corpus contains many additional tasks suggested by researchers of our group with different backgrounds, and this makes it possible to characterize the population of heritage Russian-Brazilian Portuguese bilinguals much better. As far as we know, there is no other oral corpus of elderly heritage Russian speakers.
For the next steps, we also plan to assess the narrative abilities of the elderly heritage Russian speakers in Russian and Brazilian Portuguese using the Multilingual Assessment Instrument for Narratives (MAIN) (GAGARINA et al., 2016). This is a widely used tool available for testing narrative competence in more than 80 languages, in children and adults. MAIN was specifically created to assess the development of narrative skills in bilinguals, and is based on a multidimensional model of story organization and on six-picture stories controlled for cognitive and linguistic complexity. We are currently adapting the materials for remote testing in Russian and Brazilian Portuguese (SKOROBOGATOVA; SMIRNOVA HENRIQUES; GAGARINA, 2021). The face-to-face protocol has previously been translated into Portuguese (CUNHA DE AGUIAR; MARTINS DOS REIS, 2020), but it has never been applied in Brazil.
The BraPoRus corpus provides a large quantity of data on heritage Russian spoken by elderly people in Brazil and on heritage Russian-Brazilian Portuguese bilingualism. We expect that the protocol described in this work will be useful both for Brazilian linguists who study other heritage languages and for researchers of heritage Russian in other countries.
Dr. Smirnova Henriques is supported by postdoctoral fellowship PNPD/CAPES (
The authors have no conflicts of interest to declare.
Anna Smirnova Eddelbuettel declares on behalf of all authors of the manuscript “BRAPORUS, SPOKEN CORPUS OF HERITAGE RUSSIAN IN BRAZIL: PROTOCOL OF DATA COLLECTION”, submitted to Cadernos de Linguística, that the aforementioned manuscript is submitted as a project registration and, therefore, does not yet have analyzed speech data that can be accessed by other researchers. The BraPoRus corpus, described in the article, contains 167.5 hours of recordings; however, at the moment, the authorization of the Ethics Committee of PUC-SP that we have does not allow us to make the recordings publicly available because they contain the personal information of the participants. The data collection protocols used for the recordings are either cited in the manuscript or made available in the text itself, as explained below:
1) sociodemographic questionnaire; a semi-spontaneous narrative about the history of the participants’ family and their immigration to Brazil; a sociolinguistic interview; unscripted dialogues between heritage Russian speakers; intonation task – described in the manuscript;
2) working memory test - SKOROBOGATOVA, Aleksandra S.; SMIRNOVA HENRIQUES, Anna; RUSEISHVILI, Svetlana; SEKERINA, Irina; MADUREIRA, Sandra. Verbal working memory assessment in Russian-Brazilian Portuguese bilinguals. Cadernos de Linguística, v. 2, n. 4, p. 01-24, e572, 2021. DOI 10.25189/2675-4916.2021.V2.N4.ID572.
3) Bilingual Language Profile - https://sites.la.utexas.edu/bilingual/
4) reading task - BONDARKO, Liya V. et al. (Orgs.). Prilozhenie №3 k Bulleteniu Foneticheskogo fonda russkogo yazika “Fond zvukovykh edinits russkoi rechi” [Attachment №3 to the Bulletin of the Phonetic Bank of the Russian language “Bank of the Russian Speech Sound Units”]. St Petersburg/ Bochum, 1993.
AALBERSE, Suzanne; BACKUS, Ad; MUYSKEN, Pieter.
ABREU MORATO, Geanne Alves de. Situando a língua japonesa no contexto da história do ensino de línguas no Brasil.
ALMA-H.
ALTENHOFEN, Cleo Vilson; MORELLO, Rosângela (Orgs.).
BENINCÁ, Ludimilla Rupf. Sócio-história do contato entre o vêneto e o português: um estudo de caso.
BIRDSONG, David; GERTKEN, Libby M.; AMENGUAL, Mark.
BOLLY, Catherine T.; BOUTET, Dominique. The multimodal CorpAGEst corpus: Keeping an eye on pragmatic competence in later life.
BONDARKO, Liya V. et al. (Orgs.).
BOT, Kees de; MAKONI, Sinfree.
BRYZGUNOVA, Elena A.
BULLOCK, Barbara E.; SERIGOS, Jacqueline; TORIBIO, Almeida Jacqueline; WENDORF, Arthur. The challenges and benefits of annotating oral bilingual corpora: The Spanish in Texas Corpus Project.
BYTSENKO, Anastassia.
CHNEE, Igor.
Columbia University Libraries,
COSTA, Luciane Trennephol da; LOREGIAN-PENKAL, Loremi. A coleta de dados do banco VARLINFE - Variação Linguística de Fala Eslava: peculiaridades e características.
CUMMINS, Jim. A proposal for action: Strategies for recognizing heritage language competence as a learning resource within the mainstream classroom.
CUNHA DE AGUIAR, Laís Vitória; MARTINS DOS REIS, Micaela Nunes. Adapting MAIN to Brazilian Portuguese.
D’ALESSANDRO, Roberta; NATVIG, David; PUTNAM, Michael T. Addressing Challenges in Formal Research on Moribund Heritage Languages: A Path Forward.
GAGARINA, Natalia; KLOP, Daleen; TSIMPLI, Ianthi M.; WALTERS, Joel. Narrative abilities in bilingual children.
GEFF (Grupo de Estudos em Fonética Forense). Protocolo de análise fonético-forense.
GEWEHR-BORELLA, Sabrina; ZIMMER, Márcia Cristina; ALVES, Ubiratã Kickhöfel. Transferências grafo-fônico-fonológicas: uma análise de dados de crianças monolíngues (Português) e bilíngues (Hunsrückisch-Português).
GORAL, Mira; CLARK-COTTON, Manuella; SPIRO, Avron III; OBLER, Loraine K.; VERKUILEN, Jay; ALBERT, Martin L. The Contribution of Set Switching and Working Memory to Sentence Processing in Older Adults.
GROSJEAN, François; LI, Ping.
GUZMAN, Gualberto; RICARD, Joseph; SERIGOS, Jacqueline; BULLOCK, Barbara E.; TORIBIO, Almeida Jacqueline. Metrics for modeling code-switching across corpora.
HIGA, Bárbara Silva.
JOHANNESSEN, Janne Bondi. From Fieldwork to Speech Corpus: The American Norwegian Heritage Language and CANS.
KACHKOVSKAIA, Tatiana V.; SKRELIN, Pavel A.; GUSEVA, Daria; SKOROBOGATOVA, Aleksandra S.; SMIRNOVA HENRIQUES, Anna; MADUREIRA, Sandra. Cross-Linguistic Influence in Interrogative Intonation Patterns: A Case of Russophones in Brazil. Presentation at the II Congresso Brasileiro de Prosódia. Campinas: UNICAMP, 2022.
KEMPLER, Daniel; ALMOR, Amit; TYLER, Lorraine K.; ANDERSEN, Elaine S.; MACDONALD, Maryellen C. Sentence comprehension deficits in Alzheimer’s disease: a comparison of off-line vs. on-line sentence processing.
LABOV, William. Some principles of linguistic methodology.
MACWHINNEY, Brian.
MILESKI, Ivanete.
MONTRUL, Silvina.
NAGY, Naomi. Heritage languages as new dialects.
ODÉ, Cecilia. Transcription of Russian intonation, ToRI, an interactive research tool and learning module on the internet. In: HOUTZAGERS, Peter; KALSBEEK, Janneke; SCHAEKEN, Jos. (Eds.).
OGLEZNEVA, Elena A.
POLINSKY, Maria; KAGAN, Olga. Heritage languages: In the ‘Wild’ and in the Classroom.
PUC-SP.
RUSEISHVILI, Svetlana.
RUSEISHVILI, Svetlana. Perfil sociodemográfico e distribuição territorial dos russos em São Paulo: deslocados de guerra da Europa e refugiados da China após a Segunda Guerra Mundial.
SEKERINA, Irina A.; SMIRNOVA HENRIQUES, Anna; SKOROBOGATOVA, Aleksandra S.; TYULINA, Natalia; KACHKOVSKAIA, Tatiana V.; SKRELIN, Pavel A.; RUSEISHVILI, Svetlana; MADUREIRA, Sandra. Brazilian Portuguese-Russian (BraPoRus) Corpus: Transcription and Acoustic Analysis of Elderly Speech During Covid-19 Pandemic.
SKOROBOGATOVA, Aleksandra S.; SMIRNOVA HENRIQUES, Anna; GAGARINA, Natalia. A elaboração da versão on-line do protocolo MAIN para a avaliação de narrativas de bilíngues russo-português brasileiro.
SKOROBOGATOVA, Aleksandra S.; SMIRNOVA HENRIQUES, Anna; MADUREIRA, Sandra. Bilingual language profiles of the heritage Russian speakers in Brazil, participants of the BraPoRus corpus.
SKOROBOGATOVA, Aleksandra S.; SMIRNOVA HENRIQUES, Anna; RUSEISHVILI, Svetlana; SEKERINA, Irina; MADUREIRA, Sandra. Verbal working memory assessment in Russian-Brazilian Portuguese bilinguals.
SKRELIN, Pavel. German-Russian Language Contact: Is it in our power to foresee the flight of a word we have uttered?
SMIRNOVA HENRIQUES, Anna; FONTES, Mario A. de S.; SKRELIN, Pavel A.; KACHKOVSKAIA, Tatiana V.; RUSEISHVILI, Svetlana; BORREGO, Maria C.; PICCIN BERTELLI ZULETA, Patrícia; PICCOLOTTO FERREIRA, Léslie; MADUREIRA, Sandra. Russian immigrants in Brazil: to understand, to be understood.
SMIRNOVA HENRIQUES, Anna; RUSEISHVILI, Svetlana. Migrantes russófonos no Brasil no século XXI: perfis demográficos, caminhos de inserção e projetos migratórios.
SMIRNOVA HENRIQUES, Anna; SKOROBOGATOVA, Aleksandra S.; RUSEISHVILI, Svetlana; MADUREIRA, Sandra; SEKERINA, Irina A. Challenges in heritage language documentation: BraPoRus, spoken corpus of heritage Russian in Brazil.
SMIRNOVA HENRIQUES, Anna; SKOROBOGATOVA, Aleksandra S.; RUSEISHVILI, Svetlana; MADUREIRA, Sandra; SEKERINA, Irina A. BraPoRus, a Spoken Corpus of Elderly Heritage Russian in Brazil: Early Challenges and Future Plans. Presentation at CLARE 5 (Corpora for Language and Aging Research 5). Anchorage: University of Alaska Anchorage, 2022.
VOROBIEFF, Alexandre.
VOROBYEVA, Olga;
WULFF, Dirk U.; DE DEYNE, Simon; JONES, Michael N.; MATA, Rui; The Aging Lexicon Consortium. New perspectives on the aging lexicon.
ZABOLOTSKY, Jacinto A.
Ignacio Miguel Palacios Martinez
ORCID: https://orcid.org/0000-0001-9202-9190
Reviewer 1: Universidade de Santiago de Compostela, Galicia, Spain.
Marcia dos Santos Machado Vieira
ORCID: https://orcid.org/0000-0002-2320-5055
Reviewer 2: Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
Reviewer 1
2022-02-22 | 11:13
This is an interesting paper on the instruments and procedures followed for the compilation of a spoken corpus of heritage Russian in Brazil. In this respect, the paper fulfills its purpose, although one would expect to find in the same article some references to the transcription conventions followed, together with some information about the annotation scheme. Some mention should also be made of the problems found in the collection of the data. The authors should also make clear to what extent this corpus differs from other heritage corpora of its kind. The final section should also indicate what sort of projects could be carried out with this material.
The paper is written in good academic English and it complies with the conventions typical of this genre. There is a good use of bibliographical references which are complete and well cited.
Some further suggestions follow:
p. 3. In the current work, delete comma
p. 4. arrived to Brazil> arrived in Brazil
p. 4. XIX th century > nineteenth century; XXth century > twentieth century
p. 4. the data collection continues> the data collection still continues
p. 4. he generally speaks> they generally speak
p. 5. there is a number> there are a number
p. 6. In this task, > delete comma
p. 8. Please revise the captions for the figures. Some of the current information should be included in the body of the text. This also applies to Table 2, p. 10
p. 9. the observer influence> the observer's influence
References
Cunha de Aguiar. Please alphabetise.
Reviewer 2
2022-05-07 | 11:32
First of all, many thanks to the authors for this manuscript! Certainly, the protocol about data collection and management described in this project registration will be useful for linguists in Brazil and beyond.
I made a few observations in the manuscript and offer some suggestions here:
1) the form of listing references (;/&);
2) non plural or plural for references related to more than one author (Org./Orgs.);
3) MADUREIRA, Ssandra;
Rethink/restructure/(re)explain:
4) the idea of “special feature” conceptualized in these words: “The special feature of this corpus is that it is not restricted to the sociodemographic data and speech recordings of the participants, but contains a variety of tasks developed by researchers from different areas, from phonetics to psycholinguistics.” > Why is it special? And putting in perspective/profiling which audience (for BraPoRus)?
5) the perception of the online recording process: “However, in the situation of the online/phone recordings, the recording process is perceived as less intruding even though the informed consent is signed.” Perceived by whom? What evidence is this description of the perception based on?
6) the idea of “never-to-accessing” participants: “If we treat these data as a spoken corpus for linguistic research, the identification of participants would be never permitted” > Is this a practice that is, then, required by ethics committees (made up of researchers from different disciplines), or is it a necessary or mandatory condition? Support the alleged anonymization requirement, perhaps with resolutions on ethical conduct in research.
This is a good manuscript of project registration.
Reviewer 1
2022-05-30 | 09:52
I believe the comments and suggestions made in my previous review have been successfully incorporated into this revised version, so, in my view, it can now be accepted for publication. Some minor points on the wording of some sections follow for the authors' consideration.
p. 5. "Most Russian heritage speakers in Brazil speak Portuguese the most part of time, so, their Russian can suffer some Portuguese influence"> the most part of the time: this means that
p. 7. "mentioning their personal data, however, "> mentioning their personal data; however,
p. 11. "The interviewer was instructed to allow the participant to speak and to not be hurry to make the next question"> and not to hurry with the formulation of the next question
p. 15: "The main challenges during the BraPoRus data collection online is organization of dialogues between Russian heritage speakers to record" > data collection online was the organization of dialogues...
p. 15: "The BraPoRus participants are elderly and not very interested in making new friends by internet" > interested in making friends online or on the internet
Please make sure that the bibliographical references are complete. Check the date of publication for each of them.
Reviewer 2
2022-06-07 | 05:37
This project registration is a good contribution to the literature and useful material for anyone interested in databases, data curation, and heritage languages. It reports the data collection protocol for BraPoRus, a spoken corpus of heritage Russian in Brazil, and explores important points about data management, as well as ethical points related to care for the image, family history, and recording conditions of the participants (elderly speakers).
2022-05-24
Dear reviewer 1,
Thank you very much for your comments. We carefully revised the text following your suggestions.
As requested, we added detailed information about the transcription process and essential information about the annotation scheme. A discussion of the main problems of our data collection was also included, as well as examples of other heritage corpora and data analyses.
Sincerely,
Anna Smirnova Henriques
Dear reviewer 2,
Thank you for your comments. We carefully revised the text following your suggestions. Regarding comment 4, we did not focus on profiling the audience because the data cannot be made publicly available yet. The phrase mentioned in comment 5 was reformulated.
Regarding comment 6, the section about anonymization was rewritten and completed. The guidelines for the preparation of research protocols on the site of the PUC-SP Ethics Committee (https://www.pucsp.br/cometica/orientacoes-para-elaboracao-de-protocolo-de-pesquisa) directly mention the requirement to respect the privacy and confidentiality of the research participants. As required, the informed consent form should ensure that the confidentiality and privacy of the research subject's data are maintained before, during, and after the end of the research. International data protection regulation for speech recordings is discussed in some recent articles (https://www.researchgate.net/publication/336957804_Preserving_privacy_in_speaker_and_speech_characterisation). We are currently discussing with the PUC-SP Ethics Committee the possibility of opening the corpus when the participants authorize the open use of their data. The Ethics Committee forwarded our request to its legal advisers, but we do not have the final decision yet.
Sincerely,
Anna Smirnova