<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.2 20190208//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ali="http://www.niso.org/schemas/ali/1.0">
  <front>
    <article-meta>
      <article-categories>
        <subj-group>
          <subject content-type="Type of Contribution">Research Report</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>USING A COUPLED-OSCILLATOR MODEL OF SPEECH RHYTHM TO ESTIMATE <bold id="bold-3c48dc4091ff53eebc2398acc0df96bc">RHYTHMIC VARIABILITY</bold> IN TWO BRAZILIAN PORTUGUESE VARIETIES (CE AND SP)</article-title>
      </title-group>
      <contrib-group content-type="author">
        <contrib id="person-3c3c60556173f8d573eb113c2d6e07bd" contrib-type="person" equal-contrib="no" corresp="yes" deceased="no">
          <name>
            <surname>Arantes</surname>
            <given-names>Pablo</given-names>
          </name>
          <email>pabloarantes@ufscar.br</email>
          <xref ref-type="aff" rid="affiliation-59e5a1663c487e892bc737b74328acd6" />
        </contrib>
        <contrib id="person-7489a7f9395283b766b58d944eecf35c" contrib-type="person" equal-contrib="no" corresp="no" deceased="no">
          <name>
            <surname>Lima Júnior</surname>
            <given-names>Ronaldo Mangueira</given-names>
          </name>
          <email>onaldojr@letras.ufc.br</email>
          <xref ref-type="aff" rid="affiliation-5694375721cde86b6a8b6f3ca7502b2a" />
        </contrib>
      </contrib-group>
      <contrib-group content-type="editor">
        <contrib id="person-f6e93de22d5a621eea9c13c16a4230ff" contrib-type="person" equal-contrib="no" corresp="no" deceased="no">
          <name>
            <surname>Oliveira, Jr</surname>
            <given-names>Miguel</given-names>
          </name>
          <email>miguel@fale.ufal.br</email>
          <xref ref-type="aff" rid="affiliation-cacaa336e1c0380ea054bce3cbeed908" />
        </contrib>
        <contrib id="person-bbf699a10b319dd73c8522b6a904fa4c" contrib-type="person" equal-contrib="no" corresp="no" deceased="no">
          <name>
            <surname>Almeida</surname>
            <given-names>René Alain</given-names>
          </name>
          <email>renealain@hotmail.com</email>
          <xref ref-type="aff" rid="affiliation-cebc39cbb6f1834bd64f5407ca61b830" />
        </contrib>
      </contrib-group>
      <aff id="affiliation-59e5a1663c487e892bc737b74328acd6">
        <institution content-type="orgname">Universidade Federal de São Carlos (UFSCar)</institution>
      </aff>
      <aff id="affiliation-cacaa336e1c0380ea054bce3cbeed908">
        <institution content-type="orgname">Universidade Federal de Alagoas</institution>
      </aff>
      <aff id="affiliation-cebc39cbb6f1834bd64f5407ca61b830">
        <institution content-type="orgname">Universidade Federal de Sergipe</institution>
      </aff>
      <aff id="affiliation-5694375721cde86b6a8b6f3ca7502b2a">
        <institution content-type="orgname">Universidade Federal do Ceará (UFC)</institution>
      </aff>
      <pub-date date-type="pub" iso-8601-date="2021-09-05" />
      <volume>2</volume>
      <issue>4</issue>
      <issue-title>Linguistics Challenges in Open Science</issue-title>
      <elocation-id>e577</elocation-id>
      <history>
        <date date-type="accepted" iso-8601-date="2021-08-30" />
        <date date-type="received" iso-8601-date="2021-08-20" />
      </history>
      <permissions id="permission">
        <license>
          <ali:license_ref>http://creativecommons.org/licenses/by/4.0/</ali:license_ref>
        </license>
      </permissions>
      <abstract>
        <p id="_paragraph-1">This paper presents preliminary results of a semi-automatic methodology to extract three parameters of a dynamic model of speech rhythm. The model attempts to analyze the production of rhythm as a system of coupled oscillators which represent syllabicity and phrase stress as levels of temporal organization. The estimated parameters are the syllabic oscillator entrainment rate (alpha), the syllabic oscillator decay rate (beta), and the coupling strength between the oscillators (w0). The methodology involves finding the &lt;alpha, beta, w0&gt; combination that minimizes the distance between natural duration contours (restored from normalized and smoothed raw duration) and simulated contours generated using several combinations of the parameters. The distance between natural and model-generated contours was measured in two ways by comparing: (1) the plain/overt syllable-to-syllable duration in the natural contour with that of the model-generated contour and (2) the relative change along both contours. We applied this methodology to read speech produced by five speakers from the state of Ceará (CE) and eight speakers from the state of São Paulo (SP). Mean w0 and alpha values are compatible with the view that Brazilian Portuguese is a mixed-rhythm language. Results from two Bayesian hierarchical regression models do not suggest a difference between SP and CE speakers, but indicate a difference between the two methods, with the relative change method generating lower alpha values and higher w0 values.</p>
      </abstract>
      <abstract abstract-type="executive-summary">
        <title>Resumo</title>
        <p id="paragraph-02f994ad19f88df58a61dcaca08d9c4a">O artigo apresenta resultados preliminares de uma metodologia semiautomática para a extração de três parâmetros de um modelo dinâmico do ritmo da fala. O modelo propõe analisar a produção do ritmo como um sistema de dois osciladores acoplados que representam a silabicidade e acentuação como níveis de organização temporal. Os parâmetros estimados são a taxa de indução do oscilador silábico pelo oscilador acentual (alfa), a taxa de decaimento do oscilador silábico (beta), e a força de acoplamento entre os dois osciladores (w0). A metodologia consiste em encontrar a combinação &lt;alfa, beta, w0&gt; que minimiza a distância entre contornos de duração natural de enunciados e contornos simulados gerados usando as combinações de parâmetros. A distância entre contornos naturais (duração restaurada a partir do contorno normalizado e suavizado) e os gerados pelo modelo foi medida de duas maneiras: (1) a duração propriamente dita de sílaba a sílaba do contorno natural e do simulado são comparadas (2) a mudança relativa da duração ao longo de ambos os contornos é comparada. Aplicamos a metodologia a enunciados lidos produzidos por cinco falantes do Ceará e oito falantes de São Paulo. Os valores médios de w0 e alfa são compatíveis com a análise segundo a qual o português brasileiro é uma língua de ritmo misto. Os resultados de dois modelos de regressão hierárquicos bayesianos não sugerem uma diferença entre os falantes de CE e SP, mas indicam uma diferença entre os dois métodos, com o método da mudança relativa gerando valores menores de alfa e valores maiores de w0.</p>
      </abstract>
      <kwd-group>
        <kwd content-type="">Prosody</kwd>
        <kwd content-type="">Speech rhythm</kwd>
        <kwd content-type="">Brazilian Portuguese</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body id="body">
    <sec id="heading-ac5b9ebaf5848a2aebfab7263fa7a046">
      <title>Introduction</title>
      <p id="paragraph-c63cf480b1fe1cd29c2828b46287b8cc">The goal of this study is to test a methodology we are developing to study rhythmic variability across speakers and languages based on the concepts underlying Barbosa's dynamic model of speech rhythm (BARBOSA 2006; 2007). In this initial exploration of this methodology, we use it to compare rhythmic variability in two Brazilian Portuguese regional varieties, the one spoken in the Northeastern state of Ceará and the one spoken in the Southeastern state of São Paulo.</p>
      <p id="paragraph-8168b8a721cde8751759229ed34f3475">We base our work on the version of the model presented in detail in Barbosa (2006, 2007). Here we omit the mathematical details of the implementation for the sake of brevity and focus on a qualitative presentation of the key ideas underlying it. The model assumes the existence of two abstract oscillators. The first is a syllabic oscillator, which stands for the sequence of syllable-sized units (or VV units) that make up the speech chain. The second oscillator is the phrase stress oscillator and represents the sequence of beats or phrase stresses that occur along a given utterance. The phrase stress oscillator entrains the syllabic oscillator, such that the syllabic oscillator's period is progressively lengthened as it approaches the location of the phrase stress beat in the utterance. Points of maxima on the entrained oscillator cycle correspond to the onsets of vocalic gestures in speech.</p>
      <p id="paragraph-c89183b53a0dae7f076275c0c8f2696f">Four model variables are relevant to our discussion on rhythmic variability: <italic id="italic-82fbf8191c60764f9eb2a024447d7ea2">T</italic><sub id="subscript-46a2a2b38926133b57da308915b05a71">0</sub>, the syllabic oscillator uncoupled period, which can be understood as the underlying speech rate; α, the syllabic oscillator entrainment rate, which modulates how fast the syllabic oscillator period changes in response to coupling; β, the decay or period reset rate, which modulates how fast the syllabic oscillator tries to return to the uncoupled period after the phrasal stress beat; and <italic id="italic-396b1f11e9e23cab8b0c5dc9082ae5a0">w</italic><sub id="subscript-f51016cd1cb3c2e1772835b9453ac6e4">0</sub>, the relative coupling strength, which modulates the degree of synchronization between the two oscillators.</p>
      <p id="paragraph-fd1adae2728381718e764afcafac7fa1">Barbosa (2006, 2007) makes a connection between his model and the so-called rhythm typology, that is, the classification of a particular language as syllable-timed or stress-timed. This typology has a long tradition in linguistic research – see Bertinetto (1989) for a review. At first, Pike (1945) treated the issue as a matter of a flexible tendency towards one or the other rhythmic type, but later on Abercrombie (1967) reframed the subject as a dichotomy. Even though empirical research has not supported the dichotomous view (BERTINETTO, 1989; DAUER, 1983), the idea has persisted, augmented by the inclusion of moraic rhythm as a third type (LADEFOGED, 2001). Barbosa (2002, 2004, 2006, 2007) suggests that the dynamical nature of his model allows the syllable-timed and stress-timed typology to be reinterpreted as a continuum rather than a discrete dichotomy. He argues, on theoretical and empirical grounds, that by varying <italic id="italic-29acbd056bec26a06abfda0f1ca26b83">w</italic><sub id="subscript-ef83141d03f3363820a3d72eae1d2a0c">0</sub> from low (near zero) to high (near 1) values, it is possible to simulate duration patterns which go from syllable-timed to stress-timed. The author hypothesizes that <italic id="italic-fb140236a2a01cd27060129572813dbd">w</italic><sub id="subscript-4">0</sub> values “vary more extremely from language to language, than from speaker to speaker within a same linguistic community” (BARBOSA, 2007, p. 733), but points out that the claim of higher crosslinguistic variation compared to intralinguistic variation should be verified experimentally. He also suggests (BARBOSA, 2007, p. 734) that <italic id="italic-1fc91252b9d6274fd466834f211fdbce">w</italic><sub id="subscript-5">0</sub> should vary less than α and β within a language community and that α and β may vary as a function of both speakers and speaking styles.</p>
      <p id="paragraph-b95f2bfc9260fb08c867609ecb950f35">The main goal of the present study is to take a step forward in defining a semi-automatic methodology that will make it possible to estimate α, β and <italic id="italic-3d704f668ebefb437f0e737f204ee2bb">w</italic><sub id="subscript-6">0</sub> given a set of audio samples, making it easier to investigate some of Barbosa’s claims about the ability of the dynamic model of speech rhythm to account for both rhythmic typology and the position of particular languages within this typology. In this paper we present an outline of the methodology as it stands now, and apply it to speech samples from two Brazilian Portuguese varieties. The results reported here will help to further develop our methodology and enrich the description of BP rhythmic variability. In the spirit of open science and reproducibility, we made all data and script files used in this study available at <ext-link id="external-link-00503b91a9069b5e23ae8d33be5f2abf" xlink:href="https://osf.io/w82ru/">https://osf.io/w82ru/</ext-link>.</p>
    </sec>
    <sec id="heading-29981f222a59bd909240b093f4b046de">
      <title>1. Materials and methods</title>
      <sec id="heading-b26b03a56825e7eaad933816d388384a">
        <title>1.1. Speech material</title>
        <p id="paragraph-618c5659994c0508b697fca9f7f4cc6e">Two sets of oral production data were analyzed, one from Ceará (CE) and the other from São Paulo (SP). The recordings analyzed are not identical because the data come from different research projects; however, both are readings of texts at a normal speech rate, which allows for the comparison at hand.</p>
        <p id="paragraph-f7e0ddeef84ef8f3b27f46cf73cc111c">The speech material of the CE speakers consisted of five recordings of a 225-word-long text, which was a translation into Portuguese of the diagnostic reading passage from Celce-Murcia et al. (2010). The mean duration of recordings was 87.6 seconds, ranging from 80 to 95 seconds. Speakers (four males and one female) were aged between 18 and 20, all born and raised in the metropolitan area of the state of Ceará.</p>
        <p id="paragraph-ab449333ca0903431b46d99b648e8264">The speech material of the SP speakers consisted of eight recordings of a 144-word-long text, the passage “A Menina do Narizinho Arrebitado”<xref id="xref-0e47e5472f66567bcb7bf9993d9a4867" ref-type="fn" rid="footnote-f0ecd757fd7fc925aa874771fae4684f">1</xref>, written by Monteiro Lobato. The mean duration of recordings was 33.5 seconds, ranging from 21 to 54 seconds. Speakers (five males and three females) were aged between 18 and 30, all born and raised in the state of São Paulo.</p>
      </sec>
      <sec id="heading-203d8ded3da6636cf65f3b82f5137d06">
        <title>1.2. Phonetic analysis</title>
        <p id="paragraph-eca4fa765ee094ad0ab647e1bfc26873">Speech samples were segmented into vowel-to-vowel (VV) units<xref id="xref-4f5511525056929de1c49e7f28b1001c" ref-type="fn" rid="footnote-7b5440395e48ccfaa3724e72270c51e8">2</xref> using a Praat script (ARANTES, 2021) and manually adjusted if necessary. Segments present in VV units were labelled in a <italic id="italic-f1c9b104fa0cd2dcef60461c0537165d">TextGrid</italic> file using the SAMPA-PB convention<xref id="xref-420a1652fdcd75d9bb4bf1dacc6c6956" ref-type="fn" rid="footnote-881c793ce392532ad45971c4dad89a78">3</xref>. Figure 1 shows a <italic id="italic-558042051c11e058e36beb6de5955ed6">TextGrid</italic> file with the first few seconds of an audio sample segmented in VV units according to the procedure described earlier in the section. Segmentation was used to identify stress groups in speech samples. Stress groups were identified using the 3-step procedure described in Barbosa (2006, 2007) and implemented as a Praat script<xref id="xref-39c4501206038fdbb02219dbd87da4d3" ref-type="fn" rid="footnote-a1d6973104298a223e5b1327089a771e">4</xref>. First, the raw duration of each VV unit is normalized using a <italic id="italic-0da7d18f08ecb036faf149a125e55bc3">z-</italic>score transform; in the second step, the normalized duration contour is smoothed using a 5-point moving average; lastly, peaks in the normalized, smoothed contours are identified. Peaks on the contour are considered occurrences of phrasal stresses and two consecutive phrasal stresses define a stress group.</p>
        <fig id="figure-panel-81419c35e8c0cf8b43f91f68c48d8bca">
          <label>Figure 1</label>
          <caption>
            <title><bold id="bold-a6571024d2ce6dcd5f61d92d7898697d">Figure 1</bold>. Part of an audio sample from the SP variety (speaker SP1) segmented into VV units following the SAMPA-PB notation. The content shown in the sample is “Em seguida, apareceu um papagaio real” (<italic id="italic-25ef7a63c4e2a9aa2f7b207d724c9766">Then a royal parrot appeared</italic>).</title>
            <p id="paragraph-a682aa9862e07ba3871c6d64408e1ba9" />
          </caption>
          <graphic id="graphic-5507040a9fe4eb86e53d4471825e8dd6" mimetype="image" mime-subtype="png" xlink:href="fig-1-segmentation_2.png" />
        </fig>
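        <p>The three-step stress group detection described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the Praat script used in the study: the function names are hypothetical, the moving average is unweighted with windows truncated at the contour edges, and a peak is taken to be a strict local maximum.</p>
        <code language="python"><![CDATA[
```python
def z_score(duration_ms, ref_mean_ms, ref_sd_ms):
    """Step 1: normalize one VV duration against reference statistics.

    The reference mean and standard deviation are hypothetical inputs here.
    """
    return (duration_ms - ref_mean_ms) / ref_sd_ms


def moving_average_5(z):
    """Step 2: unweighted 5-point moving average.

    Assumption: near the edges the window is truncated to the available points.
    """
    smoothed = []
    for i in range(len(z)):
        window = z[max(0, i - 2): i + 3]
        smoothed.append(sum(window) / len(window))
    return smoothed


def find_peaks(contour):
    """Step 3: indices of strict local maxima (interior points only)."""
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] < contour[i] > contour[i + 1]]


def stress_groups(peaks):
    """Pair consecutive peaks; each pair delimits one stress group."""
    return list(zip(peaks, peaks[1:]))
```
]]></code>
        <p>Under these assumptions, the peak indices returned by <italic>find_peaks</italic> mark the VV units treated as phrasal stresses, and <italic>stress_groups</italic> returns one index pair per stress group.</p>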
      </sec>
      <sec id="heading-8b45e7a1eff65ef55f04a57b8b7ea2c1">
        <title>1.3. Model parameter extraction</title>
        <p id="paragraph-df5acfab4e54d170571f2878365d6e5c">The semi-automatic methodology to estimate the parameters involves generating thousands of simulated contours with varying α, β and <italic id="italic-5435a21398e4dae85d9631f9da2b8625">w</italic><sub id="subscript-8d487638a63c2fd20c18b1fdde84842c">0</sub> values, which are later compared to the natural contours to arrive at the &lt;α, β, <italic id="italic-2702933cda91791fd145aa6059072465">w</italic><sub id="subscript-e6dda22f2f2ec5a94e36ce1e39853b79">0</sub>&gt; combination that best fits the natural contour. Syllable-sized durations generated by the model are expressed in time units. The normalized smoothed natural contours are expressed in <italic id="italic-414bb092ed01b0e366aded87cca17157">z</italic>-score units. In order to make a comparison between the two types of contours possible, both must be on the same scale. To achieve this, we adopted the procedure described in Barbosa (2004, p. 42-43) to restore the normalized and smoothed duration of each VV unit in the contour to an abstract duration expressed in real time units by reversing the <italic id="italic-2ab3c00adca0456bdafcd4aa8c28a871">z</italic>-score transform. The restored abstract duration of a given VV unit (<italic id="italic-4e8ec1b060a7b027fdc75f7a3eff9e8e">dur</italic><sub id="subscript-6e14edf7b3c8fc2670d66410601adb52">abs</sub>) is obtained using the formula in (1), where <italic id="italic-ec09d0233bf75a7580823f9c74fc41df">z</italic><sub id="subscript-e748db88c89792e50cc350d399d6529a">sm</sub> stands for the normalized and smoothed duration of the VV unit, σ<sub id="subscript-9f2082fb79d5f2aa64cd69e4d7835c0d">r</sub> is the reference standard deviation value and μ<sub id="subscript-6f0bb511961132430259778d169fb35b">r</sub> is the reference mean value.</p>
        <p id="paragraph-4b909754629df8213ad3d742d884bcd7" />
        <p id="paragraph-9a00158d6e1a430c65881b059b972ae7"><italic id="italic-59c68b004ce884bfabc6a08eea9c9026">dur</italic><sub id="subscript-7cc34b42826d632ea98dee3622bce685">abs</sub> = <italic>z</italic><sub id="subscript-f8382669c0f80c6d4ba964dda70e7632">sm</sub> ⋅ σ<sub>r</sub> + μ<sub>r</sub> (1)</p>
        <p id="paragraph-ce684fb91ca90bc1dfe919b24b547e62" />
        <p id="paragraph-1fc6ce3418d957299d6857be6cae4b5d">Barbosa suggests the use of the abstract VV unit “aC” as a reference in the restoration operation. The rationale is that “a” is the most frequent vowel in BP and “C” is a voiceless stop consonant (in BP, those are [p t k]), the most frequent consonantal class in BP. The value of μ<sub id="subscript-21994522ab3c69f9cd9613aefba61764">r</sub> is obtained by summing the reference mean value of [a] and the mean of the reference mean values of [p t k], giving 212 ms as a result. The value of σ<sub id="subscript-055a6b2b8b30c33e7efb93fe5c69aea0">r</sub> is obtained by taking the square root of the sum of the reference variance values of the four segments, which gives 56 ms as a result. The mean and standard deviation values used as reference are given in Barbosa (2006).</p>
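        <p>As an illustration, the restoration in (1) with the reference values reported above (212 ms and 56 ms) can be written as a short Python function. The function and constant names are hypothetical and serve only to make the arithmetic concrete.</p>
        <code language="python"><![CDATA[
```python
MU_REF_MS = 212.0     # reference mean of the abstract "aC" unit, as given in the text
SIGMA_REF_MS = 56.0   # reference standard deviation, as given in the text


def restore_duration(z_sm, mu_ref=MU_REF_MS, sigma_ref=SIGMA_REF_MS):
    """Reverse the z-score transform: dur_abs = z_sm * sigma_r + mu_r (in ms)."""
    return z_sm * sigma_ref + mu_ref


# A smoothed z-score of 0 maps back to the reference mean,
# and one unit of z corresponds to one reference standard deviation.
print(restore_duration(0.0))   # 212.0
print(restore_duration(1.0))   # 268.0
```
]]></code>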
        <p id="paragraph-a0b794611ec6bd5604bc3f97f5a3142a">In order to generate simulated contours using the coupled-oscillator model, it is necessary to specify, in addition to the α, β and <italic id="italic-ca6e8154e526eceed2899538c6793e10">w</italic><sub id="subscript-f9df32fc4db93d1b6097be15563afcb4">0</sub> parameters, an estimate of the uncoupled syllabic oscillator period (<italic id="italic-5807a1bdf57a8c4c34b61d6a12066aef">T</italic><sub id="subscript-49fd408fa4281b934c509b516061a9ce">0</sub>), the number of stress groups, the number of VV units in each stress group and the magnitude of each phrase stress. The information on the number of stress groups and their size in VV units is provided by the output of the <italic id="italic-22dfd8c4842d12589addda69f0936dd0">duration-suite</italic> script mentioned in the previous section. <italic id="italic-0fb27f5ac5a8f2f799801487fb88486b">T</italic><sub id="subscript-782b236a1138fa2d4df70828a9dda5fa">0</sub> was estimated by taking the median duration of VV units in stress groups with more than four VV units, excluding the first two and the last VV units. The first two VV units are excluded because the model stipulates that the syllabic oscillator period reset is most active at the start of the stress group. The last VV unit is excluded because it is the most affected by the influence of phrase stress. The magnitude of phrase stresses was kept constant at a value of 1. The rationale for this decision is that the formulation of the coupled-oscillator model we use here does not incorporate higher-level linguistic information that may affect boundary strength and especially the presence and duration of silent pauses at the boundary<xref id="xref-16df9126a6cbe37813bc7a192ab5fd1b" ref-type="fn" rid="footnote-dcf81e2db878738e068b07e2853a7247">5</xref>.</p>
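        <p>The estimation of <italic>T</italic><sub>0</sub> described above can be sketched as follows. This is a minimal Python illustration, not the script used in the study; it assumes that the retained VV durations are pooled across all qualifying stress groups before the median is taken, and the function name and data layout are hypothetical.</p>
        <code language="python"><![CDATA[
```python
from statistics import median


def estimate_t0(groups):
    """Estimate the uncoupled syllabic period T0 (ms).

    groups: list of stress groups, each a list of VV durations in ms.
    Only groups with more than four VV units are used; within each of them
    the first two and the last VV units are dropped.
    Assumption: the remaining durations are pooled before taking the median.
    """
    pooled = []
    for group in groups:
        if len(group) > 4:
            pooled.extend(group[2:-1])
    if not pooled:
        raise ValueError("no stress group has more than four VV units")
    return median(pooled)
```
]]></code>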
        <p id="paragraph-d78353fd746aedf7662a1130ad271b44">When generating simulated contours, the information on the number of stress groups and their size (in VV units) was kept constant and the values of the α, β and <italic id="italic-71754e4eba50c869c6c7161219981582">w</italic><sub id="subscript-6e348df72f777a07c759cb82d78af826">0</sub> parameters were systematically varied: α and β varied from 0.05 to 1.5 in steps of 0.05, and <italic id="italic-d880c8da5a08966e933c6ce21b3c6be2">w</italic><sub id="subscript-3446d918f5175be8459664b93101692f">0</sub> varied from 0.05 to 1 in steps of 0.025. In doing so, a total of 35,100 simulated contours were generated based on the stress group structure of each speech sample, one for each &lt;α, β, <italic id="italic-1098f1cff55ffc7641e9b54fe8ca4cb9">w</italic><sub id="subscript-dfbad0403c3a18f10f731ac0c616a35a">0</sub>&gt; combination. In order to generate the simulated contours, an R implementation of the coupled-oscillator model was used<xref id="xref-6bbafde2c08d217cf0d0c13b5caec613" ref-type="fn" rid="footnote-7b956e897a48d70fa23b5221fd90f288">6</xref>.</p>
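        <p>The size of the parameter grid follows directly from the ranges and step sizes given above: 30 values of α, 30 values of β and 39 values of <italic>w</italic><sub>0</sub>. The Python sketch below only enumerates the grid and does not run the coupled-oscillator model itself; the helper name is hypothetical.</p>
        <code language="python"><![CDATA[
```python
from itertools import product


def grid(start, stop, step):
    """Evenly spaced values from start to stop inclusive, rounded to avoid float drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 3) for i in range(n)]


alphas = grid(0.05, 1.5, 0.05)    # 30 values
betas = grid(0.05, 1.5, 0.05)     # 30 values
w0s = grid(0.05, 1.0, 0.025)      # 39 values

# One simulated contour would be generated per (alpha, beta, w0) triple.
combinations = list(product(alphas, betas, w0s))
print(len(combinations))  # 35100
```
]]></code>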
      </sec>
      <sec id="heading-0905d179e26c51bf41f60b15eb16d78a">
        <title>1.4. Contour comparison and ranking</title>
        <p id="paragraph-48fd88d401f72589c80dd32467aa1cfd">In order to determine which &lt;α, β, <italic id="italic-3841b669f0bae9603cd4d8195ed1a16f">w</italic><sub id="subscript-815b1b41f5e3135d2a5c17560365dccd">0</sub>&gt; combination generates the simulated contour that best matches a natural contour, the natural contour is compared to all simulated contours. The distance between contours was measured using two methods:</p>
        <p id="paragraph-30eb3ec381baadeae1d8f869f1cd2a6d" />
        <list list-type="bullet" id="list-3ac780081e55d502afa00f89e456bda7">
          <list-item>
            <p><italic id="italic-3acc477b54d4d85fe89e5e9c0c9ed59d">Plain duration</italic>: the restored abstract duration (expressed in real time units) of each VV unit along the natural contour is directly compared to the corresponding unit in the model-generated contour.</p>
          </list-item>
        </list>
        <list list-type="bullet" id="list-f140c890c9a71f29908a8c65b1634691">
          <list-item>
            <p><italic id="italic-805851ba44a00cc7746d1584d060960b">Relative change in duration</italic>: instead of duration itself, the series of unit-to-unit relative change in duration is compared. Relative change is computed by dividing the backward difference (i.e., the duration of the <italic id="italic-99493a0bd8448d39873f697349c78d58">i</italic><sup id="superscript-1">th</sup> position in the contour minus the duration of the (<italic id="italic-7dac26e4ed2caa1d54c5af5113584fc8">i</italic> − 1)<sup>th</sup> position) by the duration of the <italic id="italic-fa4fa8b0dffa1a9b77b4623ce9c9cb92">i</italic><sup id="superscript-2">th</sup> position.</p>
          </list-item>
        </list>
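        <p>The relative change series defined in the second item can be computed as in the Python sketch below. The function name is hypothetical; the computation follows the definition given above, so a contour of <italic>n</italic> durations yields <italic>n</italic> − 1 relative change values.</p>
        <code language="python"><![CDATA[
```python
def relative_change(durations):
    """Backward difference divided by the duration of the current position.

    For each position i >= 1: (d[i] - d[i - 1]) / d[i].
    """
    return [(durations[i] - durations[i - 1]) / durations[i]
            for i in range(1, len(durations))]


# Doubling from 100 ms to 200 ms gives +0.5; halving back to 100 ms gives -1.0.
print(relative_change([100, 200, 100]))  # [0.5, -1.0]
```
]]></code>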
        <p id="paragraph-9b1ea1467c6d82f1863440e38770d49f" />
        <p id="paragraph-2efc170f82e8b45271997a24a1edf89a">The <italic id="italic-90f4329628a0ef350040d940a56400e8">plain duration</italic> method (Figure 2) is the most straightforward way to compare the natural and simulated contours. The rationale for creating the <italic id="italic-9a168e84f603ca51e26a4ffa558236df">relative change</italic> method (Figure 3) comes from Barbosa (2004). In that paper, the author outlines a procedure based on rearrangements of his model’s mathematical formulas that allows estimating <italic id="italic-ae3b4e1d602352e0dbb249875d5d0030">w</italic><sub id="subscript-2c9b7811a7e176eacb6fbe190a82c58f">0</sub> by measuring relative change in duration contours<xref id="xref-59c2acf8550253d12623a785b421cb37" ref-type="fn" rid="footnote-99546956e01aae5ebb8b35e8666f6574">7</xref>.</p>
        <fig id="figure-panel-c942fce8b60ce32b2b8a2779affbcefc">
          <label>Figure 2</label>
          <caption>
            <title><bold id="bold-1676c6fd0296fdcecf5c846cd92266f1">Figure 2</bold>. Example of natural and simulated contours being compared using the plain duration method. The simulated contour is one of the best ranked in the comparison with the natural contour. α, β, <italic id="italic-2fc63ad3424d94b64442c870a57a5dbd">w</italic><sub id="subscript-f8a4fa0af267f8822c2e5b56a796d29d">0</sub> and <italic id="italic-dc4d4be6c32972b5ffa2752f3a3c9653">T</italic><sub id="subscript-3bb1c79f877003ded53b10aad4f9329f">0</sub> values are 1, 0.45, 0.5 and 224 ms (Speaker SP2).</title>
            <p id="paragraph-259f0ee1ab28712a0a0957a979b671c4" />
          </caption>
          <graphic id="graphic-fa9b89e72896e085957bee1ac68354c9" mimetype="image" mime-subtype="png" xlink:href="fig-2-method_a_exemp_2.png" />
        </fig>
        <fig id="figure-panel-9e7bb5bff06efcc44a4d104d32da42f4">
          <label>Figure 3</label>
          <caption>
            <title><bold id="bold-4203346e93eb95d73c01f8861f90f64e">Figure 3</bold>. Example of natural and simulated contours being compared using the relative change in duration method. The simulated contour is one of the best ranked in the comparison with the natural contour. α, β, <italic id="italic-c57c280a6e71931706f65bf4edd2dc61">w</italic><sub id="subscript-844c4aa28b4d765ec66de8c7dda51028">0</sub> and <italic id="italic-2bfa1d91d286eb992f5cca30935c9f8d">T</italic><sub id="subscript-658f84957c9274a72ac0be0d05cacc98">0</sub> values are 0.3, 0.85, 0.6 and 224 ms (Speaker SP2).</title>
            <p id="paragraph-8477bc6a9208d47ce18646d83b069929" />
          </caption>
          <graphic id="graphic-ce7cd09c248c9519f0a30ace1257e37b" mimetype="image" mime-subtype="png" xlink:href="fig-3-method_b_exemp_2.png" />
        </fig>
        <p id="paragraph-512a31828f4c7ab692cf3ff29bb4c990">Also following Barbosa (2004), only stress groups with more than four VV units were included in the comparisons; and, within the stress groups, the first and the last VV units were excluded. The rationale for this is the same as that outlined in the previous section for the estimation of <italic id="italic-a10af4153620f910ecc9f2f7dfa512c5">T</italic><sub id="subscript-783533e063f689e74857c0def1b68557">0</sub>.</p>
        <p id="paragraph-28451fddad5106b8e85d101d749342b8">Mean Absolute Error (MAE) was used as the measure of distance between natural and simulated contours<xref id="xref-ca9faa786e8ebb871ba4a4d92d35fba1" ref-type="fn" rid="footnote-f7b9bff149653152fa06168259bd2393">8</xref>. Error was measured as the mean of the absolute difference in each VV unit between the corresponding values in the natural and simulated contours. Simulated contours that are impossible or unlikely in natural speech were excluded from the comparison, namely contours that have negative values. Contours that have a minimum that is smaller than the minimum VV duration found in the natural contour by more than 10% or a maximum that exceeds the maximum VV duration in the natural contour by more than 10% were also excluded.</p>
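        <p>The error measure and the exclusion criteria can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, and "by more than 10%" is read as a simulated minimum below 90% of the natural minimum or a simulated maximum above 110% of the natural maximum.</p>
        <code language="python"><![CDATA[
```python
def mean_absolute_error(natural, simulated):
    """Mean of the absolute unit-by-unit differences between two contours."""
    if len(natural) != len(simulated):
        raise ValueError("contours must have the same number of VV units")
    return sum(abs(n - s) for n, s in zip(natural, simulated)) / len(natural)


def is_plausible(natural, simulated, tolerance=0.10):
    """Apply the exclusion criteria to one simulated contour.

    Assumed reading of the 10% rule: the simulated minimum may not fall below
    (1 - tolerance) times the natural minimum, and the simulated maximum may
    not exceed (1 + tolerance) times the natural maximum.
    """
    if min(simulated) < 0:
        return False
    if min(simulated) < min(natural) * (1 - tolerance):
        return False
    if max(simulated) > max(natural) * (1 + tolerance):
        return False
    return True
```
]]></code>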
        <p id="paragraph-a80009dcc1f8623e13e86df4715b23f8">In addition to the error measure, the ratio between the duration range of natural and simulated contours was used as a second index to rank a contour comparison. Range ratios close to 1 ensure that the simulated contour spans a range of VV unit durations that is close to the one found in the natural contour.</p>
        <p id="paragraph-0b6a97bc8ee723c851ce581e7a55c9af">In order to determine the parameter combination that yielded the simulated contour with the best fit to the natural one, the list of all valid simulated contours was sorted in ascending order by the range ratio index. A list with the 50 candidates with ratios closest to 1 was generated and then sorted in ascending order of the error measure, so that the candidates with the smallest error come first. The medians of the values of α, β and <italic id="italic-194c3e86eefe12d541ece401b6cf0468">w</italic><sub id="subscript-60586508e5f842d9580230a7cfa8761c">0</sub> of the first 20 candidates in this list are considered the values that generate the simulated contour that best fits the natural contour.</p>
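        <p>The two-stage ranking can be sketched in Python as follows. The sketch rests on several assumptions that are not stated in the text: the range ratio is taken as the natural range divided by the simulated range, closeness to 1 is measured as the absolute difference from 1, and candidates with lower error rank first in the second stage. The function names and the dictionary layout are hypothetical.</p>
        <code language="python"><![CDATA[
```python
from statistics import median


def range_ratio(natural, simulated):
    """Ratio of duration ranges (assumed direction: natural over simulated)."""
    return (max(natural) - min(natural)) / (max(simulated) - min(simulated))


def best_parameters(candidates, n_ratio=50, n_error=20):
    """Pick the best-fitting parameter values from the valid simulated contours.

    candidates: list of dicts with keys "alpha", "beta", "w0",
    "range_ratio" and "mae" (one dict per valid simulated contour).
    Stage 1 keeps the n_ratio candidates whose range ratio is closest to 1.
    Stage 2 keeps the n_error of those with the smallest error (assumption).
    The result is the median of each parameter over the surviving candidates.
    """
    by_ratio = sorted(candidates, key=lambda c: abs(c["range_ratio"] - 1.0))[:n_ratio]
    by_error = sorted(by_ratio, key=lambda c: c["mae"])[:n_error]
    return {name: median([c[name] for c in by_error])
            for name in ("alpha", "beta", "w0")}
```
]]></code>
        <p>Taking the median over a shortlist, rather than the single best candidate, makes the estimate less sensitive to any one simulated contour that happens to match the natural contour closely.</p>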
      </sec>
    </sec>
    <sec id="heading-9409f2346e05a7f6efee2d772c77cb4e">
      <title>2. Results</title>
      <p id="paragraph-de6dd989fc3e6a19b4eb4a890fc9619e">Since β, out of the three parameters, is the one with the least impact on rhythmic typology, and for the sake of brevity, the analysis included two response variables, α and <italic id="italic-eec819488b6e0dcd32f8aa0489deeb56">w</italic><sub id="subscript-9ab93284c951ea8042eff77dfacef36f">0</sub>, and two predictor variables, method (<italic id="italic-7cc68de7a195feddb4729abb2b9230d2">plain duration</italic> and <italic id="italic-dc7dc15c42c2cee9f8b92f01d1fd7623">relative change</italic>) and dialect (CE and SP). Table 1 presents the mean and standard deviation of α and <italic id="italic-59cb298631fefcbbfc945bd70a67a651">w</italic><sub id="subscript-6005d6c226ddf626b6778aaa1b4c6002">0</sub> for each combination of predictor variables.</p>
      <fig id="figure-panel-0ccbe16e58846a07d8e970951cbd4149">
        <label>Table 1</label>
        <caption>
          <title><bold id="bold-7cf7a9fa54cf7433e35e2a5e18cb5690">Table 1</bold>. Mean (x̄) and standard deviation (s) of α and <italic id="italic-b51906b66ef2f7d07c282fe98788b736">w</italic><sub id="subscript-d9176e589eb332bf992637b970a9977f">0</sub> for each dialect and for all speakers together, under the <italic id="italic-648207e7e65c8b92c7eedc076ee60877">plain duration</italic> and <italic id="italic-81e6516a511e061e633b3c550f8b278f">relative change</italic> methods. </title>
          <p id="paragraph-b161619bc0b456e72b9fcc69eef9e482" />
        </caption>
        <graphic id="graphic-2efff1788071d5e678101db81b541299" mimetype="image" mime-subtype="png" xlink:href="t1_2.png" />
      </fig>
      <p id="paragraph-2c00367a9ef3feb0d4428f1ed31339bd">Both methods yielded <italic id="italic-d2d6b95556df772dd535ff73ef65a9a7">w</italic><sub id="subscript-abcbafe6bffd4f7d3e673f279b101477">0</sub> values that are consistent with the hypothesis of Brazilian Portuguese having a mixed rhythm, with the <italic id="italic-096b28438c7c73a3b387a591ab475ae2">relative change </italic>method generating higher values. As can be seen from Table 1 and Figure 4, the method that generated higher values for α (<italic id="italic-dc1f6c935bb02c364b9c839276f03ad2">plain duration</italic>) had lower values for <italic id="italic-1aa93b4cc9ec8af17d2d786c3eced3db">w</italic><sub id="subscript-ea17dbdf07a638e8307bcaff63b35e9f">0</sub>, and vice versa: the method that generated lower α values (<italic id="italic-0c01596919b68c12448606c5030bf43d">relative change</italic>) had higher <italic id="italic-083c875339a80c69cd361ec91fe782f6">w</italic><sub id="subscript-ceb128056c7aa58b4a715168000fb329">0</sub> values. The grand mean for both dialects pooled together for α was 0.987 under <italic id="italic-56ae1818ba284c78efdb9ce4ec9dacf4">plain duration</italic> and 0.837 under <italic id="italic-74243b79aa3f796b58139a9ab0fdc2c1">relative change</italic>; and for <italic id="italic-293259f6a69dd8e1e347c093c0b3b042">w</italic><sub id="subscript-333224d1ad9bead8492946677863ac1c">0</sub> was 0.486 under <italic id="italic-3a02b75324c22bd30bc7dd20ca4732c7">plain duration</italic> and 0.683 under <italic id="italic-bcd481388301e14325cf44795dbb8367">relative change.</italic></p>
      <fig id="figure-panel-ffd3c84c95ea3525909489b46a4961f5">
        <label>Figure 4</label>
        <caption>
          <title><bold id="bold-984c3c3612e676d02125bd588df2295e">Figure 4</bold>. Boxplots of α and <italic id="italic-57442700e7a1cbc890419fc12394446c">w</italic><sub id="subscript-7269835371fcd6aba471753323f29ee1">0</sub> values of CE and SP dialects pooled together for the <italic id="italic-d02ead314834a77afc36d0d4d06e3778">plain duration</italic> and <italic id="italic-7ba5642196c57416f6c204c483f41314">relative change</italic> methods. Blue dots and blue lines are the means and standard errors, respectively.</title>
          <p id="paragraph-913386636ba121f8949d447d837c8203" />
        </caption>
        <graphic id="graphic-82b90b33a593d2786a5cbc9a84ab5c4b" mimetype="image" mime-subtype="png" xlink:href="fig-4-method_2.png" />
      </fig>
      <p id="paragraph-f913ac8c0e846fa8f7dcb6a5ca3baf0e">Considering the dialects separately, as shown in Figure 5, SP speakers had higher <italic id="italic-7e80c67438d43370cc137afbf34d38ac">w</italic><sub id="subscript-9f9fa500966a8c4a4b0ad3c79129cc07">0</sub> values under both methods and higher α values under <italic id="italic-013156a72a2e3d89c493855e53b4abcf">plain duration</italic>. The only case in which CE speakers had higher values was α under the <italic id="italic-350a07668210cd0a02facf455c0fd2cb">relative change</italic> method.</p>
      <fig id="figure-panel-0c71df2f4cf5869b756b68c9410b79b6">
        <label>Figure 5</label>
        <caption>
          <title><bold id="bold-c9729529f49b724991a6a2bb05c90686">Figure </bold><bold id="bold-81ec7ea07c2f4732cca59647cc64e41e">5</bold>. Boxplots of α and <italic id="italic-cd7edecf3264a7dc7e547f5ef6406fb0">w</italic><sub id="subscript-7c1b22ade64d5bedf908fd56ed4b9a93">0</sub> values for the <italic id="italic-a5df53ee105bfb311efdc035b4b8ff49">plain duration</italic> and <italic id="italic-34b208b60c8eee36d8e0b82fb1e0819e">relative change</italic> methods, and for CE and SP speakers. Blue dots and blue lines are the means and standard errors, respectively.</title>
          <p id="paragraph-e55fc11ea66f31fdf13ffbc39592c904" />
        </caption>
        <graphic id="graphic-6747d5e246a8ff1d29f1285300f11899" mimetype="image" mime-subtype="png" xlink:href="fig-5-method-dialect_2.png" />
      </fig>
      <p id="paragraph-f39055541a9afef0ae748e509a0bddd2">In order to assess the tendencies revealed by the descriptive statistics and the boxplots, and to evaluate possible effects of dialect and method on α and <italic id="italic-9d1456a91d23b9292d1d585858c82a20">w</italic><sub id="subscript-79b7a9b6c25e57b08f34d658f20cc13e">0</sub>, two Bayesian mixed-effects regression models were fitted, one to the α data and another to the <italic id="italic-7791b928ae2ab60b76185e822ca43adc">w</italic><sub id="subscript-6d44aadf676046037cafb0737bf17523">0</sub> data. Both models include random intercepts for “speakers”, since each speaker contributed more than one data point. We also fitted a model with random slopes for “dialect”, and another one with an interaction between dialect and method (which was not significant), but a model comparison using Bayesian leave-one-out cross-validation<xref id="xref-1b76724f0be27a70b1c229ec5cb3d8d4" ref-type="fn" rid="footnote-a71906364d1842623afb42e98100b6d3">9</xref> indicated that the model with only varying intercepts and no interaction had the best predictive ability, both for α and for <italic id="italic-2e280fc0708991df9deb5c46fdab9382">w</italic><sub id="subscript-1325a926d20defc2347dcd9bda91d5c8">0</sub>.</p>
      <p id="paragraph-d91d50e84af7b77fb38bc9a51317ee8f">Since there is not enough previous literature to justify expected values of α <italic id="italic-15d34b415c68af70a0ed72187b05bb5a">a priori</italic> for Brazilian Portuguese, flat priors<xref id="xref-279e3bcb5df00e55072feb99a215d5b7" ref-type="fn" rid="footnote-5d29e0a75c7fd14f9c2f7a9e6947bdbe">10</xref> were used for the α model, whose coefficients are presented in Table 2.</p>
      <fig id="figure-panel-60a07c66632f6013a16fdd3439308aed">
        <label>Table 2</label>
        <caption>
          <title><bold id="bold-222a2fdc138bd5b60f95cdbba5a6d7cd">Table 2.</bold> Coefficients of the Bayesian mixed-effects model fitted to the α data. Model: alpha ~ method + dialect + (1 | id)</title>
          <p id="paragraph-5d9fbef64a17f033fa145d4bf6878826" />
        </caption>
        <graphic id="graphic-d9592c8b11958c7580ec8a32aec56c1a" mimetype="image" mime-subtype="png" xlink:href="t2_2.png" />
      </fig>
      <p id="paragraph-142811dedc8ceed362de316e4fea59b3">In a Bayesian model, coefficients are given as probability distributions rather than as point estimates. Therefore, each value under “Estimates” in Table 2 is simply one of the most probable values of the parameter (the median of its distribution), and any value within the central portion of the distribution is also very plausible. This is why Table 2 also presents the 50% and 95% credible intervals, for comparison and to help in interpreting the model. The estimate for the “Intercept” corresponds to the first level of each predictor variable in alphabetical order, so the 0.97 for the intercept indicates the most probable value of α for CE speakers under the <italic id="italic-cc2c937c35d91ba82fede70f1d9f9a16">plain duration</italic> method (with a 95% credible interval from 0.85 to 1.09). The second row shows what happens to this α value when the method changes from <italic id="italic-c92858d55945392df18bd4f739afe177">plain duration</italic> to <italic id="italic-3886509e6ac2a4fa0b25b06676e0f4dd">relative change</italic>: α decreases by 0.15 (by as much as 0.24 or as little as 0.06, according to the 95% credible interval, but a decrease in any case). Changing the dialect from CE to SP (third row) might seem to increase α if one looks only at the most probable value of this estimate, but the negative values within the 95% credible interval (and even within the 50% CI) show that the model cannot tell whether this change in dialect increases or decreases α. The random effects show a probable standard deviation of 0.08 for individuals, meaning that we could expect the intercepts of 68% of participants to lie within ±0.08 (1 SD) of the overall intercept, and those of 95% of participants to lie within ±0.16 (2 SDs)<xref id="xref-0bc2d618b19d6201e495a86e3c7857d3" ref-type="fn" rid="footnote-0582d5237542541ae10efffcb3d628e2">11</xref>.</p>
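      <p>The 1 SD and 2 SD reading of the speaker-level standard deviation can be checked with a short calculation. The sketch below is illustrative only and is not part of the original analysis, which was run with brms in R. It assumes normally distributed speaker intercepts and treats the two values quoted in the text (0.97 and 0.08) as fixed numbers, ignoring their posterior uncertainty; the names <italic>central_mass</italic> and <italic>band</italic> are hypothetical helpers.</p>
      <code language="python">
```python
from statistics import NormalDist

# Values quoted in the text for the alpha model (Table 2); treated here as
# fixed numbers, which ignores their posterior uncertainty (an assumption).
intercept = 0.97   # median alpha for a CE speaker, plain duration
speaker_sd = 0.08  # median standard deviation of the by-speaker intercepts

# Mixed-effects models assume normally distributed speaker intercepts.
speakers = NormalDist(mu=intercept, sigma=speaker_sd)


def central_mass(dist, k):
    """Share of the distribution lying within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


def band(center, sd, k):
    """Lower and upper limits of the band center +/- k * sd."""
    return (center - k * sd, center + k * sd)


for k in (1, 2):
    low, high = band(intercept, speaker_sd, k)
    share = central_mass(speakers, k)
    print(f"+/- {k} SD: {low:.2f} to {high:.2f}, about {share:.0%} of speakers")
```
      </code>
      <p>Under these assumptions, the 1 SD band runs from 0.89 to 1.05 and the 2 SD band from 0.81 to 1.13.</p>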
      <p id="paragraph-1c86149fd30c48820eef1c1f949ec0b2">Figure 6 presents the posterior probability distributions of the coefficients of the model, with the thick blue lines at the medians of the distributions (the “Estimates” from Table 2), the shaded blue areas corresponding to the 50% CIs, and the tails of each distribution bounded at its 99% CI.</p>
      <fig id="figure-panel-13da4ca68afdb8555a470b3085f0db6d">
        <label>Figure 6</label>
        <caption>
          <title><bold id="bold-f89f97ac01d7f86a75700636a8898d40">Figure 6</bold>. Posterior probability distributions of model coefficients for α.</title>
          <p id="paragraph-d504e2f2cc5d3fbee74c9a6dd04a5ac2" />
        </caption>
        <graphic id="graphic-4ff1650b6a9a717cc8c9b72a85bdd063" mimetype="image" mime-subtype="png" xlink:href="fig-6-alpha-coefficients_2.png" />
      </fig>
      <p id="paragraph-c00d3711b8ac8d7f1eecf1f4ef356e21">The third distribution shows the most probable values of α for CE speakers under the <italic id="italic-7c0a35b44ae86ed6ca62c31c28c8a362">plain duration</italic> method (Intercept). The second distribution, completely on the negative side, makes us confident that using the <italic id="italic-0f3a94d83e7e09b4155cd66a86c502b0">relative change</italic> method decreases α values. The first distribution, on the other hand, has 34% of its area on the negative side, so our data provide no credible evidence that changing the dialect from CE to SP affects α<xref id="xref-a6ae5133e0d04aae11f0b085d5e812ed" ref-type="fn" rid="footnote-df7cd254c764395e116606933e80eacc">12</xref>.</p>
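      <p>The quantities used to read these distributions (median, 50% and 95% credible intervals, and the share of the distribution below zero) can all be computed directly from posterior draws. The sketch below is a hedged illustration rather than the original analysis code: the function <italic>summarize</italic> is a hypothetical helper, and the draws are synthetic numbers generated only to exercise it, with arbitrary parameters chosen so that roughly a third of them fall below zero. They are not the draws of the fitted model.</p>
      <code language="python">
```python
import random
from statistics import median, quantiles


def summarize(draws):
    """Median, central 50% and 95% intervals, and share of draws below zero."""
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    return {
        "median": median(draws),
        "ci50": (cuts[9], cuts[29]),   # 25th and 75th percentiles
        "ci95": (cuts[0], cuts[38]),   # 2.5th and 97.5th percentiles
        "below_zero": sum(1 for d in draws if 0 > d) / len(draws),
    }


# Synthetic stand-in for posterior draws of a slope; NOT the fitted model's draws.
rng = random.Random(2021)
toy_draws = [rng.gauss(0.03, 0.07) for _ in range(4000)]

summary = summarize(toy_draws)
for name, value in summary.items():
    print(name, value)
```
      </code>
      <p>A coefficient whose 95% interval excludes zero has at most 2.5% of its draws on the other side of zero, which is the situation described for the effect of method; a share close to one third, as for dialect, leaves the sign of the effect uncertain.</p>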
      <p id="paragraph-b0d854c947c65ddf2c3d37ace1a88a24">The graph with fitted values, in figure 7, reinforces the lack of effect of dialect in our data and the decrease of α values in the <italic id="italic-8442cdbe5fb21c475cf642f0445de959">relative change </italic>method. Keeping the same system used so far, the dots represent the most probable value (median of the distribution) of α, the thicker portion of each line represents its 50% most credible interval, and the thinner portions extend up to the 95% CI.</p>
      <fig id="figure-panel-86c948c9ba8f48b349aa46720e8410d8">
        <label>Figure 7</label>
        <caption>
          <title><bold id="bold-f4ddf1fa53d40c0b1e668b7ad18da454">Figure 7</bold>. Fitted values of α for CE and SP dialects, and for <italic id="italic-17ccc9abdfefc93f812574b4c278337f">plain duration</italic> and <italic id="italic-2296b0b9c4b3dfe95099a449f4c1b496">relative change </italic>methods.<bold id="bold-efb5bb6e1a22a48b5c9b89d90c8e5efb"/></title>
          <p id="paragraph-add83c5d9cf4a03e344544eeae9fa67c" />
        </caption>
        <graphic id="graphic-1bdaaa98610492c1e34a7d08047904a4" mimetype="image" mime-subtype="png" xlink:href="fig-7-alpha-fitted_2.png" />
      </fig>
      <p id="paragraph-f52bda2b9f732473e37d037bb694c4e1">Assuming that Brazilian Portuguese has a mixed rhythm, we set a regularizing prior distribution for <italic id="italic-f2bf9eb067572f2999d4604465b39ead">w</italic><sub id="subscript-f03bbd9c1198c0ec7d0a8357c5b21881">0</sub> centered at 0.5, but allowing for values between 0 and 1. We defined the prior for the Intercept as a normal distribution with mean 0.5 and standard deviation 0.25. The priors for the slopes were also regularizing, centered at zero to allow for changes in either direction (increasing or decreasing <italic id="italic-609ea08dccba4cdba820868c8c8be94d">w</italic><sub id="subscript-c44c8b5d694d10bbdfff01329547e8e2">0</sub>). They were defined as normal distributions with mean 0 and standard deviation 0.35, wide enough to accommodate changes in <italic id="italic-13a2983e4cb21025ea657e86b145c9c9">w</italic><sub id="subscript-2bbac35e06c7abc5f54549428320a503">0</sub> within the 0 to 1 range. Finally, the prior for sigma was set as a normal distribution with mean 0 and standard deviation 0.25 (truncated at 0).</p>
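      <p>How much probability these priors place on the relevant ranges can be worked out from the normal distribution function. The sketch below is an illustration under stated assumptions, not the original model code: the prior parameters are those given in the text, while the helper <italic>mass_between</italic> and the 0.5 cut-off used for sigma are hypothetical choices made for the example.</p>
      <code language="python">
```python
from statistics import NormalDist


def mass_between(dist, low, high):
    """Probability mass that dist places on the interval from low to high."""
    return dist.cdf(high) - dist.cdf(low)


# Priors as described in the text for the w0 model.
intercept_prior = NormalDist(mu=0.5, sigma=0.25)
slope_prior = NormalDist(mu=0.0, sigma=0.35)
sigma_prior_sd = 0.25  # normal(0, 0.25) truncated at 0, i.e. a half-normal

# Mass of the intercept prior inside the admissible 0-to-1 range of w0.
intercept_in_range = mass_between(intercept_prior, 0.0, 1.0)

# Mass of the slope prior on shifts no larger than 1 in absolute value.
slope_in_range = mass_between(slope_prior, -1.0, 1.0)

# For a half-normal, P(sigma at most x) equals the mass of the untruncated
# normal between -x and x; the cut-off of 0.5 is an arbitrary illustration.
sigma_below_half = mass_between(NormalDist(0.0, sigma_prior_sd), -0.5, 0.5)

print(f"intercept prior mass in [0, 1]: {intercept_in_range:.3f}")
print(f"slope prior mass in [-1, 1]:    {slope_in_range:.3f}")
print(f"sigma prior mass below 0.5:     {sigma_below_half:.3f}")
```
      </code>
      <p>Because 0 and 1 lie two standard deviations on either side of 0.5, about 95% of the intercept prior falls inside the 0 to 1 range; the prior therefore favors that range without strictly enforcing it.</p>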
      <p id="paragraph-395447789835895c6b1973562a3a0d3c">The posterior distributions of the coefficients of the model for <italic id="italic-f2d5b348699d2c6fd8eeb128e84f3fc6">w</italic><sub id="subscript-99df26acd2b7ae3d627f445c55602ce0">0</sub> are presented in Table 3, in the same format used to report on α.</p>
      <fig id="figure-panel-1343def896712aa89811a35b12937944">
        <label>Table 3</label>
        <caption>
          <title><bold id="bold-dc9af005c8b5dca96cdb3575be6096f2">Table 3.</bold> Coefficients of the Bayesian mixed-effects model fitted to the w0 data. Model: w0 ~ method + dialect + (1 | id)</title>
          <p id="paragraph-6dd2eb387d1f211c1617d060c42dd082" />
        </caption>
        <graphic id="graphic-4d13b4cfe9eec93d58994d1aebe92224" mimetype="image" mime-subtype="png" xlink:href="t3_2.png" />
      </fig>
      <p id="paragraph-eecbe037e5ff6e46f543670a82d33faf">The most probable value of <italic id="italic-2592b6bb1c7dbb39d215e06b12a706a4">w</italic><sub id="subscript-1cb99c4af37229f7138fceef9b77001d">0</sub> for a CE speaker under the <italic id="italic-acf5674e84fdc80aeb0f047becc38a27">plain duration</italic> method (Intercept) is 0.45 (ranging from 0.36 to 0.55 in the 95% credible interval). Changing the method to <italic id="italic-a9cc96e15ee03e89ef2fb5c26f134f7f">relative change</italic> increases <italic id="italic-ef5065cb85464611de3c6d9b4ad76146">w</italic><sub id="subscript-1cdd7140f0616cc1905266ef9e84bb4d">0</sub>, the most probable value of the increase being 0.2, with a 95% credible interval from 0.11 to 0.28. Changing the dialect to SP seems to increase <italic id="italic-4cb12a651f2b11b9eea41d780cc7ba4a">w</italic><sub id="subscript-d768315c7fabaa7cc28c34f32d882603">0</sub> values, but we cannot be certain at this point because the 95% CI of this effect includes zero, so the absence of a dialect effect remains plausible. The 50% credible interval stays on the positive side, showing an increase in <italic id="italic-9c38554bab52fa63dff2128ce944f3d0">w</italic><sub id="subscript-c9fb309a957a9a369a41a864ddf9b495">0</sub> of around 0.03 to 0.1, which is probably still too small to have any linguistic effect on the perceived rhythm of speech. At present, there is no objective way of establishing the linguistic significance of differences in <italic id="italic-88eab3e0d9733c7c33fd4e475adf3b60">w</italic><sub id="subscript-f32c682aac6c1cce3a163e1fce5d6245">0</sub> for rhythmic classification purposes, either on production or on perceptual grounds. This should be a topic for future research. The random effects show a probable standard deviation of 0.05, which represents the variation of individual values in the intercept. 
Individual variation in <italic id="italic-28734f5eae83a12231f6dd625fcf7082">w</italic><sub id="subscript-f3f011e49ce85ecf4ba7d3517d501e71">0</sub> was slightly smaller than individual variation in α.</p>
      <p id="paragraph-929bf30f872543a173d05b2f557fe9b9">Figure 8 presents the posterior probability distributions of the coefficients of this model. Just as was done with the previous model, the thick lines are the medians of the distributions, the shaded blue areas correspond to the 50% most credible interval of each distribution, and the tails of the distributions are bounded to the 99% CI.</p>
      <fig id="figure-panel-8f285ec92a781594920712ddec1853b3">
        <label>Figure 8</label>
        <caption>
          <title><bold id="bold-ccf261ce8fb6c4443d171855a8b41af3">Figure 8</bold>. Posterior probability distributions of model coefficients for <italic id="italic-364ae60cd1c96d0242b13a07e6ae496b">w</italic><sub id="subscript-0af24dde951edad273c194292dfa3a0b">0</sub>.</title>
          <p id="paragraph-7cca80a95fd41fbb665bc54399da8e97" />
        </caption>
        <graphic id="graphic-4ebb3244781dc3da4010e726472da611" mimetype="image" mime-subtype="png" xlink:href="fig-8-w0-coefficients_2.png" />
      </fig>
      <p id="paragraph-5c75fd5e74dfa9f6343dc0e9edaed630">The bottom distribution shows the most probable value of <italic id="italic-38dcef6b93c73aa845464478b13bcbff">w</italic><sub id="subscript-980bc8b3f9c0629c9204578d46522d56">0</sub> for a speaker from CE under the <italic id="italic-b4f3f5230ab5f40eeebe5f0a8f01dea7">plain duration</italic> method (Intercept from the model). The distribution in the middle, completely on the positive side, indicates that the <italic id="italic-a51a8853bebb4cf56228cac379ebd029">relative change</italic> method yielded higher values for <italic id="italic-65d6ba5e3e68a9bedf77cec749986446">w</italic><sub id="subscript-3aeea0740898c2b653d0841636d62a98">0</sub>. The distribution on the top has 13% of its area on the negative side, making us skeptical about an effect of dialect on <italic id="italic-b87cf91a28808fbdd1b4cd794070c77c">w</italic><sub id="subscript-a210a4f601ddd274cba20b8f67cb8a05">0</sub> with our data. </p>
      <p id="paragraph-d3e404e87f7ece019df324f5b3cc64d4">The graph with fitted values, in Figure 9, reinforces the increase in <italic id="italic-4d4070b3577a3753ef9b29890c4b7516">w</italic><sub id="subscript-8089a24e6c8076afc7dcd0609ea819c1">0</sub> values derived by the <italic id="italic-ee1707716af86d8626cdd34187c9fc75">relative change</italic> method. It also shows that the difference between CE and SP is larger for <italic id="italic-ab19bf50714483a26a2949af9d149797">w</italic><sub id="subscript-62a74784470f0a5e05589de6edd743a8">0</sub> than it was for α (Figure 7), but still not large enough to support the conclusion that dialect has an effect in our data. The graph in Figure 9 uses the same system used so far, where the dots represent the most probable value (medians of the distributions) of <italic id="italic-e7c2813886e0ca8e68488ee7157d256d">w</italic><sub id="subscript-e072c9d5d184e713415fcce8c7592acb">0</sub>, the thicker portion of each line represents its 50% most credible interval, and the thinner portions extend up to the 95% CI.</p>
      <fig id="figure-panel-0110005822122e13c8d0e21e2f700693">
        <label>Figure 9</label>
        <caption>
          <title><bold id="bold-13103d17798a4009958115df2e4378e9">Figure 9</bold>. Fitted values of <italic id="italic-01e6bedf5115559bc34c1dcc15e8d22b">w</italic><sub id="subscript-18e603e5124c45200d48e62360e4367c">0</sub> for CE and SP dialects, and for <italic id="italic-fc8d4c9b247fe9a032df5a87fda89bb2">plain duration</italic> and <italic id="italic-432587b844a00c2bed7d6061ec3ae328">relative change </italic>methods.</title>
          <p id="paragraph-0b12d99b9ce6ff99d2dd36e73d849672" />
        </caption>
        <graphic id="graphic-4f10b9d924f3a1f9ee404747e063a5f8" mimetype="image" mime-subtype="png" xlink:href="fig-9-w0-fitted_2.png" />
      </fig>
    </sec>
    <sec id="heading-b7c9c0d32c272367ba21eec76b71e1d4">
      <title>3. Discussion</title>
      <p id="paragraph-30fcf23dcd1159f44875e54dfccfda39">From the point of view of the development of our methodology, the results are relevant because they show that the contour comparison method has an important effect on the estimates of both <italic id="italic-ec554ad34965df74a33861b80cad7b68">w</italic><sub id="subscript-71f284c9fbd2a7065d67f2ee174bc128">0</sub> and α values. The <italic id="italic-0acbe05d8a9bb8c6ae44aaa3003195a4">relative change</italic> method moves the <italic id="italic-1034cab95e38670aaac9a66e8cfca36c">w</italic><sub id="subscript-b9c50e64d1494005dc9410a5418e1cfc">0</sub> estimates up by 0.19 in comparison with <italic id="italic-292920fea6c4f0e488424c2366c852a1">plain duration</italic>, while lowering α by 0.15. In other words, the changes caused by the methods go in opposite directions for <italic id="italic-15d05be5ba7f32f71df81dfe04ee0a34">w</italic><sub id="subscript-c51158abff9ccce7076f0313e32c9444">0</sub> and α, almost as if to cancel each other out. Since both differences are credible and their effect is not negligible, given that <italic id="italic-5ca0d4dd1ae5370d246deb602ff6d6b3">w</italic><sub id="subscript-f2c628d03ac384bcef8abaeea4f7724a">0</sub> varies between 0 and 1, further analyses of these results are necessary, so a principled decision can be made about which method to choose as the default for future work.</p>
      <p id="paragraph-6c6adbd168969797fefc3ffdd3dc4faf">From the linguistic point of view, even if the results are affected by the comparison method, they show relevant trends. Regarding the effect of dialect, the present results do not show compelling evidence of differences either in <italic id="italic-34ba2099d471af8e2e92bbf5e18543fb">w</italic><sub id="subscript-9ca5bcd1a49e89eec7ee4c5c072982ac">0</sub> or in α when we look at the contour comparison methods separately. There is also no strong evidence in favor of a difference between <italic id="italic-137ac4e71d1c7c41d63f9382a1a11fc7">w</italic><sub id="subscript-3489ae5634f5972f6c49c33e364a3a15">0</sub> and α in terms of variability. These results seem to confirm some of the previously held broad assumptions about BP’s rhythmic tendencies. The results also agree with Barbosa’s assumption that <italic id="italic-5956a278fd408585fa9b228a65b78ceb">w</italic><sub id="subscript-7">0</sub> values should be fairly similar among varieties of the same language.</p>
      <p id="paragraph-9c958389663d76eedad50d97af0b0ef7">As for α, the results show that both varieties have similar values as well, and its overall variability is only slightly higher than that seen for <italic id="italic-1167485988e1669474c5d4ae76aa1f18">w</italic><sub id="subscript-8">0</sub>, not enough to allow us to infer that it has greater variability. These results do not necessarily contradict Barbosa’s assumptions, because the author does not assume that α <italic id="italic-ee24b70093f8ecfeda513be40413282f">should</italic> be more variable than <italic id="italic-8797d2290eca1e9ce8ba4e53e696652e">w</italic><sub id="subscript-9">0</sub>. Our sample is too small to compare between-speaker variability with between-dialect variability, so a larger sample would be needed to reveal a difference in α, if there is one. Given the scarcity of previous studies on rhythmic differences between the various BP varieties, we cannot yet tell whether the lack of a dialect effect reflects the true nature of the two varieties or is a by-product of our procedure.</p>
    </sec>
    <sec id="heading-7917aab99f77d4f7e2472fc15f6537d1">
      <title>4. Final remarks</title>
      <p id="paragraph-6788201f3781d5d6b259d3b3f0427a39">Given the complex nature of speech rhythm and the number of degrees of freedom involved in the design of a procedure such as the one we are trying to develop, the results presented here are encouraging. Except for the labeling part, the procedure is driven by scripts, making it easier to scale up studies on rhythmic characterization and variability. In comparison with Barbosa’s procedure (BARBOSA, 2004), it has the advantage of also estimating α and β.</p>
      <p id="paragraph-2">Regarding the development of the procedure, the present results indicate that further analysis of the performance of the contour comparison methods presented here is needed to gain a better understanding of the differences between them and to make an informed decision on which of the two should be used going forward.</p>
      <p id="paragraph-9b8cd2ac2b4d3d29d7f238b578f6353b">Notwithstanding the pending issue of the contour comparison methods, the present results suggest that the procedure generates linguistically sensible estimates for <italic id="italic-16d34ed59daabbfc1c9e10667f1e86b8">w</italic><sub id="subscript-1">0</sub> and α, corroborating the basic assumption that BP is a mixed-type language in terms of rhythmic organization. The results also lean towards corroborating some assumptions made by Barbosa about the cross-dialectal and between-speaker variability of <italic id="italic-59006f87282c7556c35daa603abba782">w</italic><sub id="subscript-2">0</sub> and α, although firmer conclusions require a larger sample.</p>
      <p id="paragraph-4">On the linguistic side, once it becomes possible to select one single contour comparison method, we plan to apply the methodology to languages other than BP. Crucially, we plan to look at languages that are widely regarded in the literature as prime examples of stress-timed and syllable-timed rhythm, such as English and Spanish, respectively. This will allow us to test some of the most crucial hypotheses raised by Barbosa regarding the role of the coupling strength parameter (<italic id="italic-936488d4f079309ec62d323d47389dfa">w</italic><sub id="subscript-3">0</sub>) as an index of rhythm typology. Further in the future, if the methodology proves to be sound, we also plan to apply it to the speech of L2 learners. </p>
    </sec>
    <sec id="heading-2ca0228d640d0e3f71ed27eee684a219">
      <title>5. Acknowledgements</title>
      <p id="paragraph-24b66c1d468ec1a0c872bd803442a55d">The first author would like to thank Plinio A. Barbosa for sharing the sound files for the São Paulo variety analyzed here. The second author would like to thank the Brazilian National Council for Scientific and Technological Development – CNPq (process 438823/2018-4) for partially supporting this project. The authors also thank the reviewers for their careful reading and for their comments, which greatly improved the paper.</p>
    </sec>
    <sec id="heading-c03f343f2fc82417e05dee4e592cd85b">
      <title>References</title>
      <p id="paragraph-b092b9bc521e192456797512e0fb0533">ABERCROMBIE, David. <italic id="italic-a33597a7125b24ee9f5e196c8647bea6">Elements of General Phonetics</italic>. Edinburgh: Edinburgh University Press, 1967.</p>
      <p id="paragraph-3">BARBOSA, P. A. Elementos para uma tipologia do ritmo (lingüístico) da fala à luz de um modelo de osciladores acoplados. <italic id="italic-2">In Cognito - Cadernos Românicos em Ciências Cognitivas</italic>, v. 2, n. 1, p. 31–58, 2004.</p>
      <p id="paragraph-5">BARBOSA, P. A. Explaining cross-linguistic rhythmic variability via a coupled-oscillator model of rhythm production. 2002, Aix-en-Provence, France. <italic id="italic-3">Anais</italic>... Aix-en-Provence, France: [s.n.], 2002. p. 163–166. </p>
      <p id="paragraph-7">BARBOSA, P. A. From syntax to acoustic duration: a dynamical model of speech rhythm production. <italic id="italic-4">Speech Communication</italic>, v. 49, p. 725–742, 2007.</p>
      <p id="paragraph-9">BARBOSA, P. A. <italic id="italic-5">Incursões em torno do ritmo da fala</italic>. Campinas: Pontes, 2006. </p>
      <p id="paragraph-11">BERTINETTO, Pier Marco. Reflections on the dichotomy “stress” vs. “syllable timing”. <italic id="italic-6">Revue de Phonétique Appliquée</italic>, v. 91, p. 99–129, 1989.</p>
      <p id="paragraph-13">BÜRKNER, P. brms: An R Package for Bayesian Multilevel Models Using Stan. <italic id="italic-7">Journal of Statistical Software</italic>, 80(1), 1–28, 2017. doi: 10.18637/jss.v080.i01.</p>
      <p id="paragraph-15">BÜRKNER, P. Advanced Bayesian Multilevel Modeling with the R Package brms. <italic id="italic-8">The R Journal</italic>, 10(1), 395–411, 2018. doi: 10.32614/RJ-2018-017.</p>
      <p id="paragraph-17">CELCE-MURCIA, M. <italic id="italic-9">et al. Teaching Pronunciation: a course book and reference guide</italic>. Cambridge, UK: Cambridge University Press, 2010. </p>
      <p id="paragraph-19">DAUER, R. M. Stress-timing and syllable-timing reanalyzed. <italic id="italic-10">Journal of Phonetics</italic>, v. 11, p. 51–62, 1983.</p>
      <p id="paragraph-21">LADEFOGED, Peter. <italic id="italic-11">A Course in Phonetics</italic>. 4th. ed. Boston, MA: Heinle &amp; Heinle, 2001. </p>
      <p id="paragraph-23">PETTORINO, Massimo <italic id="italic-12">et al.</italic> VtoV: a perceptual cue for rhythm identification. In: PROSODY-DISCOURSE INTERFACE CONFERENCE 2013, 2013, [S.l: s.n.], 2013. p. 101–106. </p>
      <p id="paragraph-25">PIKE, Kenneth Lee. <italic id="italic-13">The Intonation of American English</italic>. Ann Arbor: University of Michigan Press, 1945.</p>
    </sec>
    <sec id="heading-b9a86c9941492b6a2566b572cabae97f">
      <title>Appendix</title>
      <p id="paragraph-b1b0dfaf8bc83bb3325833efe0df147e">
        <bold id="bold-3fd2733024e47932f47162af7ba6bdab">The Celce-Murcia (2010) passage</bold>
      </p>
      <p id="paragraph-d36ab1a52fbddf60e5e0c6eb5ab4c4ee">“O inglês é a sua língua nativa? Caso não seja, o seu sotaque estrangeiro pode mostrar para as pessoas que você vem de outro país. Por que é difícil falar uma língua estrangeira sem sotaque? Existem algumas respostas para essa pergunta. Primeiro, a idade é um fator importante na aprendizagem da pronúncia. Nós sabemos que crianças pequenas conseguem aprender uma segunda língua com pronúncia perfeita. Também sabemos que aprendizes mais velhos normalmente têm sotaque, apesar de alguns aprendizes mais velhos também conseguirem aprender a falar sem sotaque algum.</p>
      <p id="paragraph-5f0ac16881f8c9eaa0f89796345151d7">Outro fator que influencia a pronúncia é a sua língua materna. Falantes de inglês conseguem, por exemplo, reconhecer franceses por seus sotaques franceses. Eles também conseguem identificar falantes de espanhol ou de árabe ao telefone, apenas por escutá-los com cuidado. Isso significa que sotaques não podem ser mudados? De maneira nenhuma! Mas você não consegue mudar a sua pronúncia sem trabalhar bastante nisso. Afinal, melhorar a pronúncia é uma combinação de três coisas: muito trabalho concentrado, um bom ouvido, e uma forte ambição de soar como um falante nativo.</p>
      <p id="paragraph-79d483eab45bf4da28c0a034a3e530cd">Você também precisa de informações precisas sobre os sons do inglês, estratégias eficientes para praticar, muita exposição ao inglês falado, e paciência. Você fará progresso ou desistirá? Apenas o tempo dirá. Mas é sua decisão. Você pode melhorar! Boa sorte, e não se esqueça de estudar e praticar bastante!”</p>
      <p id="paragraph-13271c1bfbce9befb190ba194911c450" />
      <p id="paragraph-963d333679df05085db73aa1d8a2c62e">
        <bold id="bold-2">The Lobato passage</bold>
      </p>
      <p id="paragraph-6195190b38d31298aabadd7e10f46b0e">“Em seguida apareceu um papagaio real que tinha fama de orador. Subiu a tribuna de um poleiro de ouro e fez um belo discurso a respeito da arte de falar. Nesse discurso provou que os homens tinham aprendido a falar com os papagaios, e não os papagaios com os homens, como diz a ciência destes. Uma chuva de palmas acolheu suas palavras.</p>
      <p id="paragraph-6212ccf830442243354827f197673acd">O mesmo não aconteceu, porém, com a poetisa Lagartixa, que principiou a recitar uma longa poesia e engasgou no meio, acabando o recitativo em choro e faniquito. Para destruir essa má impressão vieram três vagalumes mágicos que fizeram várias sortes, sendo muito apreciada a sorte de comer fogo.”</p>
      <p id="paragraph-21988c1557782a0038ac0b4b34ad3a0a">Lobato, M. (1920) A Menina do Narizinho Arrebitado. São Paulo: Revista do Brasil.</p>
      <p id="paragraph-1de370017299de44e09bda32251e9306">Monteiro Lobato &amp; Cia. p. 18 (modernized spelling).</p>
    </sec>
  </body>
  <back>
    <fn-group>
      <fn id="footnote-f0ecd757fd7fc925aa874771fae4684f">
        <label>1</label>
        <p id="paragraph-06f6f820d598643591d92fdf66ddfbdc">Both passages can be found in the Appendix.</p>
      </fn>
      <fn id="footnote-7b5440395e48ccfaa3724e72270c51e8">
        <label>2</label>
        <p id="paragraph-ee751903420d2cdc4180d8e330c4ba9d">A VV unit is a syllable-sized segment delimited by two consecutive vowel onsets in running speech. A body of literature suggests VV units are better than the phonological syllable to reveal the rhythmic structure of speech (BARBOSA, P. A., 2006, 2007; PETTORINO <italic id="italic-1">et al.</italic>, 2013).</p>
      </fn>
      <fn id="footnote-881c793ce392532ad45971c4dad89a78">
        <label>3</label>
        <p id="paragraph-f7e00fd5576de425ba05a271957c4cd1">SAMPA-PB is a convention inspired by SAMPA to phonetically annotate Brazilian Portuguese using ASCII characters. See <ext-link id="external-link-1" xlink:href="https://github.com/parantes/sampa-pb">https://github.com/parantes/sampa-pb</ext-link> for more information.</p>
      </fn>
      <fn id="footnote-a1d6973104298a223e5b1327089a771e">
        <label>4</label>
        <p id="paragraph-6ee8a6ce3a2ac8b04bcb0c7b44fb9fa0">The script can be found at <ext-link id="external-link-d1e3ad3cb2aa284dfcd4c57eb736034f" xlink:href="https://github.com/parantes/duration_suite">https://github.com/parantes/duration_suite</ext-link>. It is a rewrite of the SGDetector script originally coded by Barbosa (2006). The reader is referred to Barbosa (2006) for further information on the rationale for each step of the procedure.</p>
      </fn>
      <fn id="footnote-dcf81e2db878738e068b07e2853a7247">
        <label>5</label>
        <p id="paragraph-5ebbb293fad830a1cb01eba8903fe889">Barbosa (2006, 2007) presents a probabilistic approach to incorporate the effect of the syntax-prosody interaction on phrase stress magnitude, although it does not deal with silent pause insertion.</p>
      </fn>
      <fn id="footnote-7b956e897a48d70fa23b5221fd90f288">
        <label>6</label>
        <p id="paragraph-56cefae448526767ed86b7ad6bc1b21e">The code can be accessed at <ext-link id="external-link-845237bbdd643200b671d362c23561a9" xlink:href="https://github.com/parantes/rhythm">https://github.com/parantes/rhythm</ext-link>.</p>
      </fn>
      <fn id="footnote-99546956e01aae5ebb8b35e8666f6574">
        <label>7</label>
        <p id="paragraph-5880d17e2227d3dfc44753343cbc005f">In the procedure described in Barbosa (2004), the value of α is fixed by the user, although it may be set differently for different speaking rates when speaking rate varies systematically in the data to which the procedure is applied.</p>
      </fn>
      <fn id="footnote-f7b9bff149653152fa06168259bd2393">
        <label>8</label>
        <p id="paragraph-3c9beebd9d6bae0a00788ddae1306110">We also tested using Dynamic Time Warping (DTW) and Root-Mean-Square Error (RMSE) as error measures. All three measures performed well, and we decided to use MAE since it generated less variability.</p>
      </fn>
      <fn id="footnote-a71906364d1842623afb42e98100b6d3">
        <label>9</label>
        <p id="paragraph-3daf72fa9d535bb5b3dbb459e362c265">Using the LOO() function from the brms package for R (BÜRKNER, 2017; 2018).</p>
      </fn>
      <fn id="footnote-5d29e0a75c7fd14f9c2f7a9e6947bdbe">
        <label>10</label>
        <p id="paragraph-13a69ecd6942c66a6430063132cbca3a">The default values given by the <italic id="italic-4cf1f078aa098639b6c85634e403ba53">brms</italic> package for R (BÜRKNER, 2017; 2018).</p>
      </fn>
      <fn id="footnote-0582d5237542541ae10efffcb3d628e2">
        <label>11</label>
        <p id="paragraph-00ebd4ce3ed2750e8354cf43de041e2f">Note, though, that random effects are also given as probability distributions and the 95% credible interval of the effect is given in the table.</p>
      </fn>
      <fn id="footnote-df7cd254c764395e116606933e80eacc">
        <label>12</label>
        <p id="paragraph-63699062ef54dafa1cac5bf26605c113">Having 66% of the area of the posterior distribution on the positive side is not enough to recognize an effect of dialect. If we adopted the 95% / 5% threshold (i.e., p = 0.05) commonly used in frequentist statistics for decision making, 66% would be a definite "no effect".</p>
      </fn>
    </fn-group>
  </back>
</article>