Reproducibility

What Is Reproducibility?

Reproducibility refers to the ability to obtain consistent results when scientific studies are repeated or their analyses are revisited by other researchers. Put simply, a finding is considered reproducible if other scientists — by following the same methods and conditions, or by analyzing the same data using the same code — are able to reach equivalent conclusions.

This concept lies at the core of scientific reliability: robust results should persist beyond the original study. In recent years, several meta-research studies have revealed a “reproducibility crisis” across multiple fields of knowledge. A significant number of published studies have reported results that could not be confirmed in independent replication attempts.

In psychology and other social sciences, for instance, there has been a surge in failed attempts to replicate effects previously considered solid. This has raised concerns about the trustworthiness and generalizability of scientific evidence. In response, numerous initiatives have emerged to reform scientific practices within the broader movement of Open Science, aiming to enhance transparency and methodological rigor, and thereby address the crisis of confidence and improve study reproducibility.

In this context, reproducibility can be understood along different dimensions, all grounded in the fundamental principle that the same conclusions “would be” or “have been” reached in a reproducible experiment. Simkus et al. (2025) propose a five-type classification:

Type A Reproducibility (Analytical/Computational): obtaining the same results using the same data, methods, and code from the original study. This type emphasizes the importance of clear and accessible documentation to allow others to verify published results or conduct alternative analyses.
Type B Reproducibility (Analytical Robustness): reaching the same conclusion using the same data but applying different statistical methods. This type assesses the robustness of findings across various analytical approaches.
Type C Reproducibility (Repeatability/Same Team): obtaining the same conclusion with newly collected data in a new study conducted by the same team, in the same laboratory, and using the same method.
Type D Reproducibility (Replicability/Different Team): obtaining the same conclusion with new data from an independent study conducted by a different team in a different laboratory, using the same method as the original study.
Type E Reproducibility (Generalizability): obtaining the same conclusion with new data from a study that uses different experimental or analytical methods. This type tests the validity of findings under altered conditions, such as different populations or contexts.

It is important to note that the terms reproducibility and replicability are sometimes used with distinct meanings. Some authors define reproducibility as the repetition of analyses using the same data (e.g., reproducing the results of a paper using the original dataset and code), while replicability refers to repeating the study with new data to verify whether the phenomenon holds.

In this text, we adopt a broad definition: reproducibility encompasses both the reproduction of analyses (same data, same methods) and the replication of the study (new data under the same design). Both practices are essential for validating scientific findings. In all cases, the central goal is to determine whether results withstand independent scrutiny.

Why Is Reproducibility Important?

Reproducibility is a cornerstone of the scientific method. Reproducible results increase our confidence that observed phenomena are valid and not merely the outcome of chance, bias, or methodological flaws. When studies fail to reproduce previous findings, this is not inherently negative — identifying inconsistencies is, in fact, part of the self-correcting nature of science.

However, high rates of replication failure may signal systemic issues in research practices, such as weak experimental designs, underpowered studies, publication bias (a preference for publishing “positive” results), and excessive analytical flexibility (p-hacking, HARKing, and related issues).

In Linguistics — a field historically less accustomed to quantitative experimentation — the discussion around reproducibility has also gained momentum. Recent studies have shown that the adoption of transparency practices in linguistics remains slow. For example, in a sample of 600 papers from the field, fewer than 10% shared their materials, data, or protocols openly; none reported pre-registration; and only 1% mentioned having conducted a replication study (Grieve, 2021). These figures suggest ample room for improvement.

As in other disciplines, linguistics faces the challenge of ensuring that its results are trustworthy and cumulative. Various scholars and scientific organizations have proposed solutions to improve reproducibility in linguistic research, involving both methodological enhancements and cultural shifts.

Among the most commonly cited recommendations are: using more rigorous and appropriate statistical analyses; increasing sample sizes to boost statistical power; and adopting a more open approach throughout all stages of the research process — from hypothesis registration and experimental design to public sharing of data, code, and materials, as well as transparent reporting of results (regardless of whether they are positive or null).

In the next sections, we offer practical guidance for authors interested in conducting reproducibility studies in linguistics. The recommendations are organized in stages — from initial planning (including pre-registration) through execution and dissemination. Following them can help align your work with current best practices, improving both its internal quality and its potential contribution to the field.

Practical Recommendations for Authors

1. Study Planning and Initial Pre-registration

Selecting a target study: The first step is to select a published result or study that is relevant and worthy of verification. This may involve an influential finding whose robustness you wish to test, a result with important theoretical implications, or even a contradictory outcome in the existing literature.

Keep in mind that replicating a study is not about hunting for errors, but about clarifying the validity and generalizability of a phenomenon — it is part of the cumulative progress of science. Approach replication with a collaborative spirit, not as a “criminal investigation” of the original research.

Review and design: Carefully examine the original paper, its methods, data, and analyses. Aim to understand all the details necessary to repeat the procedure. Then, develop a complete replication plan, defining a priori the research questions or hypotheses to be tested, the experimental design, sample size and data collection criteria (including a power analysis to ensure statistical adequacy), the dependent and independent variables, and the statistical analysis plan.

The more faithfully the plan reproduces the original study (in the case of a direct replication), the more meaningfully the results can be compared. It is also advisable to anticipate, from the outset, any necessary adaptations — such as linguistic adjustments or changes based on the characteristics of the new sample — and to document all such decisions clearly.

Pre-registration and Registered Reports: Once the study protocol is finalized, register it publicly before any data collection begins. Pre-registration involves submitting your experimental plan to an open-access platform (such as the Open Science Framework, AsPredicted, or similar), making the planned hypotheses and methods publicly available. This practice helps prevent post hoc alterations of the research goals (HARKing) and strengthens the credibility of confirmatory results.

In addition, consider submitting the protocol as a Registered Report to a scientific journal. In this editorial format, the manuscript is peer-reviewed before data collection begins. If approved, the study receives in-principle acceptance for publication, regardless of the results, provided the research is conducted according to the registered plan.

The journal Cadernos de Linguística accepts submissions in this format. Authors are encouraged to first submit a registered report containing the introduction, methodology, and proposed analysis plan. Only after this initial stage is approved by the editorial board should data collection proceed. This ensures that the study is evaluated based on methodological rigor, not on the outcome of the results.

Important: When pre-registering your study, be as detailed as possible. Include inclusion/exclusion criteria, how outliers will be handled, which statistical comparisons are planned, and all variables to be collected — even those not directly tied to the main hypothesis. This level of transparency prevents suspicions of ad hoc variable or condition selection.

A detailed pre-registration demonstrates a serious commitment to the study plan and enhances the reliability of confirmatory analyses.

Methods: The methods section should provide a thorough description of all planned procedures, with enough detail to allow other researchers to reproduce the study exactly. This includes:

Study procedures: Describe each step of the experiment in detail. Any future deviations from the original plan must be documented and justified in the final report.
Sample characteristics: Provide a full description of the intended participants, objective inclusion and exclusion criteria, and how technical failures or data errors will be handled. If data substitution is allowed, specify the criteria for doing so. Attention to sample diversity is critical to avoiding WEIRD (Western, Educated, Industrialized, Rich, Democratic) bias. Experimental linguistics, for instance, is heavily skewed toward Indo-European languages, particularly English. In studies aiming for broader generalizability (e.g., Type E Reproducibility), systematic sample heterogenization — varying age, gender, or context of data collection — can be an effective strategy to embrace natural variability and strengthen the robustness of findings.
Analysis plan: Clearly specify all planned analyses, including preprocessing procedures and corrections for multiple comparisons. List any covariates or regressors, and define any analytical contingencies — that is, decisions that depend on the outcome of earlier results. Only pre-registered analyses should appear in the main results section; exploratory analyses may be reported separately. Whenever possible, involve a statistician or ensure that team members have solid training in data analysis.
Timeline: Provide an estimated timeline for completing the study and the expected date for submitting the final manuscript.

2. Reproducibility Research Report (Execution and Reporting of Results)

With the study protocol pre-registered and approved, proceed to the replication phase by following the outlined protocol as strictly as possible. The fundamental principle here is methodological fidelity: every decision must adhere to the previously defined design. Whenever feasible, conduct a direct replication, reproducing the original study conditions — including the same stimuli, procedures, experimental environment, exposure times, and participant instructions.

It is acknowledged that minor deviations may be unavoidable — particularly in linguistic research, where sociocultural contexts and sample profiles naturally vary. Still, the goal is to minimize unplanned variations. Any deviation from the pre-registered plan must be carefully documented. This includes technical issues, adjustments to participant characteristics, or changes to experimental materials. These notes should be transparently incorporated into the final Reproducibility Report.

In addition to strictly following the original plan, aim to implement best methodological practices that improve the overall quality of the experiment — even if such practices were absent in the original study. Key practices include:

Randomization of participant assignment to conditions or of stimulus presentation order, in order to control for order effects and experimenter bias;
Blinding of participants — and ideally also of experimenters or evaluators — to the condition being tested, reducing the influence of expectations or unintentional cues;
Full procedural standardization to ensure that every step is executed uniformly across all participants and researchers.

These procedures reduce the risk of peripheral factors distorting the results. If the original study did not include them, their use in replication can also serve to test the robustness of the phenomenon under more controlled conditions — aligning with what is known as Type E Reproducibility (generalizability).

Another essential consideration is statistical power. One of the most frequently cited causes of the reproducibility crisis is the widespread use of underpowered studies, which lead to unstable results and an increased risk of false negatives. Therefore, the replication study should be planned with a sample size equal to or greater than that of the original study, based on a power analysis conducted a priori. Strong replications typically involve larger samples, which improve estimation accuracy and inferential reliability.

For example, if the original study reported a significant effect with N = 30 per group, planning the replication with N = 60 or N = 100 per group may double or triple the statistical power, making it more likely to detect (or confidently reject) the same effect. In reproducibility studies, it is better to err on the side of too much data than to repeat the mistake of an underpowered design.

Note: In some cases, practical limitations such as logistical constraints or small populations may make large samples unfeasible. In such situations, these limitations should be addressed in the pre-registration, including a clear statement of the intended power and the interpretive implications of working with a smaller sample. Results from small samples should be interpreted with caution and limited generalizability.

The experimental phase should be conducted with attention to detail and professional rigor. This includes thorough checking of all equipment (questionnaires, software, hardware, scripts), ensuring that all team members are properly trained, and confirming that all participants are exposed to the conditions exactly as planned.

It is essential to maintain a clear distinction between confirmatory and exploratory research. No new data collection, exclusions, or unplanned analyses should be introduced without proper justification and formal documentation. If such changes become necessary, they must be labeled as deviations and described in the final Reproducibility Report. Any ad hoc improvisation undermines the confirmatory nature of the study.

If unforeseen events occur during data collection — such as a failed stimulus, a participant withdrawing, or an external interruption — handle them according to the criteria pre-defined in the pre-registered protocol (e.g., data exclusion, participant replacement, rescheduling of sessions). Never make ad hoc decisions. Every unexpected occurrence must be documented, including the date, description, and justification.

Maintain a detailed lab log throughout the process, recording data collection dates, experimental conditions, technical issues, challenges encountered, and any relevant participant feedback. While not all of this information will be included in the final article, such records are crucial for ensuring traceability, identifying sources of variability, and strengthening the study’s overall reliability.

3. Transparency, Documentation, and Data Sharing

The post-data collection stage is just as critical as the planning and execution phases for ensuring reproducibility. At this point, it is essential to adopt practices of transparency, rigorous documentation, and open data sharing so that other researchers can verify, reproduce, and benefit from your replication study.

Data organization and analysis: Once data collection is complete, carry out all analyses exactly as outlined in the pre-registered plan. Do not adjust models, exclude data without justification, or alter hypotheses or criteria based on the results obtained — such practices would compromise the confirmatory nature of the study. If additional analyses not included in the protocol become necessary, they must be clearly labeled as exploratory and presented in a separate section of the manuscript.

All analyses pre-registered in the protocol should appear in the manuscript, unless a specific analysis can be logically demonstrated to be invalid or inappropriate — in which case, the decision must be clearly justified. Always report exact p-values, effect sizes, and confidence intervals for all inferential analyses. This avoids overreliance on p-values and contributes to a more meaningful interpretation of the findings.

Document the entire analytical process using complete and well-commented scripts (in R, Python, SPSS, etc.), detailing each step taken. A highly recommended practice is to conduct an internal reproduction: ask a colleague who was not involved in the study to run your scripts using the raw data and check whether the same results are produced (compute and compare). This helps identify inconsistencies and increases the reliability of the study.

Tools like R Markdown, the knitr package (for R), and Jupyter Notebook (for Python) are strongly recommended, as they integrate code, documentation, and results into a single reproducible document. These formats enhance traceability and transparency, making peer review and third-party replication easier and more robust.

Sharing data, code, and materials: Publicly sharing your datasets, materials, and scripts is one of the most effective ways to promote reproducibility. Unless there are ethical or legal constraints, raw anonymized data should be deposited in open-access repositories such as OSF, Zenodo, Figshare, or domain-specific repositories such as TROLLing, IRIS, or CLARIN ERIC. The same applies to analysis scripts and experimental materials (stimulus lists, instructions, audio/video files, questionnaires, etc.).

When sharing, include a README file that describes the contents of each folder or file and provides instructions for reproducing the analyses. Many repositories assign a DOI (Digital Object Identifier), allowing these materials to be formally cited and reliably retrieved. Adhering to the FAIR principles — Findable, Accessible, Interoperable, Reusable — is highly recommended and is increasingly recognized by funding agencies, journals, and data platforms as a benchmark for quality and reproducibility.

Limitations on data sharing: If data cannot be fully shared — for example, speech recordings that could identify participants or data protected under institutional agreements — disclose this transparently. Alternatives may include sharing metadata, creating synthetic datasets that preserve the statistical structure of the original data, or providing only aggregate statistics. When possible, indicate that data may be made available upon request under an appropriate data use agreement. What matters is demonstrating a commitment to openness within ethical and legal constraints.

Experiment documentation: Transparency also extends to detailed documentation of the experimental procedures. Consider providing supplementary materials such as the consent form, participant instructions, specifications of the experimental setting, timelines, data collection protocols, and any other relevant information that would allow another researcher to replicate the study precisely. Document all deviations from the original plan, even seemingly minor ones — for example: “We originally planned to exclude reaction times under 300 ms, but after data collection we revised the threshold to 200 ms due to [reason].”

The journal Cadernos de Linguística actively encourages open science practices and awards specific badges to articles that share data, materials, and protocols. These badges — such as Open Data, Open Materials, and Preregistration — signal to readers that the article adheres to current standards of transparency and reproducibility. If your study meets the requirements, be sure to request these badges when submitting your manuscript.

Transparency applies equally (and especially) to null or negative results. Publishing replications that fail to reproduce original effects is essential to combat publication bias and to strengthen cumulative knowledge. By reporting such outcomes with methodological clarity, you contribute to a more honest, robust, and scientifically useful literature. A well-executed failed replication can be just as informative as a successful one.

4. Critical Analysis and Dissemination of Results

Interpreting the findings: After completing the replication, the critical analysis of results should be guided by the objectives established in the preregistered protocol. Avoid rushed judgments or interpretations based on expectations of confirmation. The focus should be on methodological consistency and what the data actually show in comparison to the original study. Broadly speaking, three scenarios may emerge:

Successful replication: The results consistently confirm the original finding. This reinforces the robustness of the investigated phenomenon. Even so, examine nuances: Was there any difference in effect size? Was the pattern consistent across all subgroups? A replication with greater methodological rigor (e.g., larger sample size, blinding, randomization) may even increase confidence in the validity of the effect. Example: “We replicated effect X using a larger sample and a double-blind protocol, thus strengthening the original evidence.”
Replication failed to find the effect: The data did not confirm the original result — either due to lack of significance, reversal of the effect, or some other divergent pattern. These findings are equally valuable. Avoid sensational interpretations (“the original study was wrong”) and focus on technical analysis: Could the original effect have been a false positive? Are there differences in design, sample, or sociocultural context that might explain the divergence? Did your replication correct limitations of the original study? A balanced approach enhances the value of a non-replication.
Partially successful replication: Some aspects were replicated, others were not. The effect may have appeared only in a subgroup or with smaller magnitude. Describe the differences in detail and explore plausible contextual hypotheses — especially relevant in linguistic studies, where cultural, dialectal, or pragmatic variations can impact replicability. Explicitly recognizing such variation helps refine theoretical models and improves our understanding of the sensitivity of effects across different contexts.

Writing the manuscript: When drafting the article, maintain the logical and transparent structure of scientific reporting. The introduction should essentially follow the preregistered version, with minor stylistic adjustments and verb tense shifted to the past. Hypotheses must not be altered or expanded. Studies published after the preregistration may be discussed in the Discussion section, provided they are clearly identified as subsequent developments.

The Results section must report all confirmatory analyses as planned, including exact p-values, effect sizes, and confidence intervals. If a preregistered analysis turned out to be logically flawed or inapplicable, explain this clearly. If additional exploratory analyses are conducted, include them in a separate section, with methodological justification and explicit distinction from the planned analyses. The article’s conclusions must be based solely on the confirmatory results.

Do not omit variables or unexpected results. Selectively excluding data that “didn’t support expectations” distorts scientific interpretation. On the contrary, reporting null or divergent results enhances the credibility of the study. Indicate, case by case, whether each preregistered hypothesis was confirmed. Use descriptive and objective language — avoid evaluative terms such as “the replication failed” or “was a success.” Instead, use formulations like: “We did not observe effect X (p=0.45), while the original study reported p<0.01.” This keeps the focus on the data and avoids unnecessary judgment or blame.

When possible, consider contacting the authors of the original study. Some journals — such as Cadernos de Linguística — offer space for original authors to comment on replications of their work, fostering constructive dialogue and scientific transparency.

Submission and publication: Once the manuscript is complete, submit it to the journal of your choice, ensuring that all open materials (data, scripts, protocols) are accessible via trusted repositories. In the case of Cadernos de Linguística, the peer-review process is open and transparent: reviews are signed, and if the article is accepted, both reviews and author responses are published as supplementary material. This promotes editorial accountability, traceability, and encourages constructive peer critique.

By rigorously following all methodological steps — including preregistration, faithful execution of the plan, transparent documentation, and open sharing — your replication is likely to be positively evaluated, even if the results do not confirm the original study. Cadernos de Linguística and other journals committed to open science do not reject manuscripts based on the “novelty” of findings, but rather on methodological quality and contribution to the cumulative body of knowledge.

Dissemination: After publication, promote the study through academic networks, seminars, and conferences. Share your experience with colleagues and students, and encourage the development of new replications. The practice of reproducibility not only strengthens the accumulation of knowledge, but also promotes a culture of openness and methodological rigor in linguistics — contributing to a more transparent, trustworthy, and self-correcting science.

Recommended Readings

ALGER, Bradley E. Defense of the Scientific Hypothesis: From Reproducibility Crisis to Big Data. New York: Oxford University Press, 2020. A book defending the role of clear and testable scientific hypotheses amid the reproducibility crisis. It explores how inadequate practices threaten confidence in science, and how hypothesis-driven reasoning, combined with tools like big data and robust statistics, can help restore the credibility of discoveries.
ATMANSPACHER, Harald; MAASEN, Sabine (Eds.). Reproducibility: Principles, Problems, Practices, and Prospects. Chichester: John Wiley & Sons, 2016. A collection of essays by various authors discussing philosophical and methodological principles of reproducibility, challenges faced in different disciplines, recommended practices, and future prospects for open and reproducible science.
BAUSELL, R. Barker. The Problem with Science: The Reproducibility Crisis and What to Do About It. New York: Oxford University Press, 2021. A book that dissects the so-called reproducibility crisis, exploring contributing factors such as publication bias, questionable research practices, and lack of rigor. It proposes changes — including preregistration and data sharing — to enhance the reliability of scientific findings.
BOCHYNSKA, A.; KEEBLE, L.; HALFACRE, C.; CASILLAS, J. V.; CHAMPAGNE, I.; CHEN, K.; RÖTHLISBERGER, M.; BUCHANAN, E. M.; ROETTGER, T. B. Reproducible research practices and transparency across linguistics. Glossa Psycholinguistics, vol. 2, no. 1, 2023. This article assesses the prevalence of transparency practices in 600 linguistics papers over time — open access, data/materials sharing, preregistration, etc. — revealing low adoption rates up to 2018/19. It offers a set of recommendations for researchers, journals, and institutions to promote reproducibility in linguistics.
BRANDT, M. J.; IJZERMAN, H.; DIJKSTERHUIS, A.; FARACH, F. J.; GELLER, J.; GINER-SOROLLA, R.; GRANGE, J. A.; PERUGINI, M.; SPIES, J. R.; VEER, A. V. The replication recipe: what makes for a convincing replication? Journal of Experimental Social Psychology, vol. 50, 2014. A seminal article offering a practical “recipe” for conducting rigorous replications in social psychology. It defines standardized criteria for convincing replications — such as faithful reproduction of the original method, high statistical power, collaboration with original authors, and preregistration — and discusses how to assess replication outcomes. Useful as a step-by-step guide across disciplines.
DUDDA, L.; KORMANN, E.; KOZULA, M.; DEVITO, N. J.; KLEBEL, T.; DEWI, A. P. M.; SPIJKER, R.; STEGEMAN, I.; EYNDEN, V. V. D.; ROSS-HELLAUER, T.; LEEFLANG, M. M. G. Open science interventions to improve reproducibility and replicability of research: a scoping review. Royal Society Open Science, vol. 12, 2025. A scoping review of open science interventions aimed at improving reproducibility and replicability. It analyzes 105 studies that empirically evaluated practices such as preregistration, data sharing, null result publication, open peer review, and others — discussing which have evidence of effectiveness and where knowledge gaps remain.
GRIEVE, Jack. Observation, experimentation, and replication in linguistics. Linguistics, vol. 59, no. 5, 2021. This article discusses the challenges of replication in linguistics, arguing that replication failures may result not just from poor practices but from the inherently social nature of language — since social context varies between studies. It advocates caution in the use of experimental methods and highlights the value of observational approaches.
PORTE, Graeme; McMANUS, Kevin. Doing Replication Research in Applied Linguistics. New York: Routledge, 2019. A book fully dedicated to conducting replication studies in applied linguistics. It covers identifying suitable target studies, detailed planning (direct, conceptual, and partial replications), execution, analysis, and writing, along with publication advice. A highly accessible and essential read for those wishing to engage in replication research in the language sciences.
SIMKUS, Andrea; COOLEN-MATURI, Tahani; COOLEN, Frank P. A.; BENDTSEN, Claus. Statistical Perspectives on Reproducibility: Definitions and Challenges. Journal of Statistical Theory and Practice, vol. 19, 2025. This article reviews the literature on reproducibility, defining five types of reproducibility and discussing the causes of low reproducibility across fields, as well as statistical approaches to quantify and improve it.

Reproducibility

Cadernos de Linguística supports the Opens Science movement