how to find statistical treatment in a research paper

Statistical Treatment of Data – Explained & Example

  • By DiscoverPhDs
  • September 8, 2020

Statistical Treatment of Data in Research

‘Statistical treatment’ is when you apply a statistical method to a data set to draw meaning from it. Statistical treatment can be either descriptive statistics, which summarises the main features of a data set, or inferential statistics, which tests a hypothesis by making inferences about a population from the collected data.

Introduction to Statistical Treatment in Research

Every research student, regardless of whether they are a biologist, computer scientist or psychologist, must have a basic understanding of statistical treatment if their study is to be reliable.

This is because designing experiments and collecting data are only a small part of conducting research. The other components, which are often not so well understood by new researchers, are the analysis, interpretation and presentation of the data. These are just as important, if not more so, because this is where meaning is extracted from the study.

What is Statistical Treatment of Data?

Statistical treatment of data is when you apply some form of statistical method to a data set to transform it from a group of meaningless numbers into meaningful output.

Statistical treatment of data involves the use of statistical methods such as:

  • regression,
  • conditional probability,
  • standard deviation and
  • distribution range.

These statistical methods allow us to investigate the statistical relationships between the data and identify possible errors in the study.

In addition to being able to identify trends, statistical treatment also allows us to organise and process our data in the first place. This is because when carrying out statistical analysis of our data, it is generally more useful to draw several conclusions for each subgroup within our population than to draw a single, more general conclusion for the whole population. However, to do this, we need to be able to classify the population into different subgroups so that we can later break down our data in the same way before analysing it.

Statistical Treatment Example – Quantitative Research

For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the data into different subgroups based on these parameters to determine how each one affects the effectiveness of the drug. Categorising the data in this way is an example of performing basic statistical treatment.
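The grouping step described above can be sketched in a few lines of Python. This is only an illustration: the age bands, the outcome values and the helper name `mean_by_subgroup` are made up for the example and do not come from any real study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (age band, change in symptom score after treatment)
records = [
    ("18-39", -4.0), ("18-39", -6.0),
    ("40-64", -3.0), ("40-64", -1.0),
    ("65+", -0.5), ("65+", -1.5),
]

def mean_by_subgroup(rows):
    """Group outcome values by subgroup label and return the mean of each group."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}

subgroup_means = mean_by_subgroup(records)
print(subgroup_means)
```

Classifying first and summarising second keeps the subgroup definitions fixed before any analysis is run.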

Types of Errors

A fundamental part of statistical treatment is using statistical methods to identify possible outliers and errors. No matter how careful we are, all experiments are subject to inaccuracies resulting from two types of errors: systematic errors and random errors.

Systematic errors are errors associated with either the equipment used to collect the data or the way in which that equipment is used. Random errors are errors that arise unpredictably in the experimental configuration, such as internal deformations within specimens or small voltage fluctuations in measurement instruments.

These experimental errors, in turn, can lead to two types of conclusion errors: type I errors and type II errors. A type I error is a false positive which occurs when a researcher rejects a true null hypothesis. On the other hand, a type II error is a false negative which occurs when a researcher fails to reject a false null hypothesis.
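A small simulation can make the type I error concrete. The sketch below repeatedly draws samples for which the null hypothesis is true and counts how often a two-sided z-test rejects it anyway. The test choice, the sample size, the significance level of 0.05 and the function name are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=1):
    """Estimate how often a two-sided z-test rejects a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null is true here: data come from N(0, 1) and we test H0: mean = 0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / n ** 0.5)  # standard deviation assumed known (= 1)
        if abs(z) > z_crit:
            rejections += 1  # a false positive, i.e. a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated type I error rate: {rate:.3f}")
```

The estimated rate should sit close to the chosen significance level, which is exactly what that level is meant to control.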

Research Paper Statistical Treatment of Data: A Primer

We can all agree that analyzing and presenting data effectively in a research paper is critical, yet often challenging.

This primer on statistical treatment of data will equip you with the key concepts and procedures to accurately analyze and clearly convey research findings.

You'll discover the fundamentals of statistical analysis and data management, the common quantitative and qualitative techniques, how to visually represent data, and best practices for writing the results - all framed specifically for research papers.

If you are curious about how AI can help you with statistical analysis for research, check Hepta AI.

Introduction to Statistical Treatment in Research

Statistical analysis is a crucial component of both quantitative and qualitative research. Properly treating data enables researchers to draw valid conclusions from their studies. This primer provides an introductory guide to fundamental statistical concepts and methods for manuscripts.

Understanding the Importance of Statistical Treatment

Careful statistical treatment demonstrates the reliability of results and ensures findings are grounded in robust quantitative evidence. From determining appropriate sample sizes to selecting accurate analytical tests, statistical rigor adds credibility. Both quantitative and qualitative papers benefit from precise data handling.

Objectives of the Primer

This primer aims to equip researchers with best practices for:

Statistical tools to apply during different research phases

Techniques to manage, analyze, and present data

Methods to demonstrate the validity and reliability of measurements

By covering fundamental concepts ranging from descriptive statistics to measurement validity, it enables both novice and experienced researchers to incorporate proper statistical treatment.

Navigating the Primer: Key Topics and Audience

The primer spans introductory topics including:

Research planning and design

Data collection, management, analysis

Result presentation and interpretation

While useful for researchers at any career stage, earlier-career scientists with limited statistical exposure will find it particularly valuable as they prepare manuscripts.

How do you write a statistical method in a research paper?

Statistical methods are a critical component of research papers, allowing you to analyze, interpret, and draw conclusions from your study data. When writing the statistical methods section, you need to provide enough detail so readers can evaluate the appropriateness of the methods you used.

Here are some key things to include when describing statistical methods in a research paper:

Type of Statistical Tests Used

Specify the types of statistical tests performed on the data, including:

Parametric vs nonparametric tests

Descriptive statistics (means, standard deviations)

Inferential statistics (t-tests, ANOVA, regression, etc.)

Statistical significance level (often p < 0.05)

For example: We used t-tests and one-way ANOVA to compare means across groups, with statistical significance set at p < 0.05.

Analysis of Subgroups

If you examined subgroups or additional variables, describe the methods used for these analyses.

For example: We stratified data by gender and used chi-square tests to analyze differences between subgroups.

Software and Versions

List any statistical software packages used for analysis, including version numbers. Common programs include SPSS, SAS, R, and Stata.

For example: Data were analyzed using SPSS version 25 (IBM Corp, Armonk, NY).

The key is to give readers enough detail to assess the rigor and appropriateness of your statistical methods. The methods should align with your research aims and design. Keep explanations clear and concise using consistent terminology throughout the paper.

What are the 5 statistical treatments in research?

The five most common statistical treatments used in academic research papers include:

Mean

The mean, or average, is used to describe the central tendency of a dataset. It provides a single value that represents the center of a distribution of numbers. Calculating means allows researchers to characterize typical observations within a sample.

Standard Deviation

Standard deviation measures the amount of variability in a dataset. A low standard deviation indicates observations are clustered closely around the mean, while a high standard deviation signifies the data is more spread out. Reporting standard deviations helps readers contextualize means.
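As a quick illustration of why the standard deviation should be reported alongside the mean, the sketch below uses two made-up score lists that share the same mean but differ greatly in spread. The data are hypothetical.

```python
from statistics import mean, stdev

scores_a = [70, 71, 69, 70, 70]  # observations clustered tightly around the mean
scores_b = [50, 90, 60, 80, 70]  # observations spread widely around the same mean

for name, scores in (("A", scores_a), ("B", scores_b)):
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    print(name, mean(scores), round(stdev(scores), 2))
```

Reporting only the means would hide the fact that the two samples look very different.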

Regression Analysis

Regression analysis models the relationship between independent and dependent variables. It generates an equation that predicts changes in the dependent variable based on changes in the independents. Regressions are useful for hypothesizing causal connections between variables.

Hypothesis Testing

Hypothesis testing evaluates assumptions about population parameters based on statistics calculated from a sample. Common hypothesis tests include t-tests, ANOVA, and chi-squared. These quantify how likely it would be to observe differences at least as large as those in the sample if the null hypothesis were true.
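To show the logic of a hypothesis test without relying on any statistics package, the sketch below uses a permutation test for a difference in means. Note that this is a different technique from the t-test, ANOVA and chi-squared tests named above, chosen here because it needs only the standard library. The scores and the function name are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

treated = [78, 81, 75, 80, 79, 77, 82, 76]
control = [71, 70, 74, 69, 72, 73, 68, 75]
p_value = permutation_p_value(treated, control)
print(f"p = {p_value:.4f}")
```

The p-value is the share of label shufflings that produce a difference at least as large as the observed one, which is the quantity a hypothesis test reports.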

Sample Size Determination

Sample size calculations identify the minimum number of observations needed to detect effects of a given size at a desired statistical power. Appropriate sampling ensures studies can uncover true relationships within the constraints of resource limitations.
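One common textbook approximation for comparing two group means gives the sample size per group as 2(z₁₋α/₂ + z₁₋β)² / d², where d is the standardized effect size. The sketch below implements that normal approximation. The function name and defaults are assumptions for illustration, and dedicated power-analysis tools that use the t distribution may give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means.

    effect_size is the standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Smaller effects need far larger samples, which is why the expected effect size should be justified before data collection.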

These five statistical analysis methods form the backbone of most quantitative research processes. Correct application allows researchers to characterize data trends, model predictive relationships, and make probabilistic inferences regarding broader populations. Expertise in these techniques is fundamental for producing valid, reliable, and publishable academic studies.

How do you know what statistical treatment to use in research?

The selection of appropriate statistical methods for the treatment of data in a research paper depends on three key factors:

The Aim and Objective of the Study

The aim and objectives that the study seeks to achieve will determine the type of statistical analysis required.

Descriptive research presenting characteristics of the data may only require descriptive statistics like measures of central tendency (mean, median, mode) and dispersion (range, standard deviation).

Studies aiming to establish relationships or differences between variables need inferential statistics like correlation, t-tests, ANOVA, regression etc.

Predictive modeling research requires methods like regression, discriminant analysis, logistic regression etc.

Thus, clearly identifying the research purpose and objectives is the first step in planning appropriate statistical treatment.

Type and Distribution of Data

The type of data (categorical, numerical) and its distribution (normal, skewed) also guide the choice of statistical techniques.

Parametric tests have assumptions related to normality and homogeneity of variance.

Non-parametric methods are distribution-free and better suited for non-normal or categorical data.

Testing data distribution and characteristics is therefore vital.

Nature of Observations

Statistical methods also differ based on whether the observations are paired or unpaired.

Analyzing changes within one group requires paired tests like paired t-test, Wilcoxon signed-rank test etc.

Comparing between two or more independent groups needs unpaired tests like independent t-test, ANOVA, Kruskal-Wallis test etc.

Thus the nature of observations is pivotal in selecting suitable statistical analyses.

In summary, clearly defining the research objectives, testing the collected data, and understanding the observational units guides proper statistical treatment and interpretation.
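The difference between paired and unpaired analyses can be seen by computing both test statistics on the same made-up before and after measurements. The Welch form of the unpaired statistic and the helper names are choices made for this sketch. Treating paired data as independent is done here only to show the contrast.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic for paired observations (degrees of freedom = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def welch_t(group_a, group_b):
    """t statistic for two independent groups, allowing unequal variances."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical measurements on the same five participants
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [14.0, 16.0, 13.0, 17.0, 14.0]

print("paired:  ", round(paired_t(before, after), 2))
print("unpaired:", round(welch_t(after, before), 2))
```

The paired statistic is larger because working with within-participant differences removes the variability between participants.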

What are statistical techniques in a research paper?

Statistical methods are essential tools in scientific research papers. They allow researchers to summarize, analyze, interpret and present data in meaningful ways.

Some key statistical techniques used in research papers include:

Descriptive statistics: These provide simple summaries of the sample and the measures. Common examples include measures of central tendency (mean, median, mode), measures of variability (range, standard deviation) and graphs (histograms, pie charts).

Inferential statistics: These help make inferences and predictions about a population from a sample. Common techniques include estimation of parameters, hypothesis testing, correlation and regression analysis.

Analysis of variance (ANOVA): This technique allows researchers to compare means across multiple groups and determine statistical significance.

Factor analysis: This technique identifies underlying relationships between variables and latent constructs. It allows reducing a large set of variables into fewer factors.

Structural equation modeling: This technique estimates causal relationships using both latent and observed factors. It is widely used for testing theoretical models in social sciences.

Proper statistical treatment and presentation of data are crucial for the integrity of any quantitative research paper. Statistical techniques help establish validity, account for errors, test hypotheses, build models and derive meaningful insights from the research.

Fundamental Concepts and Data Management

Exploring Basic Statistical Terms

Understanding key statistical concepts is essential for effective research design and data analysis. This includes defining key terms like:

Statistics: The science of collecting, organizing, analyzing, and interpreting numerical data to draw conclusions or make predictions.

Variables: Characteristics or attributes of the study participants that can take on different values.

Measurement: The process of assigning numbers to variables based on a set of rules.

Sampling: Selecting a subset of a larger population to estimate characteristics of the whole population.

Data types: Quantitative (numerical) or qualitative (categorical) data.

Descriptive vs. inferential statistics: Descriptive statistics summarize data, while inferential statistics allow conclusions to be drawn from the sample about the larger population.

Ensuring Validity and Reliability in Measurement

When selecting measurement instruments, it is critical that they demonstrate:

Validity: The extent to which the instrument measures what it intends to measure.

Reliability: The consistency of measurement over time and across raters.

Researchers should choose instruments aligned to their research questions and study methodology.

Data Management Essentials

Proper data management requires:

Ethical collection procedures respecting autonomy, justice, beneficence and non-maleficence.

Handling missing data through deletion, imputation or modeling procedures.

Data cleaning by identifying and fixing errors, inconsistencies and duplicates.

Data screening via visual inspection and statistical methods to detect anomalies.
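Two of the simplest ways of handling missing data mentioned above, deletion and imputation, can be sketched as follows. The readings are made up and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical measurements; None marks a missing observation
raw = [4.2, None, 5.1, 3.9, None, 4.8]

# Deletion: keep only the observations that were actually recorded.
complete_cases = [x for x in raw if x is not None]

# Mean imputation: replace each missing value with the mean of the observed ones.
observed_mean = mean(complete_cases)
imputed = [observed_mean if x is None else x for x in raw]

print(len(complete_cases), round(observed_mean, 2), imputed)
```

Mean imputation keeps the sample size but understates the variability of the data, so whichever approach is used should be stated in the methods section.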

Data Management Techniques and Ethical Considerations

Ethical data management includes:

Obtaining informed consent from all participants.

Anonymization and encryption to protect privacy.

Secure data storage and transfer procedures.

Responsible use of statistical tools free from manipulation or misrepresentation.

Adhering to ethical guidelines preserves public trust in the integrity of research.

Statistical Methods and Procedures

This section provides an introduction to key quantitative analysis techniques and guidance on when to apply them to different types of research questions and data.

Descriptive Statistics and Data Summarization

Descriptive statistics summarize and organize data characteristics such as central tendency, variability, and distributions. Common descriptive statistical methods include:

Measures of central tendency (mean, median, mode)

Measures of variability (range, interquartile range, standard deviation)

Graphical representations (histograms, box plots, scatter plots)

Frequency distributions and percentages

These methods help describe and summarize the sample data so researchers can spot patterns and trends.

Inferential Statistics for Generalizing Findings

While descriptive statistics summarize sample data, inferential statistics help generalize findings to the larger population. Common techniques include:

Hypothesis testing with t-tests, ANOVA

Correlation and regression analysis

Nonparametric tests

These methods allow researchers to draw conclusions and make predictions about the broader population based on the sample data.

Selecting the Right Statistical Tools

Choosing the appropriate analyses involves assessing:

The research design and questions asked

Type of data (categorical, continuous)

Data distributions

Statistical assumptions required

Matching the correct statistical tests to these elements helps ensure accurate results.

Statistical Treatment of Data for Quantitative Research

For quantitative research, common statistical data treatments include:

Testing data reliability and validity

Checking assumptions of statistical tests

Transforming non-normal data

Identifying and handling outliers

Applying appropriate analyses for the research questions and data type

Examples and case studies help demonstrate correct application of statistical tests.
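As one example of identifying outliers, the sketch below applies the widely used 1.5 × IQR rule. The rule, the multiplier and the data are assumptions for illustration. Other screening methods exist, and a flagged value should be investigated rather than deleted automatically.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious reading
data = [10, 12, 11, 13, 12, 11, 10, 45]
print(iqr_outliers(data))
```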

Approaches to Qualitative Data Analysis

Qualitative data is analyzed through methods like:

Thematic analysis

Content analysis

Discourse analysis

Grounded theory

These help researchers discover concepts and patterns within non-numerical data to derive rich insights.

Data Presentation and Research Method

Crafting Effective Visuals for Data Presentation

When presenting analyzed results and statistics in a research paper, well-designed tables, graphs, and charts are key for clearly showcasing patterns in the data to readers. Adhering to formatting standards like APA helps ensure professional data presentation. Consider these best practices:

Choose the appropriate visual type based on the type of data and relationship being depicted. For example, bar charts for comparing categorical data, line graphs to show trends over time.

Label the x-axis, y-axis, legends clearly. Include informative captions.

Use consistent, readable fonts and sizing. Avoid clutter with unnecessary elements. White space can aid readability.

Order data logically, such as from largest to smallest values or chronologically.

Include clear statistical notations, like error bars, where applicable.

Following academic standards for visuals lends credibility while making interpretation intuitive for readers.

Writing the Results Section with Clarity

When writing the quantitative Results section, aim for clarity by balancing statistical reporting with interpretation of findings. Consider this structure:

Open with an overview of the analysis approach and measurements used.

Break down results by logical subsections for each hypothesis, construct measured etc.

Report exact statistics first, followed by interpretation of their meaning. For example, “Participants exposed to the intervention had significantly higher average scores (M=78, SD=3.2) compared to controls (M=71, SD=4.1), t(115)=3.42, p = 0.001. This suggests the intervention was highly effective for increasing scores.”

Use the present tense and formal, scientific language.

Include tables/figures where they aid understanding or visualization.

Writing results clearly gives readers deeper context around statistical findings.

Highlighting Research Method and Design

With a results section full of statistics, it's vital to communicate key aspects of the research method and design. Consider including:

A brief overview of the study variables, materials, and apparatus used, which helps reproducibility.

Descriptions of the sampling techniques and data collection procedures, which support transparency.

Explanations of the approaches to measurement and the data analysis performed, which bolster methodological rigor.

Notes on control variables and attempts to limit biases, which demonstrate awareness of limitations.

Covering these methodological details shows readers the care taken in designing the study and analyzing the results obtained.

Acknowledging Limitations and Addressing Biases

Honestly recognizing methodological weaknesses and limitations goes a long way in establishing credibility within the published discussion section. Consider transparently noting:

Measurement errors and biases that may have impacted findings.

Limitations around sampling methods that constrain generalizability.

Caveats related to statistical assumptions, analysis techniques applied.

Attempts made to control/account for biases and directions for future research.

Rather than detracting value, acknowledging limitations demonstrates academic integrity regarding the research performed. It also gives readers deeper insight into interpreting the reported results and findings.

Conclusion: Synthesizing Statistical Treatment Insights

Recap of Statistical Treatment Fundamentals

Statistical treatment of data is a crucial component of high-quality quantitative research. Proper application of statistical methods and analysis principles enables valid interpretations and inferences from study data. Key fundamentals covered include:

Descriptive statistics to summarize and describe the basic features of study data

Inferential statistics to make judgments of the probability and significance based on the data

Using appropriate statistical tools aligned to the research design and objectives

Following established practices for measurement techniques, data collection, and reporting

Adhering to these core tenets ensures research integrity and allows findings to withstand scientific scrutiny.

Key Takeaways for Research Paper Success

When incorporating statistical treatment into a research paper, keep these best practices in mind:

Clearly state the research hypothesis and variables under examination

Select reliable and valid quantitative measures for assessment

Determine appropriate sample size to achieve statistical power

Apply correct analytical methods suited to the data type and distribution

Comprehensively report methodology procedures and statistical outputs

Interpret results in context of the study limitations and scope

Following these guidelines will bolster confidence in the statistical treatment and strengthen the research quality overall.

Encouraging Continued Learning and Application

As statistical techniques continue advancing, it is imperative for researchers to actively further their statistical literacy. Regularly reviewing new methodological developments and learning advanced tools will augment analytical capabilities. Persistently putting enhanced statistical knowledge into practice through research projects and manuscript preparations will cement competencies. Statistical treatment mastery is a journey requiring persistent effort, but one that pays dividends in research proficiency.

Antonio Carlos Filho @acfilho_dev

Statistical Treatment

What is Statistical Treatment?

Statistical treatment can mean a few different things:

  • In Data Analysis: Applying any statistical method (like regression or calculating a mean) to data.
  • In Factor Analysis: Any combination of factor levels is called a treatment.
  • In a Thesis or Experiment: A summary of the procedure, including statistical methods used.

1. Statistical Treatment in Data Analysis

The term “statistical treatment” is a catch-all term which means to apply any statistical method to your data. Treatments are divided into two groups: descriptive statistics, which summarize your data as a graph or summary statistic, and inferential statistics, which make predictions and test hypotheses about your data. Treatments could include:

  • Finding standard deviations and sample standard errors,
  • Finding T-Scores or Z-Scores,
  • Calculating correlation coefficients.

2. Treatments in Factor Analysis

In factor analysis, any combination of factor levels is called a treatment.

3. Treatments in a Thesis or Experiment

Sometimes you might be asked to include a treatment as part of a thesis. This is asking you to summarize the data and analysis portion of your experiment, including measurements and formulas used. For example, the following experimental summary is from Statistical Treatment in Acta Physiologica Scandinavica:

“Each of the test solutions was injected twice in each subject…30-42 values were obtained for the intensity, and a like number for the duration, of the pain induced by the solution. The pain values reported in the following are arithmetical means for these 30-42 injections.”

The author goes on to provide formulas for the mean, the standard deviation and the standard error of the mean.
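The quoted chapter's formulas are not reproduced here. As a reminder, the standard textbook definitions of these three quantities for a sample of n values x₁, …, xₙ are given below. The notation is an assumption and may differ from the original chapter.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{SEM} = \frac{s}{\sqrt{n}}
```

The standard error of the mean shrinks as the number of observations grows, so it should always be reported together with n.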

Vogt, W.P. (2005). Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences. SAGE.

Wheelan, C. (2014). Naked Statistics. W. W. Norton & Company.

Unknown author (1961). Chapter 3: Statistical Treatment. Acta Physiologica Scandinavica, Volume 51, Issue s179, December, Pages 16–20.

HortScience

Best Practices for Presenting Statistical Information in a Research Article

A key characteristic of scientific research is that the entire experiment (or series of experiments), including the data analyses, is reproducible. This aspect of science is increasingly emphasized. The Materials and Methods section of a scientific paper typically contains the necessary information for the research to be replicated and expanded on by other scientists. Important components are descriptions of the study design, data collection, and statistical analysis of those data, including the software used. In the Results section, statistical analyses are presented; these are usually best absorbed from figures. Model parameter estimates (including variances) and effect sizes should also be included in this section, not just results of significance tests, because they are needed for subsequent power and meta-analyses. In this article, we give key components to include in the descriptions of study design and analysis, and discuss data interpretation and presentation with examples from the horticultural sciences.

This article provides recommendations for statistical reporting in a research journal article. Appropriate and informative reporting, and the wise use of statistical design and analysis throughout the research process, are both essential to good science; neither can happen without the other. In addition, many journals now require access to original data and the code used for analyses. This article is not a statistics tutorial; we do not explain how to do any of the statistical methods mentioned. There are many papers and books that provide that information; some are cited in our reference and selected reading section. Instead, we give guidelines for horticultural scientists on how best to incorporate and present statistical information in a scientific paper. We also focus on experimental rather than observational studies. To do the latter justice would require greatly expanding this article, and the majority of papers published by the American Society for Horticultural Science are experimental studies. A very useful complementary article is by Onofri et al. (2010), which gives specific advice for many issues we treat only generally.

This paper is divided into two sections, as follows:

Section 1. When Are Statistics Needed and What Is the Purpose of Statistics in a Research Paper?

Section 2. Recommendations for Writing about Statistics in a Research Paper

What Goes in the Materials and Methods Section?

What Goes in the Results Section?

Additional Details and Descriptions about Design, Data Collection, and Analysis

Literature Cited and Selected References

The scope of horticultural research is large and not all studies require statistics. For example, anatomical and morphological studies can be purely descriptive. With that said, these kinds of descriptive studies are a subset of observational studies, which also include studies at the genomic, ecologic, and landscape level. For observational studies, there are useful methods for determining associations, clusters, and dimension reduction, to name a few, that are statistics based. In this article we focus primarily on research questions that require inferential statistics. Typically, using designed experiments when addressing a research question requires experiment planning, data collection, and subsequent statistical analysis, and the following recommendations are applicable.

The statistical section in an article serves five general functions. First, the design, data collection, method of analysis, and software used must be described with sufficient clarity to demonstrate that the study is capable of addressing the primary objectives of the research. When adequate information is provided, it allows for an informed peer review and for readers, in principle, to reproduce the study, including the data analysis . Second, authors must provide sufficient documentation to create confidence that the data have been analyzed appropriately. This includes verifying required statistical assumptions and justifying choices—such as the chosen mean comparison procedure and any other method that might affect results and conclusions, such as controlling for experimental-wise error. Experiment-wise error rate (or family-wise error rate, depending on how family is defined) is the probability of committing at least one Type I error throughout the whole experiment. Although the error rate for an individual hypothesis test may be small, if one tests many hypotheses, one becomes more likely to declare false significance for at least one. If the tests are not independent (e.g., using the same plants to test multiple attributes or over time, as is common in this field), this can increase the experiment-wise error rate. For example, if a plant in one treatment group is diseased, this will affect all the (correlated) measures of that group, and thus all hypotheses tests. Third, data and their analyses must be presented coherently. The statistical model and analysis should naturally follow from the study design, and be consistent with relevant characteristics of the data, such as the underlying sampling distribution (e.g., normal, Poisson, binomial). Figures and tables should illustrate, and be consistent with, important results from the analysis. Fourth, readers should not have to guess which scientific questions the analysis answers. 
Effects deemed statistically significant must also be shown to be biologically/economically important. Effects of potential biologic/economic importance but whose statistical significance is not supported by the data should also be reported. There is an implicit assumption of adequate power when discussing results from any statistical tests. Power is estimated during the design phase using results from previous experiments or parameter estimates from the literature. Fifth, readers should be able to use information in the statistical reporting section as a resource for planning future experiments. Variance estimates are especially important for this function.
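To make the growth of the experiment-wise error rate concrete, the following minimal sketch uses the textbook formula 1 − (1 − α)^m for m tests. It assumes the tests are independent, which is an assumption of this illustration and often does not hold when the same plants are measured repeatedly; the function name is ours.

```python
def experimentwise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests tests.

    Assumes the tests are independent (an assumption of this sketch).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


# How the rate grows when each individual test uses alpha = 0.05.
for m in (1, 5, 10, 20):
    rate = experimentwise_error_rate(0.05, m)
    print(f"{m:>2} tests: experiment-wise error rate = {rate:.3f}")
```

Under these assumptions, 20 tests at α = 0.05 give an experiment-wise rate of about 0.64, which is why authors are asked to state how, or whether, it was controlled.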

The goal of this article is to provide an overview of how best to communicate statistics used in horticultural research. Therefore, it does not include specifics to address every contingency. Statistical methods continuously change, with new methods developed to address advances in biologic and ecologic research. For many studies, traditional and familiar methods (a.k.a. “standard statistics”) are adequate. However, for other studies, newer, less familiar methods are preferable, if not essential. Use of newer methods should not be an obstacle for publication.

Section 2: Recommendations for Writing about Statistics in a Research Paper

The following sections outline key points that should be addressed in the Materials and Methods section, and in the Results section of a journal article. Kramer et al. (2016) document common statistical problems for a sample of horticultural articles and should be used as a checklist of mistakes to avoid. The work by Reinhart (2015) is not overly technical and it explains many of these issues and other mistakes further, mostly in a biologic context.

Broadly speaking, there are two main statistical areas that the Materials and Methods section should address: 1) how was the study designed and 2) how were the data analyzed. Recommendations are grouped by subtopic.

Design and data collection.

The main idea of this section is to provide all information relevant to subsequent statistical analysis and interpretation about the design—specifically, how the experiment was conducted, how the data were collected and subsequently handled up to the point when the data were ready for statistical analysis. These are detailed next.

Describe the design. There are two components of experimental design: the experiment design and the treatment design. Both must be described.

The treatment design refers to the organization of treatment factors. Factorial designs (e.g., varieties × potting substrate) and dose–response (e.g., amount of nutrient applied) are familiar examples.

The experiment design refers to how the experimental units were organized and how randomization was done. Familiar examples are the completely randomized design (CRD) and randomized complete block design (RCBD). Any restrictions on randomization (e.g., blocking) or other ways observations were grouped must be described; this is part of the experiment design.

Describe covariates, if any. Provide the units of replication (the experimental unit; in other words, the smallest unit to which treatments were assigned independently) and the units of observation (sampling unit). The units of replication may differ for different factors (as they do, for example, in a split-plot design).

Describe how data were collected and how samples were pooled/batched, if this was done. Identify whether these were one-time measurements, multiple measurements on the plant/plot at the same time, repeated measures over time, or measurements on different plant characteristics.

Provide numbers, so it is clear how many units were in each block/group, how many received each treatment, and so on. Total sample size must be easily calculated, if not given. If a power analysis was used to determine the sample size, provide details. If not, explain how the sample size was determined. For example, one could write: “Growth chambers were limited to 30 plants, and three growth chambers were available. Previous studies using a similar setup and similar plant numbers had no difficulty detecting even moderate differences in growth patterns.”
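When a power analysis is used, the calculation itself can be reported compactly. The sketch below is one simple possibility, not a recommended procedure: it uses the normal approximation for a two-sided comparison of two treatment means with equal replication. The function name and the example values of `delta` (smallest difference worth detecting) and `sigma` (standard deviation among experimental units) are hypothetical.

```python
import math
from statistics import NormalDist


def units_per_treatment(delta: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate experimental units needed per treatment.

    Normal approximation for a two-sided, two-group comparison of means:
    n = 2 * ((z_{1 - alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    if delta <= 0 or sigma <= 0:
        raise ValueError("delta and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical planning values: detect a difference of one standard deviation.
print(units_per_treatment(delta=1.0, sigma=1.0))
```

With these planning values the approximation gives 16 units per treatment. A calculation based on the t distribution would be slightly larger, and designs with blocking or subsampling need a variance estimate for the appropriate experimental unit.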

Identify which variables are dependent (i.e., the response variables one measures, such as yield, biomass, time to flowering, elemental concentration) and which are independent (see the previous description of treatment design).

Describe any transformation of variables (e.g., logarithmic transformation) and the reason it was needed; this applies to both dependent and independent variables. Often, dependent variables can be fit without transformation if the appropriate sampling distribution is specified in a generalized linear model. When this is possible, generalized linear models are preferable to variance stabilizing transformations.

Data analysis.

Broadly speaking, data analysis includes the following steps:

Plot the original data to visualize what has happened in terms of treatment effects, distribution of data, and other features of the data deemed to be important.

Determine a statistical model consistent with the study design and the distribution of the data, and mean comparison procedures needed to address the objective of the research.

Determine the statistical assumptions associated with the selected model.

Select the software to be used to implement the analysis.

Run the analysis and verify that the assumptions are satisfied.

Report in the Materials and Methods section how the previous steps were completed.

Report the outcome of the analysis in the Results section.

There is no one-size-fits-all way of presenting the results of a statistical analysis. This is true for many aspects of using statistics in horticultural science, making it impossible to give advice covering every situation; instead, we provide general guidelines. Authors must decide what best tells the story of their research results. Tables and figures are common methods of presenting data results. The following are principles to follow:

If you include graphics showing the data, presenting data summaries, or depicting results from modeling, the intent is to portray the findings of the research accurately and make it easier for readers to visually understand the data, estimates and findings from the analysis.

Statistics that appear in both figures and tables should be consistent with the way the data were analyzed. If objectives are addressed using descriptive statistics, then these should appear in a figure or table, along with their appropriate measures of variability.

If the objectives are addressed using a statistical model, as is usually the case, then statistics obtained from the model should appear in the figure or table, along with their appropriate measures of variability.

For modeling results and hypothesis testing, there are two main categories of output from statistical software that should be presented: 1) diagnostic information demonstrating that the method and statistical model used are appropriate and 2) parameter estimates and hypothesis tests that bear directly on the research objectives. The connection to the research objectives must be clear for each statistical result (do not simply copy results produced by software). Two other categories of statistical results should be considered: 1) estimates of quantities from the model that may be useful in future research (e.g., variance estimates) and 2) statistical support for unexpected findings.

Demonstrate that model assumptions were satisfied (this could be just a sentence for simple models). See the previous point.

For multiple dependent variables, give the correlations among these variables [and possibly the correlations separately for each treatment if the treatments affect the correlations (discussed later)]. Experiment-wise error control may be necessary.

Statistics for the Materials and Methods section.

The Materials and Methods section should address the first function given in Section 1. The design, data collection, method of analysis, and software used must be described clearly. Any choices made, and any nonstandard procedures used, must be justified.

Description of the study design.

This means “design” as broadly defined. If data were collected, whether from an observational study, a survey, or a designed experiment, there was a design. At a minimum, all designs include three elements: The first is the response variable (i.e., the outcome or outcomes measured), the second is the treatment design (i.e., the treatments or conditions being evaluated), and the third is the design structure of the experiment, which includes the units of replication (called the experimental unit in designed experiments), the units of observation (called the sampling unit in designed experiments), and grouping of units, if any. Grouping may consist of blocking, research conducted at multiple locations, or data collected on multiple occasions.

The following are three scenarios to illustrate these points.

Scenario 1: Suppose there are plants in flats on a bench. If treatments are assigned randomly to benches and applied to the whole bench, the bench is the experimental unit. If observations are made on the flat, then the flat is the unit of observation (sampling unit). This is a CRD.

Scenario 2: If treatments are assigned randomly to individual flats within each bench, then the flat is the experimental unit and the bench is a blocking factor. If observations are made on the flat, then the flat is also the unit of observation. This is an RCBD. Notice that the experimental unit and the sampling unit can be identical, which is not the case in scenario 1.

Scenario 3: Experiments with factorial treatment designs often have different-size experimental units for different factors. In this scenario, irrigation or nutrients are applied using drip lines across a bench, but each bench has several flats, with a different variety in each flat. Here, the bench is the experimental unit with respect to irrigation/nutrient and the flat is the experimental unit with respect to variety. In design language, this is a split-plot experiment in which the bench is the whole-plot experimental unit, irrigation/nutrient is the whole-plot treatment factor, the flat is the split-plot experimental unit, and variety is the split-plot treatment factor. See Onofri et al. (2010) for another good example illustrating true and pseudo-replication.
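The difference between scenarios 1 and 2 lies in how the randomization is restricted, which the following sketch tries to make explicit. It is only an illustration: the function names, bench and flat labels, and treatment names are hypothetical, and a fixed seed is used so that the layout can be reported and reproduced.

```python
import random


def randomize_crd(units, treatments, reps, seed=None):
    """CRD: shuffle all units, then give each treatment to `reps` of them."""
    if len(units) != len(treatments) * reps:
        raise ValueError("need exactly len(treatments) * reps units")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    labels = [trt for trt in treatments for _ in range(reps)]
    return dict(zip(shuffled, labels))


def randomize_rcbd(blocks, treatments, seed=None):
    """RCBD: every treatment appears once per block, in a random order."""
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # randomization is restricted to within the block
        layout[block] = dict(zip(units, order))
    return layout


treatments = ["control", "low N", "high N"]

# Scenario 1: benches are the experimental units (two benches per treatment).
benches = [f"bench {i}" for i in range(1, 7)]
crd_layout = randomize_crd(benches, treatments, reps=2, seed=2024)

# Scenario 2: flats are the experimental units, benches are blocks.
flats_by_bench = {f"bench {i}": [f"flat {i}.{j}" for j in range(1, 4)]
                  for i in range(1, 4)}
rcbd_layout = randomize_rcbd(flats_by_bench, treatments, seed=2024)

print(crd_layout)
print(rcbd_layout)
```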

Important note: Although it is acceptable to name the design, such as an RCBD or Latin square design, a name alone is insufficient and may be misleading. So regardless of whether a design name is used, authors must give the treatment factors, the experimental units, sampling units, and the blocking criteria (if any). For example, an RCBD may or may not have treatments replicated in each block. If treatments are replicated, one can test whether a treatment effect is the same in all blocks; if not, one has to assume it is. So, “RCBD” does not contain all the necessary information about the design.

Data collection.

This means list the response variables measured and describe how each was measured. It is also beneficial to make various plots of the original data to determine if there is a treatment effect (these plots are not necessarily included in the published paper). The biology should lead the statistics. Beyond this, you are looking for two things. When you describe the response variable, you want to focus on the sampling distribution of the response variable because this affects the model selected for the analysis of the data. You should plot the response variable against the predictor variables and look for recognizable patterns—in particular, to determine if (and how) variability changes systematically with the mean. For example, these may be scatterplots or boxplots. Another useful plot groups observations in a natural way (say, by treatment combination) and plots the means of the groups against their standard deviations. Many statistical methods assume the response variable is normally distributed, in which case variability should be roughly the same throughout the range of the response variable. A histogram of the residuals from the appropriate model with a normally distributed response variable results in a bell-shaped distribution. Note that a histogram of the raw response variable should not have a bell-shaped distribution because, if there really are treatment effects, the histogram should have a peak at each treatment mean.
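The mean-versus-standard-deviation check described above needs only group summaries. The sketch below computes them for made-up data in which variability rises with the mean; the function name, treatment labels, and values are hypothetical, and plotting the resulting pairs is left to whatever graphics tool the authors prefer.

```python
from statistics import mean, stdev


def mean_sd_by_group(observations):
    """Return {group: (mean, sd)} from an iterable of (group, value) pairs."""
    groups = {}
    for group, value in observations:
        groups.setdefault(group, []).append(value)
    return {g: (mean(vals), stdev(vals)) for g, vals in groups.items()}


# Hypothetical responses grouped by treatment combination.
data = [
    ("control", 2.0), ("control", 4.0), ("control", 6.0),
    ("fertilized", 10.0), ("fertilized", 20.0), ("fertilized", 30.0),
]

summary = mean_sd_by_group(data)
for group, (m, s) in summary.items():
    print(f"{group:>10}: mean = {m:.1f}, sd = {s:.1f}")
```

If the standard deviation increases with the mean, as it does in these invented data, a model that assumes constant variance deserves a second look.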

Many commonly measured response variables in horticulture have a non-normal distribution. For example, germination rate (number of seeds germinated successfully/the number planted) has a binomial distribution. Many variables are continuous but have strongly right-skewed distributions, such as berry weight. A log-normal distribution often works well for this response variable. Generalized linear models allow the data to arise from many processes; the normal distribution is just one of several. Others include the log-normal, gamma, exponential, beta, binomial, Poisson, and negative binomial. The latter three are used to model count data. Again, plots used to assess the data and suggest models are part of your toolbox for determining the formal statistical analysis you will conduct, but usually are not included in an article.

The second thing you are looking for is any aspect of the data collection process that might affect the structure of the experiment design. Milliken and Johnson (2009) give examples in which the data collection process alters the study design. In one example, plants were grown in multiple distinct blocks, but then material for each treatment was combined from all blocks to allow measurement of the micronutrients of interest. The original blocks were legitimate replicates, but combining material precludes estimating block-to-block variability, effectively creating an unreplicated experiment. For this reason, a clear description of the data collection process is essential.

Model description.

Model description consists of giving the assumed distribution of the response variable and the sources of variation in the treatment and experiment design.

Scenario 1: plants assigned to benches in a CRD. The model would simply be Response = Treatment + Experimental error. (Plant-to-plant variability should be the largest contributor to the experimental error component.)

Scenario 2: treatments assigned to flats in an RCBD, with benches as the blocking criteria. The model would be Response = Treatment + Benches + Experimental error. This model assumes the treatment effect does not differ from bench to bench.

Scenario 3: Irrigation is the whole-plot treatment factor, benches are the whole-plot experimental units, variety is the split-plot treatment factor, and flat is the split-plot experimental unit. The model is Response = Irrigation treatment + Whole-plot error + Variety + Irrigation × Variety + Split-plot error. This model assumes the irrigation effect does not differ from bench to bench and that the variety effect does not differ from flat to flat. [In statistical jargon, there is no interaction between any of the fixed effects (irrigation and variety) and any of the random effects (bench and flat)].
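For the scenario 2 model (Response = Treatment + Benches + Experimental error), the sources of variation can be shown as a partition of sums of squares. The sketch below does this by hand for an RCBD with one observation per treatment per bench and a normally distributed response. The data and names are invented for illustration, and in practice one would use established statistical software rather than this code.

```python
def rcbd_anova(table):
    """Partition sums of squares for an RCBD.

    `table[treatment][block]` holds one response per treatment per block.
    """
    treatments = list(table)
    blocks = list(table[treatments[0]])
    t, b = len(treatments), len(blocks)

    values = [table[tr][bl] for tr in treatments for bl in blocks]
    grand = sum(values) / (t * b)
    trt_means = [sum(table[tr][bl] for bl in blocks) / b for tr in treatments]
    blk_means = [sum(table[tr][bl] for tr in treatments) / t for bl in blocks]

    ss_total = sum((y - grand) ** 2 for y in values)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_err = ss_total - ss_trt - ss_blk  # treatment-by-block residual

    df_trt, df_blk = t - 1, b - 1
    df_err = df_trt * df_blk
    ms_trt, ms_err = ss_trt / df_trt, ss_err / df_err
    f_trt = ms_trt / ms_err if ms_err > 0 else float("inf")
    return {
        "ss_treatment": ss_trt, "df_treatment": df_trt,
        "ss_block": ss_blk, "df_block": df_blk,
        "ss_error": ss_err, "df_error": df_err,
        "ms_error": ms_err, "f_treatment": f_trt,
    }


# Hypothetical yields: three treatments (A, B, C) on three benches (blocks).
yields = {
    "A": {"bench 1": 10.0, "bench 2": 12.0, "bench 3": 14.0},
    "B": {"bench 1": 13.0, "bench 2": 14.0, "bench 3": 18.0},
    "C": {"bench 1": 16.0, "bench 2": 19.0, "bench 3": 19.0},
}
result = rcbd_anova(yields)
print(result)
```

The error mean square returned here is also the variance estimate that later researchers would need when planning a similar experiment.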

Other aspects of analysis.

Because of the wide range of research subject matter and scales (laboratory to field), we give general principles. First, the statistical software used to analyze the data is not the method of analysis. Authors must first describe clearly the statistical procedures used to compare or otherwise characterize the treatments. As illustrated in the three previous scenarios, the method of analysis must be consistent with the study design and data collection process. Second, if there are assumptions critical to the validity of the method of analysis used, authors must state that the assumptions were met and how those assumptions were verified. If it is unclear what the assumptions are or how to verify them, talk to a statistician. Third, there must be a clear connection between the statistical methods used and the primary objectives of the research. This is where treatment design comes in, and it is important to match how you compare the treatments with the treatment design. For example, if you are comparing different varieties, then a mean comparison test is appropriate. Depending on the relative seriousness of Type I (false positive) and Type II (false negative) errors, there are different ways to implement a mean comparison test. At one extreme are two tests: Duncan's multiple range test and an unprotected least significant difference test, neither of which controls Type I error. At the other extreme are Scheffé and Bonferroni tests, which offer extreme control of Type I error at the expense of Type II error. There is a time and place for each test. Authors must state which procedure was used and why that procedure was chosen. The treatment design for experiments yielding genomic data is often simple, but the analyses are complicated. When analyzing RNAseq and similar genomic data, controlling for false discovery rate (which is also a multiple-comparisons issue) is similarly important.
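To show how strict Type I error control differs from false discovery rate control, the sketch below adjusts the same set of P values in two ways: the Bonferroni correction, and the Benjamini-Hochberg step-up procedure, which is one widely used false discovery rate method. Benjamini-Hochberg is our choice for illustration and is not named in the recommendations above; the P values are invented.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P values (capped at 1), in the original order."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)
for p, pb, ph in zip(raw, bonf, bh):
    print(f"raw = {p:.3f}  Bonferroni = {pb:.3f}  BH = {ph:.3f}")
```

The Bonferroni values are never smaller than the Benjamini-Hochberg values, which reflects the trade-off between Type I and Type II errors that authors are asked to justify.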

In addition to factorial treatment designs [when main effects (factors with discrete levels) and their interactions are important], regression (when one or more predictor variables are continuous) is often used in horticulture. In some cases, continuous predictor variables are observational in nature. They are often called covariates in designs that also have factors. The distribution of the response variable needs to be stated because that distribution, in part, determines which statistical model is appropriate.

When the assumptions underlying a parametric method are violated, “nonparametric” methods should be used. These are not assumption-free; one assumption is that the response variable has the same sampling distribution across treatments (e.g., always skewed to the right).

Ratios constructed of two random variables (e.g., root mass/aboveground mass) have poor statistical properties (the assumptions of a parametric test are often violated because the variance of the ratios is not well determined). If ratios need to be used in an analysis, consider obtaining advice from a statistician familiar with the analysis of ratio data.

The trend in biological, medical, and social sciences journals is also to report effect sizes rather than simply the results of a significance test [see Nakagawa and Cuthill (2007) for a readable justification and concrete suggestions]. This is now required by many journals (Tressoldi et al., 2013).
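As one example of an effect size, the sketch below computes the raw mean difference and a standardized difference (Cohen's d with a pooled standard deviation) for two independent groups. Cohen's d is only one of several measures and is our choice for illustration; the data and names are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical dry weights (g) under two treatments.
treated = [12.0, 14.0, 16.0]
control = [10.0, 12.0, 14.0]

raw_difference = mean(treated) - mean(control)
d = cohens_d(treated, control)
print(f"raw difference = {raw_difference:.1f} g, Cohen's d = {d:.2f}")
```

The raw difference, in the original units, is usually the easier quantity to interpret biologically; the standardized value mainly helps when comparing across studies.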

With software improvements, Bayesian statistical methodology is gaining acceptance among biologists. In certain cases, such as models with layers of random effects, Bayesian methods enable analyses that would otherwise not be possible. In simpler models, there is often not much difference between results from Bayesian and frequentist (“traditional”) statistical analyses unless there is relevant prior information that improves the accuracy and precision of parameter estimates. Findings based on the use of Bayesian methodology are, in principle, acceptable in most biological journals, although they require more explanation for readers to understand the results.

It may not be clear at the outset of an analysis which statistical methodology should be used, and several different kinds of analyses may be done with the same data set to determine which one makes the most sense. For example, diagnostics following fitting a model may suggest that the assumptions are not met. Alternative models may be examined to determine whether they fit the data better. This is not a free pass to try models until one finds the results one desires. Rather, one oscillates between fitting models and judging them using diagnostics until one is satisfied that one has selected a model that both captures the essential features of the data and has its assumptions satisfied. A useful discussion on obvious and not-so-obvious biases resulting from such a path is given by Gelman and Loken (2014). Note that if two reasonable statistical models give contradictory conclusions, authors could present both, as long as sufficient information for the reviewers and readers to understand the issue is provided.

Statistical software.

After authors have described the method of analysis, following guidelines given previously, then any software used for statistical analyses should be cited, including online software. Include the version (the release) in the citation. Software developed by the authors for the analysis and, thus, not generally available should be explained sufficiently (perhaps in an appendix) for readers to understand what it does and why off-the-shelf software was not suitable. Authors must make the software available for others to use upon request and should include well-documented copies of the code for the reviewers. If the software was part of a system, such as SAS® or R, authors must also give the specific procedure used, such as SAS PROC GLIMMIX or the lme4 package in R.
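Recording versions is easiest when it is done by the analysis script itself. The sketch below assumes, purely for illustration, that the analysis was run in Python; the helper name and the package names passed to it are hypothetical, and users of SAS or R would rely on the equivalent facilities of those systems.

```python
import platform
from importlib import metadata


def software_versions(packages):
    """Return {name: version} for Python and each requested package."""
    versions = {"Python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Hypothetical list of packages an analysis might depend on.
report = software_versions(["numpy", "statsmodels"])
for name, version in report.items():
    print(f"{name}: {version}")
```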

Statistics for the Results section.

As with the method of analysis, there is no one-size-fits-all rule for presentation of data and associated formal statistical analysis. Again, we provide general principles.

First, data should be presented so that the relevant information with regard to the study’s primary objectives and most important findings is clear. Presentation may be via figures or tables, as long as these figures or tables inform rather than inadvertently hide or distort important information. In general, a picture is worth a thousand numbers. Well-conceived figures tend to portray the data’s important messages more understandably than tables.

If multiple responses are measured on the same sampling unit, such as weight, height, sugar content, and macro- and micronutrient content in a plant, correlation among these variables is likely and should be accounted for in the analysis (this is a kind of repeated-measures design) and correlation coefficients should be provided. Note that these correlations may change with different treatments or environments, just as mean responses may, so a single set of correlation coefficients may not summarize adequately the relationships among the variables in the experiment. If multiple responses are measured, experiment-wise error control may be needed. The same considerations for balancing Type I and Type II error rates could be applied here, as mentioned earlier.
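The point that correlations may differ among treatments can be checked by computing them separately for each treatment, as in the following sketch. The record layout, variable names, and data are hypothetical, and the data were chosen so that the correlation reverses sign between treatments.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_treatment(records, var_x, var_y):
    """Return {treatment: r} for two response variables."""
    groups = {}
    for rec in records:
        xs, ys = groups.setdefault(rec["treatment"], ([], []))
        xs.append(rec[var_x])
        ys.append(rec[var_y])
    return {trt: pearson_r(xs, ys) for trt, (xs, ys) in groups.items()}


# Hypothetical plants with two responses measured on each.
plants = [
    {"treatment": "irrigated", "height": 1.0, "sugar": 2.0},
    {"treatment": "irrigated", "height": 2.0, "sugar": 4.0},
    {"treatment": "irrigated", "height": 3.0, "sugar": 6.0},
    {"treatment": "drought", "height": 1.0, "sugar": 6.0},
    {"treatment": "drought", "height": 2.0, "sugar": 4.0},
    {"treatment": "drought", "height": 3.0, "sugar": 2.0},
]

by_treatment = correlation_by_treatment(plants, "height", "sugar")
pooled = pearson_r([p["height"] for p in plants], [p["sugar"] for p in plants])
print(by_treatment, pooled)
```

In these invented data the pooled correlation is zero even though the within-treatment correlations are perfect and opposite, which illustrates why a single set of coefficients can mislead.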

Anytime means are compared, the standard error of the difference must be reported. In most cases, the standard error of a mean can be considered optional. This is admittedly a break with tradition, but it is an essential one. A plot depicting means with standard error bars is, by itself, insufficient.
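To see why error bars on individual means are not enough, the sketch below computes both the standard error of each mean and the standard error of the difference (SED) for two independent groups with a pooled variance. This two-group formula is a simplification chosen for illustration; in a designed experiment the SED should come from the fitted model and its appropriate error term. The data and names are hypothetical.

```python
import math
from statistics import variance


def standard_error_of_mean(group):
    """Standard error of a single group mean."""
    return math.sqrt(variance(group) / len(group))


def standard_error_of_difference(group_a, group_b):
    """SED for two independent groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))


# Hypothetical dry weights (g) under two treatments.
treated = [12.0, 14.0, 16.0]
control = [10.0, 12.0, 14.0]

sem_treated = standard_error_of_mean(treated)
sed = standard_error_of_difference(treated, control)
print(f"SEM (treated) = {sem_treated:.3f}, SED = {sed:.3f}")
```

With equal replication and equal variances, as here, the SED is the square root of 2 times the standard error of a mean, so judging differences by eye from standard error bars understates the uncertainty of the comparison.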

Formal statistics.

Formal statistics include results of hypothesis tests (e.g., F or t statistics, P values), results of mean separation tests, estimates of means, differences, regression coefficients and their associated standard errors or confidence intervals, predicted values and their associated prediction intervals, and so on. In general, providing the mean (or mean difference) and its confidence interval is preferable to reporting only the results of a hypothesis test. Formal statistics should accompany and provide support for, but not substitute for, the depictions of the data described earlier. The American Statistical Association issued a policy statement in 2016 (Wasserstein and Lazar, 2016) that clarifies legitimate vs. illegitimate uses and interpretations of P values associated with hypothesis tests. P values tell us whether the observed differences in the data are likely the result of chance or whether there is strong evidence of a true difference. They cannot tell us whether the difference is big enough to matter.

The main message should be that the observed difference is biologically, economically, or scientifically consequential, not that a P value was statistically significant. If the treatment group differs significantly from the control group, the emphasis should be on the biological consequences of finding a difference of that magnitude. If a regression line has a significant slope, the emphasis should be on the functional relationship between the independent and dependent variables. What underlying biological principle is responsible for a slope of this size? Let biology lead and let significance tests follow.

Often, not finding a statistically significant difference is important and should be reported if there was sufficient power to detect a biologically important difference. For example, if a study is done on the assumption (perhaps based on conventional wisdom or a previous research report) that a treatment difference exists, and data from a new study suggest otherwise, that information should be reported. Journals do science a major disservice by preferentially reporting only statistically significant results. This practice is called “publication bias” and is increasingly recognized to be a serious issue in all sciences. Sometimes a nondifference is the most important finding.

Many terms have technical meanings in statistics, as well as more general (and less precise) uses in common language. For example, “significant” has a specific definition in hypothesis testing, but the words “significant” and “important” tend to be used loosely and interchangeably when describing scientific results. It is best to avoid ambiguities in your writing (what is the meaning of “significant findings”?). Instead, describe the difference. For example, for a dry weight measurement, treatment A resulted in a heavier plant than treatment B. Commonly used statistical terms (e.g., analysis of variance) do not need to be defined in the article. Less common ones (e.g., reliability) do need accompanying definitions. If a reference needs to be given for a statistical technique, refer to an easily available (and commonly used) textbook if possible. The second choice would be an article in a horticulture or other biological journal. The third choice is a review article that explains the technique and perhaps compares it with others. The last choice is an article in the statistical literature that requires an advanced background in statistical theory.

Readers of an article may have a different reason for looking at results than the author’s stated purpose (e.g., to compare some of the results in the article with data readers have from a location they use, rather than the within-location comparison of cultivars in the article), which is another reason why summary information about the original data (e.g., means and standard deviations) needs to be provided. Data summaries may also be used in a subsequent meta-analysis; these typically use means, standard deviations, and other estimated parameters (e.g., block-to-block variance).

Statistics, and figures and tables.

Scientific publications are replete with tables, figures, and plots that are easy to read, technically impressive, pretty to look at, but, unfortunately, can be misleading in their content with respect to the objectives of the research they are intended to portray. If a figure shows the results of statistical modeling (e.g., means and their standard errors), you should try including the original data in the figure, perhaps in the background. This helps readers assess the adequacy of the statistical model visually. Rather than reiterate the advice of others, we suggest an excellent source for describing how data (and legends) should be presented: How to Report Statistics in Medicine (Lang and Secic, 2006; pp. 325–393).

Plant scientists are not expected to know everything when conducting research, and this is becoming more evident with increasing collaborations across fields of study. Plant scientists should know, however, when they need input from a statistician. If so, we advise meeting with a statistician before setting up the experiment. A statistician will not be able to help after data from a poorly designed experiment are collected (other than to suggest rerunning the experiment with a better design).

A well-designed experiment can often be analyzed a number of ways and, usually, there are choices to make along the way. Examples include whether there is overdispersion, whether interaction terms are necessary, or whether a multivariate analysis should be considered to account for correlation among response variables. Should the statistician be extensively involved in the design and analysis, they should be included on the grant and/or the resulting journal article.

The following references are excellent sources for additional information about the statistical topics described in this article.

Bolker, B.M., Brooks, M.E., Clark, C.J., Geange, S.W., Poulsen, J.R., Stevens, M.H. & White, J.S. 2009 Generalized linear mixed models: A practical guide for ecology and evolution Trends Ecol. Evol. 24 127 135

Cochran, W.G. & Cox, G.M. 1957 Experimental designs. 2nd ed. Wiley, New York, NY

Cohen, J. 1992 A power primer Psychol. Bull. 112 155 159

Gelman, A. & Loken, E. 2014 The statistical crisis in science: Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don’t hold up Amer. Sci. 102 460

James, G., Witten, D., Hastie, T. & Tibshirani, R. 2013 An introduction to statistical learning. Springer, New York, NY

Keselman, H.J. 2015 Per family or familywise Type I error control: “Eether, eyether, neether, nyther, let’s call the whole thing off!” J. Mod. Appl. Stat. Methods 14 1 6

Kramer, M.H., Paparozzi, E.T. & Stroup, W.W. 2016 Statistics in a horticultural journal: Problems and solutions J. Amer. Soc. Hort. Sci. 141 400 406

Lang, T.A. & Secic, M. 2006 How to report statistics in medicine: Annotated guidelines for authors, editors and reviewers. 2nd ed. American College of Physicians. Sheridan Press, Chelsea, MI

Little, T.M. 1978 If Galileo published in HortScience HortScience 13 504 506

Milliken, G.A. & Johnson, D.E. 2009 Analysis of messy data. Vol. 1, 2nd ed. Chapman & Hall/CRC Press, Boca Raton, FL

Nakagawa, S. & Cuthill, I.C. 2007 Effect size, confidence interval and statistical significance: A practical guide for biologists Biol. Rev. Camb. Philos. Soc. 82 591 605

Onofri, A., Carbonell, E.A., Piepho, H.-P., Mortimer, A.M. & Cousens, R.D. 2010 Current statistical issues in Weed Research Weed Res. 50 5 24

Reinhart, A. 2015 Statistics done wrong: The woefully complete guide. No Starch Press, San Francisco, CA

Schabenberger, O. & Pierce, F.J. 2002 Contemporary statistical models for the plant and soil sciences. CRC Press, Boca Raton, FL

Stroup, W.W. 2013 Generalized linear mixed models: Modern concepts, methods and applications. CRC Press, Boca Raton, FL

Stroup, W.W. 2015 Rethinking the analysis of non-normal data in plant and soil science Agron. J. 107 811 827

Tressoldi, P.E., Giofré, D., Sella, F. & Cumming, G. 2013 High impact = high statistical standards? Not necessarily so PLoS One 8 2 e56180 doi: 10.1371/journal.pone.0056180

Vance, E.A. 2015 Recent developments and their implications for the future of academic statistical consulting centers Amer. Stat. 69 127 137

Wasserstein, R.L. & Lazar, N.A. 2016 The ASA’s statement on p-values: Context, process, and purpose Amer. Stat. 70 129 133

Weissgerber, T.L., Milic, N.M., Winham, S.J. & Garovic, V.D. 2015 Beyond bar and line graphs: Time for a new data presentation paradigm PLoS Biol. 13 4 e1002128 doi: 10.1371/journal.pbio.1002128

Contributor Notes

We thank the reviewers for their excellent comments and reference recommendations.


© 2018-2024 American Society for Horticultural Science


National Academies Press: OpenBook

On Being a Scientist: A Guide to Responsible Conduct in Research: Third Edition (2009)

Chapter: the treatment of data.


In order to conduct research responsibly, graduate students need to understand how to treat data correctly. In 2002, the editors of the Journal of Cell Biology began to test the images in all accepted manuscripts to see if they had been altered in ways that violated the journal’s guidelines. About a quarter of the papers had images that showed evidence of inappropriate manipulation. The editors requested the original data for these papers, compared the original data with the submitted images, and required that figures be remade to accord with the guidelines. In about 1 percent of the papers, the editors found evidence for what they termed “fraudulent manipulation” that affected conclusions drawn in the paper, resulting in the papers’ rejection.

Researchers who manipulate their data in ways that deceive others, even if the manipulation seems insignificant at the time, are violating both the basic values and widely accepted professional standards of science. Researchers draw conclusions based on their observations of nature. If data are altered to present a case that is stronger than the data warrant, researchers fail to fulfill all three of the obligations described at the beginning of this guide. They mislead their colleagues and potentially impede progress in their field or research. They undermine their own authority and trustworthiness as researchers. And they introduce information into the scientific record that could cause harm to the broader society, as when the dangers of a medical treatment are understated.

This is particularly important in an age in which the Internet allows for an almost uncontrollably fast and extensive spread of information to an increasingly broad audience. Misleading or inaccurate data can thus have far-reaching and unpredictable consequences of a magnitude not known before the Internet and other modern communication technologies.

Misleading data can arise from poor experimental design or careless measurements as well as from improper manipulation. Over time, researchers have developed and have continually improved methods and tools designed to maintain the integrity of research. Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended.

Because of the critical importance of methods, scientific papers must include a description of the procedures used to produce the data, sufficient to permit reviewers and readers of a scientific paper to evaluate not only the validity of the data but also the reliability of the methods used to derive those data. If this information is not available, other researchers may be less likely to accept the data and the conclusions drawn from them. They also may be unable to reproduce accurately the conditions under which the data were derived.

The best methods will count for little if data are recorded incorrectly or haphazardly. The requirements for data collection differ among disciplines and research groups, but researchers have a fundamental obligation to create and maintain an accurate, accessible, and permanent record of what they have done in sufficient detail for others to check and replicate their work. Depending on the field, this obligation may require entering data into bound notebooks with sequentially numbered pages using permanent ink, using a computer application with secure data entry fields, identifying when and where work was done, and retaining data for specified lengths of time. In much industrial research and in some academic research, data notebooks need to be signed and dated by a witness on a daily basis.

Unfortunately, beginning researchers often receive little or no formal training in recording, analyzing, storing, or sharing data. Regularly scheduled meetings to discuss data issues and policies maintained by research groups and institutions can establish clear expectations and responsibilities.

The Selection of Data

Deborah, a third-year graduate student, and Kamala, a postdoctoral fellow, have made a series of measurements on a new experimental semiconductor material using an expensive neutron test at a national laboratory. When they return to their own laboratory and examine the data, a newly proposed mathematical explanation of the semiconductor’s behavior predicts results indicated by a curve.

During the measurements at the national laboratory, Deborah and Kamala observed electrical power fluctuations that they could not control or predict were affecting their detector. They suspect the fluctuations affected some of their measurements, but they don’t know which ones.

When Deborah and Kamala begin to write up their results to present at a lab meeting, which they know will be the first step in preparing a publication, Kamala suggests dropping two anomalous data points near the horizontal axis from the graph they are preparing. She says that due to their deviation from the theoretical curve, the low data points were obviously caused by the power fluctuations. Furthermore, the deviations were outside the expected error bars calculated for the remaining data points.

Deborah is concerned that dropping the two points could be seen as manipulating the data. She and Kamala could not be sure that any of their data points, if any, were affected by the power fluctuations. They also did not know if the theoretical prediction was valid. She wants to do a separate analysis that includes the points and discuss the issue in the lab meeting. But Kamala says that if they include the data points in their talk, others will think the issue important enough to discuss in a draft paper, which will make it harder to get the paper published. Instead, she and Deborah should use their professional judgment to drop the points now.

1. What factors should Kamala and Deborah take into account in deciding how to present the data from their experiment?
2. Should the new explanation predicting the results affect their deliberations?
3. Should a draft paper be prepared at this point?
4. If Deborah and Kamala can’t agree on how the data should be presented, should one of them consider not being an author of the paper?

Most researchers are not required to share data with others as soon as the data are generated, although a few disciplines have adopted this standard to speed the pace of research. A period of confidentiality allows researchers to check the accuracy of their data and draw conclusions.

However, when a scientific paper or book is published, other researchers must have access to the data and research materials needed to support the conclusions stated in the publication if they are to verify and build on that research. Many research institutions, funding agencies, and scientific journals have policies that require the sharing of data and unique research materials. Given the expectation that data will be accessible, researchers who refuse to share the evidentiary basis behind their conclusions, or the materials needed to replicate published experiments, fail to maintain the standards of science.

In some cases, research data or materials may be too voluminous, unwieldy, or costly to share quickly and without expense. Nevertheless, researchers have a responsibility to devise ways to share their data and materials in the best ways possible. For example, centralized facilities or collaborative efforts can provide a cost-effective way of providing research materials or information from large databases. Examples include repositories established to maintain and distribute astronomical images, protein sequences, archaeological data, cell lines, reagents, and transgenic animals.

New issues in the treatment and sharing of data continue to arise as scientific disciplines evolve and new technologies appear. Some forms of data undergo extensive analysis before being recorded; consequently, sharing those data can require sharing the software and sometimes the hardware used to analyze them. Because digital technologies are rapidly changing, some data stored electronically may be inaccessible in a few years unless provisions are made to transport the data from one platform to another. New forms of publication are challenging traditional practices associated with publication and the evaluation of scholarly work.

The scientific research enterprise is built on a foundation of trust. Scientists trust that the results reported by others are valid. Society trusts that the results of research reflect an honest attempt by scientists to describe the world accurately and without bias. But this trust will endure only if the scientific community devotes itself to exemplifying and transmitting the values associated with ethical scientific conduct.

On Being a Scientist was designed to supplement the informal lessons in ethics provided by research supervisors and mentors. The book describes the ethical foundations of scientific practices and some of the personal and professional issues that researchers encounter in their work. It applies to all forms of research—whether in academic, industrial, or governmental settings-and to all scientific disciplines.

This third edition of On Being a Scientist reflects developments since the publication of the original edition in 1989 and a second edition in 1995. A continuing feature of this edition is the inclusion of a number of hypothetical scenarios offering guidance in thinking about and discussing these scenarios.

On Being a Scientist is aimed primarily at graduate students and beginning researchers, but its lessons apply to all scientists at all stages of their scientific careers.


Enago Academy

Effective Use of Statistics in Research – Methods and Tools for Data Analysis


Remember that feeling of impending dread when you are asked to analyze your data? Now that you have all the required raw data, you need to test your hypothesis statistically. Presenting your numerical data with sound statistics will also help break the stereotype of the biology student who can’t do math.

Statistical methods are essential for scientific research. In fact, they run through every stage of it: planning, designing, collecting data, analyzing, drawing meaningful interpretations and reporting research findings. Furthermore, the results acquired from a research project are meaningless raw data unless analyzed with statistical tools. Therefore, applying statistics in research is essential to justify research findings. In this article, we will discuss how using statistical methods in biology can help draw meaningful conclusions from biological studies.


Role of Statistics in Biological Research

Statistics is a branch of science that deals with the collection, organization and analysis of data, and with drawing conclusions about a whole population from a sample. Moreover, it aids in designing a study more meticulously and provides logical reasoning for accepting or rejecting a hypothesis. Biology focuses on the study of living organisms and their complex pathways, which are very dynamic and cannot be explained by logical reasoning alone. Statistics helps here by defining and explaining patterns in a study based on the sample sizes used. To be precise, statistics reveals the trends in the conducted study.

Biological researchers often disregard statistics when planning their research and use statistical tools only at the end of their experiment. This gives rise to a complicated set of results that are not easily analyzed with statistical tools. Statistics in research can help a researcher approach the study in a stepwise manner, in which the statistical analysis follows these steps:

1. Establishing a Sample Size

Usually, a biological experiment starts with choosing samples and selecting the right number of replicate experiments. Statistics provides the basics here, such as random sampling and the law of large numbers. Drawing a sufficiently large random sample from the population helps extrapolate statistical findings to that population and reduces experimental bias and error.
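As a rough illustration of how a sample size can be estimated before an experiment, the sketch below uses the textbook formula n = (z·σ/E)² for estimating a mean to within a margin of error E. The function name and the example numbers (a standard deviation of 10 units and a margin of 2 units) are assumptions made for illustration, and the formula itself assumes the population standard deviation is roughly known.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n whose confidence-interval half-width is at most `margin`.

    Assumes a (roughly) known population standard deviation `sigma`
    and a normally distributed sample mean.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical planning values: sigma = 10 units, margin of error = 2 units.
n = sample_size_for_mean(sigma=10.0, margin=2.0)
print(n)  # 97
```

Halving the margin of error roughly quadruples the required sample size, which is why the precision target should be fixed before data collection begins.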

2. Testing of Hypothesis

When conducting a statistical study with a large sample pool, biological researchers must make sure that a conclusion is statistically significant. To achieve this, a researcher must formulate a hypothesis before examining the distribution of the data. Furthermore, statistics helps interpret whether the data cluster near the mean or are spread across the distribution. These trends help analyze the sample and test the hypothesis.

3. Data Interpretation Through Analysis

When dealing with large data sets, statistics assists in data analysis. This helps researchers draw sound conclusions from their experiments and observations. Drawing conclusions manually or from visual observation alone may give erroneous results; a thorough statistical analysis takes into account the relevant statistical measures and the variance in the sample to provide a detailed interpretation of the data. Researchers can thus produce detailed, meaningful data to support their conclusions.

Types of Statistical Research Methods That Aid in Data Analysis

Statistical analysis is the process of examining samples of data to find patterns or trends that help researchers anticipate situations and draw appropriate research conclusions. Based on the type of data, statistical analyses are of the following types:

1. Descriptive Analysis

Descriptive statistical analysis organizes and summarizes large data sets into graphs and tables. It involves processes such as tabulation and measures of central tendency, dispersion or variance, and skewness.

2. Inferential Analysis

Inferential statistical analysis extrapolates the data acquired from a small sample to the complete population. This analysis helps draw conclusions and make decisions about the whole population on the basis of sample data. It is a highly recommended statistical method for research projects that work with a smaller sample and aim to extrapolate conclusions to a larger population.
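One common form of inference is a confidence interval for a population mean. The sketch below is a minimal example using only the Python standard library; the measurements are hypothetical, and the interval uses a normal approximation, so for small samples a t-based interval would be the more appropriate choice.

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation (z critical value); this is an
    illustrative simplification, not a substitute for a t interval
    when the sample is small.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    centre = mean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width


# Hypothetical measurements from a sample of 30 specimens.
sample = [
    4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9,
    5.4, 4.6, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7,
    5.0, 5.1, 4.9, 5.2, 4.8, 5.0, 5.1, 4.9, 5.0, 5.2,
]

low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The interval describes the plausible range for the population mean given this one sample; a higher confidence level gives a wider interval.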

3. Predictive Analysis

Predictive analysis is used to make predictions about future events. It is used by marketing companies, insurance organizations, online service providers, data-driven marketers, and financial corporations.

4. Prescriptive Analysis

Prescriptive analysis examines data to find out what should be done next. It is widely used in business analysis to find the best possible outcome for a situation. It is closely related to descriptive and predictive analysis. However, prescriptive analysis goes further by recommending the most appropriate option among the available choices.

5. Exploratory Data Analysis

Exploratory data analysis (EDA) is generally the first step of the data analysis process, conducted before any other statistical analysis technique. It focuses on analyzing patterns in the data to recognize potential relationships. EDA is used to discover unknown associations within data, inspect the collected data for missing values and obtain maximum insight.

6. Causal Analysis

Causal analysis assists in understanding and determining why things happen the way they do. This analysis helps identify the root cause of failures, or simply find the basic reason why something could happen. For example, causal analysis is used to understand what will happen to one variable if another variable changes.

7. Mechanistic Analysis

This is the least common type of statistical analysis. Mechanistic analysis is used in big data analytics and the biological sciences. It aims to understand exactly how changes in individual variables cause corresponding changes in other variables, while excluding external influences.

Important Statistical Tools In Research

Researchers in the biological field often find statistical analysis the scariest aspect of completing research. However, statistical tools can help researchers understand what to do with their data and how to interpret the results, making the process as easy as possible.

1. Statistical Package for Social Science (SPSS)

It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. Moreover, it includes the option to create scripts that automate analysis or carry out more advanced statistical processing.

2. R Foundation for Statistical Computing

This software package is used in human behavior research and other fields. R is a powerful tool, but it has a steep learning curve and requires a certain level of coding. It also comes with an active community that is engaged in building and enhancing the software and its associated packages.

3. MATLAB (The Mathworks)

It is an analytical platform and a programming language. Researchers and engineers use this software to write their own code to help answer their research questions. While MATLAB can be a difficult tool for novices, it offers flexibility in terms of what the researcher needs.

4. Microsoft Excel

It is not the best solution for statistical analysis in research, but MS Excel offers a wide variety of tools for data visualization and simple statistics. It is easy to generate summaries and customizable graphs and figures. MS Excel is the most accessible option for those wanting to start with statistics.

5. Statistical Analysis Software (SAS)

It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-worthy figures, tables and charts.

6. GraphPad Prism

It is premium software used primarily by biology researchers, though it offers a range of features that make it useful in various other fields. Similar to SPSS, GraphPad offers a scripting option to automate analyses and carry out complex statistical calculations.

7. Minitab

This software offers basic as well as advanced statistical tools for data analysis. Similar to GraphPad and SPSS, Minitab can automate analyses, although doing so requires some command of scripting.

Use of Statistical Tools In Research and Data Analysis

Statistical tools manage large data sets. Many biological studies use large data sets to analyze trends and patterns. Using statistical tools therefore becomes essential, as they make processing such data more convenient.

Following these steps will help biological researchers present the statistics in their research in detail, develop accurate hypotheses and use the correct tools to test them.

There is a range of statistical tools that can help researchers manage their research data and improve the outcome of their research through better interpretation of data. Which tool you use will depend on your research question, your knowledge of statistics and your experience in coding.

Have you faced challenges while using statistics in research? How did you manage it? Did you use any of the statistical tools to help you with your research data? Do write to us or comment below!



Statistical Treatment Of Data

Statistical treatment of data is essential in order to make use of the data in the right form. Raw data collection is only one aspect of any experiment; the organization of data is equally important so that appropriate conclusions can be drawn. This is what statistical treatment of data is all about.


There are many techniques involved in statistics that treat data in the required manner. Statistical treatment of data is essential in all experiments, whether social, scientific or any other form. Statistical treatment of data greatly depends on the kind of experiment and the desired result from the experiment.

For example, in a survey regarding the election of a Mayor, parameters like age, gender, occupation, etc. would be important in influencing the person's decision to vote for a particular candidate. Therefore the data needs to be treated in these reference frames.

An important aspect of statistical treatment of data is the handling of errors. All experiments invariably produce errors and noise. Both systematic and random errors need to be taken into consideration.

Depending on the type of experiment being performed, Type-I and Type-II errors also need to be handled. These are false positives and false negatives, respectively, which are important to understand and minimize in order to make sense of the result of the experiment.
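To make the idea of a false positive concrete, the following sketch simulates many experiments in which the null hypothesis is actually true and counts how often a two-sided z-test rejects it anyway. Everything here is an illustrative assumption: the function name, the number of trials, the sample size, and the use of a z-test with a known standard deviation of 1.

```python
import random
from statistics import NormalDist


def false_positive_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated experiments that wrongly reject a true null.

    Each experiment draws n values from a standard normal distribution
    (true mean 0, known sigma 1) and tests H0: mean = 0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)  # standard error is sigma / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1  # a Type-I error: H0 is true but was rejected
    return rejections / trials


rate = false_positive_rate()
print(f"Observed Type-I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen significance level alpha, which illustrates that alpha is the Type-I error rate a researcher agrees to tolerate rather than something that can be driven to zero.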

Treatment of Data and Distribution

Classifying data into commonly known patterns is a tremendous help and is intricately related to statistical treatment of data. This is because distributions such as the normal probability distribution occur so commonly in nature that they are the underlying distributions in many medical, social and physical experiments.

Therefore, if a given sample is known to be normally distributed, the statistical treatment of data is made easier for the researcher, who already has a lot of supporting theory to draw on. Care should always be taken, however, not to assume that all data are normally distributed; normality should always be confirmed with appropriate testing.

Statistical treatment of data also involves describing the data. The best way to do this is through measures of central tendency such as the mean, median and mode. These help the researcher explain in short where the data are concentrated. Range, uncertainty and standard deviation help to understand the spread of the data. Two distributions with the same mean can therefore have wildly different standard deviations, which show how tightly the data points are concentrated around the mean.
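The point about equal means and different spreads can be checked with a few lines of Python. The two data sets below are made up purely for illustration.

```python
from statistics import mean, stdev

# Two hypothetical samples with the same mean but very different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("wide", wide)):
    data_range = max(data) - min(data)
    print(f"{name}: mean={mean(data)}, range={data_range}, "
          f"sample SD={stdev(data):.2f}")
```

Both samples have a mean of 50, but the sample standard deviation is about 1.58 for the first and about 31.62 for the second, so reporting the mean alone would hide how different they are.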

Statistical treatment of data is an important aspect of all experimentation today and a thorough understanding is necessary to conduct the right experiments with the right inferences from the data obtained.


Siddharth Kalla (Apr 10, 2009). Statistical Treatment Of Data. Retrieved Sep 05, 2024 from Explorable.com: https://explorable.com/statistical-treatment-of-data

You Are Allowed To Copy The Text

The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0) .

This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.

That is it. You don't need our permission to copy the article; just include a link/reference back to this page. You can use it freely (with some kind of link), and we're also okay with people reprinting in publications like books, blogs, newsletters, course-material, papers, wikipedia and presentations (with clear attribution).


Descriptive Statistics | Definitions, Types, Examples

Published on July 9, 2020 by Pritha Bhandari. Revised on June 21, 2023.

Descriptive statistics summarize and organize characteristics of a data set. A data set is a collection of responses or observations from a sample or entire population.

In quantitative research, after collecting data, the first step of statistical analysis is to describe characteristics of the responses, such as the average of one variable (e.g., age), or the relation between two variables (e.g., age and creativity).

The next step is inferential statistics, which help you decide whether your data confirms or refutes your hypothesis and whether it is generalizable to a larger population.


There are 3 main types of descriptive statistics:

  • The distribution concerns the frequency of each value.
  • The central tendency concerns the averages of the values.
  • The variability or dispersion concerns how spread out the values are.


You can apply these to assess only one variable at a time, in univariate analysis, or to compare two or more, in bivariate and multivariate analysis.

  • Go to a library
  • Watch a movie at a theater
  • Visit a national park


Frequency distribution

A data set is made up of a distribution of values, or scores. In tables or graphs, you can summarize the frequency of every possible value of a variable in numbers or percentages. This is called a frequency distribution.

Simple frequency distribution table

| Gender | Number |
|--------|--------|
| Male   | 182    |
| Female | 235    |
| Other  | 27     |

From this table, you can see that more women than men or people with another gender identity took part in the study.

Grouped frequency distribution table

In a grouped frequency distribution, you can group numerical response values and add up the number of responses for each group. You can also convert each of these numbers to percentages.

| Library visits in the past year | Percent |
|---------------------------------|---------|
| 0–4                             | 6%      |
| 5–8                             | 20%     |
| 9–12                            | 42%     |
| 13–16                           | 24%     |
| 17+                             | 8%      |
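As a rough illustration of how a simple frequency distribution and its percentages could be computed, here is a minimal Python sketch. The list of raw responses is hypothetical: it is built only so that its counts match the gender table above, and the variable names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical raw responses, constructed to match the counts in the gender table.
responses = ["Male"] * 182 + ["Female"] * 235 + ["Other"] * 27

# Simple frequency distribution: how often each value occurs.
counts = Counter(responses)
total = sum(counts.values())

# Convert each count to a percentage of all responses.
percentages = {value: 100 * n / total for value, n in counts.items()}

for value, n in counts.most_common():
    print(f"{value:<8}{n:>5}{percentages[value]:>7.1f}%")
```

Sorting with `most_common()` simply lists the largest group first; any other ordering would describe the same distribution.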

Measures of central tendency

Measures of central tendency estimate the center, or average, of a data set. The mean, median and mode are 3 ways of finding the average.

Here we will demonstrate how to calculate the mean, median, and mode using the first 6 responses of our survey.

The mean, or M, is the most commonly used method for finding the average.

To find the mean, simply add up all response values and divide the sum by the total number of responses. The total number of responses or observations is called N.

Mean number of library visits

| Step                      | Calculation                                         |
|---------------------------|-----------------------------------------------------|
| Data set                  | 15, 3, 12, 0, 24, 3                                 |
| Sum of all values         | 15 + 3 + 12 + 0 + 24 + 3 = 57                       |
| Total number of responses | N = 6                                               |
| Mean                      | Divide the sum of values by N to find M: 57/6 = 9.5 |

The median is the value that's exactly in the middle of a data set.

To find the median, order each response value from the smallest to the biggest. Then, the median is the number in the middle. If there are two numbers in the middle, find their mean.

Median number of library visits

| Step             | Calculation                                                 |
|------------------|-------------------------------------------------------------|
| Ordered data set | 0, 3, 3, 12, 15, 24                                         |
| Middle numbers   | 3, 12                                                       |
| Median           | Find the mean of the two middle numbers: (3 + 12)/2 = 7.5   |

The mode is simply the most popular or most frequent response value. A data set can have no mode, one mode, or more than one mode.

To find the mode, order your data set from lowest to highest and find the response that occurs most frequently.

Mode number of library visits

| Step             | Calculation                                    |
|------------------|------------------------------------------------|
| Ordered data set | 0, 3, 3, 12, 15, 24                            |
| Mode             | Find the most frequently occurring response: 3 |
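The three calculations above can be checked with Python's standard `statistics` module. This is only a sketch: the variable names are chosen for illustration, and `multimode` is used because it returns every most-frequent value, which covers the case of more than one mode.

```python
import statistics

# First 6 survey responses from the worked example.
visits = [15, 3, 12, 0, 24, 3]

mean_visits = statistics.mean(visits)      # sum of values divided by N
median_visits = statistics.median(visits)  # mean of the two middle values when N is even
modes = statistics.multimode(visits)       # list of all most frequent values

print("Mean:", mean_visits)
print("Median:", median_visits)
print("Mode(s):", modes)
```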

Measures of variability

Measures of variability give you a sense of how spread out the response values are. The range, standard deviation and variance each reflect different aspects of spread.

The range gives you an idea of how far apart the most extreme response scores are. To find the range, simply subtract the lowest value from the highest value.

Standard deviation

The standard deviation ( s or SD ) is the average amount of variability in your dataset. It tells you, on average, how far each score lies from the mean. The larger the standard deviation, the more variable the data set is.

There are six steps for finding the standard deviation:

  1. List each score and find their mean.
  2. Subtract the mean from each score to get the deviation from the mean.
  3. Square each of these deviations.
  4. Add up all of the squared deviations.
  5. Divide the sum of the squared deviations by N – 1.
  6. Find the square root of the number you found.

| Raw data | Deviation from mean | Squared deviation      |
|----------|---------------------|------------------------|
| 15       | 15 – 9.5 = 5.5      | 30.25                  |
| 3        | 3 – 9.5 = -6.5      | 42.25                  |
| 12       | 12 – 9.5 = 2.5      | 6.25                   |
| 0        | 0 – 9.5 = -9.5      | 90.25                  |
| 24       | 24 – 9.5 = 14.5     | 210.25                 |
| 3        | 3 – 9.5 = -6.5      | 42.25                  |
| M = 9.5  | Sum = 0             | Sum of squares = 421.5 |

Step 5: 421.5/5 = 84.3

Step 6: √84.3 = 9.18
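The six steps can be followed line by line in code. The sketch below uses the same six scores; the variable names are illustrative assumptions, and the standard library's `statistics.stdev` (which also divides by N – 1) is included only as a cross-check.

```python
import math
import statistics

scores = [15, 3, 12, 0, 24, 3]

mean = sum(scores) / len(scores)                    # Step 1: find the mean
deviations = [x - mean for x in scores]             # Step 2: deviation from the mean
squared_deviations = [d ** 2 for d in deviations]   # Step 3: square each deviation
sum_of_squares = sum(squared_deviations)            # Step 4: add them up
variance = sum_of_squares / (len(scores) - 1)       # Step 5: divide by N - 1
std_dev = math.sqrt(variance)                       # Step 6: take the square root

print("Sum of squares:", sum_of_squares)
print("Variance:", round(variance, 1))
print("Standard deviation:", round(std_dev, 2))
print("Library cross-check:", round(statistics.stdev(scores), 2))
```

Step 5 produces the sample variance, so the same code also illustrates how the variance and the standard deviation are related.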

The variance is the average of squared deviations from the mean. Variance reflects the degree of spread in the data set. The more spread the data, the larger the variance is in relation to the mean.

To find the variance, simply square the standard deviation. The symbol for variance is s².


Univariate descriptive statistics

Univariate descriptive statistics focus on only one variable at a time. It's important to examine data from each variable separately using multiple measures of distribution, central tendency and spread. Programs like SPSS and Excel can be used to easily calculate these.

| Statistic          | Visits to the library |
|--------------------|-----------------------|
| N                  | 6                     |
| Mean               | 9.5                   |
| Median             | 7.5                   |
| Mode               | 3                     |
| Standard deviation | 9.18                  |
| Variance           | 84.3                  |
| Range              | 24                    |

If you were to only consider the mean as a measure of central tendency, your impression of the “middle” of the data set can be skewed by outliers, unlike the median or mode.

Likewise, because the range is sensitive to outliers, you should also consider the standard deviation and variance to get easily comparable measures of spread.

Bivariate descriptive statistics

If you've collected data on more than one variable, you can use bivariate or multivariate descriptive statistics to explore whether there are relationships between them.

In bivariate analysis, you simultaneously study the frequency and variability of two variables to see if they vary together. You can also compare the central tendency of the two variables before performing further statistical tests.

Multivariate analysis is the same as bivariate analysis but with more than two variables.

Contingency table

In a contingency table, each cell represents the intersection of two variables. Usually, an independent variable (e.g., gender) appears along the vertical axis and a dependent one appears along the horizontal axis (e.g., activities). You read “across” the table to see how the independent and dependent variables relate to each other.

Number of visits to the library in the past year

| Group    | 0–4 | 5–8 | 9–12 | 13–16 | 17+ |
|----------|-----|-----|------|-------|-----|
| Children | 32  | 68  | 37   | 23    | 22  |
| Adults   | 36  | 48  | 43   | 83    | 25  |

Interpreting a contingency table is easier when the raw data is converted to percentages. Percentages make each row comparable to the other by making it seem as if each group had only 100 observations or participants. When creating a percentage-based contingency table, you add the N for each independent variable on the end.

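Visits to the library in the past year (Percentages)

| Group    | 0–4 | 5–8 | 9–12 | 13–16 | 17+ | N   |
|----------|-----|-----|------|-------|-----|-----|
| Children | 18% | 37% | 20%  | 13%   | 12% | 182 |
| Adults   | 15% | 20% | 18%  | 35%   | 11% | 235 |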

From this table, it is clearer that similar proportions of children and adults go to the library over 17 times a year. Additionally, children most commonly went to the library between 5 and 8 times, while for adults, this number was between 13 and 16.
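The conversion from raw counts to row percentages described above can be sketched in a few lines of Python. The dictionary layout and the function name `row_percentages` are assumptions made for this example; the counts are the ones from the contingency table.

```python
# Raw counts from the contingency table (visits to the library in the past year).
bins = ["0–4", "5–8", "9–12", "13–16", "17+"]
counts = {
    "Children": [32, 68, 37, 23, 22],
    "Adults": [36, 48, 43, 83, 25],
}


def row_percentages(row):
    """Return each cell as a whole-number percentage of its row total, plus the row N."""
    total = sum(row)
    return [round(100 * n / total) for n in row], total


table = {group: row_percentages(row) for group, row in counts.items()}

for group, (percentages, n) in table.items():
    cells = "  ".join(f"{label}: {p}%" for label, p in zip(bins, percentages))
    print(f"{group:<9}{cells}  N = {n}")
```

Dividing by the row total, rather than the grand total, is what makes the two groups comparable despite their different sizes.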

Scatter plots

A scatter plot is a chart that shows you the relationship between two or three variables. It's a visual representation of the strength of a relationship.

In a scatter plot, you plot one variable along the x-axis and another one along the y-axis. Each data point is represented by a point in the chart.

From your scatter plot, you see that as the number of movies seen at movie theaters increases, the number of visits to the library decreases. Based on your visual assessment of a possible linear relationship, you perform further tests of correlation and regression.

Descriptive statistics: Scatter plot


Frequently asked questions about descriptive statistics

Descriptive statistics summarize the characteristics of a data set. Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population.

The 3 main types of descriptive statistics concern the frequency distribution, central tendency, and variability of a dataset.

  • Distribution refers to the frequencies of different responses.
  • Measures of central tendency give you the average for each response.
  • Measures of variability show you the spread or dispersion of your dataset.

Descriptive statistics can also be grouped by the number of variables they cover:

  • Univariate statistics summarize only one variable at a time.
  • Bivariate statistics compare two variables.
  • Multivariate statistics compare more than two variables.

Cite this Scribbr article


Bhandari, P. (2023, June 21). Descriptive Statistics | Definitions, Types, Examples. Scribbr. Retrieved September 3, 2024, from https://www.scribbr.com/statistics/descriptive-statistics/


Statistical methods in research

Affiliation

  • The Sackler Institute of Pulmonary Pharmacology, School of Biomedical Science, King's College London, 5th Floor Franklin Wilkins Building, Waterloo Campus, London SE1 9NH, UK.
  • PMID: 21607874
  • DOI: 10.1007/978-1-61779-126-0_26

Statistical methods appropriate in research are described with examples. Topics covered include the choice of appropriate averages and measures of dispersion to summarize data sets, and the choice of tests of significance, including t-tests and a one- and a two-way ANOVA plus post-tests for normally distributed (Gaussian) data and their non-parametric equivalents. Techniques for transforming non-normally distributed data to more Gaussian distributions are discussed. Concepts of statistical power, errors and the use of these in determining the optimal size of experiments are considered. Statistical aspects of linear and non-linear regression are discussed, including tests for goodness-of-fit to the chosen model and methods for comparing fitted lines and curves.


How to... Choose the right statistical technique

It is well worth spending a little time considering how you will analyse your data before you design your survey instrument or start to collect any data. This will ensure that data are collected – and, more importantly, coded – in an appropriate way for the analysis you hope to do.

By Claire Creaser

On this page

  • Fundamentals
  • Basic techniques
  • Testing validity
  • Advanced techniques
  • Graphical presentation

Start to think about the techniques you will use for your analysis before you collect any data.

What do you want to know?

The analysis must relate to the research questions, and this may dictate the techniques you should use.

What type of data do you have?

The type of data you have is also fundamental – the techniques and tools appropriate to interval and ratio variables are not suitable for categorical or ordinal measures. (See How to collect data for notes on types of data.)

What assumptions can – and can’t – you make?

Many techniques rely on the sampling distribution of the test statistic being a Normal distribution (see below). This is always the case when the underlying distribution of the data is Normal, but in practice, the data may not be Normally distributed. For example, there could be a long tail of responses to one side or the other (skewed data). Non-parametric techniques are available to use in such situations, but these are inevitably less powerful and less flexible. However, if the sample size is sufficiently large, the Central Limit Theorem allows use of the standard analyses and tools.

Techniques for a non-Normal distribution

Parametric or non-parametric statistics

Parametric methods and statistics rely on a set of assumptions about the underlying distribution to give valid results. In general, they require the variables to have a Normal distribution.

Non-parametric techniques must be used for categorical and ordinal data, but for interval & ratio data they are generally less powerful and less flexible, and should only be used where the standard, parametric, test is not appropriate – e.g. when the sample size is small (below 30 observations).

Central limit theorem

As the sample size increases, the shape of the sampling distribution of the test statistic tends to become Normal, even if the distribution of the variable which is being tested is not Normal.

In practice, this can be applied to test statistics calculated from more than 30 observations.
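The effect described by the Central Limit Theorem can be seen in a small simulation. The sketch below is only an illustration under stated assumptions: the exponential distribution, the fixed random seed, the number of repeated samples and the `skewness` helper are all choices made for this example, not part of the original guidance. It draws strongly skewed data and shows that means of samples of 30 observations are far less skewed than the raw values.

```python
import random
import statistics

random.seed(42)       # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30      # the rule-of-thumb sample size mentioned above
N_SAMPLES = 2000      # assumed number of repeated samples for the simulation


def skewness(values):
    """Moment-based skewness: about 0 for a symmetric, Normal-like shape."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Raw observations from a skewed (exponential) distribution with mean 1.
raw = [random.expovariate(1.0) for _ in range(N_SAMPLES)]

# Means of many samples, each made of SAMPLE_SIZE observations.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

print("Skewness of raw data:     ", round(skewness(raw), 2))
print("Skewness of sample means: ", round(skewness(sample_means), 2))
print("Mean of sample means:     ", round(statistics.mean(sample_means), 3))
print("Spread of sample means:   ", round(statistics.stdev(sample_means), 3))
```

The spread of the sample means is an empirical estimate of the standard error, which is why it is much smaller than the spread of the raw data.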

How much can you expect to get out of your data?

The smaller the sample size, the less you can get out of your data. Standard error is inversely related to sample size (it falls in proportion to the square root of the number of observations), so the larger your sample, the smaller the standard error, and the greater the chance you will have of identifying statistically significant results in your analysis.

In general, any technique which can be used on categorical data may also be used on ordinal data. Any technique which can be used on ordinal data may also be used on ratio or interval data. The reverse is  not  the case.

Describing your data

The first stage in any analysis should be to describe your data, and hence the population from which it is drawn. The statistics appropriate for this activity fall into three broad groups, and depend on the type of data you have.

| Purpose                       | Type of data          | Technique                                                  |
|-------------------------------|-----------------------|------------------------------------------------------------|
| Look at the distribution      | Categorical / Ordinal | Plot the percentage in each category (column or bar chart) |
|                               | Ratio / Interval      | Histogram; cumulative frequency diagram                    |
| Describe the central tendency | Categorical           | n/a                                                        |
|                               | Ordinal               | Median; mode                                               |
|                               | Ratio / Interval      | Mean; median                                               |
| Describe the spread           | Categorical           | n/a                                                        |
|                               | Ordinal               | Range; inter-quartile range                                |
|                               | Ratio / Interval      | Range; inter-quartile range; variance; standard deviation  |

See Graphical presentation for descriptions of the main graphical techniques.

Mean – the arithmetic average, calculated by summing all the values and dividing by the number of values in the sum.

Median – the mid point of the distribution, where half the values are higher and half lower.

Mode – the most frequently occurring value.

Range – the difference between the highest and lowest value.

Inter-quartile range – the difference between the upper quartile (the value where 25 per cent of the observations are higher and 75 per cent lower) and the lower quartile (the value where 75 per cent of the observations are higher and 25 per cent lower). This is particularly useful where there are a small number of extreme observations much higher, or lower, than the majority.

Variance – a measure of spread, calculated as the mean of the squared differences of the observations from their mean.

Standard deviation – the square root of the variance.
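The measures of spread defined above can be computed with Python's standard `statistics` module, as in the sketch below. The data values are illustrative only. Two points are assumptions worth flagging: quartile conventions differ between textbooks and packages (this sketch uses the module's default method), and `pvariance` is used because it matches the definition given here, the mean of the squared differences, whereas many packages report the sample variance, which divides by N – 1 instead.

```python
import math
import statistics

data = [0, 3, 3, 12, 15, 24]   # illustrative values only

value_range = max(data) - min(data)

# Quartile cut points; other packages may use a slightly different convention.
lower_quartile, median, upper_quartile = statistics.quantiles(data, n=4)
inter_quartile_range = upper_quartile - lower_quartile

variance = statistics.pvariance(data)   # mean of squared differences from the mean
std_dev = math.sqrt(variance)           # square root of the variance

print("Range:", value_range)
print("Quartiles:", lower_quartile, median, upper_quartile)
print("Inter-quartile range:", inter_quartile_range)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```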

Differences between groups and variables

Chi-squared test – used to compare the distributions of two or more sets of categorical or ordinal data.

t-tests – used to compare the means of two sets of data.

Mann-Whitney U test (also known as the Wilcoxon rank-sum test) – non-parametric equivalent of the t-test for independent samples. Based on the rank order of the data, it may also be used to compare medians.

ANOVA – analysis of variance, to compare the means of more than two groups of data.

| Purpose                                      | Type of data          | Test                                    |
|----------------------------------------------|-----------------------|-----------------------------------------|
| Compare two groups                           | Categorical           | Chi-squared test                        |
|                                              | Ordinal               | Chi-squared test; Mann-Whitney U test   |
|                                              | Ratio / Interval      | t-test for independent samples          |
| Compare more than two groups                 | Categorical / Ordinal | Chi-squared test                        |
|                                              | Ratio / Interval      | ANOVA                                   |
| Compare two variables over the same subjects | Categorical / Ordinal | Chi-squared test                        |
|                                              | Ratio / Interval      | t-test for dependent samples            |

Relationships between variables

The correlation coefficient measures the degree of linear association between two variables, with a value in the range +1 to -1. Positive values indicate that the two variables increase and decrease together; negative values that one increases as the other decreases. A correlation coefficient of zero indicates no linear relationship between the two variables. The Spearman rank correlation is the non-parametric equivalent of the Pearson correlation.

| Type of data     | Technique                                          |
|------------------|----------------------------------------------------|
| Categorical      | Chi-squared test                                   |
| Ordinal          | Chi-squared test; Spearman rank correlation (rho)  |
| Ratio / Interval | Pearson correlation (r)                            |

Note that correlation analyses will only detect linear relationships between two variables. The figure below illustrates two small data sets where there are clearly relationships between the two variables. However, the correlation for the second data set, where the relationship is not linear, is 0.0. A simple correlation analysis of these data would suggest no relationship between the measures, when that is clearly not the case. This illustrates the importance of undertaking a series of basic descriptive analyses before embarking on analyses of the differences and relationships between variables.
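A small numerical sketch makes the same point. The two data sets below are hypothetical (they are not the ones in the figure), and `pearson_r` is a helper written for this example: one relationship is perfectly linear, the other is a perfect curve, yet only the first produces a non-zero correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical data: the same x values with a linear and a curved response.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # y rises steadily with x
curved = [v ** 2 for v in x]      # y depends on x, but not linearly

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)

print("Linear relationship:", round(r_linear, 3))
print("Curved relationship:", round(r_curved, 3))
```

Plotting both data sets before computing any coefficient would reveal the curve immediately, which is the practical argument for starting with descriptive analyses.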

Significance levels

The statistical significance of a test is a measure of probability: the probability that you would have obtained a result at least as extreme as the one observed in that sample if the null hypothesis you are testing (that there is no effect due to the parameters being tested) were true. The example below tests whether scores in an exam change after candidates have received training. The hypothesis suggests that they should, so the null hypothesis is that they won't.

In general, any level of probability above 5 per cent (p>0.05) is not considered to be statistically significant, and for large surveys 1 per cent (p>0.01) is often taken as a more appropriate level.

Note that statistical significance does not mean that the results you have obtained actually have value in the context of your research. If you have a large enough sample, a very small difference between groups can be identified as statistically significant, but such a small difference may be irrelevant in practice. On the other hand, an apparently large difference may not be statistically significant in a small sample, due to the variation within the groups being compared.

Degrees of freedom

Some test statistics (e.g. chi-squared) require the number of degrees of freedom to be known, in order to test for statistical significance against the correct probability table. In brief, the degrees of freedom is the number of values which can be assigned arbitrarily within the sample.

For example:

In a sample of size n divided into k classes, there are k-1 degrees of freedom (the first k-1 groups could be of any size up to n, while the last is fixed by the total of the first k-1 and the value of n). In numerical terms, if a sample of 500 individuals is taken from the UK, and it is observed that 300 are from England, 100 from Scotland and 50 from Wales, then there must be 50 from Northern Ireland. Given the numbers from the first three groups, there is no flexibility in the size of the final group. Dividing the sample into four groups gives three degrees of freedom.

In a two-way contingency table with p rows and q columns, there are (p-1)*(q-1) degrees of freedom (given the values of the first rows and columns, the last row and column are constrained by the totals in the table).
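To show where the degrees of freedom enter a test, here is a minimal sketch of a chi-squared calculation for a two-way contingency table. The observed counts are illustrative only, and the function name `chi_squared` is an assumption for this example. The critical value 9.488 is the standard tabulated chi-squared value for 4 degrees of freedom at the 5 per cent level; for other table sizes a different entry from the probability table is needed.

```python
def chi_squared(table):
    """Return the chi-squared statistic and degrees of freedom for a p x q table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Illustrative counts: 2 groups (rows) by 5 response categories (columns).
observed = [
    [32, 68, 37, 23, 22],
    [36, 48, 43, 83, 25],
]

statistic, df = chi_squared(observed)

CRITICAL_VALUE_5_PERCENT_DF_4 = 9.488   # tabulated value for df = 4, p = 0.05
significant = statistic > CRITICAL_VALUE_5_PERCENT_DF_4

print("Chi-squared:", round(statistic, 2))
print("Degrees of freedom:", df)
print("Significant at the 5 per cent level:", significant)
```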

One-tail or two-tail tests

If, as is generally the case, what matters is simply that the statistics for the populations are different, then it is appropriate to use the critical values for a two-tailed test.

If, however, you are only interested to find out if the statistic for population A has a larger value than that for population B, then a one-tailed test would be appropriate. The critical value for a one-tailed test is generally lower than for a two-tailed test, and should only be used if your research hypothesis is that population A has a greater value than population B, and it does not matter how different they are if population A has a value that is less than that for population B.

For example:

  1. Null hypothesis – there is no difference in mean exam scores before and after training (i.e. training has no effect on the exam score).
     Alternative – there is a difference in the mean scores before and after training (i.e. training has an unspecified effect).
     Use a two-tail test.

  2. Null hypothesis – training does not increase the mean score.
     Alternative – mean score increases after training.
     Use a one-tail test, if there is an observed increase in mean score. (If there is an observed fall in scores, there is no need to test, as you cannot reject the null hypothesis.)

  3. Null hypothesis – training does not cause mean scores to fall.
     Alternative – mean score falls after training.
     Use a one-tail test, if there is an observed fall in mean score. (If there is an observed increase in scores, there is no need to test, as you cannot reject the null hypothesis.)

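t-Test: Paired Two Sample for Means

|                         | Before | After  |
|-------------------------|--------|--------|
| Mean                    | 360.4  | 361.1  |
| Variance                | 46,547 | 46,830 |
| Observations            | 62     | 62     |
| Degrees of freedom (df) | 61     |        |
| t Stat                  | 1.79   |        |
| t Critical one-tail     | 1.67   |        |
| t Critical two-tail     | 2.00   |        |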

If the above test results were obtained, then under scenario 1, using a two-tail test, you might conclude that there was no statistically significant difference between the scores (p=0.08), and, as a consequence, that training had no effect. Similarly, under scenario 3, you would conclude that there is no evidence to suggest that training causes mean scores to fall, as they have in fact risen. However, under scenario 2, using a one-tail test, you would conclude that there was an increase in mean scores, statistically significant at the 5 per cent level (p=0.04).
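The decision logic in this example can be written out explicitly. In the sketch below, the t statistic and critical values are the ones reported in the test output above, while the `paired_t` helper and the short before/after score lists are hypothetical additions that only show how a paired t statistic is formed from the differences between the two measurements.

```python
import math
import statistics


def paired_t(before, after):
    """Return the paired t statistic and degrees of freedom for matched scores."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical scores for four candidates, before and after training.
example_t, example_df = paired_t(before=[10, 12, 9, 11], after=[12, 13, 9, 14])
print("Example t:", round(example_t, 3), "with df =", example_df)

# Values reported in the test output above (df = 61).
T_STAT = 1.79
T_CRITICAL_ONE_TAIL = 1.67
T_CRITICAL_TWO_TAIL = 2.00

# Scenario 1: any difference, so compare |t| with the two-tail critical value.
two_tail_significant = abs(T_STAT) > T_CRITICAL_TWO_TAIL

# Scenario 2: an increase only, so compare t with the one-tail critical value.
one_tail_increase_significant = T_STAT > T_CRITICAL_ONE_TAIL

print("Two-tail test significant:", two_tail_significant)
print("One-tail test (increase) significant:", one_tail_increase_significant)
```

The same t statistic leads to different conclusions only because the critical value depends on the hypothesis chosen before looking at the data.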

A final warning!

Statistical packages will do what you tell them, on the whole. They do not know whether the data you have provided is of good quality, or (with a very few exceptions) whether it is of an appropriate type for the analysis you have undertaken.

Rubbish in = Rubbish out!

These tools and techniques have specialist applications, and will generally be designed into the research methodology at an early stage, before any data are collected. If you are considering using any of these, you may wish to consult a specialist text or an experienced statistician before you start.

In each case, we give some examples of Emerald articles which use the technique. 

Factor analysis

To reduce the number of variables for subsequent analysis by creating combinations of the original variables measured which account for as much of the original variance as possible, but allow for easier interpretation of the results. Commonly used to create a small set of dimension ratings from a large number of opinion statements individually rated on Likert scales. You must have more observations (subjects) than you have variables to be analysed.

A Likert scale variable: "I like to eat chocolate ice cream for breakfast"

Strongly agree   1   2   3   4   5   Strongly disagree

A factor analysis of Page and Wong's servant leadership instrument. Rob Dennis and Bruce E. Winston. Leadership & Organization Development Journal, vol. 24 no. 8

Understanding factors for benchmarking adoption: New evidence from Malaysia. Yean Pin Lee, Suhaiza Zailani and Keng Lin Soh. Benchmarking: An International Journal, vol. 13 no. 5

Cluster analysis

To classify subjects into groups with similar characteristics, according to the values of the variables measured. You must have more observations than you have variables included in the analysis.

Organic product avoidance: Reasons for rejection and potential buyers' identification in a countrywide survey. C. Fotopoulos and A. Krystallis. British Food Journal, vol. 104 no. 3/4/5

Detection of financial distress via multivariate statistical analysis. S. Gamesalingam and Kuldeep Kumar. Managerial Finance, vol. 27 no. 4

Discriminant analysis

To identify those variables which best discriminate between known groups of subjects. The results may be used to allocate new subjects to the known groups based on their values of the discriminating variables

Methodology

Discriminant analysis was used to determine whether statistically significant differences exist between the average score profile on a set of variables for two a priori defined groups and so enabled them to be classified. Besides, it could help to determine which of the independent variables account the most for the differences in the average score profiles of the two groups. In this study, discriminant analysis was the main instrument to classify the benchmarking adopter and non-adopter. It was also utilised to determine which of the independent variables would contribute to benchmarking adoption.

Regression

To model how one dependent variable behaves depending on the values of a set of other, independent, variables. The dependent variable must be interval or ratio in type; the independent variables may be of any type, but special methods must be used when including categorical or ordinal independent variables in the analysis.

Developments in milk marketing in England and Wales during the 1990s. Jeremy Franks. British Food Journal, vol. 103 no. 9

Training under fire: The relationship between obstacles facing training and SMEs' development in Palestine. Mohammed Al Madhoun. Journal of European Industrial Training, vol. 30 no. 2
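As a rough sketch of the idea of modelling a dependent variable from independent variables, the example below fits a straight line by least squares. It is deliberately simplified: it uses a single independent variable rather than a set of them, and the data (hours of training against exam score) and the function name `fit_line` are hypothetical. A statistical package would normally be used for models with several independent variables.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours of training (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted_for_six_hours = intercept + slope * 6

print("Slope:", round(slope, 2))
print("Intercept:", round(intercept, 2))
print("Predicted score after 6 hours:", round(predicted_for_six_hours, 1))
```

The slope estimates how much the dependent variable changes for each unit increase in the independent variable, which is usually the quantity of interest in this kind of analysis.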

Time series analysis

To investigate the patterns and trends in a variable measured regularly over a period of time. May also be used to identify and adjust for seasonal variation, for example in financial statistics.

An analysis of the trends and cyclical behaviours of house prices in the Asian markets. Ming-Chi Chen, Yuichiro Kawaguchi and Kanak Patel. Journal of Property Investment & Finance, vol. 22 no. 1

Presenting data in graphical form can increase the accessibility of your results to a non-technical audience, and highlight effects and results which would otherwise require lengthy explanation, or complex tables. It is therefore important that appropriate graphical techniques are used. This section gives examples of some of the most commonly used graphical presentations, and indicates when they may be used. All, except the histogram, have been produced using Microsoft Excel®. 

Column or bar charts

There are four main variations, and whether you display the data in horizontal bars or vertical columns is largely a matter of personal preference.

To illustrate a frequency distribution in categorical or ordinal data, or grouped ratio/interval data. Usually displayed as a column graph.

Clustered column/bar

To compare categorical, ordinal or grouped ratio/interval data across categories. The data used in fig 4 are the same as those in Figs 5 and 6.

Stacked column/bar

To illustrate the actual contribution to the total for categorical, ordinal or grouped ratio/interval data by categories. The data used in Fig 5 are the same as those in Figs 4 and 6.

Percentage stacked column/bar

To compare the percentage contribution to the total for categorical, ordinal or grouped ratio/interval data across categories. The data used in fig 6 are the same as those in Figs 4 and 5.

Line graphs

To show trends in ordinal or ratio/interval data. Points on a graph should only be joined with a line if the data on the x-axis are at least ordinal. One particular application is to plot a frequency distribution for interval/ratio data (fig 8).

Pie charts

To show the percentage contribution to the whole of categorical, ordinal or grouped ratio/interval data.

Scatter graphs

To illustrate the relationship between two variables, of any type (although most useful where both variables are ratio/interval in type). Also useful in the identification of any unusual observations in the data.

Box and whisker plot

A specialist graph illustrating the central tendency and spread of a large data set, including any outliers.

Connecting Mathematics Brief explanations of mathematical terms and ideas

Statistics Glossary   compiled by Valerie J. Easton and John H. McColl of Glasgow University

Statsoft electronic textbook

100 Statistical Tests  by Gopal K. Kanji (Sage, 1993, ISBN 141292376X)

Oxford Dictionary of Statistics  by Graham Upton and Ian Cook (Oxford University Press, 2006, ISBN 0198614314)

Chapter 3 Research Design and Methodology

  • October 2019
  • Thesis for: Master of Arts in Teaching

Kenneth De la Piedra at University of Perpetual Help System Jonelta Pueblo de Panay


Standard statistical tools in research and data analysis

Introduction

Statistics is a field of science concerned with gathering, organising, analysing, and extrapolating data from samples to the entire population. This necessitates a well-designed study, a well-chosen study sample, and a proper statistical test selection. A good understanding of statistics is required to design epidemiological research or a clinical trial. Improper statistical approaches might lead to erroneous findings and unethical behaviour.

A variable is a trait that differs from one person to the next within a population. Quantitative variables are measured by a scale and provide quantitative information, such as height and weight. Qualitative variables, such as sex and eye colour, provide qualitative information (Figure 1).


Figure 1. Classification of variables [1]

Quantitative variables

Quantitative or numerical data are split into discrete and continuous measures. Continuous data can take on any value, whereas discrete numerical data are recorded as whole numbers such as 0, 1, 2, 3,… (integers). Discrete data are made up of countable observations, while continuous data are made up of measurable observations. Examples of discrete data include the number of respiratory arrest episodes or re-intubations in an intensive care unit. Continuous data include serial serum glucose levels, partial pressure of oxygen in arterial blood, and oesophageal temperature. A hierarchical scale of increasing precision can be used, based on categorical, ordinal, interval and ratio scales (Figure 1).

Descriptive statistics try to explain how variables in a sample or population are related. In the form of the mean, median, and mode, descriptive statistics give an overview of the data. Inferential statistics use a random sample of data drawn from a population to characterise and make inferences about that population as a whole. They are useful when it is not possible to investigate every single member of a group.


Descriptive statistics

The central tendency describes how observations cluster about a centre point, whereas the degree of dispersion describes the spread towards the extremes.
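As a minimal sketch of these two ideas, the Python snippet below computes common measures of central tendency (mean, median, mode) and of dispersion (range, sample standard deviation). The data values and the helper name describe are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of measurements (illustrative values only).
minutes = [510, 525, 530, 530, 540, 555, 560]


def describe(values):
    """Return measures of central tendency and dispersion for a list of numbers."""
    return {
        # Central tendency: where the observations cluster.
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Dispersion: how far the observations spread towards the extremes.
        "range": max(values) - min(values),
        "sample_sd": statistics.stdev(values),  # uses the n - 1 denominator
    }


summary = describe(minutes)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Reporting a measure of spread alongside the centre matters because two samples can share the same mean while differing widely in how tightly the observations cluster around it.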

Inferential statistics

In inferential statistics, data from a sample are analysed to draw conclusions about the entire population. The goal is to test hypotheses, that is, to find evidence for or against them. A hypothesis (plural hypotheses) is a suggested explanation for a phenomenon. Hypothesis testing is an essential process for making logical decisions about whether observed effects are real.
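One way to make hypothesis testing concrete is a permutation test, sketched below in Python. It is only one of many possible tests and is not prescribed by the text; the data, the function name permutation_test, and the choice of 5,000 shuffles are assumptions made for illustration. The null hypothesis is that group labels are unrelated to the measurements, so shuffling the labels should produce differences as large as the observed one reasonably often.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the estimated p-value: the share of
    label shuffles whose absolute difference in means is at least as large
    as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups (illustrative values only).
group_a = [552, 560, 548, 571, 566, 558, 563, 555]
group_b = [521, 530, 518, 527, 535, 512, 524, 529]

observed_diff, p_value = permutation_test(group_a, group_b)
print(f"observed difference in means: {observed_diff:.3f}")
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value indicates that a difference this large would rarely arise from random labelling alone; it does not by itself establish that the effect is large or clinically important.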

SOFTWARES FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS

There are several statistical software packages available today. The most commonly used software systems are Statistical Package for the Social Sciences (SPSS – manufactured by IBM Corporation), Statistical Analysis System (SAS – developed by SAS Institute, North Carolina, United States of America), Minitab (developed by Minitab Inc), R (designed by Ross Ihaka and Robert Gentleman from the R core team), Stata (developed by StataCorp), and MS Excel. There are several websites related to statistical power analysis. Here are a few examples:

  • StatPages.net – contains connections to a variety of online power calculators.
  • G-Power — a downloadable power analysis software that works on DOS.
  • ANOVA power analysis creates an interactive webpage that estimates the power or sample size required to achieve a specified power for one effect in a factorial ANOVA design.
  • Sample Power is software created by SPSS. It generates a comprehensive report on the computer screen that may be copied and pasted into another document.
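To show the kind of calculation such tools perform, here is a minimal Python sketch of a sample size estimate for comparing two group means. It uses the standard normal-approximation formula n = 2 × ((z for alpha/2 + z for power) × sigma / delta)² per group; the function name and the example inputs (a difference of 10 units, a standard deviation of 20) are hypothetical. Dedicated software typically uses the t distribution and may return a slightly larger number.

```python
import math
from statistics import NormalDist


def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided comparison
    of two means, using the normal approximation.

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole participant


# Hypothetical planning inputs (illustrative values only).
n_per_group = sample_size_two_means(delta=10, sigma=20)
print(f"participants needed per group: {n_per_group}")
```

The formula makes the trade-offs visible: halving the detectable difference roughly quadruples the required sample, and asking for more power or a stricter significance level also increases it.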

A researcher must be familiar with the most important statistical approaches for doing research. This will aid in the implementation of a well-designed study that yields accurate and valid data. Incorrect statistical approaches can result in erroneous findings and mistakes, and can reduce a paper’s importance. Poor statistics can lead to poor research, which can in turn lead to unethical practice. As a result, proper statistical understanding and the right application of statistical tests are essential. A thorough understanding of fundamental statistical methods will go a long way toward enhancing study designs and creating high-quality medical research that may be used to develop evidence-based guidelines.

[1] Ali, Zulfiqar, and S Bala Bhaskar. “Basic statistical tools in research and data analysis.” Indian journal of anaesthesia vol. 60,9 (2016): 662-669. doi:10.4103/0019-5049.190623




NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Exploratory data analysis: frequencies, descriptive statistics, histograms, and boxplots.

Jacob Shreffler; Martin R. Huecker.

Last Update: November 3, 2023.

  • Definition/Introduction

Researchers must utilize exploratory data techniques to present findings to a target audience and create appropriate graphs and figures. Researchers can determine if outliers exist, data are missing, and statistical assumptions will be upheld by understanding data. Additionally, it is essential to comprehend these data when describing them in conclusions of a paper, in a meeting with colleagues invested in the findings, or while reading others’ work.

  • Issues of Concern

This comprehension begins with exploring these data through the outputs discussed in this article. Individuals who do not conduct research must still comprehend new studies, and knowledge of fundamentals in analyzing data and interpretation of histograms and boxplots facilitates the ability to appraise recent publications accurately. Without this familiarity, decisions could be implemented based on inaccurate delivery or interpretation of medical studies.

Frequencies and Descriptive Statistics

Effective presentation of study results, in presentation or manuscript form, typically starts with frequencies and descriptive statistics (ie, means, medians, standard deviations). One can get a better sense of the variables by examining these data to determine whether a balanced and sufficient research design exists. Frequencies also inform on missing data and give a sense of outliers (discussed below).

Luckily, software programs are available to conduct exploratory data analysis. For this chapter, we will be examining the following research question.

RQ: Are there differences in drug life (length of effect) for Drug 23 based on the administration site?

A more precise hypothesis could be: Is drug 23 longer-lasting when administered via site A compared to site B?

To address this research question, exploratory data analysis is conducted. First, it is essential to start with the frequencies of the variables. To keep things simple, only the variables of minutes (drug life effect) and administration site (A vs B) are included. See Figure 1 for the frequency outputs.

Figure 1 shows that the administration site appears to be a balanced design with 50 individuals in each group. The excerpt for minutes frequencies is the bottom portion of Figure 1 and shows how many cases fell into each time frame, with the cumulative percent on the right-hand side. In examining Figure 1, one suspiciously low measurement (135) was observed, given the other values of the time variable. If a data point seems inaccurate, a researcher should find this case and confirm whether it was an entry error. For the sake of this review, the authors state that this was an entry error and should have been entered as 535 and not 135. Had the analysis occurred without checking this, the data analysis, results, and conclusions would have been invalid. While finding any entry errors and determining how groups are balanced, potential missing data are also explored. If not responsibly evaluated, missing values can nullify results.
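The frequency check described above can be sketched in a few lines of Python. The data below are hypothetical and much smaller than the 100 cases in the example, and the plausible range of 400 to 700 minutes is an assumed codebook limit, not a value taken from the chapter.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append(
            (value, counts[value], 100 * counts[value] / total, 100 * running / total)
        )
    return rows


# Hypothetical minutes data containing one suspicious entry (135).
minutes = [135, 520, 520, 525, 530, 530, 530, 535, 540, 545]

# Assumed codebook limits for a plausible measurement (hypothetical).
plausible_min, plausible_max = 400, 700

table = frequency_table(minutes)
print("value count percent cumulative")
for value, count, pct, cum_pct in table:
    print(f"{value:>5} {count:>5} {pct:>7.1f} {cum_pct:>10.1f}")

# Flag values to check against the original records before any analysis.
suspect = [v for v in minutes if not plausible_min <= v <= plausible_max]
print("values to verify:", suspect)
```

Flagged values should be traced back to the source records rather than changed or deleted automatically, since only the original record can confirm whether an entry error occurred.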

After replacing the incorrect 135 with 535, descriptive statistics, including the mean, median, mode, minimum/maximum scores, and standard deviation, were examined. Output for the research example for the variable of minutes can be seen in Figure 2. Observe each variable to ensure that the mean seems reasonable and that the minimum and maximum are within an appropriate range based on medical competence or an available codebook. One assumption common in statistical analyses is a normal distribution. Figure 2 shows that the mode differs from the mean and the median. Visualization tools such as histograms can be used to examine these scores for normality and outliers before making decisions.

Histograms are useful in assessing normality, as many statistical tests (eg, ANOVA and regression) assume the data have a normal distribution. When data deviate from a normal distribution, the deviation is quantified using skewness and kurtosis. [1] Skewness occurs when one tail of the curve is longer than the other. If the tail is longer on the left side of the curve (more cases on the higher values), the distribution is negatively skewed, whereas if the tail is longer on the right side, it is positively skewed. Kurtosis is another facet of normality. Positive kurtosis occurs when the distribution is more sharply peaked, with heavier tails, than a normal distribution, whereas negative kurtosis occurs when the distribution is flatter, with lighter tails. [2]
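As a rough illustration, skewness and kurtosis can be computed from the central moments of a sample. The Python sketch below uses the simple population-moment formulas and reports excess kurtosis (kurtosis minus 3, so that a normal distribution scores about 0); the function name and data are hypothetical. Statistical packages usually apply small-sample corrections, so their output will differ slightly.

```python
def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # asymmetry
    m4 = sum((x - mean) ** 4 for x in values) / n  # tail weight
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


# Hypothetical samples (illustrative values only).
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 2, 2, 3, 10]
long_left_tail = [-x for x in long_right_tail]

for label, sample in [
    ("symmetric", symmetric),
    ("long right tail", long_right_tail),
    ("long left tail", long_left_tail),
]:
    skew, kurt = skewness_and_excess_kurtosis(sample)
    print(f"{label}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

The sign of the skewness follows the longer tail: positive for a long right tail and negative for a long left tail.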

Additionally, histograms reveal outliers: data points either entered incorrectly or truly very different from the rest of the sample. When there are outliers, one must determine whether they reflect random chance or an error in the experiment and provide strong justification if the decision is to exclude them. [3] Outliers require attention to ensure the data analysis accurately reflects the majority of the data and is not influenced by extreme values; cleaning these outliers can result in better quality decision-making in clinical practice. [4] A common approach to screening a variable for outliers and approximate normality is converting values to z scores and determining if any scores are less than -3 or greater than 3. For a normal distribution, about 99.7% of scores should lie within three standard deviations of the mean. [5] Importantly, one should not automatically throw out any values outside of this range but consider them alongside the other factors mentioned above. Outliers are relatively common, so when these are prevalent, one must assess the risks and benefits of exclusion. [6]
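The z score screen can be sketched as follows in Python. The data are hypothetical, and the helper names z_scores and flag_outliers are illustrative. Note that with very small samples no value can reach a z score of 3 (with 10 observations the largest possible absolute z score is about 2.85), so this screen is only informative for reasonably sized samples.

```python
import statistics


def z_scores(values):
    """Convert each value to its distance from the mean in standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(x - mean) / sd for x in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical minutes data: 19 plausible values plus one suspicious entry (135).
minutes = [
    515, 520, 522, 525, 527, 528, 530, 530, 531, 532,
    533, 535, 536, 538, 540, 542, 545, 548, 550, 135,
]
print("flagged before correction:", flag_outliers(minutes))

# After confirming the entry error, the value is corrected to 535 and re-checked.
corrected = [535 if x == 135 else x for x in minutes]
print("flagged after correction:", flag_outliers(corrected))
```

A flagged value is a prompt for investigation, not an automatic exclusion; the decision to remove it still needs the justification described above.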

Figure 3 provides examples of histograms. In Figure 3A, 2 possible outliers causing kurtosis are observed. If only values within 3 standard deviations are used, the result in Figure 3B is observed. This histogram appears much closer to an approximately normal distribution, with the kurtosis having been treated. Remember, all evidence should be considered before eliminating outliers. When reporting outliers in scientific paper outputs, account for the number of outliers excluded and justify why they were excluded.

Boxplots can be used to check for outliers, assess the range of data, and show differences among groups. Boxplots provide a visual representation of ranges and medians, illustrating differences amongst groups, and are useful in various outlets, including evidence-based medicine. [7] Boxplots provide a picture of data distribution when there are numerous values and all values cannot be displayed individually (eg, in a scatterplot). [8] Figure 4 illustrates the differences between drug site administration and the length of drug life from the above example.

Figure 4 shows differences with potential clinical impact. Had any outliers existed (the data were cleaned at the histogram stage), they would appear beyond the ends of the lines. The red boxes represent the middle 50% of scores, so the bottom and top edges of each box mark the 25th and 75th percentiles. The line within each red box represents the median number of minutes for that administration site. The horizontal lines at the end of each vertical line connected to the red box mark the lowest and highest values that are not outliers. In examining the boxplots, an overlap in minutes between the 2 administration sites was observed: approximately the top 25 percent from site B had the same times as the bottom 25 percent at site A. Site B had a median under 525 minutes, whereas administration site A had a median greater than 550 minutes. If there were no differences in adverse reactions at site A, analysis of this figure provides evidence that healthcare providers should administer the drug via site A. Researchers could follow by testing a third administration site, site C. Figure 5 shows what would happen if site C led to a longer drug life compared to site A.
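The numbers behind a boxplot can be computed directly, as in the Python sketch below. The data for both sites are hypothetical, and the rule that treats points more than 1.5 interquartile ranges beyond the box as outliers is a common plotting convention assumed here, not something stated in the chapter; the software that produced Figure 4 may use different quartile or whisker rules.

```python
import statistics


def boxplot_summary(values):
    """Return the quartiles, whisker ends, and outliers that a boxplot would draw."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box: the middle 50% of scores
    lower_fence = q1 - 1.5 * iqr  # assumed 1.5 x IQR convention
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in values if x < lower_fence or x > upper_fence],
    }


# Hypothetical minutes of drug effect at two administration sites.
site_a = [540, 548, 552, 555, 558, 560, 563, 566, 571]
site_b = [505, 512, 518, 521, 524, 527, 530, 535, 541]

summary_a = boxplot_summary(site_a)
summary_b = boxplot_summary(site_b)
for label, summary in [("site A", summary_a), ("site B", summary_b)]:
    print(label, summary)
```

Comparing the two summaries side by side mirrors reading the figure: the medians show the typical difference between sites, while the box and whisker ends show how much the groups overlap.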

Figure 5 displays the same site A data as Figure 4, but something looks different. The large variance at site C makes site A’s variance appear smaller. In other words, patients who were administered the drug via site C had a larger range of scores. Thus, some patients experience a longer drug life when the drug is administered via site C than the median of site A; however, the broad range (lack of precision) and lower median should be the focus. The minutes are much more tightly clustered at site A. Site A therefore has both a higher median and a narrower range. One may conclude that this makes site A a more desirable site.

  • Clinical Significance

Ultimately, by understanding basic exploratory data methods, medical researchers and consumers of research can make quality and data-informed decisions. These data-informed decisions will result in the ability to appraise the clinical significance of research outputs. By overlooking these fundamentals in statistics, critical errors in judgment can occur.

  • Nursing, Allied Health, and Interprofessional Team Interventions

All interprofessional healthcare team members need to be at least familiar with, if not well-versed in, these statistical analyses so they can read and interpret study data and apply the data implications in their everyday practice. This approach allows all practitioners to remain abreast of the latest developments and provides valuable data for evidence-based medicine, ultimately leading to improved patient outcomes.


Exploratory Data Analysis Figure 1 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD

Exploratory Data Analysis Figure 2 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD

Exploratory Data Analysis Figure 3 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD

Exploratory Data Analysis Figure 4 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD

Exploratory Data Analysis Figure 5 Contributed by Martin Huecker, MD and Jacob Shreffler, PhD

Disclosure: Jacob Shreffler declares no relevant financial relationships with ineligible companies.

Disclosure: Martin Huecker declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Shreffler J, Huecker MR. Exploratory Data Analysis: Frequencies, Descriptive Statistics, Histograms, and Boxplots. [Updated 2023 Nov 3]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.



Long-Term Effects of Childhood Exposure to War on Domestic Violence

  • Original Article
  • Open access
  • Published: 22 August 2024


  • Joseph B. Ajefu &
  • Daniela Casale

This paper highlights the scarring effects of early life exposure to civil war, by examining the impact of exposure to conflict in childhood on the incidence of domestic violence in adulthood among married women. To estimate these effects, we use a difference-in-differences model which exploits variation in exposure to Nigeria’s 30-month-long civil war by year of birth and ethnicity. Our results, based on the 2008 Nigerian Demographic Health Survey, show that women exposed to the war during childhood are more likely to be victims of domestic violence in adulthood compared to those not exposed to the war, with larger effects observed for those exposed at younger ages. Additionally, we explore the mechanisms through which exposure to civil war might affect domestic violence and find some support for both the normalisation of violence and weakened bargaining power hypotheses. Understanding the root causes of domestic violence is important given the high prevalence in developing countries and the deleterious consequences for women and their children.



Introduction

Since World War II, almost one-third of all countries have experienced civil war, and the incidence of armed conflict has been on the rise (Gleditsch et al. 2002). In Sub-Saharan Africa specifically, nearly three-fourths of countries in the region have experienced civil war (Gleditsch et al. 2002). These conflicts have often led to considerable loss of lives, deterioration of physical and human capital, erosion of institutional capacity, and reduced economic growth (Akbulut-Yuksel and Yuksel 2017). It has been estimated, for instance, that between 2012 and 2017, the global economic costs of conflict increased from $12.62 trillion to $14.76 trillion, with many of the conflict-torn countries trapped in a perpetual cycle of violence (World Development Report 2011; World Humanitarian Data and Trends Report 2017; Institute for Economics and Peace 2018).

While the macroeconomic costs of war have long been studied in economics, literature on the microeconomic impacts of civil war, particularly in developing countries, has grown in the last 20 years especially, perhaps as more data have become available (Verwimp et al. 2019). Studies have shown that exposure to conflict is negatively associated with educational attainment (Singh and Shemyakina 2016; Chamarbagwala and Moran 2011; Shemyakina 2011; Swee 2015), health outcomes (Akresh et al. 2012a, 2012b; Grimard and Laszlo 2014; Weldeegzie 2017), social trust (Kijewski and Freitag 2018), and labour market outcomes (Galdo 2013; Islam et al. 2016).

In this paper, we add to this literature by exploring how exposure to conflict in childhood affects experiences of domestic violence among women in adulthood, using the case of the Nigerian civil war. Recent work suggests that exposure to war increases women’s likelihood of experiencing intimate partner violence across a range of contexts. La Mattina (2017) finds that exposure to the genocide in Rwanda increased the incidence of domestic violence among women who married after 1994 compared to those who married before the genocide occurred, with a larger effect for women in areas with high genocide intensity. Kelly et al. (2018) match district-level information on conflict-related fatalities during the civil war in Liberia from 1999 to 2003 to data on post-conflict intimate partner violence from the 2007 Demographic Health Survey (DHS). They find a strong effect of fatalities on the incidence of intimate partner violence, with 4–5 years of cumulative exposure having the strongest effect. In a similar vein, Østby et al. (2019) analyse the experiences of women in Peru during and after the civil war from 1980 to 2000 and find that those living in areas with higher exposure to conflict-related violence are at increased risk of violence in the home. Svallfors (2023) analyses DHS data from 2005 to 2015 for Colombia and shows that local-level exposure to armed conflict events, especially in the previous year, increased women’s likelihood of experiencing intimate partner violence.

In all these studies, the focus has been on the association between conflict exposure and domestic violence in adulthood, or on temporally proximate relationships. In our reading of the literature, we could find only one very recent published paper by Torrisi (2023) which tries to uncover whether the timing of exposure matters, and particularly whether exposure to armed conflict during childhood has long-lasting consequences for domestic violence in adulthood. Torrisi (2023) combines DHS data with geo-referenced information on the armed conflicts that occurred in four ex-Soviet countries (Armenia, Azerbaijan, Moldova, and Tajikistan) soon after the break-up of the USSR. She finds that women who were exposed to conflict by age 19 were more likely to experience domestic violence than those never exposed or not exposed by age 19, and that this effect is driven largely by exposure in the sensitive childhood period from 0 to 10 years of age (with no significant effect for those exposed at ages 11 to 15 or 16 to 19).

We also found two working papers that explore the relationship between childhood exposure and domestic violence in adulthood (Gutierrez and Gallegos 2016; La Mattina and Shemyakina 2017). Gutierrez and Gallegos (2016) use DHS data from Peru coupled with information on geographical variation in exposure to violent conflict to show that both women who were exposed at ages 0 to 8 and 9 to 16 experienced a higher incidence of domestic violence in adulthood compared to those not exposed. La Mattina and Shemyakina (2017) use the DHS data on selected Sub-Saharan African countries and exploit both temporal and geographical variation in conflict intensity between 1946 and 2006 across sub-national regions. Their results suggest that women who live in a region where there was an armed conflict when they were 6 to 10 years old are more likely to experience domestic violence than individuals not exposed to conflict by age 20, but they do not observe similar effects for individuals who were exposed to conflict at ages 0 to 5 or 11 to 20.

There is a common methodological thread that runs throughout all these studies: they use geo-referenced data on conflict-related violence combined with post-conflict data on domestic violence from the DHS surveys. In addition to imperfect matching at the sub-national or district level due to differences in levels of geographical disaggregation or demarcation between the two sources of data, a key concern with this approach is endogenous migration. The DHS only has information on the individual’s current place of residence and not on their residence in childhood or at the time of conflict. There is therefore no guarantee that the women who are currently living in a previously conflict-exposed area were also living there during childhood when the conflict took place. Indeed, endogenous migration is likely to be more of a concern during times of conflict, and the direction of the effect is difficult to predict. It is possible that the most vulnerable women (and men) may be displaced or forced to flee with their families during times of conflict, but it is also possible that the least vulnerable, those with better economic resources and social networks, are the ones who can more easily relocate to places of safety. To try and address this problem, many of the studies listed above restrict their samples to those who had never moved since birth or who had not moved in the previous five years, depending on the data available in the DHS. In doing so, however, they tend to lose 50 percent or more of their initial sample (Gutierrez and Gallegos 2016; La Mattina and Shemyakina 2017; Torrisi 2023), likely leading to biased results.

Our paper makes a useful methodological contribution to this growing literature on the long-term effects of war exposure by using what we consider to be a more robust method of identifying exposure than the commonly used geographical approach. We use ethnicity and birth cohort to identify exposure to conflict in childhood during the Nigerian civil war (following the approach adopted in Akresh et al. 2012a, 2023). We are able to adopt this approach because of the very specific nature of the Nigerian civil war, which occurred from 6 July 1967 to 15 January 1970, and which was restricted to the south-eastern region of Nigeria inhabited by the Igbos and other minority ethnic groups (which we will describe in more detail below). This strategy mitigates the problem of selective migration associated with the use of geography-based variables to identify exposure, a problem which is likely to be more pronounced during times of conflict.
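The logic of a difference-in-differences comparison based on ethnicity and birth cohort can be illustrated with a simplified two-by-two calculation in Python. This is only a sketch of the general technique: the group labels and rates below are hypothetical, not figures from the paper, and the authors' actual estimates come from a model that this toy calculation does not reproduce.

```python
def diff_in_diff(rates):
    """Two-by-two difference-in-differences from group means.

    rates maps (ethnic group, birth cohort) to the mean outcome, here the
    share of women reporting domestic violence in adulthood.
    """
    # Cohort gap within the ethnic group affected by the war.
    affected_gap = (
        rates[("war_affected", "exposed_cohort")]
        - rates[("war_affected", "later_cohort")]
    )
    # Same cohort gap within the comparison group, capturing general cohort trends.
    comparison_gap = (
        rates[("comparison", "exposed_cohort")]
        - rates[("comparison", "later_cohort")]
    )
    # The estimate is the extra gap seen only in the war-affected group.
    return affected_gap - comparison_gap


# Hypothetical shares (illustrative values only, not results from the paper).
rates = {
    ("war_affected", "exposed_cohort"): 0.30,
    ("war_affected", "later_cohort"): 0.22,
    ("comparison", "exposed_cohort"): 0.20,
    ("comparison", "later_cohort"): 0.18,
}

estimate = diff_in_diff(rates)
print(f"difference-in-differences estimate: {estimate:.2f}")
```

The comparison group nets out changes across birth cohorts that would have occurred without the war, under the assumption that both groups would otherwise have followed parallel trends.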

In addition, we examine exposure in early childhood using more granular age ranges than have currently been explored, namely those exposed in utero and at ages 0 to 4, 5 to 8, and 9 to 12. In doing so, we add to the growing body of literature in economics which recognises that there are long-run implications of early life shocks and that adverse circumstances during the sensitive early period of childhood can impact a range of later life outcomes (Case et al. 2005; Cunha and Heckman 2007; Almond and Currie 2011; Currie 2020). This includes increasing evidence that in utero exposure to shocks such as war, disease, and famine has long-term negative consequences on physical and mental health, educational attainment, earnings, and other socio-economic outcomes (Almond 2006; Camacho 2009; Almond and Currie 2011; Comfort 2016; Almond et al. 2018).

Finally, we try to unpack the mechanisms through which early life exposure to conflict affects experiences of domestic violence in adulthood, using the rich data available in the Nigerian Demographic Health Survey. We explore two possible channels. The first, the normalisation of violence hypothesis, relies on the well-known finding that children who witness violence at home are more likely to become a victim or perpetrator of domestic violence themselves in adulthood (Schwab-Stone et al. 1995; Gage 2005; Yount and Li 2009; Cesur and Sabia 2016; Jin et al. 2017). If war results in more intimate partner violence among married couples, as the evidence presented earlier suggests, we would expect children growing up during war to witness more violence among their parents than observably similar children. Even if children do not witness violence within their own homes, one might expect that children exposed to community-level violence through war during their formative years might also be more likely to view violence as a justifiable response to certain problems (Barnett et al. 2005; Fowler et al. 2009; Gutierrez and Gallegos 2016). To examine whether exposure to violence in childhood might have affected the formation of beliefs during the critical early years, we use data in the DHS on whether war-exposed women witnessed domestic violence in their homes as children and on women’s and men’s attitudes towards wife-beating in adulthood (Huber 2023).

The second hypothesis we explore is reduced bargaining power in the household, which would affect women’s options outside of the marriage and in turn increase their likelihood of being victims of domestic violence (Bhattacharyya et al. 2011 ; Heath 2014 ; La Mattina 2017 ). There are a number of reasons why women exposed to war may have fewer outside options. For instance, a number of studies in a range of countries have found evidence that civil conflict results in poorer educational outcomes (Akresh and Walque 2008 ; Leon 2012 ; Shemyakina 2011 ; Chamarbagwala and Moran 2011 ; and Dabalen and Paul 2014 ), and there is some evidence that exposure to conflict negatively affects girls more than boys (Singh and Shemyakina 2016 ). Women with lower education have fewer out-of-marriage options given their weaker labour market outcomes and increased financial dependence on their husbands, raising the likelihood of domestic violence (Lundberg and Pollak 1996 ; Farmer and Tiefenthaler 1997 ; Aizer 2010 ; Bhattacharyya et al. 2011 ; Eswaran and Malhotra 2011 ; Galdo 2013 ; Heath 2014 ). Moreover, war exposure can affect marriage, reproductive and health outcomes, which would have consequences for women’s intra-household bargaining power (Verwimp and van Bavel 2005 ; Aizer 2011 ; Akresh 2012a ; Islam et al 2016 ; Cetorelli and Khawaja 2017 ; La Mattina 2017 ). To measure women’s bargaining power in adulthood, we use the information in the DHS on women’s decision-making power in the household across a number of domains (Ajefu and Casale 2021 ).

Our main findings are as follows. We find that women exposed to the Nigerian civil war during childhood are more likely to be victims of domestic violence in adulthood compared to women not exposed to the civil war. Specifically, we find that exposure to the civil war is associated with an increase in the likelihood of being a victim of domestic violence of 1.2 percentage points compared to non-exposed cohorts (or 6% given the sample mean incidence of 19.7%). These effects appear to be more pronounced the earlier on one is exposed in childhood, with particularly large effects for those exposed in utero. While it is far more difficult to identify the channels through which exposure to the civil war affects domestic violence (particularly across the cohorts), in our exploratory work, we find some evidence to support both the normalisation of violence and bargaining power hypotheses.

The rest of the paper is structured as follows. Section 2 provides background information on the Nigerian civil war. Section 3 discusses the data and the empirical identification strategy, and presents some descriptive statistics. Section 4 presents the estimation results, and Section 5 concludes.

Background on the Nigerian Civil War

Under British colonial rule, Nigeria comprised three regions, namely the northern, western, and eastern regions. Footnote 1 Each of these regions had a predominant ethnic group, with the Hausa in the North, the Yoruba in the West, and the Igbo in the East. Like many countries in Africa, political and social conflict in Nigeria bore both ethnic and regional dimensions (Simpson 2014 ). In less than seven years after becoming an independent nation (on 1 October 1960), some of these long-standing tensions between the different groups intensified and the country was plunged into a civil war, also known as the Biafran War.

While the underlying geo-political causes of the war are too complex to explain here, some of the immediate causes of the Nigerian Civil War were the military coup on 15 January 1966, organised by primarily Igbo army officers, the counter-coup of 28 July 1966, and the subsequent persecution and killing of the Igbos in the Northern part of the country (Kirk-Greene 1971 ; Nafziger 1972 ). In response to this, there was a massive return migration of Igbos seeking refuge (estimated to involve around 1.5 million people) to their homeland in the south-eastern region (Aall 1970 ; Akresh et al 2012a ). On 30 May 1967, the south-eastern region declared itself the Republic of Biafra and this led to a full-blown civil war that began on 6 July 1967 (see Fig.  1 ).

Fig. 1 Map of Nigeria indicating the south-east states. The civil war was restricted to the south-east region that declared itself the Republic of Biafra

Nigeria’s Federal Military Government fiercely resisted the breakaway republic for two and a half years, using both their military might and their ability to impose a blockade of the landlocked territory (preventing the inflow of food, medicine, and other essential supplies). It has been estimated that between 1 and 3 million people died from the violence and mass starvation that ensued, in what was considered one of the bloodiest wars in sub-Saharan Africa (Akresh et al. 2012a ; Simpson 2014 ). The war ended on 15 January 1970 after the Republic of Biafra surrendered to the Nigerian troops.

Two key features of this devastating conflict are salient to our empirical strategy. First, because of the military blockade (which prevented movement of both people and supplies), the war was fought in the south-eastern region with direct civilian exposure largely restricted to this area (Akresh et al. 2012a ). Second, at the time of the war, most Igbos were living in their native states in the south-east, and many of those living outside the area returned there before the war to seek refuge in the mass migration that occurred just before secession was declared (Aall 1970 ). We can therefore use ethnicity and birth cohort to identify exposure to the civil war. This identification strategy is similar to that used by Akresh et al ( 2012a ) in their study on the impact of exposure to the Nigerian civil war on women’s stature in adulthood. This strategy is preferred to using current geographical demarcation, as is the case in other studies exploring the relationship between war exposure and domestic violence, as it circumvents the problem of selective migration (ethnicity is invariant to migration).

To investigate the impact of the Nigerian civil war on women’s experience of domestic violence in adulthood, we use the 2008 Nigerian Demographic Health Survey (DHS). The DHS is a large nationally representative cross-sectional survey conducted in a number of developing countries. It provides information on women between the ages of 15 and 49 years on a large number of demographic and socio-economic factors. The 2008 Nigerian DHS covered 34,070 households and 33,385 women. Footnote 2 We use the 2008 survey in this study for two main reasons: it is the first wave of the Nigerian DHS to collect information on the incidence of domestic violence among women; and given the timing of the war, this particular survey covers the largest sample of war-exposed women, allowing us to explore the effects of exposure in utero through to exposure at 12 years of age. Footnote 3

The information on domestic violence was collected through a specially designed questionnaire that was administered to one randomly selected woman in each household. Footnote 4 Women who were (or had been) married or cohabiting were asked in private about incidents of domestic violence as follows: “(Does/did) your (last) husband ever do any of the following things to you: (a) slap you? (b) twist your arm or pull your hair? (c) push you, shake you, or throw something at you? (d) punch you with his fist or with something that could hurt you? (e) kick you, drag you or beat you up? (f) try to choke you or burn you on purpose? (g) threaten or attack you with a knife, gun, or any other weapon? (h) physically force you to have sexual intercourse with him even when you did not want to? (i) force you to perform any sexual acts you did not want to?” We measure domestic violence using a binary variable that takes the value of 1 if a woman suffered any of the above-mentioned aggressive behaviours from her husband or partner and 0 otherwise.
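The construction of the binary outcome described above can be sketched as follows. This is a minimal illustration only: the item letters follow the questionnaire wording quoted in the text, but the dictionary layout, the function name, and the treatment of unanswered items as "no" are assumptions and do not reflect the actual DHS recode file.

```python
# Item letters (a)-(i) mirror the questionnaire items quoted in the text.
# The data layout below is hypothetical, not the DHS file format.
VIOLENCE_ITEMS = "abcdefghi"


def domestic_violence_indicator(responses):
    """Return 1 if any item (a)-(i) was answered yes, otherwise 0.

    `responses` maps an item letter to True/False. Items that are absent
    are treated as False (an assumption made for this sketch; the paper
    does not say how missing answers are handled).
    """
    return int(any(responses.get(item, False) for item in VIOLENCE_ITEMS))
```

For example, a respondent who answers yes only to item (h) is coded 1, while a respondent who answers no to every item is coded 0.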

Empirical Identification Strategy

To estimate the causal impact of exposure to the civil war in childhood on experiences of domestic violence in adulthood, we adopt a difference-in-differences strategy. As described above, our identification strategy exploits variation in exposure to the civil war by birth cohort and ethnicity. This estimation strategy minimises the problem of selective migration associated with the use of geographical variation in conflict exposure and helps to circumvent one of the limitations of the Nigerian DHS, namely, that it only has information on the current residence of respondents but no information on their place of birth or their place of residence during the war.
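The logic of a difference-in-differences comparison across ethnicity and birth cohort can be illustrated with a simple two-by-two calculation on cell means. This is only a sketch of the idea under stated assumptions: the paper estimates a regression with controls and fixed effects rather than raw cell means, and the function name and record layout here are hypothetical.

```python
from statistics import mean


def did_estimate(records):
    """Two-by-two difference-in-differences on cell means.

    `records` is an iterable of (war_ethnicity, exposed_cohort, outcome)
    tuples, where the first two entries are booleans and the outcome is 0/1.
    Returns (exposed - non-exposed cohort) for war-exposed ethnicities
    minus the same difference for the other ethnicities.
    """
    cells = {}
    for war_ethnicity, exposed_cohort, outcome in records:
        cells.setdefault((war_ethnicity, exposed_cohort), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    first_difference = m[(True, True)] - m[(True, False)]
    second_difference = m[(False, True)] - m[(False, False)]
    return first_difference - second_difference
```

The second difference removes any change across cohorts that is common to all ethnic groups, which is why the approach relies on the two groups following parallel trends in the absence of the war.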

We define the treatment or war-exposed group as those Igbo and other minority ethnic groups (who would have been in the south-eastern region when the war was fought) born between 1958 and October 1970. These women were between 0 and 12 years old (including in utero) when the war took place between July 1967 and January 1970, and are aged 38 to 49 years in 2008 when we observe their experiences of domestic violence.

We present two distinct control groups: i) one across time, i.e. women from the war-exposed ethnicities but born in the six-year period following the war, namely from November 1970 to December 1976 (and aged 32 to 38 years in 2008), Footnote 5 and ii) one across ethnicity, i.e. the same birth cohorts (1958–1976) but from the non-war-exposed ethnicities (predominant in the other regions of Nigeria). Table 1 summarises birth cohorts for the war-exposed and non-exposed groups, respectively.
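The assignment of women to cohorts and comparison groups can be sketched as below. The date ranges are taken from the text; the function names, the labels, and the choice to end the 0 to 4 cohort in January 1970 (so that it does not overlap with the in utero cohort born from February 1970) are assumptions made for this illustration.

```python
def birth_cohort(year, month):
    """Classify a birth date into the cohorts described in the text."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    born = (year, month)
    if (1970, 11) <= born <= (1976, 12):
        return "post_war"      # control cohort across time
    if (1970, 2) <= born <= (1970, 10):
        return "in_utero"
    if (1966, 1) <= born <= (1970, 1):
        return "age_0_4"       # January 1970 cut-off is an assumption
    if 1962 <= year <= 1965:
        return "age_5_8"
    if 1958 <= year <= 1961:
        return "age_9_12"
    return None                # outside the 1958-1976 analysis window


def comparison_group(year, month, war_ethnicity):
    """Return the group a woman falls into, or None if out of sample.

    `war_ethnicity` is True for Igbo and the other south-eastern minority
    groups; how ethnicity is coded in the survey is not shown here.
    """
    cohort = birth_cohort(year, month)
    if cohort is None:
        return None
    if not war_ethnicity:
        return "control_across_ethnicity"
    return "control_across_time" if cohort == "post_war" else "treated"
```

A war-ethnicity woman born in March 1968 falls in the treated group, while a woman of the same ethnicity born in January 1972 falls in the control group across time.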

We estimate Eq. ( 1 ) below:

where \(Y_{ijt}\) is equal to one (zero otherwise) if individual i belonging to ethnicity j and born in year t was a victim of domestic violence in adulthood. \(war_{ethnicity}\) denotes Igbo or other minority ethnic groups in the south-east region and \(Cohort_{it}\) includes four cohorts, namely those exposed to war in utero (born between February and October 1970), those exposed between 0 and 4 years (born 1966–1970), those exposed between 5 and 8 years (born 1962–1965), and those exposed between 9 and 12 years (born 1958–1961), where the omitted category is those born between November 1970 (i.e. nine months after the war) and December 1976. The interactions of war ethnicity with each of the four cohorts are the variables of interest and capture the effect of an individual’s exposure to the civil war on the incidence of domestic violence. \(X_{ij}\) is a vector of individual and household characteristics, which includes age at first marriage, religion, education, urban residence, and household wealth; \(\delta_{r}\) is a state fixed effect; and \(\varepsilon_{ijt}\) is a random, idiosyncratic error term. We estimate the regressions using ordinary least squares (OLS) (although the results are robust to using probit regressions), and standard errors are clustered at the ethnicity level to account for serial correlation (Bertrand et al. 2004).
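Written out from the definitions above, the specification takes roughly the following form. This is a reconstruction: the coefficient symbols \(\beta\), \(\gamma\), \(\theta\) and \(\lambda\), the cohort index k, and the inclusion of the main effects are assumptions rather than the authors' notation.

\[
Y_{ijt} = \beta_{0} + \sum_{k=1}^{4} \beta_{k}\,\bigl(war_{ethnicity} \times Cohort_{it}^{k}\bigr) + \gamma\, war_{ethnicity} + \sum_{k=1}^{4} \theta_{k}\, Cohort_{it}^{k} + X_{ij}'\lambda + \delta_{r} + \varepsilon_{ijt}
\]

Here \(Cohort_{it}^{k}\) is an indicator for the k-th exposed cohort (in utero, 0–4, 5–8, 9–12), the post-war cohort is the omitted category, and the \(\beta_{k}\) are the difference-in-differences coefficients of interest.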

Summary Statistics

Table 2 reports the summary statistics for our sample of married/cohabiting women from whom domestic violence data were collected. The average age of women in this sample was 39 years, the average age at first marriage was 19 years, around 47% of women in the sample had completed at least primary education, and 32% were resident in urban areas. Among the women who were surveyed, 20% said they had experienced at least one type of domestic violence from their partner.

To explore the normalisation of violence and bargaining power hypotheses as potential mechanisms through which exposure to conflict affects the incidence of domestic violence, we also examine data on attitudes towards domestic violence, domestic violence among parents, and decision-making in the household. The summary statistics for these variables are also shown in Table 2. On average, 34% of the women in the sample responded that domestic violence is justified if the woman goes out without informing the husband/partner, 32% felt it was justified if a woman neglects the children, 29% felt it was justified if a woman argues with her husband/partner, 26% felt it was justified if a woman refuses to have sex with her husband/partner, and 17% justified violence if a woman burns the food. Nearly 13% of women reported witnessing domestic violence among their own parents. In terms of household decision-making, 12% of women reported having the final say on own health care, 7% reported having the final say on large household purchases, 20% reported having the final say on household purchases for daily needs, and 14% reported having the final say on visits to family or relatives.

Table 3 shows that there are large and significant differences in these variables by war exposure. Just under 18% of the non-exposed group reported being victims of domestic violence, compared to 27% of the war-exposed group. Moreover, 11% of the non-exposed group witnessed domestic violence among their parents, compared to 19% of the war-exposed group. There are also statistically significant differences in attitudes towards domestic violence, with war-exposed women more likely to report that wife-beating was justified in certain circumstances. For example, 15% of the non-exposed group justified wife-beating if a woman refuses to have sex with her partner compared to 30% of the war-exposed group. In terms of household decision-making, statistically significant differences are observed in three out of the four domains, with war-exposed women less likely to report having the final say on own health care, purchases for daily needs and visits to family and friends.

Figure  2 presents a box plot of our main variable of interest, the incidence of domestic violence, across the cohorts. Within each birth cohort, the incidence of domestic violence is clearly higher for the war-exposed ethnic groups compared to the non-exposed ethnic groups, and the difference between the two appears larger for those exposed at younger ages. However, these are unconditional estimates, and it remains to be seen whether these effects will hold in the multivariate difference-in-differences analysis, which we present in the next section.

Fig. 2 Box plot showing the incidence of domestic violence across the cohorts for the exposed and non-exposed ethnicities

Exposure to Civil War and Domestic Violence

Table 4 presents the results from a series of equations which estimate the effect of exposure to the civil war in childhood (in utero to age 12) on the incidence of domestic violence in adulthood, without disaggregating by birth cohort. The coefficients on the interaction term suggest a positive and significant effect of war exposure in childhood on the incidence of domestic violence among women in adulthood. The size of the coefficient tends to fall as an increasing number of controls are added between columns 1 and 4. The regression in column 4 includes controls for individual and household characteristics and fixed effects for state, ethnicity, and cohort, and is our preferred specification. The coefficient from this regression suggests that exposure to the civil war increases the likelihood of being a victim of domestic violence by 1.2 percentage points (or 6% given the sample mean incidence of 19.7%). Footnote 6
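The conversion from a percentage-point effect to a relative effect quoted above is simple arithmetic, sketched here with a hypothetical helper function. The inputs are the figures reported in the text (a 1.2 percentage point effect and a sample mean incidence of 19.7%).

```python
def relative_effect(effect_pp, baseline_pct):
    """Express a percentage-point effect as a percent of the baseline rate."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * effect_pp / baseline_pct


# 1.2 percentage points on a baseline of 19.7% is roughly a 6% increase.
main_effect_relative = relative_effect(1.2, 19.7)
```

The distinction matters for interpretation: the coefficient is an absolute change in probability, while the 6% figure is that change measured against how common domestic violence is in the sample.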

In Table  5 , we disaggregate exposure to the civil war by birth cohort to test whether the effects of civil war exposure on domestic violence vary by the age at which the women were exposed to the war in childhood. The categories represent those exposed in utero (born between February 1970 and October 1970), those exposed between the ages of 0–4 (born 1966–1970), those exposed between the ages of 5–8 (born 1962–1965), and those exposed between the ages of 9–12 (born 1958–1961). From the estimates, we find that the effects are largest for those exposed at younger ages. Specifically, exposure to the civil war in utero increases the probability of experiencing domestic violence in adulthood by 7.4 percentage points, and exposure to the civil war between 0 and 4 years increases the probability of experiencing domestic violence by 1.7 percentage points (specification 4).

These results are consistent with the increasing evidence described earlier that there are long-run implications of early life shocks and that adverse circumstances during the sensitive early period of childhood impact later life outcomes (Case et al. 2005; Cunha and Heckman 2007; Currie 2020). This includes a growing body of literature showing that in utero exposure to shocks such as war, drought, and famine has long-term negative consequences.

This literature draws on the ‘fetal origins’ hypothesis, which proposes that conditions in utero, particularly nutrition, ‘program’ the foetus with particular metabolic features that can result in disease later on in life (Barker 1990, 1995). Studies have found evidence to link events or circumstances in utero to birth weight, adult height, disability, heart disease, and obesity, suggesting latent and long-lasting consequences on health outcomes (Ravelli et al 1976; Dunn 2007; Camacho 2009; Almond and Currie 2011; Comfort 2016). In addition, there is evidence to suggest negative effects on mental health and cognitive function as well as on education, employment, and adult earnings, implying potential neurological involvement (Hoek et al 1998; Almond 2006; Almond et al. 2018).

Almond et al (2018) summarise a number of ‘biological’ or direct mechanisms through which foetal-origin effects can be generated, including nutritional insults, infectious disease, maternal stress, and alcohol and tobacco use, all of which would likely be more prevalent during times of war. In addition to the direct biological mechanisms, there may be social and economic factors at play that reinforce the negative outcomes. However, as Almond and Currie (2011) and Almond et al (2018) point out in their extensive reviews of this wide-ranging literature, more work is needed to disentangle the biological from the more indirect socio-economic mechanisms. Examples of such socio-economic mechanisms during war could include lack of access to health and policing services, disruption of markets and other key institutions, disturbance of family life, established norms and social networks, and changes to parenting behaviour. We reflect on some of these issues further below when looking at the mechanisms through which exposure to war might affect domestic violence in adulthood.

Robustness Checks

To test the robustness of our difference-in-differences strategy, which assumes parallel trends, we estimate two placebo regressions (using methods similar to those in, e.g., Akresh et al. 2012a; Gutierrez and Gallegos 2016; and Weldeegzie 2017). In the first test (column 1 of Table 6), we exclude the main war-exposed ethnicities (Igbo and other ethnic minorities) and placebo-treat the ethnic groups in the northern part of the country (Kanuri, Hausa, and Fulani), with the remaining ethnicities used as the control group. We choose the northern part of the country given the geographical distance from the area where the war was fought. In the second test (column 2), we placebo-treat the cohort born immediately after the civil war (from 1971 to 1976), with the cohort born from 1977 to 1980 used as the control group. Footnote 7 We would not expect an effect for women born after the civil war. Neither of the coefficients on the placebo-treated interaction term in Table 6 is statistically significant, providing support in favour of our identification strategy. Footnote 8

Although we chose to use the DHS 2008 for this study, as it provides the largest sample of women exposed to the war in childhood (from in utero to age 12), we also check whether our main results hold using the later round of the DHS from 2013. Column 1 of Table  7 shows the estimated effect of war exposure in childhood (without disaggregating across the cohorts) when only the 2013 sample is used, and column 2 of Table  7 shows the estimated effect when the 2008 and 2013 samples are pooled. The results remain robust, with the effect even larger at 5.4 percentage points in column 1 and 4.7 percentage points in column 2 (compared to the 1.2 percentage points estimated in column 4, Table  4 , using the same specification).

In column 3 of Table 7, we disaggregate the war-exposed women into the four birth cohorts using the pooled sample from 2008 and 2013. Footnote 9 Again, we find the strongest effect from exposure in utero of 5.1 percentage points (compared to 7.4 percentage points in column 4 of Table 5, using the same specification). However, in the pooled sample, we also find a significant effect for those exposed between the ages of 9 and 12. On the whole, though, our robustness checks support our main findings, namely that war exposure in childhood results in a higher incidence of domestic violence among women in adulthood, and that exposure in utero appears to have the strongest effect.

Potential Mechanisms Through Which Civil War Affects Domestic Violence

Normalisation of Violence Hypothesis

This section explores two potential mechanisms through which exposure to civil war during childhood may affect the incidence of domestic violence in adulthood. The first is the normalisation of violence hypothesis, which has also been referred to as the intergenerational transmission of violence hypothesis or the model of social learning. Exposure to violence at home during a child’s formative years is known to result in a greater likelihood of being a victim or perpetrator of domestic violence in adulthood (Schwab-Stone et al. 1995 ; Gage 2005 ; Mihalic and Elliott 2007; Yount and Li 2009 ; Cesur and Sabia 2016 ; Jin et al. 2017 ). Along the same lines, one might expect that children exposed to community-level violence during war might also be more likely to view violence as a justifiable response to certain problems (Barnett et al. 2005 ; Fowler et al. 2009 ). In Table  8 , we estimate the effect of women’s exposure to the civil war on the justification of domestic violence to test whether women who were exposed to the conflict in childhood have different attitudes towards domestic violence in adulthood.

Most of the coefficients are positive, many are statistically significant, and some are quite large. In general, the results suggest that, across the birth cohorts, women exposed to the war in childhood are more likely to justify the use of wife-beating than non-exposed women, particularly if the woman argues with her husband, refuses to have sex with him, or burns the food. For example (from row 1), women exposed to war in utero were 2.4 percentage points more likely to justify wife-beating if the woman argues with her husband and 6 percentage points more likely to justify wife-beating if she burns the food, compared to the non-exposed group. The effects are similarly large (and in some cases larger) among those exposed between the ages of 0–4, 5–8, and 9–12, depending on the question asked.

In Table  9 , we use the matched couple’s recode data from the DHS Footnote 10 to investigate the effect of husbands’ exposure to the civil war on the justification of domestic violence in adulthood. This recognises that domestic violence involves both a perpetrator and a victim. Given the high degree of assortative mating by ethnicity in Nigeria, the majority of women who were exposed to the civil war are married to men who were also exposed to the civil war. Indeed, the DHS data indicate that 93.4% of war-exposed women were married to war-exposed men (with only 6.3% of non-exposed women married to war-exposed men). Footnote 11 Because the DHS interviews men aged 15–59, we can disaggregate exposure into in utero, between the ages of 0–4 (born 1966–1970), between the ages of 5–8 (born 1962–1965), between the ages of 9–12 (born 1958–1961), and between the ages of 13–22 (born 1948–1957). The results suggest that compared to non-exposed men, war-exposed men are more likely to justify the use of wife-beating. Although the pattern is not entirely consistent across the five columns, the effect is largest for cohorts of men exposed in utero and between the ages of 9–12 and 13–22.

In addition to being exposed to more community-level violence growing up during war, and marrying men similarly exposed as children, the women exposed to war in childhood may also have been witness to more domestic violence in their own childhood homes or more violent forms of parenting. This could be the case if the stresses and violence of war and the disruption to social norms and family life in turn led to more violence among the parents. The literature summarised in the introduction certainly suggests that intimate partner violence rises during times of war and conflict among married or partnered couples (La Mattina 2017 ; Kelly et al. 2018 ; Østby et al 2019 ; Svallfors 2023 ). The questionnaire asks women if they were aware of domestic violence among their parents, specifically whether the father ever ‘beat’ the mother. We find that 11 percent of women not exposed to the war in childhood were aware of domestic violence among their parents, compared to 19 percent of war-exposed women. This is a substantial and significant difference.

We include this variable as an explanatory variable in the regression and we also interact this variable with the war exposure variables to test whether the effect is stronger for those growing up in the midst of the war. Indeed, in Table  10 , we find a strong positive effect of witnessing domestic violence among one’s parents on the likelihood of becoming a victim oneself in adulthood, and particularly for those exposed to the war in utero. This is a striking result and could suggest that the levels of violence in those war-exposed families where the mother was pregnant were particularly severe, as the combined stresses of war and having another child on the way took their toll. It is also possible that the final months of the war (when these exposed women would have been in utero) were particularly intense, and so the effect on family life more substantial. Finally, disruptions during war to the resources that would ordinarily help mitigate the negative effects of intimate partner violence, such as health and policing services and established social networks, might have exacerbated the experiences of pregnant mothers in particular.

Bargaining Power Hypothesis

The second mechanism we explore is the intra-household bargaining power hypothesis. Women with limited resources tend to have fewer outside options which can result in an increased likelihood that they will be victims of domestic violence (Gelles 1976 ; Aizer 2010 ). The literature on the effects of conflict provides a number of reasons why women exposed to war may have fewer outside options. Civil conflict results in poorer educational outcomes (Akresh and Walque 2008 ; Leon 2012 ; Shemyakina 2011 ; Chamarbagwala and Moran 2011 ; and Dabalen and Paul 2014 ), and there is evidence that exposure to conflict negatively affects girls more than boys in terms of educational outcomes (Singh and Shemyakina 2016 ). Women with lower education have fewer out-of-marriage options given their weaker labour market outcomes and increased financial dependence on their husbands (Lundberg and Pollak 1996 ; Farmer and Tiefenthaler 1997 ; Aizer 2010 ; Bhattacharyya et al. 2011 ; Eswaran and Malhotra 2011 ; Galdo 2013 ; Heath 2014 ). Furthermore, war exposure can affect marriage, reproductive and health outcomes, which would have consequences for women’s intra-household bargaining power and experiences of domestic violence (Verwimp and van Bavel 2005 ; Akresh 2012a; Grimard and Laszlo 2014 ; Islam et al 2016 ; Cetorelli and Khawaja 2017 ; La Mattina 2017 ).

We test whether war-exposed women have lower bargaining power compared to non-exposed women using the information on decision-making in the household as a proxy. Specifically, we examine whether war-exposed women are less likely to have the final say on certain key decisions in the household compared to non-exposed women. The results in Table  11 show that while most of the coefficients are negative, as predicted, not all are significant. The strongest results are for those exposed in utero; exposure to the civil war decreases the probability of these women having a final say on their own health care by 5.4 percentage points, and on household purchases of daily needs by 8 percentage points. There are also some significant effects, ranging between 3.6 and 5.6 percentage points, for those exposed to the war between the ages of 5–8 and 9–12 for a number of the outcomes.

Conclusions and Policy Implications

In this paper, we examine the impact of exposure to war during childhood on women’s experience of domestic violence in adulthood. Unlike other studies that use current geography-based variables to identify exposure to conflict, we are able to use ethnicity and birth cohort given the nature of the Nigerian civil war, thereby mitigating concerns of selective migration. Our results indicate that exposure to the Nigerian civil war during childhood increases the likelihood of women being victims of domestic violence in adulthood, with larger effects for those exposed at younger ages, and particularly large effects for those exposed in utero. This is consistent with evidence to suggest that the early childhood period, including the time in utero, is particularly important for later life outcomes and that shocks during this period can have long-lasting effects.

Understanding the mechanisms through which civil war affects domestic violence is as important as identifying the effect itself, especially if effective post-war policies are to be designed to mitigate the deleterious consequences of conflict in developing countries. However, identifying the mechanisms is a much more difficult task with the data available, and therefore, our results can only be interpreted as suggestive.

First, we find that both the women in our sample and their husbands who were exposed to the war during childhood are more likely to perceive domestic violence to be an acceptable behaviour in adulthood than those not exposed to the war. This is in line with the normalisation of violence hypothesis, which predicts that those exposed to violence in childhood are more likely to become either perpetrators or victims of domestic violence in adulthood. In addition, we find that war-exposed women were more likely to witness domestic violence in their own childhood homes than non-exposed women, and that witnessing domestic violence among their parents is positively correlated with experiencing domestic violence themselves in adulthood, particularly among those exposed in utero. It is possible that the combined stresses of war and having another child on the way led to more violent behaviour in the home, or that the final months of war (when these exposed women would have been in utero) were particularly intense, and so the effect on family life more marked. Footnote 12

Second, our findings suggest that women who were exposed to the war in childhood also have lower intra-household bargaining power than non-exposed women, which would make them more vulnerable to domestic violence. Relative to the non-exposed group, we find that women who were exposed to the conflict in childhood have less decision-making power in their households in adulthood, and again the effect appears stronger among those exposed in utero (although there is also evidence for the other cohorts). This might be the case if war exposure affected women’s educational, health, and reproductive outcomes in ways that placed them in a more precarious position relative to men in the marriage market.

However, this is a subject for further study given the complexity of the potential pathways and mechanisms. The large effects measured for children who were exposed to the war in utero in particular warrant further investigation. These results are consistent with the evidence from a large literature showing that conditions and events in utero can have long-lasting consequences for the individual’s physical and mental health as well as their education, employment, and earnings outcomes (Ravelli et al. 1976; Hoek et al. 1998; Almond 2006; Dunn 2007; Camacho 2009; Almond and Currie 2011; Comfort 2016). However, much more work is needed to disentangle the biological from the social mechanisms in order to better understand both the direct and more indirect channels through which foetal-origin effects are generated (Almond and Currie 2011; Almond et al. 2018).

The relevance of our study and the need for further work in this area are underscored by the pervasiveness of domestic violence. A recent study estimated the global prevalence of intimate partner violence to be around 30%, and for the sub-Saharan African region specifically, closer to 37% (WHO 2017). Moreover, the consequences of domestic violence, both human and economic, are substantial. Domestic violence results in direct physical and mental harm to women, with research pointing to poorer health outcomes and a greater likelihood of depressive symptoms and substance abuse among victims (Coker et al. 2002; Silverman et al. 2006; Ackerson et al. 2008; Ellsberg et al. 2008; Meekers et al. 2013). Domestic violence can also result in substantial economic costs related to policing, health expenditure, and reduced economic productivity (Walby 2004). Lastly, children of women who experience domestic violence have worse outcomes, such as lower birth weight, lower IQ scores, a greater likelihood of emotional and behavioural problems, and a higher probability of acquiring HIV (Sternberg et al. 1993; Koenen et al. 2003; Aizer 2011; WHO 2013; Rawlings and Siddique 2014, 2018; Currie et al. 2022). Understanding both the causes and longer-term implications of domestic violence is imperative to designing appropriate policy responses and support mechanisms.

Data availability

The dataset used to obtain the results for this paper can be made available upon request.

Notes

1. These three main regions were subsequently demarcated into six geopolitical regions, namely the north-east, north-west, north-central, south-south, south-west, and south-east, the latter being the region where the civil war was fought (Alapiki 2005). These six regions are further divided into 36 states.

2. The 2008 Nigerian Demographic and Health Survey (DHS) also interviewed men aged 15 to 59 to provide information on health and other related issues, but it did not collect information on their experiences of domestic violence.

3. We were unable to analyse exposure after age 12 (or among cohorts born before 1958) because the DHS contains information only on women aged 15 to 49. In the 2008 DHS wave, the oldest woman in the sample (aged 49) was therefore born in 1958. If we use later waves of the DHS, we can only analyse a smaller sample of war-exposed women. Specifically, if we used the 2013 DHS, we would only be able to estimate the effect for those exposed in utero to age 7, and if we used the 2018 DHS, we would only be able to estimate the effect for those exposed in utero to age 2.

4. The DHS captures information on experiences of domestic violence using the World Health Organization’s ethical and safety guidelines (Kishor and Kiersten 2004). Interviewers are trained to deal with the sensitive nature of the questions and there are strict protocols to ensure privacy during the interview. To try to minimise under-reporting of domestic violence, the DHS domestic violence questionnaire uses a modified version of the Conflict Tactics Scale (CTS). Women are asked a number of separate questions on different types of violence, which reduces confusion as to what constitutes domestic violence and gives women multiple opportunities to reveal their experiences (Kishor 2005).

5. We limit our control group to the six-year period following the war, as too broad a window of comparison increases potential confounding effects (Akresh et al. 2012a). Moreover, our results are consistent when, following Akresh et al. (2012a), we use an even shorter control period, namely November 1970 to 1974.

6. If the south-eastern region did not fully recover in the immediate post-war period, then these impacts of war exposure would be underestimated, and our findings would represent a lower-bound effect.

7. To validate the placebo result, we conducted further robustness checks using equal intervals of years for the treatment and control groups (1971–1974 and 1975–1978). We find statistically insignificant effects of exposure to civil war on domestic violence in these additional checks.

8. Akresh et al. (2012a) run slightly different placebo tests on ethnic group and cohort but similarly find no significant effects. They also use estimated ethnic mortality during the war instead of ethnicity itself in their regressions to test for the validity of the identification strategy and find remarkably similar results. This leads them to conclude that the strategy to use ethnicity to identify exposure “while simple, is accurate and powerful” (Akresh et al. 2012a: 275).

9. Because the DHS only interviews women aged 15 to 49, the oldest women included in the 2013 survey would have been born in 1964, and therefore, we can only capture war exposure from in utero through to age 7. To estimate the exposure by birth cohort, we therefore only show the results using the pooled 2008 and 2013 datasets. We did not attempt to include the 2018 DHS in the robustness checks, as the sample of war-exposed women would have shrunk even further to those women who were exposed in utero through to 2 years of age.

10. The DHS couple’s recode data contain information on the husbands/partners (aged 15–59) for the sample of women who were married/cohabiting and living with their partners during the interview.

11. The high level of intra-ethnic marriage is consistent with low levels of migration across states, with most migration in Nigeria occurring within states from rural to urban areas (Federal Office of Statistics 1999, 2000).

12. Unfortunately, we are unable to test more formally for a relationship between the intensity of conflict and domestic violence. To do so would require data on the variation in the number of deaths caused by the civil war across districts and time, and to the best of our knowledge, no such data exist (there are only estimates of the total number of deaths caused by the war).

Aall, C. 1970. Relief, Nutrition and Health in the Nigerian/Biafran War. Journal of Tropical Pediatrics 16 (2): 70–90.

Ackerson, L.K., I. Kawachi, E.M. Barbeau, and S.V. Subramanian. 2008. Effects of Individual and Proximate Educational Context on Intimate Partner Violence: A Population-Based Study of Women in India. American Journal of Public Health 98 (3): 507–514.

Aizer, A. 2010. The Gender Wage Gap and Domestic Violence. American Economic Review 100 (4): 1847–1859.

Aizer, A. 2011. Poverty, Violence, and Health: The Impact of Domestic Violence During Pregnancy on Newborn Health. Journal of Human Resources 46 (3): 518–538. https://doi.org/10.1353/jhr.2011.0024.

Ajefu, J.B., and D. Casale. 2021. The Long-Term Effects of Violent Conflict on Women’s Intra-household Decision-Making Power. Journal of Development Studies 57 (10): 1690–1709. https://doi.org/10.1080/00220388.2021.1873285.

Akbulut-Yuksel, M. 2017. War During Childhood: The Long Run Effects of Warfare on Health. Journal of Health Economics 53: 117–130.

Akresh, R., S. Bhalotra, M. Leone, and U.O. Osili. 2012a. War and Stature: Growing Up During the Nigerian Civil War. American Economic Review 102 (3): 273–277.

Akresh, R., L. Lucchetti, and H. Thirumurthy. 2012b. Wars and Child Health: Evidence from the Eritrean-Ethiopian Conflict. Journal of Development Economics 99: 330–340.

Akresh, R., and D. de Walque. 2008. Armed Conflict and Schooling: Evidence from the 1994 Rwandan Genocide. Policy Research Working Paper No. 4606. Washington, DC: The World Bank.

Akresh, R., S. Bhalotra, M. Leone, and U. Osili. 2023. First- and Second-Generation Impacts of the Biafran War. Journal of Human Resources 58 (2): 488–531.

Alapiki, H.E. 2005. State Creation in Nigeria: Failed Approaches to National Integration and Local Autonomy. African Studies Review 48 (3): 49–65.

Almond, D. 2006. Is the 1918 Influenza Pandemic Over? Long-Term Effects of In Utero Influenza Exposure in the Post-1940 U.S. Population. Journal of Political Economy 114 (4): 672–712.

Almond, D., and J. Currie. 2011. Killing Me Softly: The Fetal Origins Hypothesis. The Journal of Economic Perspectives 25 (3): 153–172.

Almond, D., J. Currie, and V. Duque. 2018. Childhood Circumstances and Adult Outcomes: Act II. Journal of Economic Literature 56 (4): 1360–1446.

Barker, D.J. 1990. The Fetal and Infant Origins of Adult Disease. BMJ 301 (6761): 1111.

Barker, D.J. 1995. Fetal Origins of Coronary Heart Disease. BMJ 311 (6998): 171–174.

Bertrand, M., E. Duflo, and S. Mullainathan. 2004. How Much Should We Trust Differences-in-Differences Estimates? The Quarterly Journal of Economics 119 (1): 249–275.

Barnett, O., C.L. Miller-Perrin, and R.D. Perrin. 2005. Family Violence Across the Lifespan: An Introduction. Thousand Oaks, CA: Sage.

Bhattacharyya, M., A.S. Bedi, and A. Chhachhi. 2011. Marital Violence and Women’s Employment and Property Status: Evidence from North Indian Villages. World Development 39 (9): 1676–1689.

Camacho, A. 2009. Stress and Birth Weight: Evidence from Terrorist Attacks. American Economic Review: Papers & Proceedings 98 (2): 511–515.

Case, A., A. Fertig, and C. Paxson. 2005. The Lasting Impact of Childhood Health and Circumstance. Journal of Health Economics 24: 365–389.

Cesur, R., and J.J. Sabia. 2016. When War Comes Home: The Effect of Combat Service on Domestic Violence. Review of Economics and Statistics 98 (2): 209–225.

Cetorelli, V., and Khawaja. 2017. Intensity of Conflict and Fertility in the Occupied Palestinian Territory: A Longitudinal Study. The Lancet 390 (2): 350.

Chamarbagwala, R., and H. Moran. 2011. The Human Capital Consequences of Civil War: Evidence from Guatemala. Journal of Development Economics 94: 41–61.

Coker, A.L., K.E. Davis, I. Arias, S. Desai, M. Sanderson, H.M. Brandt, and P.H. Smith. 2002. Physical and Mental Health Effects of Intimate Partner Violence for Men and Women. American Journal of Preventive Medicine 23 (4): 260–268.

Comfort, A.B. 2016. Long-Term Effect of In Utero Conditions on Maternal Survival Later in Life: Evidence from Sub-Saharan Africa. Journal of Population Economics 29 (2): 493–527.

Cunha, F., and J. Heckman. 2007. The Technology of Skill Formation. American Economic Review 97 (2): 31–47.

Currie, J. 2020. Child Health as Human Capital. Health Economics 29: 452–463. https://doi.org/10.1002/hec.3995.

Currie, J., M. Mueller-Smith, and M. Rossin-Slater. 2022. Violence While in Utero: The Impact of Assaults During Pregnancy on Birth Outcomes. The Review of Economics and Statistics 104 (3): 525–540. https://doi.org/10.1162/rest_a_00965.

Dabalen, A.L., and S. Paul. 2014. Estimating the Effects of Conflict on Education in Cote d’Ivoire. The Journal of Development Studies 50 (12): 1631–1646.

Dunn, P.M. 2007. Perinatal Lessons from the Past: Sir Norman Gregg, ChM, MC, of Sydney (1892–1966) and Rubella Embryopathy. Archives of Disease in Childhood 92 (6): F513–F514. https://doi.org/10.1136/adc.2005.091405.

Ellsberg, M., H.A.F.M. Jansen, et al. 2008. Intimate Partner Violence and Women’s Physical and Mental Health in the WHO Multi-country Study on Women’s Health and Domestic Violence: An Observational Study. The Lancet 371 (9619): 1165–1172.

Eswaran, M., and N. Malhotra. 2011. Domestic Violence and Women’s Autonomy in Developing Countries: Theory and Evidence. Canadian Journal of Economics 44 (4): 1222–1263.

Farmer, A., and J. Tiefenthaler. 1997. An Economic Analysis of Domestic Violence. Review of Social Economy 55 (3): 337–358.

Federal Office of Statistics. 1999. Annual Abstract of Statistics Various Years.

Federal Office of Statistics. 2000. Social Statistics in Nigeria Various Years.

Fowler, P.J., C.J. Tompsett, J.M. Braciszewski, A.J. Jaques-Tiura, and B.B. Baltes. 2009. Community Violence: A Meta-Analysis on the Effect of Exposure and Mental Health Outcomes of Children and Adolescents. Development and Psychopathology 21: 227–259.

Gage, A. 2005. Women’s Experience of Intimate Partner Violence in Haiti. Social Science and Medicine 61: 343–364.

Galdo, J. 2013. The Long-Run Labor-Market Consequences of Civil War: Evidence from the Shining Path in Peru. Economic Development and Cultural Change 61 (4): 789–823.

Gelles, R.J. 1976. Abused Wives: Why Do They Stay? Journal of Marriage and the Family 38 (4): 659–667.

Gleditsch, N.P., P. Wallensteen, M. Eriksson, M. Sollenberg, and H. Strand. 2002. Armed Conflict 1946–2001: A New Dataset. Journal of Peace Research 39 (5): 615–637.

Grimard, F., and S. Laszlo. 2014. Long-Term Effects of Civil Conflict on Women’s Health Outcomes in Peru. World Development 54: 139–155.

Gutierrez, I.A., and J.V. Gallegos. 2016. The Effect of Civil War on Domestic Violence: The Case of Peru. Working Paper WR-1168, RAND Labor and Population.

Heath, R. 2014. Women’s Access to Labour Market Opportunities, Control of Household Resources, and Domestic Violence: Evidence from Bangladesh. World Development 57: 32–46.

Hoek, H.W., A.S. Brown, and E. Susser. 1998. The Dutch Famine and Schizophrenia Spectrum Disorders. Social Psychiatry and Psychiatric Epidemiology 33 (8): 373–379.

Huber, L. 2023. One Step Forward, One Step Back: The Micro-Level Impacts of Conflict on Women’s Security. International Studies Quarterly 67 (2): 019.

Institute for Economics and Peace. 2018. The Economic Value of Peace 2018: Measuring the Global Economic Impact of Violence and Conflict, Sydney.

Islam, A., C. Ouch, R. Smyth, and L.C. Wang. 2016. The Long-Term Effects of Civil Conflicts on Education, Earnings, and Fertility: Evidence from Cambodia. Journal of Comparative Economics 44: 800–820.

Jin, X., T. Yang, and M.W. Feldman. 2017. Intergenerational Transmission of Marital Violence among Rural Migrants in China: Evidence from a Survey in Shenzhen. Journal of Contemporary China 26 (108): 931–947.

Kelly, J.T.D., E. Colantuoni, C. Robinson, and M.R. Decker. 2018. From the Battlefield to the Bedroom: A Multilevel Analysis of the Links Between Political Conflict and Intimate Partner Violence in Liberia. BMJ Global Health 3: e000668. https://doi.org/10.1136/bmjgh-2017-000668.

Kijewski, S., and M. Freitag. 2018. Civil War and the Formation of Social Trust in Kosovo: Post-traumatic Growth or War-related Distress? Journal of Conflict Resolution 62 (4): 717–742.

Kirk-Greene, A.H.M. 1971. Crisis and Conflict in Nigeria. Oxford: Oxford University Press.

Kishor, S., and J. Kiersten. 2004. Profiling Domestic Violence: A Multi-country Study. Calverton, Maryland: ORC Macro.

Kishor, S. 2005. Domestic Violence Measurement in the Demographic and Health Surveys: The History and the Challenges, Paper Presented at Expert Group Meeting.

Koenen, K.C., T.E. Moffitt, et al. 2003. Domestic Violence is Associated with Environmental Suppression of IQ in Young Children. Development and Psychopathology 15 (02): 297–311.

La Mattina, G. 2017. Civil Conflict, Domestic Violence and Intra-household Bargaining in Post-Genocide Rwanda. Journal of Development Economics 124: 168–198.

La Mattina, G., and O.N. Shemyakina. 2017. Domestic Violence and Childhood Exposure to Armed Conflict: Attitudes and Experiences. Unpublished manuscript.

Leon, G. 2012. Civil Conflict and Human Capital Accumulation: The Long-Term Effects of Political Violence in Peru. Journal of Human Resources 47 (4): 992–1022.

Lundberg, S., and R.A. Pollak. 1996. Bargaining and Distribution in Marriage. The Journal of Economic Perspectives 10 (4): 139–158.

Mihalic, S.W., and D. Elliott. 1997. A Social Learning Theory Model of Marital Violence. Journal of Family Violence 12 (1): 21–47.

Meekers, D., S. Pallin, and P. Hutchinson. 2013. Intimate Partner Violence and Mental Health in Bolivia. BMC Women’s Health 13: 1.

Nafziger, E.W. 1972. The Economic Impact of the Nigerian Civil War. The Journal of Modern African Studies 10 (2): 223–245.

Østby, G., M. Leiby, and R. Nordås. 2019. The Legacy of Wartime Violence on Intimate-Partner Abuse: Microlevel Evidence from Peru, 1980–2009. International Studies Quarterly 63 (1): 1–46.

Ravelli, G.P., Z.A. Stein, and M.W. Susser. 1976. Obesity in Young Men After Famine Exposure In Utero and Early Infancy. New England Journal of Medicine 295 (7): 349–353.

Rawlings, S., and Z. Siddique. 2014. Domestic Abuse and Child Health. IZA Discussion Paper No. 8566.

Rawlings, S., and Z. Siddique. 2018. Domestic Violence and Child Mortality. IZA Discussion Paper No. 11899.

Schwab-Stone, M.E., T.S. Ayers, W. Kasprow, C. Voyce, C. Barone, T. Shriver, and R.P. Weissberg. 1995. No Safe Haven: A Study of Violence Exposure in an Urban Community. Journal of the American Academy of Child and Adolescent Psychiatry 34: 1343–1352.

Shemyakina, O. 2011. The Effect of Armed Conflict on Accumulation of Schooling: Results from Tajikistan. Journal of Development Economics 95 (2): 186–200.

Silverman, J., M. Decker, E. Reed, and A. Raj. 2006. Intimate Partner Violence Victimization Prior To and During Pregnancy Among Women Residing in 26 U.S. States: Associations with Maternal and Neonatal Health. American Journal of Obstetrics and Gynecology 195 (1): 140–148.

Simpson, B. 2014. The Biafran Secession and the Limits of Self-Determination. Journal of Genocide Research 16 (2–3): 337–354.

Singh, P., and O.N. Shemyakina. 2016. Gender-Differential Effects of Terrorism on Education: The Case of the 1981–1993 Punjab Insurgency. Economics of Education Review 54: 185–210.

Sternberg, K.J., M.E. Lamb, C. Greenbaum, D. Cicchetti, D. Samia, R.M. Cortes, O. Krispin, and F. Lorey. 1993. Effects of Domestic Violence on Children’s Behaviour Problems and Depression. Developmental Psychology 29 (1): 44–52.

Svallfors, S. 2023. Hidden Casualties: The Links Between Armed Conflict and Intimate Partner Violence in Colombia. Politics & Gender: 1–33. https://doi.org/10.1017/S1743923X2100043X.

Swee, E.L. 2015. On War Intensity and Schooling Attainment: The Case of Bosnia and Herzegovina. European Journal of Political Economy 40: 158–172.

Torrisi, O. 2023. Young-Age Exposure to Armed Conflict and Women’s Experiences of Intimate Partner Violence. Journal of Marriage and Family 85 (1): 7–32.

Udo, R.K. 1970. Reconstruction in the War-Affected Areas of Nigeria. The Royal Geographical Society 2 (3): 9–12.

Verwimp, P., and J. van Bavel. 2005. Child Survival and Fertility of Refugees in Rwanda. European Journal of Population 21 (2): 271–290.

Verwimp, P., P. Justino, and T. Brück. 2019. The Microeconomics of Violent Conflict. Journal of Development Economics 141: 102297.

Volpe, E.M., T.L. Hardie, C. Cerulli, M.S. Sommers, and D. Morrison-Beedy. 2013. What’s Age Got to Do with It? Partner Age Difference, Power, Intimate Partner Violence, and Sexual Risk in Urban Adolescents. Journal of Interpersonal Violence 28 (10): 2068–2087.

Walby, S. 2004. The Cost of Domestic Violence, Women and Equality Unit (DTI).

Weldeegzie, S.G. 2017. Growing-up Unfortunate: War and Human Capital in Ethiopia. World Development 96: 474–489.

World Health Organization. 2013. Global and Regional Estimates of Violence against Women: Prevalence and Health Effects of Intimate Partner Violence and Non-Partner Sexual Violence, World Health Organization.

World Health Organization. 2014. Health Care for Women Subjected to Intimate Partner Violence or Sexual Violence: A Clinical Handbook. World Health Organization.

World Health Organization. 2017. Global and Regional Estimates of Violence against Women: Prevalence and Health Effects of Intimate Partner Violence and Non-Partner Sexual Violence, World Health Organization.

World Development Report. 2011. Conflict, Security, and Development. Washington, DC: The World Bank.

World Humanitarian Data and Trends Report. 2017. A Report from the UN Office for the Coordination of Humanitarian Affairs.

Yount, K., and L. Li. 2009. Women’s “Justification” of Domestic Violence in Egypt. Journal of Marriage and Family 71 (5): 1125–1140.

Author information

Authors and affiliations.

Department of Peace Studies and International Development, Faculty of Management, Law, and Social Sciences, University of Bradford, Bradford, UK

Joseph B. Ajefu

Centre for Social Development in Africa (CSDA), University of Johannesburg, Johannesburg, South Africa

School of Economics and Finance, University of the Witwatersrand, Johannesburg, South Africa

Daniela Casale

Corresponding author

Correspondence to Joseph B. Ajefu.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Ajefu, J.B., Casale, D. Long-Term Effects of Childhood Exposure to War on Domestic Violence. Eur J Dev Res (2024). https://doi.org/10.1057/s41287-024-00659-4

Accepted: 14 July 2024

Published: 22 August 2024

DOI: https://doi.org/10.1057/s41287-024-00659-4

  • Domestic violence
  • Bargaining power

COMMENTS

  1. Statistical Treatment of Data

    Statistical Treatment Example - Quantitative Research. For a statistical treatment of data example, consider a medical study that is investigating the effect of a drug on the human population. As the drug can affect different people in different ways based on parameters such as gender, age and race, the researchers would want to group the ...

  2. Research Paper Statistical Treatment of Data: A Primer

    Research Paper Statistical Treatment of Data: A Primer. March 11, 2024. We can all agree that analyzing and presenting data effectively in a research paper is critical, yet often challenging. This primer on statistical treatment of data will equip you with the key concepts and procedures to accurately analyze and clearly convey research findings.

  3. The Beginner's Guide to Statistical Analysis

    This article is a practical introduction to statistical analysis for students and researchers. We'll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables. Example: Causal research question.

  4. Introduction to Research Statistical Analysis: An Overview of the

    Introduction. Statistical analysis is necessary for any research project seeking to make quantitative conclusions. The following is a primer for research-based statistical analysis. It is intended to be a high-level overview of appropriate statistical testing, while not diving too deep into any specific methodology.

  5. Selection of Appropriate Statistical Methods for Data Analysis

    Type and distribution of the data used. For the same objective, the choice of statistical test varies with the type of data. For nominal, ordinal, and discrete data, we use nonparametric methods, while for continuous data, both parametric and nonparametric methods are used. For example, in the regression analysis, when our outcome variable is categorical, logistic regression ...

  6. Basic statistical tools in research and data analysis

    Abstract. Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise ...

  7. Choosing the Right Statistical Test

    ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults). Paired t-test: predictor variable is categorical (1 predictor); outcome variable is quantitative (groups come from the same population).

  8. Statistical Treatment

    Treatments are divided into two groups: descriptive statistics, which summarize your data as a graph or summary statistic, and inferential statistics, which make predictions and test hypotheses about your data. Treatments could include: Finding T-Scores or Z-Scores. Calculating correlation coefficients.

  9. Best Practices for Presenting Statistical Information in a Research

    A key characteristic of scientific research is that the entire experiment (or series of experiments), including the data analyses, is reproducible. This aspect of science is increasingly emphasized. The Materials and Methods section of a scientific paper typically contains the necessary information for the research to be replicated and expanded on by other scientists. Important components are ...

  10. New Guidelines for Statistical Reporting in the Journal

    The new guidelines discuss many aspects of the reporting of studies in the Journal, including a requirement to replace P values with estimates of effects or association and 95% confidence ...

  11. The Treatment of Data

    Some of these methods and tools are used within specific fields of research, such as statistical tests of significance, double-blind trials, and proper phrasing of questions on surveys. Others apply across all research fields, such as describing to others what one has done so that research data and results can be verified and extended.

  12. PDF Chapter 10. Experimental Design: Statistical Analysis of Data Purpose

    Now, if we divide the frequency with which a given mean was obtained by the total number of sample means (36), we obtain the probability of selecting that mean (last column in Table 10.5). Thus, eight different samples of n = 2 would yield a mean equal to 3.0. The probability of selecting that mean is 8/36 = 0.222.

  13. An Introduction to Statistics: Choosing the Correct Statistical Test

    Abstract. The choice of statistical test used for analysis of data from a research study is crucial in interpreting the results of the study. This article gives an overview of the various factors that determine the selection of a statistical test and lists some statistical tests used in common practice. How to cite this article: Ranganathan P.

  14. Role of Statistics in Research

    Furthermore, statistics in research helps interpret the data clustered near the mean of distributed data or spread across the distribution. These trends help analyze the sample and signify the hypothesis. 3. Data Interpretation Through Analysis. When dealing with large data, statistics in research assist in data analysis. This helps researchers ...

  15. Statistical Treatment Of Data

    Statistical treatment of data also involves describing the data. The best way to do this is through the measures of central tendencies like mean, median and mode. These help the researcher explain in short how the data are concentrated. Range, uncertainty and standard deviation help to understand the distribution of the data.

  16. Descriptive Statistics

    Types of descriptive statistics. There are 3 main types of descriptive statistics: The distribution concerns the frequency of each value. The central tendency concerns the averages of the values. The variability or dispersion concerns how spread out the values are. You can apply these to assess only one variable at a time, in univariate ...

  17. Statistical methods in research

    Statistical methods appropriate in research are described with examples. Topics covered include the choice of appropriate averages and measures of dispersion to summarize data sets, and the choice of tests of significance, including t-tests and a one- and a two-way ANOVA plus post-tests for normally distributed (Gaussian) data and their non-parametric equivalents.

  18. Choose the right statistical technique

    Mean - the arithmetic average, calculated by summing all the values and dividing by the number of values in the sum. Median - the mid point of the distribution, where half the values are higher and half lower. Mode - the most frequently occurring value. Range - the difference between the highest and lowest value.

  19. Basics of statistics for primary care research

    Correlation analysis has three general outcomes: (1) the two variables rise and fall together; (2) as values in one variable rise, the other falls; and (3) the two variables do not appear to be systematically related. To make those determinations, we use the correlation coefficient (r) and related p value or CI.

  20. (PDF) Chapter 3 Research Design and Methodology

    Research Design and Methodology. Chapter 3 consists of three parts: (1) Purpose of the study and research design, (2) Methods, and (3) Statistical data analysis procedure. Part one, Purpose of ...

  21. PDF How to Write the Methods Section of a Research Paper

    The methods section should describe what was done to answer the research question, describe how it was done, justify the experimental design, and explain how the results were analyzed. Scientific writing is direct and orderly. Therefore, the methods section structure should: describe the materials used in the study, explain how the materials ...

  22. Standard statistical tools in research and data analysis

    Inferential statistics. In inferential statistics, data from a sample are analysed to draw conclusions about the entire population. The goal is to support or reject hypotheses. A hypothesis (plural: hypotheses) is a suggested explanation for a phenomenon. Hypothesis testing is an essential process for making logical choices regarding observed effects ...

  23. Exploratory Data Analysis: Frequencies, Descriptive Statistics

    Researchers must utilize exploratory data techniques to present findings to a target audience and create appropriate graphs and figures. Researchers can determine if outliers exist, data are missing, and statistical assumptions will be upheld by understanding data. Additionally, it is essential to comprehend these data when describing them in conclusions of a paper, in a meeting with ...

  24. Long-Term Effects of Childhood Exposure to War on Domestic Violence

    This paper highlights the scarring effects of early life exposure to civil war, by examining the impact of exposure to conflict in childhood on the incidence of domestic violence in adulthood among married women. To estimate these effects, we use a difference-in-differences model which exploits variation in exposure to Nigeria's 30-month-long civil war by year of birth and ethnicity. Our ...