
Statistics By Jim

Making statistics intuitive

Sample Size Essentials: The Foundation of Reliable Statistics

By Jim Frost

What is Sample Size?

Sample size is the number of observations or data points collected in a study. It is a crucial element in any statistical analysis because it is the foundation for drawing inferences and conclusions about a larger population.

Image illustrating the concept of sample size.

Imagine you’re tasting a new brand of cookies. Sampling just one cookie might not give you a true sense of the overall flavor—what if you picked the only burnt one? Similarly, in statistics, the sample size determines how well your study represents the larger group. A larger sample size can mean the difference between a snapshot and a panorama, providing a clearer, more accurate picture of the reality you’re studying.

In this blog post, learn why adequate sample sizes are not just a statistical nicety but a fundamental component of trustworthy research. However, large sample sizes can’t fix all problems. By understanding the impact of sample size on your results, you can make informed decisions about your research design and have more confidence in your findings.

Benefits of a Large Sample Size

A large sample size can significantly enhance the reliability and validity of study results. We’re primarily looking at how well representative samples reflect the populations from which the researchers drew them. Here are several key benefits.

Increased Precision

Larger samples tend to yield more precise estimates of the population parameters. Larger samples reduce the effect of random fluctuations in the data, narrowing the margin of error around the estimated values.

Estimate precision refers to how closely the results obtained from a sample align with the actual population values. A larger sample size tends to yield more precise estimates because it reduces the effect of random variability within the sample. The more data points you have, the smaller the margin of error and the closer you are to capturing the correct value of the population parameter.

For example, estimating the average height of adults using a larger sample tends to give an estimate closer to the actual average than using a smaller sample.
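To sketch this numerically, the usual normal-approximation margin of error for a mean is z·s/√n. The height standard deviation below is an illustrative value I chose, not a figure from this post:

```python
import math

def margin_of_error(sd, n, z=1.96):
    """Approximate 95% margin of error for a sample mean: z * sd / sqrt(n)."""
    return z * sd / math.sqrt(n)

# Estimating average adult height, assuming a standard deviation of 7 cm
print(f"n = 25:  +/- {margin_of_error(7, 25):.2f} cm")   # +/- 2.74 cm
print(f"n = 400: +/- {margin_of_error(7, 400):.2f} cm")  # +/- 0.69 cm
```

Because precision improves with the square root of n, you need 4 times the data to halve the margin of error, which is why the gains eventually flatten out.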

Learn more about Statistics vs. Parameters, Margin of Error, and Confidence Intervals.

Greater Statistical Power

The power of a statistical test is its capability to detect an effect if there is one, such as a difference between groups or a correlation between variables. Larger samples increase the likelihood of detecting actual effects.

Statistical power is the probability that a study will detect an effect when one exists. The sample size directly influences it; a larger sample size increases statistical power. Studies with more data are more likely to detect existing differences or relationships.

For instance, in testing whether a new drug is more effective than an existing one, a larger sample can more reliably detect small but real improvements in efficacy.
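A small Monte Carlo sketch makes this concrete: with a fixed true difference (the 2-point improvement and standard deviation of 10 are made-up numbers), the fraction of simulated studies that reach significance climbs with the sample size:

```python
import random
from statistics import mean

def simulated_power(delta, sd, n_per_group, trials=2000, z_crit=1.96):
    """Estimate power by simulation: the fraction of simulated studies in
    which a two-sample z-test detects the true difference `delta`."""
    random.seed(1)  # reproducible runs
    hits = 0
    for _ in range(trials):
        control = [random.gauss(0, sd) for _ in range(n_per_group)]
        treated = [random.gauss(delta, sd) for _ in range(n_per_group)]
        se = sd * (2 / n_per_group) ** 0.5
        if abs(mean(treated) - mean(control)) / se > z_crit:
            hits += 1
    return hits / trials

# A real 2-point improvement on a scale with sd 10 (illustrative numbers)
for n in (25, 100, 400):
    print(n, simulated_power(2, 10, n))
```

The same true effect goes from rarely detected at n = 25 per group to detected most of the time at n = 400.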

Better Generalizability

With a larger sample, there is a higher chance that the sample adequately represents the diversity of the population, improving the generalizability of the findings to the population.

Consider a national survey gauging public opinion on a policy. A larger sample captures a broader range of demographic groups and opinions.

Learn more about Representative Samples.

Reduced Impact of Outliers

In a large sample, outliers have less impact on the overall results because many observations dilute their influence. The numerous data points stabilize the averages and other statistical estimates, making them more representative of the general population.

If measuring income levels within a region, a few very high incomes will distort the average less in a larger sample than in a smaller one.
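A quick sketch with made-up income figures shows the dilution:

```python
from statistics import mean

typical = 50_000      # typical regional income (illustrative)
outlier = 5_000_000   # one very high earner

small_sample = [typical] * 9 + [outlier]    # n = 10
large_sample = [typical] * 99 + [outlier]   # n = 100

print(mean(small_sample))  # 545000: badly distorted
print(mean(large_sample))  # 99500: much closer to the typical income
```

The single outlier is still present in both samples; only its leverage on the average changes.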

Learn more about 5 Ways to Identify Outliers.

The Limits of Larger Sample Sizes: A Cautionary Note

While larger sample sizes offer numerous advantages, such as increased precision and statistical power, it’s important to understand their limitations. They are not a panacea for all research challenges. Crucially, larger sample sizes do not automatically correct for biases in sampling methods, other forms of bias, or fundamental errors in study design. Ignoring these issues can lead to misleading conclusions, regardless of how many data points are collected.

Sampling Bias

Even a large sample is misleading if it’s not representative of the population. For instance, if a study on employee satisfaction only includes responses from headquarters staff but not remote workers, increasing the number of respondents won’t address the inherent bias in missing a significant segment of the workforce.

Learn more about Sampling Bias: Definition & Examples.

Other Forms of Bias

Biases related to data collection methods, survey question phrasing, or data analyst subjectivity can still skew results. If the underlying issues are not addressed, a larger sample size might magnify these biases instead of mitigating them.

Errors in Study Design

Simply adding more data points will not overcome a flawed experimental design. For example, increasing the sample size will not clarify the causal relationships if the design doesn’t control a confounding variable.

Large Sample Sizes are Expensive!

Additionally, it is possible to have too large a sample size. Larger sizes come with their own challenges, such as higher costs and logistical complexities. You reach a point of diminishing returns, where a very large sample detects effects so small that they are meaningless in a practical sense.

The takeaway here is that researchers must exercise caution and not rely solely on a large sample size to safeguard the reliability and validity of their results. An adequate amount of data must be paired with an appropriate sampling method, a robust study design, and meticulous execution to truly understand and accurately represent the phenomena being studied.

Sample Size Calculation

Statisticians have devised quantitative ways to find a good sample size. You want a large enough sample to have a reasonable chance of detecting a meaningful effect when it exists but not too large to be overly expensive.

In general, these methods focus on using the population’s variability. More variable populations require larger samples to assess them. Let’s go back to the cookie example to see why.

If all cookies in a population are identical (zero variability), you only need to sample one cookie to know what the average cookie is like for the entire population. However, suppose there’s a little variability because some cookies are cooked perfectly while others are overcooked. You’ll need a larger sample size to understand the ratio of the perfect to overcooked cookies.

Now, instead of just those two types, you have an entire range of how much they are over and undercooked. And some use sweeter chocolate chips than others. You’ll need an even larger sample to understand the increased variability and know what an average cookie is really like.

Cookie Monster likes a large sample of cookies!

Power and sample size analysis accounts for the population’s variability, so you’ll often need a variability estimate to perform this type of analysis. These calculations also frequently factor in the smallest practically meaningful effect size you want to detect, which keeps the sample size manageable.
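A standard back-of-the-envelope version of this calculation for comparing two group means (normal approximation, two-sided test, two equal groups; all numbers below are illustrative) is:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size to detect a mean difference `delta` between
    two groups, given an estimated standard deviation `sd`
    (normal approximation, two-sided test)."""
    z = NormalDist().inv_cdf
    needed = 2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2
    return ceil(needed)

# Detect a 10-point difference when the standard deviation is about 20
print(n_per_group(10, 20))               # 63 per group at 80% power
print(n_per_group(10, 20, power=0.90))   # 85 per group at 90% power
print(n_per_group(10, 40))               # 252: more variability, larger sample
```

Note the square in the formula: doubling the variability, or halving the difference you want to detect, quadruples the required sample, which is exactly where the diminishing returns come from.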

To learn more about how to find a sample size, read my following articles:

  • How to Calculate Sample Size
  • What is Power in Statistics?

Sample Size Summary

Understanding the implications of sample size is fundamental to conducting robust statistical analysis. While larger samples provide more reliable and precise estimates, smaller samples can compromise the validity of statistical inferences.

Always remember that the breadth of your sample profoundly influences the strength of your conclusions. So, whether conducting a simple survey or a complex experimental study, consider your sample size carefully. Your research’s integrity depends on it.

Consequently, the effort to achieve an adequate sample size is a worthwhile investment in the precision and credibility of your research.


Reader Interactions


July 17, 2024 at 11:11 am

When I used survey data, we had a clear, conscious sampling method and the distinction made sense. However, with other types of data such as performance or sales data, I’m confused about the distinction. We have all the data of everyone who did the work, so by that understanding, we aren’t doing any sampling. However, is there a ‘hidden’ population of everyone who could potentially do that work? If we take a point in time, such as just first quarter performance, is that a sample or something else? I regularly see people just go ahead and apply the same statistics to both, suggesting that this is a ‘sample’, but I’m not sure what it’s a sample of or how!


  • Andrew John Anderson, Department of Optometry and Vision Sciences, The University of Melbourne, Carlton, Victoria, Australia.
  • Algis Jonas Vingrys, Department of Optometry and Vision Sciences, The University of Melbourne, Carlton, Victoria, Australia.

Andrew John Anderson, Algis Jonas Vingrys; Small Samples: Does Size Matter? Invest. Ophthalmol. Vis. Sci. 2001;42(7):1411–1413.


© ARVO (1962-2015); The Authors (2016-present)

  • Using a particular experimental paradigm, or set of paradigms, the effect is either present or absent; that is, equivocal results are not found.
  • In the group of subjects tested, all subjects show the effect (which we will term “serial successes”). The number of serial successes is therefore equal to the sample size, N.
  • The group of subjects is randomly chosen from a selectively normal population.
Successes (N) vs. minimum population proportion θ (%):

Successes (N)   θ (α = 0.10)   θ (α = 0.05)   θ (α = 0.01)
      1              10              5              1
      2              32             22             10
      3              46             37             22
      4              56             47             32
      5              63             55             40
      6              68             61             46
      7              72             65             52
      8              75             69             56
      9              77             72             60
     10              79             74             63
     22              90              –              –
     29               –             90              –
     44               –              –             90
     45              95              –              –
     59               –             95              –
     90               –              –             95
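A quick way to check (and extend) the table: if the true proportion of the population showing the effect is θ, a run of N serial successes has probability θ^N, so the smallest proportion supported at significance level α is θ_min = α^(1/N). This sketch of both directions of the calculation is my reading of the table, not code from the article:

```python
from math import ceil, log

def min_proportion(n_successes, alpha):
    """Smallest population proportion supported by n serial successes
    at significance level alpha: alpha ** (1 / n)."""
    return alpha ** (1 / n_successes)

def successes_needed(theta, alpha):
    """Serial successes needed so that theta ** N <= alpha, i.e. to
    support a population proportion of at least theta."""
    return ceil(log(alpha) / log(theta))

print(round(100 * min_proportion(5, 0.05)))  # 55, as in the table
print(successes_needed(0.90, 0.05))          # 29
print(successes_needed(0.95, 0.01))          # 90
```

The sparse lower rows of the table follow the same rule: for example, supporting 90% of the population at α = 0.01 takes 44 consecutive successes.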



Best Practices for Using Statistics on Small Sample Sizes


A common misconception is that you cannot use statistics with small sample sizes. Put simply, this is wrong.

There are appropriate statistical methods to deal with small sample sizes.

Although one researcher’s “small” is another’s large, when I refer to small sample sizes I mean studies that have typically between 5 and 30 users total—a size very common in usability studies.

But user research isn’t the only field that deals with small sample sizes. Studies involving fMRIs, which cost a lot to operate, have limited sample sizes as well [pdf], as do studies using laboratory animals.

While there are equations that allow us to properly handle small “n” studies, it’s important to know that there are limitations to these smaller sample studies: you are limited to seeing big differences or big “effects.”

To put it another way, statistical analysis with small samples is like making astronomical observations with binoculars. You are limited to seeing big things: planets, stars, moons and the occasional comet. But just because you don’t have access to a high-powered telescope doesn’t mean you cannot conduct astronomy. Galileo, in fact, discovered Jupiter’s moons with a telescope with the same power as many of today’s binoculars.

The same goes for statistics: just because you don’t have a large sample size doesn’t mean you cannot use statistics. Again, the key limitation is that you are limited to detecting large differences between designs or measures.

Fortunately, in user-experience research we are often most concerned about these big differences—differences users are likely to notice, such as changes in the navigation structure or the improvement of a search results page.

Here are the procedures we’ve tested for common, small-sample user research.

If you need to compare completion rates, task times, and rating scale data for two independent groups, there are two procedures you can use for small and large sample sizes. The right one depends on the type of data you have: continuous or discrete-binary.

Comparing Means: If your data is generally continuous (not binary), such as task time or rating scales, use the two sample t-test. It’s been shown to be accurate for small sample sizes.

Comparing Two Proportions: If your data is binary (pass/fail, yes/no), then use the N-1 Two Proportion Test. This is a variation on the better-known Chi-Square test (it is algebraically equivalent to the N-1 Chi-Square test). When expected cell counts fall below one, the Fisher Exact Test tends to perform better. The online calculator handles this for you and we discuss the procedure in Chapter 5 of Quantifying the User Experience.
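A minimal sketch of the N-1 two-proportion test described above; the completion counts are hypothetical, and the two-sided p-value uses the normal approximation:

```python
from math import sqrt
from statistics import NormalDist

def n_minus_1_two_prop(x1, n1, x2, n2):
    """N-1 two-proportion test: z statistic and two-sided p-value.
    Algebraically equivalent to the N-1 chi-square test."""
    p1, p2 = x1 / n1, x2 / n2
    n = n1 + n2
    p_pooled = (x1 + x2) / n
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) * sqrt((n - 1) / n) / se  # the N-1 adjustment
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical completion rates: 11 of 12 users vs. 5 of 12 users
z, p = n_minus_1_two_prop(11, 12, 5, 12)
print(round(z, 2), round(p, 3))  # 2.54 0.011
```

Even at 12 users per group, a difference this large (92% vs. 42% completion) is detectable; a small difference would not be.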

Confidence Intervals

When you want to know what the plausible range is for the user population from a sample of data, you’ll want to generate a confidence interval. While the confidence interval width will be rather wide (usually 20 to 30 percentage points), the upper or lower boundary of the intervals can be very helpful in establishing how often something will occur in the total user population.

For example, if you wanted to know whether users would read a sheet that said “Read this first” when installing a printer, and six out of eight users didn’t read the sheet in an installation study, you’d know that at least 40% of all users would likely skip it, a substantial proportion.

There are three approaches to computing confidence intervals based on whether your data is binary, task-time or continuous.

Confidence interval around a mean: If your data is generally continuous (not binary), such as rating scales, order amounts in dollars, or the number of page views, the confidence interval is based on the t-distribution (which takes into account sample size).

Confidence interval around task-time: Task time data is positively skewed. There is a lower boundary of 0 seconds. It’s not uncommon for some users to take 10 to 20 times longer than other users to complete the same task. To handle this skew, the time data needs to be log-transformed, and the confidence interval is computed on the log-data, then transformed back when reporting. The online calculator handles all this.
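Here is a sketch of that log-transform procedure. The task times are made up, and the t critical value is taken from a table rather than computed:

```python
from math import exp, log, sqrt
from statistics import mean, stdev

def geo_ci(times, t_crit):
    """Confidence interval for a 'typical' task time: compute a t-interval
    on the log-times, then exponentiate back to seconds.
    t_crit is the two-sided t critical value for n - 1 degrees of freedom,
    looked up from a table (not computed here)."""
    logs = [log(t) for t in times]
    n = len(logs)
    m, s = mean(logs), stdev(logs)
    half = t_crit * s / sqrt(n)
    return exp(m - half), exp(m), exp(m + half)

# Hypothetical task times in seconds from 8 users (note the skew)
times = [40, 46, 52, 58, 65, 73, 95, 190]
lo, gm, hi = geo_ci(times, t_crit=2.365)  # t for df = 7, 95% confidence
print(round(lo), round(gm), round(hi))    # 45 68 103
```

The middle value is the geometric mean (68 seconds here), which sits well below the arithmetic mean that the one slow user would inflate.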

Confidence interval around a binary measure: For an accurate confidence interval around binary measures like completion rate or yes/no questions, the Adjusted Wald interval performs well for all sample sizes.
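A sketch of the Adjusted Wald interval, applied to the printer-sheet example above (6 of 8 users skipping the sheet):

```python
from math import sqrt

def adjusted_wald(x, n, z=1.96):
    """Adjusted Wald confidence interval for a binomial proportion:
    add z^2/2 successes and z^2/2 failures, then use the Wald formula."""
    z2 = z * z
    p_adj = (x + z2 / 2) / (n + z2)
    half = z * sqrt(p_adj * (1 - p_adj) / (n + z2))
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# 6 of 8 users skipped the "Read this first" sheet
lo, hi = adjusted_wald(6, 8)
print(f"{lo:.0%} to {hi:.0%}")  # 40% to 94%
```

The interval is wide, as expected at n = 8, but its lower bound reproduces the "at least 40%" conclusion from the example.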

Point Estimates (The Best Averages)

The “best” estimate for reporting an average time or average completion rate for any study may vary depending on the study goals. Keep in mind that even the “best” single estimate will still differ from the actual average, so using confidence intervals provides a better method for estimating the unknown population average.

For the best overall average for small sample sizes, we have two recommendations for task-time and completion rates, and a more general recommendation for all sample sizes for rating scales.

Completion Rate : For small-sample completion rates, there are only a few possible values for each task. For example, with five users attempting a task, the only possible outcomes are 0%, 20%, 40%, 60%, 80% and 100% success. It’s not uncommon to have 100% completion rates with five users. There’s something about reporting perfect success at this sample size that doesn’t resonate well. It sounds too good to be true.

We experimented [pdf] with several estimators with small sample sizes and found the LaPlace estimator and the simple proportion (referred to as the Maximum Likelihood Estimator) generally work well for the usability test data we examined. When you want the best estimate, the calculator will generate it based on our findings.
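The LaPlace estimator itself is one line: add one success and one failure before taking the proportion. A sketch, not the calculator's exact code:

```python
def laplace_estimate(successes, n):
    """LaPlace ("add one success, one failure") estimate of a completion
    rate, which pulls extreme small-sample rates off 0% and 100%."""
    return (successes + 1) / (n + 2)

print(laplace_estimate(5, 5))  # 5/5 observed -> about 0.86, not 100%
print(laplace_estimate(0, 5))  # 0/5 observed -> about 0.14, not 0%
```

As n grows, the added pseudo-observations matter less and the estimate converges to the simple proportion.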

Rating Scales: Rating scales are a funny type of metric, in that most of them are bounded on both ends (e.g. 1 to 5, 1 to 7 or 1 to 10) unless you are Spinal Tap of course. For small and large sample sizes, we’ve found reporting the mean to be the best average over the median [pdf]. There are in fact many ways to report the scores from rating scales, including top-two boxes. The one you report depends on both the sensitivity as well as what’s used in an organization.

Average Time: One long task time can skew the arithmetic mean and make it a poor measure of the middle. In such situations, the median is a better indicator of the typical or “average” time. Unfortunately, the median tends to be less accurate and more biased than the mean when sample sizes are less than about 25. In these circumstances, the geometric mean (average of the log values transformed back) tends to be a better measure of the middle. When sample sizes get above 25, the median works fine.



Small Sample Research: Considerations Beyond Statistical Power

Affiliations.

  • 1 National Institute on Drug Abuse, National Institutes of Health, 6001 Executive Blvd., Bethesda, MD, 20852, USA. [email protected].
  • 2 National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, 5635 Fishers Lane, Bethesda, MD, 20852, USA.
  • PMID: 26281902
  • DOI: 10.1007/s11121-015-0585-4

Small sample research presents a challenge to current standards of design and analytic approaches and the underlying notions of what constitutes good prevention science. Yet, small sample research is critically important as the research questions posed in small samples often represent serious health concerns in vulnerable and underrepresented populations. This commentary considers the Special Section on small sample research and also highlights additional challenges that arise in small sample research not considered in the Special Section, including generalizability, determining what constitutes knowledge, and ensuring that research designs match community desires. It also points to opportunities afforded by small sample research, such as a focus on and increased understanding of context and the emphasis it may place on alternatives to the randomized clinical trial. The commentary urges the development and adoption of innovative strategies to conduct research with small samples.


  • Clin Orthop Relat Res
  • v.466(9); 2008 Sep


Statistics in Brief: The Importance of Sample Size in the Planning and Interpretation of Medical Research

David Jean Biau

Département de Biostatistique et Informatique Médicale, INSERM – UMR-S 717, AP-HP, Université Paris 7, Hôpital Saint Louis, 1, avenue Claude-Vellefaux, Paris Cedex 10, 75475 France

Solen Kernéis

Raphaël Porcher

The increasing volume of research by the medical community often leads to increasing numbers of contradictory findings and conclusions. Although the differences observed may represent true differences, the results also may differ because of sampling variability as all studies are performed on a limited number of specimens or patients. When planning a study reporting differences among groups of patients or describing some variable in a single group, sample size should be considered because it allows the researcher to control for the risk of reporting a false-negative finding (Type II error) or to estimate the precision his or her experiment will yield. Equally important, readers of medical journals should understand sample size because such understanding is essential to interpret the relevance of a finding with regard to their own patients. At the time of planning, the investigator must establish (1) a justifiable level of statistical significance, (2) the chances of detecting a difference of given magnitude between the groups compared, ie, the power, (3) this targeted difference (ie, effect size), and (4) the variability of the data (for quantitative data). We believe correct planning of experiments is an ethical issue of concern to the entire community.

Introduction

“Statistical analysis allows us to put limits on our uncertainty, but not to prove anything.”— Douglas G. Altman [1]

The growing need for medical practice based on evidence has generated an increasing medical literature supported by statistics: readers expect and presume medical journals publish only studies with unquestionable results they can use in their everyday practice and editors expect and often request authors provide rigorously supportable answers. Researchers submit articles based on presumably valid outcome measures, analyses, and conclusions claiming or implying the superiority of one treatment over another, the usefulness of a new diagnostic test, or the prognostic value of some sign. Paradoxically, the increasing frequency of seemingly contradictory results may be generating increasing skepticism in the medical community.

One fundamental reason for this conundrum takes root in the theory of hypothesis testing developed by Pearson and Neyman in the late 1920s [24, 25]. The majority of medical research is presented in the form of a comparison, the most obvious being treatment comparisons in randomized controlled trials. To assess whether the difference observed is likely attributable to chance alone or to a true difference, researchers set a null hypothesis that there is no difference between the alternative treatments. They then determine the probability (the p value) that they could have obtained the difference observed or a larger difference if the null hypothesis were true; if this probability is below some predetermined explicit significance level, the null hypothesis (ie, there is no difference) is rejected. However, regardless of study results, there is always a chance to conclude there is a difference when in fact there is not (Type I error or false positive) or to report there is no difference when a true difference does exist (Type II error or false negative) and the study has simply failed to detect it (Table 1). The size of the sample studied is a major determinant of the risk of reporting false-negative findings. Therefore, sample size is important for planning and interpreting medical research.

Table 1

Type I and Type II errors during hypothesis testing

Truth                      | Study finding: null hypothesis is not rejected | Study finding: null hypothesis is rejected
Null hypothesis is true    | True negative                                  | Type I error (alpha) (false positive)
Null hypothesis is false   | Type II error (beta) (false negative)          | True positive

For that reason, we believe readers should be adequately informed of the frequent issues related to sample size, such as (1) the desired level of statistical significance, (2) the chances of detecting a difference of given magnitude between the groups compared, ie, the power, (3) this targeted difference, and (4) the variability of the data (for quantitative data). We will illustrate these matters with a comparison between two treatments in a surgical randomized controlled trial. The use of sample size also will be presented in other common areas of statistics, such as estimation and regression analyses.

Desired Level of Significance

The level of statistical significance α corresponds to the probability of Type I error, namely, the probability of rejecting the null hypothesis of “no difference between the treatments compared” when in fact it is true. The decision to reject the null hypothesis is based on a comparison of the prespecified level of the test arbitrarily chosen with the test procedure’s p value. Controlling for Type I error is paramount to medical research to avoid the spread of new or perpetuation of old treatments that are ineffective. For the majority of hypothesis tests, the level of significance is arbitrarily chosen at 5%. When an investigator chooses α = 5%, if the test’s procedure p value computed is less than 5%, the null hypothesis will be rejected and the treatments compared will be assumed to be different.

To reduce the probability of Type I error, we may choose to reduce the level of statistical significance to 1% or less [29]. However, the level of statistical significance also influences the sample size calculation: the lower the chosen level of statistical significance, the larger the sample size will be, considering all other parameters remain the same (see example below and Appendix 1). Consequently, there are domains where higher levels of statistical significance are used so that the sample size remains restricted, such as for randomized Phase II screening designs in cancer [26]. We believe the choice of a significance level greater than 5% should be restricted to particular cases.

Power

The power of a test is defined as 1 − the probability of Type II error. The Type II error is concluding there is no difference (the null is not rejected) when in fact there is a difference, and its probability is named β. Therefore, the power of a study reflects the probability of detecting a difference when this difference exists. It is also very important to medical research that studies are planned with adequate power so that meaningful conclusions can be issued if no statistical difference has been shown between the treatments compared. More power means less risk of Type II errors and more chances to detect a difference when it exists.

Power should be determined a priori to be at least 80% and preferably 90%. The latter means, if the true difference between treatments is equal to the one we planned, there is only a 10% chance the study will not detect it. Sample size increases with increasing power (Fig. 1).

Fig. 1

The graphs show the distribution of the test statistic (z-test) for the null hypothesis (plain line) and the alternative hypothesis (dotted line) for a sample size of (A) 32 patients per group, (B) 64 patients per group, and (C) 85 patients per group. For a difference in mean of 10, a standard deviation of 20, and a significance level α of 5%, the power (shaded area) increases from (A) 50%, to (B) 80%, and (C) 90%. It can be seen, as power increases, the test statistics yielded under the alternative hypothesis (there is a difference in the two comparison groups) are more likely to be greater than the critical value 1.96.
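As a check, the powers quoted in the Fig. 1 caption can be reproduced, to within rounding, with the normal-approximation power formula for a two-sample z-test:

```python
from math import sqrt
from statistics import NormalDist

def power_z(delta, sd, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta / (sd * sqrt(2 / n_per_group))  # standardized difference
    return nd.cdf(shift - z_crit)

# Difference of 10, standard deviation of 20, alpha = 5%, as in Fig. 1
for n in (32, 64, 85):
    print(n, f"{power_z(10, 20, n):.0%}")
```

This gives roughly 52%, 81%, and 90%; the small discrepancies from the caption's 50% and 80% come from rounding the sample sizes in the figure.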

Power calculations frequently are not performed before conducting the trial [ 3 , 8 ], and when facing nonsignificant results, investigators sometimes compute post hoc power analyses, also called observed power. For this purpose, investigators use the observed difference and variability and the sample size of the trial to determine the power they would have had to detect this particular difference. However, post hoc power analyses have little statistical meaning for three reasons [ 9 , 13 ]. First, because there is a one-to-one relationship between p values and post hoc power, the latter conveys no information about the sample beyond the former. Second, nonsignificant p values always correspond to low power: post hoc power will be, at best, slightly larger than 50% for p values equal to or greater than 0.05. Third, when computing post hoc power, investigators implicitly assume the difference observed is clinically meaningful and more representative of the truth than the null hypothesis they were precisely unable to reject. However, in the theory of hypothesis testing, the difference observed should be used only to choose between the hypotheses stated a priori; a posteriori, confidence intervals are preferable for judging the relevance of a finding. The confidence interval represents the range of values that, with a given degree of confidence, includes the true difference. It is related directly to sample size and conveys more information than p values. Nonetheless, post hoc power analyses educate readers about the importance of considering sample size by explicitly raising the issue.

The Targeted Difference Between the Alternative Treatments

The targeted difference between the alternative treatments is determined a priori by the investigator, typically based on preliminary data. The larger the expected difference is, the smaller the required sample size will be. However, because the sample size based on the difference expected may be too large to achieve, investigators sometimes choose to power their trial to detect a difference larger than one would normally expect to reduce the sample size and minimize the time and resources dedicated to the trial. However, if the targeted difference between the alternative treatments is larger than the true difference, the trial may fail to conclude a difference between the two treatments when a smaller, and still meaningful, difference exists. This smallest meaningful difference sometimes is expressed as the “minimal clinically important difference,” namely, “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive costs, a change in the patient’s management” [ 15 ]. Because theoretically the minimal clinically important difference is a multidimensional phenomenon that encompasses a wide range of complex issues of a particular treatment in a unique setting, it usually is determined by consensus among clinicians with expertise in the domain. When the measure of treatment effect is based on a score, researchers may use empiric definitions of clinically meaningful difference. For instance, Michener et al. [ 21 ], in a prospective study of 63 patients with various shoulder abnormalities, determined the minimal change perceived as clinically meaningful by the patients for the patient self-report section of the American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form was 6.7 points of 100 points. Similarly, Bijur et al. 
[ 5 ], in a prospective cohort study of 108 adults presenting to the emergency department with acute pain, determined the minimal change perceived as clinically meaningful by patients for acute pain measured on the visual analog scale was 1.4 points. There is no reason to try to detect a difference below the minimal clinically important difference because, even if it proves statistically significant, it will not be meaningful.

The minimal clinically important difference should not be confused with the effect size. The effect size is a dimensionless measure of the magnitude of a relation between two or more variables, such as Cohen’s d standardized difference [ 6 ], but also the odds ratio, Pearson’s r correlation coefficient, etc. Sometimes studies are planned to detect a particular effect size instead of a particular difference between the two treatments. According to Cohen [ 6 ], 0.2 is indicative of a small effect, 0.5 a medium effect, and 0.8 a large effect. One advantage of doing so is that researchers do not have to make any assumptions regarding the minimal clinically important difference or the expected variability of the data.

The Variability of the Data

For quantitative data, researchers also need to determine the expected variability of the alternative treatments: the more variability expected in the specified outcome, the more difficult it will be to differentiate between treatments and the larger the required sample size (see example below). If this variability is underestimated at the time of planning, the computed sample size will be too small and the study will be underpowered relative to the level desired. For comparing proportions, the calculation of sample size makes use of the expected proportion with the specified outcome in each group. For survival data, the calculation of sample size is based on the survival proportions in each treatment group at a specified time and on the total number of events in the group in which fewer events occur. Therefore, for the latter two types of data, variability does not appear in the computation of sample size.

Presume an investigator wants to compare the postoperative Harris hip score [ 12 ] at 3 months in a group of patients undergoing minimally invasive THA with a control group of patients undergoing standard THA in a randomized controlled trial. The investigator must (1) establish a statistical significance level, eg, α = 5%, (2) select a power, eg, 1 − β = 90%, and (3) establish a targeted difference in the mean scores, eg, 10, and assume a standard deviation of the scores, eg, 20 in both groups (which they can obtain from the literature or their previous patients). In this case, the sample size should be 85 patients per group (Appendix 1). If fewer patients are included in the trial, the probability of detecting the targeted difference when it exists will decrease; for sample sizes of 64 and 32 per group, for instance, the power decreases to 80% and 50%, respectively (Fig.  1 ). If the investigator assumed the standard deviation of the scores in each group to be 30 instead of 20, a sample size of 190 per group would be necessary to obtain a power of 90% with a significance level α = 5% and targeted difference in the mean scores of 10. If the significance level were chosen at α = 1% instead of α = 5%, to yield the same power of 90% with a targeted difference in scores of 10 and standard deviation of 20, the sample size would increase from 85 to 120 patients per group. In relatively simple cases, statistical tables [ 19 ] and dedicated software available on the internet may be used to determine sample size. In most orthopaedic clinical trials, sample size calculation is rather simple, as above, but in other cases it becomes more complex. The type of end points, the number of groups, the statistical tests used, whether the observations are paired, and other factors influence the complexity of the calculation, and in these cases expert statistical advice is recommended.
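The numbers in this example follow from the formula in Appendix 1, n = 2(z_{1−α/2} + z_{1−β})²/d_t². A minimal sketch of the calculation (the function name is ours):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(diff, sd, alpha=0.05, power=0.90):
    """Sample size per group for a two-sided two-sample comparison of
    means (normal approximation), as in Appendix 1."""
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2)   # 1.96 for alpha = 5%
    z_b = z(power)           # 1.28 for power = 90%
    d_t = diff / sd          # targeted standardized difference
    return ceil(2 * (z_a + z_b) ** 2 / d_t ** 2)

print(n_per_group(10, 20))               # base case: 85 per group
print(n_per_group(10, 30))               # larger SD: 190 per group
print(n_per_group(10, 20, alpha=0.01))   # stricter alpha: 120 per group
```

The three calls reproduce the 85, 190, and 120 patients per group quoted in the text, showing how the required sample size grows with variability and with a stricter significance level.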

Sample Size, Estimation, and Regression

Sample size was presented above in the context of hypothesis testing. However, it is also of interest in other areas of biostatistics, such as estimation or regression. When planning an experiment, researchers should ensure the precision of the anticipated estimation will be adequate. The precision of an estimation corresponds to the width of the confidence interval: the larger the tested sample size is, the better the precision. For instance, Handl et al. [ 11 ], in a biomechanical study of 21 fresh-frozen cadavers, reported a mean ultimate load failure of four-strand hamstring tendon constructs of 4546 N under loading with a standard deviation of 1500 N. Based on these values, if we were to design an experiment to assess the ultimate load failure of a particular construct, the precision around the mean at the 95% confidence level would be expected to be 3725 N for five specimens, 2146 N for 10 specimens, 1238 N for 25 specimens, 853 N for 50 specimens, and 595 N for 100 specimens tested (Appendix 2); if we consider the estimated mean will be equal to 4546 N, the one obtained in the previous experiment, we could obtain the corresponding 95% confidence intervals (Fig.  2 ). Because we always deal with limited samples, we never exactly know the true mean or standard deviation of the parameter distribution; otherwise, we would not perform the experiment. We only approximate these values, and the results obtained can vary from the planned experiment. Nonetheless, what we identify at the time of planning is that testing more than 50 specimens, for instance 100, will multiply the costs and time necessary for the experiment while providing only slight improvement in the precision.
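These precisions are the full widths of t-based confidence intervals, 2 × t_{0.975, n−1} × sd/√n. A sketch using the two-decimal t deviates listed in Appendix 2 (which therefore reproduces the quoted values only up to a few newtons of rounding):

```python
from math import sqrt

# 97.5th-percentile t deviates for n - 1 degrees of freedom (Appendix 2)
T_975 = {5: 2.78, 10: 2.26, 25: 2.06, 50: 2.01, 100: 1.98}

mean, sd = 4546, 1500  # Handl et al.: ultimate load failure (N)

for n, t in T_975.items():
    half = t * sd / sqrt(n)  # half-width of the 95% CI
    print(f"n = {n:3d}: precision {2 * half:.0f} N, "
          f"95% CI {mean - half:.0f} to {mean + half:.0f} N")
```

Doubling the sample from 50 to 100 specimens shrinks the interval width only from about 853 N to 595 N, which is the diminishing return noted above.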


The graph shows the predicted confidence interval for experiments with an increasing number of specimens tested based on the study by Handl et al. [ 11 ] of 21 fresh-frozen cadavers with a mean ultimate load failure of four-strand hamstring tendon constructs of 4546 N and standard deviation of 1500 N.

Similarly, sample size issues should be considered when performing regression analyses, namely, when trying to assess the effect of a particular covariate, or set of covariates, on an outcome. The effective power to detect the significance of a covariate in predicting this outcome depends on the outcome modeled [ 14 , 30 ]. For instance, when using a Cox regression model, the power of the test to detect the significance of a particular covariate does not depend on the size of the sample per se but on the number of specific critical events. In a cohort study of patients treated for soft tissue sarcoma with various treatments, such as surgery, radiotherapy, chemotherapy, etc, the power to detect the effect of chemotherapy on survival will depend on the number of patients who die, not on the total number of patients in the cohort. Therefore, when planning such studies, researchers should be familiar with these issues and decide, for example, to model a composite outcome, such as event-free survival that includes any of the following events: death from disease, death from other causes, recurrence, metastases, etc, to increase the power of the test.

The reasons to plan a trial with an adequate sample size likely to give enough power to detect a meaningful difference are essentially ethical. Small trials are considered unethical by most, but not all, researchers because they expose participants to the burdens and risks of human research with a limited chance to provide any useful answers [ 2 , 10 , 28 ]. Underpowered trials also ineffectively consume resources (human, material) and add to the cost of healthcare to society. Although there are particular cases when trials conducted on a small sample are justified, such as early-phase trials with the aim of guiding the conduct of subsequent research (or formulating hypotheses) or, more rarely, for rare diseases with the aim of prospectively conducting meta-analyses, they generally should be avoided [ 10 ]. It is also unethical to conduct trials with too large a sample size because, in addition to the waste of time and resources, they expose participants in one group to receive inadequate treatment after appropriate conclusions should have been reached. Interim analyses and adaptive trials have been developed in this context to shorten the time to decision and overcome these concerns [ 4 , 16 ].

We raise two important points. First, we explained that, for practical and ethical reasons, experiments are conducted on a sample of limited size with the aim of generalizing the results to the population of interest, and that increasing the size of the sample is a way to combat uncertainty. When doing this, we implicitly consider the patients or specimens in the sample to be randomly selected from the population of interest, although this is almost never the case; even if it were, the population of interest would be limited in space and time. For instance, Marx et al. [ 20 ], in a survey conducted in late 1998 and early 1999, assessed the practices for anterior cruciate ligament reconstruction in a randomly selected sample of 725 members of the American Academy of Orthopaedic Surgeons; however, because only half the surgeons responded to the survey, their sample probably is not representative of all members of the society, who in turn are not representative of all orthopaedic surgeons in the United States, who again are not representative of all surgeons in the world because of the numerous differences among patients, doctors, and healthcare systems across countries. Similar surveys conducted in other countries have provided different results [ 17 , 22 ]. Moreover, if the same survey were conducted today, the results possibly would differ. Therefore, another source of variation among studies, apart from sampling variability, is that samples may not be representative of the same population. When planning experiments, researchers must take care to make their sample representative of the population to which they want to infer, and readers, when interpreting the results of a study, should always assess first how representative the presented sample is of their own patients. The process implemented to select the sample, the settings of the experiment, and the general characteristics and influencing factors of the patients must be described precisely so that representativeness and possible selection biases can be assessed [ 7 ].

Second, we have discussed sample size only for interpreting nonsignificant p values, but it also may be of interest when interpreting p values that are significant. Significant results issued from larger studies usually are given more credit than those from smaller studies because smaller or lower-quality studies risk reporting exaggerated treatment effects [ 23 , 27 ], and small trials are believed to be more biased than others. However, there is no statistical reason a significant result in a trial including 2000 patients should be given more belief than one in a trial including 20 patients, given the significance level chosen is the same in both trials. Small but well-conducted trials may yield a reliable estimation of treatment effect. Kjaergard et al. [ 18 ], in a study of 14 meta-analyses involving 190 randomized trials, reported that small trials (fewer than 1000 patients) showed exaggerated treatment effects when compared with large trials. However, when considering only small trials with adequate randomization, allocation concealment (the process that keeps clinicians and participants unaware of upcoming assignments; without it, even properly developed random allocation sequences can be subverted), and blinding, this difference became negligible. Nonetheless, the advantages of a large sample size for interpreting significant results are that it allows a more precise estimate of the treatment effect and that it usually is easier to assess the representativeness of the sample and to generalize the results.

Sample size is important for planning and interpreting medical research, and surgeons should become familiar with the basic elements required to assess sample size and with the influence of sample size on conclusions. Controlling the size of the sample allows the researcher to walk the thin line separating the uncertainty surrounding studies with too small a sample size from the practical and ethical problems of studies with too large a sample size.

Acknowledgments

We thank the editor, whose thorough readings of, and accurate comments on, drafts of the manuscript helped clarify it.

The sample size (n) per group for comparing two means with a two-sided two-sample t test is

n = 2(z_{1−α/2} + z_{1−β})² / d_t²

where z_{1−α/2} and z_{1−β} are the standard normal deviates for the probabilities 1 − α/2 and 1 − β, respectively, and d_t = (μ_0 − μ_1)/σ is the targeted standardized difference between the two means.

The following values correspond to the example:

  • α = 0.05 (statistical significance level)
  • β = 0.10 (power of 90%)
  • |μ_0 − μ_1| = 10 (difference in the mean score between the two groups)
  • σ = 20 (standard deviation of the score in each group)
  • z_{1−α/2} = 1.96
  • z_{1−β} = 1.28

n = 2(1.96 + 1.28)² / (10/20)² ≈ 84, rounded up to 85 patients per group (using the unrounded deviates 1.960 and 1.282 gives 84.1)

Two-sided tests, which do not assume the direction of the difference (ie, that the mean value in one group would always be greater than that in the other), are generally preferred. The null hypothesis assumes there is no difference between the treatments compared, and a difference on either side therefore can be expected.

Computation of Confidence Interval

To determine the estimation of a parameter, or alternatively the confidence interval, we use the distribution of the parameter estimate in repeated samples of the same size. For instance, consider a parameter with observed mean m and standard deviation sd in a given sample. If we assume the distribution of the parameter in the sample is close to a normal distribution, the means x̄_n of several repeated samples of the same size have true mean μ, the population mean, and estimated standard deviation

sd / √n

also known as the standard error of the mean, and

(x̄_n − μ) / (sd / √n)

follows a t distribution. For a large sample, the t distribution becomes close to the normal distribution; however, for a smaller sample size the difference is not negligible and the t distribution is preferred. The precision of the estimation (the full width of the confidence interval) is

2 × t_{1−α/2, n−1} × sd / √n

For example, Handl et al. [ 11 ], in a biomechanical study of 21 fresh-frozen cadavers, reported a mean ultimate load failure of four-strand hamstring tendon constructs of 4546 N under dynamic loading with a standard deviation of 1500 N. If we were to plan an experiment, the anticipated precision of the estimation at the 95% level would be

2 × t_{0.975, 4} × 1500 / √5 = 3725 N

for five specimens,

2 × t_{0.975, 9} × 1500 / √10 = 2146 N

for 10 specimens, and similarly 1238 N, 853 N, and 595 N for 25, 50, and 100 specimens.

The values 2.78, 2.26, 2.06, 2.01, and 1.98 correspond to the t distribution deviates for the probability of 1 − α/2, with 4, 9, 24, 49, and 99 (n − 1) degrees of freedom; the well known corresponding standard normal deviate is 1.96. Given an estimated mean of 4546 N, the corresponding 95% confidence intervals are 2683 N to 6408 N for five specimens, 3473 N to 5619 N for 10 specimens, 3927 N to 5165 N for 25 specimens, 4120 N to 4972 N for 50 specimens, and 4248 N to 4844 N for 100 specimens (Fig.  2 ).

Similarly, for a proportion p in a given sample with sufficient sample size to assume a nearly normal distribution, the confidence interval extends on either side of the proportion p by

z_{1−α/2} × √(p(1 − p) / n)

For a small sample size, exact confidence intervals for proportions should be used.
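As a minimal sketch of this normal-approximation interval, p ± z_{1−α/2}√(p(1 − p)/n) (the function name and the example values 0.30 and 100 are ours):

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p, n, alpha=0.05):
    """Normal-approximation confidence interval for a proportion.
    Valid only for large samples; use an exact method otherwise."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at the 95% level
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = proportion_ci(0.30, 100)
print(f"95% CI: {lo:.3f} to {hi:.3f}")
```

For an observed proportion of 0.30 in 100 patients this gives roughly 0.21 to 0.39, illustrating how wide proportion estimates remain at moderate sample sizes.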

Each author certifies that he or she has no commercial associations (eg, consultancies, stock ownership, equity interest, patent/licensing arrangements, etc) that might pose a conflict of interest in connection with the submitted article.

Does Contract Farming Improve Income of Smallholder Avocado Farmers? Evidence from Sidama Region of Ethiopia

  • Published: 28 August 2024



  • Tibebu Legesse   ORCID: orcid.org/0000-0003-4821-3198 1 ,
  • Mesfin Gensa 2 ,
  • Abera Alemu 3 ,
  • Aneteneh Ashebir 1 &
  • Zerhun Ganewo 1  

Contract farming is considered the most effective income-generating strategy for smallholder farmers and a significant source of foreign currency in Ethiopia. Avocado farmers in the study area made a contract agreement with the Savando avocado oil processing company, which is part of the Yirgalem agro-processing industry. The main aim of this research was to examine the factors influencing avocado producers’ decision to participate in contract farming and how it would affect their income, using data collected from 413 avocado producers in Dale district, Sidama region, Ethiopia. A cross-sectional research design and multi-stage sampling procedure were used to choose the study’s representative sample. The data were analyzed using descriptive statistics, inferential statistics, and a propensity score matching model. The findings of this study indicated that the age of the household head, sex of the household head, education level of the household head, family size, and proportion of the farmland allocated for avocado production influenced the avocado producers’ participation in contract farming under the agro-processing industry. Average treatment effect on the treated (ATT) estimation showed that participation in contract farming had a substantial impact on avocado producer households’ income. The study suggests that local government should offer adult education to improve smallholders’ knowledge of and attitudes towards the benefits of participation in contract farming schemes in the study area. Moreover, the district office of agriculture needs to work with farmers to allocate more land for avocado production.


Data Availability

The datasets used and analyzed during this study are accessible upon request from the corresponding author.


Download references

Acknowledgements

The authors would like to thank the experts of the district agriculture office and the staff of Yirgalem integrated agro-processing industries for their patience and support in providing the required supplementary data. The authors also thank the respondents for their willingness to participate in this study.

Author information

Authors and affiliations

Department of Agribusiness and Value Chain Management, College of Agriculture, Hawassa University, Hawassa, Ethiopia

Tibebu Legesse, Aneteneh Ashebir & Zerhun Ganewo

World Vision, Hawassa, Ethiopia

Mesfin Gensa

Department of Rural Development and Agricultural Extension, College of Agriculture, Hawassa University, Hawassa, Ethiopia

Abera Alemu


Contributions

The first author contributed to the research proposal writing, data collection, and supervision. The second author assisted with data cleaning and entry. The third, fourth, and fifth authors contributed to data analysis and article writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tibebu Legesse .

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Tables 6, 7, 8, and 9

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Legesse, T., Gensa, M., Alemu, A. et al. Does Contract Farming Improve Income of Smallholder Avocado Farmers? Evidence from Sidama Region of Ethiopia. J Knowl Econ (2024). https://doi.org/10.1007/s13132-024-02275-3


Received: 22 April 2023

Accepted: 01 August 2024

Published: 28 August 2024

DOI: https://doi.org/10.1007/s13132-024-02275-3


Keywords

  • Contract farming
  • Household income
  • Propensity score matching model

COMMENTS

  1. Sample Size and its Importance in Research

    The sample size for a study needs to be estimated at the time the study is proposed; too large a sample is unnecessary and unethical, and too small a sample is unscientific and also unethical. The necessary sample size can be calculated, using statistical software, based on certain assumptions. If no assumptions can be made, then an arbitrary ...

  2. The importance of small samples in medical research

    Statistically, a sample of n < 30 for the quantitative outcome or [np or n(1 - p)] < 8 (where p is the proportion) for the qualitative outcome is considered small because the central limit theorem for normal distribution does not hold in most cases with such a sample size and an exact method of analysis is required.

  3. How sample size influences research outcomes

    An appropriate sample renders the research more efficient: Data generated are reliable, resource investment is as limited as possible, while conforming to ethical principles. The use of sample size calculation directly influences research findings. Very small samples undermine the internal and external validity of a study.

  4. Big enough? Sampling in qualitative inquiry

    These studies often have small samples. However, the problem doesn't rest in the size of the sample; it lies in the inadequacy of the evidence. ... O'Reilly M, Parker N (2013) Unsatisfactory saturation: A critical exploration of the notion of saturated sample sizes in qualitative research. Qualitative Research 13(2): 190-197. ...

  5. The logic of small samples in interview-based qualitative research

    Since such a research project scrutinizes the dynamic qualities of a situation (rather than elucidating the proportionate relationships among its constituents), the issue of sample size - as well as representativeness - has little bearing on the project's basic logic.

  6. Sample Size Essentials: The Foundation of Reliable Statistics

    Sample size is the number of observations or data points collected in a study. It is a crucial element in any statistical analysis because it is the foundation for drawing inferences and conclusions about a larger population. When delving into the world of statistics, the phrase "sample size" often pops up, carrying with it the weight of ...

  7. The importance of small samples in medical research

    This communication underscores the importance of small samples in reaching a valid conclusion in certain situations and describes the situations where a large sample is not only unnecessary but may even compromise the validity by not being able to exercise full care in the assessments. What sample size is small depends on the context.

  8. Sample size determination: A practical guide for health researchers

    2.1 Expectations regarding sample size. A sample size can be small, especially when investigating rare diseases or when the sampling technique is complicated and costly. Most academic journals do not place limitations on sample sizes. However, an insufficient sample size makes it challenging to reproduce the results and may ...

  9. Small Sample Research: Considerations Beyond Statistical Power

    Small sample research presents a challenge to current standards of design and analytic approaches and the underlying notions of what constitutes good prevention science. Yet, small sample research is critically important as the research questions posed in small samples often represent serious health concerns in vulnerable and underrepresented populations. This commentary considers the Special ...

  10. On the scientific study of small samples: Challenges confronting

    Scientific research can proceed in small samples as long as comparison cases are fair (Cooper & Richardson, 1986). Medicine provides some good examples of how matching can be used to get around the difficulty of quantitative analysis of small sample sizes.

  11. Small Samples: Does Size Matter?

    Despite criticisms, a sample size of five may well be useful in scientific research. In summary, the model outlined allows predictions to be made from experimental data obtained from limited numbers of samples. Our approach is appropriate for studies documenting the presence of an effect in each of a small number of subjects and allows ...

  12. Sample size, power and effect size revisited: simplified and practical

    In clinical research, sample size is calculated in line with the hypothesis and study design. The cross-over study design and parallel study design apply different approaches for sample size estimation. ... (0.71-0.87) if the study is conducted with 100 samples. Thus, at small sample sizes, only rather uncertain estimates of specificity ...

  13. Small Studies: Strengths and Limitations

    The sample size of 500 secondary school students was also small in comparison to the 160,699 students enrolled in just public senior secondary schools in Oyo State as stated in the Senior ...

  14. Best Practices for Using Statistics on Small Sample Sizes

    The right one depends on the type of data you have: continuous or discrete-binary. Comparing Means: If your data is generally continuous (not binary), such as task time or rating scales, use the two sample t-test. It's been shown to be accurate for small sample sizes. Comparing Two Proportions: If your data is binary (pass/fail, yes/no), then ...

  15. PDF Small studies: strengths and limitations

    often review very interesting studies but based on small sample sizes. While the board encourages the best use of such data, editors must take into account that small studies have their limitations. ...

  16. Clarifying the advantage of small samples: as it relates to statistical

    Such a small-sample advantage (SSA) is predicted for choices, not estimations. It is contingent on high constant decision thresholds. The model was harshly criticized by Cahan (2010), who argued that the SSA disappears when the threshold decreases with increasing sample size and when the costs of incorrect decisions are higher than the benefits ...

  17. Implications of Small Samples for Generalization: Adjustments

    In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10-70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. ... are much too conservative given the small sample sizes found in generalization ...

  18. Series: Practical guidance to qualitative research. Part 3: Sampling

    In quantitative research, by contrast, the sample size is determined by a power calculation. The usually small sample size in qualitative research depends on the information richness of the data, the variety of participants (or other units), the broadness of the research question and the phenomenon, the data collection method (e.g., individual ...

  19. Sample Size and its Importance in Research

    Sample Size and its Importance in Research. Chittaranjan Andrade. ABSTRACT. The sample size for a study needs to be estimated at the time the study is proposed; too large a sample is unnecessary and unethical, and too small a sample is unscientific and also unethical. The necessary sample size can be calculated, using statistical software ...

  20. PDF Why Small Samples Can Increase Accuracy

    The economic and practical advantages of small sample size: High efficiency in an experimental design has the obvious attraction that a result can be obtained after a much lower expenditure of time, money and other research resources. The same comments can be made with regard to a small individual sample for each treatment ...

  21. Small Sample Research: Considerations Beyond Statistical Power

    Small sample research presents a challenge to current standards of design and analytic approaches and the underlying notions of what constitutes good prevention science. Yet, small sample research is critically important as the research questions posed in small samples often represent serious health concerns in vulnerable and underrepresented ...

  22. Small sample sizes: A big data problem in high-dimensional data

    Small sample sizes occur in various research experiments and especially in preclinical (animal) studies due to ethical, financial, and general feasibility reasons. Such studies are essential and an important part of translational medicine and other areas (e.g. rare diseases). Often, less than 20 animals per group are involved, and thus making ...

  23. Dealing with small samples in disability research: Do not fret

    Purpose/Objective: Small sample sizes are a common problem in disability research. Here, we show how Bayesian methods can be applied in small sample settings and the advantages that they provide. Method/Design: To illustrate, we provide a Bayesian analysis of employment status (employed vs. unemployed) for those with disability. Specifically, we apply empirically informed priors, based on ...

  24. Statistics in Brief: The Importance of Sample Size in the Planning and

    The graphs show the distribution of the test statistic (z-test) for the null hypothesis (plain line) and the alternative hypothesis (dotted line) for a sample size of (A) 32 patients per group, (B) 64 patients per group, and (C) 85 patients per group.For a difference in mean of 10, a standard deviation of 20, and a significance level α of 5%, the power (shaded area) increases from (A) 50%, to ...

  25. Does Contract Farming Improve Income of Smallholder Avocado ...

    Contract farming is considered the most effective income-generating strategy for smallholder farmers and a significant source of foreign currency in Ethiopia. Avocado farmers in the study area made a contract agreement with the Savando avocado oil processing company, which is part of the Yirgalem agro-processing industry. The main aim of this research was to look at the factors influencing ...
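The power figures quoted in entry 24 above (difference in means of 10, standard deviation of 20, significance level α of 5%, two-sided) can be approximately reproduced with a short calculation. The sketch below uses the normal (z) approximation rather than the exact noncentral-t computation, so the numbers are close to, but not identical with, those in the article; the function names are illustrative.

```python
# Approximate power of a two-sided, two-sample test of means using the
# normal (z) approximation -- a sketch, not the exact noncentral-t result.
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(diff: float, sd: float, n_per_group: int) -> float:
    """Power to detect a mean difference `diff` with common SD `sd`,
    alpha = 0.05, two-sided, equal group sizes."""
    z_crit = 1.959964                             # Phi^{-1}(0.975)
    ncp = (diff / sd) * sqrt(n_per_group / 2.0)   # noncentrality parameter
    return norm_cdf(ncp - z_crit)

# The scenario from entry 24: diff = 10, SD = 20 (effect size d = 0.5)
for n in (32, 64, 85):
    print(f"n = {n} per group: power = {power_two_sample(10, 20, n):.2f}")
```

With these inputs the approximation gives roughly 0.52, 0.81, and 0.90 for 32, 64, and 85 patients per group, consistent with the 50% power at 32 per group that the excerpt describes.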
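For the binary (pass/fail) case that entry 14 above addresses, the classic large-sample comparison is a pooled two-proportion z-test. The sketch below is a minimal stdlib-only implementation under that assumption (the function name and example data are illustrative, not from any particular library); as that entry notes, adjusted or exact methods can be preferable when counts are very small.

```python
# Pooled two-proportion z-test for H0: p_a == p_b (two-sided).
# Minimal illustration; exact/adjusted tests may suit very small samples better.
from math import erf, sqrt

def two_proportion_z(successes_a: int, n_a: int,
                     successes_b: int, n_b: int):
    """Return (z statistic, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)   # pooled proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2.0))))
    return z, p_value

# Hypothetical usability data: 18/20 vs 12/20 task completions
z, p = two_proportion_z(18, 20, 12, 20)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For the hypothetical 18/20 vs 12/20 comparison, the pooled test yields z of about 2.19 and a two-sided p-value near 0.03, so the difference would be judged significant at the 5% level under the normal approximation.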