A Step-by-Step Guide to the Data Analysis Process

Like any scientific discipline, data analysis follows a rigorous step-by-step process. Each stage requires different skills and know-how. To get meaningful insights, though, it’s important to understand the process as a whole. An underlying framework is invaluable for producing results that stand up to scrutiny.

In this post, we’ll explore the main steps in the data analysis process. This will cover how to define your goal, collect data, and carry out an analysis. Where applicable, we’ll also use examples and highlight a few tools to make the journey easier. When you’re done, you’ll have a much better understanding of the basics. This will help you tweak the process to fit your own needs.

Here are the steps we’ll take you through:

  • Defining the question
  • Collecting the data
  • Cleaning the data
  • Analyzing the data
  • Sharing your results
  • Embracing failure


Ready? Let’s get started with step one.

1. Step one: Defining the question

The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the ‘problem statement’.

Defining your objective means coming up with a hypothesis and figuring out how to test it. Start by asking: What business problem am I trying to solve? While this might sound straightforward, it can be trickier than it seems. For instance, your organization’s senior management might pose an issue, such as: “Why are we losing customers?” It’s possible, though, that this doesn’t get to the core of the problem. A data analyst’s job is to understand the business and its goals in enough depth that they can frame the problem the right way.

Let’s say you work for a fictional company called TopNotch Learning. TopNotch creates custom training software for its clients. While it is excellent at securing new clients, it has much lower repeat business. As such, your question might not be, “Why are we losing customers?” but, “Which factors are negatively impacting the customer experience?” or better yet: “How can we boost customer retention while minimizing costs?”
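A question about retention is easier to work with once it is tied to a number you can track. The following is a minimal Python sketch, assuming a hypothetical order log for TopNotch Learning; the client names, projects, and the `repeat_business_rate` helper are invented for illustration and are not part of any real system.

```python
from collections import Counter

# Hypothetical order log: one (client, project) pair per completed project.
orders = [
    ("Acme", "onboarding course"),
    ("Acme", "safety course"),
    ("Birch", "sales training"),
    ("Cedar", "compliance course"),
    ("Cedar", "leadership course"),
    ("Dune", "onboarding course"),
]


def repeat_business_rate(orders):
    """Return the share of clients who have ordered more than once."""
    projects_per_client = Counter(client for client, _ in orders)
    if not projects_per_client:
        return 0.0
    repeat_clients = sum(1 for n in projects_per_client.values() if n > 1)
    return repeat_clients / len(projects_per_client)


rate = repeat_business_rate(orders)
print(f"Repeat business rate: {rate:.0%}")
```

A baseline like this gives the question a measurable target: any later change can be judged by whether the rate moves.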

Now that you’ve defined a problem, you need to determine which sources of data will best help you solve it. This is where your business acumen comes in again. For instance, perhaps you’ve noticed that the sales process for new clients is very slick, but that the production team is inefficient. Knowing this, you could hypothesize that the sales process wins lots of new clients, but the subsequent customer experience is lacking. Could this be why customers don’t come back? Which sources of data will help you answer this question?

Tools to help define your objective

Defining your objective is mostly about soft skills, business knowledge, and lateral thinking. But you’ll also need to keep track of business metrics and key performance indicators (KPIs). Monthly reports can allow you to track problem points in the business. Some KPI dashboards come with a fee, like Databox and DashThis. However, you’ll also find open-source software like Grafana, Freeboard, and Dashbuilder. These are great for producing simple dashboards, both at the beginning and the end of the data analysis process.

2. Step two: Collecting the data

Once you’ve established your objective, you’ll need to create a strategy for collecting and aggregating the appropriate data. A key part of this is determining which data you need. This might be quantitative (numeric) data, e.g. sales figures, or qualitative (descriptive) data, such as customer reviews. All data fit into one of three categories: first-party, second-party, and third-party data. Let’s explore each one.

What is first-party data?

First-party data are data that you, or your company, have directly collected from customers. It might come in the form of transactional tracking data or information from your company’s customer relationship management (CRM) system. Whatever its source, first-party data is usually structured and organized in a clear, defined way. Other sources of first-party data might include customer satisfaction surveys, focus groups, interviews, or direct observation.

What is second-party data?

To enrich your analysis, you might want to secure a secondary data source. Second-party data is the first-party data of other organizations. This might be available directly from the company or through a private marketplace. The main benefit of second-party data is that it is usually structured, and although it will be less relevant than first-party data, it also tends to be quite reliable. Examples of second-party data include website, app or social media activity, like online purchase histories, or shipping data.

What is third-party data?

Third-party data is data that has been collected and aggregated from numerous sources by a third-party organization. Often (though not always) third-party data contains a vast amount of unstructured data points (big data). Many organizations collect big data to create industry reports or to conduct market research. The research and advisory firm Gartner is a good real-world example of an organization that collects big data and sells it on to other companies. Open data repositories and government portals are also sources of third-party data.

Tools to help you collect data

Once you’ve devised a data strategy (i.e. you’ve identified which data you need, and how best to go about collecting them) there are many tools you can use to help you. One thing you’ll need, regardless of industry or area of expertise, is a data management platform (DMP). A DMP is a piece of software that allows you to identify and aggregate data from numerous sources, before manipulating them, segmenting them, and so on. There are many DMPs available. Some well-known enterprise DMPs include Salesforce DMP, SAS, and the data integration platform, Xplenty. If you want to play around, you can also try some open-source platforms like Pimcore or D:Swarm.


3. Step three: Cleaning the data

Once you’ve collected your data, the next step is to get it ready for analysis. This means cleaning, or ‘scrubbing’ it, and is crucial in making sure that you’re working with high-quality data. Key data cleaning tasks include:

  • Removing major errors, duplicates, and outliers: all of which are inevitable problems when aggregating data from numerous sources.
  • Removing unwanted data points: extracting irrelevant observations that have no bearing on your intended analysis.
  • Bringing structure to your data: general ‘housekeeping’, i.e. fixing typos or layout issues, which will help you map and manipulate your data more easily.
  • Filling in major gaps: as you’re tidying up, you might notice that important data are missing. Once you’ve identified gaps, you can go about filling them.
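The tasks above can be sketched in a few lines of Python. This is an illustration only, using the standard library and a small set of invented survey rows; the field names, the "Test Account" filter, and the choice to fill gaps with the median are all assumptions, and a real project would pick rules that suit its own data.

```python
from statistics import median

# Hypothetical raw survey rows aggregated from several sources.
raw_rows = [
    {"client": "Acme ", "sector": "retail", "score": "4"},
    {"client": "acme", "sector": "Retail", "score": "4"},       # duplicate once tidied
    {"client": "Birch", "sector": "finance", "score": ""},      # missing score
    {"client": "Cedar", "sector": "retail", "score": "5"},
    {"client": "Test account", "sector": "n/a", "score": "1"},  # irrelevant row
]


def clean(rows):
    # Bring structure: trim whitespace, normalise capitalisation, parse numbers.
    tidy = [
        {
            "client": r["client"].strip().title(),
            "sector": r["sector"].strip().lower(),
            "score": int(r["score"]) if r["score"].strip() else None,
        }
        for r in rows
    ]
    # Remove unwanted data points (assumed here to be internal test accounts).
    tidy = [r for r in tidy if r["client"] != "Test Account"]
    # Remove duplicates, keeping the first occurrence of each row.
    seen, unique = set(), []
    for r in tidy:
        key = (r["client"], r["sector"], r["score"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # Fill gaps: replace missing scores with the median of the known scores.
    known = [r["score"] for r in unique if r["score"] is not None]
    fill = median(known) if known else None
    for r in unique:
        if r["score"] is None:
            r["score"] = fill
    return unique


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Libraries such as Pandas offer equivalents for most of these steps (for example `drop_duplicates` and `fillna`), which scale better to large datasets.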

A good data analyst will spend around 70-90% of their time cleaning their data. This might sound excessive. But focusing on the wrong data points (or analyzing erroneous data) will severely impact your results. It might even send you back to square one…so don’t rush it!

Carrying out an exploratory analysis

Another thing many data analysts do (alongside cleaning data) is to carry out an exploratory analysis. This helps identify initial trends and characteristics, and can even refine your hypothesis. Let’s use our fictional learning company as an example again. Carrying out an exploratory analysis, perhaps you notice a correlation between how much TopNotch Learning’s clients pay and how quickly they move on to new suppliers. This might suggest that a low-quality customer experience (the assumption in your initial hypothesis) is actually less of an issue than cost. You might, therefore, take this into account.
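A quick way to check for a relationship like this during exploration is to compute a correlation coefficient. The sketch below implements the Pearson coefficient by hand so that it runs on any recent Python version; the fee and retention figures are made up for illustration.

```python
from math import sqrt

# Hypothetical figures: annual fee paid (in $1,000s) and months until the client left.
fees = [10, 20, 30, 40, 50]
months_retained = [30, 26, 20, 14, 10]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(fees, months_retained)
print(f"Correlation between fee and months retained: {r:.2f}")
```

A value close to -1 would mean that higher-paying clients tend to leave sooner in this sample. Correlation alone does not show that price causes the departures, so a result like this refines the hypothesis rather than confirming it.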

Tools to help you clean your data

Cleaning datasets manually, especially large ones, can be daunting. Luckily, there are many tools available to streamline the process. Open-source tools, such as OpenRefine, are excellent for basic data cleaning, as well as high-level exploration. However, free tools offer limited functionality for very large datasets. Python libraries (e.g. Pandas) and some R packages are better suited for heavy data scrubbing. You will, of course, need to be familiar with the languages. Alternatively, enterprise tools are also available. One example is Data Ladder, one of the highest-rated data-matching tools in the industry. There are many more. Why not see which free data cleaning tools you can find to play around with?

4. Step four: Analyzing the data

Finally, you’ve cleaned your data. Now comes the fun bit—analyzing it! The type of data analysis you carry out largely depends on what your goal is. But there are many techniques available. Univariate or bivariate analysis, time-series analysis, and regression analysis are just a few you might have heard of. More important than the different types, though, is how you apply them. This depends on what insights you’re hoping to gain. Broadly speaking, all types of data analysis fit into one of the following four categories.

Descriptive analysis

Descriptive analysis identifies what has already happened. It is a common first step that companies carry out before proceeding with deeper explorations. As an example, let’s refer back to our fictional learning provider once more. TopNotch Learning might use descriptive analytics to analyze course completion rates for their customers. Or they might identify how many users access their products during a particular period. Perhaps they’ll use it to measure sales figures over the last five years. While the company might not draw firm conclusions from any of these insights, summarizing and describing the data will help them to determine how to proceed.
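As a rough illustration, a descriptive summary of course completion rates might look like the following Python sketch. The client names and percentages are hypothetical.

```python
from statistics import mean, median

# Hypothetical course completion rates (percent) per client.
completion_rates = {"Acme": 82, "Birch": 64, "Cedar": 91, "Dune": 47, "Elm": 76}

rates = list(completion_rates.values())
summary = {
    "clients": len(rates),
    "mean": mean(rates),
    "median": median(rates),
    "lowest": min(completion_rates, key=completion_rates.get),
    "highest": max(completion_rates, key=completion_rates.get),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here explains why one client completes fewer courses than another. It simply describes what happened, which is the point of this stage.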


Diagnostic analysis

Diagnostic analytics focuses on understanding why something has happened. It is literally the diagnosis of a problem, just as a doctor uses a patient’s symptoms to diagnose a disease. Remember TopNotch Learning’s business problem? ‘Which factors are negatively impacting the customer experience?’ A diagnostic analysis would help answer this. For instance, it could help the company draw correlations between the issue (struggling to gain repeat business) and factors that might be causing it (e.g. project costs, speed of delivery, customer sector, etc.). Let’s imagine that, using diagnostic analytics, TopNotch realizes its clients in the retail sector are departing at a faster rate than other clients. This might suggest that they’re losing customers because they lack expertise in this sector. And that’s a useful insight!
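One simple diagnostic technique is to break the problem metric down by a candidate factor and compare the groups. The sketch below does this for client sector, using invented records; the `churn_rate_by_sector` helper is hypothetical.

```python
from collections import defaultdict

# Hypothetical client records: (sector, returned_for_a_second_project).
clients = [
    ("retail", False), ("retail", False), ("retail", True), ("retail", False),
    ("finance", True), ("finance", True), ("finance", False),
    ("healthcare", True), ("healthcare", False),
]


def churn_rate_by_sector(records):
    """Return the share of clients in each sector who did not come back."""
    totals, lost = defaultdict(int), defaultdict(int)
    for sector, returned in records:
        totals[sector] += 1
        if not returned:
            lost[sector] += 1
    return {sector: lost[sector] / totals[sector] for sector in totals}


rates = churn_rate_by_sector(clients)
worst = max(rates, key=rates.get)
for sector, churn in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{sector}: {churn:.0%} did not return")
```

With samples this small, a gap between sectors is only a lead to investigate, not proof of a cause.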

Predictive analysis

Predictive analysis allows you to identify future trends based on historical data. In business, predictive analysis is commonly used to forecast future growth, for example. But it doesn’t stop there. Predictive analysis has grown increasingly sophisticated in recent years. The speedy evolution of machine learning allows organizations to make surprisingly accurate forecasts. Take the insurance industry. Insurance providers commonly use past data to predict which customer groups are more likely to get into accidents. As a result, they’ll hike up customer insurance premiums for those groups. Likewise, the retail industry often uses transaction data to predict where future trends lie, or to determine seasonal buying habits to inform their strategies. These are just a few simple examples, but the untapped potential of predictive analysis is pretty compelling.
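At its simplest, forecasting from historical data can mean fitting a straight line to past values and extending it. The following sketch uses ordinary least squares on made-up yearly sales figures; it is far simpler than the machine learning models mentioned above and is meant only to show the idea.

```python
# Hypothetical yearly sales (in $1,000s) for the last five years.
years = [2019, 2020, 2021, 2022, 2023]
sales = [100, 112, 119, 131, 138]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(years, sales)
forecast_2024 = slope * 2024 + intercept
print(f"Trend: {slope:.1f} per year, forecast for 2024: {forecast_2024:.1f}")
```

A linear trend assumes the future resembles the past, so forecasts like this are best treated as a starting point.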

Prescriptive analysis

Prescriptive analysis allows you to make recommendations for the future. This is the final step in the analytics part of the process. It’s also the most complex. This is because it incorporates aspects of all the other analyses we’ve described. A great example of prescriptive analytics is the algorithms that guide Google’s self-driving cars. Every second, these algorithms make countless decisions based on past and present data, ensuring a smooth, safe ride. Prescriptive analytics also helps companies decide on new products or areas of business to invest in.
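A much simpler illustration than self-driving cars: given predictions for several possible actions, a prescriptive step picks the one with the best expected outcome. In the sketch below, the actions, costs, predicted effects, and the value of a retained client are all assumptions invented for the example.

```python
# Assumed value of keeping one client for another project.
VALUE_PER_RETAINED_CLIENT = 20_000

# Hypothetical options: (action, cost, predicted extra clients retained).
options = [
    ("Hire a retail specialist", 60_000, 5),
    ("Discount renewals by 10%", 40_000, 3),
    ("Add post-delivery support", 30_000, 4),
]


def recommend(options, value_per_client):
    """Return the action with the highest predicted net benefit."""
    def net_benefit(option):
        _, cost, retained = option
        return retained * value_per_client - cost

    best = max(options, key=net_benefit)
    return best[0], net_benefit(best)


action, benefit = recommend(options, VALUE_PER_RETAINED_CLIENT)
print(f"Recommended action: {action} (predicted net benefit: {benefit})")
```

The recommendation is only as good as the predictions feeding it, which is why this type of analysis builds on the other three.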


5. Step five: Sharing your results

You’ve finished carrying out your analyses. You have your insights. The final step of the data analytics process is to share these insights with the wider world (or at least with your organization’s stakeholders!) This is more complex than simply sharing the raw results of your work—it involves interpreting the outcomes, and presenting them in a manner that’s digestible for all types of audiences. Since you’ll often present information to decision-makers, it’s very important that the insights you present are 100% clear and unambiguous. For this reason, data analysts commonly use reports, dashboards, and interactive visualizations to support their findings.

How you interpret and present results will often influence the direction of a business. Depending on what you share, your organization might decide to restructure, to launch a high-risk product, or even to close an entire division. That’s why it’s very important to provide all the evidence that you’ve gathered, and not to cherry-pick data. Ensuring that you cover everything in a clear, concise way will prove that your conclusions are scientifically sound and based on the facts. On the flip side, it’s important to highlight any gaps in the data or to flag any insights that might be open to interpretation. Honest communication is the most important part of the process. It will help the business, while also helping you to excel at your job!

Tools for interpreting and sharing your findings

There are tons of data visualization tools available, suited to different experience levels. Popular tools requiring little or no coding skills include Google Charts, Tableau, Datawrapper, and Infogram. If you’re familiar with Python and R, there are also many data visualization libraries and packages available. For instance, check out the Python libraries Plotly, Seaborn, and Matplotlib. Whichever data visualization tools you use, make sure you polish up your presentation skills, too. Remember: Visualization is great, but communication is key!


6. Step six: Embrace your failures

The last ‘step’ in the data analytics process is to embrace your failures. The path we’ve described above is more of an iterative process than a one-way street. Data analytics is inherently messy, and the process you follow will be different for every project. For instance, while cleaning data, you might spot patterns that spark a whole new set of questions. This could send you back to step one (to redefine your objective). Equally, an exploratory analysis might highlight a set of data points you’d never considered using before. Or maybe you find that the results of your core analyses are misleading or erroneous. This might be caused by mistakes in the data, or human error earlier in the process.

While these pitfalls can feel like failures, don’t be disheartened if they happen. Data analysis is inherently chaotic, and mistakes occur. What’s important is to hone your ability to spot and rectify errors. If data analytics were straightforward, it might be easier, but it certainly wouldn’t be as interesting. Use the steps we’ve outlined as a framework, stay open-minded, and be creative. If you lose your way, you can refer back to the process to keep yourself on track.

In this post, we’ve covered the main steps of the data analytics process. These core steps can be amended, re-ordered and re-used as you deem fit, but they underpin every data analyst’s work:

  • Define the question: What business problem are you trying to solve? Frame it as a question to help you focus on finding a clear answer.
  • Collect data: Create a strategy for collecting data. Which data sources are most likely to help you solve your business problem?
  • Clean the data: Explore, scrub, tidy, de-dupe, and structure your data as needed. Do whatever you have to! But don’t rush…take your time!
  • Analyze the data: Carry out various analyses to obtain insights. Focus on the four types of data analysis: descriptive, diagnostic, predictive, and prescriptive.
  • Share your results: How best can you share your insights and recommendations? A combination of visualization tools and communication is key.
  • Embrace your mistakes: Mistakes happen. Learn from them. This is what transforms a good data analyst into a great one.

What next? From here, we strongly encourage you to explore the topic on your own. Get creative with the steps in the data analysis process, and see what tools you can find. As long as you stick to the core principles we’ve described, you can create a tailored technique that works for you.


Data Analysis 6 Steps: A Complete Guide Into Data Analysis Methodology


We explore the 6 key steps in carrying out a data analysis process through examples and a comprehensive guide.

Although closely linked to technology, data analysis is still a science. Like any science, it involves a rigorous, sequential procedure built on a series of steps that cannot be skipped.


Often, when we talk about data analysis, we focus on the tools and technological knowledge associated with this scientific field which, although fundamental, are subordinate to the methodology of the data analysis process.

In this article we focus on the 6 essential steps of a data analysis process, with examples, and address the core points of its methodology: how to establish the objectives of the analysis, how to collect the data and how to perform the analysis. Each of the steps listed here requires different expertise and knowledge. However, understanding the entire process is crucial to drawing meaningful conclusions.


It is also important to note that an enterprise data analytics process depends on the maturity of the company's data strategy. Companies with a more developed data-driven culture will be able to conduct deeper, more complex and more efficient data analysis.


The 6 steps of a data analysis process in business

Step 1 of the data analysis process: define a specific objective.


The initial phase of any data analysis process is to define the specific objective of the analysis. That is, to establish what we want to achieve with the analysis. In the case of a business data analysis, our specific objective will be linked to a business goal and, as a consequence, to a performance indicator or KPI.

To define your objective effectively, you can formulate a hypothesis and define an evaluation strategy to test it. However, this step should always start from a crucial question:

What business objective do I want to achieve?

What business challenge am I trying to address?

While this process may seem simple, it is often more complicated than it first appears. For a data analytics process to be efficient, it is essential that the data analyst has a thorough understanding of the company's operations and business objectives.

Once the objective or problem we want to solve has been defined, the next step is to identify the data and data sources we need to achieve it. Again, this is where the business vision of the data analyst comes into play. Identifying the data sources that will provide the information to answer the question posed involves extensive knowledge of the business and its activity.

Bismart Tip: How to set the right objective?

Setting the objective of an analysis depends, in part, on our creative problem-solving skills and our level of knowledge about the field under study. However, in the case of a business data analysis, it is most effective to pay attention to the established performance indicators and business metrics for the area we want to study. Exploring the company's activity reports and dashboards will provide valuable information about the organisation's areas of interest.

Step 2 of the data analysis process: Data collection


Once the objective has been defined, it is time to design a plan to obtain and consolidate the necessary data. At this point it is essential to identify the specific types of data you need, which can be quantitative (numerical data such as sales figures) or qualitative (descriptive data such as customer feedback).

You should also consider the type of data in terms of its source, which can be classified as first-party data, second-party data or third-party data.

First-party data:

First-party data is the information that you or your organisation collects directly. It typically includes transactional tracking data or information obtained from your company's customer relationship management system, whether it is a CRM or a Customer Data Platform (CDP).

Regardless of its source, first-party data is usually presented in a structured and well-organised way. Other sources of first-party data may include customer satisfaction surveys, feedback from focus groups, interviews or observational data.

Second-party data:

Second-party data is information that other organisations have directly collected. It can be understood as first-party data that has been collected for a different purpose than your analysis.

The main advantage of second-party data is that it is usually structured, which makes your work easier. It also tends to have a high degree of reliability. Examples of second-party data include website, app or social media activity, as well as online purchase or shipping data.

Third-party data:

Third-party data is information collected and consolidated from various sources by an external entity. Third-party data often comprises a wide range of unstructured data points. Many organisations collect data from third parties to generate industry reports or conduct market research.

A specific example of third-party data collection is provided by the consultancy Gartner, which collects and distributes data of high business value to other companies.

Step 3 of the data analysis process: Data cleaning


Once we have collected the data we need, we must prepare it for analysis. This involves a process known as data cleaning or consolidation, which is essential to ensure that the data we are working with is of high quality.

The most common tasks in this part of the process are:

  • Eliminating significant errors, duplicated data and inconsistencies, which are inherent issues when aggregating data from different sources.
  • Getting rid of irrelevant data, i.e. extracting observations that are not relevant to the intended analysis.
  • Organising and structuring the data: performing general "cleaning" tasks, such as rectifying typographical errors or layout discrepancies, to facilitate data mapping and manipulation.
  • Fixing important gaps in the data: during the cleaning process, important missing data may be identified and should be remedied as soon as possible.

It is important to understand that this is the most time-consuming part of the process. In fact, it is estimated that a data analyst typically spends around 70-90% of their time cleaning data.

Bismart Tip: Resources to speed up data cleansing

Manually cleaning datasets can be a very time-consuming task. Fortunately, there are several tools available to simplify this process. Open-source tools such as OpenRefine are excellent options for basic data cleansing and even offer advanced scanning functions. However, free tools can have limitations when dealing with very large datasets. For more robust data cleaning, Python libraries such as Pandas and certain R packages are more suitable. Fluency in these programming languages is essential for their effective use.

Step 4 of the data analysis process: Data analysis


Once the data has been cleaned and prepared, it is time to dive into the most exciting phase of the process: data analysis.

At this point, we should bear in mind that there are different types of data analysis and that the type we choose will depend, to a large extent, on the objective of our analysis. There are also multiple techniques for carrying out data analysis. Some of the best known are univariate or bivariate analysis, time series analysis and regression analysis.
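To give a flavour of one of these techniques, a basic building block of time series analysis is the moving average, which smooths short-term fluctuations so that the underlying trend is easier to see. The sketch below is a minimal Python example with invented monthly user counts; the `moving_average` helper is hypothetical.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical number of active users per month.
monthly_users = [120, 135, 128, 150, 162, 158]
smoothed = moving_average(monthly_users, window=3)
print([round(value, 1) for value in smoothed])
```

Each output value averages the current month and the two before it, so the result is two points shorter than the input.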

In a broader context, all forms of data analysis fall into one of the following four categories.

Types of data analysis

Descriptive analysis

Descriptive analysis is a type of analysis that explores past events. It is the first step that companies usually take before going into more in-depth investigations.

Diagnostic analysis

Diagnostic analysis revolves around unravelling the "why" of something. In other words, the objective of this type of analysis is to discover the causes or reasons for an event of interest to the company.

Predictive analytics

The focus of predictive analytics is to forecast future trends based on historical data. In business, predictive analytics is becoming increasingly relevant.

Unlike the other types of analysis, predictive analytics is linked to artificial intelligence and, typically, to machine learning and deep learning. Recent advances in machine learning have significantly improved the accuracy of predictive analytics and it is now one of the most valued types of analysis by companies.

Predictive analytics enables a company's senior management to take high-value actions such as solving problems before they happen, anticipating future market trends or taking strategic actions ahead of the competition.

Prescriptive analysis

Prescriptive analysis is an evolution of the three types of analysis mentioned so far. It is a methodology that combines descriptive, diagnostic and predictive analytics to formulate recommendations for the future. In other words, it goes one step further than predictive analytics. Rather than simply explaining what will happen in the future, it offers the most appropriate courses of action based on what will happen. In business, prescriptive analytics can be very useful in determining new product projects or investment areas by aggregating information from other types of analytics.

An example of prescriptive analytics is the algorithms that guide Google's self-driving cars. These algorithms make a multitude of real-time decisions based on historical and current data, ensuring a safe and smooth journey. 

Step 5 of the data analysis process: Transforming results into reports or dashboards


Once the analysis is complete and conclusions have been drawn, the next stage of the data analysis process is to share these findings with a wider audience. In the case of a business data analysis, this means the organisation's stakeholders.

This step requires interpreting the results and presenting them in an easily understandable way so that senior management can make data-driven decisions. It is therefore essential to convey clear, concise and unambiguous ideas. Data visualisation plays a key role in achieving this, and data analysts frequently use reporting tools such as Power BI to transform data into interactive reports and dashboards that support their conclusions.

The interpretation and presentation of results significantly influences the trajectory of a company. In this regard, it is essential to provide a complete, clear and concise overview that demonstrates a scientific and fact-based methodology for the conclusions drawn. On the other hand, it is also critical to be honest and transparent and to share with stakeholders any doubts or unclear conclusions you may have about the analysis and its results.

The best data visualisation and reporting tools


Gartner named Power BI the leading BI and analytics platform on the market in 2023.


Step 6 of the data analysis process: Transforming insights into actions and business opportunities


The final stage of a data analysis process involves turning the intelligence obtained into actions and business opportunities.

It is also essential to be aware that a data analysis process is not linear, but rather a complex process full of ramifications. For example, during the data cleansing phase, you may identify patterns that raise new questions, leading you back to the first step of redefining your objectives. Similarly, an exploratory analysis may uncover a set of data that you had not previously considered. You may also discover that the results of your central analysis seem misleading or incorrect, perhaps due to inaccuracies in the data or human error earlier in the process.

Although these obstacles may seem like setbacks, it is essential not to become discouraged. Data analysis is intricate and setbacks are a natural part of the process.

In this article, we have delved into the key stages of a data analysis process, which, in brief, are as follows:

  • Define the objective: Define the business challenge we intend to address. Formulating it as a question provides a structured approach to finding a clear solution.
  • Collect the data: Develop a strategy for gathering the data needed to answer our question and identify the data sources most likely to have the information we need.
  • Clean the data: Drill down into the data, cleaning, organising and structuring it as necessary.
  • Analyse the data: Use one of the four main types of data analysis: descriptive, diagnostic, predictive and prescriptive.
  • Disseminate findings: Choose the most effective means to disseminate our insights in a way that is clear, concise and encourages intelligent decision-making.
  • Learn from setbacks: Recognising and learning from mistakes is part of the journey. Challenges that arise during the process are learning opportunities that can also transform our analysis process into a more effective strategy.


Data Analysis in Research: Types & Methods


Content Index

  • What is data analysis in research?
  • Why analyze data in research?
  • Types of data in research
  • Finding patterns in the qualitative data
  • Methods used for data analysis in qualitative research
  • Preparing data for analysis
  • Methods used for data analysis in quantitative research
  • Considerations in research data analysis

What is data analysis in research?

According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments that make sense.

Three essential things occur during the data analysis process. The first is data organization. The second is data reduction through summarization and categorization, which helps find patterns and themes in the data for easy identification and linking. The third is the data analysis itself, which researchers carry out in both top-down and bottom-up fashion.

On the other hand, Marshall and Rossman describe data analysis as a messy, ambiguous, and time-consuming but creative and fascinating process through which a mass of collected data is brought to order, structure and meaning.

We can say that “the data analysis and data interpretation is a process representing the application of deductive and inductive logic to the research and data analysis.”

Why analyze data in research?

Researchers rely heavily on data as they have a story to tell or research problems to solve. It starts with a question, and data is nothing but an answer to that question. But what if there is no question to ask? It is still possible to explore data without a specific problem. This is called ‘data mining’, and it often reveals interesting patterns within the data that are worth exploring.

Regardless of the type of data researchers explore, their mission and their audience’s vision guide them to find the patterns that shape the story they want to tell. One of the essential things expected from researchers while analyzing data is to stay open and remain unbiased toward unexpected patterns, expressions, and results. Remember, data analysis sometimes tells unforeseen yet exciting stories that nobody expected at the outset. Therefore, rely on the data you have at hand and enjoy the journey of exploratory research.

Types of data in research

Every kind of data describes things by assigning a specific value to them. For analysis, you need to organize these values and process and present them in a given context to make them useful. Data can take different forms; here are the primary data types.

  • Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal qualitative interviews, qualitative observation, or open-ended questions in surveys.
  • Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: answers to questions about age, rank, cost, length, weight, scores, and so on all come under this type of data. You can present such data in graphs and charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
  • Categorical data: It is data presented in groups. However, an item included in the categorical data cannot belong to more than one group. Example: a person responding to a survey by stating their lifestyle, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.
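
As a rough illustration of the chi-square test mentioned for categorical data, the sketch below computes the chi-square statistic for a small contingency table in plain Python. The counts (smoking habit by marital status) are invented for illustration, and the p-value lookup is left out; in practice a statistics library would usually handle that step.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical survey counts: rows = smoker / non-smoker, columns = married / single
observed = [[20, 30], [30, 20]]
stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A larger statistic means the observed counts sit further from what independence between the two categories would predict.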

Data analysis in qualitative research

Data analysis in qualitative research works a little differently from the analysis of numerical data, as qualitative data is made up of words, descriptions, images, objects, and sometimes symbols. Getting insight from such complicated information is a complicated process. Hence it is typically used for exploratory research and data analysis.

Finding patterns in the qualitative data

Although there are several ways to find patterns in textual information, a word-based method is the most relied-upon and widely used technique for research and data analysis. Notably, the data analysis process in qualitative research is manual. Here the researchers usually read the available data and find repetitive or commonly used words.

For example, while studying data collected from African countries to understand the most pressing issues people face, researchers might find “food” and “hunger” are the most commonly used words and will highlight them for further analysis.
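
The word-based method described above is manual, but the counting step can be sketched in a few lines of Python. The responses and the stopword list below are made up for illustration; a real study would use its own transcripts and a fuller stopword list.

```python
import re
from collections import Counter


def top_words(responses, stopwords, n=3):
    """Return the n most frequent non-stopword words across all responses."""
    words = []
    for text in responses:
        words.extend(re.findall(r"[a-z']+", text.lower()))
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Hypothetical open-ended survey answers
responses = [
    "Food is scarce and hunger is common",
    "We worry about food prices",
    "Hunger affects the children most",
]
stopwords = {"is", "and", "we", "about", "the", "most", "are"}
print(top_words(responses, stopwords, n=2))
```

The most frequent words are only a starting point; the researcher still has to read them in context before drawing any conclusion.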

Keyword in context is another widely used word-based technique. In this method, the researcher tries to understand a concept by analyzing the context in which participants use a particular keyword.

For example, researchers conducting research and data analysis for studying the concept of ‘diabetes’ amongst respondents might analyze the context of when and how the respondent has used or referred to the word ‘diabetes.’
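
A minimal sketch of the keyword-in-context idea follows. It simply pulls a few words on either side of each occurrence of the keyword; the sample answer and the window size of three words are assumptions made for illustration.

```python
def keyword_in_context(text, keyword, window=3):
    """Return (left, keyword, right) for each occurrence of keyword in text."""
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation and letter case when matching
        if token.strip(".,;:!?'\"").lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Hypothetical respondent answer
answer = "My father was diagnosed with diabetes last year, and diabetes runs in our family."
for left, word, right in keyword_in_context(answer, "diabetes"):
    print(f"{left} [{word}] {right}")
```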

The scrutiny-based technique is another highly recommended text analysis method for identifying patterns in qualitative data. Compare and contrast is the most widely used method under this technique; it examines how pieces of text are similar to or different from each other.

For example: To find out the “importance of a resident doctor in a company,” the collected data is divided into people who think it is necessary to hire a resident doctor and those who think it is unnecessary. Compare and contrast is the best method for analyzing polls with single-answer question types.

Metaphors can be used to reduce the data pile and find patterns in it so that it becomes easier to connect data with theory.

Variable Partitioning is another technique used to split variables so that researchers can find more coherent descriptions and explanations from the enormous data.

Methods used for data analysis in qualitative research

There are several techniques to analyze the data in qualitative research, but here are some commonly used methods:

  • Content Analysis: It is widely accepted and the most frequently employed technique for data analysis in research methodology. It can be used to analyze documented information from text, images, and sometimes physical items. When and where to use this method depends on the research questions.
  • Narrative Analysis: This method is used to analyze content gathered from various sources such as personal interviews, field observation, and surveys. Most of the time, the stories or opinions people share are examined to find answers to the research questions.
  • Discourse Analysis: Similar to narrative analysis, discourse analysis is used to analyze interactions with people. However, this method considers the social context within which the communication between the researcher and respondent takes place. In addition, discourse analysis also takes the respondent’s lifestyle and day-to-day environment into account when deriving any conclusion.
  • Grounded Theory: When you want to explain why a particular phenomenon happened, using grounded theory to analyze qualitative data is the best resort. Grounded theory is applied to study data about a host of similar cases occurring in different settings. When researchers use this method, they might alter explanations or produce new ones until they arrive at some conclusion.

Data analysis in quantitative research

Preparing data for analysis

The first stage in research and data analysis is to prepare the data for analysis so that the nominal data can be converted into something meaningful. Data preparation consists of the phases below.

Phase I: Data Validation

Data validation is done to check whether the collected data sample meets the pre-set standards or is a biased sample. It is divided into four different stages:

  • Fraud: To ensure that an actual human being records each response to the survey or questionnaire.
  • Screening: To make sure each participant or respondent is selected in compliance with the research criteria.
  • Procedure: To ensure ethical standards were maintained while collecting the data sample.
  • Completeness: To ensure that the respondent has answered all the questions in an online survey or, in an interview, that the interviewer has asked all the questions devised in the questionnaire.

Phase II: Data Editing

Extensive research data samples often come loaded with errors. Respondents sometimes fill in some fields incorrectly or skip them accidentally. Data editing is a process in which researchers confirm that the provided data is free of such errors. They need to conduct the necessary checks and outlier checks to edit the raw data and make it ready for analysis.
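
One possible way to automate part of the data editing phase is sketched below: it flags skipped fields and values that fall far outside the rest of the sample. The 1.5 × IQR rule used for outliers is a common convention rather than something this article prescribes, and the ages are invented.

```python
import statistics


def find_problems(values):
    """Return (indexes of missing values, indexes of IQR outliers)."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [i for i, v in enumerate(values)
                if v is not None and not (low <= v <= high)]
    return missing, outliers


# Hypothetical "age" column: one skipped answer and one obvious typo (350)
ages = [34, 29, None, 41, 38, 27, 350, 33]
missing, outliers = find_problems(ages)
print("missing at rows:", missing)
print("possible outliers at rows:", outliers)
```

Flagged rows still need a human decision: a value may be corrected, removed, or kept if it turns out to be genuine.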

Phase III: Data Coding

Of the three phases, this is the most critical. Data coding involves grouping and assigning values to the survey responses. For example, if a survey is completed with a sample size of 1,000, the researcher might create age brackets to distinguish the respondents based on their age. It then becomes easier to analyze small data buckets rather than deal with the massive data pile.
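
The age-bracket example can be sketched as follows. The bracket boundaries are hypothetical; a real study would choose brackets that suit its research question.

```python
from collections import Counter

# Hypothetical brackets (inclusive bounds); anything outside gets a catch-all code
AGE_BRACKETS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def code_age(age):
    """Map a raw age to a bracket label such as '25-34'."""
    for low, high in AGE_BRACKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+" if age >= 65 else "under 18"


ages = [19, 23, 31, 37, 44, 52, 68, 29]
coded = [code_age(a) for a in ages]
print(Counter(coded))
```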

Methods used for data analysis in quantitative research

After the data is prepared for analysis, researchers can use different research and data analysis methods to derive meaningful insights. Statistical analysis is the most favored approach for numerical data. In statistical analysis, distinguishing between categorical data and numerical data is essential, as categorical data involves distinct categories or labels, while numerical data consists of measurable quantities. Statistical methods fall into two groups: first, ‘descriptive statistics’, used to describe data; second, ‘inferential statistics’, which helps in comparing the data.

Descriptive statistics

This method is used to describe the basic features of various types of data in research. It presents the data in a meaningful way so that patterns in the data start making sense. However, descriptive analysis does not extend to drawing conclusions; any conclusions still rest on the hypotheses researchers have formulated so far. Here are a few major types of descriptive analysis methods.

Measures of Frequency

  • Count, Percent, Frequency
  • It is used to denote how often a particular event occurs.
  • Researchers use it when they want to showcase how often a response is given.
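
A small sketch of count and percent for a single survey question, using only the standard library. The answers are invented for illustration.

```python
from collections import Counter


def frequency_table(responses):
    """Map each answer to (count, percent of all responses)."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: (count, round(100 * count / total, 1))
            for answer, count in counts.most_common()}


# Hypothetical answers to one yes/no/unsure question
responses = ["yes", "no", "yes", "yes", "unsure", "no", "yes", "yes"]
for answer, (count, percent) in frequency_table(responses).items():
    print(f"{answer:<8}{count:>3}{percent:>7}%")
```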

Measures of Central Tendency

  • Mean, Median, Mode
  • The method is widely used to summarize a distribution by its central point.
  • Researchers use this method when they want to showcase the most common or the average response.
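
Python's standard `statistics` module covers all three measures. The ratings below are hypothetical answers on a 1 to 5 scale.

```python
import statistics

# Hypothetical 1-5 satisfaction ratings from ten respondents
scores = [4, 5, 3, 5, 2, 5, 4, 1, 5, 4]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequent score

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the three measures disagree slightly, which is typical of skewed data and is one reason to report more than one of them.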

Measures of Dispersion or Variation

  • Range, Variance, Standard deviation
  • The range is the difference between the highest and lowest points.
  • Variance and standard deviation are based on the difference between each observed score and the mean.
  • These measures identify the spread of scores by stating intervals.
  • Researchers use this method to show how spread out the data is. It helps them judge whether the data is so spread out that it directly affects the mean.
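
The sketch below computes the three measures for a small invented set of scores. It uses the population formulas (`pvariance`, `pstdev`), which treat the scores as the whole group of interest; `statistics.variance` and `statistics.stdev` are the sample-based counterparts.

```python
import statistics

# Hypothetical test scores treated as a complete population
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)    # highest minus lowest score
variance = statistics.pvariance(scores)    # mean squared distance from the mean
std_dev = statistics.pstdev(scores)        # square root of the variance

print(f"range={value_range}, variance={variance}, std dev={std_dev}")
```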

Measures of Position

  • Percentile ranks, Quartile ranks
  • It relies on standardized scores, helping researchers identify the relationship between different scores.
  • It is often used when researchers want to compare scores with the average.
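
As a sketch, quartiles can be taken from the standard library, and a percentile rank can be computed by hand. Definitions of percentile rank vary; the one below (share of scores at or below a value) is one common choice and is an assumption here, as are the scores themselves.

```python
import statistics


def percentile_rank(scores, value):
    """Percentage of scores that are at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


# Hypothetical exam scores
scores = [55, 60, 62, 68, 70, 74, 78, 81, 85, 93]

# Three cut points that split the sorted scores into four equal groups
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

print(f"quartiles: {q1}, {q2}, {q3}")
print(f"percentile rank of 78: {percentile_rank(scores, 78)}")
```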

In quantitative research, descriptive analysis often gives absolute numbers, but those numbers alone are never sufficient to demonstrate the rationale behind them. It is therefore necessary to think about which method of research and data analysis best suits your survey questionnaire and the story researchers want to tell. For example, the mean is the best way to demonstrate students’ average scores in schools. It is better to rely on descriptive statistics when researchers intend to keep the research or outcome limited to the provided sample without generalizing it. For example, when you want to compare average voting done in two different cities, descriptive statistics are enough.

Descriptive analysis is also called a ‘univariate analysis’ since it is commonly used to analyze a single variable.

Inferential statistics

Inferential statistics are used to make predictions about a larger population after research and data analysis of a sample collected from that population. For example, you can ask about 100 audience members at a movie theater if they like the movie they are watching. Researchers then use inferential statistics on the collected sample to reason that about 80-90% of people like the movie.

Here are two significant areas of inferential statistics.

  • Estimating parameters: It takes statistics from the sample research data and demonstrates something about the population parameter.
  • Hypothesis test: It’s about sampling research data to answer the survey research questions. For example, researchers might be interested to understand if the new shade of lipstick recently launched is good or not, or if the multivitamin capsules help children to perform better at games.
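
To make parameter estimation concrete, the sketch below returns to the movie theater example and builds an approximate 95% confidence interval for the share of people who like the movie. The figure of 85 "yes" answers out of 100 is invented, and the normal approximation used here is only one of several ways to build such an interval.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Hypothetical sample: 85 of 100 audience members say they like the movie
low, high = proportion_confidence_interval(85, 100)
print(f"Estimated share who like the movie: {low:.0%} to {high:.0%}")
```

With these made-up numbers the interval runs from roughly 78% to 92%, which is the kind of reasoning behind a statement such as "about 80-90% of people like the movie."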

These are sophisticated analysis methods used to showcase the relationship between different variables instead of describing a single variable. They are often used when researchers want something beyond absolute numbers to understand the relationship between variables.

Here are some of the commonly used methods for data analysis in research.

  • Correlation: When researchers are not conducting experimental or quasi-experimental research but are interested in understanding the relationship between two or more variables, they opt for correlational research methods.
  • Cross-tabulation: Also called contingency tables, cross-tabulation is used to analyze the relationship between multiple variables. Suppose the provided data has age and gender categories presented in rows and columns. A two-dimensional cross-tabulation helps for seamless data analysis and research by showing the number of males and females in each age category.
  • Regression analysis: To understand the relationship between variables, researchers commonly turn to regression analysis, which is also a type of predictive analysis. In this method, you have an essential factor called the dependent variable and one or more independent variables. You try to find out the impact of the independent variables on the dependent variable. The values of both independent and dependent variables are assumed to be ascertained in an error-free random manner.
  • Frequency tables: A frequency table lists each value or category of a variable alongside the number of times it occurs in the data, which makes it easy to see how responses are distributed.
  • Analysis of variance: This statistical procedure is used for testing the degree to which two or more groups vary or differ in an experiment. A considerable degree of variation suggests that the research findings are significant. The terms ANOVA testing and variance analysis are often used interchangeably.
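
Three of the methods listed above (correlation, cross-tabulation and simple regression) can be sketched in plain Python. The data is invented for illustration, the regression uses a single independent variable, and analysis of variance is left out to keep the example short.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simple_regression(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def cross_tab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(rows, cols))


# Hypothetical data: hours of training (independent) and test score (dependent)
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
r = pearson_r(hours, scores)
slope, intercept = simple_regression(hours, scores)
print(f"r = {r:.3f}; score = {intercept:.1f} + {slope:.1f} * hours")

# Hypothetical respondents: age bracket and gender
age_groups = ["18-34", "18-34", "35+", "35+", "18-34"]
genders = ["F", "M", "F", "F", "M"]
table = cross_tab(age_groups, genders)
print(table[("18-34", "M")], "male respondents aged 18-34")
```

A high correlation or a steep slope describes an association in the sample; on its own it does not show that one variable causes the other.
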

Considerations in research data analysis

  • Researchers must have the necessary research skills to analyze and manipulate the data, and should be trained to demonstrate a high standard of research practice. Ideally, researchers should have more than a basic understanding of the rationale for selecting one statistical method over another to obtain better data insights.
  • Research and data analytics projects usually differ by scientific discipline; therefore, getting statistical advice at the beginning of analysis helps in designing a survey questionnaire, selecting data collection methods, and choosing samples.

  • The primary aim of data research and analysis is to derive insights that are unbiased. Any mistake or bias in collecting data, selecting an analysis method, or choosing the audience sample will lead to a biased inference.
  • No amount of sophistication in research data and analysis can rectify poorly defined objective outcome measurements. Whether the design is at fault or the intentions are unclear, a lack of clarity might mislead readers, so avoid the practice.
  • The motive behind data analysis in research is to present accurate and reliable data. As far as possible, avoid statistical errors, and find a way to deal with everyday challenges like outliers, missing data, data altering, data mining, or developing graphical representation.

The sheer amount of data generated daily is frightening, especially now that data analysis has taken center stage. In 2018, the total data supply amounted to 2.8 trillion gigabytes. Hence, it is clear that enterprises wanting to survive in the hypercompetitive world must possess an excellent capability to analyze complex research data, derive actionable insights, and adapt to new market needs.


Data Analysis – Process, Methods and Types

Data Analysis


Data analysis refers to the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, drawing conclusions, and supporting decision-making. It involves applying various statistical and computational techniques to interpret and derive insights from large datasets. The ultimate aim of data analysis is to convert raw data into actionable insights that can inform business decisions, scientific research, and other endeavors.

Data Analysis Process

The data analysis process involves the following steps:

Define the Problem

The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome.

Collect the Data

The next step is to collect the relevant data from various sources. This may involve collecting data from surveys, databases, or other sources. It is important to ensure that the data collected is accurate, complete, and relevant to the problem being analyzed.

Clean and Organize the Data

Once the data has been collected, it needs to be cleaned and organized. This involves removing any errors or inconsistencies in the data, filling in missing values, and ensuring that the data is in a format that can be easily analyzed.
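
A minimal sketch of this cleaning step is shown below: it tidies text fields, drops duplicate records, and fills missing ages. Filling gaps with the median is just one possible strategy and is an assumption here, as are the field names and records.

```python
import statistics


def clean_records(records):
    """Tidy names, drop duplicate records, and fill missing ages with the median."""
    seen, cleaned = set(), []
    for rec in records:
        tidy = {"name": rec["name"].strip().title(), "age": rec["age"]}
        key = (tidy["name"], tidy["age"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(tidy)
    known_ages = [r["age"] for r in cleaned if r["age"] is not None]
    median_age = statistics.median(known_ages)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = median_age
    return cleaned


# Hypothetical raw survey records with inconsistent formatting
raw = [
    {"name": " alice ", "age": 30},
    {"name": "Alice", "age": 30},
    {"name": "bob", "age": None},
    {"name": "Carol", "age": 40},
]
cleaned = clean_records(raw)
print(cleaned)
```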

Analyze the Data

The next step is to analyze the data using various statistical and analytical techniques. This may involve identifying patterns in the data, conducting statistical tests, or using machine learning algorithms to identify trends and insights.

Interpret the Results

After analyzing the data, the next step is to interpret the results. This involves drawing conclusions based on the analysis and identifying any significant findings or trends.

Communicate the Findings

Once the results have been interpreted, they need to be communicated to stakeholders. This may involve creating reports, visualizations, or presentations to effectively communicate the findings and recommendations.

Take Action

The final step in the data analysis process is to take action based on the findings. This may involve implementing new policies or procedures, making strategic decisions, or taking other actions based on the insights gained from the analysis.

Types of Data Analysis

Types of Data Analysis are as follows:

Descriptive Analysis

This type of analysis involves summarizing and describing the main characteristics of a dataset, such as the mean, median, mode, standard deviation, and range.

Inferential Analysis

This type of analysis involves making inferences about a population based on a sample. Inferential analysis can help determine whether a certain relationship or pattern observed in a sample is likely to be present in the entire population.

Diagnostic Analysis

This type of analysis involves investigating why something happened by examining problems or issues that show up in the data. As part of this, diagnostic analysis can help identify outliers, errors, missing data, or other anomalies in the dataset.

Predictive Analysis

This type of analysis involves using statistical models and algorithms to predict future outcomes or trends based on historical data. Predictive analysis can help businesses and organizations make informed decisions about the future.

Prescriptive Analysis

This type of analysis involves recommending a course of action based on the results of previous analyses. Prescriptive analysis can help organizations make data-driven decisions about how to optimize their operations, products, or services.

Exploratory Analysis

This type of analysis involves exploring the relationships and patterns within a dataset to identify new insights and trends. Exploratory analysis is often used in the early stages of research or data analysis to generate hypotheses and identify areas for further investigation.

Data Analysis Methods

Data Analysis Methods are as follows:

Statistical Analysis

This method involves the use of mathematical models and statistical tools to analyze and interpret data. It includes measures of central tendency, correlation analysis, regression analysis, hypothesis testing, and more.

Machine Learning

This method involves the use of algorithms to identify patterns and relationships in data. It includes supervised and unsupervised learning, classification, clustering, and predictive modeling.

Data Mining

This method involves using statistical and machine learning techniques to extract information and insights from large and complex datasets.

Text Analysis

This method involves using natural language processing (NLP) techniques to analyze and interpret text data. It includes sentiment analysis, topic modeling, and entity recognition.

Network Analysis

This method involves analyzing the relationships and connections between entities in a network, such as social networks or computer networks. It includes social network analysis and graph theory.

Time Series Analysis

This method involves analyzing data collected over time to identify patterns and trends. It includes forecasting, decomposition, and smoothing techniques.
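
As a small illustration of smoothing, the sketch below applies a simple moving average to a short series. The monthly sales figures and the window of three months are invented; other smoothing techniques, such as exponential smoothing, follow the same idea of damping short-term noise.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in the series."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


# Hypothetical monthly sales figures
monthly_sales = [120, 130, 125, 140, 150, 145, 160]
smoothed = moving_average(monthly_sales, window=3)
print([round(value, 1) for value in smoothed])
```

The smoothed series is shorter than the original because the first full window only becomes available at the third observation.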

Spatial Analysis

This method involves analyzing geographic data to identify spatial patterns and relationships. It includes spatial statistics, spatial regression, and geospatial data visualization.

Data Visualization

This method involves using graphs, charts, and other visual representations to help communicate the findings of the analysis. It includes scatter plots, bar charts, heat maps, and interactive dashboards.

Qualitative Analysis

This method involves analyzing non-numeric data such as interviews, observations, and open-ended survey responses. It includes thematic analysis, content analysis, and grounded theory.

Multi-criteria Decision Analysis

This method involves analyzing multiple criteria and objectives to support decision-making. It includes techniques such as the analytical hierarchy process, TOPSIS, and ELECTRE.

Data Analysis Tools

There are various data analysis tools available that can help with different aspects of data analysis. Below is a list of some commonly used data analysis tools:

  • Microsoft Excel: A widely used spreadsheet program that allows for data organization, analysis, and visualization.
  • SQL: A programming language used to manage and manipulate relational databases.
  • R: An open-source programming language and software environment for statistical computing and graphics.
  • Python: A general-purpose programming language that is widely used in data analysis and machine learning.
  • Tableau: A data visualization software that allows for interactive and dynamic visualizations of data.
  • SAS: A statistical analysis software used for data management, analysis, and reporting.
  • SPSS: A statistical analysis software used for data analysis, reporting, and modeling.
  • Matlab: A numerical computing software that is widely used in scientific research and engineering.
  • RapidMiner: A data science platform that offers a wide range of data analysis and machine learning tools.

Applications of Data Analysis

Data analysis has numerous applications across various fields. Below are some examples of how data analysis is used in different fields:

  • Business: Data analysis is used to gain insights into customer behavior, market trends, and financial performance. This includes customer segmentation, sales forecasting, and market research.
  • Healthcare: Data analysis is used to identify patterns and trends in patient data, improve patient outcomes, and optimize healthcare operations. This includes clinical decision support, disease surveillance, and healthcare cost analysis.
  • Education: Data analysis is used to measure student performance, evaluate teaching effectiveness, and improve educational programs. This includes assessment analytics, learning analytics, and program evaluation.
  • Finance: Data analysis is used to monitor and evaluate financial performance, identify risks, and make investment decisions. This includes risk management, portfolio optimization, and fraud detection.
  • Government: Data analysis is used to inform policy-making, improve public services, and enhance public safety. This includes crime analysis, disaster response planning, and social welfare program evaluation.
  • Sports: Data analysis is used to gain insights into athlete performance, improve team strategy, and enhance fan engagement. This includes player evaluation, scouting analysis, and game strategy optimization.
  • Marketing: Data analysis is used to measure the effectiveness of marketing campaigns, understand customer behavior, and develop targeted marketing strategies. This includes customer segmentation, marketing attribution analysis, and social media analytics.
  • Environmental science: Data analysis is used to monitor and evaluate environmental conditions, assess the impact of human activities on the environment, and develop environmental policies. This includes climate modeling, ecological forecasting, and pollution monitoring.

When to Use Data Analysis

Data analysis is useful when you need to extract meaningful insights and information from large and complex datasets. It is a crucial step in the decision-making process, as it helps you understand the underlying patterns and relationships within the data, and identify potential areas for improvement or opportunities for growth.

Here are some specific scenarios where data analysis can be particularly helpful:

  • Problem-solving: When you encounter a problem or challenge, data analysis can help you identify the root cause and develop effective solutions.
  • Optimization: Data analysis can help you optimize processes, products, or services to increase efficiency, reduce costs, and improve overall performance.
  • Prediction: Data analysis can help you make predictions about future trends or outcomes, which can inform strategic planning and decision-making.
  • Performance evaluation: Data analysis can help you evaluate the performance of a process, product, or service to identify areas for improvement and potential opportunities for growth.
  • Risk assessment: Data analysis can help you assess and mitigate risks, whether they are financial, operational, or related to safety.
  • Market research: Data analysis can help you understand customer behavior and preferences, identify market trends, and develop effective marketing strategies.
  • Quality control: Data analysis can help you ensure product quality and customer satisfaction by identifying and addressing quality issues.

Purpose of Data Analysis

The primary purposes of data analysis can be summarized as follows:

  • To gain insights: Data analysis allows you to identify patterns and trends in data, which can provide valuable insights into the underlying factors that influence a particular phenomenon or process.
  • To inform decision-making: Data analysis can help you make informed decisions based on the information that is available. By analyzing data, you can identify potential risks, opportunities, and solutions to problems.
  • To improve performance: Data analysis can help you optimize processes, products, or services by identifying areas for improvement and potential opportunities for growth.
  • To measure progress: Data analysis can help you measure progress towards a specific goal or objective, allowing you to track performance over time and adjust your strategies accordingly.
  • To identify new opportunities: Data analysis can help you identify new opportunities for growth and innovation by identifying patterns and trends that may not have been visible before.

Examples of Data Analysis

Some Examples of Data Analysis are as follows:

  • Social Media Monitoring: Companies use data analysis to monitor social media activity in real-time to understand their brand reputation, identify potential customer issues, and track competitors. By analyzing social media data, businesses can make informed decisions on product development, marketing strategies, and customer service.
  • Financial Trading: Financial traders use data analysis to make real-time decisions about buying and selling stocks, bonds, and other financial instruments. By analyzing real-time market data, traders can identify trends and patterns that help them make informed investment decisions.
  • Traffic Monitoring: Cities use data analysis to monitor traffic patterns and make real-time decisions about traffic management. By analyzing data from traffic cameras, sensors, and other sources, cities can identify congestion hotspots and make changes to improve traffic flow.
  • Healthcare Monitoring: Healthcare providers use data analysis to monitor patient health in real-time. By analyzing data from wearable devices, electronic health records, and other sources, healthcare providers can identify potential health issues and provide timely interventions.
  • Online Advertising: Online advertisers use data analysis to make real-time decisions about advertising campaigns. By analyzing data on user behavior and ad performance, advertisers can make adjustments to their campaigns to improve their effectiveness.
  • Sports Analysis: Sports teams use data analysis to make real-time decisions about strategy and player performance. By analyzing data on player movement, ball position, and other variables, coaches can make informed decisions about substitutions, game strategy, and training regimens.
  • Energy Management: Energy companies use data analysis to monitor energy consumption in real-time. By analyzing data on energy usage patterns, companies can identify opportunities to reduce energy consumption and improve efficiency.

Characteristics of Data Analysis

Characteristics of Data Analysis are as follows:

  • Objective: Data analysis should be objective and based on empirical evidence, rather than subjective assumptions or opinions.
  • Systematic: Data analysis should follow a systematic approach, using established methods and procedures for collecting, cleaning, and analyzing data.
  • Accurate: Data analysis should produce accurate results, free from errors and bias. Data should be validated and verified to ensure its quality.
  • Relevant: Data analysis should be relevant to the research question or problem being addressed. It should focus on the data that is most useful for answering the research question or solving the problem.
  • Comprehensive: Data analysis should be comprehensive and consider all relevant factors that may affect the research question or problem.
  • Timely: Data analysis should be conducted in a timely manner, so that the results are available when they are needed.
  • Reproducible: Data analysis should be reproducible, meaning that other researchers should be able to replicate the analysis using the same data and methods.
  • Communicable: Data analysis should be communicated clearly and effectively to stakeholders and other interested parties. The results should be presented in a way that is understandable and useful for decision-making.

Advantages of Data Analysis

Advantages of Data Analysis are as follows:

  • Better decision-making: Data analysis helps in making informed decisions based on facts and evidence, rather than intuition or guesswork.
  • Improved efficiency: Data analysis can identify inefficiencies and bottlenecks in business processes, allowing organizations to optimize their operations and reduce costs.
  • Increased accuracy: Data analysis helps to reduce errors and bias, providing more accurate and reliable information.
  • Better customer service: Data analysis can help organizations understand their customers better, allowing them to provide better customer service and improve customer satisfaction.
  • Competitive advantage: Data analysis can provide organizations with insights into their competitors, allowing them to identify areas where they can gain a competitive advantage.
  • Identification of trends and patterns: Data analysis can identify trends and patterns in data that may not be immediately apparent, helping organizations to make predictions and plan for the future.
  • Improved risk management: Data analysis can help organizations identify potential risks and take proactive steps to mitigate them.
  • Innovation: Data analysis can inspire innovation and new ideas by revealing new opportunities or previously unknown correlations in data.

Limitations of Data Analysis

  • Data quality: The quality of data can impact the accuracy and reliability of analysis results. If data is incomplete, inconsistent, or outdated, the analysis may not provide meaningful insights.
  • Limited scope: Data analysis is limited by the scope of the data available. If data is incomplete or does not capture all relevant factors, the analysis may not provide a complete picture.
  • Human error: Data analysis is often conducted by humans, and errors can occur in data collection, cleaning, and analysis.
  • Cost: Data analysis can be expensive, requiring specialized tools, software, and expertise.
  • Time-consuming: Data analysis can be time-consuming, especially when working with large datasets or conducting complex analyses.
  • Overreliance on data: Data analysis should be complemented with human intuition and expertise. Overreliance on data can lead to a lack of creativity and innovation.
  • Privacy concerns: Data analysis can raise privacy concerns if personal or sensitive information is used without proper consent or security measures.


The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

  • Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
  • Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
  • Null hypothesis: Parental income and GPA have no relationship with each other in college students.
  • Alternative hypothesis: Parental income and GPA are positively correlated in college students.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

  • In an experimental design, you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
  • In a correlational design, you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
  • In a descriptive design, you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

  • In a between-subjects design, you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
  • In a within-subjects design, you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
  • In a mixed (factorial) design, one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).

First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design

In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:

  • Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g., level of language ability).
  • Quantitative data represents amounts. These may be on an interval scale (e.g., test score) or a ratio scale (e.g., age).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Variable | Type of data
Age | Quantitative (ratio)
Gender | Categorical (nominal)
Race or ethnicity | Categorical (nominal)
Baseline test scores | Quantitative (interval)
Final test scores | Quantitative (interval)
Parental income | Quantitative (ratio)
GPA | Quantitative (interval)


Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
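The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame of 1,000 student IDs and the helper names `simple_random_sample` and `convenience_sample` are hypothetical, and simple random sampling is just one of several probability sampling methods.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Probability sampling: every member has the same chance of selection."""
    rng = random.Random(seed)          # a seed makes the draw repeatable
    return rng.sample(population, n)   # draws n distinct members

def convenience_sample(population, n):
    """Non-probability stand-in: take whoever happens to come first."""
    return population[:n]

# Hypothetical sampling frame of 1,000 student IDs.
student_ids = list(range(1, 1001))

probability_sample = simple_random_sample(student_ids, 50, seed=42)
convenience = convenience_sample(student_ids, 50)
```

The convenience sample here only ever contains IDs 1 to 50, which is the kind of systematic skew that random selection is meant to avoid.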

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias, like sampling bias, and makes it more likely that data from your sample is typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that:

  • your sample is representative of the population you’re generalizing your findings to.
  • your sample lacks systematic bias.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

  • Will you have resources to advertise your study widely, including outside of your university setting?
  • Will you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)

Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, aim for at least 30 units per subgroup.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
  • Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
  • Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
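To show how these components fit together, here is a minimal sketch of one common textbook approximation for comparing two group means with a two-tailed test. It is not the formula used by any particular online calculator. The function name `sample_size_per_group` is hypothetical, and because the effect size is standardized (Cohen's d), the population standard deviation is already folded into it.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed comparison of two means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized expected effect size.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the significance level
    z_power = z.inv_cdf(power)            # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Medium expected effect (d = 0.5), alpha = 5%, power = 80%.
n_medium = sample_size_per_group(effect_size=0.5)
```

Smaller expected effects need larger samples: halving the effect size roughly quadruples the required number of participants per group.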

Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

  • Organizing data from each variable in frequency distribution tables.
  • Displaying data from a key variable in a bar chart to view the distribution of responses.
  • Visualizing the relationship between two variables using a scatter plot.

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
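A frequency distribution table is simple to build by hand. The sketch below uses only the Python standard library. The responses and the helper name `frequency_table` are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, relative frequency) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, count / total) for value, count in sorted(counts.items())]

# Hypothetical responses on a 1-5 agreement scale.
responses = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]
table = frequency_table(responses)

for value, count, share in table:
    print(f"{value}: {count} ({share:.0%})")
```

Scanning the counts makes it easy to spot values that occur far more or less often than the rest, which is a first hint of skew or outliers.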

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

[Figure: Mean, median, mode, and standard deviation in a normal distribution]

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:

  • Mode: the most popular response or value in the data set.
  • Median: the value in the exact middle of the data set when ordered from low to high.
  • Mean: the sum of all values divided by the number of values.

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
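As a small illustration, the three measures can be computed with Python's built-in `statistics` module. The test scores, majors, and reaction times below are invented for the example.

```python
from statistics import mean, median, multimode

scores = [62, 70, 70, 75, 81, 88, 94]                  # hypothetical quantitative data
majors = ["biology", "history", "biology", "physics"]  # hypothetical nominal data
reaction_times = [0.31, 0.42, 0.29]                    # hypothetical, every value unique

score_mean = mean(scores)            # sum of values divided by their number
score_median = median(scores)        # middle value of the ordered data
score_modes = multimode(scores)      # most frequent value(s)

major_modes = multimode(majors)      # the mode is the only option for nominal data
rt_modes = multimode(reaction_times) # every value ties, so there is no useful mode
```

`multimode` returns every value tied for the highest count, so a result as long as the data itself signals that the variable has no meaningful mode.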

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:

  • Range: the highest value minus the lowest value of the data set.
  • Interquartile range: the range of the middle half of the data set.
  • Standard deviation: the average distance between each value in your data set and the mean.
  • Variance: the square of the standard deviation.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
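The four measures can be sketched with the standard library as follows. The scores are hypothetical. Quartile conventions differ between tools, so the interquartile range shown here reflects the default method of `statistics.quantiles` and may differ slightly from other software.

```python
from statistics import quantiles, stdev, variance

scores = [62, 70, 70, 75, 81, 88, 94]  # hypothetical test scores

value_range = max(scores) - min(scores)   # highest minus lowest value
q1, _, q3 = quantiles(scores, n=4)        # quartile cut points (default method)
iqr = q3 - q1                             # spread of the middle half of the data
sample_sd = stdev(scores)                 # sample standard deviation
sample_var = variance(scores)             # sample variance, the square of the SD
```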

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

Statistic | Pretest scores | Posttest scores
Mean | 68.44 | 75.25
Standard deviation | 9.43 | 9.88
Variance | 88.96 | 97.96
Range | 36.25 | 45.12

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)

After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

Statistic | Parental income (USD) | GPA
Mean | 62,100 | 3.12
Standard deviation | 15,000 | 0.45
Variance | 225,000,000 | 0.16
Range | 8,000–378,000 | 2.64–4.00

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.

Researchers often use two main methods (simultaneously) to make inferences in statistics.

  • Estimation: calculating population parameters based on sample statistics.
  • Hypothesis testing: a formal process for testing research predictions about the population using samples.

You can make two types of estimates of population parameters from sample statistics:

  • A point estimate: a value that represents your best guess of the exact parameter.
  • An interval estimate: a range of values that represent your best guess of where the parameter lies.

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
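A minimal sketch of that calculation for a sample mean is shown below. The GPA values and the function name `confidence_interval` are hypothetical. With a sample this small, a critical value from the t distribution would usually be preferred, so treat the z-based interval as an illustration of the mechanics only.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds for the mean using a z critical value."""
    n = len(sample)
    point_estimate = mean(sample)                    # point estimate of the mean
    standard_error = stdev(sample) / math.sqrt(n)    # standard error of the mean
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_critical * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical GPA values from a small sample.
gpas = [3.1, 2.8, 3.6, 3.3, 2.9, 3.4, 3.0, 3.7, 3.2, 2.7]

ci95 = confidence_interval(gpas)          # 95% interval
ci99 = confidence_interval(gpas, 0.99)    # 99% interval, necessarily wider
```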

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:

  • A test statistic tells you how much your data differs from the null hypothesis of the test.
  • A p value tells you the likelihood of obtaining results at least as extreme as yours if the null hypothesis is actually true in the population.
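Both outputs can be illustrated with a one-sample z test, which is one of the simplest cases because it only needs the standard normal distribution. The numbers and the function name `one_sample_z_test` are hypothetical, and the sketch assumes the population standard deviation is known.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, population_sd, n, alternative="two-sided"):
    """Return (z, p) for a one-sample z test with a known population SD."""
    # Test statistic: distance between the sample mean and the null value,
    # measured in standard errors.
    z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))
    if alternative == "greater":
        p = 1 - NormalDist().cdf(z)               # upper tail only
    elif alternative == "two-sided":
        p = 2 * (1 - NormalDist().cdf(abs(z)))    # both tails
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z, p

# Hypothetical study: sample mean 103 from n = 100, null mean 100, SD 15.
z_stat, p_two_sided = one_sample_z_test(103, 100, 15, 100)
_, p_one_sided = one_sample_z_test(103, 100, 15, 100, alternative="greater")
```

Here the sample mean lies two standard errors above the null value, and the one-tailed p value is half the two-tailed one because only a single tail of the distribution is counted.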

Statistical tests come in three main varieties:

  • Comparison tests assess group differences in outcomes.
  • Regression tests assess cause-and-effect relationships between variables.
  • Correlation tests assess relationships between variables without assuming causation.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in outcome variable(s).

  • A simple linear regression includes one predictor variable and one outcome variable.
  • A multiple linear regression includes two or more predictor variables and one outcome variable.

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

  • A t test is for exactly 1 or 2 groups when the sample is small (30 or less).
  • A z test is for exactly 1 or 2 groups when the sample is large.
  • An ANOVA is for 3 or more groups.

The z and t tests have subtypes based on the number and types of samples and the hypotheses:

  • If you have only one sample that you want to compare to a population mean, use a one-sample test.
  • If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
  • If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
  • If you expect a difference between groups in a specific direction, use a one-tailed test.
  • If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.
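As an illustration of the dependent (paired) samples case, the sketch below computes the t statistic from pretest and posttest scores. The scores and the function name `paired_t_statistic` are hypothetical. Turning the statistic into a p value requires the t distribution, which Python's standard library does not provide, so that step is left to a statistics package.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a dependent-samples t test."""
    diffs = [a - b for a, b in zip(after, before)]   # one difference per participant
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)     # SE of the mean difference
    return mean(diffs) / standard_error, n - 1

# Hypothetical pretest and posttest scores for six participants.
pretest = [60, 72, 65, 80, 58, 70]
posttest = [66, 75, 70, 79, 64, 77]

t_stat, df = paired_t_statistic(pretest, posttest)
```

A positive t statistic means scores were higher on average after the intervention, which is the direction a one-tailed test of improvement would look for.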

The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
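The sketch below computes Pearson's r by hand and converts it to a t statistic with the standard formula t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom. The income and GPA values are invented, and the function names are hypothetical. As with other t tests, obtaining the p value needs the t distribution from a statistics package.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                     # variation in x
    syy = sum((b - my) ** 2 for b in y)                     # variation in y
    return sxy / math.sqrt(sxx * syy)

def correlation_t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical data: parental income (thousands of USD) and GPA.
income = [35, 48, 52, 61, 75, 90]
gpa = [2.8, 3.0, 3.4, 3.1, 3.5, 3.7]

r = pearson_r(income, gpa)
t_stat = correlation_t_statistic(r, len(income))
```

Because the formula multiplies by sqrt(n - 2), the same correlation coefficient produces a larger t statistic in a larger sample, which is why even weak correlations can be statistically significant when many participants are measured.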

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:

  • a t value (test statistic) of 3.00
  • a p value of 0.0028

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:

  • a t value of 3.08
  • a p value of 0.001

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)

You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.

With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)

To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
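For the experimental example, Cohen's d is a standardized mean difference: the change in mean score divided by a standard deviation. The sketch below uses the pretest and posttest means and standard deviations from the descriptive statistics table. Which standard deviation to divide by is a convention that varies. Dividing by the pretest standard deviation happens to reproduce the reported 0.72, but whether the original analysis used that convention is an assumption, and a pooled standard deviation gives a slightly smaller value.

```python
import math

def cohens_d(mean_after, mean_before, sd_reference):
    """Standardized mean difference: (mean_after - mean_before) / sd_reference."""
    return (mean_after - mean_before) / sd_reference

# Values taken from the pretest/posttest descriptive statistics table.
pre_mean, post_mean = 68.44, 75.25
pre_sd, post_sd = 9.43, 9.88

# Convention 1 (assumed): standardize by the pretest SD.
d_pretest_sd = cohens_d(post_mean, pre_mean, pre_sd)

# Convention 2: standardize by the pooled SD of both measurements.
pooled_sd = math.sqrt((pre_sd ** 2 + post_sd ** 2) / 2)
d_pooled = cohens_d(post_mean, pre_mean, pooled_sd)

print(round(d_pretest_sd, 2))
```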

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power. However, there’s a trade-off between the two errors, so a fine balance is necessary.
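The link between the significance level and Type I errors can be checked with a small simulation. In the sketch below, every simulated study draws data for which the null hypothesis is true, so each rejection is a Type I error, and the rejection rate should land close to the chosen significance level. The function name `type_i_error_rate`, the sample size, the number of trials, and the seed are all arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Share of simulated studies that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        # The null hypothesis is true: population mean 0, standard deviation 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / math.sqrt(n))     # one-sample z statistic
        if abs(z) > z_critical:
            rejections += 1                            # a Type I error
    return rejections / trials

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
```

Lowering the significance level reduces the Type I error rate, but it also makes true effects harder to detect, which is the trade-off with Type II errors.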

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

The Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis rather than making a conclusion about rejecting the null hypothesis or not.


What is Data Analysis?

According to the federal government, data analysis is "the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data" (Responsible Conduct in Data Management). Important components of data analysis include searching for patterns, remaining unbiased in drawing inferences from data, practicing responsible data management, and maintaining "honest and accurate analysis" (Responsible Conduct in Data Management).

To understand data analysis further, it can be helpful to take a step back and ask the question "What is data?". Many of us associate data with spreadsheets of numbers and values; however, data can encompass much more than that. According to the federal government, data is "The recorded factual material commonly accepted in the scientific community as necessary to validate research findings" (OMB Circular 110). This broad definition can include information in many formats.

Some examples of types of data are as follows:

  • Photographs 
  • Hand-written notes from field observation
  • Machine learning training data sets
  • Ethnographic interview transcripts
  • Sheet music
  • Scripts for plays and musicals 
  • Observations from laboratory experiments (CMU Data 101)

Thus, data analysis includes the processing and manipulation of these data sources in order to gain additional insight from data, answer a research question, or confirm a research hypothesis. 

Data analysis falls within the larger research data lifecycle, as seen below. 

[Figure: research data lifecycle (University of Virginia)]

Why Analyze Data?

Through data analysis, a researcher can gain additional insight from data and draw conclusions to address the research question or hypothesis. Use of data analysis tools helps researchers understand and interpret data. 

What are the Types of Data Analysis?

Data analysis can be quantitative, qualitative, or mixed methods. 

Quantitative research typically involves numbers and "close-ended questions and responses" (Creswell & Creswell, 2018, p. 3). Quantitative research tests objective theories by examining relationships among variables, which are usually measured on instruments and analyzed using statistical procedures (Creswell & Creswell, 2018, p. 4). Quantitative analysis usually uses deductive reasoning.

Qualitative research typically involves words and "open-ended questions and responses" (Creswell & Creswell, 2018, p. 3). According to Creswell & Creswell, "qualitative research is an approach for exploring and understanding the meaning individuals or groups ascribe to a social or human problem" (2018, p. 4). Thus, qualitative analysis usually invokes inductive reasoning.

Mixed methods research uses methods from both quantitative and qualitative research approaches. Mixed methods research works under the "core assumption... that the integration of qualitative and quantitative data yields additional insight beyond the information provided by either the quantitative or qualitative data alone" (Creswell & Creswell, 2018, p. 4).

Quantitative Data Analysis: A Comprehensive Guide

By: Ofem Eteng | Published: May 18, 2022

A healthcare giant successfully introduces the most effective drug dosage through rigorous statistical modeling, saving countless lives. A marketing team predicts consumer trends with uncanny accuracy, tailoring campaigns for maximum impact.

These trends and dosages are not just any numbers but are a result of meticulous quantitative data analysis. Quantitative data analysis offers a robust framework for understanding complex phenomena, evaluating hypotheses, and predicting future outcomes.

In this blog, we’ll walk through the concept of quantitative data analysis, the steps required, its advantages, and the methods and techniques that are used in this analysis. Read on!

What is Quantitative Data Analysis?

Quantitative data analysis is a systematic process of examining, interpreting, and drawing meaningful conclusions from numerical data. It involves the application of statistical methods, mathematical models, and computational techniques to understand patterns, relationships, and trends within datasets.

Quantitative data analysis methods typically work with algorithms, mathematical analysis tools, and software to gain insights from the data, answering questions such as how many, how often, and how much. Data for quantitative data analysis is usually collected from close-ended surveys, questionnaires, polls, etc. The data can also be obtained from sales figures, email click-through rates, number of website visitors, and percentage revenue increase. 

Quantitative Data Analysis vs Qualitative Data Analysis

When we talk about data, we usually think about the patterns, relationships, and connections within datasets; in short, about analyzing the data. Data analysis falls broadly into two types: quantitative data analysis and qualitative data analysis.

Quantitative data analysis revolves around numerical data and statistics, which are suitable for things that can be counted or measured. In contrast, qualitative data analysis deals with descriptive and subjective information, for things that can be observed but not measured.

Let us compare quantitative data analysis and qualitative data analysis for a better understanding.

Quantitative Data Analysis | Qualitative Data Analysis
Numerical data: statistics, counts, metrics, measurements | Text data: customer feedback, opinions, documents, notes, audio/video recordings
Close-ended surveys, polls, and experiments | Open-ended questions, descriptive interviews
What? How much? Why (to a certain extent)? | How? Why? What are individual experiences and motivations?
Statistical programming software like R, Python, and SAS; data visualization tools like Tableau and Power BI | NVivo and Atlas.ti for qualitative coding; word processors and highlighters; mind maps and visual canvases
Best used for large sample sizes and quick answers | Best used for small to medium sample sizes and descriptive insights

Data Preparation Steps for Quantitative Data Analysis

Quantitative data has to be gathered and cleaned before it can be analyzed. Below are the steps to prepare data for quantitative analysis:

  • Step 1: Data Collection

Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as interviews, focus groups, surveys, and questionnaires.

  • Step 2: Data Cleaning

Once the data is collected, begin the data cleaning process by scanning the entire dataset for duplicates, errors, and omissions. Keep a close eye out for outliers (data points that are significantly different from the majority of the dataset), because they can skew your results. Investigate each one, and remove or correct it only when it reflects an error rather than a genuine observation.

This data-cleaning process ensures data accuracy, consistency and relevancy before analysis.
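
As a rough illustration of the cleaning step, the sketch below removes duplicates, drops rows with omissions, and flags outliers using the common 1.5 × IQR rule. The records, the field layout, and the choice of the IQR rule are all assumptions made for this example; your own data may call for a different rule.

```python
from statistics import quantiles

# Hypothetical survey rows: (respondent_id, monthly_spend). None marks an omission.
raw = [
    (1, 42.0), (2, 38.5), (2, 38.5),   # respondent 2 was recorded twice
    (3, None),                          # missing value
    (4, 45.0), (5, 40.0), (6, 41.5),
    (7, 39.0), (8, 44.0), (9, 400.0),   # 400.0 looks suspiciously large
]

# 1. Remove exact duplicates while keeping the original order.
deduped = list(dict.fromkeys(raw))

# 2. Drop rows with omissions.
complete = [(rid, spend) for rid, spend in deduped if spend is not None]

# 3. Flag (not delete) values outside 1.5 * IQR of the quartiles.
values = [spend for _, spend in complete]
q1, _, q3 = quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [(rid, spend) for rid, spend in complete if not low <= spend <= high]

print(len(raw), len(deduped), len(complete))
print("Flagged for review:", outliers)
```

Flagging rather than deleting keeps the decision with the analyst, who can check whether the value is a typo or a real but unusual observation.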

  • Step 3: Data Analysis and Interpretation

Now that you have collected and cleaned your data, it is time to carry out the quantitative analysis. There are two methods of quantitative data analysis, which we will discuss in the next section.

Now that you are familiar with what quantitative data analysis is and how to prepare your data for analysis, the focus will shift to the purpose of this article, which is to describe the methods and techniques of quantitative data analysis.

Methods and Techniques of Quantitative Data Analysis

Quantitative data analysis broadly employs two groups of techniques to extract meaningful insights from datasets. The first is descriptive statistics, which summarizes and portrays essential features of a dataset, such as the mean, median, and standard deviation.

Inferential statistics, the second, extrapolates from a sample dataset to make broader inferences about an entire population, using techniques such as hypothesis testing and regression analysis.

An in-depth explanation of both the methods is provided below:

  • Descriptive Statistics
  • Inferential Statistics

1) Descriptive Statistics

Descriptive statistics, as the name implies, are used to describe a dataset. They help you understand the details of your data by summarizing it and finding patterns in the specific data sample. They provide absolute numbers obtained from a sample but do not necessarily explain the rationale behind the numbers, and they are mostly used for analyzing single variables. The methods used in descriptive statistics include:

  • Mean: This calculates the numerical average of a set of values.
  • Median: This is used to get the midpoint of a set of values when the numbers are arranged in numerical order.
  • Mode: This is used to find the most commonly occurring value in a dataset.
  • Percentage: This is used to express how a value or group of respondents within the data relates to a larger group of respondents.
  • Frequency: This indicates the number of times a value is found.
  • Range: This is the difference between the highest and lowest values in a dataset.
  • Standard Deviation: This indicates how dispersed a set of numbers is; that is, how far the numbers typically lie from the mean.
  • Skewness: This indicates how symmetrical a distribution of numbers is, showing whether the values cluster into a smooth bell curve shape in the middle of the graph or skew towards the left or right.
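
The measures above can be computed in a few lines. The following is a minimal sketch using Python's built-in statistics module; the ticket counts are hypothetical values invented for this example.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical data: support tickets opened by nine customers.
tickets = [2, 3, 3, 4, 5, 5, 5, 7, 11]

summary = {
    "mean": mean(tickets),
    "median": median(tickets),
    "mode": mode(tickets),
    "range": max(tickets) - min(tickets),
    "stdev": stdev(tickets),  # sample standard deviation (n - 1 denominator)
}

# Frequency: how many times each value occurs.
frequency = Counter(tickets)

# Percentage: share of customers who opened five or more tickets.
pct_five_or_more = 100 * sum(1 for t in tickets if t >= 5) / len(tickets)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
print("frequency:", dict(frequency))
print(f"customers with 5+ tickets: {pct_five_or_more:.1f}%")
```

Note that `stdev` treats the data as a sample; if your data covers the whole population, `pstdev` from the same module is the matching function.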

2) Inferential Statistics

In quantitative analysis, the expectation is to turn raw numbers into meaningful insight. Descriptive statistics explain the details of a specific dataset using numbers, but they do not explain the reasons behind those numbers; hence the need for further analysis using inferential statistics.

Inferential statistics aim to make predictions or highlight possible outcomes based on the sample data that descriptive statistics summarize. They are used to generalize results and compare groups, to show relationships that exist between multiple variables, and to test hypotheses that predict changes or differences.

There are various statistical analysis methods used within inferential statistics; a few are discussed below.

  • Cross Tabulations: Cross tabulation, or crosstab, shows the relationship between two categorical variables and is often used to compare results across demographic groups. It arranges the data in a simple table whose rows and columns are the mutually exclusive categories of each variable, with each cell holding the count for that combination. Crosstabs help you understand the nuances of a dataset and the factors that may influence a data point.
  • Regression Analysis: Regression analysis estimates the relationship between a dependent variable (the variable or outcome you want to measure or predict) and one or more independent variables (factors that may impact the dependent variable). Its purpose is to estimate how the independent variables might affect the dependent variable, so that you can identify trends and patterns and forecast possible future outcomes. There are many types of regression analysis, and the model you choose will be determined by the type of data you have for the dependent variable. The types include linear regression, non-linear regression, and binary logistic regression, among others.
  • Monte Carlo Simulation: Monte Carlo simulation, also known as the Monte Carlo method, is a computerized technique for generating models of possible outcomes and showing their probability distributions. It considers a range of possible outcomes and then estimates how likely each one is to occur. Data analysts use it to perform advanced risk analyses to help forecast future events and make decisions accordingly.
  • Analysis of Variance (ANOVA): This is used to test whether the means of two or more groups differ from each other. It compares the means of the groups against the variation within them, which allows several groups to be analyzed in a single test.
  • Factor Analysis: A large number of variables can be reduced to a smaller number of factors using the factor analysis technique. It works on the principle that multiple separate observable variables correlate with each other because they are all associated with an underlying construct. It helps reduce datasets with many variables to a smaller, more manageable set of factors.
  • Cohort Analysis: Cohort analysis is a subset of behavioral analytics. Rather than looking at all users as one unit, it breaks the data down into related groups, or cohorts, that share common characteristics or experiences within a defined period, and analyzes each group separately.
  • MaxDiff Analysis: This is a quantitative data analysis method used to gauge customers’ purchase preferences and to identify which parameters rank higher than others in the decision process.
  • Cluster Analysis: Cluster analysis is a technique used to identify structures within a dataset. It aims to sort data points into groups that are internally similar and externally different; that is, data points within a cluster resemble each other and differ from data points in other clusters.
  • Time Series Analysis: This is a statistical analytic technique used to identify trends and cycles over time. It is simply the measurement of the same variables at different times, like weekly and monthly email sign-ups, to uncover trends, seasonality, and cyclic patterns. By doing this, the data analyst can forecast how variables of interest may fluctuate in the future.
  • SWOT Analysis: SWOT is primarily a strategic planning framework, but it can be made quantitative by assigning numerical values to the strengths, weaknesses, opportunities, and threats of an organization, product, or service, giving a clearer picture of the competition and supporting better business strategies.
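
To make one of these techniques concrete, here is a minimal sketch of simple linear regression (one independent variable) fitted by ordinary least squares. The ad-spend and sign-up figures, and the function name `fit_line`, are hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical data: weekly ad spend (independent) and sign-ups (dependent).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sign_ups = [12.0, 15.0, 21.0, 24.0, 28.0]

def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(ad_spend, sign_ups)

# Use the fitted line to forecast sign-ups at a spend level not yet observed.
predicted = intercept + slope * 6.0
print(f"sign_ups = {intercept:.2f} + {slope:.2f} * ad_spend")
print(f"forecast at spend 6.0: {predicted:.1f}")
```

The slope estimates how much the dependent variable changes for each one-unit change in the independent variable. A forecast outside the observed range, like the one above, should be treated with caution.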

How to Choose the Right Method for your Analysis?

Choosing between descriptive statistics and inferential statistics can often be confusing. You should consider the following factors before choosing the right method for your quantitative data analysis:

1. Type of Data

The first consideration in data analysis is understanding the type of data you have (for example, nominal, ordinal, interval, or ratio). Different statistical methods have specific requirements based on these data types, and using the wrong method can render results meaningless. The choice of statistical method should align with the nature and distribution of your data to ensure meaningful and accurate analysis.

2. Your Research Questions

When deciding on statistical methods, it’s crucial to align them with your specific research questions and hypotheses. The nature of your questions will influence whether descriptive statistics alone, which reveal sample attributes, are sufficient or if you need both descriptive and inferential statistics to understand group differences or relationships between variables and make population inferences.

Pros and Cons of Quantitative Data Analysis

Pros

1. Objectivity and Generalizability:

  • Quantitative data analysis offers objective, numerical measurements, minimizing bias and personal interpretation.
  • Results can often be generalized to larger populations, making them applicable to broader contexts.

Example: A study using quantitative data analysis to measure student test scores can objectively compare performance across different schools and demographics, leading to generalizable insights about educational strategies.

2. Precision and Efficiency:

  • Statistical methods provide precise numerical results, allowing for accurate comparisons and predictions.
  • Large datasets can be analyzed efficiently with the help of computer software, saving time and resources.

Example: A marketing team can use quantitative data analysis to precisely track click-through rates and conversion rates on different ad campaigns, quickly identifying the most effective strategies for maximizing customer engagement.

3. Identification of Patterns and Relationships:

  • Statistical techniques reveal hidden patterns and relationships between variables that might not be apparent through observation alone.
  • This can lead to new insights and understanding of complex phenomena.

Example: A medical researcher can use quantitative analysis to pinpoint correlations between lifestyle factors and disease risk, aiding in the development of prevention strategies.

Cons

1. Limited Scope:

  • Quantitative analysis focuses on quantifiable aspects of a phenomenon, potentially overlooking important qualitative nuances, such as emotions, motivations, or cultural contexts.

Example: A survey measuring customer satisfaction with numerical ratings might miss key insights about the underlying reasons for their satisfaction or dissatisfaction, which could be better captured through open-ended feedback.

2. Oversimplification:

  • Reducing complex phenomena to numerical data can lead to oversimplification and a loss of richness in understanding.

Example: Analyzing employee productivity solely through quantitative metrics like hours worked or tasks completed might not account for factors like creativity, collaboration, or problem-solving skills, which are crucial for overall performance.

3. Potential for Misinterpretation:

  • Statistical results can be misinterpreted if not analyzed carefully and with appropriate expertise.
  • The choice of statistical methods and assumptions can significantly influence results.

This blog discusses the steps, methods, and techniques of quantitative data analysis. It also gives insights into the methods of data collection, the type of data one should work with, and the pros and cons of such analysis.


Ofem Eteng is a seasoned technical content writer with over 12 years of experience. He has held pivotal roles such as System Analyst (DevOps) at Dagbs Nigeria Limited and Full-Stack Developer at Pedoquasphere International Limited. He specializes in data science, data analytics and cutting-edge technologies, making him an expert in the data industry.

Quantitative Data Analysis 101

The lingo, methods and techniques, explained simply.

By: Derek Jansen (MBA)  and Kerryn Warren (PhD) | December 2020

Quantitative data analysis is one of those things that often strikes fear in students. It’s totally understandable – quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we’re all wishing we’d paid a little more attention in math class…

The good news is that while quantitative data analysis is a mammoth topic, gaining a working understanding of the basics isn’t that hard, even for those of us who avoid numbers and math. In this post, we’ll break quantitative analysis down into simple, bite-sized chunks so you can approach your research with confidence.

Overview: Quantitative Data Analysis 101

  • What (exactly) is quantitative data analysis?
  • When to use quantitative analysis
  • How quantitative analysis works
  • The two “branches” of quantitative analysis
  • Descriptive statistics 101
  • Inferential statistics 101
  • How to choose the right quantitative methods
  • Recap & summary

What is quantitative data analysis?

Despite being a mouthful, quantitative data analysis simply means analysing data that is numbers-based – or data that can be easily “converted” into numbers without losing any meaning.

For example, category-based variables like gender, ethnicity, or native language could all be “converted” into numbers without losing meaning: English could equal 1, French 2, and so on.
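
Here’s a minimal sketch of that kind of conversion. The survey responses are made up for illustration, and the codes are simply assigned in order of first appearance.

```python
# Hypothetical responses to a "native language" survey question.
responses = ["English", "French", "English", "Spanish", "French"]

# Assign a numeric code to each category in order of first appearance.
codes = {}
for answer in responses:
    codes.setdefault(answer, len(codes) + 1)

encoded = [codes[answer] for answer in responses]

# The mapping is reversible, so no meaning is lost in the conversion.
labels = {code: label for label, code in codes.items()}
decoded = [labels[code] for code in encoded]

print(codes)
print(encoded)
```

Keep in mind that these codes are just labels. French being 2 doesn’t make it “twice” English, so counting how often each code occurs makes sense, while averaging the codes doesn’t.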

This contrasts with qualitative data analysis, where the focus is on words, phrases and expressions that can’t be reduced to numbers. If you’re interested in learning about qualitative analysis, check out our post and video here.

What is quantitative analysis used for?

Quantitative analysis is generally used for three purposes.

  • Firstly, it’s used to measure differences between groups. For example, the popularity of different clothing colours or brands.
  • Secondly, it’s used to assess relationships between variables. For example, the relationship between weather temperature and voter turnout.
  • And thirdly, it’s used to test hypotheses in a scientifically rigorous way. For example, a hypothesis about the impact of a certain vaccine.

Again, this contrasts with qualitative analysis, which can be used to analyse people’s perceptions and feelings about an event or situation. In other words, things that can’t be reduced to numbers.

How does quantitative analysis work?

Well, since quantitative data analysis is all about analysing numbers, it’s no surprise that it involves statistics. Statistical analysis methods form the engine that powers quantitative analysis, and these methods can vary from pretty basic calculations (for example, averages and medians) to more sophisticated analyses (for example, correlations and regressions).

Sounds like gibberish? Don’t worry. We’ll explain all of that in this post. Importantly, you don’t need to be a statistician or math wiz to pull off a good quantitative analysis. We’ll break down all the technical mumbo jumbo in this post.

As I mentioned, quantitative analysis is powered by statistical analysis methods. There are two main “branches” of statistical methods that are used – descriptive statistics and inferential statistics. In your research, you might only use descriptive statistics, or you might use a mix of both, depending on what you’re trying to figure out. In other words, depending on your research questions, aims and objectives. I’ll explain how to choose your methods later.

So, what are descriptive and inferential statistics?

Well, before I can explain that, we need to take a quick detour to explain some lingo. To understand the difference between these two branches of statistics, you need to understand two important words. These words are population and sample.

First up, population. In statistics, the population is the entire group of people (or animals or organisations or whatever) that you’re interested in researching. For example, if you were interested in researching Tesla owners in the US, then the population would be all Tesla owners in the US.

However, it’s extremely unlikely that you’re going to be able to interview or survey every single Tesla owner in the US. Realistically, you’ll likely only get access to a few hundred, or maybe a few thousand owners using an online survey. This smaller group of accessible people whose data you actually collect is called your sample.

So, to recap – the population is the entire group of people you’re interested in, and the sample is the subset of the population that you can actually get access to. In other words, the population is the full chocolate cake, whereas the sample is a slice of that cake.

So, why is this sample-population thing important?

Well, descriptive statistics focus on describing the sample, while inferential statistics aim to make predictions about the population, based on the findings within the sample. In other words, we use one group of statistical methods – descriptive statistics – to investigate the slice of cake, and another group of methods – inferential statistics – to draw conclusions about the entire cake. There I go with the cake analogy again…
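
If it helps to see the cake and the slice as numbers, here’s a small simulated sketch. The population is entirely made up (randomly generated annual mileage figures), and in real research you would rarely have the full population to compare against.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: annual mileage for 10,000 car owners.
population = [random.gauss(12000, 3000) for _ in range(10_000)]

# The sample: 200 owners picked at random, as an online survey might reach.
sample = random.sample(population, 200)

population_mean = mean(population)  # the whole cake (usually unknown)
sample_mean = mean(sample)          # the slice you can actually measure

print(f"population mean: {population_mean:.0f}")
print(f"sample mean:     {sample_mean:.0f}")
```

Describing the 200 sampled owners is descriptive statistics. Using their mean to say something about all 10,000 owners is inferential statistics.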

With that out the way, let’s take a closer look at each of these branches in more detail.

Branch 1: Descriptive Statistics

Descriptive statistics serve a simple but critically important role in your research – to describe your data set – hence the name. In other words, they help you understand the details of your sample. Unlike inferential statistics (which we’ll get to soon), descriptive statistics don’t aim to make inferences or predictions about the entire population – they’re purely interested in the details of your specific sample.

When you’re writing up your analysis, descriptive statistics are the first set of stats you’ll cover, before moving on to inferential statistics. But, that said, depending on your research objectives and research questions, they may be the only type of statistics you use. We’ll explore that a little later.

So, what kind of statistics are usually covered in this section?

Some common descriptive statistics used in this branch include the following:

  • Mean – this is simply the mathematical average of a range of numbers.
  • Median – this is the midpoint in a range of numbers when the numbers are arranged in numerical order. If the data set contains an odd number of values, then the median is the number right in the middle of the set. If it contains an even number of values, then the median is the midpoint between the two middle numbers.
  • Mode – this is simply the most commonly occurring number in the data set.
  • Standard deviation – this indicates how dispersed a range of numbers is. In other words, how close all the numbers are to the mean (the average). In cases where most of the numbers are quite close to the average, the standard deviation will be relatively low. Conversely, in cases where the numbers are scattered all over the place, the standard deviation will be relatively high.
  • Skewness – as the name suggests, skewness indicates how symmetrical a range of numbers is. In other words, do they tend to cluster into a smooth bell curve shape in the middle of the graph, or do they skew to the left or right?

Feeling a bit confused? Let’s look at a practical example using a small data set.

[Figure: descriptive statistics example data]

On the left-hand side is the data set. This details the bodyweight of a sample of 10 people. On the right-hand side, we have the descriptive statistics. Let’s take a look at each of them.

First, we can see that the mean weight is 72.4 kilograms. In other words, the average weight across the sample is 72.4 kilograms. Straightforward.

Next, we can see that the median is very similar to the mean (the average). This suggests that this data set has a reasonably symmetrical distribution (in other words, a relatively smooth, centred distribution of weights, clustered towards the centre).

In terms of the mode, there is no mode in this data set. This is because each number is present only once and so there cannot be a “most common number”. If there were two people who were both 65 kilograms, for example, then the mode would be 65.

Next up is the standard deviation. 10.6 indicates that there’s quite a wide spread of numbers. We can see this quite easily by looking at the numbers themselves, which range from 55 to 90, which is quite a stretch from the mean of 72.4.

And lastly, the skewness of -0.2 tells us that the data is very slightly negatively skewed. This makes sense since the mean and the median are slightly different.

As you can see, these descriptive statistics give us some useful insight into the data set. Of course, this is a very small data set (only 10 records), so we can’t read into these statistics too much. Also, keep in mind that this is not a list of all possible descriptive statistics – just the most common ones.
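
If you’d like to try a few of these ideas yourself, here’s a rough sketch. The weights are hypothetical (they are not the data set from the example above), and the `skewness` function uses the plain moment-based formula. Spreadsheet and statistics packages often apply a small-sample adjustment, so their figures can differ slightly.

```python
from statistics import mean, median, multimode, pstdev

def skewness(values):
    """Moment-based skewness: the average cubed z-score of the values."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

# Hypothetical bodyweights in kilograms.
odd_count = [55, 61, 70, 78, 90]
even_count = [55, 61, 70, 78]

# Odd number of values: the median is the middle value.
print(median(odd_count))
# Even number of values: the median is halfway between the two middle values.
print(median(even_count))

# Every value occurs once, so all of them tie and there is no single mode.
has_single_mode = len(multimode(odd_count)) == 1
print(has_single_mode)

# A symmetrical set has skewness near zero; a long right tail makes it positive.
print(skewness([1, 2, 3, 4, 5]))
print(skewness([1, 2, 3, 4, 20]))
```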

But why do all of these numbers matter?

While these descriptive statistics are all fairly basic, they’re important for a few reasons:

  • Firstly, they help you get both a macro and micro-level view of your data. In other words, they help you understand both the big picture and the finer details.
  • Secondly, they help you spot potential errors in the data – for example, if an average is way higher than you’d expect, or responses to a question are highly varied, this can act as a warning sign that you need to double-check the data.
  • And lastly, these descriptive statistics help inform which inferential statistical techniques you can use, as those techniques depend on the skewness (in other words, the symmetry and normality) of the data.

Simply put, descriptive statistics are really important, even though the statistical techniques used are fairly basic. All too often at Grad Coach, we see students skimming over the descriptives in their eagerness to get to the more exciting inferential methods, and then landing up with some very flawed results.

Don’t be a sucker – give your descriptive statistics the love and attention they deserve!

Branch 2: Inferential Statistics

As I mentioned, while descriptive statistics are all about the details of your specific data set – your sample – inferential statistics aim to make inferences about the population. In other words, you’ll use inferential statistics to make predictions about what you’d expect to find in the full population.

What kind of predictions, you ask? Well, there are two common types of predictions that researchers try to make using inferential stats:

  • Firstly, predictions about differences between groups – for example, height differences between children grouped by their favourite meal or gender.
  • And secondly, relationships between variables – for example, the relationship between body weight and the number of hours a week a person does yoga.

In other words, inferential statistics (when done correctly) allow you to connect the dots and make predictions about what you expect to see in the real-world population, based on what you observe in your sample data. For this reason, inferential statistics are used for hypothesis testing – in other words, to test hypotheses that predict changes or differences.

Inferential statistics are used to make predictions about what you’d expect to find in the full population, based on the sample.

Of course, when you’re working with inferential statistics, the composition of your sample is really important. In other words, if your sample doesn’t accurately represent the population you’re researching, then your findings won’t necessarily be very useful.

For example, if your population of interest is a mix of 50% male and 50% female, but your sample is 80% male, you can’t reliably make inferences about the population based on your sample, since it’s not representative. This area of statistics is called sampling, but we won’t go down that rabbit hole here (it’s a deep one!) – we’ll save that for another post.

What statistics are usually used in this branch?

There are many, many different statistical analysis methods within the inferential branch and it’d be impossible for us to discuss them all here. So we’ll just take a look at some of the most common inferential statistical methods so that you have a solid starting point.

First up are t-tests. T-tests compare the means (the averages) of two groups of data to assess whether they’re statistically significantly different. In other words, is the gap between the two group means larger than you’d expect from chance variation alone?

This type of testing is very useful for understanding just how similar or different two groups of data are. For example, you might want to compare the mean blood pressure between two groups of people – one that has taken a new medication and one that hasn’t – to assess whether they are significantly different.
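
To make the blood pressure example a little more concrete, here’s a rough sketch that computes the t statistic by hand. The readings are invented, `welch_t` is a hypothetical helper name, and I’ve assumed Welch’s version of the test (which doesn’t require the two groups to have equal variances). Turning the statistic into a p-value needs a t-distribution, which a statistics package would normally supply.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(group_a) / n_a
    se2_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented systolic blood pressure readings, for illustration only.
medicated = [118, 122, 125, 119, 121, 124]
control = [130, 128, 135, 132, 129, 134]

t_stat, dof = welch_t(medicated, control)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The further the t statistic is from zero, the less plausible it is that the two group means differ by chance alone.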

Kicking things up a level, we have ANOVA, which stands for “analysis of variance”. This test is similar to a t-test in that it compares the means of various groups, but ANOVA allows you to analyse multiple groups, not just two. So it’s basically a t-test on steroids…
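
Here’s a minimal sketch of the calculation behind a one-way ANOVA, assuming three invented groups of heights. `one_way_anova_f` is a name I’ve made up for illustration. The function returns only the F statistic (the ratio of between-group to within-group variation); a stats package would be needed to convert it into a p-value.

```python
import statistics


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (x - statistics.mean(group)) ** 2 for group in groups for x in group
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Invented heights (cm) for children grouped by favourite meal.
heights_by_meal = [
    [120, 125, 130, 128],
    [118, 122, 127, 121],
    [131, 135, 129, 138],
]
print(f"F = {one_way_anova_f(heights_by_meal):.2f}")
```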

Next, we have correlation analysis. This type of analysis assesses the relationship between two variables. In other words, if one variable increases, does the other variable also increase, decrease or stay the same? For example, if the average temperature goes up, do average ice cream sales increase too? We’d expect some sort of relationship between these two variables intuitively, but correlation analysis allows us to measure that relationship scientifically.
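
As a quick illustration, here’s a sketch of the Pearson correlation coefficient, which is one common way to measure a linear relationship (other coefficients exist). The temperature and sales figures are invented, and `pearson_r` is just a helper name I’ve assumed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# Invented data: average temperature (°C) and ice cream sales (units).
temperature = [14, 17, 20, 23, 26, 29]
ice_cream_sales = [210, 260, 300, 380, 420, 510]

r = pearson_r(temperature, ice_cream_sales)
print(f"r = {r:.3f}")
```

The coefficient runs from -1 (a perfect negative relationship) through 0 (no linear relationship) to +1 (a perfect positive relationship).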

Lastly, we have regression analysis – this is quite similar to correlation in that it assesses the relationship between variables, but it goes a step further by modelling how much one variable changes when another changes, which lets you predict one from the other. It’s often used to explore possible cause and effect, but keep in mind that regression on its own can’t prove causation: the two variables may just happen to move together thanks to another force. Just because two variables correlate doesn’t necessarily mean that one causes the other.
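
Here’s a minimal sketch of the simplest case, a straight-line (ordinary least squares) fit with one predictor. The data is invented and `fit_line` is a hypothetical helper. Real projects would typically use a statistics package that also reports standard errors and significance.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: average temperature (°C) and ice cream sales (units).
temperature = [14, 17, 20, 23, 26, 29]
ice_cream_sales = [210, 260, 300, 380, 420, 510]

slope, intercept = fit_line(temperature, ice_cream_sales)
predicted_at_25 = slope * 25 + intercept
print(f"sales ≈ {slope:.1f} * temperature + {intercept:.1f}")
print(f"predicted sales at 25°C: {predicted_at_25:.0f}")
```

The slope tells you how much the outcome is expected to change for each one-unit change in the predictor, but it describes an association in this sample rather than proof that one variable drives the other.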

Stats overload…

I hear you. To make this all a little more tangible, let’s take a look at an example of a correlation in action.

Here’s a scatter plot demonstrating the correlation (relationship) between weight and height. Intuitively, we’d expect there to be some relationship between these two variables, which is what we see in this scatter plot. In other words, the results tend to cluster together in a diagonal line from bottom left to top right.

Sample correlation

As I mentioned, these are just a handful of inferential techniques – there are many, many more. Importantly, each statistical method has its own assumptions and limitations.

For example, some methods (parametric tests) assume that your data is normally distributed, while other (non-parametric) methods are designed for data that doesn’t meet that assumption. And that’s exactly why descriptive statistics are so important – they’re the first step to knowing which inferential techniques you can and can’t use.

Remember that every statistical method has its own assumptions and limitations, so you need to be aware of these.

How to choose the right analysis method

To choose the right statistical methods, you need to think about two important factors:

  • The type of quantitative data you have (specifically, level of measurement and the shape of the data)
  • Your research questions and hypotheses

Let’s take a closer look at each of these.

Factor 1 – Data type

The first thing you need to consider is the type of data you’ve collected (or the type of data you will collect). By data types, I’m referring to the four levels of measurement – namely, nominal, ordinal, interval and ratio. If you’re not familiar with this lingo, check out the video below.

Why does this matter?

Well, because different statistical methods and techniques require different types of data. This is one of the “assumptions” I mentioned earlier – every method has its assumptions regarding the type of data.

For example, some techniques work with categorical data (for example, yes/no type questions, or gender or ethnicity), while others work with continuous numerical data (for example, age, weight or income) – and, of course, some work with multiple data types.

If you try to use a statistical method that doesn’t support the data type you have, your results will be largely meaningless. So, make sure that you have a clear understanding of what types of data you’ve collected (or will collect). Once you have this, you can then check which statistical methods would support your data types here.

If you haven’t collected your data yet, you can work in reverse and look at which statistical method would give you the most useful insights, and then design your data collection strategy to collect the correct data types.

Another important factor to consider is the shape of your data. Specifically, does it have a normal distribution (in other words, is it a bell-shaped curve, centred in the middle) or is it very skewed to the left or the right? Again, different statistical techniques work for different shapes of data – some are designed for symmetrical data while others are designed for skewed data.

This is another reminder of why descriptive statistics are so important – they tell you all about the shape of your data.

Factor 2 – Your research questions

The next thing you need to consider is your specific research questions, as well as your hypotheses (if you have some). The nature of your research questions and research hypotheses will heavily influence which statistical methods and techniques you should use.

If you’re just interested in understanding the attributes of your sample (as opposed to the entire population), then descriptive statistics are probably all you need. For example, this would be the case if you just want to assess the means (averages) and medians (centre points) of variables in a group of people.

On the other hand, if you aim to understand differences between groups or relationships between variables and to infer or predict outcomes in the population, then you’ll likely need both descriptive statistics and inferential statistics.

So, it’s really important to get very clear about your research aims and research questions, as well as your hypotheses – before you start looking at which statistical techniques to use.

Never shoehorn a specific statistical technique into your research just because you like it or have some experience with it. Your choice of methods must align with all the factors we’ve covered here.

Time to recap…

You’re still with me? That’s impressive. We’ve covered a lot of ground here, so let’s recap on the key points:

  • Quantitative data analysis is all about analysing number-based data (which includes categorical and numerical data) using various statistical techniques.
  • The two main branches of statistics are descriptive statistics and inferential statistics. Descriptives describe your sample, whereas inferentials make predictions about what you’ll find in the population.
  • Common descriptive statistical methods include mean (average), median, standard deviation and skewness.
  • Common inferential statistical methods include t-tests, ANOVA, correlation and regression analysis.
  • To choose the right statistical methods and techniques, you need to consider the type of data you’re working with, as well as your research questions and hypotheses.

Data Analysis Process: Key Steps and Techniques to Use

What is data analysis?

Businesses generate and store tons of data every day, but what happens to this data after it’s stored?

The short answer is that most of it sits in repositories and is almost never looked at again, which is quite counterintuitive.

The problem isn’t the lack of data available but the ambiguity in determining how exactly the data should be analyzed and used. To clear up any uncertainties, businesses should understand the data analysis process to make informed business decisions.

The data analysis process entails inspecting, cleansing, transforming, and modeling data. It discovers useful information, draws conclusions, and supports decision-making. This process also empowers organizations to predict trends and enhance operational efficiency.

Data can hold valuable insights into users, customer bases, and markets. When paired with analytics software, data can help businesses discover new product opportunities, marketing segments, industry verticals, and much more.

Now that you have a general overview of the data analysis process, it’s time to examine each step in more detail.

Data analysis techniques

Data analysts can use many data analysis techniques to extract meaningful information from raw data for real-world applications and computational purposes. Some of the notable data analysis techniques that aid a data analysis process are:

Exploratory data analysis

Exploratory data analysis is used to understand the messages within a dataset. This technique is iterative: the cleaned data is sorted and re-examined repeatedly to better understand what it means. Data visualization techniques, such as charting data in an Excel sheet or another graphical tool, and descriptive analysis techniques, such as calculating the mean or median, are examples of exploratory data analysis.

Using algorithms and models

Algorithms have become an integral part of today's data environment. They include mathematical calculations for data analysis. Mathematical formulas and models, such as correlation measures, help identify the relationships between data variables, although a correlation on its own does not establish causation.

Modeling techniques such as regression analysis analyze data by modeling how a change in one variable relates to a change in another. For example, regression can help determine whether a change in marketing (independent variable) explains a change in engagement (dependent variable). Such techniques are part of inferential statistics, the process of analyzing statistical data to draw conclusions about the relationship between different sets of data.

What are the 5 steps of the data analysis process?

The data analysis process is a collection of steps required to make sense of the available data. Identifying the critical stages is a no-brainer. However, each step is equally important to ensure that the data is analyzed correctly and provides valuable and actionable information.

Let's take a look at the five essential steps that make up a data analysis process flow.

Data analysis step 1: Define why you need data analysis

Before getting into the nitty-gritty of data analysis, a business must first define why it requires a well-founded process in the first place. The first step in a data analysis process is determining why you need data analysis. This need typically stems from a business problem or question, such as:

  • How can we reduce production costs without sacrificing quality?
  • What are some ways to increase sales opportunities with our current resources?
  • Do customers see our brand positively?

In addition to finding a purpose, consider which metrics to track along the way. Also, be sure to identify sources of data when it’s time to collect.

This process can be long and arduous, so building a roadmap will greatly prepare your data team for all the following steps.

Data analysis step 2: Collect data

After a purpose has been defined, it’s time to begin collecting the data needed for analysis. This step is important because the nature of the collected data sources determines how in-depth the analysis is.

Data collection starts with primary sources, also known as internal sources. This is typically structured data gathered from CRM software, ERP systems, marketing automation tools, and others. These sources contain information about customers, finances, gaps in sales, and more.

Then come secondary sources, also known as external sources. This is both structured and unstructured data that can be gathered from many places.

For example, if you’re looking to perform a sentiment analysis toward your brand, you could gather data from review sites or social media APIs. 

Data analysis step 3: Clean unnecessary data

Once data is collected from all the necessary sources, your data team will be tasked with cleaning and sorting through it. Data cleaning is extremely important during the data analysis process, simply because not all data is good data.

To generate accurate results, data scientists must identify and purge duplicate data, anomalous data, and other inconsistencies that could skew the analysis.
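
As a rough illustration of those two cleaning tasks, the sketch below removes exact duplicate records and flags anomalous values. The records are invented, `drop_duplicates` and `iqr_outliers` are hypothetical helper names, and the 1.5 × IQR rule is only one common convention for spotting outliers; the right rule depends on the data.

```python
import statistics


def drop_duplicates(records):
    """Remove exact duplicate records while preserving their original order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented order records; the third row repeats the first.
records = [
    {"order_id": 1, "amount": 10},
    {"order_id": 2, "amount": 12},
    {"order_id": 1, "amount": 10},
]
amounts = [10, 12, 11, 13, 12, 11, 95]

print(drop_duplicates(records))
print(iqr_outliers(amounts))
```

Flagged values still deserve a human look before they are purged, since an unusual number can be a genuine observation rather than an error.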

of a data scientist’s time is spent on data preparation and cleansing rather than generating insights.

Source: Anaconda

With advances in data science and machine learning platforms, more intelligent automation can save a data analyst’s valuable time while cleaning data.

Data analysis step 4: Analyze data

One of the last steps in the data analysis process is analyzing and manipulating the data, which can be done in various ways.

One way is through data mining, which is defined as “knowledge discovery within databases”. Data mining techniques like clustering analysis, anomaly detection, association rule mining, and others could unveil hidden patterns in data that weren’t previously visible.

There’s also business intelligence and data visualization software, both of which are optimized for decision-makers and business users. These options generate easy-to-understand reports, dashboards, scorecards, and charts.

Data scientists may also apply predictive analytics, one of the four types of data analytics used today (descriptive, diagnostic, predictive, and prescriptive). Predictive analysis looks ahead to the future, attempting to forecast what will likely happen next with a business problem or question.

What are the types of data analysis methods?

Data analysis methods can be broadly classified into the following categories:

  • Quantitative data analysis
  • Qualitative data analysis
  • Statistical analysis
  • Textual analysis
  • Descriptive analysis
  • Predictive analysis
  • Prescriptive analysis
  • Diagnostic analysis

Data analysis step 5: Interpret the results

The final step is interpreting the results from the data analysis. This part is essential because it’s how a business will gain actual value from the previous four steps.

Interpreting data analysis results should validate why you conducted it, even if it’s not 100 percent conclusive. For example, “options A and B can be explored and tested to reduce production costs without sacrificing quality.”

Analysts and business users should look to collaborate during this process. Also, when interpreting results, consider any challenges or limitations that may not be apparent in the data. This will only bolster your confidence in the next steps.

Why is data analysis so important?

From small businesses to global enterprises, the amount of data businesses generate today is simply staggering, and this is why the term “big data” has become so buzzwordy.

However, without proper data analysis, this mountain of data does little other than clog up cloud storage and databases. 

Learn more about data analytics and implement it to uncover valuable insights within your systems.

Devin Pickell

Devin is a former senior content specialist at G2. Prior to G2, he helped scale early-stage startups out of Chicago's booming tech scene. Outside of work, he enjoys watching his beloved Cubs, playing baseball, and gaming. (he/him/his)

What is Data Analysis? (Types, Methods, and Tools)

  • Couchbase Product Marketing December 17, 2023

Data analysis is the process of cleaning, transforming, and interpreting data to uncover insights, patterns, and trends. It plays a crucial role in decision making, problem solving, and driving innovation across various domains. 

In addition to further exploring the role data analysis plays, this blog post will discuss common data analysis techniques, delve into the distinction between quantitative and qualitative data, explore popular data analysis tools, and discuss the steps involved in the data analysis process.

By the end, you should have a deeper understanding of data analysis and its applications, empowering you to harness the power of data to make informed decisions and gain actionable insights.

Why is Data Analysis Important?

Data analysis is important across various domains and industries. It helps with:

  • Decision Making : Data analysis provides valuable insights that support informed decision making, enabling organizations to make data-driven choices for better outcomes.
  • Problem Solving : Data analysis helps identify and solve problems by uncovering root causes, detecting anomalies, and optimizing processes for increased efficiency.
  • Performance Evaluation : Data analysis allows organizations to evaluate performance, track progress, and measure success by analyzing key performance indicators (KPIs) and other relevant metrics.
  • Gathering Insights : Data analysis uncovers valuable insights that drive innovation, enabling businesses to develop new products, services, and strategies aligned with customer needs and market demand.
  • Risk Management : Data analysis helps mitigate risks by identifying risk factors and enabling proactive measures to minimize potential negative impacts.

By leveraging data analysis, organizations can gain a competitive advantage, improve operational efficiency, and make smarter decisions that positively impact the bottom line.

Quantitative vs. Qualitative Data

In data analysis, you’ll commonly encounter two types of data: quantitative and qualitative. Understanding the differences between these two types of data is essential for selecting appropriate analysis methods and drawing meaningful insights. Here’s an overview of quantitative and qualitative data:

Quantitative Data

Quantitative data is numerical and represents quantities or measurements. It’s typically collected through surveys, experiments, and direct measurements. This type of data is characterized by its ability to be counted, measured, and subjected to mathematical calculations. Examples of quantitative data include age, height, sales figures, test scores, and the number of website users.

Quantitative data has the following characteristics:

  • Numerical : Quantitative data is expressed in numerical values that can be analyzed and manipulated mathematically.
  • Objective : Quantitative data is objective and can be measured and verified independently of individual interpretations.
  • Statistical Analysis : Quantitative data lends itself well to statistical analysis. It allows for applying various statistical techniques, such as descriptive statistics, correlation analysis, regression analysis, and hypothesis testing.
  • Generalizability : Quantitative data often aims to generalize findings to a larger population. It allows for making predictions, estimating probabilities, and drawing statistical inferences.

Qualitative Data

Qualitative data, on the other hand, is non-numerical and is collected through interviews, observations, and open-ended survey questions. It focuses on capturing rich, descriptive, and subjective information to gain insights into people’s opinions, attitudes, experiences, and behaviors. Examples of qualitative data include interview transcripts, field notes, survey responses, and customer feedback.

Qualitative data has the following characteristics:

  • Descriptive : Qualitative data provides detailed descriptions, narratives, or interpretations of phenomena, often capturing context, emotions, and nuances.
  • Subjective : Qualitative data is subjective and influenced by the individuals’ perspectives, experiences, and interpretations.
  • Interpretive Analysis : Qualitative data requires interpretive techniques, such as thematic analysis, content analysis, and discourse analysis, to uncover themes, patterns, and underlying meanings.
  • Contextual Understanding : Qualitative data emphasizes understanding the social, cultural, and contextual factors that shape individuals’ experiences and behaviors.
  • Rich Insights : Qualitative data enables researchers to gain in-depth insights into complex phenomena and explore research questions in greater depth.

In summary, quantitative data represents numerical quantities and lends itself well to statistical analysis, while qualitative data provides rich, descriptive insights into subjective experiences and requires interpretive analysis techniques. Understanding the differences between quantitative and qualitative data is crucial for selecting appropriate analysis methods and drawing meaningful conclusions in research and data analysis.

Types of Data Analysis

Different types of data analysis techniques serve different purposes. In this section, we’ll explore four types of data analysis: descriptive, diagnostic, predictive, and prescriptive, and go over how you can use them.

Descriptive Analysis

Descriptive analysis involves summarizing and describing the main characteristics of a dataset. It focuses on gaining a comprehensive understanding of the data through measures such as central tendency (mean, median, mode), dispersion (variance, standard deviation), and graphical representations (histograms, bar charts). For example, in a retail business, descriptive analysis may involve analyzing sales data to identify average monthly sales, popular products, or sales distribution across different regions.
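
To illustrate the retail example, here’s a minimal sketch that summarizes a handful of invented sales rows. The data, the column order, and the `totals_by` helper are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Invented rows: (month, region, product, amount).
sales = [
    ("Jan", "North", "Widget", 120),
    ("Jan", "South", "Gadget", 80),
    ("Feb", "North", "Widget", 150),
    ("Feb", "South", "Widget", 90),
    ("Mar", "North", "Gadget", 60),
    ("Mar", "South", "Widget", 100),
]


def totals_by(rows, column):
    """Sum the amount (last field) for each distinct value in the given column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[column]] += row[3]
    return dict(totals)


monthly_totals = totals_by(sales, 0)
region_totals = totals_by(sales, 1)
product_totals = totals_by(sales, 2)

average_monthly_sales = mean(monthly_totals.values())
top_product = max(product_totals, key=product_totals.get)

print("Average monthly sales:", average_monthly_sales)
print("Sales by region:", region_totals)
print("Most popular product:", top_product)
```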

Diagnostic Analysis

Diagnostic analysis aims to understand the causes or factors influencing specific outcomes or events. It involves investigating relationships between variables and identifying patterns or anomalies in the data. Diagnostic analysis often uses regression analysis, correlation analysis, and hypothesis testing to uncover the underlying reasons behind observed phenomena. For example, in healthcare, diagnostic analysis could help determine factors contributing to patient readmissions and identify potential improvements in the care process.

Predictive Analysis

Predictive analysis focuses on making predictions or forecasts about future outcomes based on historical data. It utilizes statistical models, machine learning algorithms, and time series analysis to identify patterns and trends in the data. By applying predictive analysis, businesses can anticipate customer behavior, market trends, or demand for products and services. For example, an e-commerce company might use predictive analysis to forecast customer churn and take proactive measures to retain customers.

Prescriptive Analysis

Prescriptive analysis takes predictive analysis a step further by providing recommendations or optimal solutions based on the predicted outcomes. It combines historical and real-time data with optimization techniques, simulation models, and decision-making algorithms to suggest the best course of action. Prescriptive analysis helps organizations make data-driven decisions and optimize their strategies. For example, a logistics company can use prescriptive analysis to determine the most efficient delivery routes, considering factors like traffic conditions, fuel costs, and customer preferences.

In summary, data analysis plays a vital role in extracting insights and enabling informed decision making. Descriptive analysis helps understand the data, diagnostic analysis uncovers the underlying causes, predictive analysis forecasts future outcomes, and prescriptive analysis provides recommendations for optimal actions. These different data analysis techniques are valuable tools for businesses and organizations across various industries.

Data Analysis Methods

In addition to the data analysis types discussed earlier, you can use various methods to analyze data effectively. These methods provide a structured approach to extract insights, detect patterns, and derive meaningful conclusions from the available data. Here are some commonly used data analysis methods:

Statistical Analysis 

Statistical analysis involves applying statistical techniques to data to uncover patterns, relationships, and trends. It includes methods such as hypothesis testing, regression analysis, analysis of variance (ANOVA), and chi-square tests. Statistical analysis helps organizations understand the significance of relationships between variables and make inferences about the population based on sample data. For example, a market research company could conduct a survey to analyze the relationship between customer satisfaction and product price. They can use regression analysis to determine whether there is a statistically significant relationship between these variables.

Data Mining

Data mining refers to the process of discovering patterns and relationships in large datasets using techniques such as clustering, classification, association analysis, and anomaly detection. It involves exploring data to identify hidden patterns and gain valuable insights. For example, a telecommunications company could analyze customer call records to identify calling patterns and segment customers into groups based on their calling behavior. 
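
As a simplified sketch of the segmentation idea, the code below groups customers by a single invented metric (weekly call minutes) using a basic one-dimensional k-means loop. The `k_means_1d` helper, the data, and the starting centroids are assumptions for illustration; production work would typically rely on a dedicated library and more than one feature.

```python
def k_means_1d(values, centroids, iterations=10):
    """Basic k-means (Lloyd's algorithm) on 1-D data with given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Invented weekly call minutes per customer.
call_minutes = [2, 3, 4, 30, 32, 35]

centroids, segments = k_means_1d(call_minutes, centroids=[0, 50])
print("Segment centres:", centroids)
print("Segments:", segments)
```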

Text Mining

Text mining involves analyzing unstructured data, such as customer reviews, social media posts, or emails, to extract valuable information and insights. It utilizes techniques like natural language processing (NLP), sentiment analysis, and topic modeling to analyze and understand textual data. For example, consider how a hotel chain might analyze customer reviews from various online platforms to identify common themes and sentiment patterns to improve customer satisfaction.
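
The hotel review example can be sketched with a toy word-list approach. This is far simpler than real NLP tooling: the reviews, the `POSITIVE` and `NEGATIVE` word lists, and the scoring rule are all invented for illustration.

```python
import re
from collections import Counter

# Tiny hand-picked word lists; a real sentiment lexicon would be much larger.
POSITIVE = {"clean", "friendly", "great", "comfortable"}
NEGATIVE = {"dirty", "rude", "noisy", "slow"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Invented hotel reviews.
reviews = [
    "Great location and friendly staff",
    "The room was dirty and the staff were rude",
    "Comfortable bed but noisy street",
]

scores = [sentiment_score(review) for review in reviews]
word_counts = Counter(word for review in reviews for word in tokenize(review))

print("Sentiment scores:", scores)
print("Most common words:", word_counts.most_common(3))
```

Frequent words hint at common themes (here, "staff" appears in more than one review), while the scores give a crude positive or negative signal per review.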

Time Series Analysis

Time series analysis focuses on analyzing data collected over time to identify trends, seasonality, and patterns. It involves techniques such as forecasting, decomposition, and autocorrelation analysis to make predictions and understand the underlying patterns in the data.

For example, an energy company could analyze historical electricity consumption data to forecast future demand and optimize energy generation and distribution.
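
A simple moving average is one of the most basic ways to smooth a series and produce a naive forecast. The sketch below assumes invented daily consumption figures and a hypothetical `moving_average` helper; real demand forecasting would normally account for trend and seasonality as well.

```python
def moving_average(series, window):
    """Return the simple moving average of a series for the given window size."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and the length of the series")
    return [
        sum(series[i : i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Invented daily electricity consumption (MWh).
daily_demand = [100, 110, 120, 130, 140, 150]

smoothed = moving_average(daily_demand, window=3)
naive_forecast = smoothed[-1]  # use the latest average as tomorrow's estimate

print("Smoothed series:", smoothed)
print("Naive next-day forecast:", naive_forecast)
```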

Data Visualization

Data visualization is the graphical representation of data to communicate patterns, trends, and insights visually. It uses charts, graphs, maps, and other visual elements to present data in a visually appealing and easily understandable format. For example, a sales team might use a line chart to visualize monthly sales trends and identify seasonal patterns in their sales data.

These are just a few examples of the data analysis methods you can use. Your choice should depend on the nature of the data, the research question or problem, and the desired outcome.

How to Analyze Data

Analyzing data involves following a systematic approach to extract insights and derive meaningful conclusions. Here are some steps to guide you through the process of analyzing data effectively:

Define the Objective : Clearly define the purpose and objective of your data analysis. Identify the specific question or problem you want to address through analysis.

Prepare and Explore the Data : Gather the relevant data and ensure its quality. Clean and preprocess the data by handling missing values, duplicates, and formatting issues. Explore the data using descriptive statistics and visualizations to identify patterns, outliers, and relationships.

Apply Analysis Techniques : Choose the appropriate analysis techniques based on your data and research question. Apply statistical methods, machine learning algorithms, and other analytical tools to derive insights and answer your research question.

Interpret the Results : Analyze the output of your analysis and interpret the findings in the context of your objective. Identify significant patterns, trends, and relationships in the data. Consider the implications and practical relevance of the results.

Communicate and Take Action : Communicate your findings effectively to stakeholders or intended audiences. Present the results clearly and concisely, using visualizations and reports. Use the insights from the analysis to inform decision making.

Remember, data analysis is an iterative process, and you may need to revisit and refine your analysis as you progress. These steps provide a general framework to guide you through the data analysis process and help you derive meaningful insights from your data.

Data Analysis Tools

Data analysis tools are software applications and platforms designed to facilitate the process of analyzing and interpreting data. These tools provide a range of functionalities to handle data manipulation, visualization, statistical analysis, and machine learning. Here are some commonly used data analysis tools:

Spreadsheet Software

Tools like Microsoft Excel, Google Sheets, and Apple Numbers are used for basic data analysis tasks. They offer features for data entry, manipulation, basic statistical functions, and simple visualizations.

Business Intelligence (BI) Platforms

BI platforms like Microsoft Power BI, Tableau, and Looker integrate data from multiple sources, providing comprehensive views of business performance through interactive dashboards, reports, and ad hoc queries.

Programming Languages and Libraries

Programming languages like R and Python, along with their associated libraries (e.g., NumPy, SciPy, scikit-learn), offer extensive capabilities for data analysis. They provide flexibility, customizability, and access to a wide range of statistical and machine-learning algorithms.

Cloud-Based Analytics Platforms

Cloud-based platforms like Google Cloud Platform (BigQuery, Data Studio), Microsoft Azure (Azure Analytics, Power BI), and Amazon Web Services (AWS Analytics, QuickSight) provide scalable and collaborative environments for data storage, processing, and analysis. They have a wide range of analytical capabilities for handling large datasets.

Data Mining and Machine Learning Tools

Tools like RapidMiner, KNIME, and Weka automate the process of data preprocessing, feature selection, model training, and evaluation. They’re designed to extract insights and build predictive models from complex datasets.

Text Analytics Tools

Text analytics tools, such as Natural Language Processing (NLP) libraries in Python (NLTK, spaCy) or platforms like RapidMiner Text Mining Extension, enable the analysis of unstructured text data. They help extract information, sentiment, and themes from sources like customer reviews or social media.

Choosing the right data analysis tool depends on analysis complexity, dataset size, required functionalities, and user expertise. You might need to use a combination of tools to leverage their combined strengths and address specific analysis needs. 

By understanding the power of data analysis, you can leverage it to make informed decisions, identify opportunities for improvement, and drive innovation within your organization. Whether you’re working with quantitative data for statistical analysis or qualitative data for in-depth insights, it’s important to select the right analysis techniques and tools for your objectives.

To continue learning about data analysis, review the following resources:

  • What is Big Data Analytics?
  • Operational Analytics
  • JSON Analytics + Real-Time Insights
  • Database vs. Data Warehouse: Differences, Use Cases, Examples
  • Couchbase Capella Columnar Product Blog

Posted by Couchbase Product Marketing

The Essential Guide to Doing Your Research Project

Steps in quantitative analysis

Stepping your way through effective quantitative data analysis

Data management – This involves familiarizing yourself with appropriate software; systematically logging in and screening your data; entering the data into a program; and finally, ‘cleaning’ your data.

Understanding variable types – Different data types demand discrete treatment, so it is important to be able to distinguish variables by both cause and effect (dependent or independent) and their measurement scales (nominal, ordinal, interval, and ratio).

Run descriptive statistics –  These are used to summarize the basic features of a data set through measures of central tendency (mean, mode, and median), dispersion (range, quartiles, variance, and standard deviation), and distribution (skewness and kurtosis).
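As a rough illustration of these measures, here is a minimal Python sketch that uses only the standard library. The `describe` helper and the survey scores are hypothetical, and the skewness and kurtosis use the simple moment-based (population) form, which is only one of several conventions.

```python
import statistics

def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population standard deviation, used to standardize the moments below.
    pstdev = statistics.pstdev(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    # Moment-based skewness and excess kurtosis (no bias correction).
    skewness = sum((x - mean) ** 3 for x in values) / (n * pstdev ** 3)
    kurtosis = sum((x - mean) ** 4 for x in values) / (n * pstdev ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
        "variance": statistics.variance(values),  # sample variance
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": skewness,
        "excess_kurtosis": kurtosis,
    }

# Hypothetical survey scores on a 1-10 scale.
scores = [2, 4, 4, 5, 5, 5, 6, 6, 7, 9]
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Statistical packages often apply bias corrections to skewness and kurtosis, so their figures may differ slightly from this sketch on small samples.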

Run appropriate inferential statistics – This allows researchers to assess their ability to draw conclusions that extend beyond the immediate data. For example, whether a sample represents the population; whether there are differences between two or more groups; whether there are changes over time; or whether there is a relationship between two or more variables.

Make sure you are selecting the right statistical test – This relies on knowing the nature of your variables; their scale of measurement; their distribution shape; and the types of question you want to ask.

Look for statistical significance – This is generally captured through a ‘p-value’, which is the probability of seeing results at least as extreme as yours if chance alone were at work. The lower the p-value, the more confident researchers can be that findings are genuine and more than coincidence.
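One way to make the idea of a p-value concrete is a permutation test. The sketch below is an illustration only, not a recommendation of this test over others: it shuffles the group labels many times and counts how often chance alone produces a difference in means at least as large as the one observed. The function name, the two groups and their scores are all hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical satisfaction scores for two customer groups.
group_a = [7, 8, 9, 8, 7, 9, 8, 9]
group_b = [5, 6, 5, 7, 6, 5, 6, 4]
p_value = permutation_p_value(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small value here says that a gap this large would rarely arise from random relabelling. It does not by itself say how large or how important the difference is.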


Qualitative Data Analysis: Step-by-Step Guide (Manual vs. Automatic)

When we conduct qualitative research, need to explain changes in metrics, or want to understand people's opinions, we turn to qualitative data. Qualitative data is typically generated through:

  • Interview transcripts
  • Surveys with open-ended questions
  • Contact center transcripts
  • Texts and documents
  • Audio and video recordings
  • Observational notes

Compared to quantitative data, which captures structured information, qualitative data is unstructured and has more depth. It can answer our questions, help formulate hypotheses, and build understanding.

It's important to understand the differences between quantitative data and qualitative data. But unfortunately, analyzing qualitative data is difficult. While tools like Excel, Tableau and Power BI crunch and visualize quantitative data with ease, there are a limited number of mainstream tools for analyzing qualitative data. The majority of qualitative data analysis still happens manually.

That said, there are two new trends that are changing this. First, there are advances in natural language processing (NLP) which is focused on understanding human language. Second, there is an explosion of user-friendly software designed for both researchers and businesses. Both help automate the qualitative data analysis process.

In this post we want to teach you how to conduct a successful qualitative data analysis. There are two primary qualitative data analysis methods: manual and automatic. We’ll guide you through the steps to conduct a manual analysis, and look at the role that software solutions powered by NLP can play in automating this process.

More businesses are switching to fully-automated analysis of qualitative customer data because it is cheaper, faster, and just as accurate. Primarily, businesses purchase subscriptions to feedback analytics platforms so that they can understand customer pain points and sentiment.

Overwhelming quantity of feedback

We’ll take you through 5 steps to conduct a successful qualitative data analysis. Within each step we will highlight the key differences between the manual and automated approaches. Here's an overview of the steps:

The 5 steps to doing qualitative data analysis

  • Gathering and collecting your qualitative data
  • Connecting and organizing your qualitative data
  • Coding your qualitative data
  • Analyzing the qualitative data for insights
  • Reporting on the insights derived from your analysis

What is Qualitative Data Analysis?

Qualitative data analysis is a process of gathering, structuring and interpreting qualitative data to understand what it represents.

Qualitative data is non-numerical and unstructured. Qualitative data generally refers to text, such as open-ended responses to survey questions or user interviews, but also includes audio, photos and video.

Businesses often perform qualitative data analysis on customer feedback. And within this context, qualitative data generally refers to verbatim text data collected from sources such as reviews, complaints, chat messages, support centre interactions, customer interviews, case notes or social media comments.

How is qualitative data analysis different from quantitative data analysis?

Understanding the differences between quantitative & qualitative data is important. When it comes to analyzing data, Qualitative Data Analysis serves a very different role to Quantitative Data Analysis. But what sets them apart?

Qualitative Data Analysis dives into the stories hidden in non-numerical data such as interviews, open-ended survey answers, or notes from observations. It uncovers the ‘whys’ and ‘hows’, giving a deep understanding of people’s experiences and emotions.

Quantitative Data Analysis, on the other hand, deals with numerical data, using statistics to measure differences, identify preferred options, and pinpoint root causes of issues. It steps back to address questions like "how many" or "what percentage" to offer broad insights we can apply to larger groups.

In short, Qualitative Data Analysis is like a microscope, helping us understand specific detail. Quantitative Data Analysis is like the telescope, giving us a broader perspective. Both are important, working together to decode data for different objectives.

Qualitative Data Analysis methods

Once all the data has been captured, there are a variety of analysis techniques available and the choice is determined by your specific research objectives and the kind of data you’ve gathered.  Common qualitative data analysis methods include:

Content Analysis

This is a popular approach to qualitative data analysis, and other qualitative analysis techniques may fit within its broad scope. Thematic analysis, for example, is a part of content analysis. Content analysis is used to identify the patterns that emerge from text by grouping content into words, concepts, and themes. It is useful for quantifying the relationships between the grouped content. The Columbia School of Public Health has a detailed breakdown of content analysis.

Narrative Analysis

Narrative analysis focuses on the stories people tell and the language they use to make sense of them.  It is particularly useful in qualitative research methods where customer stories are used to get a deep understanding of customers’ perspectives on a specific issue. A narrative analysis might enable us to summarize the outcomes of a focused case study.

Discourse Analysis

Discourse analysis is used to get a thorough understanding of the political, cultural and power dynamics that exist in specific situations.  The focus of discourse analysis here is on the way people express themselves in different social contexts. Discourse analysis is commonly used by brand strategists who hope to understand why a group of people feel the way they do about a brand or product.

Thematic Analysis

Thematic analysis is used to deduce the meaning behind the words people use. This is accomplished by discovering repeating themes in text. These meaningful themes reveal key insights into data and can be quantified, particularly when paired with sentiment analysis. Often, the outcome of thematic analysis is a code frame that captures themes in terms of codes, also called categories. So the process of thematic analysis is also referred to as “coding”. A common use-case for thematic analysis in companies is analysis of customer feedback.

Grounded Theory

Grounded theory is a useful approach when little is known about a subject. It starts by formulating a theory around a single data case, which means the theory is “grounded” in actual data rather than being entirely speculative. Additional cases can then be examined to see if they are relevant and can add to the original grounded theory.

Methods of qualitative data analysis; approaches and techniques to qualitative data analysis

Challenges of Qualitative Data Analysis

While Qualitative Data Analysis offers rich insights, it comes with its challenges. Each unique QDA method has its unique hurdles. Let’s take a look at the challenges researchers and analysts might face, depending on the chosen method.

  • Time and Effort (Narrative Analysis): Narrative analysis, which focuses on personal stories, demands patience. Sifting through lengthy narratives to find meaningful insights can be time-consuming and requires dedicated effort.
  • Being Objective (Grounded Theory): Grounded theory, building theories from data, faces the challenges of personal biases. Staying objective while interpreting data is crucial, ensuring conclusions are rooted in the data itself.
  • Complexity (Thematic Analysis): Thematic analysis involves identifying themes within data, a process that can be intricate. Categorizing and understanding themes can be complex, especially when each piece of data varies in context and structure. Thematic Analysis software can simplify this process.
  • Generalizing Findings (Narrative Analysis): Narrative analysis, dealing with individual stories, makes drawing broad conclusions challenging. Extending findings from a single narrative to a broader context requires careful consideration.
  • Managing Data (Thematic Analysis): Thematic analysis involves organizing and managing vast amounts of unstructured data, like interview transcripts. Managing this can be a hefty task, requiring effective data management strategies.
  • Skill Level (Grounded Theory): Grounded theory demands specific skills to build theories from the ground up. Finding or training analysts with these skills poses a challenge, requiring investment in building expertise.

Benefits of qualitative data analysis

Qualitative Data Analysis (QDA) is like a versatile toolkit, offering a tailored approach to understanding your data. The benefits it offers are as diverse as the methods. Let’s explore why choosing the right method matters.

  • Tailored Methods for Specific Needs: QDA isn't one-size-fits-all. Depending on your research objectives and the type of data at hand, different methods offer unique benefits. If you want emotive customer stories, narrative analysis paints a strong picture. When you want to explain a score, thematic analysis reveals insightful patterns.
  • Flexibility with Thematic Analysis: Thematic analysis is like a chameleon in the toolkit of QDA. It adapts well to different types of data and research objectives, making it a top choice for any qualitative analysis.
  • Deeper Understanding, Better Products: QDA helps you dive into people's thoughts and feelings. This deep understanding helps you build products and services that truly match what people want, ensuring satisfied customers.
  • Finding the Unexpected: Qualitative data often reveals surprises that we miss in quantitative data. QDA offers us new ideas and perspectives, for insights we might otherwise miss.
  • Building Effective Strategies: Insights from QDA are like strategic guides. They help businesses in crafting plans that match people’s desires.
  • Creating Genuine Connections: Understanding people’s experiences lets businesses connect on a real level. This genuine connection helps build trust and loyalty, priceless for any business.

How to do Qualitative Data Analysis: 5 steps

Now we are going to show how you can do your own qualitative data analysis. We will guide you through this process step by step. As mentioned earlier, you will learn how to do qualitative data analysis manually, and also automatically using modern qualitative data and thematic analysis software.

To get the best value from the analysis and research process, it’s important to be super clear about the nature and scope of the question that’s being researched. This will help you select the research collection channels that are most likely to help you answer your question.

Depending on if you are a business looking to understand customer sentiment, or an academic surveying a school, your approach to qualitative data analysis will be unique.

Once you’re clear, there’s a sequence to follow. And, though there are differences in the manual and automatic approaches, the process steps are mostly the same.

The use case for our step-by-step guide is a company looking to collect data (customer feedback data), and analyze the customer feedback - in order to improve customer experience. By analyzing the customer feedback the company derives insights about their business and their customers. You can follow these same steps regardless of the nature of your research. Let’s get started.

Step 1: Gather your qualitative data and conduct research

The first step of qualitative research is to do data collection. Put simply, data collection is gathering all of your data for analysis. A common situation is when qualitative data is spread across various sources.

Classic methods of gathering qualitative data

Most companies use traditional methods for gathering qualitative data: conducting interviews with research participants, running surveys, and running focus groups. This data is typically stored in documents, CRMs, databases and knowledge bases. It’s important to examine which data is available and needs to be included in your research project, based on its scope.

Using your existing qualitative feedback

As it becomes easier for customers to engage across a range of different channels, companies are gathering increasingly large amounts of both solicited and unsolicited qualitative feedback.

Most organizations have now invested in Voice of Customer programs and support ticketing systems, and they capture feedback through chatbot and support conversations, emails and even customer Slack chats.

These new channels provide companies with new ways of getting feedback, and also allow the collection of unstructured feedback data at scale.

The great thing about this data is that it contains a wealth of valuable insights and that it’s already there! When you have a new question about user behavior or your customers, you don’t need to create a new research study or set up a focus group. You can find most answers in the data you already have.

Typically, this data is stored in third-party solutions or a central database, but there are ways to export it or connect to a feedback analysis solution through integrations or an API.

Utilize untapped qualitative data channels

There are many online qualitative data sources you may not have considered. For example, you can find useful qualitative data in social media channels like Twitter or Facebook. Online forums, review sites, and online communities such as Discourse or Reddit also contain valuable data about your customers, or research questions.

If you are considering performing a qualitative benchmark analysis against competitors - the internet is your best friend, and review analysis is a great place to start. Gathering feedback in competitor reviews on sites like Trustpilot, G2, Capterra, Better Business Bureau or on app stores is a great way to perform a competitor benchmark analysis.

Customer feedback analysis software often has integrations into social media and review sites, or you could use a solution like DataMiner to scrape the reviews.

G2.com reviews of the product Airtable. You could pull reviews from G2 for your analysis.

Step 2: Connect & organize all your qualitative data

Now you have all this qualitative data, but there’s a problem: the data is unstructured. Before feedback can be analyzed and assigned any value, it needs to be organized in a single place. Why is this important? Consistency!

If all data is easily accessible in one place and analyzed in a consistent manner, you will have an easier time summarizing and making decisions based on this data.
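To illustrate what "one place, one consistent format" can mean in practice, here is a minimal Python sketch. It maps records from two sources into a single shared structure. The `FeedbackRecord` class, the source field names and the sample rows are all hypothetical; real exports from your survey or ticketing tools will have their own field names.

```python
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    """One piece of feedback in a shared format, whatever its source."""
    source: str
    customer_id: str
    text: str

def from_survey(row):
    # Hypothetical survey export with 'respondent' and 'comment' fields.
    return FeedbackRecord("survey", row["respondent"], row["comment"].strip())

def from_support_ticket(ticket):
    # Hypothetical ticket export with 'user_id' and 'body' fields.
    return FeedbackRecord("support", ticket["user_id"], ticket["body"].strip())

survey_rows = [{"respondent": "c-101", "comment": " Setup took too long. "}]
tickets = [{"user_id": "c-205", "body": "The export button does nothing."}]

# Everything ends up in one list with the same fields.
all_feedback = [from_survey(r) for r in survey_rows]
all_feedback += [from_support_ticket(t) for t in tickets]

for record in all_feedback:
    print(record.source, record.customer_id, record.text)
```

Keeping the source on every record means you can still split results by channel after the data has been merged.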

The manual approach to organizing your data

The classic method of structuring qualitative data is to plot all the raw data you’ve gathered into a spreadsheet.

Typically, research and support teams would share large Excel sheets and different business units would make sense of the qualitative feedback data on their own. Each team collects and organizes the data in a way that best suits them, which means the feedback tends to be kept in separate silos.

An alternative and a more robust solution is to store feedback in a central database, like Snowflake or Amazon Redshift.

Keep in mind that when you organize your data in this way, you are often preparing it to be imported into another software. If you go the route of a database, you would need to use an API to push the feedback into a third-party software.

Computer-assisted qualitative data analysis software (CAQDAS)

Traditionally within the manual analysis approach (but not always), qualitative data is imported into CAQDAS software for coding.

In the early 2000s, CAQDAS software was popularised by developers such as ATLAS.ti, NVivo and MAXQDA and eagerly adopted by researchers to assist with the organizing and coding of data.  

The benefits of using computer-assisted qualitative data analysis software:

  • Assists in the organizing of your data
  • Opens you up to exploring different interpretations of your data analysis
  • Allows you to share your dataset easier and allows group collaboration (allows for secondary analysis)

However, you still need to code the data, uncover the themes and do the analysis yourself. Therefore it is still a manual approach.

The user interface of CAQDAS software 'NVivo'

Organizing your qualitative data in a feedback repository

Another solution to organizing your qualitative data is to upload it into a feedback repository where it can be unified with your other data and made easily searchable and taggable. There are a number of software solutions that act as a central repository for your qualitative research data. Here are a couple of solutions that you could investigate:

  • Dovetail: Dovetail is a research repository with a focus on video and audio transcriptions. You can tag your transcriptions within the platform for theme analysis. You can also upload your other qualitative data such as research reports, survey responses, support conversations (conversational analytics), and customer interviews. Dovetail acts as a single, searchable repository and makes it easier to collaborate with other people around your qualitative research.
  • EnjoyHQ: EnjoyHQ is another research repository with similar functionality to Dovetail. It boasts a more sophisticated search engine, but it has a higher starting subscription cost.

Organizing your qualitative data in a feedback analytics platform

If you have a lot of qualitative customer or employee feedback, from the likes of customer surveys or employee surveys, you will benefit from a feedback analytics platform. A feedback analytics platform is a software that automates the process of both sentiment analysis and thematic analysis. Companies use the integrations offered by these platforms to directly tap into their qualitative data sources (review sites, social media, survey responses, etc.). The data collected is then organized and analyzed consistently within the platform.

If you have data prepared in a spreadsheet, it can also be imported into feedback analytics platforms.

Once all this rich data has been organized within the feedback analytics platform, it is ready to be coded and themed, within the same platform. Thematic is a feedback analytics platform that offers one of the largest libraries of integrations with qualitative data sources.

Some of the qualitative data integrations offered by Thematic

Step 3: Coding your qualitative data

Your feedback data is now organized in one place, whether within your spreadsheet, CAQDAS, feedback repository or feedback analytics platform. The next step is to code your feedback data so that you can extract meaningful insights in the next step.

Coding is the process of labelling and organizing your data in such a way that you can then identify themes in the data, and the relationships between these themes.

To simplify the coding process, you will take small samples of your customer feedback data, come up with a set of codes, or categories capturing themes, and label each piece of feedback, systematically, for patterns and meaning. Then you will take a larger sample of data, revising and refining the codes for greater accuracy and consistency as you go.

If you choose to use a feedback analytics platform, much of this process will be automated and accomplished for you.

The terms to describe different categories of meaning (‘theme’, ‘code’, ‘tag’, ‘category’ etc) can be confusing as they are often used interchangeably.  For clarity, this article will use the term ‘code’.

To code means to identify key words or phrases and assign them to a category of meaning. “I really hate the customer service of this computer software company” would be coded as “poor customer service”.
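A toy version of this idea can be sketched in Python. The snippet below assumes a small, predefined code frame (`CODE_RULES`, entirely hypothetical) in which a code is assigned only when the text mentions both a topic term and a cue word. Real coding relies on human judgment or far richer language models, so treat this purely as an illustration of the mechanics.

```python
# Hypothetical code frame: a code applies when the text mentions
# one of its topic terms together with one of its cue words.
CODE_RULES = {
    "poor customer service": {
        "topic": ["customer service", "support"],
        "cue": ["hate", "rude", "slow", "unhelpful"],
    },
    "high price": {
        "topic": ["price", "pricing", "cost"],
        "cue": ["expensive", "too high", "overpriced"],
    },
}

def assign_codes(text):
    """Return every code whose topic and cue terms both appear in the text."""
    lowered = text.lower()
    codes = []
    for code, rule in CODE_RULES.items():
        has_topic = any(term in lowered for term in rule["topic"])
        has_cue = any(term in lowered for term in rule["cue"])
        if has_topic and has_cue:
            codes.append(code)
    return codes

feedback = "I really hate the customer service of this computer software company"
print(assign_codes(feedback))
```

Simple substring matching like this misses sarcasm, negation and misspellings, which is one reason manual review or more capable software is still needed.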

How to manually code your qualitative data

  • Decide whether you will use deductive or inductive coding. Deductive coding is when you create a list of predefined codes, and then assign them to the qualitative data. Inductive coding is the opposite of this, you create codes based on the data itself. Codes arise directly from the data and you label them as you go. You need to weigh up the pros and cons of each coding method and select the most appropriate.
  • Read through the feedback data to get a broad sense of what it reveals. Now it’s time to start assigning your first set of codes to statements and sections of text.
  • Keep repeating the previous step, adding new codes and revising the code descriptions as often as necessary. Once it has all been coded, go through everything again, to be sure there are no inconsistencies and that nothing has been overlooked.
  • Create a code frame to group your codes. The coding frame is the organizational structure of all your codes. And there are two commonly used types of coding frames, flat, or hierarchical. A hierarchical code frame will make it easier for you to derive insights from your analysis.
  • Based on the number of times a particular code occurs, you can now see the common themes in your feedback data. This is insightful! If ‘bad customer service’ is a common code, it’s time to take action.
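The final counting step above can be sketched in a few lines of Python. The coded feedback below is hypothetical, and each code is counted at most once per piece of feedback.

```python
from collections import Counter

# Hypothetical feedback that has already been coded by hand.
coded_feedback = [
    {"text": "Nobody answered my ticket.", "codes": ["bad customer service"]},
    {"text": "Rude agent and a late parcel.", "codes": ["bad customer service", "slow delivery"]},
    {"text": "Too expensive for what it does.", "codes": ["pricing"]},
    {"text": "Support kept transferring me.", "codes": ["bad customer service"]},
    {"text": "Delivery took three weeks.", "codes": ["slow delivery"]},
]

code_counts = Counter()
for item in coded_feedback:
    code_counts.update(set(item["codes"]))  # count each code once per comment

total = len(coded_feedback)
for code, count in code_counts.most_common():
    print(f"{code}: {count} of {total} comments ({count / total:.0%})")
```

Reporting the share of comments alongside the raw count makes it easier to compare codes across datasets of different sizes.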

We have a detailed guide dedicated to manually coding your qualitative data.

Example of a hierarchical coding frame in qualitative data analysis

Using software to speed up manual coding of qualitative data

An Excel spreadsheet is still a popular method for coding. But various software solutions can help speed up this process. Here are some examples.

  • CAQDAS / NVivo - CAQDAS software has built-in functionality that allows you to code text within their software. You may find the interface the software offers easier for managing codes than a spreadsheet.
  • Dovetail/EnjoyHQ - You can tag transcripts and other textual data within these solutions. As they are also repositories you may find it simpler to keep the coding in one platform.
  • IBM SPSS - SPSS is a statistical analysis software that may make coding easier than in a spreadsheet.
  • Ascribe - Ascribe’s ‘Coder’ is a coding management system. Its user interface will make it easier for you to manage your codes.

Automating the qualitative coding process using thematic analysis software

In solutions which speed up the manual coding process, you still have to come up with valid codes and often apply codes manually to pieces of feedback. But there are also solutions that automate both the discovery and the application of codes.

Advances in machine learning have now made it possible to read, code and structure qualitative data automatically. This type of automated coding is offered by thematic analysis software .

Automation makes it far simpler and faster to code the feedback and group it into themes. By incorporating natural language processing (NLP) into the software, the AI looks across sentences and phrases to identify common themes and meaningful statements. Some automated solutions detect repeating patterns and assign codes to them; others make you train the AI by providing examples. You could say that the AI learns the meaning of the feedback on its own.

Thematic automates the coding of qualitative feedback regardless of source. There’s no need to set up themes or categories in advance. Simply upload your data and wait a few minutes. You can also manually edit the codes to further refine their accuracy. Experiments conducted indicate that Thematic’s automated coding is just as accurate as manual coding.
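To give a feel for what "detecting repeating patterns" can mean at its very simplest, here is a toy Python sketch that surfaces word pairs recurring across comments as candidate themes. This is an assumption-laden stand-in, not how Thematic or any other product works: the stop-word list, the `candidate_themes` function and the comments are all hypothetical.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real tools use much larger ones.
STOPWORDS = {"the", "a", "is", "was", "to", "and", "of", "i", "it", "my", "for", "very", "too"}

def candidate_themes(comments, min_count=2):
    """Return word pairs that appear in at least min_count comments."""
    counts = Counter()
    for comment in comments:
        words = [w for w in re.findall(r"[a-z']+", comment.lower()) if w not in STOPWORDS]
        bigrams = {" ".join(pair) for pair in zip(words, words[1:])}
        counts.update(bigrams)  # each pair counted once per comment
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

comments = [
    "The customer service was slow.",
    "Customer service is very rude.",
    "Love the mobile app, but customer service never replies.",
    "The mobile app crashes a lot.",
]
print(candidate_themes(comments))
```

Frequency alone cannot tell whether a phrase is meaningful or how people feel about it, which is where the NLP techniques described above come in.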

Paired with sentiment analysis and advanced text analytics - these automated solutions become powerful for deriving quality business or research insights.

You could also build your own, if you have the resources!

The key benefits of using an automated coding solution

Automated analysis can often be set up fast and there’s the potential to uncover things that would never have been revealed if you had given the software a prescribed list of themes to look for.

Because the model applies a consistent rule to the data, it captures phrases or statements that a human eye might have missed.

Complete and consistent analysis of customer feedback enables more meaningful findings. Leading us into step 4.

Step 4: Analyze your data: Find meaningful insights

Now we are going to analyze our data to find insights. This is where we start to answer our research questions. Keep in mind that step 4 and step 5 (tell the story) have some overlap. This is because creating visualizations is part of both the analysis process and reporting.

The task of uncovering insights is to scour through the codes that emerge from the data and draw meaningful correlations from them. It is also about making sure each insight is distinct and has enough data to support it.

Part of the analysis is to establish how much each code relates to different demographics and customer profiles, and identify whether there’s any relationship between these data points.

Manually create sub-codes to improve the quality of insights

If your code frame only has one level, you may find that your codes are too broad to be able to extract meaningful insights. This is where it is valuable to create sub-codes to your primary codes. This process is sometimes referred to as meta coding.

Note: If you take an inductive coding approach, you can create sub-codes as you are reading through your feedback data and coding it.

While time-consuming, this exercise will improve the quality of your analysis. Here is an example of what sub-codes could look like.

Example of sub-codes

You need to carefully read your qualitative data to create quality sub-codes. But as you can see, the depth of analysis is greatly improved. By calculating the frequency of these sub-codes you can get insight into which  customer service problems you can immediately address.
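As a sketch of how sub-code frequencies might be tallied, the Python snippet below stores each label as a (primary code, sub-code) pair and counts at both levels. The codes and sub-codes are hypothetical examples, not taken from a real code frame.

```python
from collections import Counter

# Hypothetical labels: one (primary code, sub-code) pair per piece of feedback.
labels = [
    ("customer service", "long wait times"),
    ("customer service", "rude staff"),
    ("customer service", "long wait times"),
    ("pricing", "hidden fees"),
]

sub_code_counts = Counter(labels)                    # detail level
primary_counts = Counter(code for code, _ in labels)  # rolled up to primary codes

for code, total in primary_counts.most_common():
    print(f"{code}: {total}")
    for (parent, sub_code), count in sub_code_counts.most_common():
        if parent == code:
            print(f"    {sub_code}: {count}")
```

Keeping the parent with every sub-code means the detailed counts always roll up cleanly to the primary codes.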

Correlate the frequency of codes to customer segments

Many businesses use customer segmentation. And you may have your own respondent segments that you can apply to your qualitative analysis. Segmentation is the practise of dividing customers or research respondents into subgroups.

Segments can be based on:

  • Demographic
  • And any other data type that you care to segment by

It is particularly useful to see the occurrence of codes within your segments. If one of your customer segments is considered unimportant to your business, but they are the cause of nearly all customer service complaints, it may be in your best interest to focus attention elsewhere. This is a useful insight!
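A simple way to see the occurrence of codes within segments is a cross-tabulation. The sketch below assumes each response carries a segment label and a list of codes; the segments, codes and responses are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical coded responses, each tagged with a customer segment.
responses = [
    {"segment": "enterprise", "codes": ["bad customer service"]},
    {"segment": "enterprise", "codes": ["pricing"]},
    {"segment": "enterprise", "codes": []},
    {"segment": "enterprise", "codes": ["easy to use"]},
    {"segment": "free tier", "codes": ["bad customer service"]},
    {"segment": "free tier", "codes": ["bad customer service", "pricing"]},
    {"segment": "free tier", "codes": ["bad customer service"]},
    {"segment": "free tier", "codes": []},
]

segment_totals = Counter()
codes_by_segment = defaultdict(Counter)
for response in responses:
    segment_totals[response["segment"]] += 1
    codes_by_segment[response["segment"]].update(set(response["codes"]))

def code_share(segment, code):
    """Share of a segment's responses that mention the given code."""
    return codes_by_segment[segment][code] / segment_totals[segment]

for segment in segment_totals:
    share = code_share(segment, "bad customer service")
    print(f"{segment}: {share:.0%} mention bad customer service")
```

Comparing shares rather than raw counts avoids being misled when one segment simply has more responses than another.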

Manually visualizing coded qualitative data

There are formulas you can use to visualize key insights in your data. The formulas we will suggest are especially useful if you are measuring a score alongside your feedback.

If you are collecting a metric alongside your qualitative data this is a key visualization. Impact answers the question: “What’s the impact of a code on my overall score?”. Using Net Promoter Score (NPS) as an example, first you need to:

  • Calculate overall NPS (A)
  • Calculate NPS in the subset of responses that do not contain that code (B)
  • Subtract B from A

Then you can use this simple formula to calculate code impact on NPS.

Visualizing qualitative data: Calculating the impact of a code on your score

You can then visualize this data using a bar chart.

You can download our CX toolkit - it includes a template to recreate this.

Trends over time

This analysis can help you answer questions like: “Which codes are linked to decreases or increases in my score over time?”

We need to compare two sequences of numbers: NPS over time and code frequency over time. Using Excel, calculate the correlation between the two sequences, which can be either positive (the more codes the higher the NPS, see picture below), or negative (the more codes the lower the NPS).

Now you need to plot code frequency against the absolute value of code correlation with NPS. Here is the formula:

Analyzing qualitative data: Calculate which codes are linked to increases or decreases in my score
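For readers who prefer code to a spreadsheet, the sketch below computes a Pearson correlation (the same measure as Excel's CORREL function) between NPS over time and each code's frequency over time. It then pairs each code's average frequency with the absolute value of that correlation, ready to plot. The monthly figures and code names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly NPS and share of responses mentioning each code.
nps_by_month = [30, 28, 25, 20, 18, 15]
code_share_by_month = {
    "bad customer service": [0.10, 0.12, 0.15, 0.20, 0.22, 0.25],
    "easy to use": [0.20, 0.19, 0.17, 0.15, 0.14, 0.12],
}

# One point per code: (average frequency, |correlation|, direction).
points = {}
for code, shares in code_share_by_month.items():
    r = pearson(shares, nps_by_month)
    average_share = sum(shares) / len(shares)
    direction = "negative" if r < 0 else "positive"
    points[code] = (average_share, abs(r), direction)
    print(f"{code}: frequency={average_share:.2f}, |r|={abs(r):.2f} ({direction})")
```

Codes that are both frequent and strongly correlated with the score are the ones most worth investigating, bearing in mind that correlation alone does not show that a code causes the score to change.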

The visualization could look like this:

Visualizing qualitative data trends over time

These are two examples, but there are more. For a third manual formula, and to learn why word clouds are not an insightful form of analysis, read our visualizations article.

Using a text analytics solution to automate analysis

Automated text analytics solutions enable codes and sub-codes to be pulled out of the data automatically. This makes it far faster and easier to identify what’s driving negative or positive results, pick up emerging trends and find all manner of rich insights in the data.

Another benefit of AI-driven text analytics software is its built-in capability for sentiment analysis, which provides the emotive context behind your feedback and other qualitative textual data.

Thematic provides text analytics that goes further by allowing users to apply their expertise on business context to edit or augment the AI-generated outputs.

Since the move away from manual research is generally about reducing the human element, adding human input to the technology might sound counter-intuitive. However, this is mostly to make sure important business nuances in the feedback aren’t missed during coding. The result is a higher accuracy of analysis. This is sometimes referred to as augmented intelligence.

Codes displayed by volume within Thematic. You can 'manage themes' to introduce human input.

Step 5: Report on your data: Tell the story

The last step of analyzing your qualitative data is to report on it, to tell the story. At this point, the codes are fully developed and the focus is on communicating the narrative to the audience.

A coherent outline of the qualitative research, the findings and the insights is vital for stakeholders to discuss and debate before they can devise a meaningful course of action.

Creating graphs and reporting in PowerPoint

Typically, qualitative researchers take the tried and tested approach of distilling their report into a series of charts, tables and other visuals which are woven into a narrative for presentation in PowerPoint.

Using visualization software for reporting

With data transformation and APIs, the analyzed data can be shared with data visualization software, such as Power BI, Tableau, Google Data Studio or Looker. Power BI and Tableau are among the most preferred options.

Visualizing your insights inside a feedback analytics platform

Feedback analytics platforms, like Thematic, incorporate visualization tools that intuitively turn key data and insights into graphs. This removes the time-consuming work of constructing charts to visually identify patterns and creates more time to focus on building a compelling narrative that highlights the insights, in bite-size chunks, for executive teams to review.

Using a feedback analytics platform with visualization tools means you don’t have to use a separate product for visualizations. You can export graphs into PowerPoint straight from the platform.

Two examples of qualitative data visualizations within Thematic

Conclusion - Manual or Automated?

There are those who remain deeply invested in the manual approach - because it’s familiar, because they’re reluctant to spend money and time learning new software, or because they’ve been burned by the overpromises of AI.  

For projects that involve small datasets, manual analysis makes sense, for example when the objective is simply to answer a simple question like “Do customers prefer X concepts to Y?”. If the findings are being extracted from a small set of focus groups and interviews, sometimes it’s easier to just read them.

However, as new generations come into the workplace, it’s technology-driven solutions that feel more comfortable and practical. And the merits are undeniable.  Especially if the objective is to go deeper and understand the ‘why’ behind customers’ preference for X or Y. And even more especially if time and money are considerations.

The ability to collect a free flow of qualitative feedback data at the same time as the metric means AI can cost-effectively scan, crunch, score and analyze a ton of feedback from one system in one go. And time-intensive processes like focus groups, or coding, that used to take weeks, can now be completed in a matter of hours or days.

But aside from the ever-present business case to speed things up and keep costs down, there are also powerful research imperatives for automated analysis of qualitative data: namely, accuracy and consistency.

Finding insights hidden in feedback requires consistency, especially in coding.  Not to mention catching all the ‘unknown unknowns’ that can skew research findings and steering clear of cognitive bias.

Some say that without manual data analysis researchers won’t get an accurate “feel” for the insights. However, the larger the data sets are, the harder it is to sort through and organize feedback that has been pulled from different places. And the more difficult it is to stay on course, the greater the risk of drawing incorrect or incomplete conclusions.

Though the process steps for qualitative data analysis have remained pretty much unchanged since psychologist Paul Felix Lazarsfeld paved the path a hundred years ago, the impact digital technology has had on types of qualitative feedback data and the approach to the analysis is profound.

If you want to try an automated feedback analysis solution on your own qualitative data, you can get started with Thematic.

The Ultimate Guide to Qualitative Research - Part 2: Handling Qualitative Data

Qualitative data analysis

Analyzing qualitative data is the next step after you have completed the use of qualitative data collection methods. The qualitative analysis process aims to identify themes and patterns that emerge across the data.

In simplified terms, qualitative research methods involve non-numerical data collection followed by an explanation based on the attributes of the data. For example, if you are asked to explain in qualitative terms a thermal image displayed in multiple colors, then you would explain the color differences rather than the heat's numerical value. If you have a large amount of data (e.g., of group discussions or observations of real-life situations), the next step is to transcribe and prepare the raw data for subsequent analysis.

Researchers can conduct studies fully based on qualitative methodology, or researchers can preface a quantitative research study with a qualitative study to identify issues that were not originally envisioned but are important to the study. Quantitative researchers may also collect and analyze qualitative data following their quantitative analyses to better understand the meanings behind their statistical results.

Conducting qualitative research can especially help build an understanding of how and why certain outcomes were achieved (in addition to what was achieved). For example, qualitative data analysis is often used for policy and program evaluation research since it can answer certain important questions more efficiently and effectively than quantitative approaches.

Qualitative data analysis can also answer important questions about the relevance, unintended effects, and impact of programs, such as:

  • Were expectations reasonable?
  • Did processes operate as expected?
  • Were key players able to carry out their duties?
  • Were there any unintended effects of the program?

The importance of qualitative data analysis

Qualitative approaches have the advantage of allowing for more diversity in responses and the capacity to adapt to new developments or issues during the research process itself. While qualitative data analysis can be demanding and time-consuming to conduct, many fields of research utilize qualitative software tools that have been specifically developed to provide more succinct, cost-efficient, and timely results.

Qualitative data analysis is an important part of research and building greater understanding across fields for a number of reasons. First, cases for qualitative data analysis can be selected purposefully according to whether they typify certain characteristics or contextual locations. In other words, qualitative data permits deep immersion into a topic, phenomenon, or area of interest. Rather than seeking generalizability to the population the sample of participants represent, qualitative research aims to construct an in-depth and nuanced understanding of the research topic.

Secondly, the role or position of the researcher in qualitative data analysis is given greater critical attention. This is because, in qualitative data analysis, the possibility of the researcher taking a ‘neutral' or transcendent position is seen as more problematic in practical and/or philosophical terms. Hence, qualitative researchers are often exhorted to reflect on their role in the research process and make this clear in the analysis.

Thirdly, while qualitative data analysis can take a wide variety of forms, it largely differs from quantitative research in the focus on language, signs, experiences, and meaning. In addition, qualitative approaches to analysis are often holistic and contextual rather than analyzing the data in a piecemeal fashion or removing the data from its context. Qualitative approaches thus allow researchers to explore inquiries from directions that could not be accessed with only numerical quantitative data.

Establishing research rigor

Systematic and transparent approaches to the analysis of qualitative data are essential for rigor. For example, many qualitative research methods require researchers to carefully code data and discern and document themes in a consistent and credible way.

Perhaps the most traditional division in the way qualitative and quantitative research have been used in the social sciences is for qualitative methods to be used for exploratory purposes (e.g., to generate new theory or propositions) or to explain puzzling quantitative results, while quantitative methods are used to test hypotheses.

After you’ve collected relevant data, what is the best way to look at your data? As always, it will depend on your research question. For instance, if you employed an observational research method to learn about a group’s shared practices, an ethnographic approach could be appropriate to explain the various dimensions of culture. If you collected textual data to understand how people talk about something, then a discourse analysis approach might help you generate key insights about language and communication.

The qualitative data coding process involves iterative categorization and recategorization, ensuring the evolution of the analysis to best represent the data. The procedure typically concludes with the interpretation of patterns and trends identified through the coding process.

To start off, let’s look at two broad approaches to data analysis.

Deductive analysis

Deductive analysis is guided by pre-existing theories or ideas. It starts with a theoretical framework, which is then used to code the data. The researcher can thus use this theoretical framework to interpret their data and answer their research question.

The key steps include coding the data based on the predetermined concepts or categories and using the theory to guide the interpretation of patterns among the codings. Deductive analysis is particularly useful when researchers aim to verify or extend an existing theory within a new context.
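
To make the idea of coding against predetermined categories concrete, here is a minimal Python sketch. It assumes a hypothetical codebook in which each code is represented by a few keywords; simple keyword matching is only a crude stand-in for a researcher's judgment, so treat it as an illustration of the workflow rather than a coding method in its own right.

```python
import re

# Hypothetical codebook derived from a theoretical framework:
# each predetermined code maps to keywords that signal it.
codebook = {
    "autonomy": ["choice", "decide", "independent"],
    "belonging": ["team", "community", "included"],
}

def apply_codebook(text, codebook):
    """Return the predetermined codes whose keywords appear in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(
        code for code, keywords in codebook.items() if words & set(keywords)
    )

quote = "I could decide my own schedule and felt included."
print(apply_codebook(quote, codebook))
```

In practice the researcher would review each suggested code against the framework before accepting it.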

Inductive analysis

Inductive analysis involves the generation of new theories or ideas based on the data. The process starts without any preconceived theories or codes, and patterns, themes, and categories emerge out of the data.

The researcher codes the data to capture any concepts or patterns that seem interesting or important to the research question. These codes are then compared and linked, leading to the formation of broader categories or themes. The main goal of inductive analysis is to allow the data to 'speak for itself' rather than imposing pre-existing expectations or ideas onto the data.

Deductive and inductive approaches can be seen as sitting on opposite poles, and all research falls somewhere within that spectrum. Most often, qualitative data analysis approaches blend both deductive and inductive elements to contribute to the existing conversation around a topic while remaining open to potential unexpected findings. To help you make informed decisions about which qualitative data analysis approach fits with your research objectives, let's look at some of the common approaches for qualitative data analysis.

Content analysis is a research method used to identify patterns and themes within qualitative data. This approach involves systematically coding and categorizing specific aspects of the content in the data to uncover trends and patterns. An often important part of content analysis is quantifying frequencies and patterns of words or characteristics present in the data.
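
As a small illustration of quantifying word frequencies, the Python sketch below counts words across a set of documents. The sample texts and the short stopword list are assumptions made for the example; a real project would use a fuller stopword list and its own data.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "was", "it", "i", "in"}

def word_frequencies(documents, stopwords=STOPWORDS):
    """Count how often each non-stopword occurs across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Hypothetical open-ended responses.
documents = [
    "The delivery was late and the box was damaged.",
    "Late delivery again. Support was helpful.",
]

freq = word_frequencies(documents)
print(freq.most_common(3))
```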

It is a highly flexible technique that can be adapted to various data types, including text, images, and audiovisual content. While content analysis can be exploratory in nature, it is also common to use pre-established theories and follow a more deductive approach to categorizing and quantifying the qualitative data.

Thematic analysis is a method used to identify, analyze, and report patterns or themes within the data. This approach moves beyond counting explicit words or phrases and focuses on also identifying implicit concepts and themes within the data.

Researchers conduct detailed coding of the data to ascertain repeated themes or patterns of meaning. Codes can be categorized into themes, and the researcher can analyze how the themes relate to one another. Thematic analysis is flexible in terms of the research framework, allowing for both inductive (data-driven) and deductive (theory-driven) approaches. The outcome is a rich, detailed, and complex account of the data.

Grounded theory is a systematic qualitative research methodology that is used to inductively generate theory that is 'grounded' in the data itself. Analysis takes place simultaneously with data collection, and researchers iterate between data collection and analysis until a comprehensive theory is developed.

Grounded theory is characterized by simultaneous data collection and analysis, the development of theoretical codes from the data, purposeful sampling of participants, and the constant comparison of data with emerging categories and concepts. The ultimate goal is to create a theoretical explanation that fits the data and answers the research question.

Discourse analysis is a qualitative research approach that emphasizes the role of language in social contexts. It involves examining communication and language use beyond the level of the sentence, considering larger units of language such as texts or conversations.

Discourse analysts typically investigate how social meanings and understandings are constructed in different contexts, emphasizing the connection between language and power. It can be applied to texts of all kinds, including interviews, documents, case studies, and social media posts.

Phenomenological research focuses on exploring how human beings make sense of an experience and delves into the essence of this experience. It strives to understand people's perceptions, perspectives, and understandings of a particular situation or phenomenon.

It involves in-depth engagement with participants, often through interviews or conversations, to explore their lived experiences. The goal is to derive detailed descriptions of the essence of the experience and to interpret what insights or implications this may bear on our understanding of this phenomenon.

Now that we've summarized the major approaches to data analysis, let's look at the broader process of research and data analysis. Suppose you need to do some research to find answers to any kind of research question, be it an academic inquiry, business problem, or policy decision. In that case, you need to collect some data. There are many methods of collecting data: you can collect primary data yourself by conducting interviews, focus groups, or a survey, for instance. Another option is to use secondary data sources. These are data previously collected for other projects, historical records, reports, statistics – basically everything that exists already and can be relevant to your research.

The data you collect should always be a good fit for your research question. For example, if you are interested in how many people in your target population like your brand compared to others, it is no use to conduct interviews or a few focus groups. The sample will be too small to get a representative picture of the population. If your questions are about "how many…", "what is the spread…" etc., you need to conduct quantitative research. If you are interested in why people like different brands, their motives, and their experiences, then conducting qualitative research can provide you with the answers you are looking for.

Let's describe the important steps involved in conducting research.

Step 1: Planning the research

As the saying goes: "Garbage in, garbage out." You do not want to find out after you have collected data that:

  • you talked to the wrong people
  • you asked the wrong questions
  • a couple of focus group sessions would have yielded better results because of the group interaction, or
  • a survey including a few open-ended questions sent to a larger group of people would have been sufficient and required less effort.

Think thoroughly about sampling, the questions you will be asking, and in which form. If you conduct a focus group or an interview, you are the research instrument, and your data collection will only be as good as you are. If you have never done it before, seek some training and practice. If you have other people do it, make sure they have the skills.

Step 2: Preparing the data

When you conduct focus groups or interviews, think about how to transcribe them. Do you want to run them online or offline? If online, check out which tools can serve your needs, both in terms of functionality and cost. For any audio or video recordings, you can consider using automatic transcription software or services. Automatically generated transcripts can save you time and money, but they still need to be checked. If you don't do this yourself, make sure that you instruct the person doing it on how to prepare the data.

  • How should the final transcript be formatted for later analysis?
  • Which names and locations should be anonymized?
  • What kind of speaker IDs to use?
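
For the anonymization question in particular, a small script can apply agreed replacements consistently across transcripts. The Python sketch below is one possible approach; the names, the city and the placeholder scheme (`P01`, `[CITY]`) are hypothetical, and the output should still be checked by a person.

```python
import re

def anonymize(transcript, replacements):
    """Replace each listed name or location with its placeholder."""
    # Handle longer names first so "Maria Lopez" is replaced before "Maria".
    for name in sorted(replacements, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        placeholder = replacements[name]
        # A function replacement avoids any special handling of backslashes.
        transcript = re.sub(pattern, lambda m, p=placeholder: p, transcript)
    return transcript

# Hypothetical mapping agreed on before transcription starts.
replacements = {"Maria Lopez": "P01", "Maria": "P01", "Springfield": "[CITY]"}

raw = "Maria Lopez: I moved to Springfield. Maria laughed."
clean = anonymize(raw, replacements)
print(clean)
```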

What about survey data? Some survey data programs will immediately provide basic descriptive-level analysis of the responses. ATLAS.ti will support you with the analysis of the open-ended questions. For this, you need to export your data as an Excel file. ATLAS.ti's survey import wizard will guide you through the process.

Other kinds of data such as images, videos, audio recordings, text, and more can be imported to ATLAS.ti. You can organize all your data into groups and write comments on each source of data to maintain a systematic organization and documentation of your data.

Step 3: Exploratory data analysis

You can run a few simple exploratory analyses to get to know your data. For instance, you can create a word list or word cloud of all your text data or compare and contrast the words in different documents. You can also let ATLAS.ti find relevant concepts for you. There are many tools available that can automatically code your text data, so you can also use these codings to explore your data and refine your coding.

For instance, you can get a feeling for the sentiments expressed in the data. Who is more optimistic, pessimistic, or neutral in their responses? ATLAS.ti can auto-code the positive, negative, and neutral sentiments in your data. Naturally, you can also simply browse through your data and highlight relevant segments that catch your attention or attach codes to begin condensing the data.

Step 4: Build a code system

Whether you start with auto-coding or manual coding, after having generated some first codes, you need to get some order in your code system to develop a cohesive understanding. You can build your code system by sorting codes into groups and creating categories and subcodes. As this process requires reading and re-reading your data, you will become very familiar with your data. A tool like ATLAS.ti qualitative data analysis software will support you in the process and make it easier to review your data, modify codings if necessary, change code labels, and write operational definitions to explain what each code means.

Step 5: Query your coded data and write up the analysis

Once you have coded your data, it is time to take the analysis a step further. When using software for qualitative data analysis, it is easy to compare and contrast subsets in your data, such as groups of participants or sets of themes.

For instance, you can query the various opinions of female vs. male respondents. Is there a difference between consumers from rural or urban areas or among different age groups or educational levels? Which codes occur together throughout the data set? Are there relationships between various concepts, and if so, why?
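
The question of which codes occur together can be illustrated with a simple co-occurrence count. The Python sketch below is a generic illustration with made-up codes; it is not how ATLAS.ti or any other package implements its query tools.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(coded_segments):
    """Count how often each pair of codes is applied to the same segment."""
    pairs = Counter()
    for codes in coded_segments:
        # sorted(set(...)) gives each pair one canonical, duplicate-free form
        for a, b in combinations(sorted(set(codes)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical coded segments: the list of codes attached to each quotation.
coded_segments = [
    ["price", "quality"],
    ["price", "delivery", "quality"],
    ["delivery"],
    ["quality", "price"],
]

pairs = cooccurrence(coded_segments)
for pair, count in pairs.most_common():
    print(pair, count)
```

Pairs with high counts are candidates for a closer look at why the concepts appear together.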

Step 6: Data visualization

Data visualization brings your data to life. It is a powerful way of seeing patterns and relationships in your data. For instance, diagrams allow you to see how your codes are distributed across documents or specific subpopulations in your data.

Exploring coded data on a canvas, moving around code labels in a virtual space, linking codes and other elements of your data set, and thinking about how they are related and why – all of these will advance your analysis and spur further insights. Visuals are also great for communicating results to others.

Step 7: Data presentation

The final step is to summarize the analysis in a written report. You can now put together the memos you have written about the various topics, select some salient quotes that illustrate your writing, and add visuals such as tables and diagrams. If you follow the steps above, you will already have all the building blocks, and you just have to put them together in a report or presentation.

When preparing a report or a presentation, keep your audience in mind. Does your audience better understand numbers than long sections of detailed interpretations? If so, add more tables, charts, and short supportive data quotes to your report or presentation. If your audience loves a good interpretation, add your full-length memos and walk your audience through your conceptual networks and illustrative data quotes.

Making gender data insights accessible: Five steps to develop an effective gender factbook

Anna Tabitha Bonfert, Heather Moylan, Miriam Muller

The World Development Report 2021: Data for Better Lives called for using existing data more effectively to improve development outcomes. In line with this commitment, the recently launched World Bank Group Gender Strategy 2024 – 2030 calls for greater action to use data and evidence to promote solutions for gender equality. Data communication is an essential part of this process, although it is often overlooked. Effective communication improves accessibility to data insights among people who are not data experts, including policymakers, civil society advocates, media representatives, and the general public. To disseminate gender data, many countries use gender factbooks, which often cover a wide range of topics, from health and education, to work and employment, and women’s representation in political leadership.  To date, more than 120 countries have produced at least one (see map below), though they vary widely when it comes to visual appeal, readability, and timeliness.

map visualization

To support widespread access to and use of existing data, the World Bank developed the Strengthening Gender Statistics (SGS) project, a partnership between the Gender Group, the Poverty and Equity Global Practice, and the Living Standards Measurement Study (LSMS), the World Bank’s flagship household survey program. The objective of the SGS is to work with national statistical offices (NSOs) in 12 partner countries to improve the availability, quality, and use of gender data in the economic domain; and to share learnings, promote global engagement and plan dissemination events. To date, it has supported 22 countries. To help NSOs in the production of gender factbooks, the SGS team developed a guidance note that provides a roadmap for countries working on their first factbook and insights for those that already have one. The recommendations have been organized into five sections:

1. Planning, planning, planning:  Developing a gender factbook is often a months-long process involving coordination across several government institutions, so planning is critical.

  • Identify relevant stakeholders, especially the main producers and users of gender data. 
  • Make sure the core production team includes experts in statistics, data visualization and writing analyses. 
  • Consider the length of time needed to obtain data and calculate indicators.
  • Develop a dissemination and communication strategy for the factbook early in the production process.   

2. Getting the right people involved: Stakeholder engagement matters for uptake, so get them involved early and often. This offers an excellent opportunity to gather input and build ownership. 

  • Engage producers and users of gender data to agree on the themes and validate the list of indicators, from national and international frameworks, to be considered for the factbook. 
  • Agree on the type of data sources that should be included and map data availability for the selected indicators. 
  • Share the final list of proposed indicators with stakeholders for their approval. 

Read more about Cameroon’s recent experience in bringing the right people together to develop a gender factbook here.

3. Turning data into insights: The most powerful gender factbooks offer data insights, not only data.

  • Assemble selected indicators from existing publications or internal files. They can also be calculated from original data.
  • Ensure that the methodology used to produce indicators follows international standards (for example, as outlined in the metadata for the UN Minimum Set of Gender Indicators), and also that documentation is clear and comprehensive.
  • Gather information on relevant legal and policy frameworks to provide context for the indicator’s data. 
  • Analyze the data to identify interesting patterns, for example, trends over time or significant differences across subgroups of the population. 
  • Discuss these patterns with stakeholders to determine key messages to highlight in the factbook.  
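
As a simple illustration of the analysis step, the Python sketch below computes a gender gap in percentage points for one indicator and how that gap changes between two years. The rates are hypothetical placeholders, not figures from any survey, and the indicator choice is only an example.

```python
def gender_gap(female_rate, male_rate):
    """Gap in percentage points (female rate minus male rate)."""
    return female_rate - male_rate

# Hypothetical unemployment rates in percent. These are NOT real survey figures.
rates = {
    2018: {"female": 12.0, "male": 8.0},
    2022: {"female": 11.0, "male": 7.5},
}

gaps = {year: gender_gap(r["female"], r["male"]) for year, r in rates.items()}
change = gaps[2022] - gaps[2018]

for year, gap in gaps.items():
    print(f"{year}: gap of {gap:.1f} percentage points")
print(f"Change in gap: {change:+.1f} percentage points")
```

Whether a difference like this is statistically meaningful depends on the survey design and sample sizes, which this sketch does not address.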

4. Making data insights accessible through visuals and text: Effective data visualization requires presenting insights in a way that is easy to understand, which improves uptake. Bear in mind that readers of a gender factbook will bring different levels of data literacy, and key audiences, such as policymakers, may not have much time to figure out how to interpret the data.

  • Present data in a variety of ways and use formats that are easier to understand, like charts, maps, pictograms, or streamlined tables. 
  • Choose visualizations that suit the type of data you have and the insights you want to highlight. It is important to follow best practices for visualizing gender data.
  • Use text alongside the visualizations, as well as tables to contextualize, clarify, and explain the insights presented.

For examples of best practices, explore the collection of gender factbooks linked in the interactive map above.  

5. Disseminate, disseminate, disseminate: Effective dissemination puts the data insights you’ve worked so hard to develop in the hands of people who can use them to maximize reach and impact.

  • Organize a launch event to share insights in a timely manner and strengthen cross-institutional collaboration.
  • Develop tailored materials that package messages for different audiences, including fact sheets that summarize sector-specific insights for line ministries, or infographics that communicate high-level takeaways for the general public.
  • Spread the word about the launch event strategically and broadly, through newsletters or multimedia materials, and encourage participation of relevant decision-makers and wider audiences.

To see dissemination in action, watch a short video by Malibooknews about a recent workshop in Mali that highlighted gender data insights from the country’s annual household survey, the Enquête Modulaire et Permanente Auprès des Ménages 2022. By supporting data accessibility, gender factbooks provide data on gender gaps to inform policymaking and track progress over time. For example, a gender factbook can show that large gender gaps remain in unemployment despite a gap reduction in educational attainment. Factbook dissemination activities are planned in different countries for the coming months, so it will be important to document learnings on the most effective ways of using this data to influence policy.


Sentiment Analysis with Ticker News API Insights

Sep 6, 2024

In this tutorial, we explore the Ticker News API, enhanced through our internal research and the insights from our recently published paper on sentiment analysis using Large Language Models (LLMs). This API now captures structured data from unstructured financial news, enabling precise sentiment analysis tagging directly tied to specific tickers.

We'll demonstrate the practical application of these capabilities by examining sentiment trends for tickers such as CRWD (CrowdStrike) and NVDA (NVIDIA), showing you firsthand how the upgraded API functions as a sophisticated tool for market analysis.

Research Background

The research from Polygon.io introduces a groundbreaking approach using Large Language Models (LLMs) to extract structured data from unstructured financial news. By combining the generative capabilities of LLMs with advanced prompting techniques and a robust validation framework, the system can accurately identify relevant company tickers and perform sentiment analysis at the company level from raw news content.

  • Ticker Identification Accuracy: The new approach demonstrated a significant improvement in accurately identifying company tickers directly from news content. It achieved a 90% accuracy rate in capturing all tickers that matched those identified by publishers, and it identified additional relevant tickers in 22% of articles.
  • Granular Sentiment Analysis: The methodology is unique in providing granular, per-company sentiment analysis extracted from news articles. This allows for a more nuanced understanding of the impact of news on specific companies.
  • Real-time API Updates: The insights extracted are made available through a live API, which is updated in real time with the latest news, enabling timely and relevant financial analysis.
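To make the two ticker-identification figures more concrete, here is a minimal sketch of how such rates could be computed from per-article ticker sets. It is an illustration only: the function name `ticker_coverage`, the metric names, and the sample data are hypothetical, and the exact definitions used in the research may differ (for instance, judging whether an extra ticker is truly "relevant" would need human review, which this sketch does not attempt).

```python
def ticker_coverage(pairs):
    """Summarize how extracted tickers compare with publisher-supplied tickers.

    pairs: iterable of (publisher_tickers, extracted_tickers), one per article.
    Returns the share of articles where every publisher ticker was captured,
    and the share where the extraction found at least one extra ticker.
    """
    pairs = [(set(pub), set(ext)) for pub, ext in pairs]
    total = len(pairs)
    if total == 0:
        return {"full_match_rate": 0.0, "extra_ticker_rate": 0.0}

    # An article counts as a full match when the publisher set is a subset
    # of the extracted set.
    full_matches = sum(1 for pub, ext in pairs if pub <= ext)
    # An article counts as "extra" when the extraction contains any ticker
    # the publisher did not list.
    with_extras = sum(1 for pub, ext in pairs if ext - pub)

    return {
        "full_match_rate": full_matches / total,
        "extra_ticker_rate": with_extras / total,
    }


# Made-up sample: four articles with publisher vs. extracted tickers.
sample_pairs = [
    ({"CRWD"}, {"CRWD"}),
    ({"NVDA"}, {"NVDA", "AMD"}),
    ({"CRWD", "MSFT"}, {"CRWD"}),
    ({"NVDA"}, {"NVDA"}),
]

coverage = ticker_coverage(sample_pairs)
print(coverage)
```

On this toy sample, three of four articles capture every publisher ticker and one of four contains an extra ticker.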

The evaluation showed high effectiveness, with the new methodology surpassing traditional data providers in both coverage and relevance, providing a granular, real-time analysis via an accessible API.

Exploring the Ticker News API

The Ticker News API has long been a cornerstone for accessing financial news associated with specific tickers. However, we have now enhanced this API by integrating insights that provide structured data from this unstructured news content. This feature adds valuable layers of sentiment analysis tagging, ticker identification, and summarization directly into the news data, enriching the information available to investors and analysts.

Before you start, make sure to obtain an API key by signing up at Polygon.io. In the following example, you can retrieve news articles for a specific ticker using the Polygon.io Python client library:

The response from the Ticker News API now includes a detailed insights object that offers a deeper understanding of the sentiment and context surrounding the news articles. Here's an example of a TickerNews object that showcases these enhancements:

This updated API functionality not only provides traditional news retrieval but now offers enriched data that can significantly enhance financial analysis and decision-making processes.

Visualizing Ticker News Insights

Now, let's dive into a hands-on example of actually using the Ticker News API. The script we'll explore is designed to systematically analyze market sentiments by focusing on specific tickers, such as CRWD (CrowdStrike) and NVDA (NVIDIA). Here’s a breakdown of the workflow:

The script begins by fetching news with insights for a specific ticker using the Polygon.io Python client, spanning a chosen date range. It then organizes this data using the pandas library to make it ready for analysis. Finally, it uses matplotlib to graphically display the sentiment trends over time, helping to visually identify how different news events affect stock behavior. This simple process allows for easy examination of market sentiments related to news articles.
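The middle step of this workflow, turning a list of news articles into per-day sentiment counts, can be sketched with the standard library alone. This is not the original script: it swaps pandas for `collections.Counter` so it runs without extra dependencies, the function name `daily_sentiment_counts` is hypothetical, and the sample records are made up. The field names (`published_utc`, `insights`, `ticker`, `sentiment`) are assumed to mirror the shape of the API response.

```python
from collections import Counter, defaultdict


def daily_sentiment_counts(articles, ticker):
    """Count sentiment labels per calendar day for one ticker.

    articles: iterable of dicts, each assumed to carry an ISO 8601
    "published_utc" timestamp and an optional "insights" list whose
    items have "ticker" and "sentiment" keys.
    """
    counts = defaultdict(Counter)
    for article in articles:
        day = article["published_utc"][:10]  # keep the YYYY-MM-DD prefix
        # "insights" may be missing or None for some articles.
        for insight in article.get("insights") or []:
            if insight.get("ticker") == ticker:
                counts[day][insight["sentiment"]] += 1
    # Return plain dicts sorted by day so the result is easy to plot.
    return {day: dict(counter) for day, counter in sorted(counts.items())}


# Made-up records for illustration; these are not real API output.
sample_articles = [
    {"published_utc": "2024-01-02T10:00:00Z",
     "insights": [{"ticker": "CRWD", "sentiment": "negative"},
                  {"ticker": "MSFT", "sentiment": "neutral"}]},
    {"published_utc": "2024-01-02T15:30:00Z",
     "insights": [{"ticker": "CRWD", "sentiment": "negative"}]},
    {"published_utc": "2024-01-03T09:00:00Z",
     "insights": [{"ticker": "CRWD", "sentiment": "positive"}]},
    {"published_utc": "2024-01-04T09:00:00Z", "insights": None},
]

crwd_counts = daily_sentiment_counts(sample_articles, "CRWD")
print(crwd_counts)
```

The resulting mapping of day to sentiment counts can be handed to pandas or matplotlib for plotting, one line per sentiment label.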

I've executed the script twice, selecting both CRWD (CrowdStrike) and NVDA (NVIDIA) as our focus tickers. The results showcased distinct sentiment trajectories for each company, reflecting the different news events they each experienced.

For CrowdStrike, the graph displays a significant spike in negative sentiment following the global outage caused by a software update. This visual highlights the immediate and substantial increase in negative press, reflecting the severity of the impact across multiple industries.

In contrast, the graph for NVIDIA illustrates a sustained increase in positive sentiment over an extended period, coinciding with the company's advancements in AI technology. This trend underscores the favorable media coverage that NVIDIA has received, bolstered by its pivotal role in driving AI innovations.

These visualizations underscore the distinct sentiment dynamics that can emerge from different industry events, providing a factual, data-driven basis for strategic decision-making in financial markets.

In this tutorial, we explored the enhanced Ticker News API, now with insights, that offers advanced sentiment tagging capabilities. We also explored how sentiment analysis works using the CRWD and NVDA tickers, illustrating the API's potential to reveal trends and impacts from news coverage.

Happy exploring!

UCD–CE Integration: A Hybrid Approach to Reinforcing User Involvement in Systems Requirements Elicitation and Analysis Tasks

  • Published: 05 September 2024

  • Christine Kalumera Akello (1), ORCID: orcid.org/0000-0001-6183-0698
  • Josephine Nabukenya (2), ORCID: orcid.org/0000-0002-4731-2496

Requirements elicitation and analysis tasks in user-centered design (UCD) are pivotal for assessing digital systems’ quality and costs. However, these tasks often face challenges due to limited user involvement. This stems from unclear guidelines on how to conduct activities and engage users effectively to achieve their goals during the development process. This study explored how integrating collaboration engineering (CE) principles with the UCD approach could address these challenges. Using an Applied Science / Engineering approach, a UCD-CE process was designed drawing on the Six-layer model of Collaboration. This model aligns the CE steps with UCD principles (why), practices (what), and methods (how). Data collection tools included structured interviews, questionnaires, and observations, supported by techniques like user stories and dialogues, as well as thinkLets and patterns of collaboration. Formative and summative evaluations were used to validate the UCD-CE process, and the results underscore its strengths, particularly its efficiency in helping users to complete tasks on time, reducing effort in reaching common goals, fostering high user satisfaction, promoting creativity and productivity, ensuring ease of use and learnability, and delivering comprehensive outcomes in requirements elicitation and analysis tasks during the development process. Future research aims to assess the practicality of UCD-CE integration in reinforcing user involvement during the UCD design phase.


We would like to acknowledge the study participants and the colleagues who provided technical feedback and proofread this manuscript.

Department of Computer Science, Gulu University, Gulu, Uganda

Christine Kalumera Akello

School of Computing and Informatics Technology, Makerere University, Kampala, Uganda

Josephine Nabukenya


All authors have made substantial contributions to the content and writing of the paper.

Corresponding author

Correspondence to Josephine Nabukenya.

Ethics declarations

Conflict of interest.

The authors report no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

GNSS Time Series Analysis with Machine Learning Algorithms: A Case Study for Anatolia

1. Introduction
2.1. Data Acquiring
2.2. Evaluation of GNSS Data
3. Methodology
3.1. Data Segmentation and Feature Extraction
3.2. Accuracy and Evaluation Metrics
4.1. Preprocessing of the Time Series
4.2. Residual Analysis
5. Discussion
6. Conclusions
Supplementary Materials
Author Contributions
Data Availability Statement
Acknowledgments
Conflicts of Interest

Abbreviations

GNSS: Global Navigational Satellite System
TUSAGA: Turkish National Continuous GNSS Network
RCMT: Regional Centroid Moment Tensor
Mw: Moment Magnitude
CME: Common Mode Error
ML: Machine Learning
LSTM: Long Short-Term Memory
NAF: North Anatolian Fault
EAF: East Anatolian Fault
IGS: International GNSS Service
AMLSTM NN: Attention Mechanism with Long Short-Time Memory Neural Network
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
TP′: True Positive Prime
RNN: Recurrent Neural Network
MSE: Mean Squared Error
MAE: Mean Absolute Error
RMSE: Root Mean Squared Error
R²: R-squared (Coefficient of Determination)
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
GMT: Generic Mapping Tools

Number of trees: Number of boosting rounds
Learning Rate: Step size shrinkage used to prevent overfitting
Maximum depth: Maximum depth of a tree
Minimum Child Weight: Minimum sum of instance weight (hessian) needed in a child
Subsample: Fraction of observations to be randomly sampled for each tree
Column sample: Fraction of features to be randomly sampled for each tree

Number of Layers: The depth of the network
Number of Units per Layer: The number of memory cells in each layer
Dropout Rate: The fraction of input units to drop during training to prevent overfitting
Learning Rate: The step size for the optimizer
Batch Size: The number of samples per gradient update
Number of Epochs: The number of times the entire dataset is passed through the network during training
  • Cakir, Z.; Doğan, U.; Akoğlu, A.M.; Ergintav, S.; Özarpacı, S.; Özdemir, A.; Nozadkhalil, T.; Çakir, N.; Zabcı, C.; Erkoç, M.H.; et al. Arrest of the Mw 6.8 January 24, 2020 Elaziğ (Turkey) earthquake by shallow fault creep. Earth Planet. Sci. Lett. 2023 , 608 , 118085. [ Google Scholar ] [ CrossRef ]
  • McClusky, S.; Balassanian, S.; Barka, A.; Demir, C.; Ergintav, S.; Georgiev, I.; Gurkan, O.; Hamburger, M.; Hurst, K.; Kahle, H.; et al. Global Positioning System constraints on plate kinematics and dynamics in the eastern Mediterranean and Caucasus. J. Geophys. Res. Solid Earth 2000 , 105 , 5695–5719. [ Google Scholar ] [ CrossRef ]
  • Reilinger, R.; McClusky, S.; Vernant, P.; Lawrence, S.; Ergintav, S.; Cakmak, R.; Ozener, H.; Kadirov, F.; Guliev, I.; Stepanyan, R.; et al. GPS constraints on continental deformation in the Africa-Arabia-Eurasia continental collision zone and implications for the dynamics of plate interactions. J. Geophys. Res. Solid Earth 2006 , 111 , B05411. [ Google Scholar ] [ CrossRef ]
  • Özbey, V.; Şengör, A.; Henry, P.; Özeren, M.S.; Haines, A.J.; Klein, E.; Tarı, E.; Zabcı, C.; Chousianitis, K.; Guvercin, S.E.; et al. Kinematics of the Kahramanmaraş triple junction and of Cyprus: Evidence of shear partitioning. BSGF-Earth Sci. Bull. 2024 , 195 , 15. [ Google Scholar ] [ CrossRef ]
  • Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997 , 9 , 1735–1780. [ Google Scholar ] [ CrossRef ]
  • Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [ Google Scholar ] [ CrossRef ]
  • Pondrelli, S.; Morelli, A.; Ekström, G.; Mazza, S.; Boschi, E.; Dziewonski, A. European–Mediterranean regional centroid-moment tensors: 1997–2000. Phys. Earth Planet. Inter. 2002 , 130 , 71–101. [ Google Scholar ] [ CrossRef ]
  • Şengör, A.M.C.; Zabci, C. The North Anatolian Fault and the North Anatolian Shear Zone BT. In Landscapes and Landforms of Turkey ; Springer International Publishing: Cham, Switzerland, 2019; pp. 481–494. [ Google Scholar ] [ CrossRef ]
  • Herring, T.; King, R.; Floyd, M.; McClusky, S. Introduction to GAMIT/GLOBK, Release 10.7, GAMIT/GLOBK Documentation ; Massachusetts Institute of Technology: Cambridge, MA, USA, 2018. [ Google Scholar ]
  • Blewitt, G.; Lavallée, D. Effect of annual signals on geodetic velocity. J. Geophys. Res. Solid Earth 2002 , 107 , ETG 9-1–ETG 9-11. [ Google Scholar ] [ CrossRef ]
  • McKinney, W. Data structures for statistical computing in Python. In Proceedings of the SciPy, Austin, TX, USA, 28–30 June 2010; Volume 445, pp. 51–56. [ Google Scholar ]
  • Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011 , 12 , 2825–2830. [ Google Scholar ]
  • Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv 2016 , arXiv:1603.04467. [ Google Scholar ] [ CrossRef ]
  • Wessel, P.; Luis, J.F.; Uieda, L.; Scharroo, R.; Wobbe, F.; Smith, W.H.F.; Tian, D. The Generic Mapping Tools Version 6. Geochem. Geophys. Geosyst. 2019 , 20 , 5556–5564. [ Google Scholar ] [ CrossRef ]
  • Harris, C.R.; Millman, K.J.; Van Der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020 , 585 , 357–362. [ Google Scholar ] [ CrossRef ]
Statistical Features | Mean, Standard Deviation, Skewness, Kurtosis
Frequency Features | Fourier Transform Coefficients
Trend Features | Linear Regression Coefficients, Polynomial Regression Coefficients
Metric Features | Maximum Displacement, Minimum Displacement, Range
Threshold Features | Threshold features for discontinuity detection
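
The feature groups listed above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the implementation used in the study: the function name `window_features`, the parameters `n_fft` and `poly_degree`, and the choice of FFT magnitudes as the "Fourier Transform Coefficients" are all hypothetical. Threshold features are left out because the table does not define them.

```python
import numpy as np


def window_features(t, y, n_fft=3, poly_degree=2):
    """Summarise one window of a displacement series as a flat feature dict."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    mean, std = y.mean(), y.std()
    # Standardised values; a constant window has no spread, so use zeros.
    z = (y - mean) / std if std > 0 else np.zeros_like(y)

    # Frequency features: magnitudes of the first few non-DC FFT coefficients
    # (assumption: the source does not say which coefficients were kept).
    spectrum = np.abs(np.fft.rfft(y - mean))

    return {
        # Statistical features
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),  # non-excess kurtosis
        # Frequency features
        "fft_magnitudes": spectrum[1:1 + n_fft].tolist(),
        # Trend features (np.polyfit returns the highest power first)
        "linear_coeffs": np.polyfit(t, y, 1).tolist(),
        "poly_coeffs": np.polyfit(t, y, poly_degree).tolist(),
        # Metric features
        "max": float(y.max()),
        "min": float(y.min()),
        "range": float(y.max() - y.min()),
    }


# Usage: a synthetic linear trend of 2 units per step, starting at 1.
t = np.arange(10.0)
features = window_features(t, 2.0 * t + 1.0)
```

For the synthetic trend in the usage lines, the linear coefficients recover the slope and intercept, and the range equals the difference between the last and first sample. A real pipeline would apply the function to sliding windows of each coordinate component.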
TP and (TP′) | TN
Model | Hyperparameter | Initial Value
XGBoost | Number of trees | 200
XGBoost, LSTM | Learning Rate | 0.1
XGBoost | Maximum depth | 7
XGBoost | Minimum Child Weight | 3
XGBoost | Column sample | 0.8
LSTM | Number of Layers | 2
LSTM | Number of Units per Layer | 100
LSTM | Dropout Rate | 0.3
LSTM | Batch Size | 64
Hyperparameter | Tested Values | Selected Value
Number of trees | [100, 200, 300, 500, 1000] | 200
Learning Rate | [0.1, 0.2, 0.3] | 0.2
Maximum depth | [3, 5, 7, 9] | 7
Minimum Child Weight | [1, 3, 5, 7, 10] | 5
Subsample | [0.5, 0.7, 0.9, 1.0] | 0.9
Column sample | [0.3, 0.5, 0.7, 0.8, 1.0] | 0.8
Number of Layers | [1, 2, 3] | 2
Number of Units per Layer | [50, 100, 150] | 100
Dropout Rate | [0.2, 0.3, 0.4] | 0.3
Batch Size | [16, 32, 64, 128] | 64
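
To give a sense of the size of the search space implied by the tested values above, the following sketch enumerates every combination. It is illustrative only: the source does not state whether an exhaustive grid search was used, the helper `iter_grid` is hypothetical, and the assignment of "Learning Rate" and "Subsample" to the XGBoost grid is an assumption based on the row labels. The dictionary keys are the table's own labels, not library parameter names.

```python
from itertools import product

# Tested values copied from the table; grouping by model is an assumption.
XGB_GRID = {
    "Number of trees": [100, 200, 300, 500, 1000],
    "Learning Rate": [0.1, 0.2, 0.3],
    "Maximum depth": [3, 5, 7, 9],
    "Minimum Child Weight": [1, 3, 5, 7, 10],
    "Subsample": [0.5, 0.7, 0.9, 1.0],
    "Column sample": [0.3, 0.5, 0.7, 0.8, 1.0],
}
LSTM_GRID = {
    "Number of Layers": [1, 2, 3],
    "Number of Units per Layer": [50, 100, 150],
    "Dropout Rate": [0.2, 0.3, 0.4],
    "Batch Size": [16, 32, 64, 128],
}

# Selected values as reported in the table.
XGB_SELECTED = {
    "Number of trees": 200,
    "Learning Rate": 0.2,
    "Maximum depth": 7,
    "Minimum Child Weight": 5,
    "Subsample": 0.9,
    "Column sample": 0.8,
}


def iter_grid(grid):
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


xgb_candidates = list(iter_grid(XGB_GRID))
lstm_candidates = list(iter_grid(LSTM_GRID))
print(len(xgb_candidates), len(lstm_candidates))
```

Under these assumptions the XGBoost grid has 6000 combinations and the LSTM grid has 108, which is why randomized or staged searches are often preferred over a full sweep when each fit is expensive.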
Share and Cite

Özbey, V.; Ergintav, S.; Tarı, E. GNSS Time Series Analysis with Machine Learning Algorithms: A Case Study for Anatolia. Remote Sens. 2024 , 16 , 3309. https://doi.org/10.3390/rs16173309


  1. Data Analysis: Definition, Types and Examples

  2. What Is Data Analysis In Research Process

  3. 7 Steps of Data Analysis Process

  4. What is Data Analysis in Research

  5. PPT

  6. Unleashing Insights: Mastering the Art of Research and Data Analysis


  1. What is Data Analysis?

  2. Applied Survey Research Data Analysis Correlations

  3. Qualitative Research (Data Analysis and Interpretation) Video Lesson

  4. International Research Data Analysis

  5. Mastering Data Calculation: Unveiling Product Insights, Product Research Lecture 05 #productresearch

  6. Introduction to STATA for Data Analysis & Basic Statistical Concepts


  1. A Step-by-Step Guide to the Data Analysis Process

    1. Step one: Defining the question. The first step in any data analysis process is to define your objective. In data analytics jargon, this is sometimes called the 'problem statement'. Defining your objective means coming up with a hypothesis and figuring how to test it.

  2. The 6 Steps of a Data Analysis Process: Types of Data Analysis

    Step 5 of the data analysis process: Transforming results into reports or dashboards. Once the analysis is complete and conclusions have been drawn, the final stage of the data analysis process is to share these findings with a wider audience. In the case of a business data analysis, to the organisation's stakeholders.

  3. Data Analysis in Research: Types & Methods

    Definition of research in data analysis: According to LeCompte and Schensul, research data analysis is a process used by researchers to reduce data to a story and interpret it to derive insights. The data analysis process helps reduce a large chunk of data into smaller fragments, which makes sense. Three essential things occur during the data ...

  4. Data Analysis

    Data Analysis Process. The following are step-by-step guides to the data analysis process: Define the Problem. The first step in data analysis is to clearly define the problem or question that needs to be answered. This involves identifying the purpose of the analysis, the data required, and the intended outcome. Collect the Data

  5. The Beginner's Guide to Statistical Analysis

    This article is a practical introduction to statistical analysis for students and researchers. We'll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables. Example: Causal research question.

  6. What is Data Analysis? An Expert Guide With Examples

    The data analysis process in a nutshell. Step 1: Defining objectives and questions. The first step in the data analysis process is to define the objectives and formulate clear, specific questions that your analysis aims to answer. This step is crucial as it sets the direction for the entire process.

  7. What Is the Data Analysis Process? (A Complete Guide)

    Data analysis starts with identifying a problem that can be solved with data. Once you've identified this problem, you can collect, clean, process, and analyze data. The purpose of analyzing this data is to identify trends, patterns, and meaningful insights, with the ultimate goal of solving the original problem.

  8. Introduction to Data Analysis

    Data analysis can be quantitative, qualitative, or mixed methods. Quantitative research typically involves numbers and "close-ended questions and responses" (Creswell & Creswell, 2018, p. 3). Quantitative research tests variables against objective theories, usually measured and collected on instruments and analyzed using statistical procedures (Creswell & Creswell, 2018, p. 4).

  9. PDF The SAGE Handbook of Qualitative Data Analysis

    The SAGE Handbook of Qualitative Data Analysis. Uwe Flick. Mapping the Field. Data analysis is the central step in qualitative research. Whatever the data are, it is their analysis that, in a decisive way, forms the outcomes of the research. Sometimes, data collection is limited to recording and documenting naturally occurring ph.

  10. Quantitative Data Analysis: A Comprehensive Guide

    Below are the steps to prepare data before quantitative research analysis: Step 1: Data Collection. Before beginning the analysis process, you need data. Data can be collected through rigorous quantitative research, which includes methods such as interviews, focus groups, surveys, and questionnaires. Step 2: Data Cleaning.

  11. 7-Step Guide on How To Learn Data Analysis (as a Beginner)

    Learn data analysis as a beginner with our 7-step guide. Master the essential skills, tools, and techniques to kickstart your career in this high-demand field. Start your data journey today! ... Research Analyst at Virginia Commonwealth University. Read Story. Learning Data Analysis and Launching Your Career: Real-Life Examples ...

  12. Quantitative Data Analysis Methods & Techniques 101

    Quantitative data analysis is one of those things that often strikes fear in students. It's totally understandable: quantitative analysis is a complex topic, full of daunting lingo, like medians, modes, correlation and regression. Suddenly we're all wishing we'd paid a little more attention in math class… The good news is that while quantitative data analysis is a mammoth topic ...

  13. What Is Data Analysis? (With Examples)

    Data analysis process. As the data available to companies continues to grow both in amount and complexity, so too does the need for an effective and efficient process by which to harness the value of that data. The data analysis process typically moves through several iterative phases. Let's take a closer look at each.

  14. Data Analysis Process: Key Steps and Techniques to Use

    Data analysis step 4: Analyze data. One of the last steps in the data analysis process is analyzing and manipulating the data, which can be done in various ways. One way is through data mining, which is defined as "knowledge discovery within databases". Data mining techniques like clustering analysis, anomaly detection, association rule ...

  15. What is Data Analysis? (Types, Methods, and Tools)

    Couchbase Product Marketing. December 17, 2023. Data analysis is the process of cleaning, transforming, and interpreting data to uncover insights, patterns, and trends. It plays a crucial role in decision making, problem solving, and driving innovation across various domains. In addition to further exploring the role data analysis plays this ...

  16. Steps in Quantitative Analysis

    Steps in Quantitative Analysis. Data management - This involves familiarizing yourself with appropriate software; systematically logging in and screening your data; entering the data into a program; and finally, 'cleaning' your data. Understanding variable types - Different data types demand discrete treatment, so it is important to be ...

  17. PDF A Step-by-Step Guide to Qualitative Data Analysis

    Step 1: Organizing the Data. "Valid analysis is immensely aided by data displays that are focused enough to permit viewing of a full data set in one location and are systematically arranged to answer the research question at hand." (Huberman and Miles, 1994, p. 432) The best way to organize your data is to go back to your interview guide.

  18. Qualitative Data Analysis: Step-by-Step Guide (Manual vs ...

    Step 1: Gather your qualitative data and conduct research (Conduct qualitative research) The first step of qualitative research is to do data collection. Put simply, data collection is gathering all of your data for analysis. A common situation is when qualitative data is spread across various sources.

  19. A Really Simple Guide to Quantitative Data Analysis

    It is important to know what kind of data you are planning to collect or analyse as this will affect your analysis method. A 12 step approach to quantitative data analysis. Step 1: Start with ...

  20. A Step-by-Step Process of Thematic Analysis to Develop a Conceptual

    Thematic analysis is a research method used to identify and interpret patterns or themes in a data set; it often leads to new insights and understanding (Boyatzis, 1998; Elliott, 2018; Thomas, 2006). However, it is critical that researchers avoid letting their own preconceptions interfere with the identification of key themes (Morse & Mitcham, 2002; Patton, 2015).

  21. Qualitative Data Analysis

    Qualitative data analysis is an important part of research and building greater understanding across fields for a number of reasons. First, cases for qualitative data analysis can be selected purposefully according to whether they typify certain characteristics or contextual locations. In other words, qualitative data permits deep immersion into a topic, phenomenon, or area of interest.

  22. Learning to Do Qualitative Data Analysis: A Starting Point

    Yonjoo Cho is an associate professor of Instructional Systems Technology focusing on human resource development (HRD) at Indiana University. Her research interests include action learning in organizations, international HRD, and women in leadership. She serves as an associate editor of Human Resource Development Review and served as a board member of the Academy of Human Resource Development ...

  23. Data Analysis for the Behavioral Sciences

    By the end of the course, students will have a solid foundation in statistical methods, enabling them to analyze data critically and apply statistical techniques confidently in their research endeavors. Learning objectives. Explain various ways to categorize variables. Explain various ways to describe data.

  24. Making gender data insights accessible: Five steps to develop an

    The World Development Report 2021: Data for Better Lives called for using existing data more effectively to improve development outcomes. In line with this commitment, the recently launched World Bank Group Gender Strategy 2024 - 2030 calls for greater action to use data and evidence to promote solutions for gender equality. Data communication is an essential part of this process, although ...

  25. Sentiment Analysis with Ticker News API Insights

    Introducing. Sentiment Analysis with Ticker News API Insights. Sep 6, 2024. In this tutorial, we explore the Ticker News API, enhanced through our internal research and the insights from our recently published paper on sentiment analysis using Large Language Models (LLMs). This API now captures structured data from unstructured financial news, enabling precise sentiment analysis tagging ...

  26. UCD-CE Integration: A Hybrid Approach to Reinforcing User ...

    Requirements elicitation and analysis tasks in user-centered design (UCD) are pivotal for assessing digital systems' quality and costs. However, these tasks often face challenges due to limited user involvement. This stems from unclear guidelines on how to conduct activities and engage users effectively to achieve their goals during the development process. This study explored how the ...

  27. Remote Sensing

    This study addresses the potential of machine learning (ML) algorithms in geophysical and geodetic research, particularly for enhancing GNSS time series analysis. We employed XGBoost and Long Short-Term Memory (LSTM) networks to analyze GNSS time series data from the tectonically active Anatolian region. The primary objective was to detect discontinuities associated with seismic events. Using ...