Supplement: How to Analyze, Visualize, and Interpret Departmental Data
This supplement provides an overview of the analysis and visualization of qualitative data and quantitative data.
If possible, partner with experts outside your department, such as the leaders of relevant campus offices (e.g., your office of institutional research, campus climate office, office of equity and inclusion, or human resources office) or an external consulting firm, to develop, administer, and analyze surveys, interviews, and focus groups. Recognize that trained outside experts are likely to be more effective than department members for gaining participants’ trust, ensuring safety and anonymity, and collecting useful feedback while avoiding the potential for retaliation or other negative consequences. However, recognize that non-physicists may need guidance on understanding and exploring physics-department-specific issues or challenges.
Qualitative data
Qualitative data are non-numerical data such as properties, qualities, or categories. Qualitative data often include responses to open-ended questions, but can also include graphics, audio, images, or documents. Qualitative data can come from a wide variety of sources, including surveys, interviews, focus groups, and observations. Qualitative data may be presented in terms of the frequency of categories or themes, or as a descriptive narrative.
One of the most common sources of qualitative data is open-ended question responses. An open-ended or open-response question is one in which the respondent can form their own answer. Open-ended questions always yield qualitative data, but the responses may be coded into themes, which can then be treated quantitatively. Open-ended questions take more time to answer and to analyze, but provide rich insight into the respondents’ thinking.
Analysis of qualitative data
There are many approaches to analyzing qualitative data. One of the most common is thematic analysis, in which common themes or categories in the data are identified. For example, in a set of student interviews or surveys about career choice, if many respondents indicated that their parents’ careers influenced their career path, this might be identified as the theme “parent career.”
Here are some possible ways to identify themes in the data.
For survey questions:
Put all responses to a question in a column in your spreadsheet software, with several blank columns to the right. These blank columns will hold your tallies of themes.
Read through five to ten survey responses to get a sense of common themes. Label your blank new columns with some of the most common and relevant themes. Label the last unused column “other”. If you are looking for certain themes (e.g., “did they mention the staff advisors?”) add that as a theme column, even if it is not common.
Read through each survey response and put a “1” in the cell corresponding to the theme you see in that response. Add, modify, and delete themes as you read more responses.
A single response can have more than one theme.
Limit yourself to five to seven main themes to keep the data from becoming confusing. Idiosyncratic responses can go into “other.”
Use the “sum” command to add up the total number of responses in each theme.
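The spreadsheet tally described in the steps above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the response texts, the theme names, and the helper `tally_themes` are all hypothetical, and the coding itself (deciding which themes appear in a response) is still done by a human reader.

```python
# Hypothetical theme columns; "other" collects idiosyncratic responses.
themes = ["parent career", "staff advisors", "research experience", "other"]

# Hypothetical coded responses: a reader has put a 1 under each theme
# seen in the response. A single response can carry more than one theme.
coded_responses = [
    {"response": "My mother is an engineer and I liked my summer lab job.",
     "parent career": 1, "research experience": 1},
    {"response": "The staff advisors helped me plan my courses.",
     "staff advisors": 1},
    {"response": "My father teaches physics.",
     "parent career": 1},
    {"response": "I just like puzzles.",
     "other": 1},
]


def tally_themes(rows, theme_names):
    """Return the number of responses (N) tagged with each theme."""
    return {
        theme: sum(row.get(theme, 0) for row in rows)
        for theme in theme_names
    }


theme_counts = tally_themes(coded_responses, themes)
for theme, n in theme_counts.items():
    print(f"{theme}: N = {n}")
```

The dictionary comprehension plays the role of the “sum” command: it adds up the 1s in each theme column and treats a blank cell as 0.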
For interview or focus group data:
Create a document for common themes.
Read through one set of interview notes and listen to the recording. Identify themes in that interview, highlighting the transcript or notes where the theme(s) appear.
Add the theme(s) to the list of common themes, with some notes about each theme.
Continue with additional interview notes and recordings, adding, modifying, and deleting themes as you go.
Not all qualitative data analysis must include thematic analysis. Reading through interview results or open-ended responses to identify the main messages is also a valuable approach. To avoid bias, it is still recommended to generate a numerical count of the number of times an idea is mentioned. It is very easy to overestimate the prevalence of an idea if it is something that is particularly salient to the person doing the analysis.
Visualization and reporting of qualitative data
In the final reporting of your data, report the number (N) of interview participants or survey respondents who mentioned each theme in your final tally. Do not report percentages; percentages are misleading because more respondents might have mentioned that theme if directly asked.
While qualitative data don’t necessarily lend themselves to visualization, there are a few techniques that can be used (see Evergreen, 2019 for more):
Quotations. It can be useful to highlight key quotations (in bold text, a table, or a quotation bubble) to bring the information to life.
Photos. Photos of classrooms or other relevant images can help illustrate a point.
Heat map. In a heat map, data values are represented by colors (e.g., darker colors represent the most prevalent themes). Create a table with the themes as rows. Columns could be the major of the respondent or any other attribute you would like to compare. Fill the cells in a darker color for the themes that were most commonly observed, and a lighter color for those less commonly observed. See Gaudino-Goering, 2021 for an example of heat maps used for academic assessment.
Word clouds. Word clouds are visual displays of the most commonly mentioned words in a qualitative data set. They provide a holistic overview of the information. There are many free online tools that can create this visual and allow you to remove non-interesting words.
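As a rough illustration of the heat map technique above, the table of themes (rows) by respondent attribute (columns) can be built from coded data before any coloring is applied. The sketch below is an assumption-laden example, not a prescribed method: the data are made up, and the `shade` cutoff (more than half of the largest count is “dark”) is an arbitrary choice.

```python
from collections import Counter

# Hypothetical data: one (major, theme) pair for each response in which
# a reader identified that theme.
mentions = [
    ("physics", "parent career"),
    ("physics", "research experience"),
    ("physics", "research experience"),
    ("engineering", "parent career"),
    ("engineering", "staff advisors"),
    ("engineering", "parent career"),
]

counts = Counter(mentions)
majors = sorted({major for major, _ in mentions})
themes = sorted({theme for _, theme in mentions})

# Rows are themes, columns are majors; Counter returns 0 for unseen pairs.
table = {
    theme: {major: counts[(major, theme)] for major in majors}
    for theme in themes
}


def shade(n, max_n):
    """Map a count to a coarse shade label (the cutoff is an assumption)."""
    if n == 0:
        return "blank"
    return "dark" if 2 * n > max_n else "light"


max_n = max(counts.values())
for theme in themes:
    cells = ", ".join(
        f"{major}: {table[theme][major]} ({shade(table[theme][major], max_n)})"
        for major in majors
    )
    print(f"{theme:<20} {cells}")
```

The printed shade labels indicate which cells to fill darker or lighter when the table is recreated in a spreadsheet or document.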
Quantitative data
Quantitative data are data that are numerical, or that can be translated into numbers. Quantitative data are often collected through surveys (e.g., Likert scale responses) or numerical data collection (e.g., students enrolled in a course or completing the major). Quantitative data can be displayed as tables and charts. Quantitative data are valuable for displaying trends and aggregate information, but without context, they can be misinterpreted or lack meaning. It is best to combine quantitative data with qualitative data to provide richer understanding.
Closed-ended questions are one source of quantitative data. A closed-ended or closed-response question is one in which the respondent must select from a set of answer choices. Closed-ended questions can include yes/no or true/false questions, Likert scale questions, choose-all-that-apply checklists, or other sets of options. Closed-ended question responses can be treated categorically or numerically, so they can be considered quantitative data. Closed-ended questions are relatively quick to answer and to analyze, but offer limited information compared to open-ended questions. It is often best to use a mix of open-ended and closed-ended questions to provide a balance between quality and efficiency.
Analysis of quantitative data
Quantitative data analysis is an extremely broad field. This supplement summarizes a few salient points about this type of data, its analysis, and its interpretation.
Tip #1: If working with small numbers, use caution. If there are few students per year in the course or program you are analyzing, then aggregate across multiple years if possible. A three-year rolling average is a good way to smooth out short-term fluctuations in data. It’s also good practice to report the number (N), not percentages, when working with small numbers, to avoid suggesting over-generalization of your findings.
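A three-year rolling average, as suggested in Tip #1, is simple to compute. The sketch below assumes a trailing window (each year is averaged with the two years before it) and uses made-up graduation counts; the function name `rolling_average` is hypothetical.

```python
def rolling_average(values, window=3):
    """Average each run of `window` consecutive values (trailing window)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical number of students completing the major each year.
graduates_by_year = {2019: 4, 2020: 9, 2021: 5, 2022: 7, 2023: 6}

years = sorted(graduates_by_year)
counts = [graduates_by_year[year] for year in years]

# The first average covers 2019-2021, so label it with its final year.
smoothed = dict(zip(years[2:], rolling_average(counts)))
for year, value in smoothed.items():
    print(f"{year - 2}-{year}: {value:.1f} graduates per year")
```

With only five years of data the rolling average yields three values, which is a reminder of how little information small programs have to work with.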
Tip #2: Choose appropriate descriptive statistics. Most departmental data can be characterized using a variety of descriptive statistics, which provide the basis for quantitative data analysis. Descriptive statistics give summary information about central tendency or variability and describe the basic features of the data or data sample. Common descriptive statistics are count, mean, median, mode, and standard deviation. An average is often useful to understand the typical response pattern, though information about the variability in the data is lost. If responses are strongly skewed (e.g., most responded “strongly agree”), then the median may be the most useful measure. It’s best to avoid using a mean for Likert scale responses unless the question has at least seven answer choices (so that ordinal data starts to approximate continuous data), there are no ceiling or floor effects, and the distribution of responses is close to normal. The distribution of responses can be visualized with a histogram, which is always a good addition to any descriptive statistics.
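The descriptive statistics named in Tip #2 are all available in Python’s standard library. The following sketch uses made-up, strongly skewed responses on a five-point scale to show how the mean and median can differ; the text histogram is a stand-in for a real chart.

```python
import statistics
from collections import Counter

# Hypothetical responses on a 5-point scale (5 = "strongly agree").
responses = [5, 5, 5, 4, 5, 2, 5, 4, 5, 1]

summary = {
    "count": len(responses),
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "mode": statistics.mode(responses),
    "stdev": statistics.stdev(responses),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name}: {value}")

# A quick text histogram of the distribution of responses.
distribution = Counter(responses)
for value in range(1, 6):
    print(f"{value}: {'#' * distribution[value]}")
```

Here the median and mode sit at the top of the scale while the mean is pulled down by two low responses, which is why the histogram is worth examining alongside any single summary number.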
Tip #3: Calculate appropriate ratios and percentages. In addition to descriptive statistics, it is common to calculate ratios and percentages from raw numbers. Use a ratio to compare two different categories of numbers (e.g., the number of students to the number of faculty). Use a percentage or proportion for the same category of numbers (e.g., the percentage of survey respondents who answered “yes”).
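The distinction in Tip #3 can be made concrete with a small calculation. All numbers below are hypothetical.

```python
# Ratio: compares two different categories (students vs. faculty).
students = 84
faculty = 12
student_faculty_ratio = students / faculty
print(f"Student-to-faculty ratio: {student_faculty_ratio:.0f} to 1")

# Percentage: a part of one category (respondents who answered "yes").
yes_responses = 18
total_responses = 24
percent_yes = 100 * yes_responses / total_responses
print(f"Answered yes: {percent_yes:.0f}% (N = {yes_responses} of {total_responses})")
```

Printing N next to the percentage keeps the sample size visible, in line with the caution about small numbers in Tip #1.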
Tip #4: Compare groups meaningfully. Once you have descriptive statistics, ratios, and percentages, it is possible to compare these numbers across groups, e.g., by gender. Visualizing the data is often the most effective way to compare groups. However, make sure not to tokenize marginalized identities, e.g., by using the response from a single person to indicate the collective opinion of students.
Tip #5: If working with Likert scale responses, bin or plot the data. If you have rating data with more than three scale points, it is often helpful to combine categories (e.g., all “strongly agree,” “moderately agree,” and “somewhat agree” could be combined into “agree”). If you do this, ensure that the claims being made are consistent with the data and take this compression into account. Binning is particularly useful for comparing groups (such as students with different majors), because it avoids comparing too many categories at once. Plotting the data (including histogram distributions) is also very useful for understanding responses to different questions or from different groups. Likert scale data can also be converted into a weighted average by assigning values to the scale points (e.g., “strongly agree” = 7, “strongly disagree” = 1), and most online survey tools will do this automatically, but this loses much of the nuance of the data and can be misleading (see Tip #2 above). Thus, weighted averages should be used only alongside the histogram of responses, and only to look for sizable differences (e.g., these three questions were top rated; these three questions were lowest rated).
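The binning and weighted average described in Tip #5 might be sketched as follows. The seven scale labels, the inclusion of a “neutral” midpoint, and the responses are assumptions for illustration; only the endpoint values (“strongly agree” = 7, “strongly disagree” = 1) come from the text above.

```python
from collections import Counter

# Hypothetical 7-point scale; labels other than the two endpoints are assumed.
SCALE = {
    "strongly disagree": 1,
    "moderately disagree": 2,
    "somewhat disagree": 3,
    "neutral": 4,
    "somewhat agree": 5,
    "moderately agree": 6,
    "strongly agree": 7,
}


def bin_label(label):
    """Collapse a 7-point label into 'disagree', 'neutral', or 'agree'."""
    value = SCALE[label]
    if value < 4:
        return "disagree"
    if value > 4:
        return "agree"
    return "neutral"


# Hypothetical responses to one survey question.
responses = [
    "strongly agree", "somewhat agree", "neutral",
    "moderately agree", "strongly disagree", "somewhat agree",
]

binned = Counter(bin_label(r) for r in responses)
weighted_average = sum(SCALE[r] for r in responses) / len(responses)

print(dict(binned))
print(f"Weighted average: {weighted_average:.2f}")
```

In this made-up example the weighted average lands between “neutral” and “somewhat agree” even though four of six respondents agreed and one strongly disagreed, which illustrates why the average should be read together with the distribution.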
Tip #6: Statistical tests probably aren’t needed. For most departmental data, statistical tests such as t-tests won’t be needed; a simple descriptive analysis will be adequate and most appropriate. Most statistical tests assume that your sample is drawn randomly from the population of interest, and that the data are normally distributed (“parametric statistics”). These assumptions are violated in most of the data described in the EP3 Guide. Statistical tests might be appropriate for comparing results before and after an intervention, provided you have an adequate sample size. In this case, seek guidance from someone knowledgeable about this type of data analysis.
Visualization and reporting of quantitative data
Visualizing, interpreting, and reporting the data are important steps that connect data analysis to the hard work of actual departmental improvement. These steps turn data into knowledge. Return to the key questions the assessment was designed to answer and use them to drive visualization and reporting. Disaggregate data by demographics, major, or other critical subgroupings to understand the experiences of different groups. For guidance on how to do so respectfully and protect anonymity, see the Guidelines for Demographic Questions in the supplement on How to Design Surveys, Interviews, and Focus Groups.
A very useful resource on how to think about the meaning behind quantitative data and convey that meaning through visuals is Stephanie Evergreen’s book, Effective Data Visualization: The Right Chart for the Right Data. Her chart chooser is a very useful tool. She provides the following guiding questions for selecting appropriate visualizations:
Are you trying to compare two numbers to one another?
Are you trying to compare your numbers to a benchmark number?
Are you trying to interpret survey Likert (closed-choice) responses?
Are you trying to show the parts of a whole?
Are you trying to show change over time?
Are you trying to show how two variables are correlated?
The answers to these questions can help guide you to the most appropriate visualization, most of which are available in software such as Excel. Below are common visualizations or data presentation formats:
Tables are useful for separating data into meaningful categories, and comparing and contrasting values in a small space. They can be a useful companion to charts and can provide complete, exact data values. When sorted meaningfully, and when critical cells are highlighted, tables can also provide useful insights.
Bar charts are useful for showing distributions of responses or experiences, and are often used for survey results and/or to compare two groups. To compare responses to a benchmark, a line can be added showing the benchmark.
Histograms are a type of bar chart wherein the bar height represents the number of items or people in each category. Histograms are useful for showing parts of a whole.
Pie or doughnut charts are also useful for showing parts of a whole. Pie charts are particularly useful for showing a single important percentage (e.g., “90% of students have never taken a statistics course.”) Use only with two to three categories; pie charts with more categories are difficult to interpret and are not useful. Bar charts are a better choice for larger numbers of categories.
Scatter plots are useful for showing relationships between two variables.
Line charts are useful for showing change over time.
See also Stephanie Evergreen’s Data Visualization Checklist for useful guidance on creating high-impact data visualizations.
Resources
Wikipedia contributors, “Thematic analysis,” Wikipedia, The Free Encyclopedia (2023).