How We Reported This Story: Compiling a 4,100+ Entry Database by Hand and Figuring Out What It Meant
By Benjamin Pontz, Editor-in-Chief
The genesis of this story was a relatively simple question: is it true, as I had often heard faculty members complain, that some departments teach more students in larger classes than others? If so, why is that, and what are the implications for the college at a time when it is confronting acute financial challenges and the potential for declines in enrollment?
This should be pretty simple, I thought to myself back in November. We will get course-level enrollment data from the Registrar’s Office, crunch the numbers, and see what it says.
I was wrong.
When we asked for the data, the answer came back that its disclosure is restricted to the Provost’s Office, academic departments themselves, faculty committees, and the Office of Institutional Analysis. We were not the first student organization to ask for this type of data, and we were not the first to have our request denied.
Nevertheless, as an organization whose mission is to pursue the facts, we were undeterred by the denial. In retrospect, had I wanted to shield myself and my staff from hours upon hours of tedious monotony, I should have thought twice.
Instead, we set out to build a database of course enrollments on our own. Gathering the data one class at a time through PeopleSoft, the college’s online platform that allows for class searching, nine members of my team and I collectively spent more than 70 hours compiling, coding, and cleaning a dataset to answer the questions I outlined above.
In any dataset compiled and coded by hand with more than 4,000 entries, there will be errors. I am sure that a few faculty members who are on a tenure track are not listed as such or vice versa. I imagine a slip of the finger might have led a class with 32 students to be listed as having 31. Such are the perils of manually copying system-provided data. But I am confident that our multi-layer quality control process — which included cross-referencing to faculty listings on the college website, spot rechecking of entries, and independent data analysis — has led us to accurate conclusions.*
That said, if you spot an error, please let us know about it. We’ll gather any errors that are submitted and release a 2.0 version of the dataset if necessary. Also, feel free to use the data for your own research. It is downloadable as a comma-separated values (CSV) file at the bottom of this article, so you can open it in Excel, Stata, SPSS, R, or whatever other data analysis program you might use. If you find something interesting, do let us know. Now that we have compiled the dataset, the more use it gets — and the more stories it uncovers — the merrier.
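For readers who want a running start, here is a minimal sketch, in Python with pandas, of loading the file and reproducing two summary figures of the kind discussed in the story: average class size by department and the share of sections enrolled above their cap. The file name and column names ("Department," "Enrollment," "Cap") are hypothetical placeholders, not necessarily the dataset’s actual headers; adjust them to match what you download.

```python
# A minimal sketch of exploring the dataset, assuming a local copy named
# "gettysburgian_enrollments.csv" with hypothetical columns "Department",
# "Enrollment", and "Cap"; adjust these names to the file's actual headers.
import pandas as pd

df = pd.read_csv("gettysburgian_enrollments.csv")

# Average class size by department, largest first
avg_size = df.groupby("Department")["Enrollment"].mean().sort_values(ascending=False)
print(avg_size.head(10))

# Share of sections enrolled above their cap, by department
df["over_cap"] = df["Enrollment"] > df["Cap"]
print(df.groupby("Department")["over_cap"].mean().round(3))
```

The same few lines translate readily to R or Stata if those are more your speed.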
And now, for a highly riveting methodological statement on coding decisions:
- This data was hand-coded by a team of eight Gettysburgian staff members following instructions and protocol delineated by Editor-in-Chief Benjamin Pontz.
- Using the “Class Search” interface in PeopleSoft, compilers searched at the department-semester unit of analysis (e.g., Africana Studies, Fall 2016) and, from there, examined each course entry one at a time, gathering the course information, professor, enrollment cap, and actual enrollment.
- Separately, a team of Gettysburgian staffers coded whether each course’s professor is tenured or on a tenure track, or not (i.e., holds the rank of adjunct, lecturer, or visiting faculty). This data was compiled — to the extent possible — from faculty listings on department webpages and, for faculty no longer at the college, from internet research to ascertain their status when they were at the college. In a small number of cases, faculty members’ status changed from non-tenure track to tenure track during the four-year period of data analyzed. We did our best to account for the year in which those changes took effect.
- We chose to analyze only full-credit “traditional” sections of courses. In other words, labs, music ensembles and lessons, and independent studies were omitted from the analysis. (This follows a standard practice from the Common Data Set.)
- When courses had enrollment caps of 16 or more students but two or fewer enrollees, we presumed those sections were canceled and omitted them from the analysis. (This rule is illustrated in the sketch after this list.)
- When multiple faculty members at the college shared a last name, we added a first initial to distinguish them. This was of particular concern within a single department (e.g., Environmental Studies, which has both Randy Wilson and Andy Wilson).
- When the same course changed departmental classifications, we combined it under a single department (e.g., ENG 203 was, for several years, listed as JOUR 203).
- We treated cross-listed classes as two (or more) individual courses so they could be reflected in each department’s analysis. In some cases, a single class could count as up to four different courses (e.g., a spring 2018 section cross-listed as CLA 214, CLA 314, CLA 435, and ANTH 214 was one class taught by Professor Benjamin Luley with a combined enrollment of 22, even though far fewer students were enrolled under each individual course number). We chose this approach because the locus of the analysis is the classroom environment — the difference in educational experience when a classroom has 20 students as compared to 6 — as well as the grading and advising burden of faculty members, who get credit for teaching only one course no matter how many times it is cross-listed. Nevertheless, we flagged cross-listed courses in the notes section (also shown in the sketch after this list).
- Future research directions might include coding for specific faculty rank (as opposed to the broader tenure/tenure-track or not), demographics of faculty members, and whether courses meet Gettysburg Curriculum requirements.
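To make two of those coding decisions concrete, here is a minimal sketch, again in Python with the same hypothetical column names (plus an assumed "Notes" column), of how the canceled-section rule and the cross-listing flag might be applied to the raw data. It illustrates the rules described above rather than reproducing our actual cleaning process, which was done by hand.

```python
# A sketch of two of the cleaning rules described above, assuming
# hypothetical columns "Cap", "Enrollment", and "Notes"; this is an
# illustration, not our actual hand-cleaning workflow.
import pandas as pd

df = pd.read_csv("gettysburgian_enrollments.csv")

# Rule: sections with a cap of 16 or more but two or fewer enrollees are
# presumed canceled and dropped from the analysis.
presumed_canceled = (df["Cap"] >= 16) & (df["Enrollment"] <= 2)
df = df[~presumed_canceled]

# Cross-listed sections appear once per course number, so each department's
# figures include them; the notes column flags them so that college-wide
# totals can be deduplicated if a researcher prefers.
cross_listed = df["Notes"].str.contains("cross-listed", case=False, na=False)
print(f"Presumed canceled: {presumed_canceled.sum()}")
print(f"Cross-listed rows: {cross_listed.sum()} of {len(df)}")
```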
* Known errors identified between print publication and web publication:
- Approximately six 100-level Economics courses from the spring of 2017 were omitted in error from the dataset that we analyzed, meaning that the average class size and the percent of courses enrolled above the cap in Economics are slightly under their actual values.
- Approximately seven 100-level English courses from the spring were omitted in error from the dataset that we analyzed, meaning that the percent of courses enrolled above the cap in English is slightly under its actual value. (These classes were capped at 16, so, given the department average of 14.9, the effect of missing classes with an average enrollment of 16 or 17 is minimal.)