UW Statistics Course using HTC
Hannah Cheren July 06, 2022
For the first time, UW Statistics undergraduates could participate in a course teaching high throughput computing (HTC). John Gillett, lecturer of Statistics at the University of Wisconsin-Madison, designed and taught the course with the support of the Center for High Throughput Computing (CHTC).
This past spring, HTC was introduced to a new realm – the inside of an undergraduate statistics course. John Gillett, a lecturer in the Statistics department at the University of Wisconsin-Madison, unveiled a new special topics course, Statistics 479, to undergraduate students in the spring of 2022. The course introduced students with little programming experience to a robust and easy-to-learn approach that they could use to tackle significant computational problems. “The basics of distributed computing are easy to learn and very powerful,” Gillett explained. “[That’s why] it fit with the CHTC – I knew they could give the students and me the computing capabilities and support.”
This class was created as an undergraduate counterpart to the graduate-level course, Statistics 605, which Gillett has taught since the spring of 2017. In it, students learn basic distributed computing so they can analyze data sets too large for a laptop.
Gillett reached out to research computing facilitator Lauren Michael in 2016, hoping to learn how he could teach his students easy parallel computing. He settled on HTC because it offered the easiest way to help students do large computations. “This was an easy path for me,” Gillett remarked, “and everyone at the CHTC made it easy.”
Research Facilitator Christina Koch has guest lectured every semester since the graduate class was first offered in 2017. She talks to the students about the CHTC and high throughput computing and has them run a few jobs. Koch notes that this partnership between the CHTC and Gillett’s class has been “a win-win; we get to share about our system and how people run things, and he gets to have this interesting, hands-on assignment for his class.”
With the help of Christy Tremonti, a UW-Madison Astronomy professor, Gillett created an assignment that involves using HTC on a real data set. Tremonti had a research problem that required searching through many astronomical spectra (of photos of galaxies) for a particular type corresponding to a gravitationally lensed Lyman-break galaxy. “In the beginning, she gave a lot of good, critical feedback for the research element of this,” Gillett explained. She guided the students through large-scale computations during the first few semesters. As he reflects on this partnership, Gillett beams, “This was exciting too – we were doing unknown statistics on a real research problem. We didn’t know what the right answer was!”
Gillett remarked that his students enjoy working with the CHTC. “[The students] now understand how to work a parallel computing environment,” he noted. “They get excited about the power they now have to extract solutions from big piles of data.” This course offers students simple, powerful tools to do just that.
Gillett appreciated the help and support he received from the CHTC in developing this course: “I needed a little more knowledge and their willingness to help support the students and me.” The technologies and services that the CHTC develops for HTC gave Gillett an easy and accessible way to teach his students programming and computational thinking skills that they’ll be able to carry with them.
“Students go from being weak programmers to not being intimidated by big data sets and computations that they wouldn’t have been able to consider otherwise,” Gillett said. “I’m proud about that.” Students come out of these classes with a different kind of confidence about data problems – and that is priceless.
John Gillett is currently looking for new researchers with whom his students could collaborate. If you are a researcher who can provide a reasonably large and accessible dataset, a question, and guidance, please reach out to [email protected].