CHTC Leads High Throughput Computing Demonstrations
Shirley Obih January 20, 2023
Data Science for understanding science communication involves learning to use statistical methods (e.g., chi-square, analysis of variance, correlation and regression analysis, nonparametric tests) and computational methods (e.g., automated text analysis, computer vision), all of which sometimes require complex, time-consuming computing that surpasses the capacity of the everyday computer.
To meet this computing challenge, Chen enlisted the help of CHTC Lead Research Computing Facilitator Christina Koch in November 2022 to give a demonstration for her class. Chen wanted students to:
- Acquire knowledge about the basic approaches for large scale computing
- Understand the different scenarios regarding why they may need to use high throughput computing in research
- Be able to distinguish between independent and sequential tasks
- Be able to submit script jobs to the CHTC campus computing cluster
- Obtain a basic understanding of the parallel computing implementation in R
Koch achieved these goals by presenting the uses of HTC for large scale computing and leading a hands-on demonstration with Chen to teach students how to submit and run R programming scripts for topic modeling of social media data using HTC.
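The distinction between independent and sequential tasks that students were asked to learn can be illustrated with a small sketch. The example below is not the class material (the demonstration used R scripts on CHTC); it is a minimal Python illustration under stated assumptions, with made-up sample posts and hypothetical function names. Counting the words in each post is an independent task, since no post needs another post's result, while a running total is sequential, since each step depends on the one before it.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(document):
    """Independent task: the result depends only on this one document."""
    return len(document.split())


def running_totals(counts):
    """Sequential task: each step needs the total from the step before it."""
    totals = []
    total = 0
    for count in counts:
        total += count
        totals.append(total)
    return totals


# Hypothetical stand-ins for social media posts (not real data).
documents = [
    "vaccines are safe and effective",
    "climate change is real",
    "new study on gene editing released today",
]

# Independent tasks can run side by side. Here they share a local thread pool;
# on a high throughput system, each could be submitted as its own job.
with ThreadPoolExecutor(max_workers=3) as pool:
    word_counts = list(pool.map(count_words, documents))

# The sequential step must wait for all counts and then proceed in order.
cumulative = running_totals(word_counts)

print(word_counts)
print(cumulative)
```

Work of the first kind is what high throughput computing handles well: the more independent pieces a problem splits into, the more of them can run at the same time.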
This learning, Chen noted, helped students convert theoretical, class-based knowledge into practical abilities, including learning how to approach computational tasks that could be useful in future work. Two examples of such complex computational tasks are structural topic models (STMs) and regression models. An STM uses unsupervised machine learning to identify keywords and major themes across a large corpus, which can then be interpreted in human-readable form for data analysis. STMs are also useful for comparing social media influencer and non-influencer perspectives on science issues.
The majority of the students in the class, while new to CHTC resources, found the class to be a good introduction to HTC. Ph.D. student Ashley Cate from Life Sciences Communication (LSC) was a prime example. “I am still an extreme novice when it comes to understanding all the options CHTC has to offer. However, one thing that Christina Koch made very clear is that you’re not alone in your endeavor of utilizing HTC to meet your research needs, and I feel very confident that the professionals would be able to work me through how CHTC could help me.” LSC master’s student Jocelyn Cao reported that “I do think I will be utilizing CHTC in my future work because I am interested in doing work with social media.”
Other campus groups have also reached out to Koch to learn about CHTC services for their research. Lindley’s research group, made up of undergraduate students, M.S. and Ph.D. candidates, and postdocs working on nuclear reactor physics, advanced reactor design and integrated energy systems, wanted to understand how to harness the power of HPC/HTC in their research.
Ben Lindley, UW-Madison Engineering Physics assistant professor, has utilized CHTC in his previous work to build software. With the assistance of postdoc Una Baker, Lindley sought the help of CHTC… “One of the beauties of the high throughput computing resources is that we can analyze dozens or hundreds of cases in parallel,” Lindley said. These cases represent scenarios in which certain design features of nuclear reactors are modified and the resulting changes observed. “Without HTC, the scope of research could be very limited. Computers could crash and tasks could take too long to complete.”
In-person demonstrations with classrooms and research groups are always available at CHTC to UW-Madison researchers looking to expand computing beyond local resources. Koch noted that “we are always happy to meet with course instructors who are interested in including large scale computing in their courses, to share different ways we can support our goals.”
Contact CHTC here.