How the GLUE Lab is bringing the potential of HTC to track the movement of cattle and land use change
Sarah Matysiak December 15, 2023
Researching land use change in the cattle sector is just one of several large projects where the GLUE Lab is working to apply HTC.
It was during a Data Science Research Bazaar presentation led by OSG Research Facilitation Lead Christina Koch in early 2023 when Matthew Christie, the Technical Lead of the Global Land Use and Environment Lab (GLUE) based in Madison, Wisconsin, says the GLUE Lab became more familiar with the Center for High Throughput Computing (CHTC). “That planted the seed for what the center [CHTC] could offer us,” Christie says.
The GLUE Lab studies how land across the world is being used for agriculture and the systems responsible for land use change. Christie — who researches land use in Brazil with a focus on how the Amazon and Cerrado biomes are changing as natural vegetation recedes — takes data describing the cattle supply chain in Brazil and integrates it into a single database the GLUE Lab can use for research. With this data, the lab also aims to inform policy decisions by the Brazilian government and international companies.
In the Amazon, Christie says, one of the main systems causing land use change is in the cattle sector, or the production of cattle. “One of the motivating facts of our research is that 80% of forest cleared in the Amazon is cleared in order to raise cattle. And so we’re interested in understanding the cattle supply chain, how it operates, and what it looks like.” The lab gets its data from the Rural Environmental Registry (CAR), which is a public property boundary registry data from Brazil, and the Guide to Animal Transport (GTA), which records animal movement and sales in Brazil.
The possibilities of utilizing high throughput computing (HTC) for the lab’s research intrigued Christie, who had some awareness of HTC from the research bazaar and had even started refactoring some of the lab’s data pipeline before attending, but he wanted to learn more besides what he gained from watching introductory tutorials. Christie was accepted and attended the OSG School in the summer of 2023. He and other lab members believed their work could benefit from the school training with HTCondor, the workload management application developed by the CHTC for HTC, and the associated big data sets with a large number of jobs.
Upon realizing the lab’s work could greatly benefit from the OSG School, Christie used a “test case” project that resembled a standard research project to model a task with many independent trials, finding how — for the first time — HTC could prove itself resourceful for GLUE Lab research. The specific project Christie worked on during the School using HTC was to compute simulated journeys of cows through properties in Brazil’s cattle supply chain. By the end of the week-long School, Christie says using HTC scaled up the modeling project by a factor of 10. In this sense, HTC is the “grease that makes our research run more smoothly.”
Since attending the School, witnessing the test case’s success with HTC, and discovering ways its other research projects could benefit, the GLUE Lab has begun shifting to applying HTC. However, this process requires pipeline changes lab members are currently working through. “We have been in the process of working through some of our big projects that we think really could benefit from these resources, but that in itself has a cost. Currently, we’re still in the process of writing or refactoring our pipelines to use HTC,” Christie elaborates.
For a current project, Christie mentions he and other GLUE Lab members are looking at how to adapt their code to HTC without having to rewrite all of it. With the parallelism that HTC offers compared to the single computing environment the lab used before to run its data pipeline, each job now has its own environment. But it’s complex “leveraging the parallelism in our database build pipeline. Working on that is an exercise, but with handling data, there are many dependencies, and you have to figure out how to model them.” Christie says lab members are working on adjusting the workflow to ensure each job has the data it needs before it can run. While this can sometimes be straightforward, “sometimes a step in the pipeline has special inputs that are unique to it. With many steps in the pipeline, properly tracking and preparing all this data has been the main source of work to get the pipeline to run fully using HTC.”
For now, Christie says cutting down the two-day run time of their database build pipeline to just a matter of hours with HTC “would be a wonderful improvement that would accelerate deployment and testing of this database. It would let us introduce new features and catch bugs faster.”
Christie recognizes the strength of the CHTC comes from not only its limitless computation power but also the humans who are running it behind the screen and that it’s free for researchers at UW–Madison, distinguishing it from other platforms and drastically lowering the entry barrier for researchers who want to scale up their research projects — “Instead of waiting months or years to receive funding for cloud resources, they can request an account and get started in a matter of weeks,” Christie says.
Christie values the unique opportunity to attend office hours and meet with facilitators, which makes his experience special. “I would definitely recommend that people look at this invaluable resource that we have on campus. Whether your work is with high throughput or high performance computing, there are offerings for both that researchers should consider,” Christie says.