Data is in our DNA. When we practice and promote “quantified conservation,” data is where it begins. We use it to identify the most impactful and cost-effective places on the landscape to solve the most pressing freshwater issues. Our data-driven approach underpins our ability to assess watersheds, prioritize restoration actions, and monitor benefits over time. We chatted with leaders of our Science & Analytics Team to learn more. Dylan Harp, lead technical scientist, and Mark Porter, science director, describe how The Freshwater Trust skillfully manages and combines data to gain insight into freshwater ecosystems.
How do you define the role of a data analyst?
Mark: Every person on our team is an analyst. Each day we collect large amounts of data. The data is from multiple sources and arrives in multiple formats, mainly tabular and geospatial. Our first task is to organize the data so that we can use it.
Dylan: Our team has people who specialize in different areas of analysis. We have the analysts who crank through numbers and do the science-based analyses. We have analysts who build our databases, getting data in the right spots and organized correctly. There are Geographic Information Systems (GIS) analysts, who assemble layers of data onto maps. On the economic side, we have analysts combing through data to make sure we're quantifying conservation costs as accurately as possible. Typically, team members can work across several disciplines.
Mark: All this data means we have to find ways to work efficiently. We spend a good chunk of time on data wrangling, transforming and mapping data from raw forms into other formats. Our analysts write scripts and procedures that pull data automatically from multiple sources. When we need to retrieve new data, the scripts document a repeatable workflow that helps us get the data into the right format and into our models.
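To make the idea of a repeatable wrangling script concrete, here is a minimal Python sketch. It assumes two hypothetical sources ("county_survey" and "state_registry") that describe the same farm fields with different column names and area units; the source names, column names, and common schema are illustrative only, not TFT's actual data or code.

```python
import csv
import io

# Hypothetical column mappings: each raw source names the same fields differently.
SOURCE_SCHEMAS = {
    "county_survey": {"FieldID": "field_id", "Crop": "crop", "Acres": "area_ac"},
    "state_registry": {"id": "field_id", "crop_type": "crop", "hectares": "area_ha"},
}

HA_TO_AC = 2.47105  # acres per hectare


def normalize(source: str, raw_csv: str) -> list[dict]:
    """Map one raw CSV export onto a common schema (field_id, crop, area_ac, source)."""
    mapping = SOURCE_SCHEMAS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_csv)):
        # Keep only the columns we know how to map, renamed to the common schema.
        rec = {mapping[k]: v.strip() for k, v in raw.items() if k in mapping}
        # Convert area to acres so every record shares one unit.
        if "area_ha" in rec:
            rec["area_ac"] = round(float(rec.pop("area_ha")) * HA_TO_AC, 2)
        else:
            rec["area_ac"] = float(rec["area_ac"])
        rec["crop"] = rec["crop"].lower()  # consistent crop labels across sources
        rec["source"] = source  # record provenance for later checks
        rows.append(rec)
    return rows
```

Because every source passes through the same function, rerunning it on a fresh download reproduces the same cleaned format, which is the point of scripting the workflow rather than reformatting by hand.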
What types of modeling and analysis do you do?
Dylan: All our work boils down to improving freshwater ecosystems. I spend a lot of time on agricultural modeling. We collect loads of information about farm fields in a certain area. We quantify the volume of water used to irrigate the fields, the amount of runoff from the fields, and how this affects adjacent ecosystems. But we don’t stop there. We also estimate the cost of upgrades that would mitigate those impacts, calculate the cost-effectiveness of each project, and rank the projects.
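The cost-effectiveness ranking described above can be sketched in a few lines of Python. This assumes cost-effectiveness is measured as dollars per unit of modeled benefit (for example, pounds of pollutant avoided per year); the `Project` class, the metric, and the numbers in the usage example are hypothetical, not TFT's actual method or data.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cost_usd: float  # estimated cost of the upgrade (hypothetical)
    benefit: float   # modeled benefit, e.g. pounds of pollutant avoided per year

    @property
    def cost_per_unit(self) -> float:
        # Lower is better: dollars spent per unit of benefit.
        return self.cost_usd / self.benefit


def rank_projects(projects):
    """Rank candidate projects from most to least cost-effective."""
    # Skip projects with no modeled benefit to avoid dividing by zero.
    viable = [p for p in projects if p.benefit > 0]
    return sorted(viable, key=lambda p: p.cost_per_unit)


candidates = [
    Project("Field 12 sprinkler upgrade", 120_000, 400),  # $300 per unit
    Project("Field 3 drip upgrade", 90_000, 450),          # $200 per unit
    Project("Field 8 sprinkler upgrade", 60_000, 100),     # $600 per unit
]
ranked = rank_projects(candidates)
```

Ranking by cost per unit of benefit, rather than by cost or benefit alone, surfaces the projects that deliver the most improvement for a given budget.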
Mark: My background is in hydrology. When we tackle groundwater questions, we start with trusted, credible sources, such as regional groundwater models from the U.S. Geological Survey. For an analysis in central Oregon, I started with a USGS model and ran a series of particle-tracking simulations that gave us travel times of water moving underground from agricultural fields into rivers. The goal here is to help us determine the depth and direction of contaminant movement in the groundwater, which feeds into our larger goal of taking actions aboveground to reduce the amount of field-level pollutants flowing into rivers and streams.
What are some of the ways TFT incorporates complex sets of data into its analysis?
Dylan: Our general workflow is sequential. We collect large quantities of data; process and format the information; upload it to the database; query the data; pull that dataset to run our analysis; collect the results; push those results back up to the database; and then perform prioritizations. It’s a long chain of sequential steps, and it leads us to key insights that we visualize for our partners.
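The sequential workflow Dylan describes can be sketched with Python's built-in `sqlite3` module standing in for the database. The table names, the field attributes, and the placeholder "analysis" (converting runoff depth in inches over a field's acreage into acre-feet) are all hypothetical simplifications chosen to illustrate the order of steps, not TFT's actual models.

```python
import sqlite3


def run_pipeline(raw_records):
    """Sketch of a sequential workflow: format, upload, query, analyze, store, prioritize."""
    con = sqlite3.connect(":memory:")  # in-memory stand-in for a real database
    con.execute("CREATE TABLE fields (field_id TEXT PRIMARY KEY, acres REAL, runoff_in REAL)")
    con.execute("CREATE TABLE results (field_id TEXT PRIMARY KEY, runoff_acft REAL)")

    # Process and format the collected records, then upload them.
    cleaned = [
        (r["id"].strip().upper(), float(r["acres"]), float(r["runoff_in"]))
        for r in raw_records
    ]
    con.executemany("INSERT INTO fields VALUES (?, ?, ?)", cleaned)

    # Query the database and pull the dataset for analysis.
    rows = con.execute("SELECT field_id, acres, runoff_in FROM fields").fetchall()

    # Run a placeholder analysis: runoff volume (acre-feet) = acres * inches / 12.
    results = [(fid, acres * runoff_in / 12.0) for fid, acres, runoff_in in rows]

    # Push the results back up to the database.
    con.executemany("INSERT INTO results VALUES (?, ?)", results)

    # Prioritize: rank fields by modeled runoff volume, largest first.
    ranked = con.execute(
        "SELECT field_id, runoff_acft FROM results ORDER BY runoff_acft DESC"
    ).fetchall()
    con.close()
    return ranked


raw = [
    {"id": "a1", "acres": "40", "runoff_in": "3"},
    {"id": "b2", "acres": "120", "runoff_in": "1.5"},
    {"id": "c3", "acres": "80", "runoff_in": "0.6"},
]
ranking = run_pipeline(raw)
```

Storing results back in the database, rather than only in memory, means later steps such as prioritization and visualization can query one shared source of truth.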
Mark: We have different expectations when using publicly available datasets. Data based on 30-meter by 30-meter grids (or larger) is helpful for high-level screening of a large geography, such as an entire river basin covering many square miles. But this data is often based on averages, with some of the nuances smoothed out. At this stage, the data allows us to identify hot spots within a basin and pick places for more detailed analysis. When we go down to a finer scale, we often update the data. For example, we confirm crop rotations with farmers in an area and run our models again for higher accuracy.
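A simple way to picture coarse screening is to flag the highest-valued cells in a grid for follow-up. The minimal Python sketch below assumes each cell holds one modeled value (for example, an estimated pollutant load per 30-meter cell) and that "hot spots" are simply the top fraction of cells; both the value and the top-fraction rule are illustrative assumptions, not TFT's screening criteria.

```python
def find_hot_spots(grid, top_fraction=0.1):
    """Flag the highest-valued cells in a coarse screening grid.

    grid: list of rows, each a list of per-cell values (e.g. modeled load per 30 m cell).
    top_fraction: share of cells to flag for more detailed follow-up (assumed rule).
    Returns (row, col) indices of flagged cells, sorted by position.
    """
    cells = [(value, r, c) for r, row in enumerate(grid) for c, value in enumerate(row)]
    n_flag = max(1, round(len(cells) * top_fraction))
    cells.sort(reverse=True)  # highest values first
    return sorted((r, c) for _, r, c in cells[:n_flag])


grid = [
    [1, 2, 3],
    [4, 50, 6],
    [7, 8, 90],
]
hot = find_hot_spots(grid, top_fraction=0.25)  # 9 cells * 0.25 rounds to 2 cells
```

The flagged cells only mark where to look more closely; at a finer scale, the underlying values would be replaced with locally verified data before the models are rerun.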
Is TFT using artificial intelligence? Why or why not?
Dylan: Yes, we’re looking at responsible ways to apply artificial intelligence (AI) to our work. We want to make sure we're choosing the right solution for the problem.
One simple way is by using AI agents to help us write code. It's not much different from what we used to do with basic Google searches, but the agent synthesizes more information more quickly.
We also use machine learning (ML), a form of AI, to help us scan features within watersheds and landscapes more quickly. We use it to outline thousands of agricultural fields and identify the types of crops on those fields. Here we've leveraged ML algorithms and added our own post-processing procedures to their output.
We have a scholarship from Google to investigate irrigation classification. Google thought our work with irrigation classification looked promising, so they’ve given us the opportunity to refine how AI identifies irrigation types from satellite data. Even as humans, we can’t always tell the difference between flood, sprinkler, and drip irrigated fields in the images. We’re investigating whether we can train a machine to spot the differences.
What are some of TFT's current uses of machine learning to improve water resources?
Mark: One specific example relates to water rights data in Colorado that we need but that isn’t in a usable format. The data is contained in thousands of pages of documentation from the 1960s and 1970s that have been digitized as PDFs. Our team is using machine learning to process those PDFs and pull the data out into tables. Using AI in this case is extremely efficient; extracting the data manually would have taken hundreds of hours. The goal is to use this data as part of our analysis in a river basin that is struggling to balance agricultural and municipal water usage.
Dylan: Each time we incorporate ML into our analysis, we pay close attention to the accuracy of the results. A lot of this is iterative. We’re learning. We’re applying. We’re learning again, and reapplying.
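One common way to check the accuracy of ML results is to compare predicted labels against labels that have been verified independently, for example crop types confirmed with farmers. The minimal Python sketch below computes overall accuracy and per-class recall; the metrics are standard, but the function name and the example labels are hypothetical and do not describe TFT's actual validation procedure.

```python
from collections import Counter


def accuracy_report(predicted, observed):
    """Compare ML-predicted labels with verified labels for the same fields.

    Returns overall accuracy and per-class recall (share of each verified
    class that the model labeled correctly).
    """
    if not observed or len(predicted) != len(observed):
        raise ValueError("need equal-length, non-empty label lists")
    pairs = list(zip(predicted, observed))
    overall = sum(p == o for p, o in pairs) / len(pairs)
    totals = Counter(observed)                        # verified fields per class
    hits = Counter(o for p, o in pairs if p == o)     # correct predictions per class
    per_class = {cls: hits[cls] / n for cls, n in totals.items()}
    return overall, per_class


predicted = ["corn", "wheat", "corn", "alfalfa"]
observed = ["corn", "corn", "corn", "alfalfa"]
overall, per_class = accuracy_report(predicted, observed)
```

Per-class recall matters alongside overall accuracy: a model can score well overall while still missing a rare but important class, which is the kind of gap an iterative learn-and-reapply cycle is meant to catch.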
Does TFT incorporate expert knowledge from people into our tools?
Mark: Our output is only as good as our input. For example, in the Snake River basin in Oregon and Idaho, we work directly with agricultural producers and irrigation equipment suppliers to verify our crop-rotation datasets for the previous seven years. More accurate data allows us to re-run our modeling to determine the best places to upgrade irrigation methods. More efficient irrigation leads to less water and fertilizer running off fields, which supports the shared watershed-wide goal of reducing the pollutants in the river that contribute to the formation of toxic methylmercury.
How do people use the insights provided by TFT's analytics?
Mark: Nobody wants to look at a table of thousands of lines of data. You can't interpret it, right? So, we visualize data and scenarios in different maps, web applications, and decision-support tools. This blends data analysis with design in a way that helps the data tell a story.
Dylan: We want our partners and clients to use these apps to help them reach informed conclusions and take specific actions that have measurable, positive impacts on freshwater ecosystems.
What part of your job do you like the best?
Mark: Each day is something new. The challenges keep me on my toes. TFT is an amazing group of people who are really fun to work with.
Dylan: I like figuring out new ways to solve problems. I can start with a kernel of an idea, maybe about how to model agricultural drains, and push that idea forward by prototyping a new set of capabilities, which helps us analyze something that we couldn't before. In the end, we want to provide useful solutions.
See also: BasinScout: Surveying Entire Watersheds to Prioritize Conservation Actions