Tutorial: Batch Processing on Windows¶
Introduction¶
These are the steps needed to use the batch scripting framework for InVEST
models available in the natcap.invest Python package.
For those wishing to do batch-processing with InVEST without setting up a Python scripting environment, see The InVEST CLI for examples of how to run InVEST models from the command-line.
Setting up your Python environment¶
1. Install Python 3.6 or later.

   Python can be downloaded from python.org. When installing, be sure to allow python.exe to be added to the PATH in the installation options.

2. Put pip on the PATH.

   The pip utility for installing Python packages is already included with Python 2.7.9 and later. Be sure to add C:\Python37\Scripts (or your custom install location) to the Windows PATH environment variable so that pip can be called from the command line without needing to use its full path.

   After this is done (and you’ve opened a new command-line window), you will be able to use pip at the command line to install packages like so:

   > pip install <packagename>

3. Install packages needed to run InVEST.

   Most (maybe even all) of these packages can be downloaded as precompiled wheels from Christoph Gohlke’s build page. Others should be installable via pip install <packagename>.

   #
   # Any lines with "# pip-only" at the end will be processed by
   # scripts/convert-requirements-to-conda-yml.py as though it can only be found
   # on pip.
   GDAL>=3.1.2,!=3.3.0  # 3.3.0 had a bug that broke our windows builds: https://github.com/OSGeo/gdal/issues/3898
   Pyro4==4.77  # pip-only
   pandas>=1.2.1
   numpy>=1.11.0,!=1.16.0
   Rtree>=0.8.2,!=0.9.1
   Shapely>=1.7.1,<2.0.0
   scipy>=1.6.0  # pip-only
   pygeoprocessing>=2.3.2  # pip-only
   taskgraph[niced_processes]>=0.10.3  # pip-only
   psutil>=5.6.6
   chardet>=3.0.4
   openpyxl

4. Install the InVEST Python package.

   4a. Download a release of the natcap.invest Python package.

   4b. Install the downloaded Python package.

   win32.whl files are prebuilt binary distributions and can be installed via pip. See the pip docs for installing a package from a wheel.

   .zip and .tar.gz files are source archives. See Installing from Source for details, including how to install specific development versions of natcap.invest.
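Before moving on, it can help to confirm that the packages listed above are importable from the interpreter you plan to use. The following is a minimal sketch and not part of InVEST: the helper name find_missing_modules is hypothetical, and the import names in the example list (such as osgeo for GDAL) are assumptions about how each distribution is imported.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == '__main__':
    # Assumed top-level import names for some of the packages listed above.
    candidates = ['osgeo', 'numpy', 'scipy', 'pandas', 'shapely', 'rtree',
                  'pygeoprocessing', 'taskgraph', 'psutil', 'chardet',
                  'openpyxl']
    missing = find_missing_modules(candidates)
    if missing:
        print('Not importable:', ', '.join(missing))
    else:
        print('All listed modules were found.')
```

Only top-level module names are checked here, so the sketch reports whether a package can be found, not whether its version satisfies the constraints above.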
Creating Sample Python Scripts¶
Launch InVEST Model
Once an InVEST model is selected for scripting, launch that model from the Windows Start menu. The example in this guide follows the NDR model.
Fill in InVEST Model Input Parameters
Once the user interface loads, populate the model inputs that are likely to be used in the Python script. For testing purposes, InVEST’s default data is appropriate. However, if you plan to write a batch script for several InVEST runs, it is reasonable to populate the user interface with the data for the first run.
Generate a sample Python Script from the User Interface
Open the Development menu at the top of the user interface, select “Save to python script…”, and save the file to a known location.
Execute the script in the InVEST Python Environment
Launch Windows PowerShell from the Start menu (type “powershell” in the search box), then invoke the Python interpreter on the InVEST Python script from that shell. In this example the Python interpreter is installed at C:\Python37\python.exe and the script was saved in C:\Users\rpsharp\Desktop\ndr.py, so the command to invoke the interpreter is:

> C:\Python37\python.exe C:\Users\rpsharp\Desktop\ndr.py
Output Results
As the model executes, status information will be printed to the console. Once complete, model results can be found in the workspace folder selected during the initial configuration.
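The same invocation can also be driven from Python, which is convenient when several saved scripts need to run one after another. This is a minimal sketch using only the standard library. The helper name run_script is hypothetical, and it assumes the interpreter that runs it (or the one passed as python_exe) has natcap.invest installed.

```python
import subprocess
import sys


def run_script(script_path, python_exe=sys.executable):
    """Run a Python script in a separate interpreter and return its exit code."""
    # Output from the script is printed to the console as it runs.
    completed = subprocess.run([python_exe, script_path])
    return completed.returncode


# Example call, using the paths from this guide:
# exit_code = run_script(r'C:\Users\rpsharp\Desktop\ndr.py',
#                        python_exe=r'C:\Python37\python.exe')
```

An exit code of 0 normally means the script finished without an uncaught exception.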
Modifying a Python Script¶
InVEST Python scripts consist of two sections:
The argument dictionary that represents the model’s user interface input boxes and parameters.
The call to the InVEST model itself.
For reference, consider the following script generated by the Nutrient model with default data inputs:
"""
This is a saved model run from natcap.invest.ndr.ndr.
Generated: Mon 16 May 2016 03:52:59 PM
InVEST version: 3.3.0
"""
import natcap.invest.ndr.ndr
# Paths are raw strings (r'...') so that backslashes stay literal; without
# this, the \n in \ndr_workspace would be read as a newline.
args = {
    u'k_param': u'2',
    u'runoff_proxy_uri': r'C:\InVEST_3.3.0_x86\Base_Data\Freshwater\precip',
    u'subsurface_critical_length_n': u'150',
    u'subsurface_critical_length_p': u'150',
    u'subsurface_eff_n': u'0.8',
    u'subsurface_eff_p': u'0.8',
    u'threshold_flow_accumulation': u'1000',
    u'biophysical_table_uri': r'C:\InVEST_3.3.0_x86\WP_Nutrient_Retention\Input\water_biophysical_table.csv',
    u'calc_n': True,
    u'calc_p': True,
    u'suffix': '',
    u'dem_uri': r'C:\InVEST_3.3.0_x86\Base_Data\Freshwater\dem',
    u'lulc_uri': r'C:\InVEST_3.3.0_x86\Base_Data\Freshwater\landuse_90',
    u'watersheds_uri': r'C:\InVEST_3.3.0_x86\Base_Data\Freshwater\watersheds.shp',
    u'workspace_dir': r'C:\InVEST_3.3.0_x86\ndr_workspace',
}

if __name__ == '__main__':
    natcap.invest.ndr.ndr.execute(args)
Elements to note:
Parameter Python Dictionary: The key element is the ‘args’ dictionary. Note the similarities between keys such as ‘workspace_dir’ and the equivalent “Workspace” input parameter in the user interface. Every key in the ‘args’ dictionary has a corresponding reference in the user interface.

In the example below we’ll modify the script to execute the nutrient model for a parameter study of ‘threshold_flow_accumulation’.

Execution of the InVEST model: The InVEST API invokes models with a consistent syntax where the module name that contains the InVEST model is listed first and is followed by a function called ‘execute’ that takes a single parameter called ‘args’. This parameter is the dictionary of input parameters discussed above. In this example, the line

natcap.invest.ndr.ndr.execute(args)

executes the nutrient model end-to-end. If the user wishes to make batch calls to InVEST, this line will likely be placed inside a loop.
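When the execute call moves into a loop, one way to keep runs from affecting each other is to build a fresh copy of the argument dictionary for every run instead of editing a shared one in place. The following is a minimal sketch of that idea. The helper name build_run_args is hypothetical and not part of natcap.invest, and the dictionary values are placeholders.

```python
import copy


def build_run_args(base_args, **overrides):
    """Return a copy of base_args with the given keys replaced."""
    run_args = copy.deepcopy(base_args)
    run_args.update(overrides)
    return run_args


# Placeholder values standing in for a full args dictionary.
base_args = {
    'workspace_dir': 'ndr_workspace',
    'threshold_flow_accumulation': '1000',
    'suffix': '',
}

run_args = build_run_args(
    base_args, threshold_flow_accumulation=500, suffix='accum500')
# run_args would then be passed to natcap.invest.ndr.ndr.execute(run_args);
# base_args is left unchanged for the next run.
```

Copying keeps the original inputs available for later runs. Whether that matters depends on how many keys a batch script changes per run.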
Example: Threshold Flow Accumulation Parameter Study¶
This example executes the InVEST NDR model on 11 values of threshold flow accumulation, stepping from 500 to 1000 pixels in steps of 50. To modify the script above, replace the execution call with the following loop:
if __name__ == '__main__':
    # Loops through the values 500, 550, 600, ... 1000
    for threshold_flow_accumulation in range(500, 1001, 50):
        # set the accumulation threshold to the current value in the loop
        args['threshold_flow_accumulation'] = threshold_flow_accumulation
        # set the suffix to be accum### for the current threshold_flow_accumulation
        args['suffix'] = 'accum' + str(threshold_flow_accumulation)
        natcap.invest.ndr.ndr.execute(args)
This loop executes the InVEST nutrient model 11 times, once for each of the accumulation values 500, 550, 600, ... 1000, and adds a suffix to the output files so results can be distinguished.
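The values the loop visits, and the suffixes it produces, can be checked without running the model. This short sketch only evaluates the same range expression used in the loop:

```python
# The same range expression used in the parameter study loop.
thresholds = list(range(500, 1001, 50))
suffixes = ['accum' + str(value) for value in thresholds]

print(len(thresholds))                # 11 model runs
print(thresholds[0], thresholds[-1])  # 500 1000
print(suffixes[:3])                   # ['accum500', 'accum550', 'accum600']
```

The stop value is 1001 rather than 1000 because range excludes its stop value. With a stop of 1000, the final run at 1000 pixels would be skipped.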
Example: Invoke NDR Model on a directory of Land Cover Maps¶
In this case we invoke the InVEST nutrient model on a directory of land cover data located at C:\User\Rich\Desktop\landcover_data. As in the previous example, replace the execution call at the end of the UI-generated Python script with:
import os

landcover_dir = r'C:\User\Rich\Desktop\landcover_data'

if __name__ == '__main__':
    # Loop over all the filenames in the landcover dir
    for landcover_file in os.listdir(landcover_dir):
        # Point the landuse uri parameter at the directory+filename
        args['lulc_uri'] = os.path.join(landcover_dir, landcover_file)
        # Make a useful suffix so we can differentiate the results
        args['suffix'] = 'landmap' + os.path.splitext(landcover_file)[0]
        # call the nutrient model
        natcap.invest.ndr.ndr.execute(args)
This loop covers all the files located in C:\User\Rich\Desktop\landcover_data
and updates the relevant lulc_uri key in the args dictionary to each of those
files during execution, as well as making a useful suffix so output files can
be distinguished from each other.
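Because os.listdir returns every entry in the directory, the loop would also try to run the model on any sidecar or unrelated files stored next to the land cover maps. The following is a minimal sketch of one way to filter the listing first. The helper name list_landcover_rasters is hypothetical, and the default of .tif assumes the land cover maps are GeoTIFF files, so adjust the extensions to match your data.

```python
import os


def list_landcover_rasters(landcover_dir, extensions=('.tif',)):
    """Return sorted full paths of files in landcover_dir with a matching extension."""
    raster_paths = []
    for filename in sorted(os.listdir(landcover_dir)):
        full_path = os.path.join(landcover_dir, filename)
        # Compare extensions case-insensitively so .TIF also matches.
        extension = os.path.splitext(filename)[1].lower()
        if os.path.isfile(full_path) and extension in extensions:
            raster_paths.append(full_path)
    return raster_paths


# Possible use inside the batch loop:
# for landcover_path in list_landcover_rasters(landcover_dir):
#     args['lulc_uri'] = landcover_path
```

Sorting the listing also makes the order of runs predictable from one execution to the next.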
Example: Saving model log messages to a file¶
There are many cases where you may want or need to capture all of the log
messages generated by the model. When we run models through the InVEST user
interface application, the UI captures all of this logging and saves it to a
logfile. We can replicate this behavior through the Python logging package
by adding the following code just after the import statements in the example
script.
import logging
import pygeoprocessing
# Write all NDR log messages to logfile.txt
MODEL_LOGGER = logging.getLogger('natcap.invest.ndr')
handler = logging.FileHandler('logfile.txt')
MODEL_LOGGER.addHandler(handler)
# log pygeoprocessing messages to the same logfile
PYGEO_LOGGER = logging.getLogger('pygeoprocessing')
PYGEO_LOGGER.addHandler(handler)
This will capture all logging generated by the ndr model and by
pygeoprocessing, writing all messages to logfile.txt. While this is a common
use case, the logging package provides functionality for many more complex
logging features. For more advanced use of the Python logging module, refer to
the Python project’s Logging Cookbook.
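As one example of those additional features, a handler can be given a message format, and a logger can be given an explicit level. The following is a minimal sketch using only the standard logging package. The helper name attach_file_handler and the format string are illustrative choices, not something InVEST prescribes.

```python
import logging


def attach_file_handler(logger_names, logfile_path, level=logging.INFO):
    """Send messages from the named loggers to logfile_path with timestamps."""
    handler = logging.FileHandler(logfile_path)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(name)s %(levelname)s %(message)s'))
    for name in logger_names:
        logger = logging.getLogger(name)
        # Without an explicit level, a logger inherits the root logger's
        # level, which is WARNING unless something else has configured it.
        logger.setLevel(level)
        logger.addHandler(handler)
    return handler


# Possible use with the loggers from the example above:
# handler = attach_file_handler(
#     ['natcap.invest.ndr', 'pygeoprocessing'], 'logfile.txt')
```

Setting the level explicitly matters when nothing else has configured logging, because messages below the effective level are dropped before they reach any handler.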
Example: Enabling Parallel Processing¶
Note
This is an in-development feature and should be used with caution.
Most InVEST models accept an optional entry in the args dictionary,
n_workers, representing the number of parallel workers. Acceptable values for
this number are:
-1, representing synchronous execution (this is the default across InVEST)
0, representing threaded task management
Any positive integer, representing the number of processes that will be created to handle tasks. 2*multiprocessing.cpu_count() is usually a good number.
Warning
If you use this feature, you must wrap your script in an
if __name__ == '__main__': condition. Failure to do so will result
in a fork bomb (https://en.wikipedia.org/wiki/Fork_bomb).
Using the parameter study example, this might look like:
if __name__ == '__main__':
    args['n_workers'] = 4  # Use 4 processes
    # Loops through the values 500, 550, 600, ... 1000
    for threshold_flow_accumulation in range(500, 1001, 50):
        # set the accumulation threshold to the current value in the loop
        args['threshold_flow_accumulation'] = threshold_flow_accumulation
        # set the suffix to be accum### for the current threshold_flow_accumulation
        args['suffix'] = 'accum' + str(threshold_flow_accumulation)
        natcap.invest.ndr.ndr.execute(args)
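The worker count can be validated before a long batch run starts, so a mistyped value fails immediately instead of partway through. This is a minimal sketch under the rules listed above (-1, 0, or a positive integer). The helper name resolve_n_workers is hypothetical, and falling back to 2*multiprocessing.cpu_count() when no value is given is a choice made for this example, not InVEST's default.

```python
import multiprocessing


def resolve_n_workers(requested=None):
    """Return a value suitable for args['n_workers'].

    None selects 2 * cpu_count(); -1, 0 and positive integers pass through.
    """
    if requested is None:
        return 2 * multiprocessing.cpu_count()
    if isinstance(requested, bool) or not isinstance(requested, int):
        raise TypeError('n_workers must be an integer')
    if requested < -1:
        raise ValueError('n_workers must be -1, 0, or a positive integer')
    return requested


# Possible use inside the guarded block of the script above:
# args['n_workers'] = resolve_n_workers(4)
```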
Summary¶
The InVEST scripting framework was designed to assist InVEST users in automating batch runs or adding custom functionality to the existing InVEST software suite. Support questions can be directed to the NatCap support forums at http://community.naturalcapitalproject.org.