Peter D. Wilson

Weed methods

Species lists

The Australian Plant Name Index (APNI) was used as the taxonomic authority for all work reported on this website.

Four species lists were used as the first source of species to be modelled. The lists included:

Weeds of National Significance (WoNS) list: A review process by weed experts resulted in a "Top 20" list of major weed species, which has been formally adopted or "declared" as the Weeds of National Significance list. In the process of preparing the declared list, 50 other species were evaluated and are sometimes referred to as the WoNS shortlist. I used the full WoNS list (i.e. declared + shortlist). Further details of the process of making the list, and of the species included on the declared list and shortlist, are available at Weeds Australia and Weeds in Australia.
Alert list: An alternate list of priority species is the Alert list. The list of species together with important biological and management information is available at Weeds in Australia.
Sleeper weeds: Yet another list of priority species, this time attempting to rank weeds that are established and could emerge in the near future as a major threat to agriculture and conservation. The species list together with important biological and management information is available at Weeds in Australia.
Australian Weeds Committee list: The Australian Weeds Committee (AWC) maintains a consolidated list of all weed taxa listed under noxious plant legislation in the Australian states and territories. I downloaded the July 2009 version of the list as a PDF file (see latest version at www.weeds.org.au/noxious.htm). I converted the PDF table to a text table and cleaned the data to provide a list of taxa. The AWC list contains some typographical errors (e.g. Redesa phyteuma instead of the correct Reseda phyteuma) and many superseded or changed species names. The taxonomic discrepancies were rectified using APNI data.
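The text above does not say how the misspelt names were matched to APNI. As one possible illustration only, the sketch below uses approximate string matching from Python's standard library to suggest an authority name for each raw name. The function name, the cutoff value and the tiny authority list are all hypothetical, and any suggested match would still need checking by hand, since a close spelling is not proof of the same taxon.

```python
import difflib


def reconcile_names(raw_names, authority_names, cutoff=0.8):
    """Suggest an authority name for each raw name, or None if nothing is close.

    This is an illustrative helper, not the procedure used for the AWC list.
    """
    # Compare on lower-cased names but return the authority's own spelling.
    lookup = {name.lower(): name for name in authority_names}
    resolved = {}
    for raw in raw_names:
        # Collapse repeated whitespace left over from PDF-to-text conversion.
        key = " ".join(raw.split()).lower()
        if key in lookup:
            resolved[raw] = lookup[key]
            continue
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        resolved[raw] = lookup[close[0]] if close else None
    return resolved


# Hypothetical three-name stand-in for the authority list.
authority = ["Reseda phyteuma", "Reseda lutea", "Lantana camara"]
suggestions = reconcile_names(
    ["Redesa phyteuma", "Lantana  camara", "Unknownus plantus"], authority
)
```

Names that come back as None are the ones most likely to be superseded names rather than typing errors, and would need a synonym lookup rather than spelling correction.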

Species occurrence data

Two on-line public data portals were used to collate occurrence data for this study. They are:

The Global Biodiversity Information Facility (GBIF)
Australia's Virtual Herbarium (AVH), accessed directly or via the search facilities of the Atlas of Living Australia (ALA).

Web services provided by GBIF and ALA allow automated downloading of occurrence data for a species. I have been developing R-scripts to do this, plus first-pass data cleaning, for the past 3 years. Recent changes to ALA now make fully automated capture of species occurrence data possible, and the R-scripts are nearly finished as at 1 March 2012. It is important to combine data from both GBIF and ALA because only a limited amount of data from Australian herbaria is presently “visible” to the GBIF data service. In contrast, all major herbaria in Australia and New Zealand supply data to ALA. (Note that the R package “dismo” includes a function to download data from GBIF.)

NOTE: I have used mostly GBIF data for the models developed so far. Shortly, all species models will be re-run with combined data from AVH (via ALA) and GBIF.

I used only occurrence records that supplied valid latitude and longitude values. It is possible to use services such as BioGeomancer (www.biogeomancer.org; see also the biogeomancer function in the R package “dismo”) to provide approximate coordinates from the location description field in GBIF output. However, my trials suggest that the success rate is very low and the gain in the number, or geographical spread, of records is hardly worth the extra effort.
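What counts as a "valid" coordinate pair is not spelled out above. A minimal sketch, assuming validity means that both values are present, numeric, finite and within the usual geographic ranges, might look like the following. The field names "latitude" and "longitude" are hypothetical and would need to match whatever the downloaded records actually use.

```python
import math


def has_valid_coordinates(record):
    """Return True if a record carries a usable latitude/longitude pair."""
    try:
        lat = float(record.get("latitude"))
        lon = float(record.get("longitude"))
    except (TypeError, ValueError):
        # Missing (None), empty or non-numeric values are rejected.
        return False
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


records = [
    {"species": "Lantana camara", "latitude": "-33.87", "longitude": "151.21"},
    {"species": "Lantana camara", "latitude": "", "longitude": "151.21"},
    {"species": "Lantana camara", "latitude": "95.0", "longitude": "10.0"},
    {"species": "Lantana camara", "latitude": None, "longitude": None},
]
cleaned = [r for r in records if has_valid_coordinates(r)]
```

A range check of this kind only catches impossible values. It will not detect swapped latitude and longitude or points that fall in the sea, which need separate checks.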

Climate data

I used WorldClim data (www.worldclim.org; Hijmans et al. 2005) at 5 arc minute resolution to represent baseline or “current” climate. This data represents climate conditions averaged over the period 1960 to 1990. Mean monthly minimum and maximum temperatures and mean monthly precipitation data were downloaded and the standard 19 bioclimatic variables defined by Nix (1986) and Busby (1991) were then computed using a custom-written Delphi program.
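The Delphi program itself is not shown here. To give a feel for the calculation, the sketch below derives a handful of the 19 variables for a single grid cell from its twelve monthly values, using the standard definitions (BIO1 annual mean temperature, BIO2 mean diurnal range, BIO5 maximum temperature of the warmest month, BIO6 minimum temperature of the coldest month, BIO7 temperature annual range, BIO12 annual precipitation, BIO13 and BIO14 precipitation of the wettest and driest months). It is a Python illustration with made-up input values, not a port of the original program, and it omits the quarter-based variables.

```python
from statistics import mean


def bioclim_subset(tmin, tmax, prec):
    """Compute a subset of bioclimatic variables from 12 monthly values each."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")
    # Monthly mean temperature is taken as the midpoint of min and max.
    tmean = [(lo + hi) / 2.0 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)
    bio6 = min(tmin)
    return {
        "bio1": mean(tmean),
        "bio2": mean([hi - lo for lo, hi in zip(tmin, tmax)]),
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,
        "bio12": sum(prec),
        "bio13": max(prec),
        "bio14": min(prec),
    }


# Made-up monthly values for one cell (degrees C and mm).
tmin = [5.0] * 6 + [15.0] * 6
tmax = [15.0] * 6 + [25.0] * 6
prec = [10.0] * 11 + [120.0]
cell = bioclim_subset(tmin, tmax, prec)
```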

I produced future climate using IPCC Fourth Assessment Report (AR4) General Circulation Models (GCMs). Data for a single run from each of 14 GCMs was downloaded from the Coupled Model Intercomparison Project (CMIP) website (www. ). The “Delta” method outlined in Hijmans et al. (2005) was used to adjust observed baseline climate, represented by the WorldClim data, to provide estimates of mean monthly maximum and minimum temperatures and mean monthly precipitation averaged over the period 2046 to 2055, giving a decadal climate for the decade centred on 2050. The monthly data was then used to calculate the standard 19 bioclimatic variables defined by Nix (1986) and Busby (1991).
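The essence of a delta (change-factor) adjustment can be sketched in a few lines. The version below assumes a common convention: the GCM's change in temperature is added to the observed baseline, while the GCM's change in precipitation is applied as a ratio. The text above does not state which form was used for precipitation, so that choice, the fallback for dry months, and all function names are assumptions made for illustration. Interpolating the coarse GCM anomalies to the 5 arc minute grid is also left out.

```python
def delta_temperature(baseline_obs, gcm_baseline, gcm_future):
    """Add the GCM's temperature change to the observed baseline, month by month."""
    return [
        obs + (fut - base)
        for obs, base, fut in zip(baseline_obs, gcm_baseline, gcm_future)
    ]


def delta_precipitation(baseline_obs, gcm_baseline, gcm_future):
    """Scale observed precipitation by the GCM's relative change (assumed convention)."""
    adjusted = []
    for obs, base, fut in zip(baseline_obs, gcm_baseline, gcm_future):
        if base > 0:
            adjusted.append(obs * fut / base)
        else:
            # Ratio undefined when the GCM baseline is zero: fall back to an
            # additive change, floored at zero (an assumption of this sketch).
            adjusted.append(max(0.0, obs + (fut - base)))
    return adjusted


# One month for brevity: observed 20.0 C, GCM warms from 18.0 C to 20.5 C.
future_tmax = delta_temperature([20.0], [18.0], [20.5])
# Observed 100 mm, GCM dries from 80 mm to 60 mm.
future_prec = delta_precipitation([100.0], [80.0], [60.0])
```

The point of working with differences or ratios rather than raw GCM output is that systematic bias in the GCM's own baseline cancels out, leaving only its simulated change applied to observed climate.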

An important limitation of the method I have developed is that it relies on the availability of minimum and maximum temperature and precipitation data from the GCM runs. Only 14 of the 23 AR4 GCMs provided these data, and all but four provided data only for the decade centred on 2050. An alternative approach to downscaling first generates statistical estimates of GCM data for years not supplied by the CMIP repository and for greenhouse gas scenarios that the modellers were not required to provide. This method, known as “pattern scaling”, seems to be able to produce useful data for species distribution modelling, but this wasn’t clear to me at the time I began developing my own future climate data. I was frankly rather sceptical of using a statistical model of a complex stochastic process model like a GCM - a rough model of a rough model! - and my experiments with the then-available version of the best software tool for pattern scaling did not produce sound data. However, after Damian Fordham’s excellent examination of an updated version of the most commonly used software for pattern scaling (Fordham et al. 2001, 2012???), it may be worth pursuing.

The core set of future climate data I use were developed about two years ago, and recently I expanded coverage from 4 GCMs with some skill in the Australian region (Suppiah et al. 2007; but see Perkins et al and for slightly different outcomes) to 14 GCMs. I continue to use my own future climate data which is available here. Other free sources of downscaled future climate data are available at and .

Species models

Cleaned species occurrence data was used to build species distribution models using a number of modelling tools. Each modelling method may introduce bias in fitted models due to various aspects of the modelling algorithm and how well the data match the assumptions of a particular modelling method. It is widely understood that an ensemble of model results provides the best indication of the most likely relationship between species occurrence and environmental conditions ( ). Since the models I present on this website only use climate data, they are more accurately described as “climate envelope models” or “climatic niche models”, although current terminology in the ecological modelling community often uses the generic term “species distribution models” or SDMs. The term “climatic niche model” is informative because the models I present here are attempts to identify locations, including those without occurrence records, that fit within the known climate component of the species’ realised niche ( ).

It is vitally important when you look at the maps produced by my models not to over-interpret them. They are most definitely NOT SPECIES DISTRIBUTION MODELS and the maps are NOT MAPS OF PROBABILITY OF OCCURRENCE. There are some very strict requirements for output maps to be regarded as estimating probability of occurrence, and they are most definitely not met by the museum and herbarium data used for my models. What my models do estimate is the distribution of climate suitability.

At present, I only provide models fitted using MaxEnt (Phillips et al etc) but output from Boosted Regression Tree (BRT; Elith etc), Random Forest (RF; ), Bayesian Additive Regression Tree (BART; ), Artificial Neural Network (ANN; ) and Mahalanobis Distance ( ) methods will be included in the future.

All models of weed species were fitted using ALL available occurrences for a species. It is clear that the best approximation to the realised niche is provided by using both native and exotic occurrences of a species ( ).

MaxEnt models

MaxEnt models were built using version 3.3.3e of the software (www. ) with the following key settings:

10-fold cross-validation; and,
Threshold and Hinge features turned off to avoid spurious discontinuities in the output maps.
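For readers who want to reproduce these settings without the graphical interface, the sketch below assembles a MaxEnt batch-mode command line. It is an assumption that the models could be run this way; the text above does not say whether the GUI or the command line was used. The flag names are given as I understand them from MaxEnt's built-in help and should be verified against the help for the version in use. All file and directory names are hypothetical.

```python
def maxent_command(samples_file, background_file, output_dir, jar_path="maxent.jar"):
    """Build an argument list for a MaxEnt batch run with the settings listed above."""
    return [
        "java", "-jar", jar_path,
        f"samplesfile={samples_file}",
        # With SWD input, this argument points at the background SWD file.
        f"environmentallayers={background_file}",
        f"outputdirectory={output_dir}",
        "replicates=10",                # ten replicate runs...
        "replicatetype=crossvalidate",  # ...as 10-fold cross-validation
        "threshold=false",              # threshold features off
        "hinge=false",                  # hinge features off
        "autorun",
    ]


# Hypothetical file names; pass the list to subprocess.run() to execute it.
command = maxent_command("species_swd.csv", "background_swd.csv", "maxent_output")
```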

I generated SWD-format files of current climate data using a FreePascal program that I wrote. The background SWD file was generated using the whole World to select up to 10,000 background points. This unconstrained selection of background points has been argued to be wrong (Vanderwal et al) but I have found that it is necessary with globally distributed species for two reasons. First, we cannot prejudge where suitable environmental conditions for a species may occur - we would have to second guess too much about the physiological limits of the species. ???????For example, many weed species present in tropical and sub-tropical climates are now also found in temperate climate regions through human transportation.
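The FreePascal program is not reproduced here, but the two steps it performs, drawing up to 10,000 background cells from the whole grid and writing them in SWD ("samples with data") layout, can be sketched as follows. The sketch assumes the usual SWD column order of a label, longitude, latitude, then one column per climate variable. Function names, the "background" label and the toy grid are hypothetical.

```python
import csv
import io
import random


def sample_background(cells, max_points=10000, seed=0):
    """Pick up to max_points cells at random, without replacement.

    cells is a list of (longitude, latitude, values) for cells that have data.
    """
    if len(cells) <= max_points:
        return list(cells)
    return random.Random(seed).sample(cells, max_points)


def write_swd(handle, label, cells, variable_names):
    """Write cells as SWD rows: label, longitude, latitude, variable values."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["species", "longitude", "latitude"] + list(variable_names))
    for lon, lat, values in cells:
        writer.writerow([label, lon, lat] + list(values))


# Toy grid of 50 cells with two variables each; real input would be every
# land cell of the global 5 arc minute grid.
grid_cells = [(float(i), 0.0, [i, i * 2]) for i in range(50)]
picked = sample_background(grid_cells, max_points=10, seed=1)
buffer = io.StringIO()
write_swd(buffer, "background", picked, ["bio1", "bio12"])
swd_lines = buffer.getvalue().splitlines()
```

Fixing the random seed makes the background sample repeatable, which matters when models for many species are meant to share the same background file.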

All 19 standard bioclim variables were used to fit models. Selection of variables for MaxEnt models is a matter of active consideration, and my own (as yet) unpublished research indicates that large differences in the spatial arrangement of climate suitability values are linked to the variables used to fit a model. Paradoxically, there is nothing to choose between sets of variables using AUC values - they are practically the same no matter which variable set is used to fit the model. Given that, for the many hundreds of species I am modelling, it is very difficult to pre-select variables based on a knowledge of each species’ traits and physiological tolerances, I opted to run with a naive full set of 19 bioclim variables.
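It may help to recall what AUC actually measures. In the presence-background setting it is the proportion of presence/background pairs in which the presence point receives the higher score, with ties counted as half. The sketch below computes it directly from that definition, using invented scores. It does not reproduce the unpublished result mentioned above. It only shows that AUC depends on the ordering of scores and not on their values, so two maps with quite different suitability values can share an identical AUC.

```python
def auc(presence_scores, background_scores):
    """Proportion of presence/background pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Invented suitability scores.
presence = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.2, 0.1]
auc_original = auc(presence, background)

# Squaring changes every value but preserves the ordering, so AUC is unchanged.
auc_squared = auc([s ** 2 for s in presence], [s ** 2 for s in background])
```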

A grand average climate suitability map was made across the 10 replicates for each of the 14 GCMs.
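Read as one averaged map per GCM, this step is a cell-by-cell mean over the 10 cross-validation replicates. The sketch below follows that reading, which is an assumption; if a single map across all GCMs is wanted, the same averaging function can be applied again to the per-GCM results. Grids are represented as flat lists with None for NoData cells, and all names are hypothetical.

```python
def average_grids(grids):
    """Cell-by-cell mean of equally sized grids, ignoring None (NoData) cells."""
    n_cells = len(grids[0])
    if any(len(g) != n_cells for g in grids):
        raise ValueError("all grids must have the same number of cells")
    averaged = []
    for i in range(n_cells):
        values = [g[i] for g in grids if g[i] is not None]
        averaged.append(sum(values) / len(values) if values else None)
    return averaged


def grand_average_per_gcm(projections):
    """projections maps a GCM name to its list of replicate suitability grids."""
    return {gcm: average_grids(reps) for gcm, reps in projections.items()}


# Two toy GCMs, three cells each; the third cell of gcm_a is NoData (e.g. ocean).
projections = {
    "gcm_a": [[0.25, 0.5, None], [0.75, 1.0, None]],
    "gcm_b": [[1.0, 0.0, 0.5]],
}
per_gcm = grand_average_per_gcm(projections)
```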

Data analysis

References

Busby (1991)

Fordham et al.

Fordham et al.

Hijmans et al. (2005)

Nix (1986)

Perkins et al.

Suppiah et al. (2007)

Vanderwal et al.

peterwilson.id.au