In this project, I need an R implementation of two alternative approaches, one machine-learning (random forest regression, a neural network, or gradient boosting) AND one Bayesian, applied to a moderately sized dataset (about 23,000 records, 6-10 variables). The dataset consists of spatially and temporally referenced records. It is a gridded dataset, with each cell containing a chronology of values representing the intensity of a process. The dataset is zero-inflated in the sense that it is dominated by records in which the process is absent.
The analysis aims to parameterize the effects of various variables on the intensity of the process and use this parameterization to make predictions under a number of scenarios. Specifically, I need:
(1) parameterization of the model on the training dataset and, in doing so, selection of the meaningful parameters to carry into step 2. The parameterization should address the zero-inflation of the dataset and its spatial structure (including possible spatial autocorrelation).
(2) prediction of the expected values under a range of inputs (grouped in sets, let’s call them scenarios).
(3) graphical output that includes mapping of the results (all initial data and scenarios are georeferenced).
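To make steps (1) and (2) more concrete, here is a minimal sketch of one possible way to handle the zero-inflation: a two-part (hurdle) model in base R, followed by scenario predictions and a simple Moran's I check on the residuals. It is not the required method, only an illustration. The data are synthetic, and every name in it (the covariate temp, the functions fit_hurdle, predict_hurdle and morans_i, the scenario labels) is an assumption made up for the example.

```r
# Illustrative only: synthetic data and hypothetical names throughout.
set.seed(42)

# Synthetic gridded data: 20 x 20 cells, 5 time steps, one covariate.
grid <- expand.grid(x = 1:20, y = 1:20, year = 1:5)
grid$temp <- rnorm(nrow(grid), mean = 10 + 0.2 * grid$x, sd = 2)

# Simulate a zero-inflated intensity: most records have no process at all.
p_occ   <- plogis(-4 + 0.25 * grid$temp)
present <- rbinom(nrow(grid), size = 1, prob = p_occ)
mu      <- exp(0.5 + 0.05 * grid$temp)
grid$intensity <- present * rgamma(nrow(grid), shape = 2, rate = 2 / mu)

# Step 1: fit the two parts separately.
fit_hurdle <- function(data) {
  data$occurs <- as.integer(data$intensity > 0)
  # Part 1: probability that the process occurs at all
  # (coordinates enter as a crude linear spatial trend).
  occ <- glm(occurs ~ temp + x + y, family = binomial, data = data)
  # Part 2: intensity, fitted on the non-zero records only.
  pos <- glm(intensity ~ temp + x + y, family = Gamma(link = "log"),
             data = data[data$intensity > 0, ])
  list(occurrence = occ, intensity = pos)
}

# Expected value = P(occurrence) * E(intensity | occurrence).
predict_hurdle <- function(model, newdata) {
  p <- predict(model$occurrence, newdata = newdata, type = "response")
  m <- predict(model$intensity,  newdata = newdata, type = "response")
  as.numeric(p * m)
}

# Moran's I with rook adjacency (cells sharing an edge are neighbours).
morans_i <- function(values, x, y) {
  d <- as.matrix(dist(cbind(x, y), method = "manhattan"))
  w <- (d == 1) * 1
  z <- values - mean(values)
  (length(values) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

model <- fit_hurdle(grid)

# Residual spatial autocorrelation of the occurrence part, per cell.
grid$res  <- residuals(model$occurrence, type = "response")
res_cell  <- aggregate(res ~ x + y, data = grid, FUN = mean)
moran_res <- morans_i(res_cell$res, res_cell$x, res_cell$y)

# Step 2: scenarios expressed as shifts of the covariate per cell.
baseline  <- aggregate(temp ~ x + y, data = grid, FUN = mean)
scenarios <- list(baseline = 0, warm_plus2 = 2, cool_minus2 = -2)
predictions <- lapply(scenarios, function(shift) {
  nd <- baseline
  nd$temp <- nd$temp + shift
  nd$expected <- predict_hurdle(model, nd)
  nd
})

# Mean expected intensity per scenario.
scenario_means <- sapply(predictions, function(d) mean(d$expected))
print(scenario_means)
print(moran_res)
```

Each element of predictions is a data frame with one row per grid cell (x, y, temp, expected), which is a convenient shape to hand to a mapping function. A random forest, gradient boosting or Bayesian model could replace either part of the hurdle without changing the surrounding functions.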
I need well-commented, modular R code organized as a series of R functions. It is also important that the code generates a range of relevant descriptive statistics, clearly plotted.
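As an indication of the kind of descriptive output I have in mind, here is a small base R sketch. The data frame is a synthetic stand-in, and the column names (year, intensity) and function names are assumptions chosen for the example, not the real variable names.

```r
# Illustrative only: synthetic stand-in data and hypothetical names.
set.seed(1)
records <- data.frame(year = rep(2001:2005, each = 200))
records$intensity <- rbinom(1000, size = 1, prob = 0.25) *
  rgamma(1000, shape = 2, rate = 1)

# One-row summary that separates the zeros from the non-zero values.
describe_intensity <- function(data, value_col = "intensity") {
  v   <- data[[value_col]]
  pos <- v[v > 0]
  data.frame(
    n_records       = length(v),
    prop_zero       = mean(v == 0),
    mean_all        = mean(v),
    mean_positive   = mean(pos),
    median_positive = median(pos),
    sd_positive     = sd(pos)
  )
}

# Share of zero records in each year.
zero_share_by_year <- function(data) {
  aggregate(intensity ~ year, data = data,
            FUN = function(v) mean(v == 0))
}

# Two panels: distribution of non-zero values, and zero share over time.
plot_intensity_summary <- function(data) {
  old <- par(mfrow = c(1, 2))
  on.exit(par(old))
  hist(data$intensity[data$intensity > 0],
       main = "Non-zero intensity", xlab = "Intensity")
  z <- zero_share_by_year(data)
  barplot(z$intensity, names.arg = z$year,
          main = "Share of zero records",
          xlab = "Year", ylab = "Proportion")
}

stats <- describe_intensity(records)
print(stats)
plot_intensity_summary(records)
```

Reporting the zeros and the non-zero values separately keeps the summary readable for a dataset dominated by records where the process is absent.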
You don’t have to write a plotting routine for the mapping of the results (on a geographical map). This functionality will be provided as a “custom-written” function, which will take the results to be mapped.
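One way the hand-off to the mapping function could look is sketched below. The real function's name and arguments are not fixed here, so map_fun, its (result, title) signature, the required columns and the placeholder dummy_map are all assumptions for illustration.

```r
# Illustrative only: the signature of the custom mapping function is assumed.
map_scenarios <- function(predictions, map_fun, ...) {
  stopifnot(is.function(map_fun), is.list(predictions))
  required <- c("x", "y", "expected")
  # Check each scenario result before handing it over.
  for (name in names(predictions)) {
    missing_cols <- setdiff(required, names(predictions[[name]]))
    if (length(missing_cols) > 0) {
      stop("Scenario '", name, "' lacks columns: ",
           paste(missing_cols, collapse = ", "))
    }
  }
  # Call the supplied mapping function once per scenario.
  invisible(lapply(names(predictions), function(name) {
    map_fun(predictions[[name]], title = name, ...)
  }))
}

# Placeholder standing in for the custom-written mapping function.
dummy_map <- function(result, title) {
  message("Mapping ", nrow(result), " cells for scenario: ", title)
  title
}

preds <- list(
  baseline = data.frame(x = 1:3, y = 1:3, expected = c(0, 0.4, 1.2)),
  high     = data.frame(x = 1:3, y = 1:3, expected = c(0.1, 0.9, 2.5))
)
mapped <- map_scenarios(preds, dummy_map)
```

Passing the mapping function as an argument keeps the modelling code independent of how the maps are drawn.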
This is a time-critical project, to be completed within a week of its start date.