RF Land Value Bulk Assessment Model Construction
Introduction to Random Forest (RF) Models
Constructing a bulk appraisal system for Collective Contractual Collective Lands (CCCL) prices with Random Forest (RF) requires careful tuning of the model parameters. These include the number of decision trees (ntree), the feature-selection method, and the depth of the trees, and they strongly influence model performance. The objective is a model that fits the training data well and also maintains robust predictive accuracy on new data.
Parameter Optimization for RF Models
In the RF land price model built in R, two parameters received particular attention. The first is the number of variables randomly selected as split candidates at each node of a decision tree (mtry). The second is the number of trees (ntree). The choice of candidate features at each binary split is vital because it determines how well the model generalizes.
For this study, mtry was set to 5 and ntree to 500. These values followed from an analysis of the data characteristics and the specific attributes of the appraisal parcels, for the following reasons:
- Feature dimensionality: Thirteen feature factors (X1–X13) were considered. By convention, mtry for regression tasks is set to about one-third of the number of predictors, here 13/3 ≈ 4.33. Rounding this up to 5 retains more of the essential information at each split while limiting the risks associated with feature collinearity.
- Data characteristics: CCCL enters the market through varied channels, so the model must capture nonlinear interactions among features, such as those between market-entry method and planned usage. A moderate increase in mtry enhances the trees’ expressive capacity while still guarding against overfitting.
- Convergence: CCCL transactions typically involve small parcels in large numbers and therefore carry considerable data noise. Setting ntree to 500 provides adequate convergence of the error, so that random fluctuations do not impair the model’s generalization.
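The sketch below is a quick check of the arithmetic in the first reason. It is written in Python for illustration, although the study itself used R, and it compares the exact one-third value with its floor and ceiling. For reference, the R randomForest package's default mtry for regression is max(floor(p/3), 1), which gives 4 for 13 predictors. The study's mtry = 5 therefore corresponds to rounding up rather than to that default. The function name `mtry_candidates` is hypothetical.

```python
import math


def mtry_candidates(p: int) -> dict:
    """Compare the one-third heuristic for mtry with its floor and ceiling."""
    third = p / 3
    return {
        "exact": third,                    # theoretical p/3
        "floor": max(math.floor(third), 1),  # rule used by R randomForest's regression default
        "ceil": math.ceil(third),          # rounding up, as described in the text
    }


for rule, value in mtry_candidates(13).items():
    print(f"{rule}: {value}")
```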
Assessing Parameter Efficiency
To refine the parameter selection, both mtry and ntree were tested systematically. First, mtry was varied from 1 to 13 with ntree fixed at 200, and goodness-of-fit (GoF) metrics were recorded for each value. The sweep was then repeated with ntree set to 500, and the process continued until the GoF curves no longer changed noticeably.
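The sweep procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code. `fit_and_score` is a hypothetical callback that trains a forest with the given mtry and ntree and returns a GoF score (higher is better). The stopping tolerance `tol` is an assumed value, and the third schedule entry (1000) is an assumed fallback beyond the 200 and 500 named in the text.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical callback: fit_and_score(mtry, ntree) -> goodness-of-fit score
# (higher is better), e.g. an out-of-bag R^2 for a forest with these settings.
Scorer = Callable[[int, int], float]


def sweep_mtry(
    fit_and_score: Scorer,
    n_features: int = 13,
    ntree_schedule: Sequence[int] = (200, 500, 1000),
    tol: float = 0.005,
) -> Tuple[int, int, Dict[int, List[float]]]:
    """Trace a GoF curve over mtry = 1..n_features for each ntree in turn.

    Stops once every point of the current curve is within `tol` of the
    previous curve, then returns the best mtry on that final curve.
    """
    curves: Dict[int, List[float]] = {}
    previous: Optional[List[float]] = None
    chosen_ntree = ntree_schedule[-1]
    for ntree in ntree_schedule:
        curve = [fit_and_score(m, ntree) for m in range(1, n_features + 1)]
        curves[ntree] = curve
        if previous is not None and max(abs(a - b) for a, b in zip(curve, previous)) < tol:
            chosen_ntree = ntree  # curves have stopped changing noticeably
            break
        previous = curve
    final_curve = curves[chosen_ntree]
    best_index = max(range(n_features), key=lambda i: final_curve[i])
    return best_index + 1, chosen_ntree, curves


def toy_scorer(mtry: int, ntree: int) -> float:
    # Synthetic stand-in for a real forest: peaks at mtry = 5, and the
    # ntree-dependent offset shrinks as more trees are added.
    return 0.9 - 0.01 * (mtry - 5) ** 2 - 1.0 / ntree


best_mtry, chosen_ntree, curves = sweep_mtry(toy_scorer)
print(f"best mtry = {best_mtry}, ntree = {chosen_ntree}")
```

A real scorer would produce noisy curves, so in practice one might average several random seeds per point. The toy scorer only demonstrates the control flow.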
The training results showed that the optimal mtry lay at approximately 5, as identified in the plots of model performance. The study therefore fixed mtry at 5 and ntree at 500 as the model’s key parameters.
BPNN Land Value Bulk Evaluation Model Construction
The Back Propagation Neural Network (BPNN) was also explored for CCCL price appraisal. Several key settings were adjusted, including the activation functions, the training goal, the error threshold, and the iteration limit. The network was fitted to the training set until it met acceptable standards on the test set. The resulting network weights were then retained for the subsequent appraisal stage.
The same thirteen variables were used in the BPNN model. The architecture consisted of an input layer with 13 nodes and an output layer that produces the CCCL market-entry price. Grid-search validation led to 21 nodes in the hidden layer, which balanced the network’s nonlinear modeling capacity against the risk of overfitting.
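Below is a minimal pure-Python sketch of the 13-21-1 topology trained by back-propagation. Only the layer sizes come from the text. The tanh hidden activation, linear output, weight initialization, learning rate `lr`, error goal `goal`, epoch limit `max_epochs`, and the synthetic data are assumptions chosen for illustration. The sketch uses plain online gradient descent for readability, whereas production toolboxes usually offer faster optimizers.

```python
import math
import random
from typing import List, Sequence, Tuple


class BPNN:
    """Minimal 13-21-1 back-propagation network (tanh hidden layer, linear output)."""

    def __init__(self, n_in: int = 13, n_hidden: int = 21, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def mse(self, X: Sequence[Sequence[float]], Y: Sequence[float]) -> float:
        return sum((self.predict(x) - y) ** 2 for x, y in zip(X, Y)) / len(X)

    def train(self, X: Sequence[Sequence[float]], Y: Sequence[float], lr: float = 0.02,
              goal: float = 1e-3, max_epochs: int = 100) -> List[float]:
        """Online gradient descent; stops at the error goal or the epoch limit."""
        history = [self.mse(X, Y)]
        for _ in range(max_epochs):
            if history[-1] <= goal:  # error threshold reached
                break
            for x, y in zip(X, Y):
                hidden, y_hat = self._forward(x)
                err = y_hat - y  # gradient of 0.5 * (y_hat - y)**2 w.r.t. y_hat
                # Hidden-layer deltas use the output weights before this update.
                deltas = [err * w * (1.0 - h * h) for w, h in zip(self.w2, hidden)]
                self.w2 = [w - lr * err * h for w, h in zip(self.w2, hidden)]
                self.b2 -= lr * err
                for j, d in enumerate(deltas):
                    self.w1[j] = [w - lr * d * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * d
            history.append(self.mse(X, Y))
        return history


# Synthetic data (not the study's): 60 samples with 13 features in [-1, 1].
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(13)] for _ in range(60)]
Y = [0.5 * x[0] - 0.3 * x[1] * x[2] for x in X]
net = BPNN()
history = net.train(X, Y)
print(f"MSE {history[0]:.4f} -> {history[-1]:.4f} after {len(history) - 1} epochs")
```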
SVM Land Value Bulk Assessment Model Construction
An SVM model was built on the MATLAB platform to analyze the same dataset, with 102 training samples and 16 test samples. Inputs were normalized with the mapminmax function before being fed to the SVM. Prior research on CCCL market-entry indicators is limited, so a Radial Basis Function (RBF) kernel was chosen for its smoothing properties, which help maintain both computational efficiency and predictive accuracy.
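By default, MATLAB's mapminmax rescales each variable linearly to [-1, 1] using y = (ymax - ymin)(x - xmin)/(xmax - xmin) + ymin. The Python sketch below reproduces that formula under two stated conventions. First, samples are stored as rows; mapminmax treats each row as a variable, so its input would be the transpose. Second, constant features are mapped to ymin, which is an assumption here. The sketch illustrates that the minima and maxima are fitted on the training samples and then reused on the test samples, which can therefore fall outside [-1, 1]. The function names `fit_minmax` and `apply_minmax` and the toy values are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def fit_minmax(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature minima and maxima from the training samples (rows = samples)."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_minmax(X: Sequence[Sequence[float]], xmin: Sequence[float], xmax: Sequence[float],
                 ymin: float = -1.0, ymax: float = 1.0) -> Matrix:
    """Map each feature linearly so that [xmin, xmax] becomes [ymin, ymax]."""
    out: Matrix = []
    for row in X:
        scaled = []
        for v, lo, hi in zip(row, xmin, xmax):
            if hi == lo:
                scaled.append(ymin)  # constant feature: assumed to map to ymin
            else:
                scaled.append((ymax - ymin) * (v - lo) / (hi - lo) + ymin)
        out.append(scaled)
    return out


train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
xmin, xmax = fit_minmax(train)
print(apply_minmax(train, xmin, xmax))             # [[-1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]
print(apply_minmax([[40.0, 250.0]], xmin, xmax))   # [[2.0, -0.5]]: test value beyond training range
```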
The key parameters, the penalty factor C and the kernel parameter gamma, were finalized through an optimization process that yielded C = 32 and gamma = 0.1768. These settings were selected on the basis of empirical performance and theoretical considerations.
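The RBF kernel is commonly written K(x, z) = exp(-gamma * ||x - z||^2), which is how LIBSVM parameterizes it. The text does not state whether the study used LIBSVM or a MATLAB built-in. The reported values also happen to be powers of two: 32 = 2^5 and 0.1768 ≈ 2^-2.5. This is consistent with a base-2 grid search but does not prove one. The sketch below shows the kernel and a hypothetical grid with half-integer exponents. The grid ranges are assumptions, not the study's settings.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# The reported parameters expressed as powers of two (an observation only).
C = 2 ** 5          # 32
gamma = 2 ** -2.5   # about 0.17678, reported as 0.1768
print(C, round(gamma, 4))

# Hypothetical search grid: exponents from -10 to 10 in steps of 0.5 for both C and gamma.
exponents = [e / 2 for e in range(-20, 21)]
grid = [(2 ** ec, 2 ** eg) for ec in exponents for eg in exponents]
print(len(grid), "candidate (C, gamma) pairs")

# Kernel similarity of two normalized samples under the reported gamma.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma))
```

Each (C, gamma) pair in such a grid would typically be scored by cross-validation on the training set, and the best-scoring pair retained.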
Comparative Results of Forecasting Models
As the analysis of the bulk assessment models proceeded, it became clear that the variability of operating land transactions in Beiliu City posed particular challenges. Trading characteristics differ widely, so representative test samples were selected to cover four market-entry methods and five planned-usage scenarios.
The models’ performance on these samples showed that RF delivered balanced and accurate predictions across the various land types, whereas BPNN and SVM were less consistent, particularly in volatile segments. Results across scenarios highlighted nuanced differences among market-entry methods and indicated that comprehensive market data is crucial for improving the prediction accuracy of all models.
Predictive Performance and Precision Comparison
To quantify model performance, the metrics R², RMSE, MAE, and RA were used. R² measures how much of the variance in actual prices the predictions explain, with values closer to 1 indicating a closer fit. RMSE and MAE measure the typical size of the prediction errors.
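For reference, R², RMSE, and MAE can be computed as in the minimal sketch below; the function name `regression_metrics` and the toy values are hypothetical. RA is not defined in this section, so it is omitted here rather than guessed.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE and MAE for paired actual and predicted values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


print(regression_metrics([1, 2, 3, 4], [2, 2, 3, 4]))
```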
From the comparison:
- RF showed the strongest generalization ability, with R² of 96.6% on the training set and 90.42% on the test set.
- BPNN followed with 89.0% and 83.29%, showing some sensitivity to overfitting.
- SVM, while consistent, lagged behind in capturing sharp price variations, indicating that more refined modeling is needed in future applications.
In summary, applying RF, BPNN, and SVM models to land price assessment in Beiliu City reveals a complex landscape shaped by multifaceted market dynamics. RF demonstrated superior predictive capability. Further progress requires a deeper understanding of local market conditions and improved data collection to keep improving model performance.