LOW-STRAIN IMPACT TESTING OF PILES - AI ANALYSIS

Joram M. Amir, Piletest.com Ltd., Netanya, Israel, (+972)9891 1899, jmamir@piletest.com
Erez I. Amir, Piletest.com Ltd., Netanya, Israel, (+972)9891 1899, erezam@piletest.com
Gil Amihai, Data Scientist, Kadimah-Tzoran, Israel, (+972)523331300, gil.amihai@gmail.com

ABSTRACT
Low-strain impact testing (ASTM D5882) is a widespread method for the integrity testing of deep foundations. The equipment used is relatively low-cost, the pile head requires little preparation, and the net testing time is short. These properties make the method especially suited to testing 100% of the piles on a site. In the standard (ASTM 2016), “integrity” is defined as “The qualitative evaluation of the physical dimensions, continuity of a pile, and consistency of the pile material”.
While field testing is relatively simple, interpreting the method's test results is not always straightforward and should be performed by suitably experienced geotechnical engineers. Given the large amounts of data collected, this task may be highly time-consuming.
The Artificial Intelligence (AI) model presented in this paper was trained to assess pile lengths using a database of real test results, each analyzed individually by an experienced engineer. The resulting model was then validated using test results obtained from several independent testing laboratories. In this first stage, pile length was the only physical dimension investigated. So far, the results are encouraging and justify further development.

KEYWORDS: Pile, Integrity, Analysis, Artificial Intelligence, AI, Machine Learning, ML

RESEARCH GOAL
The research goal was to build an AI model that can analyze pulse-echo test results as well as a human expert. Ideally, the model should be able to:

Predict the pile length (by placing the length marker at the correct toe reflection)
Assess the pile's integrity
Set up display parameters such as amplification, filter, etc.
Categorize the pile into one of the following categories: good/questionable/problematic/inconclusive

Provide a textual analysis

PILE LENGTH
Establishing the actual pile length is of utmost importance to confirm that the pile has sufficiently penetrated the planned bearing layer and will develop the required geotechnical capacity. In most cases, the length also serves as the basis for payment to the piling contractor. However, for bored and cast-in-situ piles, the actual length is usually unknown. Four length values are therefore distinguished:

LPlanned - the planned length, i.e., the length shown in the contract drawings
LAsMade - the as-made length obtained and reported by the supervisor
LExpert - the length obtained by the low-strain impact test and reported by a qualified testing agency.
LModel - the length predicted by the proposed AI model

The last two values depend directly on the wave speed assumed by the tester, and the wave speed at a given site is itself unknown. Testers may therefore be guided by the following methods:

Published correlations, based on the concrete compressive strength and age (Amir 1988).
If available, the average of a representative number of as-made lengths at the specific site.

Statistical analysis (Amir and Amir 2008) has shown that the wave speed assumed by most testers is 4,000 m/s, plus or minus 10 percent.
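Because the reported length is the toe-reflection time converted with an assumed wave speed, any error in c carries straight into the length. The minimal Python sketch below assumes the standard pulse-echo relation L = c·t/2, because the wave travels down to the toe and back. The 10 ms toe-reflection time and the function name are hypothetical.

```python
def length_from_time(t_toe_s: float, c_m_per_s: float) -> float:
    """Pile length implied by the toe-reflection time (pulse-echo: L = c * t / 2)."""
    return c_m_per_s * t_toe_s / 2.0


t_toe = 0.010  # hypothetical toe-reflection arrival time, 10 ms after impact
for c in (3600.0, 4000.0, 4400.0):  # 4,000 m/s plus or minus 10 percent
    print(f"c = {c:.0f} m/s -> L = {length_from_time(t_toe, c):.1f} m")
```

With the same measured time, the reported length ranges from 18 m to 22 m. A ±10% uncertainty in the assumed wave speed therefore becomes a ±10% uncertainty in length.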

METHODOLOGY

Machine Learning Methods

Machine Learning (ML) model training is an iterative process. The model goes over the input data and “learns” its patterns by minimizing the “loss” (or error) between its predictions (output) and the true targets supplied with the training data. Training (Fig. 1) stops when the error reaches a minimum, but overfitting (learning specific rather than general patterns) must also be controlled.

Fig. 1: Training phase

At the end of the training phase, the model weights are saved so that the trained model can be used for inference, i.e., for making predictions on new data (Fig. 2).

Fig. 2: Inference phase
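A deliberately tiny example can illustrate the training and inference phases. The Python sketch below is not the authors' deep network. It fits a hypothetical one-weight linear model by gradient descent and uses a held-out validation set to stop training before overfitting sets in. It then reuses the saved weight for inference. The learning rate, patience and synthetic data are all assumptions.

```python
import random


def mse(w, data):
    """Mean squared error ("loss") of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(train_set, val_set, lr=0.5, max_epochs=500, patience=10):
    """Gradient descent with early stopping on the validation loss.

    Training stops once the validation loss has not improved for `patience`
    epochs, a simple guard against overfitting; the best weight found is
    returned as the "saved weights".
    """
    w = 0.0
    best_w, best_val, stale = w, mse(w, val_set), 0
    for _ in range(max_epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train_set) / len(train_set)
        w -= lr * grad
        val_loss = mse(w, val_set)
        if val_loss < best_val:
            best_w, best_val, stale = w, val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_w


def predict(w, x):
    """Inference: apply the saved weight to new, unseen input."""
    return w * x


rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))  # hypothetical true slope 2.0
w = train(data[:160], data[160:])
print(f"learned w = {w:.3f}, prediction at x = 0.5: {predict(w, 0.5):.3f}")
```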

We tried two types of ML models: decision-tree-based (e.g., XGBoost [4]) and deep learning [5]. As expected, the performance of all models improved as the amount of data increased. The deep learning model performed better, so we chose it for final training and validation.

Fig. 3: Schematic deep learning network structure

The data

A testing agency gave us access to its database of test results (590,000 tests), the work of six teams over twenty years. Eighty percent of these were randomly selected for training the candidate ML models. The remaining twenty percent were held out to validate the model's inference.
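A seeded random 80/20 split of this kind can be sketched in Python as follows. The record IDs, the seed and the function name are hypothetical, and the paper does not say how its split was implemented.

```python
import random


def split_records(records, train_fraction=0.8, seed=42):
    """Randomly split records into a training set and a held-out validation set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


pile_ids = [f"pile-{i:05d}" for i in range(1000)]  # hypothetical record IDs
train_ids, val_ids = split_records(pile_ids)
print(len(train_ids), len(val_ids))
```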

The inputs of the trained model for each tested pile were as follows:

Accelerometer time series: a vector of data sampled at fixed time intervals according to the sample rate
Lplanned [m]
The wave speed c [m/s]. This is a major unknown. To remove this complexity from the model, we switched to a time-only model and converted Lplanned to Tplanned (time) according to Equation 1.

$$T_{planned} = \frac{2\,L_{planned}}{c} \qquad (1)$$

Preprocessing

Initially, we normalized the raw data before feeding it to the model. After a few failed attempts, we realized that the data had to be preprocessed with default display parameters. It then had to be passed to the model in the same form in which it appears to an expert during analysis.

A quick-reject script scanned all input data for user entry errors. The script also rejected extreme cases, such as very short or very long piles and piles with very high amplification. It also rejected piles where the user collected an abnormally high number of impacts, which indicates an exceptionally hard-to-test case. In total, 81k piles (14% of the raw data) were rejected.
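A quick-reject pass of this kind can be sketched as a set of simple rules. In the Python sketch below, the record fields, the function name and every threshold are hypothetical placeholders, because the paper does not publish the limits it used.

```python
from dataclasses import dataclass


@dataclass
class PileRecord:
    pile_id: str
    planned_length_m: float
    amplification: float
    n_impacts: int


# Hypothetical limits: the paper does not publish the thresholds it used.
MIN_LENGTH_M, MAX_LENGTH_M = 3.0, 60.0
MAX_AMPLIFICATION = 1000.0
MAX_IMPACTS = 20


def reject_reason(rec):
    """Return why a record is rejected, or None if it passes the quick check."""
    if rec.planned_length_m <= 0 or rec.amplification <= 0 or rec.n_impacts <= 0:
        return "entry error"
    if not MIN_LENGTH_M <= rec.planned_length_m <= MAX_LENGTH_M:
        return "extreme length"
    if rec.amplification > MAX_AMPLIFICATION:
        return "very high amplification"
    if rec.n_impacts > MAX_IMPACTS:
        return "too many impacts"
    return None


records = [  # hypothetical records
    PileRecord("A1", 18.0, 64.0, 4),
    PileRecord("A2", -5.0, 40.0, 3),
    PileRecord("A3", 95.0, 300.0, 5),
    PileRecord("A4", 20.0, 70.0, 45),
    PileRecord("A5", 25.0, 5000.0, 3),
]
kept = [r for r in records if reject_reason(r) is None]
for r in records:
    print(r.pile_id, reject_reason(r) or "kept")
```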

Amplification

A typical amplification factor for a 30 m long pile is 100-200. Without proper amplification, the time series appears "flat", with no apparent features.

Based on Amir & Amir (2008), who studied the relationship between planned length and amplification, we set the amplification for all piles to 10 + 3 × Lplanned, with Lplanned in meters (for example, 70 for a 20 m pile).
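The amplification rule is simple arithmetic on the planned length. The Python sketch below only computes the factor, because the paper does not state how the factor is applied along the trace. The function name is hypothetical.

```python
def amplification_factor(planned_length_m: float) -> float:
    """Amplification set from the planned length: 10 + 3 * Lplanned (Lplanned in m)."""
    return 10.0 + 3.0 * planned_length_m


for length_m in (10.0, 20.0, 30.0):
    print(f"Lplanned = {length_m:.0f} m -> amplification {amplification_factor(length_m):.0f}")
```

A 30 m pile gets a factor of 100, at the lower end of the typical 100-200 range quoted above.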

Data duration

The system acquires data over a much longer time than is later displayed. This is done for cases where the user does not know the planned length or wants to examine second reflections. Presenting the whole time series to the model gave poor results, since most of the trace beyond the toe reflection contains no information and only adds noise.

Based on the dataset analysis, we clipped each time series at 1 millisecond beyond Tplanned. At a typical wave speed of 4,000 m/s, this corresponds to 2 m of pile beyond the planned length. The wave travels down and back up, so 1 ms × 4,000 m/s ÷ 2 = 2 m.
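The clipping step can be sketched in Python using Eq. 1 to find Tplanned. The sketch assumes that sample 0 coincides with the hammer impact. The function name, the 50 kHz sample rate and the trace are hypothetical.

```python
def clip_to_planned_window(samples, sample_rate_hz, planned_length_m,
                           c_m_per_s=4000.0, extra_s=0.001):
    """Keep samples from the impact (t = 0) up to Tplanned + 1 ms.

    Assumes sample 0 is the moment of impact; Tplanned follows Eq. 1.
    """
    t_planned = 2.0 * planned_length_m / c_m_per_s   # Eq. 1: down to the toe and back
    t_end = t_planned + extra_s
    n_keep = round(t_end * sample_rate_hz) + 1        # samples at t = 0 .. t_end
    return samples[:n_keep], t_planned


trace = [0.0] * 2000                 # hypothetical 40 ms record at 50 kHz
clipped, t_planned = clip_to_planned_window(trace, 50_000, 20.0)
print(len(clipped), t_planned)
print(f"1 ms beyond Tplanned = {4000.0 * 0.001 / 2.0:.1f} m of pile")
```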

Next, we downsampled ("diluted") each time series to 100 points, a good compromise between size and resolution.
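The paper does not say which downsampling method was used. The Python sketch below assumes linear-interpolation resampling over the full clipped span, and the function name is hypothetical.

```python
def resample(values, n_points=100):
    """Resample a trace to n_points by linear interpolation over its full span.

    Assumption: "diluting" is modelled here as linear-interpolation resampling;
    the paper does not state which decimation method was used.
    """
    if len(values) < 2 or n_points < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(values) - 1
    out = []
    for k in range(n_points):
        pos = k * last / (n_points - 1)  # fractional index into the original trace
        i = min(int(pos), last - 1)      # left neighbour (keeps i + 1 in range)
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


clipped = [float(i) for i in range(551)]  # hypothetical 551-sample clipped trace
reduced = resample(clipped)
print(len(reduced), reduced[0], reduced[-1])
```

Interpolation keeps the first and last samples (impact and Tplanned + 1 ms) exactly at the ends of the 100-point vector. Plain every-n-th-sample decimation would not guarantee this when the clipped length is not a multiple of 99.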

Other parameters

All display parameters, such as low-pass filtering and sharpening, were set to zero. Next, we normalized the time axis of the clipped and amplified time series to the range 0 to 1, where 0.0 is the time of the hammer impact and 1.0 is Tplanned + 1 millisecond. Tplanned was normalized to the same range.
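The normalized time axis can be sketched in Python. The paper does not describe the model's output format, so the sketch makes an assumption: the model reports the length marker as a position on this 0-1 axis. Under that assumption, the position converts back to a length through Tplanned + 1 ms and L = c·t/2 with c = 4,000 m/s. All function names are hypothetical.

```python
def normalized_axis(n_points=100):
    """Time axis of the resampled trace: 0.0 = hammer impact, 1.0 = Tplanned + 1 ms."""
    return [k / (n_points - 1) for k in range(n_points)]


def normalized_t_planned(t_planned_s, extra_s=0.001):
    """Position of Tplanned on the same 0-1 axis."""
    return t_planned_s / (t_planned_s + extra_s)


def length_from_normalized(x, t_planned_s, c_m_per_s=4000.0, extra_s=0.001):
    """Map a 0-1 position back to seconds, then to metres via L = c * t / 2."""
    t = x * (t_planned_s + extra_s)
    return c_m_per_s * t / 2.0


t_planned = 2.0 * 20.0 / 4000.0                    # Eq. 1, hypothetical 20 m pile
axis = normalized_axis()
x_planned = normalized_t_planned(t_planned)
print(axis[0], axis[-1], round(x_planned, 3))
print(round(length_from_normalized(x_planned, t_planned), 6))
```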

Evaluation Criterion

We used the mean absolute percentage error (MAPE) as the success criterion. MAPE (Eq. 2) is the absolute difference between the AI model's prediction and the expert's length, expressed as a percentage of the expert's length and averaged over all piles. For example, if the expert reached a figure of 10 m and the AI model predicted 10.5 m (or 9.5 m), the absolute percentage error is 5%.

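$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{L_{model,i}-L_{expert,i}}{L_{expert,i}}\right| \qquad (2)$$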
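Eq. 2 translates directly into code. The Python sketch below takes the expert's length as the reference and reproduces the 10 m / 10.5 m example. The function name is hypothetical.

```python
def mape(model_lengths, expert_lengths):
    """Mean absolute percentage error of model lengths relative to expert lengths (%)."""
    if len(model_lengths) != len(expert_lengths) or not expert_lengths:
        raise ValueError("need two equal-length, non-empty sequences")
    total = sum(abs(m - e) / e for m, e in zip(model_lengths, expert_lengths))
    return 100.0 * total / len(expert_lengths)


print(mape([10.5], [10.0]))                # the 10 m / 10.5 m example
print(mape([10.5, 20.0], [10.0, 20.0]))    # one 5% error and one exact hit
```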

RESULTS

The model was tested on the remaining ~20% of the data (about 110,000 test results). The results (Fig. 4) show a coefficient of determination (R²) of 0.9757 between Lmodel and Lexpert, with a MAPE of 1.6% relative to the human experts' analysis Lexpert. In other words, 90% of the AI model's predictions were within ±2% of the human expert's length.

The model-predicted length mark Lmodel was compared with two "naive" benchmarks:

The planned length Lplanned. Although an input to the system, this value can also be used as a naive predictor.
Lminimum, the deepest trough in the signal after the initial hammer impact. Under ideal conditions, the deepest trough is the toe reflection, so this naive benchmark works well for most piles (about 85% of the piles in this research).
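The Lminimum benchmark can be sketched in Python as follows. The paper does not say how the impact pulse itself was excluded from the search. Here, a hypothetical 1 ms skip window after sample 0 (assumed to be the impact) does that job, and the synthetic trace, sample rate and function name are also hypothetical.

```python
def l_minimum(trace, dt_s, c_m_per_s=4000.0, skip_s=0.001):
    """Naive Lminimum benchmark: length at the deepest trough after the impact.

    skip_s is a hypothetical window that excludes the hammer-impact pulse itself;
    sample 0 is assumed to be the moment of impact.
    """
    start = int(round(skip_s / dt_s))
    if start >= len(trace):
        raise ValueError("trace shorter than the skip window")
    i_min = min(range(start, len(trace)), key=lambda i: trace[i])
    return c_m_per_s * (i_min * dt_s) / 2.0


dt = 1.0 / 50_000                 # hypothetical 50 kHz sample rate
trace = [0.0] * 600
trace[0:10] = [-3.0] * 10         # hammer impact (excluded by the skip window)
trace[300] = -0.4                 # shallower trough, e.g. a minor anomaly
trace[500] = -1.0                 # deepest trough after the impact: the "toe"
print(l_minimum(trace, dt))
```

With skip_s=0 the search would return the impact itself (a length of 0 m). This illustrates why the benchmark only looks after the initial impact.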

Fig. 4: ML prediction compared to experts’ analysis

To study the scatter in more detail, we plotted the cumulative percentage of the predictions (Lmodel, Lplanned, and Lminimum) against their deviation from Lexpert.

The prediction distribution (Fig. 5) shows that the AI model performs better than the two naive predictors. To illustrate this, a few points on the graph are highlighted with large red dots.

Of the model's predictions Lmodel, 90%, 94%, and 96% fall within 2%, 5%, and 10%, respectively, of the expert's length Lexpert.
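Figures such as "90% within 2%" are the share of predictions whose absolute percentage error does not exceed a tolerance. The Python sketch below computes that share. The lengths are hypothetical, not data from this study, and the function name is also hypothetical. Evaluating the function over a range of tolerances traces a cumulative curve of the kind shown in Fig. 5. Passing Lplanned or Lminimum instead of Lmodel gives the corresponding benchmark curves.

```python
def share_within(predicted, expert, tolerance_pct):
    """Fraction of predictions within tolerance_pct percent of the expert length."""
    pairs = list(zip(predicted, expert))
    hits = sum(abs(p - e) / e * 100.0 <= tolerance_pct for p, e in pairs)
    return hits / len(pairs)


l_expert = [10.0, 10.0, 10.0, 10.0, 10.0]   # hypothetical lengths, not study data
l_model = [10.1, 9.9, 10.3, 10.8, 12.0]
for tol in (2, 5, 10):
    print(f"within {tol}%: {share_within(l_model, l_expert, tol):.0%}")
```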

This ML prediction accuracy is better than the accuracy considered acceptable by the industry (Amir and Amir 2008).

CLASSIFICATION OF THE RESULTS

For simplicity, the AI results may be classified as:

"Good" predictions: within 5% of Lexpert (about 94% of the results) (Fig. 6)
"Better" results: the model result was clearly better than the expert's (Fig. 7)
"Wrong" results (Fig. 8)
"Inconclusive" predictions: different from Lexpert by more than 5%, but another expert could have picked the same location (Fig. 9)

In Figs. 6-9, the upward-pointing green triangle is Lmodel and the vertical red line is Lexpert.

Fig. 6: "Good" example: Lexpert ≅ Lmodel

Fig. 7: "Better" example: Lmodel is a better choice than Lexpert

Fig. 8: "Wrong" example: Lmodel does not select the right location

Fig. 9: "Inconclusive" example: the same expert on another day, or another expert, could have agreed with Lmodel

SUMMARY AND CONCLUSIONS

The AI model predicted the lengths of 90% of all test results to within 2% of Lexpert (the human expert's analysis). This is better than the accuracy considered acceptable by the industry.
Improving the AI model's architecture and training, and gathering more data, are likely to improve the prediction quality further.
A well-performing AI model could point to erroneous human analyses or be used for training novice analysts.
The potential of AI to replace human analysis is promising. However, since mistakes are inherent in any AI inference, AI cannot replace sound engineering judgment.
Further development is required to achieve more of the research goals:
- Identify anomalies related to changes in pile impedance and provide a textual analysis.
- Categorize the pile as good, questionable, problematic, or inconclusive.
The AI model should be expanded with test results from a variety of testing agencies and experts.

ACKNOWLEDGMENTS

The authors are indebted to Dr. Sharshevski and Y. Neeman (Isotop Ltd.), who provided valuable access to their vast databases.

REFERENCES

Amir, J.M. (1988): Wave velocity in young concrete. Proc. 3rd Intl. Conf. on Application of Stress-Wave Theory to Piles, Ottawa, pp. 911-912.

Amir, E.I. & Amir, J.M. (2008): Statistical analysis of a large number of PEM tests on piles. Proc. 8th Intl. Conf. on Application of Stress Wave Theory to Piling, Lisbon, pp. 671-675.

ASTM (2016): D5882 Standard Test Method for Integrity Testing of Deep Foundations by Low Strain Impact. ASTM International, West Conshohocken, PA.

[4] Introduction to boosting trees (XGBoost documentation): https://xgboost.readthedocs.io/en/stable/tutorials/model.html.

[5] What is deep learning? (Amazon Web Services): https://aws.amazon.com/what-is/deep-learning/.