Cite this as: Younes K, Mouhtady O, Chaouk H (2021) The application of unsupervised machine learning to optimize water treatment membrane selection. Ann Robot Automation 5(1): 030-033. DOI: 10.17352/ara.000010
Artificial intelligence technologies have been used extensively to assess and characterize water quality. Fewer studies have employed these techniques to optimize a water treatment process. Here, we apply unsupervised machine learning techniques to optimize the choice of membranes under the different constraints and conditions encountered. The data analysis techniques adopted are Principal Component Analysis (PCA) and Hierarchical Cluster Analysis (HCA). Both methods were able to reveal resemblances and discrepancies between membrane types on the basis of several properties. PCA proved more informative than HCA, as it removes intercorrelation between factors and gives a better understanding of the trends in the dataset by establishing a scores-factors relation.
With the growing world population and the scarcity of natural water supplies, due to the lack of precipitation and to climate change, the demand for drinking water is rising exponentially. This increase has put pressure on water treatment personnel, as boosting drinking water production becomes more and more challenging. To overcome this problem, applied and fundamental sciences have all found a place in the optimization of Water Treatment Plants (WTPs). Chemistry and physics were brought in to understand the behavior of water, both as a molecule and as a mixture of different components, at the microscopic and macroscopic levels simultaneously. Applied in isolation, however, chemistry and physics have fallen short of answering the problems encountered in WTPs [2-4]. This is not surprising: looking at a problem from a single perspective gives a weaker grasp of the phenomena involved. Integrating applied mathematics with the applied sciences has given a better understanding of the problem and, hence, a better set of solutions for water treatment issues. This combination of scientific fields has put artificial intelligence (AI) at the heart of water analysis and treatment.
On the other hand, the application of AI technologies to real-life water problems has raised several concerns, because the mathematical models employed are usually based on assumptions that are difficult to implement in practice. Moreover, modelling lacks an overall understanding of the requirements of the analyzed system [2-4], and the non-linear relationships involved in water processing are challenging to fit [2-4]. This has raised interest in unsupervised machine learning approaches. These techniques work on a given dataset (from chemistry or physics) and yield a trend without requiring prior knowledge or assumptions.
Machine learning technologies have mostly been used to reveal hidden patterns in analytical datasets, in order to better interpret the quality of the analyzed water. On the side of process optimization and material choice, much work remains to be done. Our aim in this study is therefore to apply two unsupervised machine learning techniques to better choose between the filter types most commonly used in WTPs. The two methods investigated are Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA).
Principal Component Analysis (PCA): PCA is used for data exploration and for establishing descriptive models. It works by dimensionality reduction. The reduction is done with two goals in mind: (1) yielding as few dimensions as possible, and (2) making the new dimensions, the principal components (PCs), orthogonal to each other. PCs are the directions of maximal variance in the dataset. Finding them for two factors is quite feasible, but the task becomes harder as the number of factors increases. The development of sophisticated calculation algorithms has made this problem easy to overcome [9–11]. Several studies have focused on the mathematical description of PCA, so the theory is well developed. Other studies have used PCA as a tool to reveal proxies in geochemistry [12–15], energy, and biomass characterization [17,18].
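The two properties described above (orthogonal components, ordered by the variance they capture) can be illustrated with a minimal sketch. It assumes NumPy is available and uses an eigen-decomposition of the covariance matrix, which is one common way to compute PCs but not necessarily the algorithm used in this study. The function name `principal_components` and the synthetic data are hypothetical and are not taken from Table 1.

```python
import numpy as np

def principal_components(X):
    """Return (components, explained_ratio, scores) for data X.

    X has one row per sample and one column per factor.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)            # remove the mean of each factor
    cov = np.cov(centered, rowvar=False)     # factor-by-factor covariance
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    scores = centered @ eigvecs              # samples projected onto the PCs
    return eigvecs, explained_ratio, scores

# Synthetic data for illustration only: the second factor is strongly
# correlated with the first, the third is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
X_demo = np.column_stack([
    base,
    2.0 * base + 0.1 * rng.normal(size=50),
    rng.normal(size=50),
])

components, explained, scores = principal_components(X_demo)
print("explained variance ratio:", np.round(explained, 3))
```

The columns of `components` are mutually orthogonal, and the columns of `scores` are uncorrelated, which is the sense in which PCA removes intercorrelation between factors.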
Hierarchical Cluster Analysis (HCA): HCA is a technique for classifying objects into different groups. It starts with each item in its own cluster and iteratively merges clusters until all items belong to a single cluster. It therefore follows a bottom-up approach. Dendrograms are used to represent the result of HCA graphically. The distance between clusters can be computed with three techniques: single linkage (nearest distance), complete linkage (farthest distance), and average linkage (average distance). Single linkage is the distance between the closest members of two clusters, and complete linkage is the distance between the members that are farthest apart. Average linkage takes the distances between all pairs of members and averages them. It is also called the unweighted pair group method with arithmetic mean (UPGMA), and it is the method we used in our study [9,10]. HCA has mostly been applied in molecular biology [19,20]. Other studies have applied HCA to biomass characterization [17,18].
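The three linkage rules described above can be written down in a few lines. The following is a minimal sketch using only the Python standard library (Python 3.8 or later for `math.dist`). The function name `linkage_distance` and the two small clusters are hypothetical and serve only to illustrate the definitions.

```python
from itertools import product
from math import dist

def linkage_distance(cluster_a, cluster_b, method="average"):
    """Distance between two clusters of points under a given linkage rule."""
    # Euclidean distance for every cross-cluster pair of members.
    pair_distances = [dist(p, q) for p, q in product(cluster_a, cluster_b)]
    if method == "single":       # closest members of the two clusters
        return min(pair_distances)
    if method == "complete":     # members that are farthest apart
        return max(pair_distances)
    if method == "average":      # mean over all pairs (UPGMA)
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical two-feature points, for illustration only.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

single = linkage_distance(cluster_a, cluster_b, "single")
complete = linkage_distance(cluster_a, cluster_b, "complete")
average = linkage_distance(cluster_a, cluster_b, "average")
print(single, complete, average)
```

For any pair of clusters, the average-linkage distance lies between the single-linkage and complete-linkage distances, which is why it is often seen as a compromise between the two.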
Here, we apply the unsupervised machine learning tools described above to elucidate the trends that might occur across different types of water treatment membranes. Several features are employed to describe the membranes (Table 1). Turbidity-Raw and Turbidity-Effluent indicate the efficiency of the membrane in removing particles: an efficient membrane yields a low-turbidity effluent. Other features, such as Pre-Coat and Body Feed, describe the operating protocol. The pre-coat tank is filled with a given amount of water and a given mass charge of diatomite (diatomaceous earth, DE) that results in the specified pre-coat area density. The tasks associated with body feed are the selection of the DE grade, the determination of its concentration, and the mixing of the slurry. Usually, the grade and concentration are determined as a first estimate during the design phase and refined during operation. The differential pressure (∆P) reflects the energy required to perform filtration: the higher the differential pressure, the more energy is required. The number of runs is an age-related factor; the higher it is, the longer the lifetime of the membrane.
The dendrograms in Figure 1 show similarities and dissimilarities between the investigated DE membranes. The Euclidean distance between membranes was computed from the featured properties shown in Table 1. Generally, two main clusters are observed. 900W(P), C-535 and FW12(DE) make up the cluster showing the highest similarity. The other membranes make up the second cluster, where similarity is lower than in the first cluster. In other words, the Euclidean distances between the components of the first cluster are relatively lower than those between the other components. Within the second cluster, the most similar membranes are FW-20(DE) and 4200(DE). These two membranes are an exception in the second cluster, as the other components show higher discrepancy. Although the same features were presented for all membranes (Table 1), HCA allowed two main patterns to be distinguished: 900W(P), C-535 and FW12(DE) compose one pattern and the rest compose the second pattern. This data analysis technique has shown similarities and discrepancies between membrane types; PCA was then used to investigate these differences in more depth.
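The bottom-up procedure behind such a dendrogram can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: it uses Euclidean distance with average linkage, stops when a requested number of clusters remains, and runs on hypothetical two-feature profiles labelled A to E rather than on the membranes of Table 1. The function names are also hypothetical.

```python
from itertools import combinations, product
from math import dist

def average_linkage(points, cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster pairs of items."""
    total = sum(dist(points[i], points[j]) for i, j in product(cluster_a, cluster_b))
    return total / (len(cluster_a) * len(cluster_b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in points]   # every item starts alone
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(points, pair[0], pair[1]),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)

# Hypothetical profiles (two features per item), for illustration only.
demo = {
    "A": (1.0, 1.0),
    "B": (1.2, 0.9),
    "C": (0.9, 1.1),
    "D": (5.0, 5.0),
    "E": (5.5, 4.8),
}

two_groups = agglomerate(demo, 2)
print(two_groups)  # [['A', 'B', 'C'], ['D', 'E']]
```

Cutting the hierarchy at two clusters mirrors the reading of Figure 1 above, where the membranes fall into two main groups. In practice, features measured on very different scales are often standardized before computing Euclidean distances; whether this was done here is not stated in the text.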
PCA was performed on the featured properties (Table 1) in order to test their distribution among the different types of membranes. PCA also provides a general view of the correlation and dissimilarity among the seven membrane types and the six investigated factors. The results are presented in two dimensions as a plot of scores and loadings together, obtained from the Pearson correlation matrix of the variables (Figure 2). The first and second PCs accounted for 75.57% of the total variance in the dataset (PC1, 52.07% and PC2, 23.50%). This high value indicates that comparing the employed parameters is meaningful and that reliable trends can be drawn from this dataset (Table 1). Four different clusters can be identified from the PCA approach (Figure 2). The first cluster contains FW-20(DE), 4200(DE), 900W(P) and C-535. The other three clusters each contain a single membrane: FW12(DE), HyfloSuper-Cel(DE) and 700 900 1500(P), respectively. Unlike in the HCA approach, 4200(DE) showed proximity to 900W(P) and its homologues.
Regarding factor loadings, PC1 was most likely positively dominated by Body Feed, turbidity and ∆P. On the negative side of PC1, the number of runs has the major influence. Following these trends, PC1 can be interpreted as representing the “Efficiency Factors” of the membranes. PC2 was most likely positively dominated by Pre-Coat. This indicates that PC2 is most likely an indicator of the conditioning state of the membranes. All membranes were found to lie away from the “Turbidity Raw” factor. This indicates that all investigated membranes present a high pollutant removal capacity. Cluster 1 of membranes (FW-20(DE), C-535, 4200(DE) and 900W(P), Figure 2) shows high correlation with the factor “Run”. This indicates that these membranes are more likely to suit conventional water treatment processes that require a higher number of runs. FW12(DE) presented a high correlation with the “Pre-Coat” factor. This indicates that care should be taken when FW12(DE) is used in corrosive and highly reactive conditions (high acidity, alkaline conditions, temperature, pressure…). The 700 900 1500(P) membrane presents high correlation with ∆P. This indicates that this type of membrane requires and handles a high energy input. Interestingly, HyfloSuper-Cel(DE) was projected near the origin of the plot. This means that this membrane is weakly influenced by the investigated factors, or that it presents intermediate behavior compared with the other membrane types.
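The link between loadings and factors used in this interpretation can be made concrete with a short sketch. It assumes NumPy and a PCA computed from the Pearson correlation matrix, as stated for Figure 2. Under that assumption, the loading of a factor on a PC (eigenvector entry scaled by the square root of the eigenvalue) equals the correlation between that factor and the PC scores. The function name `factor_loadings` and the synthetic data are hypothetical and do not reproduce Table 1.

```python
import numpy as np

def factor_loadings(X):
    """Return (loadings, scores) for a correlation-matrix PCA of X.

    X has one row per sample and one column per factor.
    loadings[j, k] is the loading of factor j on principal component k.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each factor: zero mean, unit sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # Pearson correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Clip guards against tiny negative eigenvalues caused by rounding.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    scores = Z @ eigvecs
    return loadings, scores

# Synthetic data for illustration only (7 samples, 3 factors).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(7, 3))
X_demo[:, 1] += 1.5 * X_demo[:, 0]            # make two factors correlated

loadings, scores = factor_loadings(X_demo)
print(np.round(loadings, 3))
```

A factor with a large positive or negative loading on a PC dominates that component, and a sample whose score is close to zero on both PCs plots near the origin. The sign of each component is arbitrary, so only relative positions on the plot carry meaning.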
The unsupervised machine learning approaches reveal interesting features of the compared properties, since trends among the properties are hard to see when each is analyzed independently. Both methods showed a clear dissimilarity between some of the membrane types. PCA proved more effective than HCA: in addition to showing discrepancies, it allowed us to quantify the influence of the investigated factors. This gives it an advantage over HCA, which only quantified the differences between membranes.
PCA simplifies a high-dimensional dataset while preserving its patterns and highlighting its significant trends. This yields a better interpretation, with the PCs acting as new factors that represent the dataset. These factors are uncorrelated with each other, yet each one represents a combination of all the original factors, with different proportions of influence.
This study presents only a small part of the applicability of unsupervised machine learning to choosing the apparatus required for a given treatment. A small dataset was chosen on purpose, in order to reveal the correlations and discrepancies through simple data visualization. The proposed data mining approach has demonstrated the ability of PCA and HCA to reveal trends between membrane materials. PCA proved more effective than HCA, as it showed the influence and weight of each factor in the classification. HCA was restricted to classifying the membranes, without deciphering the factors involved in the classification. Hence, we strongly recommend applying PCA to guide the choice of equipment and to optimize water treatment process conditions.