Machine learning in AML

May 2023 | TALKINGPOINT | FRAUD & CORRUPTION

Financier Worldwide Magazine

FW discusses machine learning in AML with Andrea Vieten, Halyna Hermanns and Kevin Nagel at INFORM.

FW: Could you provide an overview of the extent to which financial crime, particularly money laundering, is growing worldwide? How would you characterise the evolving sophistication of such crime and its methodologies?

Hermanns: Money laundering and terrorism financing severely threaten the integrity of the European Union (EU) economy, its financial system and its citizens. According to the United Nations (UN) Office on Drugs and Crime, the worldwide damage caused every year by money laundering amounts to as much as €1.87 trillion, equivalent to about 5 percent of global GDP. Unfortunately, these numbers will continue to rise in the coming years. This is partly due to the increase in global trade and the increase in digital payments. Criminals have more opportunities to move and hide illicit funds across borders and jurisdictions. Often, tracing money transfers becomes particularly difficult or even impossible when cooperation with countries and institutions outside the EU is required. In the digital world, cryptocurrencies and online platforms play an essential role, as criminals can conduct transactions there securely and anonymously.

Vieten: While financial crime is increasingly the focus of regulators – and many countries have enacted new laws and regulations to combat money laundering and terrorist financing – the fact is that 90 percent of money laundering across the industry still goes undetected. Last year, banks and other financial institutions (FIs) were hit with fines totalling nearly $5bn for money laundering violations, sanctions and similar shortcomings, as the Financial Times reported earlier this year. This brings the total fines to nearly $55bn since the global financial crisis. In addition to these high penalties, FIs are threatened by reputational damage and must make great efforts to secure their anti-money laundering (AML) compliance.

FW: To what extent are tightening anti-money laundering (AML) regulations forcing organisations to assess their risks, review their internal processes and improve their compliance programmes?

Hermanns: FIs and insurance companies are under constant pressure to comply with current regulations. This requires a thorough review of the current processes, compliance programmes and, sometimes, an investment in improved systems. They need to ensure that their customer due diligence processes are efficient, that their systems are working as expected, and that their employees are trained and understand the importance of compliance. Staying up to date with the latest regulations, conducting internal audits and identifying potential improvement areas puts a lot of pressure on organisations within the quickly changing financial landscape.

FW: Generally speaking, how effective are traditional approaches to fighting money laundering? What role is software such as machine learning (ML) playing in moving things forward?

Hermanns: In recent years, advancements in digital payment systems, cryptocurrencies, and offshore banking have made it easier than ever to hide and move large sums of money across international borders, making AML prevention and detection more challenging. The traditional approach to fighting money laundering involves manual processes, such as customer due diligence reviews, transaction monitoring and suspicious activity reporting (SAR). The methods of detection are still effective but can be labour-intensive and costly. With advances in technology, particularly machine learning (ML), there is potential for a more efficient way to detect and prevent money laundering activities.

Vieten: Traditional methods are based on predefined rules. For example, suspicious activities can be detected by thresholding. Such systems are common but inefficient because they are prone to errors. As a result, legacy rule-based systems generate many false alerts, which require compliance officers and analysts to check a multitude of transactions manually. Not only is it extremely time-consuming for compliance departments to manually review each alert, they must also handle increasing demands due to the accelerated use of digital accounts for daily transactions. Every day, the number of new bank and online gaming accounts increases. Digital wallets and online payments are rapidly becoming more popular in everyday life. Therefore, we see a need for fine-grained mechanisms, such as moving thresholds, to balance the number of unnecessary alerts, or false positives, against the number of missed alerts, or false negatives.
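
The trade-off Vieten describes can be illustrated with a minimal Python sketch. It assumes a single amount-based rule and a small, entirely hypothetical set of labelled transactions; the function name `count_errors` and all figures are illustrative and are not taken from any real system.

```python
# Hypothetical labelled history: (transaction amount, was it truly suspicious?)
transactions = [
    (500, False), (1200, False), (3000, False), (4800, True),
    (7500, False), (9000, True), (9900, False), (15000, True),
]


def count_errors(threshold, labelled):
    """Count false positives and false negatives for a simple amount rule.

    An alert is raised whenever the amount reaches the threshold.
    """
    false_positives = sum(
        1 for amount, suspicious in labelled
        if amount >= threshold and not suspicious
    )
    false_negatives = sum(
        1 for amount, suspicious in labelled
        if amount < threshold and suspicious
    )
    return false_positives, false_negatives


# Moving the threshold trades one kind of error for the other.
for threshold in (1000, 5000, 10000):
    fp, fn = count_errors(threshold, transactions)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, a low threshold produces only unnecessary alerts, a high threshold produces only missed alerts, and the middle value gives a mix of both.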

Nagel: ML is certainly a very important technology for the future of AML compliance, even if it cannot be considered the only crucial building block of futureproof systems. A key benefit of ML algorithms is the ability to quickly analyse large amounts of data and build accurate models, within user-defined settings and performance requirements, to identify suspicious transactions or patterns that may indicate criminal activity. ML-powered solutions can also be used for automated customer due diligence checks, reducing the manual labour required for such tasks while improving the accuracy and completeness of results. Finally, ML can assist with SAR filings by quickly identifying transactions that require further investigation or are subject to additional reporting requirements. Still, the whole process of generating rules, moving thresholds and extending the rule set with adaptive rules is also resource-intensive and time-consuming. To further improve the process, we need to incorporate human feedback, such as the reason given by the assigned compliance officer for closing a case, to indicate false positives or false negatives.

FW: How can ML techniques be used to optimise the workload of analysts involved in AML efforts, such as prioritising alerts based on risk level?

Nagel: Supervised learning is a branch of ML that uses labelled datasets, such as SAR flags, to train algorithms to classify data or predict outcomes accurately. One application is alert prioritisation, where legacy rule-based systems are used to create alerts in line with compliance and regulatory requirements. The alerts are then passed to an ML model that sorts them into priority groups. For example, these could be alerts with a high probability of being false on the one hand, alerts with a high probability of leading to a SAR on the other, as well as ambiguous transactions that need to be scrutinised. As a next step, each priority group can be assigned to dedicated sub-teams in the compliance department. This way, the team can focus on the most urgent alerts first.
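
The grouping step Nagel outlines might look like the following Python sketch. It assumes a trained model has already assigned each alert a probability of leading to a SAR; the alert IDs, the scores, the cut-off values and the function name `prioritise` are all hypothetical.

```python
def prioritise(alerts, low=0.2, high=0.8):
    """Sort scored alerts into three work queues.

    `alerts` is a list of (alert_id, sar_probability) pairs. The cut-offs
    `low` and `high` are illustrative assumptions, not recommended values.
    """
    queues = {"likely_false_alert": [], "needs_scrutiny": [], "likely_sar": []}
    for alert_id, sar_probability in alerts:
        if sar_probability >= high:
            queues["likely_sar"].append(alert_id)
        elif sar_probability <= low:
            queues["likely_false_alert"].append(alert_id)
        else:
            queues["needs_scrutiny"].append(alert_id)
    return queues


# Hypothetical model output for five alerts raised by the rule-based system.
scored_alerts = [
    ("A-1", 0.05), ("A-2", 0.93), ("A-3", 0.47), ("A-4", 0.81), ("A-5", 0.12),
]
queues = prioritise(scored_alerts)
for name, ids in queues.items():
    print(name, ids)
```

Each queue can then be routed to a dedicated sub-team, with the `likely_sar` queue handled first.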

FW: How important is it for organisations to understand the circumstances in which ML can be used most effectively, to optimise its application? What kinds of risks or limitations need to be considered?

Vieten: Understanding the correct use and limits of ML is, of course, crucial. FIs, as well as insurance companies, must understand the potential of ML and its applicability to various business scenarios to make the most of it. By learning more about where best to deploy ML, organisations can ensure they get the maximum benefit from their investment. One of the main risks associated with ML is data quality. ML models can only be as good as the data that was used for training. If the training datasets are of high quality, the model’s accuracy improves and its results or predictions become more reliable. Thus, an organisation needs to check data quality in terms of consistency, validity and completeness before deploying a model.

Nagel: Taking a very in-depth look at the data is recommended. For example, if the dataset used to train a predictive model contains noisy or incomplete information, it may lead to incorrect predictions. Unbalanced or biased data used to train models can lead to inaccurate results, too. Once a model has been learnt from a training dataset, its validity must be checked. For instance, algorithmic bias may occur when the model is trained on data that is not representative of the population. Consequently, in such a case, the model may make unfair predictions or produce discriminatory outcomes. Correspondingly, a model might return good prediction results for customers with certain transaction characteristics, while for a different customer group, the predictions might contain errors. Understanding these blind spots of a model enables organisations to deploy complementary methods. For example, if a model’s predictions for a certain customer segment have an error rate of around 10 percent, then by combining the model with business knowledge-based rule sets, the computed risk score can be refined to reduce the error rate.
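
One simple way to look for such blind spots is to compute the error rate separately for each customer segment on a validation set. The sketch below is a minimal Python illustration; the segment names, the records and the function name `error_rate_by_segment` are hypothetical, and the 10 percent cut-off simply reuses the figure from the example above.

```python
from collections import defaultdict


def error_rate_by_segment(records):
    """Return the share of wrong predictions per customer segment.

    `records` holds (segment, predicted_label, actual_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for segment, predicted, actual in records:
        totals[segment] += 1
        if predicted != actual:
            errors[segment] += 1
    return {segment: errors[segment] / totals[segment] for segment in totals}


# Hypothetical validation results: 1 = suspicious, 0 = not suspicious.
validation = [
    ("retail", 0, 0), ("retail", 1, 1), ("retail", 0, 0), ("retail", 0, 0),
    ("corporate", 1, 0), ("corporate", 0, 0), ("corporate", 0, 1), ("corporate", 1, 1),
]
rates = error_rate_by_segment(validation)

# Segments at or above the cut-off are candidates for complementary rules.
blind_spots = [segment for segment, rate in rates.items() if rate >= 0.10]
print(rates, blind_spots)
```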

Hermanns: Ultimately, with careful planning and review, organisations can ensure they are getting the most out of their investments in ML technology. After all, AI and ML applications can be expensive and require regular updates to remain effective. In a nutshell, this is not only about fixing and improving the learning of an ML model, but also about understanding its relevance and robust contribution within a system.

FW: What are some of the key challenges in implementing ML for AML, such as data quality, and how can these be addressed?

Hermanns: Data is the backbone of every ML solution, and its quality has a direct effect on the algorithm’s ability to learn patterns. What is important for the compliance officer is also important for the system: feeding raw transactional data to an ML algorithm will yield limited results, because it is the information contained in the data that can unlock secrets and details. For example, by aggregating data into customer profile information and behavioural information – both important insights for compliance officers – high-level statistics can be extracted and fed to the system for learning purposes. However, there is a lot to consider when implementing ML. In the context of AML, the number of labels indicating suspicious activity or money laundering is small. Thus, for the system to learn accurately, the data must not have mislabelled cases, or else the system will predict wrong outcomes for similar transactions and alerted cases.
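
The aggregation step Hermanns mentions can be sketched in Python as follows. This is a minimal illustration only: the field names, the sample transactions and the chosen statistics are assumptions, and a production profile would normally contain far more behavioural information.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transactions.
raw = [
    {"customer": "C1", "amount": 100.0, "country": "DE"},
    {"customer": "C1", "amount": 300.0, "country": "DE"},
    {"customer": "C1", "amount": 200.0, "country": "FR"},
    {"customer": "C2", "amount": 5000.0, "country": "DE"},
]


def build_profiles(transactions):
    """Aggregate raw transactions into one feature row per customer."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx["customer"]].append(tx)

    profiles = {}
    for customer, txs in grouped.items():
        amounts = [tx["amount"] for tx in txs]
        profiles[customer] = {
            "tx_count": len(txs),
            "mean_amount": mean(amounts),
            "max_amount": max(amounts),
            "distinct_countries": len({tx["country"] for tx in txs}),
        }
    return profiles


profiles = build_profiles(raw)
print(profiles["C1"])
```

The resulting per-customer rows, rather than the individual transactions, would then serve as input features for training.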

Nagel: As a rule of thumb, the more data you have and the more accurate it is, the better your ML models are likely to perform. When a lot of data is provided to the system, it is more likely to detect patterns or to distinguish normal from abnormal groups of behaviour. Outdated data should be replaced. The underlying phenomenon is referred to as data drift: any characteristic may take on new values as time passes, for example when a new payment method is added, so the system needs to be updated to learn from these changes. The system needs to be fed with representative transactional data that reflects current patterns of what is deemed normal customer behaviour.
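
A very basic check for the kind of drift in Nagel's example, a characteristic taking on values that were absent during training, could look like the Python sketch below. The field name `payment_method`, the sample rows and both function names are hypothetical, and real drift monitoring would usually also compare the distributions of numeric features.

```python
def unseen_values(training_rows, recent_rows, field):
    """Return values of `field` that occur in recent data but not in training data."""
    known = {row[field] for row in training_rows}
    return {row[field] for row in recent_rows} - known


def unseen_share(training_rows, recent_rows, field):
    """Return the fraction of recent rows whose `field` value is new to the model."""
    new_values = unseen_values(training_rows, recent_rows, field)
    hits = sum(1 for row in recent_rows if row[field] in new_values)
    return hits / len(recent_rows)


# Hypothetical data: a new payment method appears after the model was trained.
training = [{"payment_method": "card"}, {"payment_method": "transfer"}]
recent = [
    {"payment_method": "card"}, {"payment_method": "instant_wallet"},
    {"payment_method": "transfer"}, {"payment_method": "instant_wallet"},
]

print(unseen_values(training, recent, "payment_method"))
print(unseen_share(training, recent, "payment_method"))
```

A rising share of unseen values is one possible signal that the training data no longer reflects current customer behaviour.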

Vieten: Another issue is the possibility of contradicting data. If two customers have the same characteristics but are categorised at different risk levels, then the data does not contain enough information to describe the customer or transactional characteristics. In other words, the data does not have enough resolution for the system to distinguish between the two. Identifying such subsets of data is important because they can highlight areas where predictions carry less confidence. One option to deal with contradicting data is to provide more data from different sources.
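
Contradicting records of the kind Vieten describes can be found by grouping on the feature values and checking whether more than one label occurs in a group. The following Python sketch assumes categorical features and a `risk_level` label; all field names, sample rows and the function name `find_contradictions` are hypothetical.

```python
from collections import defaultdict


def find_contradictions(rows, feature_names, label_name="risk_level"):
    """Return feature combinations that appear with more than one label."""
    labels_by_features = defaultdict(set)
    for row in rows:
        key = tuple(row[name] for name in feature_names)
        labels_by_features[key].add(row[label_name])
    return {
        key: labels
        for key, labels in labels_by_features.items()
        if len(labels) > 1
    }


# Hypothetical customers: the first two are identical except for the label.
rows = [
    {"age_band": "30-40", "country": "DE", "monthly_volume": "low", "risk_level": "low"},
    {"age_band": "30-40", "country": "DE", "monthly_volume": "low", "risk_level": "high"},
    {"age_band": "50-60", "country": "FR", "monthly_volume": "high", "risk_level": "high"},
]
contradictions = find_contradictions(rows, ["age_band", "country", "monthly_volume"])
print(contradictions)
```

Any combination returned here marks a subset where additional data sources may be needed before the model can separate the cases.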

FW: What essential advice would you offer to companies on enhancing their AML systems with advanced ML capabilities?

Vieten: ML can only work efficiently if there is good data quality. Feature engineering is of the utmost importance. Features should be based not only on raw data but also on enriched information. Derived and aggregated information should be used in feature engineering. In practice, this means using the original data, but also adding information, such as from external lists, and enriching it with statistical data or profiles that already represent expert knowledge. In addition, data-based and knowledge-based methods of artificial intelligence (AI) can be combined in a hybrid AI approach. This means that a transaction’s risk score is formed from expert knowledge and machine-learned knowledge, which together enable the highest precision. The score is then used again to train the ML models. Thus, the knowledge-based component indirectly flows into the ML component. However, this is only possible if the appropriate data is available for training.
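
As a rough illustration of how a knowledge-based score and a machine-learned score could be combined, consider the Python sketch below. The interview does not specify how the two components are blended, so the weighted average used here is purely an assumption, as are the rules, the placeholder country codes, the weight and the function names.

```python
HIGH_RISK_COUNTRIES = {"XX", "YY"}  # hypothetical placeholder codes


def rule_score(tx):
    """Knowledge-based component: hypothetical expert rules, each adding to the score."""
    score = 0.0
    if tx["amount"] >= 10_000:
        score += 0.5
    if tx["country"] in HIGH_RISK_COUNTRIES:
        score += 0.5
    return score


def hybrid_score(tx, ml_probability, rule_weight=0.5):
    """Blend the rule score with a model probability assumed to lie in [0, 1].

    The weighted average is an illustrative choice, not a documented method.
    """
    return rule_weight * rule_score(tx) + (1.0 - rule_weight) * ml_probability


tx = {"amount": 12_000, "country": "XX"}
score = hybrid_score(tx, ml_probability=0.3)
print(round(score, 2))
```

In this sketch the rules contribute immediately, even where the model's probability is low, which reflects the idea that expert knowledge and learned patterns complement each other.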

Nagel: ML algorithms require labelled data to learn patterns and generate accurate predictions. This means that the data should be tagged with labels that indicate the desired output. In AML compliance, the labels could be ‘suspicious’ or ‘non-suspicious’. Also, the data should represent a wide range of scenarios, contexts and demographics. Beyond that, FIs need to choose the right ML technology. There are many different ML technologies available, and each has its own strengths and weaknesses. It is important to choose the right technology for specific needs and goals. The following questions can guide an FI when enhancing its AML systems with advanced ML capabilities. Regarding trust, is the prediction unbiased? Regarding accountability, are causal relationships taken into account? Regarding compliance, is the General Data Protection Regulation (GDPR) being respected? Regarding performance, do small changes in the input lead to largely different results? And regarding control, how easily and quickly can a mistake in the ML model be fixed?

Hermanns: In the context of compliance, the result produced by an ML solution needs to be transparent and auditable. Algorithms should be able to provide clear explanations for their predictions and decisions, allowing humans to understand and trust the results. It is important to build a comprehensive AML strategy. ML capabilities should be integrated with existing AML systems and workflows to ensure they work seamlessly together. It is also important to establish a process for monitoring and adjusting ML algorithms on a regular basis to ensure they continue to produce accurate results. Models need to be retrained to account for data drift and changes in patterns. However, retraining ML models so that they reflect such new patterns might take time. In such a case, it is much easier to adapt the knowledge-based rule sets to have immediate protection against new criminal patterns.

FW: What enhancements can we expect to see in AML-related ML capabilities in the years ahead? To what extent is ML a game changer in the fight against money laundering, capturing the subtleties of criminal behaviour to produce effective defences?

Hermanns: Having a crystal ball to see all possible enhancements would be great. However, predicting every development in the years ahead is impossible, especially with such a dynamically evolving technology landscape. That said, a few developments to expect in the coming years include using natural language processing to analyse unstructured data sources such as customers’ email correspondence, developing more sophisticated algorithms for spotting suspicious patterns or behaviour, and focusing on explainable AI to help regulators and investigators understand the AI models so that they can draw further conclusions or make recommendations. ML is a game changer and has huge potential to support AML investigations, but not without limitations and associated risks. It is a tool that complements human expertise rather than replacing it completely.

Vieten: A common problem with traditional AML methods is the high rate of false positives, leading to unnecessary investigations and increased costs or a wasteful allocation of resources. ML can reduce false positives by identifying patterns and anomalies more accurately and by prioritising generated alerts. Given the current developments in the labour market and the limited availability of resources and professionals, it is more important than ever to use available resources effectively and efficiently. Furthermore, ML algorithms can effectively identify patterns of suspicious activity, clusters, hidden relationships between entities and anomalies that are not obvious to human analysts. This way, they do indeed offer better protection.

Nagel: As ML algorithms learn and improve over time, they allow FIs to adapt to changing money laundering patterns and improve the accuracy of their AML systems based on current data and developments. Still, the hybrid AI approach remains important. Knowledge-based rules can be productive right away without any training. This will remain very important as it allows organisations to react instantly to new patterns. This means combining both worlds will remain the best means to fight money laundering.

Andrea Vieten is a skilled product manager for INFORM’s RiskShield solution, bringing over 15 years of expertise and knowledge in risk and fraud. With a history of various positions, she has guided numerous financial institutions to implement intelligent decision systems for risk, fraud, processes and compliance. Combining her technical and business knowledge with experience in sales and marketing, she continues to develop customer-oriented strategies to meet the evolving needs of the financial industry. She can be contacted on +49 (0)2408 9456 5000 or by email: riskshield@inform-software.com.

Halyna Hermanns is a business development and account manager in the risk & fraud division of INFORM with over a decade of experience. Throughout her career, she has been involved in multiple customer projects in the payment and compliance area, utilising her expertise to help clients minimise risk and fraud. Her deep understanding of financial crime risk and compliance regulations is mirrored in her CAMS certificate by ACAMS. She can be contacted on +49 (0)2408 9456 5000 or by email: riskshield@inform-software.com.

Kevin Nagel is a data scientist by passion and profession. In his role as senior consultant in the INFORM risk & fraud division, he focuses on adding value to customers’ digital decision processes by combining the right choice of models and rule sets for the given customer challenges. His favourite combination is knowledge-based rule sets and data-based machine learning, also often referred to as RiskShield Hybrid AI. He can be contacted on +49 (0)2408 9456 5000 or by email: riskshield@inform-software.com.
