Real-Time Fraud Detection and Feature Store Design Patterns for Streaming ML in Financial Services | IJCSE Volume 6 - Issue 2 | IJCSE-V6I2P6
International Journal of Computer Science Engineering Techniques
ISSN: 2455-135X
Volume 6, Issue 2
Published:
Author
Jeevan Krishna Paruchuri
Abstract
Real-time fraud detection in payment authorization workflows imposes a particularly demanding combination of constraints. Tens of thousands of transactions per second must be scored within an end-to-end latency budget of two hundred milliseconds. The features that the scoring model consumes must reflect a recent enough view of the underlying entities to capture fraudulent activity that occurred seconds earlier. The system must continue to detect fraud even under partial component failure. This paper presents a case study of a production fraud detection system, operated at a large payments processor, that processes between ten thousand and fifty thousand transactions per second under a two-hundred-millisecond SLA. The system combines event-driven ingestion through Apache Kafka, Apache Spark Structured Streaming for feature aggregation, ScyllaDB for ultra-low-latency feature lookups, a custom C++ inference engine with AVX-512 optimization, and C++ as the primary serving language. The paper documents the engineering decisions that enabled the SLA to be met under steady-state and peak load. A central decision was the migration of the scoring path from Python to C++ with AVX-512 SIMD intrinsics, which reduced p99 latency by approximately 4×. The paper also reports two production incidents observed over a two-year operational window and the post-incident changes that followed. Neither incident produced fraud leakage, illustrating that graceful degradation and observability are as important as raw performance to the operational viability of fraud detection systems.
Keywords
fraud detection, real-time streaming, machine learning inference, low-latency processing, financial transactions, feature engineering
Conclusion
Real-time fraud detection in payment authorization imposes a combination of latency, accuracy, and operational reliability requirements that few other production ML applications match. The system described in this paper (event-driven ingestion through Apache Kafka, Apache Spark Structured Streaming for feature aggregation, ScyllaDB for online feature caching, a custom C++ inference engine with AVX-512 optimization for model inference, and Java as the primary serving language) satisfies a two-hundred-millisecond end-to-end SLA at tens of thousands of transactions per second. It has done so reliably across two years of production operation, interrupted by only two minor incidents. Neither incident resulted in fraud leakage, demonstrating that the explicit graceful degradation paths and the monitoring instrumentation described in Sections 8 and 9 do their intended work.
The principal lessons are that latency is a feature deserving the same engineering attention as accuracy, that the bottleneck in a real production workload is rarely where the team’s intuition expects it (in this case ScyllaDB network I/O rather than model inference), that connection pooling and request batching often produce larger improvements than algorithmic optimization, that graceful degradation under partial failure proves preferable to clean failure, and that shadow deployment combined with canary rollouts is the only reliable way to catch model regressions before they affect customers. Future work in real-time retraining, graph-based fraud signals, and explainability will extend the system in directions that the current architecture supports but does not yet exploit. As payment fraud continues to evolve and as the models defending against it become more sophisticated, the operational discipline required to run these systems well will remain at least as important as the modeling techniques themselves.
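The lesson about graceful degradation can be illustrated with a minimal Python sketch. It is not the system's implementation: the function names, the 50 ms lookup timeout, the stand-in model, and the fallback rule are all assumptions made for illustration. The idea shown is that when the feature lookup misses its share of the latency budget, the scorer returns a conservative rule-based score and labels the result as degraded instead of failing the authorization request.

```python
import concurrent.futures
import time

# All names and constants below are hypothetical, chosen for illustration only.
FEATURE_TIMEOUT_S = 0.05      # assumed slice of the 200 ms budget for the lookup
FALLBACK_THRESHOLD = 1000.0   # assumed amount at which the fallback rule flags

def fetch_features(card_id, delay_s=0.0):
    """Stand-in for an online feature store lookup; delay_s simulates a slow store."""
    time.sleep(delay_s)
    return {"txn_count_1h": 3, "avg_amount_24h": 42.0}

def model_score(features, amount):
    """Stand-in for the real model: amount relative to the recent average, capped at 1."""
    return min(1.0, amount / (10.0 * features["avg_amount_24h"]))

def fallback_score(amount):
    """Conservative rule used when features are unavailable."""
    return 1.0 if amount >= FALLBACK_THRESHOLD else 0.5

# A shared worker pool so each request does not pay thread start-up cost.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def score_transaction(card_id, amount, lookup_delay_s=0.0):
    """Return (score, path), where path is 'model' or 'degraded'."""
    future = _pool.submit(fetch_features, card_id, lookup_delay_s)
    try:
        features = future.result(timeout=FEATURE_TIMEOUT_S)
    except concurrent.futures.TimeoutError:
        # The lookup missed its deadline: degrade to the rule and say so,
        # so that monitoring can count degraded decisions separately.
        return fallback_score(amount), "degraded"
    return model_score(features, amount), "model"
```

Calling `score_transaction("card-1", 84.0)` takes the model path, while passing `lookup_delay_s=0.2` forces the degraded path. Returning the path label alongside the score is one way to keep degraded decisions visible to the monitoring layer.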