Exactly-Once Semantics in Distributed Stream Processing at Scale | IJCSE Volume 5 – Issue 1 | IJCSE-V5I1P4

IJCSE International Journal of Computer Science Engineering Logo

International Journal of Computer Science Engineering Techniques

ISSN: 2455-135X
Volume 5, Issue 1  |  Published:
Author

Abstract

Exactly-once semantics in distributed stream processing remain difficult to achieve in practice, particularly when pipelines span multiple subsystems with independent failure domains: a message broker, a stream processor, and a persistent sink. Although Apache Kafka and Apache Spark Structured Streaming each advertise exactly-once guarantees within their own boundaries, the end-to-end guarantee experienced by an application is determined by the weakest link in the chain, and the conditions under which the chain holds together are often poorly understood by practitioners. This paper presents a formal analysis of the conditions required for end-to-end exactly-once guarantees in Kafka-to-Spark-to-sink pipelines, supplemented by empirical failure observations from a production financial services streaming platform processing tens of thousands of payment transactions per second under a two-hundred-millisecond latency service-level agreement. The paper contributes a five-category failure taxonomy for exactly-once violations observed in production, including a previously under-documented class of failure involving staging table isolation under concurrent foreachBatch invocations; empirical data on incident frequency, detection time, and impact; and a set of architectural recommendations for separating latency-critical hot paths from analytical cold paths, designing genuinely idempotent sinks, and implementing reconciliation-based monitoring of exactly-once guarantees. The central message is that exactly-once is achievable but requires deliberate architectural design across all three pipeline layers; relying on framework defaults is insufficient for production systems where duplicates or losses carry regulatory consequences.

Keywords

exactly-once semantics, stream processing, Apache Kafka, distributed systems, fault tolerance, message delivery guarantees

Conclusion

This paper has presented a formal analysis of the conditions required for end-to-end exactly-once guarantees in Kafka and Spark Structured Streaming pipelines, a five-category taxonomy of exactly-once failure modes observed in production, empirical data from a financial services streaming platform processing tens of thousands of transactions per second, and a set of architectural recommendations for achieving exactly-once guarantees in practice. The principal contribution is the identification of staging table isolation failures as a distinct failure category that does not appear to be widely documented in the published streaming literature but that produced a four-hour data inconsistency incident in the empirical environment. The category arises specifically in the foreachBatch sink integration pattern when the user-supplied function uses a staging structure shared across concurrent invocations, and it can be reproduced only under recovery scenarios that produce concurrent execution. The mitigation per-batch isolated staging keyed on the framework-supplied batch identifier is straightforward, but it depends on awareness that the failure mode exists. More broadly, the paper argues that exactly-once is not a property that can be acquired by configuration of a single framework; it is a property that emerges from correct composition of source, processor, and sink layers, each with its own failure domain and its own durability semantics. Practitioners working on production financial services streaming systems should treat framework guarantees as necessary but not sufficient, and should design idempotency, isolation, and reconciliation as first-class concerns of the pipeline architecture. The empirical experience reported here suggests that monitoring particularly reconciliation-based monitoring that compares source and sink record counts independently of framework guarantees is the most effective mechanism for containing the incidents that inevitably arise. Future work should extend the failure taxonomy to additional sink technologies and processor frameworks, and should investigate formal verification techniques that could surface staging isolation hazards at design time rather than after a production incident.

References

[1] Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., FernΓ‘ndez-Moctezuma, R. J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., and Whittle, S. (2015). The Dataflow model: A practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment, 8(12), 1792–1803. [2] Akidau, T., Balikov, A., Bekiroğlu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., and Whittle, S. (2013). MillWheel: Fault-tolerant stream processing at internet scale. Proceedings of the VLDB Endowment, 6(11), 1033–1044. [3] Armbrust, M., Das, T., Davidson, A., Ghodsi, A., Or, A., Rosen, J., Stoica, I., Wendell, P., Xin, R., and Zaharia, M. (2018). Structured Streaming: A declarative API for real-time applications in Apache Spark. Proceedings of the 2018 ACM SIGMOD International Conference on Management of Data, 601–613. [4] Armbrust, M., Das, T., Sun, L., Yavuz, B., Zhu, S., Murthy, M., Torres, J., van Hovell, H., Ionescu, A., Łuszczak, A., Switakowski, M., SzafraΕ„ski, M., Li, X., Ueshin, T., Mokhtar, M., Boncz, P., Ghodsi, A., Paranjpye, S., Senster, P., Xin, R., and Zaharia, M. (2020). Delta Lake: enterprise-grade ACID table capabilities on cloud storage. Proceedings of the VLDB Endowment, 13(12), 3411–3424. [5] Balazinska, M., Balakrishnan, H., Madden, S. R., and Stonebraker, M. (2008). Fault-tolerance in the Borealis distributed stream processing system. ACM Transactions on Database Systems, 33(1), 1–44. [6] Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1987). Concurrency Control and Recovery in Database Systems. Addison-Wesley. [7] Carbone, P., Ewen, S., FΓ³ra, G., Haridi, S., Richter, S., and Tzoumas, K. (2017). State management in Apache Flink: Consistent stateful distributed stream processing. Proceedings of the VLDB Endowment, 10(12), 1718–1729. [8] Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., and Tzoumas, K. (2015). Apache Flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4). [9] Chandy, K. M., and Lamport, L. (1985). Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1), 63–75. [10] Chandramouli, B., Goldstein, J., Barnett, M., DeLine, R., Fisher, D., Platt, J. C., Terwilliger, J. F., and Wernsing, J. (2014). Trill: A high-performance incremental query processor for diverse analytics. Proceedings of the VLDB Endowment, 8(4), 401–412. [11] Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Γ‡etintemel, U., Xing, Y., and Zdonik, S. B. (2003). Scalable distributed stream processing. CIDR. [12] Dean, J., and Barroso, L. A. (2013). The tail at scale. Communications of the ACM, 56(2), 74–80. [13] Fischer, M. J., Lynch, N. A., and Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), 374–382. [14] Gilbert, S., and Lynch, N. (2002). Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, 33(2), 51–59. [15] Gray, J., and Reuter, A. (1992). Transaction Processing: Concepts and Techniques. Morgan Kaufmann. [16] Hellerstein, J. M. (2010). The declarative imperative: Experiences and conjectures in distributed logic. ACM SIGMOD Record, 39(1), 5–19. [17] Helland, P. (2007). Life beyond distributed transactions: An apostate’s opinion. CIDR, 132–141. [18] Hwang, J. H., Balazinska, M., Rasin, A., Γ‡etintemel, U., Stonebraker, M., and Zdonik, S. B. (2005). High-availability algorithms for distributed stream processing. Proceedings of the 21st International Conference on Data Engineering (ICDE), 779–790. [19] Kafka, F. (2017). Apache Kafka: Exactly-once semantics. Confluent Engineering Blog / Apache Kafka Documentation. [20] Kleppmann, M. (2017). Designing Data-Intensive Applications. O’Reilly Media. [21] Kleppmann, M., and Kreps, J. (2015). Kafka, Samza and the Unix philosophy of distributed data. IEEE Data Engineering Bulletin, 38(4), 4–14. [22] Kreps, J. (2014). Questioning the Lambda architecture. O’Reilly Radar. [23] Kreps, J., Narkhede, N., and Rao, J. (2011). Kafka: A distributed messaging system for log processing. Proceedings of the NetDB Workshop. [24] Kulkarni, S., Bhagat, N., Fu, M., Kedigehalli, V., Kellogg, C., Mittal, S., Patel, J. M., Ramasamy, K., and Taneja, S. (2015). Twitter Heron: Stream processing at scale. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 239–250. [25] Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7), 558–565. [26] Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2), 133–169. [27] Lin, J., and Ryaboy, D. (2013). Scaling big data mining infrastructure: The Twitter experience. ACM SIGKDD Explorations Newsletter, 14(2), 6–19. [28] Marz, N., and Warren, J. (2015). Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning. [29] Murray, D. G., McSherry, F., Isaacs, R., Isard, M., Barham, P., and Abadi, M. (2013). Naiad: A timely dataflow system. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 439–455. [30] Neumeyer, L., Robbins, B., Nair, A., and Kesari, A. (2010). S4: Distributed stream computing platform. IEEE International Conference on Data Mining Workshops, 170–177. [31] Noghabi, S. A., Paramasivam, K., Pan, Y., Ramesh, N., Bringhurst, J., Gupta, I., and Campbell, R. H. (2017). Samza: Stateful scalable stream processing at LinkedIn. Proceedings of the VLDB Endowment, 10(12), 1634–1645. [32] Ongaro, D., and Ousterhout, J. (2014). In search of an understandable consensus algorithm. USENIX Annual Technical Conference, 305–319. [33] Pacheco, F., Rangan, K., et al. (2018). Production experience with Kafka at financial scale. Industry experience report. [34] Schwarzkopf, M., Konwinski, A., Abd-El-Malek, M., and Wilkes, J. (2013). Omega: Flexible, scalable schedulers for large compute clusters. Proceedings of the 8th ACM European Conference on Computer Systems, 351–364. [35] Shapiro, M., PreguiΓ§a, N., Baquero, C., and Zawirski, M. (2011). Conflict-free replicated data types. Symposium on Self-Stabilizing Systems, 386–400. [36] Stonebraker, M., Γ‡etintemel, U., and Zdonik, S. (2005). The 8 requirements of real-time stream processing. ACM SIGMOD Record, 34(4), 42–47. [37] Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J. M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., Bhagat, N., Mittal, S., and Ryaboy, D. (2014). Storm @Twitter. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 147–156. [38] Wang, G., Koshy, J., Subramanian, S., Paramasivam, K., Zadeh, M., Narkhede, N., Rao, J., Kreps, J., and Stein, J. (2015). Building a replicated logging system with Apache Kafka. Proceedings of the VLDB Endowment, 8(12), 1654–1655. [39] Wang, G., Chen, L., Dikshit, A., Gustafson, J., Chen, B., Sax, M. J., Roesler, J., Blee-Goldman, S., Cadonna, B., Mehta, A., Madan, V., and Rao, J. (2019). Consistency and completeness: Rethinking distributed stream processing in Apache Kafka. Proceedings of the 2019 ACM SIGMOD International Conference on Management of Data. [40] Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. (2013). Discretized streams: Fault-tolerant streaming computation at scale. Proceedings of the 24th ACM Symposium on Operating Systems Principles, 423–438. [41] Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., and Stoica, I. (2016). Apache Spark: A unified engine for big data processing. Communications of the ACM, 59(11), 56–65.
Β© 2025 International Journal of Computer Science Engineering Techniques (IJCSE).

Submit Your Paper