As text collections grow rapidly, efficient information retrieval (IR) has become essential for extracting relevant knowledge from large corpora. This paper presents a data-driven approach that integrates Association Rule Mining (ARM) with Natural Language Processing (NLP)-based stemming to improve the accuracy and performance of text-based IR systems. The proposed framework preprocesses textual data through tokenization, stop-word removal, and stemming, which reduces dimensionality and linguistic redundancy. ARM is then applied to the preprocessed terms to discover frequent term associations and semantic relationships among keywords, and these associations are used to improve the ranking and relevance of retrieved documents. Experiments on benchmark text datasets show that the hybrid model significantly outperforms conventional keyword-based retrieval in precision, recall, and F-measure. The study shows how combining ARM with NLP techniques supports more context-aware, semantically enriched search in big data applications.
Keywords
Association Rule Mining, Information Retrieval, Text Mining, Natural Language Processing, Stemming, Data-Driven Approach, Semantic Search, Big Data Analytics.
Conclusion
The proposed data-driven information retrieval model integrates Association Rule Mining (ARM) with NLP-based stemming to improve the efficiency and accuracy of text retrieval systems. Stemming and linguistic normalization reduce lexical variability, so that morphological variants of the same word are treated as a single term. ARM then discovers meaningful associations among the normalized terms, and these associations support query expansion and relevance ranking. Experimental evaluations show that the hybrid model outperforms traditional keyword-based approaches and classical vector-space models in precision, recall, and F-measure. The model narrows the gap between syntactic representation and semantic understanding, and because the mined rules can be inspected directly, it offers a scalable and interpretable framework for intelligent information retrieval.
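The pipeline summarized above (preprocessing, rule mining, query expansion) can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the stop-word list, the `toy_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), the pair-only rule miner (a simplification of Apriori-style mining over larger itemsets), the thresholds, and the four-document corpus are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are", "with"}

# Assumption: a crude suffix stripper standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and stem; returns the set of terms in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {toy_stem(t) for t in tokens if t not in STOP_WORDS}


def mine_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return {antecedent: [(consequent, confidence), ...]} for frequent term pairs."""
    n = len(docs)
    term_counts = Counter(t for d in docs for t in d)
    pair_counts = Counter(p for d in docs for p in combinations(sorted(d), 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:  # support = share of documents containing both terms
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / term_counts[x]  # estimate of P(y in doc | x in doc)
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules


def expand_query(query, rules):
    """Add the consequents of every rule whose antecedent appears in the query."""
    terms = preprocess(query)
    expanded = set(terms)
    for t in terms:
        expanded.update(y for y, _ in rules.get(t, []))
    return expanded


# Hypothetical mini-corpus for illustration only.
corpus = [
    "Mining association rules in large databases",
    "Association rule mining for text retrieval",
    "Stemming algorithms for text retrieval",
    "Indexing documents with stemmed terms",
]
docs = [preprocess(d) for d in corpus]
rules = mine_pair_rules(docs)
expanded = expand_query("text mining", rules)
```

In this toy corpus the query terms `text` and `min` (the stem of "mining") pick up `retrieval`, `association`, and `rule` through the mined rules. A production system would likely replace `toy_stem` with an established stemmer and the pair counter with a full frequent-itemset algorithm.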
The findings indicate that combining text mining with association rule analysis significantly enhances information retrieval by reducing redundancy, increasing contextual awareness, and improving overall system responsiveness. The evaluation metrics show that integrating statistical and linguistic techniques yields higher retrieval accuracy, particularly on large, unstructured text corpora. Furthermore, the approach generalizes across data sources, making it suitable for digital libraries, social media analytics, and enterprise document repositories.
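For reference, the three evaluation metrics named above can be computed per query as in the following sketch. The function name and the document IDs are hypothetical and do not reproduce any result from the experiments; the formulas are the standard set-based definitions, with F-measure taken as the balanced F1 score.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; defined as 0 when there are no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical document IDs, not results from the study.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d7"]
p, r, f = precision_recall_f1(retrieved_ids, relevant_ids)
```

Here two of the four retrieved documents are relevant, so precision is 2/4, recall is 2/3, and F1 is 4/7. Averaging these values over all queries gives system-level scores.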