As data volumes grow rapidly, efficient information retrieval (IR) has become essential for extracting relevant knowledge from large text corpora. This paper presents a data-driven approach that integrates Association Rule Mining (ARM) with Natural Language Processing (NLP)-based stemming to improve the accuracy and performance of text-based information retrieval systems. The proposed framework preprocesses text through tokenization, stop-word elimination, and stemming, which reduces dimensionality and linguistic redundancy. ARM is then applied to the preprocessed terms to discover frequent term associations and relationships among keywords, which are used to improve the ranking and relevance of retrieved documents. Experiments on benchmark text datasets show that the hybrid model significantly outperforms conventional keyword-based retrieval methods in precision, recall, and F-measure. The study shows how integrating ARM with NLP techniques supports intelligent information retrieval, enabling more context-aware and semantically enriched search results for big data applications.
Keywords
Association Rule Mining, Information Retrieval, Text Mining, Natural Language Processing, Stemming, Data-Driven Approach, Semantic Search, Big Data Analytics.
Conclusion
The proposed data-driven information retrieval model integrates Association Rule Mining (ARM) with NLP-based stemming to improve the efficiency and accuracy of text retrieval systems. Stemming and linguistic normalization reduce lexical variability, so that morphological variants of the same word are treated as a single term. ARM then discovers meaningful associations among terms, which support query expansion and relevance ranking. Experimental evaluations show that the hybrid model outperforms both traditional keyword-based approaches and classical vector-space models in precision, recall, and F-measure. The model narrows the gap between syntactic representation and semantic understanding, offering a scalable and interpretable framework for intelligent information retrieval.
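The pipeline summarized above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the stop-word list, the suffix-stripping rule in `naive_stem` (a crude stand-in for a real stemmer such as Porter's), the four toy documents, and the threshold values are all assumptions chosen for illustration. Mining is restricted to term pairs rather than a full Apriori or FP-growth run, and each document is treated as one transaction of distinct stems.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "from"}


def naive_stem(token):
    """Strip a few common English suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem; return the set of distinct stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {naive_stem(t) for t in tokens if t not in STOP_WORDS}


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Mine rules x -> y over term pairs; return {x: [(y, confidence), ...]}."""
    n = len(transactions)
    item_counts = Counter(term for t in transactions for term in t)
    # Sorting each transaction gives every pair one canonical ordering.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the pair is not frequent enough
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules


def expand_query(query, rules):
    """Add the consequents of every rule whose antecedent is a query stem."""
    terms = preprocess(query)
    expanded = set(terms)
    for term in terms:
        expanded.update(consequent for consequent, _ in rules.get(term, []))
    return expanded


# Toy corpus (assumption, not one of the benchmark datasets).
docs = [
    "Mining association rules from large databases",
    "Association rule mining for text retrieval",
    "Stemming algorithms for text retrieval",
    "Retrieving documents with mined association rules",
]
transactions = [preprocess(d) for d in docs]
rules = mine_pair_rules(transactions)
print(sorted(expand_query("mining", rules)))
```

In this toy corpus, "mining" and "mined" reduce to the same stem, which co-occurs with the stems of "association" and "rules" in three of the four documents, so a query on "mining" is expanded with both terms. The stems need not be dictionary words; they only have to be consistent across documents and queries.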
The findings indicate that combining text mining with association rule analysis improves information retrieval by reducing redundancy, increasing contextual awareness, and improving overall system responsiveness. The evaluation metrics show that integrating statistical and linguistic techniques yields higher retrieval accuracy, particularly on large, unstructured text corpora. The approach also generalizes well, making it suitable for diverse data sources such as digital libraries, social media analytics, and enterprise document repositories.
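For reference, the three evaluation metrics named above can be computed for a single query as in the following sketch. The function name and the document identifiers are hypothetical, and the numbers it produces are a worked example rather than results from this study. F-measure is taken here to be the balanced F1 score, the harmonic mean of precision and recall.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for one query from two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical query: four documents retrieved, three relevant, two in common.
p, r, f = precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With two hits, precision is 2/4, recall is 2/3, and F1 is 4/7. Averaging these per-query values over a query set gives system-level figures.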