Data-Driven Information Retrieval Using Association Rule Mining and NLP-Based Stemming Techniques | IJCSE Volume 9 – Issue 6 | IJCSE-V9I6P30


International Journal of Computer Science Engineering Techniques

ISSN: 2455-135X
Volume 9, Issue 6  |  Published:
Author

Abstract

In the era of exponential data growth, efficient information retrieval (IR) has become essential for extracting relevant knowledge from large text corpora. This paper presents a data-driven approach that integrates Association Rule Mining (ARM) with Natural Language Processing (NLP)-based stemming techniques to enhance the accuracy and performance of text-based information retrieval systems. The proposed framework preprocesses textual data using tokenization, stop-word elimination, and stemming to reduce dimensionality and linguistic redundancy. Subsequently, ARM is applied to discover frequent term associations and semantic relationships among keywords, thereby improving the ranking and relevance of retrieved documents. Experimental analysis conducted on benchmark text datasets demonstrates that the hybrid model significantly outperforms conventional keyword-based retrieval methods in terms of precision, recall, and F-measure. The study highlights how the integration of ARM and NLP techniques contributes to intelligent information retrieval, enabling more context-aware and semantically enriched search results for big data applications.
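
The pipeline described above (tokenization, stop-word elimination, stemming, then association rule mining over the resulting terms) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), the restriction to term pairs, the thresholds, and the three toy documents are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "are", "with"}

def crude_stem(token):
    """Strip a few common English suffixes; a simplified stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stop words, and stem; return the set of distinct stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}

def mine_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) over term pairs.

    Each document is treated as one transaction whose items are its stems.
    """
    transactions = [preprocess(d) for d in docs]
    n = len(transactions)
    item_counts = Counter(t for tx in transactions for t in tx)
    # Sorting each transaction gives every pair a single canonical order.
    pair_counts = Counter(
        pair for tx in transactions for pair in combinations(sorted(tx), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules

# Toy corpus, invented for this example.
docs = [
    "Mining association rules in large databases",
    "Association rule mining for text retrieval",
    "Stemming algorithms for text retrieval",
]
rules = mine_pair_rules(docs)
for x, y, s, c in sorted(rules):
    print(f"{x} -> {y}  support={s:.2f}  confidence={c:.2f}")
```

Because stemming maps "rules" and "rule" to the same stem, the two documents that mention them contribute to the same term pair. This is the reduction in lexical variability that the framework relies on before mining associations.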

Keywords

Association Rule Mining, Information Retrieval, Text Mining, Natural Language Processing, Stemming, Data-Driven Approach, Semantic Search, Big Data Analytics.

Conclusion

The proposed data-driven information retrieval model successfully integrates Association Rule Mining (ARM) and NLP-based stemming techniques to enhance the efficiency and accuracy of text retrieval systems. Through the application of stemming and linguistic normalization, the system reduces lexical variability, ensuring that semantically related words are treated uniformly. The incorporation of ARM enables the discovery of meaningful associations among terms, contributing to improved query expansion and relevance ranking. Experimental evaluations demonstrate that the proposed hybrid model outperforms traditional keyword-based approaches and classical vector-space models in terms of precision, recall, and F-measure. The model effectively bridges the gap between syntactic representation and semantic understanding, offering a scalable and interpretable framework for intelligent information retrieval.

The findings confirm that combining text mining with association rule analysis significantly enhances information retrieval by reducing redundancy, increasing contextual awareness, and improving overall system responsiveness. The evaluation metrics indicate that the integration of statistical and linguistic techniques leads to superior retrieval accuracy, particularly in large and unstructured text corpora. Furthermore, the proposed approach exhibits strong generalization capabilities, making it suitable for diverse data sources such as digital libraries, social media analytics, and enterprise document repositories.
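
To make the roles of query expansion and the evaluation metrics concrete, the following Python sketch expands a query with the consequents of association rules and then computes precision, recall, and F-measure (here the balanced F1). It is a sketch under stated assumptions, not the evaluation procedure of this paper. The rule list, the toy document index, the relevance judgments, and the any-term-match retrieval function are all hypothetical.

```python
def expand_query(query_terms, rules, min_confidence=0.6):
    """Add the consequent of every sufficiently confident rule whose antecedent is in the query."""
    expanded = set(query_terms)
    for antecedent, consequent, confidence in rules:
        if antecedent in query_terms and confidence >= min_confidence:
            expanded.add(consequent)
    return expanded

def retrieve(query_terms, index):
    """Return ids of documents sharing at least one term with the query (a deliberately simple matcher)."""
    return {doc_id for doc_id, terms in index.items() if terms & set(query_terms)}

def precision_recall_f1(retrieved, relevant):
    """Compute set-based precision, recall, and F1; empty denominators yield 0.0."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical rules as (antecedent, consequent, confidence); not taken from the paper.
rules = [("retrieval", "search", 0.8), ("retrieval", "index", 0.4)]

# Hypothetical index of stemmed documents and hypothetical relevance judgments.
index = {
    "d1": {"retrieval", "text"},
    "d2": {"search", "engine"},
    "d3": {"cook", "recipe"},
}
relevant = {"d1", "d2"}

query = {"retrieval"}
expanded = expand_query(query, rules)

baseline_scores = precision_recall_f1(retrieve(query, index), relevant)
expanded_scores = precision_recall_f1(retrieve(expanded, index), relevant)
print("baseline:", baseline_scores)
print("expanded:", expanded_scores)
```

In this toy setting the expanded query also matches the document that uses "search" instead of "retrieval", which raises recall without lowering precision. On real corpora expansion can also admit irrelevant documents, so the confidence threshold trades recall against precision.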

© 2025 International Journal of Computer Science Engineering Techniques (IJCSE).