Exploring the Synergy of Grammar-Aware Prompt Engineering and Formal Methods for Mitigating Hallucinations in LLMs
Abstract
Recent advances in Artificial Intelligence (AI), particularly in machine learning for Natural Language Processing (NLP), have produced powerful Large Language Models (LLMs) capable of impressive performance in tasks such as translation, text summarisation, text generation and code generation. However, a critical challenge hindering their real-world deployment is their susceptibility to hallucinations, in which they generate plausible-looking but factually incorrect outputs. Despite the transformative potential of these technologies across many sectors, such limitations have adverse effects, including the propagation of misinformation and the erosion of user trust. This study aims to enhance the performance of LLMs by presenting a strategy that combines grammar-aware prompt engineering (GAPE) and formal methods (FMs), exploiting their synergy within the LLM processing pipeline. We argue that by applying linguistic principles through GAPE and grounding outputs in formal structures through FMs, we can improve an LLM's ability to analyse language, reduce ambiguity in prompts, improve the consistency of outputs and, ultimately, substantially reduce hallucinations. To this end, we propose collaboration between linguists and AI experts, together with specialised training for LLMs that emphasises linguistic precision. We further suggest iterative design and development procedures that apply GAPE and FM principles to continuously improve LLM performance. Together, these techniques point towards a future in which LLMs are more trustworthy and reliable for a wide range of users and use cases.
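To give a concrete flavour of how GAPE and FMs might interlock, the sketch below is one possible realisation of the proposed loop, not an implementation from this study: a grammar-aware prompt states an explicit output grammar to reduce ambiguity, a formal check accepts the reply only if it parses under that grammar, and the system re-prompts with feedback when the check fails. The output grammar, the prompt wording and the `llm` callable are all hypothetical placeholders, and the simple regular grammar here stands in for richer formal-methods machinery.

```python
import re

# Hypothetical formal output grammar (a deliberately tiny regular grammar):
#   reply := "ANSWER: " claim " | EVIDENCE: " source
OUTPUT_GRAMMAR = re.compile(
    r"^ANSWER:\s*(?P<claim>[^|]+?)\s*\|\s*EVIDENCE:\s*(?P<source>\S+)$"
)

def build_prompt(question: str, feedback: str = "") -> str:
    """Grammar-aware prompt: states the required grammar explicitly so the
    model's response space is constrained and ambiguity is reduced."""
    prompt = (
        "Answer the question below. Your reply MUST match this grammar:\n"
        '  reply  := "ANSWER: " claim " | EVIDENCE: " source\n'
        "  claim  := one declarative sentence\n"
        "  source := a single URL or citation key\n"
        f"Question: {question}"
    )
    if feedback:
        prompt += f"\nYour previous reply was rejected because {feedback}."
    return prompt

def parse(reply: str):
    """Formal check: accept the reply only if it derives from the grammar."""
    match = OUTPUT_GRAMMAR.match(reply.strip())
    if match is None:
        return None
    return match.group("claim"), match.group("source")

def ask(llm, question: str, max_rounds: int = 3):
    """Iterative loop: re-prompt with feedback until the output is
    grammatically valid, or abstain (abstaining beats hallucinating)."""
    feedback = ""
    for _ in range(max_rounds):
        reply = llm(build_prompt(question, feedback))
        parsed = parse(reply)
        if parsed is not None:
            return parsed  # (claim, source) ready for downstream fact checks
        feedback = "it did not match the required grammar"
    return None
```

Given any LLM client wrapped as a callable, e.g. `ask(my_model, "Who proposed the Turing test?")`, the loop returns a structured (claim, source) pair or abstains after `max_rounds`; structured, source-bearing outputs can then be handed to a separate verifier, which is where fuller formal-methods checks would slot in.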
Copyright (c) 2024 Tibakanya Joseph, Male Henry Keneth

This work is licensed under a Creative Commons Attribution 4.0 International License.