Exploring the Synergy of Grammar-Aware Prompt Engineering and Formal Methods for Mitigating Hallucinations in LLMs
Abstract
Recent advances in Artificial Intelligence (AI), particularly in machine learning for Natural Language Processing (NLP), have produced powerful Large Language Models (LLMs) capable of impressive performance in tasks such as translation, text summarisation, text generation and code generation. However, a critical challenge hindering their real-world deployment is their susceptibility to hallucinations, in which they generate plausible-looking but factually incorrect outputs. Despite the transformative potential of these technologies across many sectors, such limitations have adverse effects, including the propagation of misinformation and the erosion of user trust. This study aims to enhance the performance of LLMs by presenting a strategy that combines grammar-aware prompt engineering (GAPE) and formal methods (FMs), exploiting their synergy within the LLM processing pipeline. We argue that by applying linguistic principles through GAPE and grounding outputs in formal structures through FMs, we can improve an LLM's ability to analyse language, reduce ambiguity in prompts, improve the consistency of outputs and, ultimately, substantially reduce hallucinations. To this end, we propose collaboration between linguists and AI experts, together with specialised training for LLMs that emphasises linguistic precision. We further suggest iterative design and development procedures that apply GAPE and FM principles to continuously improve LLM performance. Together, these techniques point towards a future in which LLMs are more trustworthy and reliable for a wide range of users and use cases.
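To give a concrete flavour of how GAPE and FMs might interlock, the sketch below is one possible realisation of the proposed loop, not an implementation from this study: a grammar-aware prompt states an explicit output grammar to reduce ambiguity, a formal check accepts the reply only if it parses under that grammar, and the system re-prompts with feedback when the check fails. The output grammar, the prompt wording and the `llm` callable are all hypothetical placeholders, and the simple regular grammar here stands in for richer formal-methods machinery.

```python
import re

# Hypothetical formal output grammar (a deliberately tiny regular grammar):
#   reply := "ANSWER: " claim " | EVIDENCE: " source
OUTPUT_GRAMMAR = re.compile(
    r"^ANSWER:\s*(?P<claim>[^|]+?)\s*\|\s*EVIDENCE:\s*(?P<source>\S+)$"
)

def build_prompt(question: str, feedback: str = "") -> str:
    """Grammar-aware prompt: states the required grammar explicitly so the
    model's response space is constrained and ambiguity is reduced."""
    prompt = (
        "Answer the question below. Your reply MUST match this grammar:\n"
        '  reply  := "ANSWER: " claim " | EVIDENCE: " source\n'
        "  claim  := one declarative sentence\n"
        "  source := a single URL or citation key\n"
        f"Question: {question}"
    )
    if feedback:
        prompt += f"\nYour previous reply was rejected because {feedback}."
    return prompt

def parse(reply: str):
    """Formal check: accept the reply only if it derives from the grammar."""
    match = OUTPUT_GRAMMAR.match(reply.strip())
    if match is None:
        return None
    return match.group("claim"), match.group("source")

def ask(llm, question: str, max_rounds: int = 3):
    """Iterative loop: re-prompt with feedback until the output is
    grammatically valid, or abstain (abstaining beats hallucinating)."""
    feedback = ""
    for _ in range(max_rounds):
        reply = llm(build_prompt(question, feedback))
        parsed = parse(reply)
        if parsed is not None:
            return parsed  # (claim, source) ready for downstream fact checks
        feedback = "it did not match the required grammar"
    return None
```

Given any LLM client wrapped as a callable, e.g. `ask(my_model, "Who proposed the Turing test?")`, the loop returns a structured (claim, source) pair or abstains after `max_rounds`; structured, source-bearing outputs can then be handed to a separate verifier, which is where fuller formal-methods checks would slot in.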
Copyright (c) 2024 Tibakanya Joseph, Male Henry Keneth

This work is licensed under a Creative Commons Attribution 4.0 International License.