Automated Extraction of Collocations from Intermediate English Textbooks

A Text Mining Study


  • Zafar Ullah, PhD Scholar, NUML, Islamabad
  • Dr. Arshad Mehmood, Professor/Director Publications, NUML, Islamabad
  • Muhammad Uzair, Dean, Faculty of Arts and Humanities, NUML, Islamabad


collocations, text mining, ESL textbooks, Phrases tool, educational


Many intermediate-level students write incorrect collocations because they do not find standard collocations in their textbooks; as a result, they cannot extract and learn the standard collocations that contribute to fluency and accuracy. A further problem is that learners, teachers and textbook writers often lack the programming skills needed to extract collocations. This study aims to extract standard collocations from a corpus compiled from the intermediate English textbooks taught in Punjab, Pakistan. Applying Knowledge Discovery Theory and the Phrases tool, the study compiles a corpus of 82,487 words, from which meaningful standard collocations are selected for educational purposes. It extracts 166 standard collocational patterns and 297 standard collocational examples belonging to 18 grammatical categories. The result serves as a self-made mini collocational dictionary, and the study enables language learners to generate such mini collocational dictionaries from ESL textbooks with the Phrases tool. The study is potentially valuable for intermediate-level ESL students, teachers and textbook writers. By following it, learners are expected to make fewer collocational errors in their academic discourse and exams. The novelty of the study is that, for the first time, this corpus and its collocations have been extracted from the intermediate English textbooks taught in Punjab, Pakistan.
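For readers unfamiliar with automated collocation extraction, the general idea can be illustrated with a minimal sketch. It is not the Phrases tool used in this study, whose internals are not described here. It assumes a deliberately crude tokenizer and ranks adjacent word pairs by pointwise mutual information (PMI), a common association measure chosen here only for illustration. The function name `extract_collocations`, its parameters and the sample text are all hypothetical.

```python
import math
import re
from collections import Counter


def extract_collocations(text, min_count=2, top_n=10):
    """Rank adjacent word pairs by an estimate of pointwise mutual information.

    Illustrative stand-in only: this is NOT the Phrases tool from the study.
    """
    # Crude tokenizer (assumption): lowercase, alphabetic tokens only.
    # Sentence boundaries are ignored, so pairs may span two sentences.
    tokens = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    scored = []
    for (first, second), pair_count in bigrams.items():
        if pair_count < min_count:
            continue  # rare pairs give unreliable scores
        # PMI estimate: log2( P(first, second) / (P(first) * P(second)) )
        pmi = math.log2(pair_count * total / (unigrams[first] * unigrams[second]))
        scored.append((f"{first} {second}", pair_count, round(pmi, 2)))

    # Highest PMI first; ties are broken by pair frequency.
    scored.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return scored[:top_n]


# Hypothetical sample text, not taken from the study's corpus.
sample = (
    "Students make a decision and take notes. "
    "Teachers make a decision and take notes too. "
    "They take notes before they make a decision."
)
for phrase, count, pmi in extract_collocations(sample):
    print(phrase, count, pmi)
```

A purely statistical ranking of this kind also returns pairs such as "decision and" that are not meaningful collocations, which is why a manual selection step, like the one described in the abstract, remains necessary.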