As an official language within the international maritime community, maritime English is one of the branches of English for Specific Purposes (ESP). However, corpus linguists have paid little attention to it. This thesis has two aims. The first is to compile a four-million-word maritime English corpus (MEC) drawn from academic writing, news, laws, and textbooks. The MEC contains tagged multi-word compounds, which can be regarded as the specific-purpose terms of maritime English. Tagging multi-word compounds is essential for ESP study because maritime vocabulary includes a great variety of n-grams, such as ballast water, fore peak bulkhead, and container freight station charges. The second aim is to provide a further explanation of corpus linguistic data by adopting language network analysis and comparing keyword networks with collocation networks.
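To make the idea of tagging multi-word compounds concrete, the following is a minimal sketch in Python. It is not the tagging method used for the MEC, which is not specified here. It assumes a hypothetical list of known compounds and a greedy longest-match strategy, and it joins matched tokens with an underscore purely for illustration.

```python
def tag_compounds(tokens, compounds, joiner="_"):
    """Join known multi-word compounds into single tokens.

    Assumption: greedy longest-match against a hand-made compound list;
    the underscore joiner is an illustrative choice, not the MEC format.
    """
    # Store each compound as a tuple of lowercase words for fast lookup.
    lexicon = {tuple(c.lower().split()) for c in compounds}
    max_len = max((len(c) for c in lexicon), default=1)

    tagged = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest possible span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in lexicon:
                match_len = n
                break
        if match_len:
            tagged.append(joiner.join(tokens[i:i + match_len]))
            i += match_len
        else:
            tagged.append(tokens[i])
            i += 1
    return tagged


# Toy sentence built from the compounds named in the text above.
terms = ["ballast water", "fore peak bulkhead", "container freight station charges"]
sentence = "The fore peak bulkhead separates the ballast water tank".split()
print(tag_compounds(sentence, terms))
```

Once compounds are joined in this way, each one is counted as a single item in frequency lists, keyword lists, and collocation statistics, instead of being split into its component words.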
The idea of combining corpus linguistics and language networks can be traced back to research published by Jones in 1971 and by Scott and Tribble in 2006. In her keyword retrieval study, Jones discussed four types of links between keyword nodes: strings, stars, cliques, and clumps. Building on Jones’ work, Scott and Tribble hypothesized that keywords could be redrawn as a network of connections to give a picture of how a text or set of texts can be understood. By bringing corpus linguistics and language networks together, this thesis explores what the structures of keyword networks and collocation networks can tell us about maritime English through centrality and cohesion algorithms.
This thesis attempts to answer two research questions. First, how can we build a corpus of maritime English that represents specific-purpose terms such as multi-word compounds? Second, if language network analysis can serve as an explanatory analysis that supplements existing corpus linguistic descriptions, what can keyword networks and collocation networks tell us about the MEC? To address these questions, I review previous studies on the concepts of keyness, collocation, and language networks. I then discuss how to compile the MEC with a focus on representativeness, balance, size, and sampling, and I propose a method of tagging English multi-word compounds. In addition, I propose a language network analysis in order to give further explanatory power to the descriptions of maritime English. I compare the structures of keyword networks and collocation networks using centrality and cohesion measures for a better understanding of maritime English.
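As a rough illustration of what a centrality measure on a collocation network involves, the sketch below builds a small co-occurrence network and scores its nodes by eigenvector centrality. It is a simplified stand-in and not the procedure used in this thesis: the window size, the toy sentences, and the function names are all assumptions, and betweenness centrality and cohesion measures are left out.

```python
from collections import defaultdict


def collocation_edges(sentences, window=2):
    """Count how often two different words co-occur within `window` tokens."""
    weights = defaultdict(int)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                if other != word:
                    weights[frozenset((word, other))] += 1
    return weights


def eigenvector_centrality(weights, iterations=200, tol=1e-9):
    """Power iteration on the weighted, undirected co-occurrence graph.

    Each step adds a node's own score to its neighbours' weighted scores
    (iterating on A + I), which keeps the iteration stable on bipartite
    graphs without changing the resulting ranking.
    """
    adj = defaultdict(dict)
    for pair, weight in weights.items():
        a, b = tuple(pair)
        adj[a][b] = weight
        adj[b][a] = weight

    scores = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {
            node: scores[node] + sum(w * scores[nb] for nb, w in adj[node].items())
            for node in adj
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {node: v / norm for node, v in new.items()}
        converged = max(abs(new[n] - scores[n]) for n in adj) < tol
        scores = new
        if converged:
            break
    return scores


# Toy, already-tagged sentences (invented for illustration only).
toy = [
    ["ship", "ballast_water", "tank"],
    ["ship", "cargo", "hold"],
    ["ship", "port", "authority"],
]
scores = eigenvector_centrality(collocation_edges(toy))
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: {scores[node]:.3f}")
```

In this toy network the word that co-occurs with every other word receives the highest score, which is the intuition behind using centrality to surface widely connected terms.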
In conclusion, the network analysis and critical evaluation confirm that the centrality structures produced by eigenvector and betweenness measures in collocation networks are better suited than keyword network structures to finding general-purpose terms. By contrast, the cohesion community structures produced by eigenvector and betweenness measures in keyword networks distinguish a group of specific-purpose terms from a group of general-purpose terms. More specifically, eigenvector centrality structures in collocation networks gave better results than betweenness centrality in identifying general-purpose terms, whereas eigenvector cohesion community structures in keyword networks gave better results than betweenness in identifying specific-purpose terms.