2011-06-21

Meeting the Challenge: Edmond Kachale on Chichewa

Chichewa is a Bantu language spoken in four countries in southern Africa, namely Malawi, Mozambique, Zambia, and Zimbabwe. It is widely spoken in Malawi, where at one point it had official status as a national language. It is no longer a national language, but it remains a popular lingua franca, with 60% of the population having full command of the language and 75% able to understand it. In Zambia, it is the third most popular language, spoken mostly in the Eastern Province and in Lusaka, and it is among the seven official African languages. In Malawi, the language is taught in primary and secondary schools but never at tertiary levels, except for those specialising in African languages, where students learn the structure of the language (through the medium of English).

Chichewa, like other Bantu languages, has a very complicated verbal system; a single verb root can take on thousands or even millions of different forms, depending on such things as tense, agreement with subject and object nouns, and a number of other features. This makes even simple computer applications like spell checkers exceedingly challenging. For speakers of similarly complicated languages, one of the takeaway messages from the interview below is that tackling these challenges is possible, and even one person can make great strides.

Edmond Kachale is a software developer by profession, based in Malawi, and has been involved in developing natural language processing tools for Chichewa for several years now. When I began working on Chichewa around 2004 there were virtually no resources for the language, not even a good electronic word list. Now, thanks to Edmond's efforts, there are some advanced resources such as a part-of-speech tagger (ChicPOS), tools for morphological analysis and generation (ChicMorph, AffixGen), and a program for visualizing phrase structure (ChiVisualize). He has also been involved in translating important software packages and web sites into Chichewa, including the Google search interface. Edmond tweets in Chichewa and English as @ceekays, and also writes the blog Edmond pa Kanjedza, where you'll find descriptions and screenshots of some of the software just mentioned.

KPS: Chichewa has an interesting history in terms of its status as an official language in Malawi.   Please tell us a bit about that.

Edmond Kachale
EK: Indeed, Chichewa was previously declared the national language by the late Hastings Kamuzu Banda, the First Head of State and Government, because the president himself was a Chewa. But with the coming of democracy and the change of government administration from Kamuzu Banda to Bakili Muluzi, Chichewa lost its status as a national language on grounds that it was discriminatory to other indigenous languages. It became one of the official lingua francas in Malawi, together with Chitumbuka, Chiyao, Chilhomwe and Chisena. Even the current Malawian Constitution highlights this. Of course, Chichewa remains the only indigenous language that is taught in schools, though some linguists and educationalists are fighting for the other indigenous languages outlined above to be taught in (primary) schools too. I have been following two policies that are addressing the issue of indigenous languages in schools: 'Language in Education' Policy (LIE) and 'Language Across the Curriculum' Policy (LAC). The first policy seeks to guide on which language should be used for what subject matter or which language should be taught as a subject. On the other hand, LAC advocates for use of a familiar language as medium of instruction in [primary] schools. Unfortunately, neither has been adopted yet.

The then Chichewa Board was dissolved and replaced by the Centre for Language Studies (CLS). CLS was established to reflect the change in the status of Chichewa as a national language. It was mandated to conduct research mainly on indigenous languages. Of course, right now the Chichewa Board has made a comeback and is sometimes referred to by a new name, Chichewa Heritage Foundation. In addition, it has realigned its duties to focus on promotion and preservation of the other unifying cultural factors of the Chewa rather than the language itself. They focus on dances, the Kulamba Ceremony (paying homage to the Paramount Chief, Kalonga Gawa Undi (implying His Lordship Gawa Undi), currently resident in Zambia) and other customs. Interestingly, the foundation/board pays less attention to the issue of language development itself.

KPS: Chichewa (or "Chicheŵa"!) is written with just one diacritical mark, the w-with-circumflex (ŵ), but as I understand it, there has been some disagreement over its use in spelling.  What is the history of this letter and where do things stand now?

EK: Kamuzu Banda was said to be an authority (encyclopedia) of Chichewa. Even though he used to speak in English, when his [Chichewa] interpreter would make a mistake he would correct him right away. He was given an honorary Professorship of Chichewa by the University of Malawi for setting most of the rules of Chichewa grammar, including emphasizing the use of the bilabial affricative ŵ. [Around the time of his departure from power,] ŵ was removed from the writing/alphabet system. Currently, it is still fighting its way back in through the "linguistic courts" (i.e. linguistic forums). It is also trying to enter via the back door into the Chitumbuka orthographical system. I have also observed that primary school books written in Chichewa still have ŵ, while Chichewa language and grammar books used in secondary/high schools do not have it. A grammatical confusion!!
 
KPS: What opportunities are there to use the language online?  Is internet connectivity or access to computers an issue for your community? How about translations of software and websites?

EK: Like other Bantu languages, usage of Chichewa in computing technologies is very marginal. However, there are a number of opportunities to use Chichewa online. Malawi has an agro-based type of economy. Most farmers live in remote areas and do not understand English, which is Malawi's official business language. This is reason enough to have software (like word processors and spreadsheets) and online content in Chichewa.

Currently, there are a few websites with Chichewa articles. There are also a few Wikipedia entries on the Chichewa/Chinyanja portal. In addition, the language is used on Google. There are four homepages that are enjoying this: Google Malawi, Google Mozambique, Google Zambia and Google Zimbabwe.

There is also one website that has a bilingual online dictionary. It offers a Chichewa to English (and vice versa) dictionary. It also provides other Chichewa resources though at a limited scale, but the online dictionary is their great artwork. These resources are available to paid registered members only.

Access to computers is a very big issue in my society. Close to 60% of the society are poor and cannot afford buying computers. In addition, most of these people live in rural areas where issues of computer access, internet connectivity and power supply are big problems. Thus, the computer and the internet have not been fully embraced in my society.

There are efforts to develop localised software in Chichewa. Currently, there are spell-checking plugins for OpenOffice.org and Mozilla Firefox, though the latter has not yet found its way to the official Mozilla Addons page. There were once some talks to localise the OpenOffice.org Office package and the Mozilla Firefox web browser, and they were backed by Government authorities, but they have proved futile as there has been no progress at all. However, there are still people who are interested in furthering this initiative. There was also another intervention to develop an online Bantu dictionary by one of the local language authorities, but the idea died in its early stages.

KPS: Many speakers of indigenous and minority languages are reluctant to use their languages online, because of difficulties with keyboard input, or because they don't know terminology for talking about computing, or simply because they learned computing in a language like English or French.  Are any of these issues relevant for speakers of your language?  What is the general attitude toward using the language online?   

EK: I think I can agree that these sentiments are very common among most less popular languages like ours. Indeed most Malawians feel ashamed to associate themselves with the language. The question of which language is used in Malawi is determined by factors such as status and attitudes towards what language is indigenous vis-à-vis what language provides more economic opportunities. Generally, local languages are associated with illiteracy and poverty. As such, there are few people who are comfortable expressing themselves in the vernacular. I remember having outlined similar arguments in one paper that was published online by OSISA.

Of course, in addition to these arguments I have heard most people saying Chichewa has a complex writing system, Chichewa has less scientific terminologies and other arguments as you have rightly outlined them. But to the contrary too, people have often times blamed me (among other frequent users of Chichewa online) for using deep words in expressing myself, especially when it comes to computing  terms. So I do not understand where the issue of “Chichewa being shallow” comes from.

KPS: I mentioned above that many indigenous languages lack computing terminology.  Is this an issue for your language?  How is/was terminology developed?

EK: Yes, for sure! Our language is another victim of “lack of computing terminology syndrome”. I remember one linguist, Prof. Pascal Kishindo, making the same observation: that scientific and technological terminology in Chichewa is in a disordered state. Currently, the media and other corporate stakeholders are left on their own to deal with the plethora of new foreign scientific terminology. The media (TV, radios and newspapers) sometimes mislead people with wrong spellings and meaningless terms. Often these are English-based paraphrased loanwords like "kompuyta" for computer, and "pulinta/printa" for printer.

Frankly, I am not a fan of official standardized terminologies. I have often felt that standardization limits language enhancement through development of terminological synonyms. Thus, standardized terminologies limit communities from developing new words for the same term. I have always believed that the communities should be left at liberty to develop terms on their own, thereby enriching the language database. This is how words like "email", "laptop" and "netizen" found their way into the English vocabulary. What we need is a body that will only collect and document such terms, and disseminate them by publishing new dictionaries or public gazettes at least annually.

Despite the current state of affairs, I am not happy with the way the media deal with new terms. They are lazy in generating pure Chichewa terminologies for the fast growing technology usage. Now, it is becoming a challenge for Chichewa speakers to manage the constantly rising communicative complexity induced by these paraphrased loanwords in their communities. Paraphrased loanwords do not make use of the inherent terminology “generators” of the language. In addition, these loanwords often break grammatical rules. For example, in Chichewa the word printa (printer) is wrong in several aspects,  some of which include the following: (i) the consonant combination pr is non-existent, (ii) the plural form maprinta breaks the formal classification structure of nouns in not only Chichewa but also the entire Bantu noun classification system, as it traverses from a singular of class 9 to form a plural of class 6 instead of class 10, which is ungrammatical and semantically senseless!
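The noun-class argument above can be sketched in a few lines of Python. This is only an illustration, not a description of any of Edmond's tools: the table lists the commonly cited Bantu singular/plural class pairings as an assumed subset, and the function name `is_regular_pairing` is hypothetical.

```python
# Commonly cited Bantu singular -> plural noun-class pairings.
# Assumption: an illustrative subset, not a complete description of Chichewa.
REGULAR_PLURAL_CLASS = {1: 2, 3: 4, 5: 6, 7: 8, 9: 10}


def is_regular_pairing(singular_class, plural_class):
    """Return True if plural_class is the expected partner of singular_class."""
    return REGULAR_PLURAL_CLASS.get(singular_class) == plural_class


# The loanword case described above: a class 9 singular given a class 6 plural.
print(is_regular_pairing(9, 6))   # False
print(is_regular_pairing(9, 10))  # True
```

A real checker would also need to assign each noun to its singular class, which is the hard part and is not attempted here.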

Of course, there is the Chichewa Board but it is more interested in promoting and preserving cultural customs and traditions of the Chewa tribe than in the development of the language itself. In addition, there is the Centre for Language Studies, an academic research centre in the University of Malawi, which deals very much with issues of language development, from standardization of the writing system (orthography) through to development of terminologies. Chichewa has been one of the languages receiving more attention at the Centre, to the extent that they were able to produce Mtanthauzira-Mawu, a monolingual Chichewa dictionary. However, of late the Centre has been receiving less attention due to funding issues and “technical” restructuring within the University.

KPS: Are there other special challenges your community faces in terms of developing technology for the language and/or communicating online?

EK: There are several challenges. Issues of dialect differences are a common denominator, I think, across most indigenous languages. These are sometimes accelerated by prejudice and exasperation from tribes that claim to ancestrally own the indigenous languages. Within Malawi, people from various regions speak differing dialects with differences in spelling systems. The situation is worse when one crosses Malawian borders to Mozambique, Zambia or Zimbabwe where the Chichewa that is spoken there is completely divergent from the standard one in both grammatical and semantical structures.

National politics and “cold tribal wars” also play a part in impeding the development of Chichewa. When one raises issues of development or promotion of indigenous languages in Malawi, the efforts are likely to be unfruitful, as they end up causing emotional damage and cutting deep into tribal competition over language dominance and inborn prejudice against other indigenous languages. This is a very big problem even in Zambia, to the extent that the government there had to declare English the national language as well as the official mode of communication.

Another challenge is that there is less interest from foreign investors in ICT projects for the language itself. In contrast to the situation with other popular African languages like Kiswahili and Zulu, there is less tendency to assume that indigenous Malawian languages will be used on some level, especially in economically oriented settings. I have tried contacting Microsoft on the possibilities of developing localised systems, but it is almost two years since I wrote them; all I got was “I have forwarded your request to right authorities”. I have never heard of any efforts from Apple on development of technology for Chichewa. With John Duffel, a friend of mine, I once proposed to Facebook to add Chichewa to the list of translations to allow us to localise facebook.com, just as other friends have done. But our efforts have led us nowhere, as they have not responded since our proposal two years ago.

However, I should commend Google for its positive intervention. We are now proud that we have a Chichewa version of Google Web Search as explained above. In addition, there are talks on extending the localization project to other significant applications like Gmail. Through the Google Technology User Groups (GTUGs) set up in Malawi, Google is also working with local developers to promote development of applications with local usage built on its APIs.

KPS: Are young people using the language online?  Do you think social media sites like facebook and twitter are helping encourage language use by younger speakers?

EK: Yes, to some extent people are using Chichewa online, especially the youth. In addition, as I alluded to earlier, there are a few websites that publish articles in Chichewa. Social networks also encourage usage of the language online. Of course, language use on social forums has led to excessive growth of code-switching between English and Chichewa phrases, leading to the development of a new Internet language altogether. For example, someone may say "Ndikupanga apudeti pa Fesibuku" to mean "I will update you on Facebook" or they can also say "Ndinayesa kuchigugula" to mean "I tried to google for it".

KPS: What is your vision for your language in ten years, both in general terms and in terms of software/online use?   

EK: This question is a bit difficult to answer affirmatively because there are so many factors that can affect the usage of the language in ICT. But in general, if the situation on the ground does not change, ten years will come like a tick of a second. The Centre for Language Studies needs a special intervention and the Chichewa Board needs a different approach for the language to develop and have rich resources. I think we need a multi-sectoral approach; both the government and private investors should take the issue of language use and development seriously. The government's intervention is especially significant. I have observed that due to political influence on natural development in Africa, some issues like those that concern language development cannot move without government intervention.

In terms of online content, I think we can do better than what we already have. Currently, much web content with intended local application, even concerning Malawians themselves and originating from Malawi, tends to use languages understood internationally, especially English. In addition, content on the Internet should also incorporate subject matter that reflects the culture and the needs of the Malawian nation.

From the software perspective, there is a need for many Malawian developers to start working on localised systems. Otherwise we are far behind, and if this continues, we will just be watching as the technological era passes us by. I have always believed that localization is another way of preserving and enhancing language. If we can take local content (such as games and systems) online, and spice it with localised content, that will be a great stride. With high illiteracy levels, we may also want to take advantage of other forms of technology to preserve our ideas. For example, working on animations for most of the popular folk tales will do us more good, making technology more exciting and appropriate to our community. Of course, I also recognise that a few local developers have already started, but we need more and more developers to join the bandwagon.

2011-06-07

Tír gan teanga, tír gan anam: Keola Donaghy on the Hawaiian language

Hawaiian is on the long list of languages I've been trying to learn, going back to my first visit to the islands in 1996. Many years ago I started work on a spell checker for the language, using data gathered by my web crawler, which finds all web pages written in Hawaiian (and many other languages) and generates lists of words from those pages to be edited. In scanning web-crawled word lists, it's not unusual to encounter the occasional word in English, but I was pretty surprised, while editing Hawaiian word lists in 2004, to come across the word "bodhrán", an Irish word for a kind of drum used in traditional Irish music. Investigating further, I found that the word appeared in a blog by someone named Keola Donaghy, a Hawaiian speaker who had travelled to Glen Colm Cille in Ireland to learn the Irish language for a summer in 2002. I wrote to Keola and he immediately began helping with the spell checker project, and has provided much-needed language expertise on several other projects over the years, for example by testing Hawaiian support for the accentuate.us Firefox add-on, and more recently by translating the Hawaiian Indigenous Tweets page. For indigenous languages with small speaker populations, there tends to be just one "go-to" person who is involved in just about every technology project; Keola is that person for Hawaiian. He's never said "no" any time I've asked for help over the last seven years.
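For readers curious what "generating lists of words from pages" can look like, here is a minimal Python sketch. It is not the crawler's actual code: the function name `word_list`, the tokenizing rule, and the sample pages are all assumptions made for illustration.

```python
import re
from collections import Counter

# A "word" here is a run of Unicode letters (no digits or underscores).
# Assumption: a deliberately simple rule; a real pipeline needs more care.
WORD = re.compile(r"[^\W\d_]+")


def word_list(pages):
    """Count lowercased words across pages, most frequent first."""
    counts = Counter()
    for page in pages:
        counts.update(word.lower() for word in WORD.findall(page))
    return counts.most_common()


# Hypothetical page texts standing in for crawled documents.
pages = ["Aloha kakahiaka", "aloha ahiahi", "bodhr\u00e1n"]
for word, count in word_list(pages):
    print(count, word)
```

A list like this still has to be edited by hand, which is exactly where an unexpected entry such as "bodhrán" turns up.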

Keola Donaghy
Keola is now Assistant Professor of Hawaiian Studies at the University of Hawai‘i at Hilo.  He is also a composer and musician, and an active member of the music scene in Hawai‘i.   He tweets in English and Hawaiian as @keoladonaghy.

The title of this post, "Tír gan teanga, tír gan anam" is a famous Irish saying that Keola uses in his email signature (and which could be the motto for the whole Indigenous Tweets project); it means "Country without a language, country without a soul"!

KPS: Could you tell us a bit about the current state of the language?  How many speakers are there, and how many children are learning the language?

KD: Estimates vary, depending on what degree of fluency is being considered. I believe that no more than 10,000 are conversant, that is, could function in Hawaiian all day if necessary. The number is probably lower. While the numbers have increased during the past 20 years, I believe they have reached a plateau, and significant effort will need to be expended for them to begin growing again. The language is officially recognized by our State in its constitution; however, in everyday life it is still not afforded the same level of support as some immigrant languages. Hawaiian is taught in immersion schools K-12, and Ka Haka ‘Ula o Ke‘elikōlani at UH-Hilo uses Hawaiian exclusively in all undergraduate and many graduate level courses. Many high schools in the state teach Hawaiian as an elective, but it is not a required subject in any of them. There are approximately 2,000 students in K-12 Hawaiian medium education. Most native speakers are either elderly or residents of the island of Ni‘ihau. The total of all of these numbers around 500-600. Nearly all language instruction in the state outside of the Ni‘ihau community is done by non-native speakers, such as myself.

KPS: What opportunities are there to use Hawaiian online, in terms of hardware and software support, translated software, web sites, etc.?

KD: Hawaiian is supported by Mac OS X. There is a Hawaiian keyboard, localized date and time strings as well as Hawaiian sorting in the system. We've translated many programs into Hawaiian, including an integrated communication system (email, discussion forums, chat rooms, file transfers) called Leokī, which is based on the FirstClass system. We've translated the entire interface, and all communication on the system is done in Hawaiian. The system has been used for nearly 18 years. We have online dictionaries, spell checkers, and a vast digital repository of Hawaiian language on Ulukau.

Since Hawaiian is based on the Latin alphabet with some diacritics, it is well supported in Unicode. Hawaiian works on a variety of social networking sites like Facebook and Twitter, and many open source programs like Moodle, WordPress, Drupal, Joomla and others require only minor tweaks to their CSS files to be able to handle the language.
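To make the Unicode point concrete, the following Python sketch lists the code points of the vowels with kahakō (macron) and shows how a letter typed as base vowel plus combining macron can be composed into the single precomposed character. The helper names `describe` and `to_nfc` are hypothetical; the calls into Python's standard `unicodedata` module are real.

```python
import unicodedata

# Vowels with kahako (macron), written as escapes so the code points are explicit.
KAHAKO_VOWELS = "\u0101\u0113\u012b\u014d\u016b\u0100\u0112\u012a\u014c\u016a"


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def to_nfc(text):
    """Compose base letter + combining macron into one precomposed letter."""
    return unicodedata.normalize("NFC", text)


for char, code_point, name in describe(KAHAKO_VOWELS):
    print(char, code_point, name)

# 'a' followed by U+0304 COMBINING MACRON is two code points before composing.
decomposed = "a\u0304"
print(len(decomposed), len(to_nfc(decomposed)))
```

Normalizing text to one form before comparing or spell checking avoids treating the two spellings of the same letter as different words.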

We would like to have official support for Hawaiian in Windows like we do in Macintosh and iOS, and hope that it will happen someday.  In the interim, we offer a Hawaiian keyboard that people can download and install for free. The iPhone, iPod, and iPad all have native support for Hawaiian. We now have a free Hawaiian keyboard for the Android operating system that users can download and install.

KPS: The Google search interface has been available in Hawaiian for some time now.  Many other language groups are interested in taking this step as well - can you tell us how you started on this, and what was entailed in completing the translation?

KD: I had tried for years to reach someone in the Google In Your Language [GIYL] program about localizing the search interface. I thought it would be very significant symbolically for us, and perhaps get our foot in the door with doing further work with Google. Finally in late 2008 I heard that Google had done a Māori language version. Since I know most of the Māori folks involved in technology issues for the language, I made some inquiries, and found out my friend Te Taka Keegan was responsible. Not only that, but he was about to do a 6 month post-doc at Google to help with localization issues. I contacted him, and he put me in touch with the right person at Google. Once they set up a Hawaiian link for GIYL, it was quite painless to do the translation–it was entirely web based, showed the English text, provided a block for submitting the Hawaiian, and provided the context of the word or sentence to be translated. It took me about 6 months of on-and-off work – mostly in my spare time – to do. Maintenance is likewise a breeze – when new strings need translation or older ones have changed, they appear in the translation console, and are submitted. The changes do not get committed to the Google search page immediately; I have to notify the coordinator and they do a rebuild of the interface. I am a bit behind in doing updates, but hope to get caught up again this summer.

Keola in Glen Colm Cille, Co. Donegal, Ireland
KPS: What issues are there in terms of getting Hawaiian speakers to use the language online?  Have problems with keyboards and fonts been an issue?  What about computing terminology?
 
KD: I don't think lack of support for the diacritics has hampered the use of the language. I know many people who are using Hawaiian on Facebook without the diacritics, and I receive many emails that don't use them either. Most are happy to be able to use the diacritics, and have done so once they've learned about the availability of tools. I can't tell you how many people I've spoken to who, despite our best efforts to make it known, were unaware that Mac OS has had a Hawaiian keyboard as a default since 2002.

I don't believe people not knowing the terminology for technology has been an impediment. There hasn't been a whole lot of discourse about the technology; we want to use it in the same way that everyone else does, so it has many uses in many contexts. The technology itself is only a small part of it.

KPS: How is terminology developed? Is there a "language board" that decides on terms and disseminates them to the community?

KD: There is a lexicon committee that is coordinated by our College of Hawaiian Language. I have participated in this and have contributed many words. There are a variety of methods used: transliteration, translation, borrowing from other languages. All are considered. When a need for a new word comes up, we address it. Because of a backlog of words, we occasionally create words, start using them, and they are picked up. An example is ho‘olele hualono for "podcast".  Ho‘olele is an already established word for "broadcast".  Hua is "seed" or "pod", and lono is to "hear" as opposed to "listen" (subtle difference).  I created a Hawaiian language podcast, coined the word with input from a committee member but no official approval. It's been widely accepted. 

KPS: Any other special challenges Hawaiian speakers face in terms of developing technology for the language?  

KD: Vendors have been very supportive, and I'm sure the fact that our work has gotten a lot of publicity doesn't hurt. Dialects and spelling system differences are non-issues. The most annoying tech issue is that few fonts have the ‘okina (glottal) in the Unicode location that we prefer. I've talked to both Microsoft and Apple about it, but realize that since fonts are developed by outside foundries, it will take a while.
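As an illustration of the ‘okina issue, here is a small Python sketch that rewrites look-alike characters to one code point. The interview does not say which location is preferred; the sketch assumes U+02BB (MODIFIER LETTER TURNED COMMA), which is commonly recommended for the ‘okina. The function name `normalize_okina` and the rule "only before a vowel" are assumptions too.

```python
import re

OKINA = "\u02bb"  # MODIFIER LETTER TURNED COMMA (assumed target code point)

# Characters often typed in place of the okina: apostrophe, curly quotes, backtick.
LOOKALIKES = "'\u2018\u2019`"

# Hawaiian vowels, plain and with kahako; the okina always precedes a vowel.
VOWELS = "aeiou\u0101\u0113\u012b\u014d\u016b"

_PATTERN = re.compile(
    f"[{re.escape(LOOKALIKES)}](?=[{VOWELS}])", re.IGNORECASE
)


def normalize_okina(text):
    """Replace okina look-alikes that stand directly before a vowel."""
    return _PATTERN.sub(OKINA, text)


print(normalize_okina("Hawai'i"))
print(normalize_okina("don't"))  # left alone: the apostrophe precedes a consonant
```

The vowel rule is a heuristic: an opening quotation mark directly before a vowel would also be rewritten, so output from a tool like this still needs a human eye.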

KPS: Are young people using the language online?  Do you think social media sites like Facebook and Twitter are helping encourage language use by younger speakers?

KD: Absolutely. I have a number of friends and classmates of my daughter (20) on Facebook. All were Hawaiian immersion students. I've noticed that while they occasionally use English between themselves, they always use Hawaiian with me. I think sometimes they also use it as a way to exclude non-Hawaiian speakers from knowing what they are talking about. The dynamics of their use and choices are interesting, and I'd love to research and study it sometime.

KPS: What is your vision for your language in ten years, both in general terms and in terms of software/online use?

KD: Basically my vision is that Hawaiian use with technology be as easy as any other language. If people want to use a service or system, it's available to them in Hawaiian. But our pool of resources is very limited, so we must prioritize carefully and make sure we get the most bang for the buck/effort. Having a Hawaiian keyboard and our characters as core system-level supported elements is the first step. We're getting there with both desktop and mobile systems, but still have much more to go.

I would very much like to see localization systems mature to the point where we can have a single repository of translated strings that all of our projects could draw from, rather than starting from scratch with so many new things. I would love to see Hawaiian voice synthesis and voice recognition happen, but personally don't have the technical skills to do it myself.
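A single repository of translated strings could be as simple as the Python sketch below. It is a toy under stated assumptions, not a description of any existing system: the class `StringRepository` and its methods are hypothetical, and the two entries are the terms Keola mentions earlier in this interview.

```python
class StringRepository:
    """Toy shared store of source strings and their translations."""

    def __init__(self):
        self._strings = {}

    @staticmethod
    def _key(source):
        # Assumption: matching ignores case and surrounding whitespace.
        return source.strip().lower()

    def add(self, source, translation):
        self._strings[self._key(source)] = translation

    def translate(self, strings):
        """Split a project's strings into (translated, still_missing)."""
        translated, missing = {}, []
        for source in strings:
            hit = self._strings.get(self._key(source))
            if hit is None:
                missing.append(source)
            else:
                translated[source] = hit
        return translated, missing


repo = StringRepository()
repo.add("podcast", "ho\u2018olele hualono")
repo.add("broadcast", "ho\u2018olele")

# A new project reuses what exists and learns what still needs translating.
done, todo = repo.translate(["Podcast", "Settings"])
print(done)
print(todo)
```

Exact-match lookup is the simplest possible design; real localization tools add context, plural forms, and fuzzy matching on top of this idea.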