This section provides an overview of the current state of localisation in Africa, focusing on recent and current activity, with some discussion of potential areas. It is complemented by information in the country and language profiles that make up the appendices (Section 12).80
This part gives a brief overview of aspects of use of African languages in computing and on the internet. A very general characterisation would be that African language computing and internet has been relatively slow in developing, due to a number of linguistic, educational, policy, and technical factors, some very basic as mentioned above (section 5.2).
However, it is important to keep in mind that computers and the internet, like formal educational systems a century earlier, have been introduced and disseminated as more or less monolingual media relying on one or another ELWC. This is a reflection of both the international dominance in software and internet content of these same languages inherited from colonisation, and the knowledge of these tongues by those people in Africa most likely to have access to the technology (generally elites in urban areas).
Some specific aspects of the evolution of use of African languages in ICT are dealt with below, but one particular problem for a number of languages that are written with modified letters or diacritic characters – or entire alphabets – beyond the basic Latin alphabet (the 26 letters used in English), or the ASCII character set (that alphabet plus basic symbols) is how computer systems and software handle these (see above, 6.2).
Although the earliest personal computer interfaces used the English language and the ASCII character set, attempts were certainly made to render other languages with them. Such use is hard to quantify, but with advances in the capacity of systems to handle larger character sets and the elaboration of the internet, multilingual computing has grown in Africa as elsewhere. The greater but still limited potential of 8-bit fonts (various terms such as ANSI, as previously mentioned, describe this) permitted development of fonts for more languages.
Over the years a number of workarounds have been observed in common use for dealing with African language text in situations where available fonts or font compatibility issues prevented use of the official orthography – notably in e-mail and on the web. A summary of approaches to using African language text in this environment, adapted from Osborn (2001), is shown in Table 3:
These workarounds are still being used to one degree or another even as Unicode and UTF-8 – and their accommodation in newer applications – in principle permit use of larger (and complete) character sets. The substitute approaches, for instance, are especially noted on e-mail lists and discussion fora.81 This suggests that despite the advantages offered by Unicode, its potential is not yet a reality for a wide range of users and internet applications. This is sometimes due to lack of fonts (see below, 7.2), but even when fonts are available, the lack of convenient systems for input then becomes an issue (see below, 7.3).
Web content in principle has the same issues but static presentations are able to make use of Unicode, even if it requires input of hex or decimal codes for non-ANSI characters in the HTML coding. The amount of web content in African languages is discussed below.
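As an illustration of the hex and decimal codes mentioned above, the following minimal Python sketch rewrites text so that every character outside ASCII becomes an HTML numeric character reference. The function name `to_html_ncr` and the sample Hausa letters are assumptions chosen for the example, and treating everything outside ASCII (rather than outside a particular 8-bit "ANSI" code page) as needing a reference is a simplification.

```python
import html


def to_html_ncr(text: str, use_hex: bool = True) -> str:
    """Replace each non-ASCII character with an HTML numeric character reference."""
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            # Plain ASCII passes through unchanged.
            pieces.append(ch)
        elif use_hex:
            pieces.append(f"&#x{code_point:X};")
        else:
            pieces.append(f"&#{code_point};")
    return "".join(pieces)


# Hooked letters used in Hausa orthography (illustrative sample only).
sample = "ɓ ɗ ƙ"
encoded = to_html_ncr(sample)

# A browser, or html.unescape here, turns the references back into the letters.
decoded = html.unescape(encoded)
```

A page author could paste the encoded string into otherwise ASCII-only HTML; the reader's browser still needs a font that contains the characters.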
It is also worth remembering that the discussions of African language use in ICT repeat some of the same themes as discussions before the advent of computers, such as what kinds of orthographies to use and how to harmonise transcription. These were the subject of the studies and expert meetings mentioned above (4.3). This context is often forgotten, but in many cases it set the foundation for current efforts.
Also, and related to the previous point, the use of African languages on computers was preceded by discussions and propositions concerning their use with typewriters and in typesetting. Most of these are forgotten now but they encountered some of the same issues that are of concern now related to input.
E-mail was an obvious first step in use of the internet in Africa and for long the principal use.82 By its nature its content is harder to track, but there is other information that can be utilised to get an indication of the use of African languages for this purpose. For instance, at one point there were two web-based e-mail services that provided for composition in several African languages: Africast.com and Mailafrica.net (though both have since ceased to function).83 In addition, recent years have seen the setting up of a number of e-mail fora in which much or most of the traffic is in one or another African language. For instance, there are several Hausa and Swahili e-mail lists in which these, probably the most widely spoken indigenous tongues on the continent, are the primary languages of communication, and Van der Veken and de Schryver (2003) found fora in Hausa, Somali, and Lingala.
African languages are represented on the web, but not prominently as media of communication. However the actual level of use is emerging as a topic of discussion. It is easy to get the impression that African language content is still rare and only gradually increasing. A look at the results of several surveys yields a fuller picture of the current status and evolution of African language web content.
The surveys can be grouped under four (4) headings:
A few years ago some simple informal surveys of web content by language that relied on search engines unsurprisingly did not find enough in any African language to rank them as high as some minority European languages with relatively few speakers.84
More focused surveys yield more interesting results. For instance, an informal survey done in Tanzania in 2001 as part of a larger report for the Swedish International Development Agency estimated that ten percent of websites with a Tanzanian focus had at least some Swahili content (Miller Esselaar Associates, 2001), but most of the sites did not have majority content in the language.
An extensive study by Diki-Kidiri and Edema (2003) involved searching, listing and counting websites. It found a significant number of sites that treat African languages in one way or another, but also showed that these generally have minimal content in the languages themselves. In effect, a large proportion of the sites they censused consisted of presentations about African languages, including online dictionaries and instructional pages.
Another approach, taken by Van der Veken and de Schryver (2003), used a different search methodology and statistical extrapolation. By counting hits for particular words and estimating the larger number of words that this might indicate, based on the frequency of the searched terms in a typical text, they concluded that there may actually be significantly more African language web content than was commonly thought. However, it is hard to determine from such estimates what kind of content they would imply.85
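The arithmetic behind this kind of extrapolation can be sketched as follows. This is not the authors' actual procedure: it assumes, for simplicity, that each search hit counts as one occurrence of the probe word, and the function name, probe words, and all figures are hypothetical placeholders.

```python
def estimate_total_words(hit_counts: dict, relative_freq: dict) -> float:
    """Estimate the size of a web corpus from hits for a few probe words.

    hit_counts    -- probe word -> number of occurrences found by searching
    relative_freq -- probe word -> share of all words that the probe word
                     makes up in a typical text of the language
    """
    # If a word is 1% of typical text, 100 occurrences suggest about
    # 10,000 words overall. Each probe word gives one such estimate.
    estimates = [hit_counts[word] / relative_freq[word] for word in hit_counts]
    # Average the per-word estimates to smooth out quirks of single words.
    return sum(estimates) / len(estimates)


# Hypothetical figures, for illustration only.
hits = {"word_a": 100, "word_b": 50}
freqs = {"word_a": 0.01, "word_b": 0.005}
total = estimate_total_words(hits, freqs)
```

As the text notes, an estimate of this kind says something about volume but nothing about what sort of content the words belong to.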
A current study undertaken by the Language Observatory seeks to more accurately evaluate diverse language content on the web. The project uses a web-crawler with certain analytical capabilities (Suzuki et al 2002).
Analyzing the character of the content in particular languages is of course more complex than estimating the presence of the languages on the web. Diki-Kidiri and Edema’s (ibid.) study seems to be the most revealing in this regard.
Another study, by Ballantyne (2002), which did not look specifically at Africa or language, offered a schema for categorising content in terms of its origins and audience. To this one might add a third dimension: the subject of the content. Such a schema would be very useful in understanding the nature of the content by looking at where it was coming from and who the intended or anticipated audience would be. For instance, the large percentage of sites with African language content being descriptive reflects the dominance of non-Africans on the web, who may be interested in learning or knowing more about the languages. Looking at content in this way also facilitates understanding of who is localising content for whom and how their work can best be facilitated. Furthermore, by looking at what is not done in terms of localisation, one can use this schema to analyse the reasons for that and what might be needed to achieve better results.
Web content about African languages deserves special comment since it is fairly prominent. This is a broad category that includes a range of presentations varying in quality, from the very informal and sometimes incomplete, to the very well thought out and sometimes ambitious projects. In general this category reflects the fact that the potential audience for African language topics has been predominately people with either little or no knowledge of the language on one hand, or knowledge of the language without a great deal of familiarity with its written form on the other hand (the latter including the people mentioned above [4.4] who are "not literate" in their first language [L1]).
The higher end of this category, if one might put it that way, includes online dictionaries, of which the Kamusi Online Living Swahili Dictionary86 deserves special note, online descriptions of academic use such as the Hausa site at UCLA, and some efforts to use the web as an instructional tool. The latter in turn can be considered by its audience: second (or additional) language learners generally outside of Africa and with little knowledge of the language (for instance at a university), children of expatriate Africans (this is sometimes called "heritage language" education), and L1 literacy for Africans within Africa, including L1 literacy for the otherwise literate or the illiterate (such as the ALI project in Cameroon several years ago).87
In addition to meeting certain needs and raising the profile of African languages in general, such content approaches in principle also enhance the environment for other kinds of localisation.
Weblogs, or blogs, are becoming increasingly widespread around the world including in Africa. There are already several blogs in African languages. Blogging is a relatively easy way for individuals to produce text content in any language, given that there are free sites offering space to anyone who wishes to start a blog. As long as blogs remain a significant feature in cyberspace we should expect to see more – and facilitate more – content in diverse African languages.
Wikipedia is an online encyclopaedia that is expressly multilingual in its approach. There are almost 40 Wikipedia editions begun in African languages, of which Arabic, Afrikaans and Swahili are the most represented. Many, however, have very little content. Following discussions on how to facilitate growth of these and other African language editions of Wikipedia at the conference in 2006, an effort to coordinate work was launched.88
Fonts, as an aspect of localisation in Africa, have been a particular issue in countries and for languages that use extended Latin orthographies and/or non-Latin scripts. The following briefly considers these two areas.
Adapting 8-bit fonts to the transcription needs of many African languages – a practice involving various individuals, organizations, and projects that has been characterised as "anarchic" (Cissé et al 2004) – has apparently been fairly common (see above, 6.2). The result has been a number of mutually incompatible "special fonts," or what are now generally referred to as "legacy fonts," that are still in use to varying degrees.89
There is to our knowledge no comprehensive listing of such fonts but a list of a few examples is given in Table 4.
|Location, site or organization||8-bit legacy fonts||Created by|
|Mali||Bambara Arial, Bambara Times||Created in connection with an ACCT workshop; late 1990s|
|Matchfont.com||(font for Gikuyu)||Created by Gatua wa Mbugwa, 1999|
|Niger||INDRAP98, La Nigeriènne||Created in Niger (?), late 1990s|
|SIL||Many fonts for general and country-specific usage||Created in the 1990s (and before?)|
While Unicode as a standard provides for extended Latin characters, the availability of fonts with the characters has only gradually become better. There are additional issues with the support of combining diacritics (such as tone marks in some cases) that have to do with other aspects of software, but nevertheless affect the utility of some fonts. An alternative strategy of using a single glyph for the combination of a base character plus a combining diacritic is one way to get around this problem.
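The difference between a base character plus combining diacritic and a single precomposed character can be seen at the encoding level with Python's standard `unicodedata` module. This is only a sketch of the encoding side of the issue; the single-glyph strategy described above may also be implemented inside a font, which this example does not show. The sample letters are assumptions chosen for illustration.

```python
import unicodedata

# "e" followed by a combining acute accent: two code points.
e_combining = "e\u0301"

# NFC normalisation composes the pair into the precomposed U+00E9.
e_precomposed = unicodedata.normalize("NFC", e_combining)

# Open e (U+025B), used in a number of African orthographies, has no
# precomposed form with an acute accent, so NFC leaves two code points
# and correct display depends on font and rendering support.
open_e_combining = "\u025B\u0301"
open_e_nfc = unicodedata.normalize("NFC", open_e_combining)

lengths = {
    "e + acute, as typed": len(e_combining),
    "e + acute, after NFC": len(e_precomposed),
    "open e + acute, after NFC": len(open_e_nfc),
}
```

Where no precomposed character exists, software has to position the combining mark correctly, which is the support problem mentioned above.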
There are a significant number of Arabic fonts available, either in the 8-bit ISO-8859-6 or Windows-1256 standards or in Unicode. Unicode covers these, of course, plus some additional characters, mostly for non-Arabic languages of the Middle East but also useful for some African languages. It is not clear how well existing fonts accommodate African usage, in part because standards are informal.
There exist various kinds of non-Unicode font solutions for Ethiopic/Ge'ez, and naturally Unicode fonts are more satisfactory for several reasons. There are some fonts for Tifinagh and N'Ko, but the latter still has some technical issues that are not yet resolved.
Computer keyboards designed for Europe and North America (in particular the English QWERTY and French AZERTY) are the rule in sub-Saharan Africa. The only language covered in this survey that has well-established keyboards is Arabic. There are input systems for Ethiopic/Ge'ez in the Horn of Africa, though no standard as of yet. Most of the discussion in this section, therefore, deals with efforts to provide for Latin-based transcriptions, and mainly ones that include extended characters and diacritics (see above, 4.3 and 6.3 for background).
Where languages use essentially the same characters that are indicated on the ELWC keyboards (and in the software), there is generally not a question of new methods for input. However, the numerous languages of Africa that use extended characters and diacritics pose varying challenges. For many languages elsewhere, typewriters were in use at the time desktop computers were introduced, and the typewriter keyboard was adapted to the computer keyboard. Such typewriters were few in Africa and have, to our knowledge, had no impact on computer keyboard design.90 So alternative workarounds were necessary.
Mostly Africa uses keyboards designed for English or French, with a fair number of layouts having been designed for input of specific African languages.
As indicated above (6.3), input of special characters and diacritics in the Latin script can be handled in a number of ways, such as using programs like Tavultesoft’s Keyman, Microsoft’s Keyboard Layout Creator (MSKLC) utility, or simply assigning keys within a word processor. These are not particularly hard to implement, and in fact an increasing number of them are available for various languages and countries or regions.91 Some examples of efforts to design keyboards for African language needs are listed as part of Appendix 5 (Section 12.5).
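To give a sense of what such a layout does, here is a minimal Python sketch of a "dead key" scheme, in which one key modifies the key typed after it. It does not reproduce how Keyman or MSKLC define layouts; the choice of `;` as the dead key, the `COMBOS` table, and the function name are all hypothetical.

```python
# Hypothetical layout: ";" acts as a dead key that modifies the next key.
DEAD_KEY = ";"
COMBOS = {
    "b": "ɓ", "B": "Ɓ",
    "d": "ɗ", "D": "Ɗ",
    "k": "ƙ", "K": "Ƙ",
}


def apply_layout(keystrokes: str) -> str:
    """Turn a sequence of typed keys into text using the dead-key table."""
    output = []
    pending = False  # True right after the dead key has been pressed
    for key in keystrokes:
        if pending:
            # Known combination: emit the special letter.
            # Unknown combination: emit both keys unchanged.
            output.append(COMBOS.get(key, DEAD_KEY + key))
            pending = False
        elif key == DEAD_KEY:
            pending = True
        else:
            output.append(key)
    if pending:
        # A dead key at the very end is emitted as itself.
        output.append(DEAD_KEY)
    return "".join(output)


typed = apply_layout(";kasa")
```

The ease of writing a table like `COMBOS` is also why many mutually different layouts can appear for the same language.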
The issue of keyboard layouts for Latin-based scripts is one that has had the attention of a number of individuals and a few firms. In a few cases, such as in Nigeria, it has also received some official (governmental) attention, but as a general rule keyboards for African languages have not had a wide or systematic consideration. As a result there have been a fair number of layouts designed for one or another situation in Africa, going back some years (and in some cases a keyboard driver and 8-bit font have been developed as part of a package).92
The facility with which one can create and disseminate a keyboard layout has a downside, however. Chantal Enguehard of the University of Nantes and the RIFAL project has expressed concern that the proliferation of layouts might become confusing. She and Naroua Harouna of the University of Niamey are at this writing researching various keyboard layouts for evaluation and comparison.
Discussion of keyboard layouts in Africa inevitably leads to the topic of use of alternatives to the QWERTY or AZERTY keyboards or the development of specifically African keyboards. In one case, for instance, a Nigerian linguist, Chinedu Uchechukwu, who was based in Germany, suggested working with the German QWERTZ keyboard, which has one more key than the QWERTY. This facilitated including on it the extra diacritical characters necessary to compose in Igbo. This idea and some others that led to creation of several keyboard layouts 93 were the outcome of discussions on several email fora (see below, 7.7, Table 7).
The only production computer keyboard we are aware of is the Konyin keyboard for Nigerian languages mentioned above (6.3).94 It follows the thinking that new layouts should probably not depart too much from the keyboards that current users are already accustomed to – generally English or French keyboards.
Currently the entire continent, especially south of the Sahara, uses keyboards designed originally for one or another West European or North American environment.95 To a certain extent English, French, and Portuguese keyboards are useful where those are the official languages, and indeed they may be the basis for more Africanised keyboards. The proliferation of new keyboard layouts for African languages may have some drawbacks; however, out of that process we may find new concepts for production keyboards that work better for Africa than the traditional European ones (as Konyin attempts to do).
Nigeria in particular has seen a number of different efforts to design keyboards to accommodate special character needs for transcribing the many Nigerian languages. Between those efforts and others such as in the case of Francophone African countries using the AZERTY keyboard, one might foresee at least two production keyboards for Africa, each of which could accommodate more than one keyboard layout. (See also below, section 9).
Some graphics tablet keyboards were put together (for production or concept) by Lee Pearce of Large-Format Computing in 2003.96 This solution would seem especially useful for a syllabary like Ethiopic/Ge'ez (and indeed there was a graphics tablet keyboard developed for it), but it has apparently not proved popular in other contexts. The input method is rather slow and requires use of a stylus. However, an advantage is that, as a USB device, it can be used alongside any other traditional keyboard to facilitate multilingual or multiscript input.
Locale data and its importance for localisation and multilingual ICT were introduced above (6.4). At the present time, however, relatively few African languages have locale data.
In early 2006, Alberto Escudero-Pascual and Louise Berthilson of IT46 launched an online locale generator tool to assist people in compiling locale data for OpenOffice and CLDR.97 This led to filing of several more locales. Table 5 lists the African languages for which locale data has been filed with CLDR by mid-2006. Locale data is filed for a language and a country.
Table 5: African languages filed in CLDR 1.4 (as of July 2006)
|Language||ISO-639 code used||Country(ies) filed for|
|Afar||aa||Djibouti, Eritrea, Ethiopia|
|Afrikaans||af||South Africa, Namibia|
|Arabic||ar||Algeria, Egypt, Libya, Morocco, Sudan, Tunisia, and several countries in SW Asia|
|Hausa||ha||Nigeria, Niger, Ghana; Latin and Arabic scripts|
|Koro?||kfo||Nigeria? [there is an error in this locale]|
|Lingala||ln||Congo, Democratic Republic of Congo|
|Ndebele, South||nr||South Africa|
|Somali||so||Somalia, Ethiopia, Kenya, Djibouti|
|Sotho, Northern||nso||South Africa|
|Sotho, Southern||st||South Africa|
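Because locale data is filed per language and per country, each row of Table 5 corresponds to one or more locale identifiers that pair a language code with a country code. The Python sketch below builds such identifiers for three rows of the table. The two-letter ISO 3166 country codes are added here for illustration and do not appear in the table, and the simple `language_COUNTRY` form ignores script variants such as those noted for Hausa.

```python
# Language code -> country codes, following three rows of Table 5.
# The ISO 3166 country codes are supplied for this illustration.
FILED = {
    "af": ["ZA", "NA"],              # Afrikaans: South Africa, Namibia
    "ln": ["CG", "CD"],              # Lingala: Congo, Democratic Republic of Congo
    "so": ["SO", "ET", "KE", "DJ"],  # Somali: Somalia, Ethiopia, Kenya, Djibouti
}


def locale_ids(language: str, countries: list) -> list:
    """Build language_COUNTRY identifiers such as 'so_KE'."""
    return [f"{language}_{country}" for country in countries]


all_ids = [
    identifier
    for language, countries in FILED.items()
    for identifier in locale_ids(language, countries)
]
```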
A website called "Yeha" focuses on locale data for languages of East Africa.98
Software applications in African languages can be seen as a fundamental way of facilitating greater use of (and soft access to) the technology, and as a help to those individuals who would develop web content as well.
Software localisation for African languages began before the current interest in localising FOSS, though instances of it were few. Examples of localisations in the DOS environment in the 1990s include:
Even where such efforts have not generated sustained activity, they still point to early recognition of the potential and a base of experience to build on.
In recent years there has been more recognition of the need for localised software and efforts to localise. Initiatives such as Translate.org.za have led the way among FOSS localisers in Africa, and there has been some interest by the major proprietary software firm, Microsoft.
Table 6 lists African languages for which there are current active or completed projects to localise the OpenOffice software.
Table 6: OpenOffice Localisation Projects99
|Northern Sotho/Sepedi||ns||http://www.translate.org.za||Dwayne Bailey|
In some cases other software has been localised. For example, the non-governmental organisation Open Knowledge Network developed its own localised software for project purposes. Another example is a children’s computer drawing program, TuxPaint, which has been localised into Swahili, and recently into Xhosa and Venda. Yet another is the Mozilla browser in Luganda.
In terms of operating systems, there are some projects for localising Ubuntu Linux, and Microsoft is working on several major languages via its LIP project, which produces overlay packs with about 80% of commands localised.
Another interesting area to consider is localisation of user interfaces for online tools such as search engines. This area is related both to software localisation, in that terminologies and user profiles need to be considered, and to web content, in that it involves use of African languages on websites (that happen also to be user interfaces). An example is the Google program "Google in your language", which includes several African language versions.100
This part will highlight how some other ICTs relate to localisation and will consider technologies and applications of potential importance in localisation. As discussed above (6.6), internationalisation and research on new technologies are opening new dimensions of ICT. Some of these are already being used or explored in Africa.
Mobile technology in the form of cellular phones has already emerged as a significant ICT in Africa. Cellular phones are increasingly widespread, much more than fixed line phones now, and even into rural areas of some countries. Along with this, and evolution of the technology to handle text messaging etc., there has been increasing interest in localising the user interfaces in African languages. This may be the new growth area for localisation, and certainly its importance is increased to the extent that mobile devices and computers can be used interchangeably to share and process information. Shanglee (2004) describes some of the considerations in localising cellphone technology for South African languages.
Among cellphone companies, Nokia appears to be particularly active in the area of localisation,101 with Sony Ericsson and Samsung also marketing local language interfaces in South Africa. An American company, Tegic Communications, has adapted its "predictive text" software – which facilitates input of words using telephone keys and is used on many models of mobile telephone – to several African languages, including Afrikaans, Arabic, and Swahili, with Xhosa and Zulu in development (Senne 2006).
Interfaces with commands in non-Latin scripts of Africa, notably Arabic, are already proven, and research is well advanced on Amharic; text messaging in Arabic is also incorporated on many phones used in Arabophone regions.
There has been interest in text-to-speech (TTS) by several researchers, since many Africans are not literate. In recent years a few programs have been developed for African languages. The Local Language Speech Technology Initiative (LLSTI) has coordinated development of TTS in Swahili, Zulu and Ibibio with local and international partners in each case. An interesting example of application of TTS is the Swahili version, which is used for text messages on mobile phones in a Kenyan project originally pioneered by the Open Knowledge Network and the University of Nairobi.
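The general idea of predictive text on a telephone keypad can be sketched in a few lines of Python. This is not Tegic's software or algorithm, only an illustration of why a word list for each language is needed: every word maps to a digit sequence, and the phone looks up which known words share the sequence typed. The small Swahili word list is an assumption for the example, and a real system would also have to handle any extended characters and rank candidates by frequency.

```python
from collections import defaultdict

# Standard letter groups on a telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {
    letter: key for key, letters in KEYPAD.items() for letter in letters
}


def key_sequence(word: str) -> str:
    """Return the digits pressed to type the word, one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(words: list) -> dict:
    """Map each digit sequence to the dictionary words it could stand for."""
    index = defaultdict(list)
    for word in words:
        index[key_sequence(word)].append(word)
    return index


# Illustrative Swahili word list (hypothetical dictionary content).
index = build_index(["asante", "habari", "kaka", "lala", "maji"])

# "kaka" and "lala" share one sequence, so the user must choose between them.
candidates = index["5252"]
```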
The ability to transform thought in writing or speech from one language to another with the assistance of a computer is one of the most interesting uses of ICT in multilingual contexts, but one that has had relatively little attention in Africa.103 The technology in this area is evolving quickly and has connections with and implications for localisation work.
For convenience one might divide it under two headings: machine translation (MT), or the automatic translation between languages by a computer program, which aims at translating speech or text from one language into another, in general or specific settings; and translation memory (TM), which is mostly used as a tool to facilitate new translations based on previous translations of the same or similar text content.
MT has been under development for a number of years and encompasses different approaches and technologies, the details of which will not be discussed here. What is of particular interest however is the subcategory of MT referred to as "shallow-transfer" which is a simpler approach adapted for translation between closely related language pairs (an example being the "Apertium" open-source translation software 104). Like the simpler "computer assisted dialect adjustment" (CADA) programs of a number of years ago 105 this may find significant use among related languages within Africa.
At this time there is not much MT for use between African and non-African languages apart from Arabic (especially Arabic to and from English), for which there is much research, commercial MT software and even online translation available. For the rest of the languages of the continent, there are at this writing several projects but actual working MT available only for Xhosa and Pulaar/Fulfulde in pairs with English. The latter have been built and presented online 106 through the efforts of Martha O'Kennon, professor emerita at Albion College (U.S.), using Prolog, but translate only short sentences. Prof. O’Kennon is also collaborating with other individuals on languages such as Akan and Yoruba.
A number of larger-scale projects exist, notably a longstanding one for Swahili called "Salama" under the direction of Arvi Hurskainen, professor at University of Helsinki (Finland).107 The African Language Research Project at the University of Maryland – Eastern Shore (U.S.) has an initiative to research MT for African languages. There are other corpus-building efforts that envision applying their work eventually in MT, such as one called SAY for Amharic (and some non-African languages) at New Mexico State University (U.S.).
There are apparently some MT specialists in South Africa, but we are not aware of any active MT efforts for African languages based in Africa.
Translation memory (TM) has had some application with African languages. The South African company Web-Lingo, for instance, uses a TM program called "Trados" in some of its work. The open-source TM program "OmegaT" has an Afrikaans version.108
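The core idea of translation memory, reusing earlier translations of the same or similar text, can be illustrated with Python's standard `difflib` module. This sketch is not how Trados or OmegaT work internally; the function name, the similarity threshold of 0.7, and the small English to Swahili memory are assumptions for the example.

```python
from difflib import SequenceMatcher


def best_match(segment: str, memory: dict, threshold: float = 0.7):
    """Return (stored source, stored translation, score) for the most
    similar stored segment, or None if nothing reaches the threshold."""
    best_pair = None
    best_score = 0.0
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for partial overlap.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_pair, best_score = (source, target), score
    if best_pair is not None and best_score >= threshold:
        return best_pair[0], best_pair[1], best_score
    return None


# Illustrative memory of earlier English -> Swahili interface translations.
memory = {
    "Open file": "Fungua faili",
    "Save file": "Hifadhi faili",
}

exact = best_match("Open file", memory)
fuzzy = best_match("Open the file", memory)
no_hit = best_match("Print", memory)
```

An exact match can be reused directly, while a fuzzy match is offered to the translator as a starting point to edit.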
Beginning in January 2002, a number of email lists and fora have been set up specifically to advance discussion of topics related to computing and the internet in African languages. Previously African languages and ICT was a subject that might be discussed on other lists or not at all. These dedicated lists bear mentioning as arguably a number of dynamics have been set in motion by their existence and functioning. (Table 7 summarizes information on these lists.)
Table 7: E-mail forums on African languages and ICT
|Bisharat E-mail forum||Date established||Format & participation||Language||Number of subscribers||Number of postings & notes on topics|
|Hausa charsets & keyboards http://www.quicktopic.com/8/H/JxKHyg9ccPUVB||2001-9-2||Message board, no subscription necessary||English||9*||94, fonts, orthography, keyboards|
|Unicode-Afrique http://fr.groups.yahoo.com/group/Unicode-Afrique/||2002-1-20||E-mail subscription list||French||154||1029, orthographies, fonts, Unicode, encoding, keyboards, other projects|
|A12n-Collaboration http://lists.kabissa.org/mailman/listinfo/a12n-collaboration||2002-3-21||E-mail subscription list||English||64||898, character sets, fonts, keyboards, encoding, technical issues|
|Ghanaian languages & ICT http://www.quicktopic.com/16/H/9xffAXi7whnv||2002-7-9||Message board, no subscription necessary||English||4*||57, fonts, orthography, keyboards|
|Yoruba language & ICT (fonts, keyboards & applications) http://www.quicktopic.com/15/H/KKgbRqJUAR8||2002-7-27||Message board, no subscription necessary||English||18*||267, fonts, orthography, keyboards|
|Igbo language & ICT (fonts, keyboards & applications) http://www.quicktopic.com/17/H/tCcDxVXHgQxN||2002-10-17||Message board, no subscription necessary||English||9*||190, fonts, orthography, keyboards|
|A12n-Forum http://lists.kabissa.org/mailman/listinfo/a12n-forum||2003-6-1||E-mail subscription list||English||42||469, news, localisation, web content|
|A12n-Entraide http://lists.kabissa.org/mailman/listinfo/a12n-entraide||2003-6-1||E-mail subscription list||French||18||123,|
|Langues Togolaises et les NTIC http://www.quicktopic.com/25/H/k2zuDzmgxGkc||2004-1-23||Message board, no subscription necessary||French||5*||28, fonts, language instruction|
|Langues Sénégalaises et les NTIC http://www.quicktopic.com/25/H/6KmBx6F8jES||2004-3-23||Message board, no subscription necessary||French||3*||17, keyboards, orthographies, news|
|Langues Béninoises et les NTIC http://www.quicktopic.com/27/H/UbEFBKa7X46Ra||2004-7-19||Message board, no subscription necessary||French||2*||12, orthography, fonts, text online|
|Langues Burkinabè et les NTIC http://www.quicktopic.com/31/H/rhTwJR2T8ar||2005-5-28||Message board, no subscription necessary||French||1*||3, sample text|
|PanAfrLoc http://lists.kabissa.org/mailman/listinfo/PanAfrLoc||2005-6-15||E-mail subscription list||English & French||46||173,|
|Selected other E-mail fora on African language localisation||Date established||Format & participation||Language||Number of subscribers||Number of postings & notes on topics|
|Africa@unicode.org||2003||E-mail subscription list||English||?||?; encoding of African alphabets|
|Informatique et langues des deux Congo http://groups.google.com/group/info-langues-congo||2005||E-mail subscription list||French||18||?; issues relating to languages of DRC and RC|
|Linux2Igbo||2003||E-mail subscription list||English||-; localisation of Linux OS in Igbo|
One thing that has become apparent from experience with Bisharat's mailing lists is the usefulness and importance of this medium for communication and fostering collaboration on various aspects of using African languages and ICT. This importance has been underscored by the finding that in some cases different groups have been working in the same country on questions of localisation without knowledge of each other. E-mail lists are not the only way to foster communication, but they are inexpensive and effective, and when coupled with traditional conference-based approaches and individual networking, can really change the environment around a particular question. For this reason the PanAfrican Localisation project has launched a trilingual forum to attempt to encourage communication across the continent and across the postcolonial linguistic boundaries as well.109