The Language Observatory is a project based in Japan that assesses the presence of various languages on the internet. (The quoted text below is taken from the project webpages.)
The "objectives of the Language Observatory Project can be stated as:
- To raise public awareness on "Digital Language Divide" issues
- To encourage support to the processing of those languages now falling through the net."
"Language Observatory surveys language activities in the virtual universe over the Internet. ... [It] tries to catch subtle messages of less spoken languages, as far as they appear on the Internet, and answers such questions like:
- How many languages are found on the Internet?
- How many web pages are written by any given language/script under specific country code domain (ccTLD)?
- What kind of character encoding schemes (CESs) are employed to encode a given language?
- How quickly UCS/Unicode is spreading?
- To what extent open-source software (OSS) technologies are employed by specific language community?
- How specific language community is linked together with other language communities? (web-graph analysis)"
"The Language Observatory works through the following steps.
- Crawler Robots visit pages on the Internet at least once a year, and fetch text content. These robots return back to the same page regularly so as to produce a periodical report.
- Language Identification Module (LIM) analyses the page content and identifies language property (language, script and character encoding scheme, etc.) of the page. LIM is trained by the contribution of language experts.
- The Observatory counts up number of pages according to their language properties and compiles a regular report.
- The Observatory also analyses HTML-tag information and link information to reveal open-source usage status, web graph structure, etc."
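The counting step described above can be illustrated with a short Python sketch. It is not the project's own code: the `PageRecord` type, the function names and the sample URLs are all hypothetical, and the sketch simply assumes that a language identification step has already produced one record per page giving its language, script and character encoding.

```python
from collections import Counter
from typing import Iterable, NamedTuple
from urllib.parse import urlsplit


class PageRecord(NamedTuple):
    """Hypothetical per-page output of a language identification step."""
    url: str
    language: str   # e.g. an ISO 639 code such as "sw"
    script: str     # e.g. an ISO 15924 code such as "Latn"
    encoding: str   # e.g. "UTF-8"


def top_level_label(url: str) -> str:
    """Return the last label of the host name, lowercased (e.g. "ke").

    Assumption: a two-letter result is treated as a ccTLD by the caller;
    this function does not check it against a registry.
    """
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1].lower()


def count_pages(records: Iterable[PageRecord]) -> Counter:
    """Tally pages per (domain label, language, script, encoding)."""
    return Counter(
        (top_level_label(r.url), r.language, r.script, r.encoding)
        for r in records
    )


def unicode_share(counts: Counter) -> float:
    """Fraction of counted pages whose encoding name starts with "UTF"."""
    total = sum(counts.values())
    utf = sum(n for key, n in counts.items() if key[3].upper().startswith("UTF"))
    return utf / total if total else 0.0


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        PageRecord("http://example.ke/a", "sw", "Latn", "UTF-8"),
        PageRecord("http://example.ke/b", "sw", "Latn", "ISO-8859-1"),
        PageRecord("http://news.example.et/", "am", "Ethi", "UTF-8"),
    ]
    counts = count_pages(sample)
    for key, n in sorted(counts.items()):
        print(key, n)
    print("Unicode share:", unicode_share(counts))
```

Grouping on the full tuple keeps one tally that can answer several of the questions listed earlier (pages per language and script under a country domain, encodings used for a language, spread of Unicode) by summing over the fields that are not of interest.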
Funding & operation
"The project is currently funded by Japan Science and Technology Agency (JST) under RISTEX program, and is implemented by the partnership of several institutions" in Asia and Europe.
African languages have been surveyed as part of the Language Observatory's work.
References & links