There are tens of thousands of Hanzi, shared by Japanese, Korean and the many Sinitic languages, and each of these languages has its own pronunciation of the Hanzi. So it's really burdensome to transcode Hanzi into romanizations, because no single result will satisfy everyone :( So I guess we can't expect the transcoders to work on CJK languages. It's a natural inconvenience for East Asian people when naming their web pages; blame the fact that it was the Westerners who invented computers :p
I myself can accept naming the pages with meaningful English phrases, but there is one problem: people might not know they have to do this when they type "the title in the original language" in the "add new page" field. If they type "20070105:開站感言" like I did, they might not notice that the page name has been cut down to just "20070105", and more pages named "20070105…" would lead to confusion (and conflicts?). The same problem happened when I added a new thread on my forum: I entered a Chinese post title, and not until I had saved did I find out that the Chinese part of my title was gone. The system automatically takes the title as the page name and ignores non-alphanumeric characters altogether, so the page name can be much less meaningful than expected. Furthermore, what if I enter a title made up entirely of Chinese characters? Would there be anything left for the page name at all? Hmm, I should try this later.
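To make the problem concrete, here is a rough sketch of the kind of rule that would produce what I saw. I don't know the rule the system really uses; the function name `naive_page_name` and the "keep only ASCII letters and digits" behaviour are my own assumptions for illustration.

```python
import re

def naive_page_name(title: str) -> str:
    """Hypothetical rule (an assumption, not the system's real code):
    lowercase the title, turn every run of characters other than
    ASCII letters and digits into a single '-', then trim the dashes."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# The Chinese part is silently lost, leaving only the date.
print(naive_page_name("20070105:開站感言"))

# A title written entirely in Hanzi leaves nothing at all.
print(repr(naive_page_name("開站感言")))
```

Under a rule like this, two different posts written on the same day would both end up as "20070105", which is exactly the conflict I'm worried about.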
So for this East Asian trouble, here are my humble suggestions:
- Allow people to name pages with multibyte characters, and transcode them into %xx%xx%xx as wikis like Wikipedia do. The page names would be impossible for any human to read, but East Asians wouldn't need to translate the page name in their head into English, and they could link to these pages directly with [[[漢字]]] (Hanzi), without needing aliases such as [[[Hanzi|漢字]]].
- In that case, maybe add an option so people can choose either to keep the current way (suitable for languages written in alphabets) or to always transcode their page names.
- Or, if you don't wish to break the readability of the current Roman-character names (no transcoding into %xx%xx), then whenever multibyte characters are typed in the "add new page" field or the "post title" field (i.e. wherever they would affect the page name), a bubble or some other warning could pop up to tell users that what they typed is not a legal name and may be ignored or truncated, and suggest that they name the page in Western characters first and enter their preferred title later in another field.
- For content pages, the page name and the title are separate fields, so at least multibyte characters can be kept in the second field. But this distinction is missing in forum threads! So could we separate page names and titles for forum threads too? (This option could be turned off if Western users don't need it.)
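For the first suggestion, the %xx%xx%xx transcoding is just percent-encoding of the title's UTF-8 bytes. Here is a minimal sketch using Python's standard `urllib.parse`; the helper names `encode_page_name` and `decode_page_name` are made up for illustration and are not part of any wiki engine I know of.

```python
from urllib.parse import quote, unquote

def encode_page_name(title: str) -> str:
    """Percent-encode the UTF-8 bytes of the title.
    ASCII letters and digits pass through unchanged;
    safe="" means characters such as ':' and '/' are encoded too."""
    return quote(title, safe="")

def decode_page_name(name: str) -> str:
    """Turn an encoded page name back into the original title."""
    return unquote(name)

encoded = encode_page_name("漢字")
print(encoded)                    # %E6%BC%A2%E5%AD%97
print(decode_page_name(encoded))  # 漢字
print(encode_page_name("hanzi"))  # hanzi
```

Since the encoding is reversible, nothing from the original title is lost, and names written in Roman characters stay exactly as they are now.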
Personally I prefer the first way because it's simpler for East Asians (although those "%xx%xx%xx" things are really ugly! :wink: ): they don't have to create and memorize two sets of page names and titles, and it's much easier for them to make internal links. Without this, the Chinese Wikipedia would never have reached 100 thousand articles!
What do you think?