Unicode page titles allowed in the future?
Started by: MilchFlasche
On: 05 Jan 2007, 04:27 UTC
Number of posts: 8
Summary:
Is it possible to allow Unicode characters in the page titles?
Unicode page titles allowed in the future?
MilchFlasche 05 Jan 2007, 04:27 UTC

Hello Michal,

I really like this wiki engine of yours! I thought only TiddlyWiki, DokuWiki or MediaWiki could satisfy my needs, and that hosted wiki services would never be perfect. But today, when I found this place, I could hardly believe that such a great service and elegant piece of work existed!

So even though there is no Chinese localization yet, I registered right away and have just created my own site here. But then I found something unfortunate: page titles still ignore Unicode characters such as Chinese Hanzi. I'm writing to ask whether you have any plans to implement more support for Unicode, because it would always be more convenient to display the page title in one's own language instead of English. (At least multibyte content is displayed correctly in the source, which is way better than XWiki!) Take DokuWiki for example: it allows multibyte page names and encodes them into things like %3B%4A%20 for the real file name. If Wikidot could do this, I think it could attract more non-Western users :)

But even before any implementation, hundreds of thanks to you.

My best regards,

MilchFlasche

Re: Unicode page titles allowed in the future?
MilchFlasche 05 Jan 2007, 04:52 UTC

Oh! Please pardon my carelessness! Now I know that multibyte characters are not allowed in the "unix name" of documents, so they are ignored when we "add a new page" through the module; but they can be typed into the titles. Sorry, my bad :p

Solved :)


Oh! I really like these AJAX and JavaScript-powered interfaces! Full of surprises everywhere!

Re: Unicode page titles allowed in the future?
michal frackowiak 05 Jan 2007, 09:53 UTC

Hi,

Indeed, you have found the solution ;-) I will try to fix the "add new page" module so that it copies the original title.

There are several reasons to keep unix names to letters and numbers only: URLs are much more readable than %3B%4A%20… The problem for Chinese, Korean, Japanese etc. is that you have to use alphanumeric values for unix names…

This works perfectly for character sets where only some characters are specific: the "name unixifier" transcodes special characters into alphanumeric ones, e.g. ą -> a, ć -> c etc. But I have no idea how to do this for, e.g., Chinese…
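
For illustration only, here is a minimal Python sketch of how a "name unixifier" of this kind might behave. It is not Wikidot's actual code: the function name `unixify`, the small `EXTRA` table, and the choice to join words with hyphens are all assumptions. It strips diacritics via Unicode decomposition and then drops everything that is not an ASCII letter or digit, which is why Hanzi simply disappear.

```python
import re
import unicodedata

# Letters with no Unicode decomposition need an explicit table.
# This table is an assumption and is deliberately incomplete.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O"})


def unixify(title: str) -> str:
    """Hypothetical unixifier: reduce a title to lowercase ASCII letters and digits."""
    text = title.translate(EXTRA)
    # NFKD splits "ą" into "a" plus a combining ogonek, "ć" into "c" plus an acute, etc.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside a-z / 0-9 (including all Hanzi) collapses into a hyphen.
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")


print(unixify("Gęś i ćma"))
print(unixify("20070105:開站感言"))
print(repr(unixify("漢字")))
```

Under these assumptions, a Polish title survives as readable ASCII, a mixed title keeps only its ASCII part, and a title written entirely in Hanzi yields an empty name, since Hanzi have no decomposition into Latin letters.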

Any ideas? I believe one can live with how it works now, but if there is a nice solution, it would be worth considering!

michal


Michał Frąckowiak @ Wikidot Inc.
Visit my blog at michalf.me

Re: Unicode page titles allowed in the future?
MilchFlasche 06 Jan 2007, 09:14 UTC

There are tens of thousands of Hanzi, shared by Chinese, Japanese, Korean and many other languages, and each of these languages has its own pronunciation of the Hanzi. So it's really burdensome to try to transcode Hanzi into romanizations, because no single scheme will satisfy everyone :( So I guess we shouldn't expect the transcoders to work on CJK languages. It is a natural inconvenience for East Asian people when naming their web pages; blame the Westerners who invented computers :p

I myself can accept naming pages with meaningful English phrases, but there is one problem: people might not know they have to do this when they type the title in their original language into the "add new page" field. If they type "20070105:開站感言" like I did, they might not notice that the page name has been cut to "20070105" only, and more pages named "20070105…" would result in confusion (and conflicts?). The same problem happened when I added a new thread on my forum: I entered a Chinese post title, but not until I had saved did I learn that the Chinese part of my title was gone, because the system automatically takes the title as the page name and ignores non-alphanumeric characters entirely, so the page name can be much less meaningful than expected. Furthermore, what if I enter a title made up entirely of Chinese characters? Would there even be anything left for the page name? Hmm, I should try this later.

So for the East Asian trouble, here are my humble suggestions:

  • Allow people to name pages with multibyte characters, and transcode them into %xx%xx%xx as wikis like Wikipedia do. The page names would be impossible for any human to read, but East Asians would not need to translate their mental page name into English, and they could link directly to these pages with [[[漢字]]] (Hanzi), without any need for aliases such as [[[Hanzi|漢字]]].
    • In this case, maybe offer an option to either keep the current behavior (suitable for languages written in alphabets) or always transcode page names.
  • Or, if you don't wish to break the readability of current Roman-character names (no transcoding into %xx%xx), then whenever multibyte characters are typed in the "add new page" field or "post title" field (i.e. whenever they would affect the page name), a bubble or some warning could pop up to notify users that what they typed is not legal in a name and could be ignored or truncated, and suggest that they name the page in Western characters first and enter their preferred title later in another field.
    • For content pages, the page name and title are separate fields, so at least multibyte characters can be kept in the second field. But this distinction is missing in forum threads! So could page names and titles be separated for forum threads too? (This option could be turned off if Western users don't need it.)
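
The two suggestions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `encode_page_name`, `decode_page_name` and `should_warn` are hypothetical, and nothing here reflects how Wikidot or any other wiki actually implements page names. The first suggestion is ordinary percent-encoding of the UTF-8 bytes; the second is a simple check for characters that a letters-and-digits page name would lose.

```python
from urllib.parse import quote, unquote


def encode_page_name(name: str) -> str:
    """Percent-encode a page name (UTF-8 bytes become %xx triples)."""
    # safe="" also escapes "/" and ":" so the result is a single path segment.
    return quote(name, safe="")


def decode_page_name(encoded: str) -> str:
    """Reverse of encode_page_name."""
    return unquote(encoded)


def should_warn(title: str) -> bool:
    """True if the title contains characters outside ASCII,
    i.e. characters an alphanumeric-only page name would drop."""
    return not title.isascii()


encoded = encode_page_name("漢字")
print(encoded)                      # %E6%BC%A2%E5%AD%97
print(decode_page_name(encoded))    # 漢字
print(should_warn("20070105:開站感言"))  # True
print(should_warn("hanzi"))         # False
```

The round trip shows why the first option needs no separate alias: the original characters are fully recoverable from the encoded name, at the cost of an unreadable URL.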

Personally I prefer the first way because it's simpler for East Asians (although those "%xx%xx%xx" things are really ugly! :wink: ): they don't have to create and memorize two sets of page names and titles, and it's much easier for them to make internal links. Otherwise the Chinese Wikipedia would never have been able to reach 100 thousand articles!

What do you think?

Re: Unicode page titles allowed in the future?
MilchFlasche 12 Jan 2007, 02:58 UTC

Not feasible?

Re: Unicode page titles allowed in the future?
michal frackowiak 17 Jan 2007, 12:15 UTC

I think this is a good idea, but currently it would be quite difficult to implement… I hope in the future… Quite a lot of people ask for better support for non-ASCII characters, so eventually some solution will have to be implemented.


Re: Unicode page titles allowed in the future?
MilchFlasche 17 Jan 2007, 21:31 UTC

Then, for now, how about adding some documentation about the effect of non-ASCII titles when creating a content page or a forum thread, to notify users of non-Western languages? :) I guess this might help them to name their page in English first, and to know that they can change the title later :) People might already know that they should name their content pages in unix form, but there seems to be no hint about this yet when people create a forum thread :)

My regards!

last edited on 17 Jan 2007, 21:41 UTC by MilchFlasche
Re: Unicode page titles allowed in the future?
BobChao 13 Oct 2007, 19:15 UTC

Please fix this bug; it's a show-stopper for us. Users want to use Unicode (in our case, Chinese) characters for both the page title and the (escaped) URL, just as MediaWiki allows.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.