<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Think Vitamin &#187; Steve Ellis</title>
	<atom:link href="http://thinkvitamin.com/author/steve-ellis/feed/" rel="self" type="application/rss+xml" />
	<link>http://thinkvitamin.com</link>
	<description>The Web Practitioner&#039;s Blog</description>
	<lastBuildDate>Wed, 08 Feb 2012 14:00:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Give your web app international appeal with PHP, Part II</title>
		<link>http://thinkvitamin.com/code/give-your-web-app-international-appeal-part-ii/</link>
		<comments>http://thinkvitamin.com/code/give-your-web-app-international-appeal-part-ii/#comments</comments>
		<pubDate>Tue, 19 Jun 2007 08:00:19 +0000</pubDate>
		<dc:creator>Steve Ellis</dc:creator>
				<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://www.thinkvitamin.com/features/webapps/give-your-web-app-international-appeal-part-ii</guid>
		<description><![CDATA[In part one we covered the basics of how to get your website internationalised. In this part we&#39;re going to take things a stage further and look at some of the real world problems you might encounter when working with other languages. Plurals Dealing with plurals is the first challenge when internationalising a website. Let&#39;s [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://www.thinkvitamin.com/features/webapps/give-your-web-app-international-appeal" target="_blank">part one</a> we covered the basics of how to get your website internationalised. In this part we&#39;re going to take things a stage further and look at some of the real world problems you might encounter when working with other languages.</p>
<h3>Plurals</h3>
<p>Dealing with plurals is the first challenge when internationalising a website. Let&#39;s imagine you&#39;re building a blogging engine and want a simple label at the bottom of each post to say how many comments have been posted. You could begin by creating two entries in the PO file, one for &#39;comment&#39; and one for &#39;comments&#39;, then choosing between them based on how many comments there are. The problem with this approach is that it assumes every language has exactly two forms, singular and plural, as English does. This is not always the case.</p>
<p>Eastern European languages, for example, quite often have three plural forms, while some Asian languages such as Japanese don&#39;t use plural forms at all. The <a href="http://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html" target="_blank">plural forms</a> section of the gettext manual has the following example of how they work in Polish when describing a number of files (plik):</p>
<p>1 plik<br />
2,3,4 pliki<br />
5-21 plik&oacute;w<br />
22-24 pliki<br />
25-31 plik&oacute;w
</p>
<p>As you can see, Polish has three different forms (plik, pliki, plik&oacute;w). Other issues arise in languages like Spanish, where other words in the sentence change depending on plurality. For example:</p>
<p>the red car<br />
<em>el coche rojo</em></p>
<p>the red cars<br />
<em>los coches rojos</em></p>
<p>In this case just altering the Spanish word for car (coche) wouldn&#39;t be enough since the article and adjective need to be changed as well.</p>
<p>So how do we compensate for this? It turns out <code>gettext</code> already has a way of dealing with it: we use the function <code>ngettext</code> instead of <code>gettext</code> wherever we need plurals. It works the same as <code>gettext</code> but takes two extra parameters: the first is the sentence in its plural form and the second is the number that decides which form to use, so this might be the number of comments or, in this case, the number of cars.</p>
<p>Our PHP will look like this:</p>
<pre>
<code>
  //Old singular form
  echo gettext("the red car");

  //New plural form
  echo ngettext("the red car", "the red cars", $numberOfCars);
</code>
</pre>
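<p>The same call covers the comment counter from the start of this section. The following is only a sketch: <code>comment_label</code> is a made-up helper name, and it assumes PHP&#39;s gettext extension is loaded. The count is dropped into the chosen sentence with <code>sprintf</code>:</p>
<pre>
<code>
  //Hypothetical helper, not part of gettext itself
  function comment_label($count){
    //ngettext picks the right form for $count, sprintf fills in the number
    return sprintf(ngettext("%d comment", "%d comments", $count), $count);
  }

  echo comment_label(1);    //with no translation loaded: 1 comment
  echo comment_label(3);    //with no translation loaded: 3 comments
</code>
</pre>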
<p>When we generate our PO file using <code>xgettext</code> it will see that we&#39;ve used <code>ngettext</code> and generate a place for our translator to type in the singular and plural forms of the whole sentence, like so:</p>
<pre>
<code>
  msgid "the red car"
  msgid_plural "the red cars"
  msgstr[0] "el coche rojo"
  msgstr[1] "los coches rojos"
</code>
</pre>
<p>The number in square brackets indicates the type of plural, so for Polish we would have:</p>
<pre>
<code>
  msgid "file"
  msgid_plural "files"
  msgstr[0] "plik"
  msgstr[1] "pliki"
  msgstr[2] "plik&oacute;w"
</code>
</pre>
<p>But if we wanted to print &#39;42 files&#39; how would <code>gettext</code> know which of the three plurals to pick? We have to tell it through a formula. Somewhere near the top of your PO file (you may need to open it in a plain text editor for this) you need to add something similar to the following to describe your language. In this example we&#39;ll just use English:</p>
<pre>
<code>
  "Plural-Forms: nplurals=2; plural=n != 1;\n"
</code>
</pre>
<p>So what on earth does that mean? Well, <code>nplurals</code> tells gettext how many plural forms the language has (English has two), and <code>plural</code> defines the formula, where <code>n</code> is the number passed to ngettext via the third parameter. In this case if <code>n == 1</code> then <code>plural</code> will equal 0 (i.e. use the singular), otherwise it will equal 1 (i.e. use the plural). Thankfully we can reuse this for languages such as German, Spanish, Italian and Dutch. For other languages such as Polish and Czech the formula becomes a bit more complicated:</p>
<pre>
<code>
  Plural-Forms: nplurals=3;
                plural=n==1 ? 0 :
                n>=2 &#038;&#038; n<=4 &#038;&#038; (n0<10 || n0>=20) ? 1 : 2; //Polish

  Plural-Forms: nplurals=3;
		plural=(n==1) ? 0 : (n>=2 &#038;&#038; n<=4) ? 1 : 2;  //Czech
</code>
</pre>
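<p>To see how a formula like this behaves, it can help to write it out as ordinary code. The function below is purely an illustration (gettext evaluates the header for you, and <code>polish_plural_index</code> is an invented name). It mirrors the Polish rule from the gettext manual and returns the <code>msgstr</code> index that would be used:</p>
<pre>
<code>
  //Illustration only: gettext works this out itself from the Plural-Forms header
  function polish_plural_index($n){
    if($n == 1){
      return 0;    //plik
    }
    $lastDigit = $n % 10;
    $lastTwoDigits = $n % 100;
    //last digit 2, 3 or 4, except for numbers ending in 12, 13 or 14
    if(in_array($lastDigit, range(2, 4)) and !in_array($lastTwoDigits, range(12, 14))){
      return 1;    //pliki
    }
    return 2;    //plik&oacute;w
  }

  echo polish_plural_index(42);    //1, so 42 takes "pliki"
  echo polish_plural_index(5);     //2, so 5 takes the third form
</code>
</pre>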
<p>Fortunately the plurals section of the gettext manual has a <a href="http://www.gnu.org/software/gettext/manual/html_node/Plural-forms.html" target="_blank">list</a> (they start about half way down) of most of the formulas you&#39;re ever likely to need, so you can just copy and paste them into your PO file header. This saves much banging of your head against the wall whilst trying to work out terrifying formulas for languages like Russian:</p>
<pre>
<code>
  Plural-Forms: nplurals=3;
		plural=n==1 &#038;&#038; n0!=11 ? 0 :
	        n>=2 &#038;&#038; n<=4 &#038;&#038; (n0<10 || n0>=20) ? 1 : 2;
</code>
</pre>
<p>Yikes!</p>
<h3>Dynamic Data</h3>
<p>One of the problems mentioned briefly in part one was dynamic data. For example: what if you want to place a user&#39;s name into a sentence: &#8220;John likes apples&#8221;. This could be approached like this:</p>
<pre>
<code>
  echo $name . " " . gettext("likes apples");
</code>
</pre>
<p>What happens when our new language needs to put something in front of the name? For example in Spanish this sentence would be:</p>
<p><em>A John le gustan las manzanas</em></p>
<p>An &#39;a&#39; has been added to the start of the sentence. It&#39;s not optional and we need a way to allow our translator to add it. The simplest approach to this is to use something like <a href="http://www.php.net/sprintf" target="_blank"><code>sprintf</code></a> e.g.</p>
<pre>
<code>
  echo gettext("%s likes apples");
</code>
</pre>
<p>Our translator can then provide the following translation in the PO file:</p>
<pre>
<code>
  msgid "%s likes apples"
  msgstr "A %s le gustan las manzanas"
</code>
</pre>
<p>Our PHP then becomes:</p>
<pre>
<code>
  echo sprintf(gettext("%s likes apples"), $name);
</code>
</pre>
<p><code>sprintf</code> also allows for numbered arguments, which means the translator can swap their order if they need to. Numbered arguments can look a bit confusing to non-techies, so you could write your own version of <code>sprintf</code> that uses a simpler numbered syntax. On Diarised this is exactly what we did, which means that instead of seeing something like <code>%2\$s</code>, our translators see <code>{2}</code>.</p>
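<p>The function below is only a sketch of the idea, not the Diarised implementation, and <code>format_numbered</code> is an invented name. It uses a regular expression to swap each <code>{n}</code> for the matching argument:</p>
<pre>
<code>
  //Hypothetical helper: replaces {1}, {2}, ... with the matching entry of $args
  function format_numbered($template, array $args){
    return preg_replace_callback('/\{(\d+)\}/', function($match) use ($args){
      $index = (int)$match[1] - 1;    //{1} is the first argument
      //leave unknown placeholders untouched so mistakes are easy to spot
      return array_key_exists($index, $args) ? $args[$index] : $match[0];
    }, $template);
  }

  //in practice the template would come from gettext()
  echo format_numbered("{2} was invited by {1}", array("John", "Maria"));
  //prints: Maria was invited by John
</code>
</pre>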
<p>While <code>sprintf</code> is good for situations like the one above it&#39;s not so great for links. At present you may need to stick something like this in your PO file:</p>
<pre>
<code>
  msgid "&lt;a href=\"http://www.example.com\"&gt;click here&lt;/a&gt; to visit our home page"
</code>
</pre>
<p>Just as we didn&#39;t want to give our translator PHP code, we shouldn&#39;t be giving them HTML either. While we could break it down and add entries for both &#39;click here&#39; and &#39;to visit our home page&#39;, this isn&#39;t a very good solution since your translator will almost certainly need to see the whole sentence in context in order to produce an accurate translation.</p>
<p>For Diarised we wrote a <code>sprintf</code> style function to look for square brackets and turn the text inside them into links e.g.:</p>
<pre>
<code>
  echo linkprintf("[click here] to visit our home page", "http://www.example.com");
</code>
</pre>
<p>This means we can hide HTML from our translator and also keep the urls out of the translation file, so our translator sees:</p>
<pre>
<code>
  msgid "[click here] to visit our home page"
  msgstr "[haz clic aqu&iacute;] para visitar nuestra p&aacute;gina de inicio"
</code>
</pre>
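<p>The Diarised function itself isn&#39;t shown here, but the idea can be sketched in a few lines. Everything below is an assumption for illustration, including the signature of <code>linkprintf</code>: it takes the translated sentence and a URL, escapes both, and wraps the first bracketed phrase in a link:</p>
<pre>
<code>
  //Sketch only: turns the first "[text]" in $text into a link to $url
  function linkprintf($text, $url){
    $safeText = htmlspecialchars($text, ENT_QUOTES, "UTF-8");
    $safeUrl = htmlspecialchars($url, ENT_QUOTES, "UTF-8");
    return preg_replace_callback('/\[([^\]]*)\]/', function($match) use ($safeUrl){
      return '&lt;a href="' . $safeUrl . '"&gt;' . $match[1] . '&lt;/a&gt;';
    }, $safeText, 1);
  }

  //in practice the first argument would be passed through gettext() first
  echo linkprintf("[click here] to visit our home page", "http://www.example.com");
</code>
</pre>
<p>Because the URL is passed in separately, the translator only ever sees the square brackets and can move them to wherever the link belongs in their language.</p>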
<h3>Localising other content</h3>
<p>As we&#39;ve seen <code>gettext</code> is great for localising text but what about things like images, videos and audio? For Diarised we needed localised images for our Diary graphic on the home page as well as for some of the buttons. We approached this by writing a function that emulated gettext but would return localised files if available, dropping back to English by default.</p>
<pre>
<code>
  function get_localised_resource($localeDir, $normalPath, $filename){
    $locale = $_SESSION["locale"];    //get the current locale, e.g. es_ES
    $localePath = "locale/$locale/$localeDir/$filename";    //path to localised resource
    if(file_exists($localePath)){    //if our localised version exists, return it
      return $localePath;
    }
    $normalPath = $normalPath."/".$filename;    //if not, check for our fallback version
    if(file_exists($normalPath)){    //if it exists, return it
      return $normalPath;
    }
    return null;    //otherwise return null
  }

  function get_localised_image($filename){
    return get_localised_resource("images", "templates/_img", $filename);
  }
</code>
</pre>
<p>Here we have a generic function that checks for a localised resource, and an example of the kind of function you could write to use it. The first parameter we pass ("images") tells our function which folder our localised content is in, so if our locale is Spanish our path will be locale/es_ES/images/.</p>
<p>Our second parameter is where to look if it can&#39;t find the localised version. We keep our regular images in a folder called templates/_img, but you&#39;ll obviously need to update this for your own projects. This means we can put something like this into our HTML:</p>
<pre>
<code>
  &lt;img src="&lt;?php echo get_localised_image("welcome.jpg"); ?&gt;"&gt;
</code>
</pre>
<p>From this point, adding localised video, for example, is just a case of creating a new function <code>get_localised_video</code> that passes different directory names to our main <code>get_localised_resource</code> function. For Diarised we even went as far as doing this for our e-mails. Since these are text we could have used <code>gettext</code>, but it&#39;s always neater to keep content like e-mails separate from code, and this allowed us to do it.</p>
<h3>Automatically setting the locale</h3>
<p>A nice touch for a localised website is to try and work out what language the user might want the website to appear in. Although there&#39;s no way of knowing for certain we can make an educated guess that will probably get it right for most of your visitors. The way to do this is by checking the language header sent by the user&#39;s browser when it requests a page. For example mine looks like this:</p>
<pre>
<code>
  en-gb,en-us;q=0.7,en;q=0.3
</code>
</pre>
<p>This lists three preferred languages: en-gb (British English), en-us (US English) and en (English). The last two are followed by <code>q=</code> and a number indicating the quality value; the higher this value, the more the user prefers that language. If the value is omitted (as with en-gb) the default of 1 is used. Let&#39;s look at another example:</p>
<pre>
<code>
  es-es,de;q=0.7,en;q=0.3
</code>
</pre>
<p>This particular user is saying: send me Spanish if you&#39;ve got it, if not German, and if you haven&#39;t got that then English. With this in mind, how can we go about sending a page in the user&#39;s preferred language if it&#39;s available? In PHP you can check the user&#39;s preferred languages by reading <code>$_SERVER["HTTP_ACCEPT_LANGUAGE"]</code>. From there you can write a function that uses regular expressions to work out the user&#39;s preferred language or, if you&#39;re feeling lazy, you can just use the one we wrote for Diarised:</p>
<pre>
<code>
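  function get_accept_language(){
    $header = isset($_SERVER["HTTP_ACCEPT_LANGUAGE"]) ? $_SERVER["HTTP_ACCEPT_LANGUAGE"] : "";
    $results = array();
    foreach(preg_split("/,\s*/", $header, -1, PREG_SPLIT_NO_EMPTY) as $match){
      $score = 1.0;    //a missing q value means the default of 1
      if(preg_match("/;\s*q=([\d.]+)/", $match, $scoreArr)){
        $score = (float)$scoreArr[1];
        $match = str_replace($scoreArr[0], "", $match);    //strip the q value from the language code
      }
      $results[trim($match)] = $score;    //key by language so equal scores can't overwrite each other
    }
    arsort($results);    //highest score first
    return array_keys($results);
  }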
</code>
</pre>
<p>This will return the language codes in an array sorted in the order the user prefers, which allows you to search through the array until you find a language you support. It&#39;s worth remembering you should only do this as a last resort when you don&#39;t know for certain which language your user wants. So if they&#39;ve chosen to see the website in French, don&#39;t force it to appear in German just because their browser says German is their preferred language.</p>
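<p>That search might look something like the sketch below. The function name, the list of supported languages and the fallback are all assumptions made for the example. It expects the language codes to be sorted by preference already:</p>
<pre>
<code>
  //Hypothetical helper: returns the first locale we support, or $default
  function pick_supported_locale(array $preferred, array $supported, $default){
    foreach($preferred as $code){
      $code = strtolower(trim($code));
      if(isset($supported[$code])){    //exact match, e.g. "es-es"
        return $supported[$code];
      }
      $parts = explode("-", $code);    //otherwise try the bare language, e.g. "es"
      if(isset($supported[$parts[0]])){
        return $supported[$parts[0]];
      }
    }
    return $default;
  }

  $supported = array("en" => "en_GB", "es" => "es_ES");
  echo pick_supported_locale(array("es-es", "de", "en"), $supported, "en_GB");    //es_ES
</code>
</pre>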
<h3>Conclusion</h3>
<p>That&#39;s it. Hopefully, with the help of these articles, you&#39;ll be able to start producing your own internationalised websites. We&#39;ve covered a lot but actually this is just the tip of the iceberg. Other issues include formatting dates, times, numbers, names and telephone numbers. There are also region-specific differences even between countries that speak the same language; for example, the US has zip codes while in the UK we have postcodes. I&#39;ll leave you to figure out some of this for yourselves!</p>
<p>If you&#39;d like to learn more about this, the <a href="http://www.w3.org/International/" target="_blank">internationalisation</a> section of the W3C website is a great start. They have lots of useful information on things such as <a href="http://www.w3.org/International/questions/qa-date-format" target="_blank">date formats</a>, <a href="http://www.w3.org/International/questions/qa-forms-utf-8" target="_blank">forms</a> and <a href="http://www.w3.org/International/tutorials/tutorial-char-enc" target="_blank">character sets</a> as well as a nice list of <a href="http://www.w3.org/International/quicktips/" target="_blank">tips</a> that cover most of the issues you might run into. I&#39;d also highly recommend reading <a href="http://www.joelonsoftware.com/articles/Unicode.html" target="_blank">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)</a> from Joel on Software. Character encoding issues can be a complete nightmare, so it&#39;s well worth trying to understand how they work so you can avoid them.</p>
]]></content:encoded>
			<wfw:commentRss>http://thinkvitamin.com/code/give-your-web-app-international-appeal-part-ii/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Give your web app international appeal with PHP, Part I</title>
		<link>http://thinkvitamin.com/code/give-your-web-app-international-appeal/</link>
		<comments>http://thinkvitamin.com/code/give-your-web-app-international-appeal/#comments</comments>
		<pubDate>Mon, 04 Jun 2007 01:00:32 +0000</pubDate>
		<dc:creator>Steve Ellis</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.thinkvitamin.com/features/webapps/give-your-web-app-international-appeal</guid>
		<description><![CDATA[Building web apps can be a lot of fun &#8211; particularly if people like and use them. If you dig through your stats the chances are that people from all over the world are visiting you and they speak lots of different languages. By only serving your app or website in English you&#8217;re assuming that [...]]]></description>
			<content:encoded><![CDATA[<p>Building web apps can be a lot of fun &#8211; particularly if people like and use them. If you dig through your stats the chances are that people from all over the world are visiting you and they speak lots of different  languages. By only serving your app or website in English you&#8217;re assuming that everyone can speak it, or at least understand it well enough to get by. Just as learning to speak a new language can open the door to new cultures and ways of thinking, having a website that speaks multiple languages allows you to engage people who might otherwise have left looking for an alternative.</p>
<p>This was the situation we found ourselves in back in March just after we launched <a href="http://www.diarised.com">Diarised</a>. We had people not only visiting us from all over the world but blogging about us in many different languages. We thought it would be a nice idea to get Diarised working in a few.</p>
<p>The process of getting your website ready for multiple languages is known as internationalisation. Internationalisation goes beyond getting the website working in other languages and covers aspects such as dates, times and currency. In this two part article we&#8217;re going to look at how you go about preparing a website for internationalisation. Then we&#8217;ll look at solutions to some of the real-world problems that can arise, such as dynamic data and plurals.</p>
<h3>Associative Arrays</h3>
<p>As a first stab at internationalisation we might try using associative arrays. We could store our translated text using the English text as the key, like so:</p>
<pre>
<code>
  //set up our language arrays

  $english["welcome to example.com"] = "welcome to example.com";
  $english["have a nice day"] = "have a nice day";

  $spanish["welcome to example.com"] = "Bienvenidos a example.com";
  $spanish["have a nice day"] = "Ten un d&iacute;a bueno";
  ...

  /*
  *  We'll set the locale based on a variable in the query string
  */

  switch($_GET["lang"]){
    case "es":		            //es turns the page into Spanish
      $messages = $spanish;
      break;

    default:			    //everything else defaults back to English
      $messages = $english;
      break;
  }
  ...

  echo $messages["welcome to example.com"];
</code>
</pre>
<p>Adding new languages should then be a case of adding more arrays and updating the switch statement. Perfect! Well not quite. There are a few issues with this approach:</p>
<ol>
<li>Dynamic data. What if you need to insert someone&#8217;s username into a sentence? Breaking the sentence down into lots of little phrases will only make your translator&#8217;s life difficult and will probably lead to inaccurate translations.</li>
<li>You&#8217;re assuming your translator will be able to understand PHP array syntax and avoid breaking everything. What happens when they decide to stick something in quotation marks? The last thing you want to do after receiving a translation is to start debugging syntax errors.</li>
<li>This approach won&#8217;t fail gracefully. What happens if you accidentally type: <code>echo $messages["wlcome to example.com"]</code>? Because of the missing &#8220;e&#8221; the lookup fails and you are left with blank text.</li>
</ol>
<p>Clearly this method has issues. What we really want is a way to allow non-techies to translate our text for us and a way for us to simply &#8220;plug&#8221; the translation back into the website.</p>
<h3>Introducing Gettext</h3>
<p>After ruling out associative arrays for Diarised&#8217;s translations we decided to use <a href="http://www.gnu.org/software/gettext/">gettext</a>. gettext is the GNU internationalization library and it provides an excellent way of separating code from content. If you&#8217;re hosting on Linux there&#8217;s a good chance it will already be installed on your server.</p>
<h4>So, how does it work?</h4>
<p>The official Gettext website gives a highly detailed and fairly confusing <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Overview">overview</a> but the gist is:</p>
<ol>
<li>You pass every piece of text you need translated through a function called gettext</li>
<li>Once everything has been marked up you run the xgettext command to create a PO (Portable Object) file. This is a plain text file containing the source text and a place for the translated text</li>
<li>You send this to your translator to open with a PO editor</li>
<li>Once your translator has filled out the translations they send the PO file back to you</li>
<li>You compile the PO file into a MO (Machine Object) file that gettext can read</li>
<li>You set the locale of your site to a language (usually through the query string) and sit back and admire your website in a totally different language</li>
</ol>
<p>Step 1 is just a case of the following:</p>
<pre>
<code>
  &lt;p&gt;welcome to example.com&lt;/p&gt;
</code>
</pre>
<p>becomes</p>
<pre>
<code>
  &lt;p&gt;&lt;?php echo gettext("welcome to example.com"); ?&gt;&lt;/p&gt;
</code>
</pre>
<p>Marking up your code to use gettext is probably the most irritating step but fortunately you only have to do this once. It will then work for as many languages as you like. The next stage is to get this information into our translation file.</p>
<h3>How to create a PO file</h3>
<p>The first step is to set up the directory structure:</p>
<ol>
<li>In your webroot folder create a folder called locale</li>
<li>Inside that create a folder for each language you plan on supporting and use the language code as the folder name, e.g. for Spanish use es_ES or if you wanted say a localised Argentine version you could have es_AR</li>
<li>Inside each of those folders create a new one called LC_MESSAGES. This is where we will keep our translation files</li>
</ol>
<p>To create the PO file you&#8217;ll need to use the command line, but don&#8217;t worry, we&#8217;re here to hold your hand. Fire up a terminal window, connect to your webserver and be brave.<br /> The command we need is called xgettext. It scans a script looking for calls to gettext, then grabs the text you&#8217;re passing and puts it into a PO file. For example:</p>
<pre>
<code>
  # xgettext -o messages.po *.php
</code>
</pre>
<p>This will search every PHP page in the current working directory and stick the results in messages.po.  Once this is done open the file with a plain text editor and make the following change:</p>
<pre>
<code>
  "Content-Type: text/plain; charset=CHARSET\n"
</code>
</pre>
<p>becomes</p>
<pre>
<code>
  "Content-Type: text/plain; charset=utf-8\n"
</code>
</pre>
<p>Now save it and send it to your translator.</p>
<p>Although a PO file is a plain text file its contents aren&#8217;t particularly friendly. Luckily there are a few pieces of software that make editing PO files a bit easier (especially for your translator). For Windows there&#8217;s <a href="http://www.poedit.net/">poedit</a>, and for the Mac there&#8217;s <a href="http://www.triplespin.com/en/products/locfactoryeditor.html">LocFactory Editor</a>. Both are free and will make your translator&#8217;s life much easier.</p>
<p>When your translator sends the PO file back, stick it into the appropriate LC_MESSAGES folder created earlier and open a terminal window so we can compile it.</p>
<p>In your command window go to the LC_MESSAGES folder with messages.po and issue the following:</p>
<pre>
<code>
  # msgfmt messages.po
</code>
</pre>
<p>Assuming there were no errors this will churn out a file called messages.mo. This is the compiled file gettext will actually read. Any changes to your PO file will require you to redo this step to make the changes live. Now all we need to do is tell gettext which language we want our text in.</p>
<h3>Binding a locale</h3>
<p>This step can be done via a few lines of PHP near the start of your script:</p>
<pre>
<code>
  $locale = $_GET["locale"];

  putenv("LC_ALL=$locale");
  setlocale(LC_ALL, $locale);

  bindtextdomain("messages", "locale/");	//binds the messages domain to the locale folder
  bind_textdomain_codeset("messages","UTF-8"); 	//ensures text returned is utf-8, quite often this is iso-8859-1 by default
  textdomain("messages");	//sets the domain name, this means gettext will be looking for a file called messages.mo
</code>
</pre>
<p><code>$locale</code> will need to be set to the locale that you want the website to appear in. To begin with it&#8217;s simplest to set this via the query string as shown above. One of the advantages of gettext is that if it can&#8217;t find a folder for the locale you pick it will just fall back to the original English strings, meaning when someone sticks locale=kl in the query string expecting a Klingon version they&#8217;ll just get English.</p>
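<p>Since the value comes straight from the query string, it&#8217;s worth checking it against the locales you actually ship before handing it to <code>setlocale</code>. A minimal sketch, where <code>resolve_locale</code> and the list of locales are made up for the example:</p>
<pre>
<code>
  //Hypothetical helper: only accept locales we have translations for
  function resolve_locale($requested, array $supported, $default){
    return in_array($requested, $supported, true) ? $requested : $default;
  }

  $requested = isset($_GET["locale"]) ? $_GET["locale"] : null;
  $locale = resolve_locale($requested, array("en_GB", "es_ES"), "en_GB");
</code>
</pre>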
<p>That&#8217;s it for part one. At this point you should be able to at least make a start on preparing your websites for internationalisation. In part two we&#8217;re going to look at some of the real world issues you&#8217;re likely to run into and tell you what we did to solve them on Diarised.</p>
]]></content:encoded>
			<wfw:commentRss>http://thinkvitamin.com/code/give-your-web-app-international-appeal/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>


