This project creates language-specific static HTML pages from an HTML page that contains text in a certain language.
The module will test the charset metadata of the page to decide which language should be used.
The module will access a database table with two columns, where the first column is the charset code (e.g. iso-8859-1) and the second column is the language code (e.g. en). The language code should be 2 letters only (ignore variants…)
If UTF-8 encoding is used, the module should try to figure out the language of the page from the language tags defined in the HTML spec. If those are missing, the module should use the default language value it received on initialization.
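The language detection described above could be sketched roughly as follows. This is only an illustration under assumptions: the %CHARSET_TO_LANG hash stands in for the two-column database table (only the iso-8859-1 row comes from this description, the other rows are sample data), the function names detect_language and two_letter are hypothetical, a charset that is not in the table falls back to the default language, a page with no declared charset is treated like UTF-8, and the HTML is scanned with simple regular expressions rather than a real parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the charset/language table; the real module
# would read these rows through its database connection.
my %CHARSET_TO_LANG = (
    'iso-8859-1'   => 'en',   # row taken from the description
    'windows-1255' => 'he',   # sample row (assumption)
    'windows-1256' => 'ar',   # sample row (assumption)
);

# Reduce a language tag such as "en-US" to its 2-letter code.
sub two_letter {
    my ($tag) = @_;
    return undef unless defined $tag;
    return $tag =~ /^([A-Za-z]{2})(?![A-Za-z])/ ? lc($1) : undef;
}

sub detect_language {
    my ($html, $default_language) = @_;

    # Charset from a <meta> element; matches both the http-equiv form
    # and the short <meta charset="..."> form.
    my ($charset) = $html =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?([\w.:-]+)/i;
    $charset = lc($charset // '');

    # Non-UTF-8 charset: look it up, default if unknown (assumption).
    if ($charset ne '' && $charset ne 'utf-8') {
        return $CHARSET_TO_LANG{$charset} // $default_language;
    }

    # UTF-8 or undeclared: try <html lang="...">, then a Content-Language
    # <meta> element (only the http-equiv-before-content order is handled).
    my ($tag) = $html =~ /<html\b[^>]*?\blang\s*=\s*["']?([A-Za-z_-]+)/i;
    ($tag) = $html =~ /<meta\b[^>]*http-equiv\s*=\s*["']?content-language["']?[^>]*?\bcontent\s*=\s*["']?([A-Za-z_-]+)/i
        unless defined $tag;

    return two_letter($tag) // $default_language;
}
```

For example, detect_language($html, 'en') on a UTF-8 page whose html tag carries lang="he-IL" would give 'he', and the same call on a UTF-8 page with no language tags would give the default 'en'.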
The initialization also receives a DBI connection string
(e.g. $i18n = I18N->new(default_language => 'en', dbh => 'dbi:ODBC:I18N') )
The module includes a pre-processing method whose input is an HTML filename. The method reads the HTML file and finds all language-specific text in the HTML. The output will be inserted into a SQL database table in the following format:
The function will give every text a Field_id,
the Field_text will contain the text,
and Language_id will be set according to the HTML encoding or language.
For this file
Design and specification
Implementation, debugging, and testing services
The database will hold something like this
Field_id, Field_text, Language_id
1,Untitled Document, en
2,Design and specification, en
3,Implementation, debugging, en
4,and testing services, en
Image alt attributes and tool tips on tags should be considered as text.
Every time you call the pre-processing method with a new page, the database grows.
Two pages are not allowed to share the same Field_id.
If the input to the pre-processing method is a directory name, it will run on all HTML and HTM files in that directory when creating the database.
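As a rough sketch of the pre-processing step, the function below collects text between tags plus alt and title attribute values, and numbers them with Field_ids that continue from a counter so that two pages never share an id. The name extract_fields, its arguments, and the in-memory row format are assumptions made for illustration; the real method would insert the rows into the SQL table. The regex-based scanning is a simplification: it does not skip script or style content and does not decode entities.

```perl
use strict;
use warnings;

# Returns (arrayref of rows, next free Field_id).
# $next_id is the first Field_id to hand out, e.g. max(Field_id) + 1.
sub extract_fields {
    my ($html, $language_id, $next_id) = @_;
    my @rows;

    # Split into tags and the text between them, keeping the tags.
    for my $piece (split /(<[^>]*>)/, $html) {
        my @texts;
        if ($piece =~ /^</) {
            # Inside a tag: pick up alt="..." and title="..." values.
            push @texts, $1 // $2
                while $piece =~ /\b(?:alt|title)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
        }
        else {
            push @texts, $piece;
        }

        for my $text (@texts) {
            $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            next if $text eq '';
            push @rows, {
                Field_id    => $next_id++,
                Field_text  => $text,
                Language_id => $language_id,
            };
        }
    }
    return (\@rows, $next_id);
}
```

Calling it page by page and passing the returned counter into the next call, e.g. my ($rows, $next) = extract_fields($page1, 'en', 1); followed by extract_fields($page2, 'en', $next), keeps the Field_ids unique across pages.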
The module will have an add language method that, when called, will add a new entry for the requested language.
If the module is called with module->AddLanguage('fr'), the above database will become
Field_id, Field_text, Language_id
1,Design and specification, en
2,Implementation, debugging, en
3,and testing services, en
1,, fr
2,, fr
3,, fr
The module will have a “not translated yet” method that returns a reference to a list of all Field_ids in a specific language that are empty (not translated yet).
The function input is a language_id (e.g. en, fr, etc.).
In the above example, the return value for this method with 'fr' input will be [1,2,3].
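The add language and "not translated yet" behaviour can be illustrated with a small sketch that works on in-memory rows instead of the real SQL table. The function names add_language and not_translated_yet and the array-of-hashes row format are assumptions for illustration only; the sketch also assumes that adding a language means one empty Field_text row per existing Field_id.

```perl
use strict;
use warnings;

# Add one empty row per existing Field_id for the new language,
# skipping Field_ids that already have a row in that language.
sub add_language {
    my ($rows, $language_id) = @_;
    my %has = map  { $_->{Field_id} => 1 }
              grep { $_->{Language_id} eq $language_id } @$rows;
    my %seen;
    my @ids = sort { $a <=> $b }
              grep { !$seen{$_}++ }
              map  { $_->{Field_id} } @$rows;
    for my $id (@ids) {
        next if $has{$id};
        push @$rows, { Field_id => $id, Field_text => '', Language_id => $language_id };
    }
    return $rows;
}

# Reference to the sorted list of Field_ids whose text is still empty
# in the given language.
sub not_translated_yet {
    my ($rows, $language_id) = @_;
    return [
        sort { $a <=> $b }
        map  { $_->{Field_id} }
        grep { $_->{Language_id} eq $language_id && $_->{Field_text} eq '' } @$rows
    ];
}
```

With the three 'en' rows from the example loaded into $rows, add_language($rows, 'fr') followed by not_translated_yet($rows, 'fr') gives a reference to the list (1, 2, 3), in line with the example above.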
The module will have a generate pages method. The input for this method should be a directory name. This method creates a subdirectory for every language and puts into it all the pages, translated according to the database content.
e.g. for the above database and a directory /user/data/i18n/ it will create two subdirectories, one for en and one for fr.
The method will replace all the original text strings with the translated ones from the database.
If some strings in the database are empty, it will put the original text instead, wrapped in a link,
where $link and $target are input parameters for the module, YYYY is the Field_id, and ZZ is the language being generated.
The generate pages method will insert a dir="rtl" attribute in the body tag, append rtl to the image paths (i.e. if the image path was src="img/…" it will become src="imgrtl/…"), and convert align="right" to align="left" if the direction setting is 'rtl'. These changes SHOULD BE MADE ONLY when the language is either 'ar' or 'he'.
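The right-to-left adjustments could look something like the sketch below. The function name apply_direction is hypothetical, the substitutions are plain regular expressions rather than a real HTML rewrite, and the body tag is assumed not to carry a dir attribute already.

```perl
use strict;
use warnings;

# Languages that get the right-to-left treatment.
my %RTL = map { $_ => 1 } qw(ar he);

sub apply_direction {
    my ($html, $language_id) = @_;

    # Every other language is returned untouched.
    return $html unless $RTL{$language_id};

    # Add dir="rtl" to the opening body tag.
    $html =~ s/<body\b([^>]*)>/<body$1 dir="rtl">/i;

    # Image paths: src="img/..." becomes src="imgrtl/...".
    $html =~ s{(\bsrc\s*=\s*["']?)img/}{${1}imgrtl/}gi;

    # align="right" becomes align="left" (only this direction is
    # converted, as in the description).
    $html =~ s/(\balign\s*=\s*["']?)right\b/${1}left/gi;

    return $html;
}
```

Called as apply_direction($page, 'he'), it rewrites the page; called with 'en' or 'fr', it returns the page unchanged.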
The module should be an OO Perl module.
The module should support all code pages as well as UTF-8;
this includes both knowing the code pages and their related languages, and handling the strings correctly.
The module should be well documented and easy to maintain. You should not use third-party software with a restrictive license that would prevent me from distributing this module.
We should get all the source code, documented, and we get all copyrights, so we can do whatever we want with the code, including changing it and reselling it, eating it ..:-)
The module and all of its dependencies should be compatible with MS Windows. It must be based only on open source code; no special modules that cost money or limit our ability to distribute the developed software are allowed!
If you need to use binary modules, their source code should be included and they should be compatible with Linux.
The example HTML code was stripped of its tags, so try to look at the following file