CJKE Japanese-1 log
If this does not make sense to you, that's because you are not me. This is a log file of a project I am working on.
1. Nobuyuki and I worked on 14 data files. I am copying the other 20 or so from ilab. They are zipped first. They will need to be processed with emxml2dat-char.tcl to add both the word-based and char-based SEG info.
2. emxml2data-char.tcl has been revised. Aside from creating a proc readcharseg{} and adding some items to the $out[] array, the most important things are to make sure that (a) the output vars are listed, and (b) the \uFFFE Unicode sign is written to the output file.
3. The Tcl script takes charseg.txt as input. The file is basically an export of Nobuyuki's Excel file, all-story-letterseq.xls. Two tricks with this file: (a) I had to go through Word to save the file with the correct \uFFFE Unicode mark, and (b) the second column does not have a name in the header, which took me a long time to realize.
4. Once all the files are processed, the *.dat files will be combined as binary files using the DOS copy command (copy /B j??-edited.xml-char.dat temp.txt).
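
The unnamed header column from step 3 is easy to trip over again, so here is a minimal Python sketch (not the Tcl script itself) of reading such an export. It assumes the file is tab-delimited and gives any empty header cell a placeholder name. The column names story and seg, the col2 placeholder, and the sample rows are all made up for illustration; the real charseg.txt layout may differ.

```python
import csv
import io


def read_rows(text, delimiter="\t", placeholder="col{n}"):
    """Parse delimited text; empty header cells get a placeholder name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # An unnamed column (empty header cell) becomes col1, col2, ... by position.
    names = [name.strip() or placeholder.format(n=i + 1)
             for i, name in enumerate(header)]
    # Skip blank lines; map each remaining row onto the repaired header.
    return [dict(zip(names, row)) for row in reader if row]


# Hypothetical sample: the second header cell is empty, as described in step 3.
sample = "story\t\tseg\nj01\tあ\t1\nj01\tい\t0\n"
rows = read_rows(sample)
```

With the placeholder in place, the second column can be looked up by name (rows[0]["col2"]) instead of silently shifting or disappearing.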

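For step 4, a rough Python equivalent of the binary copy, in case the combining ever has to happen outside DOS. This is a sketch under assumptions: the function name concat_binary is hypothetical, and the inputs are joined in sorted filename order, which may not be the order the DOS copy command picks.

```python
import glob
import shutil


def concat_binary(pattern, dest):
    """Append every file matching pattern to dest, byte for byte."""
    # '?' matches exactly one character, like the DOS wildcard in j??-edited...
    paths = sorted(glob.glob(pattern))
    with open(dest, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                # Binary mode on both sides: no newline or encoding translation.
                shutil.copyfileobj(src, out)
    return paths
```

Usage would be concat_binary("j??-edited.xml-char.dat", "temp.txt"). The bytes are copied unchanged, so any Unicode mark at the start of each input file is kept in the combined file. The output name must not match the input pattern.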