GNOME Bugzilla – Bug 108158
postscript exporting m17n
Last modified: 2006-07-15 18:47:31 UTC
Hello. I'm trying to improve PostScript export with non-Latin-1 fonts. At first I made a Japanese L10N patch. It works fine, but it is not applicable to other languages, so as the next stage I'm now working on multilingualization (M17N).

This is a patch (-cjk1) tested only on the Japanese Win32 platform. I'd like to ask some developers to review, fix and improve it on Korean/Chinese platforms, and I hope to add features for other languages.

(1) PostScript font family name
    Refer to legacy_fonts[] in lib/font.c.
    newname "ms mincho" -> oldname "Ryumin-Light"
    The name is registered in DiaPsRenderer.

(2) Encoding
    Refer to M17Nfont[] in app/diapsrenderer.c.
    fontname "Ryumin-Light" -> encoding index DIA_ENCODING_SJIS
    The index is registered in DiaPsRenderer.

(3) PostScript font encoding
    Refer to lookup_ps_encoding() in app/diapsrenderer.c.
    fontname "Ryumin-Light" -> "Ryumin-Light-RKSJ-H" (Shift-JIS)
    This is the full font name to export.

(4) Charset conversion
    Refer to lookup_iconv_encoding() in app/diapsrenderer.c.
    "UTF-8" string -> "SJIS" string for Japanese.
    iconv is used for charset conversion. Please link libiconv.

(5) Change default encoding
    If the JP_L10N symbol is defined, the default encoding is changed to SJIS.
    This is for convenience on the Japanese platform.

Should I post it to dia-list?
--- app/diapsrenderer.c.org 2003-01-16 17:23:14.000000000 +0900
+++ app/diapsrenderer.c 2003-03-11 20:23:18.000000000 +0900
@@ -26,12 +26,56 @@
 #include <string.h>
 #include <time.h>
+#include <ctype.h>
+#include <iconv.h>

 #include "diapsrenderer.h"
 #include "message.h"
 #include "dia_image.h"
 #include "font.h"

+#define DIA_ICONV_FROM ( "UTF-8" )
+#define DIA_ICONV_DEFAULT_TO ( "ASCII" )
+#define DIA_ENCODING_DEFAULT ( "latin1" )
+
+#ifdef JP_L10N
+#undef DIA_ICONV_DEFAULT_TO
+#undef DIA_ENCODING_DEFAULT
+
+#define DIA_ICONV_DEFAULT_TO ( "SJIS" )
+#define DIA_ENCODING_DEFAULT ( "RKSJ-H" )
+#endif
+
+static char* lookup_ps_encoding(DiaM17NEncodingIndex n)
+{
+  static char* encoding[] = {
+    "latin1",
+    "RKSJ-H",
+    "KSC-UHC-H",
+    "GBK-EUC-H",
+    "ETen-B5-H"
+  };
+  if (n < 0 || G_N_ELEMENTS(encoding) <= n) {
+    return DIA_ENCODING_DEFAULT;
+  }
+  return encoding[n];
+}
+
+static char* lookup_iconv_encoding(DiaM17NEncodingIndex n)
+{
+  static char* encoding[] = {
+    "ASCII",
+    "SJIS",
+    "EUC-KR",
+    "GB",
+    "BIG5"
+  };
+  if (n < 0 || G_N_ELEMENTS(encoding) <= n) {
+    return DIA_ICONV_DEFAULT_TO;
+  }
+  return encoding[n];
+}
+
 void
 lazy_setcolor(DiaPsRenderer *renderer, Color *color)
@@ -238,9 +282,44 @@
 set_font(DiaRenderer *self, DiaFont *font, real height)
 {
   DiaPsRenderer *renderer = DIA_PS_RENDERER(self);
+  int i = 0;
+  char *fontname;
+  char *p;
+  static struct _M17Nfont_index {
+    char* name;
+    DiaM17NEncodingIndex eidx;
+  } M17Nfont[] = {
+    { "Batang-", DIA_ENCODING_KR },
+    { "BousungEG-", DIA_ENCODING_GB },
+    { "Dotum-", DIA_ENCODING_KR },
+    { "GBZenKai-", DIA_ENCODING_GB },
+    { "GothicBBB-", DIA_ENCODING_SJIS },
+    { "Gulim-", DIA_ENCODING_KR },
+    { "MOESung-", DIA_ENCODING_BIG5 },
+    { "Ryumin-", DIA_ENCODING_SJIS },
+    { "ShanHeiSun-", DIA_ENCODING_BIG5 },
+    { "Song-", DIA_ENCODING_GB },
+    { "ZenKai-", DIA_ENCODING_BIG5 },
+    { NULL, DIA_ENCODING_ASCII }
+  };
+
+  fontname = dia_font_get_psfontname(font);
+  renderer->fontfamily = fontname;
+  renderer->eidx = DIA_ENCODING_ASCII;
+  while (p = M17Nfont[i].name) {
+    if (strncmp(fontname, p, strlen(p)) == 0) {
+      renderer->eidx = M17Nfont[i].eidx;
+      break;
+    }
+    i++;
+  }

-  fprintf(renderer->file, "/%s-latin1 ff %f scf sf\n",
-          dia_font_get_psfontname(font), (double)height);
+  /*
+   * Trying CJK M17N, but tested only on Japanese platform.
+   * Need some test and fix on Korean/Chinese platform.
+   */
+  fprintf(renderer->file, "/%s-%s ff %f scf sf\n",
+          fontname, lookup_ps_encoding(renderer->eidx), (double)height);
 }

 static void
@@ -500,6 +579,12 @@
   char *buffer;
   const char *str;
   int len;
+  iconv_t c;
+  char* psrc;
+  char* pdst;
+  size_t to_left;
+  size_t from_left;
+  size_t blen;

   if (1 > strlen(text))
     return;
@@ -509,9 +594,31 @@
   /* TODO: Use latin-1 encoding */

   /* Escape all '(' and ')': */
-  buffer = g_malloc(2*strlen(text)+1);
+  psrc = text;
+  to_left = from_left = strlen(text) + 1;
+  str = pdst = g_malloc(to_left);
+  *pdst = 0;
+
+  /*
+   * Trying CJK M17N, but tested only on Japanese platform.
+   * Need some test and fix on Korean/Chinese platform.
+   */
+  c = iconv_open(lookup_iconv_encoding(renderer->eidx), DIA_ICONV_FROM);
+  if (c == NULL) {
+    g_free(str);
+    return;
+  }
+  blen = iconv(c, &psrc, &from_left, &pdst, &to_left);
+  if (blen == -1) {
+    g_free(str);
+    iconv_close(c);
+    return;
+  }
+  iconv_close(c);
+  pdst = str;
+
+  buffer = g_malloc(2*strlen(str)+1);
   *buffer = 0;
-  str = text;
   while (*str != 0) {
     len = strcspn(str,"()\\");
     strncat(buffer, str, len);
@@ -522,8 +629,18 @@
       str++;
     }
   }
-  fprintf(renderer->file, "(%s) ", buffer);
+  str = buffer;
+  fprintf(renderer->file, "(");
+  do {
+    if (isascii(*str) && isprint(*str)) {
+      fprintf(renderer->file, "%c", *((unsigned char*)str));
+    } else {
+      fprintf(renderer->file, "\\%03o", *((unsigned char*)str));
+    }
+  } while (*(++str));
+  fprintf(renderer->file, ") ");
   g_free(buffer);
+  g_free(pdst);

   switch (alignment) {
   case ALIGN_LEFT:
--- app/diapsrenderer.h.org 2002-12-08 00:38:38.000000000 +0900
+++ app/diapsrenderer.h 2003-03-11 20:23:52.000000000 +0900
@@ -5,6 +5,7 @@

 #include "color.h"
 #include "diarenderer.h"
+#include "font.h"

 G_BEGIN_DECLS

@@ -39,6 +40,8 @@
   double scale;
   Rectangle extent;
+  char *fontfamily;
+  DiaM17NEncodingIndex eidx;
 };

 struct _DiaPsRendererClass
--- lib/font.c.org 2003-01-22 17:23:30.000000000 +0900
+++ lib/font.c 2003-03-11 10:28:50.000000000 +0900
@@ -36,6 +36,17 @@
 #include "message.h"
 #include "intl.h"

+#define DIA_ENCODING_DEFAULT ( DIA_ENCODING_ASCII )
+#define DIA_FONT_DEFAULT ( "Courier" )
+
+#ifdef JP_L10N
+#undef DIA_ENCODING_DEFAULT
+#undef DIA_FONT_DEFAULT
+
+#define DIA_ENCODING_DEFAULT ( DIA_ENCODING_SJIS )
+#define DIA_FONT_DEFAULT ( "Ryumin-Light" )
+#endif
+
 static PangoContext* pango_context = NULL;

 /* This is the global factor that says what zoom factor is 100%. It's
@@ -690,6 +701,8 @@
   { "Dotum", "Dotum", DIA_FONT_FAMILY_ANY },
   { "GBZenKai-Medium", "GBZenKai-Medium", DIA_FONT_FAMILY_ANY },
   { "GothicBBB-Medium", "GothicBBB-Medium", DIA_FONT_FAMILY_ANY },
+  { "GothicBBB-Medium", "ms gothic", DIA_FONT_FAMILY_ANY },
+  { "GothicBBB-Medium", "ms pgothic", DIA_FONT_FAMILY_ANY },
   { "Gulim", "Gulim", DIA_FONT_FAMILY_ANY },
   { "Headline", "Headline", DIA_FONT_FAMILY_ANY },
   { "Helvetica", "Arial", DIA_FONT_SANS },
@@ -710,9 +723,12 @@
   { "Palatino-Italic", "Palatino", DIA_FONT_FAMILY_ANY | DIA_FONT_ITALIC },
   { "Palatino-Roman", "Palatino", DIA_FONT_FAMILY_ANY },
   { "Ryumin-Light", "Ryumin", DIA_FONT_FAMILY_ANY | DIA_FONT_LIGHT },
+  { "Ryumin-Light", "ms mincho", DIA_FONT_FAMILY_ANY },
+  { "Ryumin-Light", "ms pmincho", DIA_FONT_FAMILY_ANY },
   { "ShanHeiSun-Light", "ShanHeiSun", DIA_FONT_FAMILY_ANY | DIA_FONT_LIGHT },
   { "Song-Medium", "Song-Medium", DIA_FONT_FAMILY_ANY | DIA_FONT_MEDIUM },
   { "Symbol", "Symbol", DIA_FONT_SANS | DIA_FONT_MEDIUM },
+  { "Symbol", "Symbol", DIA_FONT_FAMILY_ANY },
   { "Times-Bold", "Times New Roman", DIA_FONT_SERIF | DIA_FONT_BOLD },
   { "Times-BoldItalic", "Times New Roman", DIA_FONT_SERIF | DIA_FONT_ITALIC | DIA_FONT_BOLD },
   { "Times-Italic", "Times New Roman", DIA_FONT_SERIF | DIA_FONT_ITALIC },
@@ -785,6 +801,5 @@
       }
     }
   }
-  return matched_name ? matched_name : "Courier";
+  return matched_name ? matched_name : DIA_FONT_DEFAULT;
 }
-
--- lib/font.h.org 2002-11-19 12:08:10.000000000 +0900
+++ lib/font.h 2003-03-11 20:23:38.000000000 +0900
@@ -107,6 +107,15 @@
   /* mutable */ char* legacy_name;
 };

+typedef enum
+{
+  DIA_ENCODING_ASCII = 0, /* ASCII, -latin1 */
+  DIA_ENCODING_SJIS = 1, /* Japanese Shift-JIS, Codepage 932, -RKSJ-H */
+  DIA_ENCODING_KR = 2, /* Korean Codepage 949, -KSC-UHC-H */
+  DIA_ENCODING_GB = 3, /* Simplified Chinese, -GBK-EUC-H */
+  DIA_ENCODING_BIG5 = 4, /* Traditional Chinese, -ETen-B5-H */
+} DiaM17NEncodingIndex;
+
 /* Set the PangoContext used to render text.
  */
 void dia_font_init(PangoContext* pcontext);
I'm a bit confused about this bug. 0.91 doesn't use PostScript fonts at all, so there shouldn't be any font encoding issues to fix. Doesn't the FreeType-based outline drawing render Japanese fonts correctly?
OK. It's not a font rendering problem but an EPS export one. The set_font function in app/diapsrenderer.c exports the PostScript font name, and it assumes that the -latin1 encoding is sufficient for any language. Dia exports Japanese text strings using the "Courier-latin1" font, and the text always gets garbled. "Ryumin-Light-RKSJ-H" or "GothicBBB-Medium-RKSJ-H" would be a good PS font name for Japanese text, and with those fonts the text must be encoded in the Shift-JIS charset, not UTF-8.
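The two points in the comment above (pick a font name with the right encoding suffix, then write the converted bytes safely) can be sketched as follows. This is only an illustration, not Dia's code: ps_font_name(), ps_escape() and the cjk_fonts table are hypothetical names, and the table holds just the two Japanese families mentioned in this report. The input to ps_escape() is assumed to have been converted to the target charset already.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: PostScript family prefix -> encoding suffix. */
static const struct { const char *prefix; const char *suffix; } cjk_fonts[] = {
  { "Ryumin-",    "RKSJ-H" },
  { "GothicBBB-", "RKSJ-H" },
};

/* Write "<family>-<suffix>" into out; unknown families get "-latin1".
 * Returns 0 on success, -1 if out is too small. */
int ps_font_name(const char *family, char *out, size_t cap)
{
  const char *suffix = "latin1";
  size_t i;
  int n;

  for (i = 0; i < sizeof cjk_fonts / sizeof cjk_fonts[0]; i++) {
    if (strncmp(family, cjk_fonts[i].prefix, strlen(cjk_fonts[i].prefix)) == 0) {
      suffix = cjk_fonts[i].suffix;
      break;
    }
  }
  n = snprintf(out, cap, "%s-%s", family, suffix);
  return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Escape n already-converted bytes for use inside a PostScript "( ... )"
 * string: parentheses and backslash get a backslash prefix, and every
 * byte outside printable ASCII becomes a three-digit octal escape.
 * Returns the number of characters written (excluding the NUL),
 * or (size_t)-1 if out is too small. */
size_t ps_escape(const unsigned char *in, size_t n, char *out, size_t cap)
{
  size_t i, o = 0;

  if (cap == 0)
    return (size_t)-1;
  for (i = 0; i < n; i++) {
    unsigned char b = in[i];
    if (cap - o < 5)            /* worst case: 4 characters plus NUL */
      return (size_t)-1;
    if (b == '(' || b == ')' || b == '\\') {
      out[o++] = '\\';
      out[o++] = (char)b;
    } else if (b >= 0x20 && b <= 0x7E) {
      out[o++] = (char)b;
    } else {
      o += (size_t)snprintf(out + o, 5, "\\%03o", (unsigned)b);
    }
  }
  out[o] = '\0';
  return o;
}
```

Working on a byte count rather than a NUL-terminated string, and classifying each byte as unsigned, avoids passing negative char values to the <ctype.h> functions when the text contains bytes above 0x7F.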
Is this on a Windows machine?
So far I've been building and testing Dia only on a Windows box. But I guess the issue is machine-independent and occurs similarly on a UNIX box.
No, EPS export of fonts is different between Win32 and Unix. Unix uses diapsft2renderer, which uses FreeType2 for font rendering, whereas Win32 uses standard fonts (whatever they may be).

I don't think "Ryumin-Light-RKSJ-H" or "GothicBBB-Medium-RKSJ-H" would be good PS font names for Japanese text in general. Just say "on Windows" :) Basically, for non-8-bit PostScript printers, only JIS can be used safely. Your patch also assumes CMap, which is only supported from PostScript Level 3. If even that is acceptable, I think just using UTF8-H is best, because then dia doesn't need to convert the strings. I wonder whether it is acceptable, though, since it means dia has no support for Level 1 and Level 2, even if people have something like ghostscript or a printer driver that supports it. To avoid depending on such things, using embedded fonts is the better way.
I'd like to make the EPS (with text) output work again even for the Unix build and non-7-bit languages, see:
http://mail.gnome.org/archives/dia-list/2003-June/msg00050.html

Though I don't have much of a clue how to do multi-language support properly, here are some requirements for the patch:
- don't make things compile-time configurable; there are not many people capable of compiling Dia on windoze
- try to avoid direct iconv usage. Use the wrapper provided by glib if possible
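As a rough illustration of the first requirement, the default encoding could be picked at run time from the locale name instead of through #ifdef JP_L10N. This is a minimal sketch under stated assumptions: default_encoding_for_locale() and the EncodingIndex enum are hypothetical (the enum only mirrors the DiaM17NEncodingIndex values proposed in the patch), and the locale names are assumed to follow the POSIX "ll_CC" convention.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the DiaM17NEncodingIndex values proposed in the patch. */
typedef enum {
  ENC_ASCII = 0,
  ENC_SJIS  = 1,
  ENC_KR    = 2,
  ENC_GB    = 3,
  ENC_BIG5  = 4
} EncodingIndex;

/* Hypothetical helper: choose the default encoding from a locale name
 * such as "ja_JP.UTF-8" at run time. Unknown or missing locales fall
 * back to ASCII, matching the patch's non-JP_L10N default. */
EncodingIndex default_encoding_for_locale(const char *locale)
{
  if (locale == NULL)
    return ENC_ASCII;
  if (strncmp(locale, "ja", 2) == 0)
    return ENC_SJIS;
  if (strncmp(locale, "ko", 2) == 0)
    return ENC_KR;
  /* Traditional Chinese locales must be checked before the generic "zh". */
  if (strncmp(locale, "zh_TW", 5) == 0 || strncmp(locale, "zh_HK", 5) == 0)
    return ENC_BIG5;
  if (strncmp(locale, "zh", 2) == 0)
    return ENC_GB;
  return ENC_ASCII;
}
```

A caller might pass the result of setlocale(LC_CTYPE, NULL). Windows locale names use a different format, so a real implementation would need its own mapping there. For the second requirement, glib's g_convert() is the kind of wrapper meant; it is not shown here.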
Adding the PATCH keyword.
Setting to NEEDINFO awaiting feedback on Akira's comment. We're not being explicit about PS level, but I think forcing level 3 would be a bit much. I also don't like the compile-time l10n at all.
One year of NEEDINFO should be enough, marking as obsolete.