GNOME Bugzilla – Bug 328302
can't search only Korean files.
Last modified: 2006-10-20 16:23:58 UTC
Please describe the problem:
Can't search CJK contents.

Steps to reproduce:
1. Type a Korean or Japanese string.

Actual results:

Expected results:

Does this happen every time?
Always

Other information:
Fedora development, Mono 1.1.13
Can you provide an example file and an example query, please?
Beagle can't search for strings that contain only Korean, so I made a patch for Lucene.Net that adds support for the Hangul Unicode area.
Created attachment 58063: Hangul support patch for Lucene.Net
Comment on attachment 58063: Hangul support patch for Lucene.Net

>--- beagle.orig/beagled/Lucene.Net/Analysis/Standard/StandardTokenizerTokenManager.cs 2006-01-23 23:04:05.000000000 +0900
>+++ beagle/beagled/Lucene.Net/Analysis/Standard/StandardTokenizerTokenManager.cs 2006-01-23 22:49:17.000000000 +0900
>@@ -78,20 +78,21 @@
> JjCheckNAdd(jjnextStates[start]);
> JjCheckNAdd(jjnextStates[start + 1]);
> }
>- internal static readonly ulong[] jjbitVec0 = new ulong[]{0x1ff0000000000000L, 0xffffffffffffc000L, 0xffffffffL, 0x600000000000000L};
>+ internal static readonly ulong[] jjbitVec0 = new ulong[]{0x1ff0000000000000L, 0xffffffffffffc000L, 0xfffff000ffffffffL, 0x6000000007fffffL};
> internal static readonly ulong[] jjbitVec2 = new ulong[]{0x0L, 0xffffffffffffffffL, 0xffffffffffffffffL, 0xffffffffffffffffL};
> internal static readonly ulong[] jjbitVec3 = new ulong[]{0xffffffffffffffffL, 0xffffffffffffffffL, 0xffffL, 0x0L};
> internal static readonly ulong[] jjbitVec4 = new ulong[]{0xffffffffffffffffL, 0xffffffffffffffffL, 0x0L, 0x0L};
> internal static readonly ulong[] jjbitVec5 = new ulong[]{0x3fffffffffffL, 0x0L, 0x0L, 0x0L};
>- internal static readonly ulong[] jjbitVec6 = new ulong[]{0x1600L, 0x0L, 0x0L, 0x0L};
>- internal static readonly ulong[] jjbitVec7 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0xffc000000000L};
>- internal static readonly ulong[] jjbitVec8 = new ulong[]{0x0L, 0x3ff00000000L, 0x0L, 0x3ff000000000000L};
>- internal static readonly ulong[] jjbitVec9 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0xff8000000000L};
>- internal static readonly ulong[] jjbitVec10 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0x0L};
>- internal static readonly ulong[] jjbitVec11 = new ulong[]{0x0L, 0x3ff0000L, 0x0L, 0x3ff0000L};
>- internal static readonly ulong[] jjbitVec12 = new ulong[]{0x0L, 0x3ffL, 0x0L, 0x0L};
>- internal static readonly ulong[] jjbitVec13 = new ulong[]{0xfffffffeL, 0x0L, 0x0L, 0x0L};
>- internal static readonly ulong[] jjbitVec14 = new ulong[]{0x0L, 0x0L, 0x0L, 0xff7fffffff7fffffL};
>+ internal static readonly ulong[] jjbitVec6 = new ulong[]{0xffffffffffffffffL, 0xffffffffffffffffL, 0xfffffffffL, 0x0L};
>+ internal static readonly ulong[] jjbitVec7 = new ulong[]{0x1600L, 0x0L, 0x0L, 0x0L};
>+ internal static readonly ulong[] jjbitVec8 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0xffc000000000L};
>+ internal static readonly ulong[] jjbitVec9 = new ulong[]{0x0L, 0x3ff00000000L, 0x0L, 0x3ff000000000000L};
>+ internal static readonly ulong[] jjbitVec10 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0xff8000000000L};
>+ internal static readonly ulong[] jjbitVec11 = new ulong[]{0x0L, 0xffc000000000L, 0x0L, 0x0L};
>+ internal static readonly ulong[] jjbitVec12 = new ulong[]{0x0L, 0x3ff0000L, 0x0L, 0x3ff0000L};
>+ internal static readonly ulong[] jjbitVec13 = new ulong[]{0x0L, 0x3ffL, 0x0L, 0x0L};
>+ internal static readonly ulong[] jjbitVec14 = new ulong[]{0xfffffffeL, 0x0L, 0x0L, 0x0L};
>+ internal static readonly ulong[] jjbitVec15 = new ulong[]{0x0L, 0x0L, 0x0L, 0xff7fffffff7fffffL};
> private int JjMoveNfa_0(int startState, int curPos)
> {
> int startsAt = 0;
>@@ -1165,6 +1166,9 @@
>
> case 61:
> return ((jjbitVec5[i2] & l2) != (ulong) 0L);
>+
>+ case 215:
>+ return ((jjbitVec6[i2] & l2) != (ulong) 0L);
>
> default:
> if ((jjbitVec0[i1] & l1) != (ulong) 0L)
>@@ -1179,23 +1183,23 @@
> {
>
> case 6:
>- return ((jjbitVec8[i2] & l2) != (ulong) 0L);
>-
>- case 11:
> return ((jjbitVec9[i2] & l2) != (ulong) 0L);
>
>- case 13:
>+ case 11:
> return ((jjbitVec10[i2] & l2) != (ulong) 0L);
>
>- case 14:
>+ case 13:
> return ((jjbitVec11[i2] & l2) != (ulong) 0L);
>
>- case 16:
>+ case 14:
> return ((jjbitVec12[i2] & l2) != (ulong) 0L);
>+
>+ case 16:
>+ return ((jjbitVec13[i2] & l2) != (ulong) 0L);
>
> default:
>- if ((jjbitVec6[i1] & l1) != (ulong) 0L)
>- if ((jjbitVec7[i2] & l2) == (ulong) 0L)
>+ if ((jjbitVec7[i1] & l1) != (ulong) 0L)
>+ if ((jjbitVec8[i2] & l2) == (ulong) 0L)
> return false;
> else
> return true;
>@@ -1209,10 +1213,10 @@
> {
>
> case 0:
>- return ((jjbitVec14[i2] & l2) != (ulong) 0L);
>+ return ((jjbitVec15[i2] & l2) != (ulong) 0L);
>
> default:
>- if ((jjbitVec13[i1] & l1) != (ulong) 0L)
>+ if ((jjbitVec14[i1] & l1) != (ulong) 0L)
> return true;
> return false;
>
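A note for readers trying to follow the bit vectors in the patch above. As far as I understand JavaCC-generated token managers, each jjbitVec array is a 256-bit set stored as four 64-bit words: one set is indexed by the character's high byte, and the per-case sets are indexed by its low byte. The Java sketch below mirrors that lookup for the two tables this patch touches. The class name `HangulBitVecDemo` and method name `matches` are hypothetical, the index arithmetic is my reading of the generated code rather than something stated in this bug, and the other `case` branches of the real method are left out, so the result is only meaningful for the high bytes discussed here.

```java
// Hypothetical sketch of a JavaCC-style bit-vector lookup; not the actual Lucene.Net code.
final class HangulBitVecDemo {
    // Patched jjbitVec0 from the diff: one bit per high byte (0x00..0xFF).
    static final long[] JJBITVEC0 = {
        0x1ff0000000000000L, 0xffffffffffffc000L, 0xfffff000ffffffffL, 0x6000000007fffffL
    };
    // New jjbitVec6 from the diff: one bit per low byte, consulted only for high byte 215 (0xD7).
    static final long[] JJBITVEC6 = {
        0xffffffffffffffffL, 0xffffffffffffffffL, 0xfffffffffL, 0x0L
    };

    // Assumed index derivation: word index = byte / 64, bit index = byte % 64.
    static boolean matches(char c) {
        int hiByte = c >> 8;
        int i1 = hiByte >> 6;              // which 64-bit word holds the high byte
        long l1 = 1L << (hiByte & 63);     // bit for the high byte within that word
        int i2 = (c & 0xff) >> 6;          // which 64-bit word holds the low byte
        long l2 = 1L << (c & 63);          // bit for the low byte within that word
        switch (hiByte) {
            case 215:                      // partially covered block: check the low byte
                return (JJBITVEC6[i2] & l2) != 0L;
            default:                       // fully covered blocks: high byte alone decides
                return (JJBITVEC0[i1] & l1) != 0L;
        }
    }

    public static void main(String[] args) {
        char[] samples = {'\uabff', '\uac00', '\ud6ff', '\ud7a3', '\ud7a4'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %b%n", (int) c, matches(c));
        }
    }
}
```

Read this way, the bits added to jjbitVec0 cover high bytes 0xAC through 0xD6, and the new `case 215` covers U+D700 through U+D7A3, which together give the Hangul syllable range U+AC00 to U+D7A3.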
How did you generate this patch? Did you patch the source directly, or did you get it from the Java Lucene, or what? The code is basically impossible to follow, so I'm not sure how to test it comprehensively.
(In reply to comment #5)
> How did you generate this patch? Did you patch the source directly, or did you
> get it from the Java Lucene, or what? The code is basically impossible to
> follow, so I'm not sure how to test it comprehensively.

First, I added the Unicode Hangul area to StandardTokenizer.jj and generated the Java code with JavaCC. Then I manually applied the changes to StandardTokenizerTokenManager.cs.
After applying dittos' patch, I can search Hangul syllables ("\uac00"-"\ud7a3"), but I can't search Hangul Jamo (decomposed) code points ("\u1100"-"\u11f9"). So I added the Hangul Jamo area to StandardTokenizer.jj.
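To illustrate why both ranges matter: the same Korean text can be stored either as precomposed syllables or as a sequence of Jamo, and Unicode normalization form NFD converts the first into the second. The Java sketch below uses the standard `java.text.Normalizer` class to show this for one syllable. The class `HangulRanges` and its methods are hypothetical helpers, and the range bounds are simply the ones quoted in the comment above.

```java
import java.text.Normalizer;

// Hypothetical helper; range bounds are taken from the comment in this bug.
final class HangulRanges {
    static boolean isHangulSyllable(char c) {
        return c >= '\uac00' && c <= '\ud7a3';
    }

    static boolean isHangulJamo(char c) {
        return c >= '\u1100' && c <= '\u11f9';
    }

    // Canonical decomposition (NFD) splits a precomposed syllable into its Jamo.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\ud55c"; // one precomposed Hangul syllable
        for (char c : decompose(composed).toCharArray()) {
            System.out.printf("U+%04X syllable=%b jamo=%b%n",
                    (int) c, isHangulSyllable(c), isHangulJamo(c));
        }
    }
}
```

A tokenizer that accepts only the syllable range would presumably treat decomposed text like this as non-letter characters, which matches the behaviour described above.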
Created attachment 58132: add Hangul Jamo and Syllable area in Lucene.Net
I've committed this patch, thanks!
Young-Ho Cha and dittos,

Recently we merged Lucene.Net-1.9.1 into CVS head. The patch that was merged earlier did not apply cleanly, and it looked like it would not be necessary in 1.9.1, so as of now CVS head doesn't contain the patch. Could one of you test the current CVS to check that everything still works correctly as before?

Thanks,
- dBera
There is Hangul syllable support in CVS Head, but no support for Hangul Jamo. I'll attach a patch for Hangul Jamo support.
Created attachment 74265: Hangul Jamo support patch for CVS Head
Sorry, I missed this patch for the 0.2.11 release; I've just checked it in. I'm also going to submit the .jj patch upstream to Lucene.