GNOME Bugzilla – Bug 527687
ustring::erase(iterator) incorrect for non-ASCII characters
Last modified: 2008-04-13 10:54:14 UTC
Please describe the problem:
When erasing a character from a ustring with an iterator, if the iterator points to a non-ASCII character, the character doesn't appear to be erased correctly.

Steps to reproduce:
1. Declare a ustring.
2. Put a non-ASCII character in it.
3. Erase the character using an iterator that points to it.
4. Print the string.

Actual results:
A Glib::ConvertError exception is thrown.

Expected results:
The string is printed successfully, minus the erased character.

Does this happen every time?
Yes
Created attachment 109112: test case that demonstrates the bug
Created attachment 109134: test_with_try_catch.cpp
The same test, with try/catch blocks.
Something strange definitely seems to be happening. I don't understand why the first test case throws an error but the second one doesn't.
Created attachment 109152: test case with try/catch that actually shows the bug now
It didn't like s += x + "cd", so I split it into s += x and s += "cd". It prints the UTF-8 character 0x0545 first, then prints Error3.
Thanks. I believe I have fixed this in glibmm svn trunk:

2008-04-13  Murray Cumming  <murrayc@murrayc.com>

    * glib/glibmm/ustring.cc erase(): Create an end iterator and use it,
    instead of just using the std::string(iterator) erase implementation,
    because that only removes one byte, which can make the whole string
    invalid UTF-8.
    Bug #527687 (Jarro).

Here is the new implementation. I suggest that you do something similar in your application code until this fix is widely available:

    ustring::iterator ustring::erase(ustring::iterator p)
    {
      // Advance a copy of p to the start of the next character...
      ustring::iterator iter_end = p;
      ++iter_end;
      // ...then erase every byte of the character, not just the first one.
      return iterator(string_.erase(p.base(), iter_end.base()));
    }

I'm not sure whether that's the most efficient implementation, so I'd welcome any patch that improves it.