Bug 527687 - ustring::erase(iterator) incorrect for non ascii characters
Status: RESOLVED FIXED
Product: glibmm
Classification: Bindings
Component: strings
Version: 2.12.x
Hardware: Other All
Importance: Normal normal
Target Milestone: ---
Assigned To: gtkmm-forge
QA Contact: gtkmm-forge
Depends on:
Blocks:
 
 
Reported: 2008-04-12 11:12 UTC by jarro.2783
Modified: 2008-04-13 10:54 UTC
See Also:
GNOME target: ---
GNOME version: 2.17/2.18


Attachments
test case which demonstrates bug (356 bytes, text/x-c++src)
2008-04-12 11:20 UTC, jarro.2783
test_with_try_catch.cpp (799 bytes, text/plain)
2008-04-12 18:40 UTC, Murray Cumming
test case with try catch which actually shows the bug now (805 bytes, text/x-c++src)
2008-04-13 00:30 UTC, jarro.2783

Description jarro.2783 2008-04-12 11:12:54 UTC
Please describe the problem:
When a character is erased from a ustring through an iterator, and the iterator points to a non-ASCII character, the character is not erased correctly.

Steps to reproduce:
1. Declare a ustring.
2. Put a non-ASCII character in it.
3. Erase the character using an iterator which points to it.
4. Print the string.
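
The steps above can be sketched without glibmm. This is only a stand-in under stated assumptions: a plain std::string holds the UTF-8 bytes, is_valid_utf8 and erase_one_byte_at_non_ascii are hypothetical names invented for this sketch, and the single-byte erase mimics the behaviour that Comment 5 identifies as the cause. It shows the invalid byte sequence left behind, not the Glib::ConvertError itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of glibmm): checks only the lead-byte and
// continuation-byte structure of UTF-8. It does not reject overlong
// encodings or surrogate code points.
bool is_valid_utf8(const std::string& s)
{
  std::size_t i = 0;
  while (i < s.size())
  {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    if (lead < 0x80)                len = 1;
    else if ((lead & 0xE0) == 0xC0) len = 2;
    else if ((lead & 0xF0) == 0xE0) len = 3;
    else if ((lead & 0xF8) == 0xF0) len = 4;
    else return false; // stray continuation byte or invalid lead byte

    if (i + len > s.size())
      return false; // sequence is cut off by the end of the string

    for (std::size_t k = 1; k < len; ++k)
    {
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
        return false; // expected a continuation byte (10xxxxxx)
    }
    i += len;
  }
  return true;
}

// Steps 1 to 3, with std::string standing in for ustring. The contents
// ("ab", U+0545, "cd") are illustrative, not copied from the attachment.
std::string erase_one_byte_at_non_ascii()
{
  std::string s = "ab";
  s += "\xD5\x85"; // U+0545 encoded as two UTF-8 bytes
  s += "cd";
  assert(is_valid_utf8(s)); // well-formed before the erase

  // std::string::erase(iterator) removes exactly one byte: 0xD5 goes,
  // the continuation byte 0x85 stays behind.
  s.erase(s.begin() + 2);
  return s;
}
```

The returned string is five bytes long and its third byte is a stray continuation byte, so is_valid_utf8 reports it as malformed.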


Actual results:
A Glib::ConvertError exception is thrown.

Expected results:
The string is printed out successfully minus the character.

Does this happen every time?
Yes

Other information:
Comment 1 jarro.2783 2008-04-12 11:20:07 UTC
Created attachment 109112
test case which demonstrates bug
Comment 2 Murray Cumming 2008-04-12 18:40:24 UTC
Created attachment 109134
test_with_try_catch.cpp

The same test, with try/catch blocks.
Comment 3 Murray Cumming 2008-04-12 18:43:21 UTC
Something strange definitely seems to be happening. I don't understand why the first test case throws an error but the second one doesn't.
Comment 4 jarro.2783 2008-04-13 00:30:05 UTC
Created attachment 109152
test case with try catch which actually shows the bug now

It didn't like s += x + "cd", so I split it into s += x and s += "cd". It prints the UTF-8 character 0x0545 first, then prints Error3.
Comment 5 Murray Cumming 2008-04-13 10:54:14 UTC
Thanks.

I believe I have fixed this in glibmm svn trunk:

2008-04-13  Murray Cumming  <murrayc@murrayc.com>

	* glib/glibmm/ustring.cc erase(): Create an end iterator and use it, 
	instead of just using the std::string(iterator) erase implementation, 
	because that only removes one byte, which can make the whole string 
	invalid UTF-8.
	Bug #527687 (Jarro).


Here is the new implementation. I suggest that you do something similar in your application code until this fix is widely available:

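ustring::iterator ustring::erase(ustring::iterator p)
{
  // Advance a copy of p by one whole UTF-8 character (possibly several bytes).
  ustring::iterator iter_end = p;
  ++iter_end;

  // Erase the full byte range of that character, not just its first byte.
  return iterator(string_.erase(p.base(), iter_end.base()));
}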

I'm not sure if that's the most efficient implementation, so I'd welcome any patch to improve it.
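
For application code that handles raw UTF-8 bytes, "something similar" might look like the sketch below. It is not glibmm code: utf8_sequence_length and erase_utf8_char are hypothetical helpers, the string is a plain std::string, and the sketch assumes that the input is valid UTF-8 and that pos is the byte offset of a character's first byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: byte length of the UTF-8 sequence that starts with
// the given lead byte. Falls back to 1 for ASCII and for unexpected bytes.
std::size_t utf8_sequence_length(unsigned char lead)
{
  if ((lead & 0xE0) == 0xC0) return 2;
  if ((lead & 0xF0) == 0xE0) return 3;
  if ((lead & 0xF8) == 0xF0) return 4;
  return 1;
}

// Hypothetical helper: erases the whole character that starts at byte
// offset pos and returns the offset of the character that follows it
// (which is pos again, since the later bytes shift down).
std::size_t erase_utf8_char(std::string& s, std::size_t pos)
{
  assert(pos < s.size()); // precondition assumed by this sketch

  const std::size_t len =
      utf8_sequence_length(static_cast<unsigned char>(s[pos]));

  // std::string::erase(pos, count) removes at most the bytes that remain,
  // so a truncated final sequence cannot run past the end.
  s.erase(pos, len);
  return pos;
}
```

Erasing at offset 2 of the bytes "ab", 0xD5 0x85, "cd" removes both bytes of U+0545 and leaves "abcd". This mirrors the idea of the fix above, where the end of the range is found by advancing past the full character rather than a single byte.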