Bug 520116 – g_utf8_strlcpy()

Summary:	g_utf8_strlcpy()


Status:	RESOLVED WONTFIX

Product:	glib
Classification:	Platform
Component:	i18n
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	gtkdev
QA Contact:	gtkdev

URL:
Whiteboard:

Depends on:
Blocks:

Reported:	2008-03-03 15:38 UTC by Behdad Esfahbod
Modified:	2018-02-08 00:18 UTC

See Also:
GNOME target:	---
GNOME version:	---

Attachments
Proposed implementation (1.07 KB, text/plain) 2008-11-03 15:55 UTC, Philip Page
Test cases. (1.54 KB, text/plain) 2008-11-03 15:56 UTC, Philip Page
docs: Clarify dest requirements of g_utf8_strncpy() (867 bytes, patch) 2017-11-28 13:27 UTC, Patrick Griffis (tingping)	none
Add g_utf8_strlcpy() (4.72 KB, patch) 2017-11-28 13:28 UTC, Patrick Griffis (tingping)	none
Add g_utf8_strlcat() (3.07 KB, patch) 2017-11-28 13:28 UTC, Patrick Griffis (tingping)	none
docs: Clarify dest requirements of g_utf8_strncpy() (803 bytes, patch) 2017-11-28 13:31 UTC, Patrick Griffis (tingping)	committed
Add g_utf8_strlcpy() (4.74 KB, patch) 2017-11-28 18:42 UTC, Patrick Griffis (tingping)	none
Add g_utf8_strlcat() (3.09 KB, patch) 2017-11-28 18:43 UTC, Patrick Griffis (tingping)	none
Add g_utf8_strlcat() (3.10 KB, patch) 2017-11-28 18:48 UTC, Patrick Griffis (tingping)	none

Description Behdad Esfahbod 2008-03-03 15:38:22 UTC

Works like g_strlcpy(), but doesn't copy a partial character at the end of the buffer. It copies only whole UTF-8 characters, as many as fit.
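
The requested behaviour can be illustrated with a minimal sketch in plain C. This is not the attached implementation and not a GLib API: the name `utf8_strlcpy_sketch` is hypothetical, the code assumes `src` is valid, NUL-terminated UTF-8, and it assumes the same return convention as g_strlcpy() (the length of `src` in bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GLib function, not the attached patch).
 * Copies src into dest, which holds n bytes including the terminating NUL,
 * and truncates only on a UTF-8 character boundary.
 * Assumes src is valid, NUL-terminated UTF-8.
 * Returns strlen (src); a result >= n means the copy was truncated. */
size_t
utf8_strlcpy_sketch (char *dest, const char *src, size_t n)
{
  size_t src_len, copy_len;

  assert (src != NULL);
  assert (dest != NULL || n == 0);

  src_len = strlen (src);
  if (n == 0)
    return src_len;

  /* Largest byte count that fits while leaving room for the NUL. */
  copy_len = src_len < n ? src_len : n - 1;

  /* When truncating, back up while the first dropped byte is a
   * continuation byte (10xxxxxx): cutting there would split a character. */
  if (copy_len < src_len)
    while (copy_len > 0 && ((unsigned char) src[copy_len] & 0xC0) == 0x80)
      copy_len--;

  memcpy (dest, src, copy_len);
  dest[copy_len] = '\0';
  return src_len;
}
```

For example, copying "a\xC3\xA9" (an "a" followed by a two-byte character) into a three-byte buffer stores only "a", because the two-byte character does not fit alongside the terminating NUL.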

Comment 1 Philip Page 2008-11-03 15:55:08 UTC

Created attachment 121880
Proposed implementation

Comment 2 Philip Page 2008-11-03 15:56:09 UTC

Created attachment 121881
Test cases.

Comment 3 Patrick Griffis (tingping) 2017-11-28 13:27:57 UTC

Created attachment 364558
docs: Clarify dest requirements of g_utf8_strncpy()

Comment 4 Patrick Griffis (tingping) 2017-11-28 13:28:14 UTC

Created attachment 364559
Add g_utf8_strlcpy()

Comment 5 Patrick Griffis (tingping) 2017-11-28 13:28:31 UTC

Created attachment 364560
Add g_utf8_strlcat()

Comment 6 Patrick Griffis (tingping) 2017-11-28 13:31:22 UTC

Created attachment 364561
docs: Clarify dest requirements of g_utf8_strncpy()

Comment 7 Emmanuele Bassi (:ebassi) 2017-11-28 14:04:30 UTC

Review of attachment 364559:

::: glib/gutf8.c
@@ +458,3 @@
+g_utf8_strlcpy (gchar       *dest,
+                const gchar *src,
+                size_t       n)

This should be `gsize`.

@@ +460,3 @@
+                size_t       n)
+{
+  register const gchar *s = src;

`register` is not really used, unless you're targeting a compiler from the '90s.

@@ +463,3 @@
+  while (s - src < n  &&  *s)
+    {
+      s = g_utf8_next_char(s);

Coding style:

 - single statement blocks do not need curly braces
 - missing space between function name and parenthesis

@@ +467,3 @@
+  if (s - src >= n)
+    {
+      /* We need to truncate; back up one. */

As above, coding style issues:

 - single statement blocks do not need curly braces
 - missing space between function name and parenthesis

::: glib/tests/utf8-misc.c
@@ +76,3 @@
 
+static void
+test_utf8_strlcpy (void)

Coding style throughout: missing space between function name and parenthesis.

Comment 8 Emmanuele Bassi (:ebassi) 2017-11-28 14:06:00 UTC

Review of attachment 364560:

::: glib/gunicode.h
@@ +775,3 @@
 
+GLIB_AVAILABLE_IN_2_56
+size_t   g_utf8_strlcat           (gchar       *dest,

Should be `gsize`.

::: glib/gutf8.c
@@ +500,3 @@
+ *
+ * Returns: Length in bytes of @src
+ **/

Missing `Since` annotation, and the gtk-doc stanza should close with `*/`.

@@ +501,3 @@
+ * Returns: Length in bytes of @src
+ **/
+size_t

This should be `gsize`.

@@ +504,3 @@
+g_utf8_strlcat (gchar       *dest,
+                const gchar *src,
+                size_t       n)

This should be `gsize`.

Comment 9 Emmanuele Bassi (:ebassi) 2017-11-28 14:06:31 UTC

Review of attachment 364561:

Looks good

Comment 10 Patrick Griffis (tingping) 2017-11-28 18:42:55 UTC

Created attachment 364584
Add g_utf8_strlcpy()

Comment 11 Patrick Griffis (tingping) 2017-11-28 18:43:09 UTC

Created attachment 364585
Add g_utf8_strlcat()

Comment 12 Patrick Griffis (tingping) 2017-11-28 18:48:27 UTC

Created attachment 364587
Add g_utf8_strlcat()

Comment 13 Philip Withnall 2018-02-03 11:14:16 UTC

Comment on attachment 364561
docs: Clarify dest requirements of g_utf8_strncpy()

I pushed the a_c-n patch with a minor wording tweak.

Attachment 364561 pushed as 1c0bed9 - docs: Clarify dest requirements of g_utf8_strncpy()

Comment 14 Philip Withnall 2018-02-03 11:18:59 UTC

Taking a step back, what’s the use case for these functions? Copying UTF-8 into fixed-size buffers: but who uses fixed-size buffers? i.e. Which applications/libraries are lined up to use this, and what’s the reason they’re not using g_strdup()? I’m sure there are good answers to all of these questions, but I’d rather not take more API into GLib without knowing them.

Comment 15 Philip Page 2018-02-03 22:57:23 UTC

When I initially needed it, we had fixed size columns in a database. We naturally wanted to preserve the validity of the UTF-8 string even if we had to truncate.

In the intervening nine years (!) we have moved on to C++ and don't use glib anymore.

Comment 16 Philip Withnall 2018-02-04 10:50:16 UTC

(In reply to Philip Page from comment #15)
> When I initially needed it, we had fixed size columns in a database. We
> naturally wanted to preserve the validity of the UTF-8 string even if we had
> to truncate.

Yeah, that makes sense. I’m not sure it’s a general enough use case for GLib to cater to, though.

> In the intervening nine years (!) we have moved on to C++ and don't use glib
> anymore.

Sorry for the delay, and thanks for replying. It’s useful to get this feedback.

Since the docs fix has been pushed, I’m going to close this as WONTFIX. Patrick, if you have a use case which is still relevant, please re-open the report with it and we can consider these APIs further.

Comment 17 Patrick Griffis (tingping) 2018-02-08 00:18:16 UTC

(In reply to Philip Withnall from comment #14)
> Taking a step back, what’s the use case for these functions? Copying UTF-8
> into fixed-size buffers: but who uses fixed-size buffers? i.e. Which
> applications/libraries are lined up to use this, and what’s the reason
> they’re not using g_strdup()? I’m sure there are good answers to all of
> these questions, but I’d rather not take more API into GLib without knowing
> them.

HexChat uses them extensively, though largely for legacy reasons. There are still some places that interact with fixed-limit protocols where you don't want to produce invalid UTF-8, and at a glance there doesn't even seem to be an allocating API in GLib that copies valid UTF-8 up to a byte limit.
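
The allocating variant mentioned in the last comment could look roughly like the following plain C sketch. The name `utf8_strndup_bytes_sketch` is hypothetical and does not exist in GLib; the code assumes `src` is valid, NUL-terminated UTF-8 and that `max_bytes` counts payload bytes only, excluding the terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a GLib function).
 * Returns a newly allocated copy of src holding at most max_bytes bytes
 * (not counting the terminating NUL), truncated on a UTF-8 character
 * boundary. Assumes src is valid, NUL-terminated UTF-8.
 * Returns NULL if allocation fails; the caller frees the result. */
char *
utf8_strndup_bytes_sketch (const char *src, size_t max_bytes)
{
  size_t len;
  char *copy;

  assert (src != NULL);

  len = strlen (src);
  if (len > max_bytes)
    {
      len = max_bytes;
      /* Back up while the first dropped byte is a continuation byte
       * (10xxxxxx), so that no character is split. */
      while (len > 0 && ((unsigned char) src[len] & 0xC0) == 0x80)
        len--;
    }

  copy = malloc (len + 1);
  if (copy == NULL)
    return NULL;

  memcpy (copy, src, len);
  copy[len] = '\0';
  return copy;
}
```

A caller writing to a fixed-limit protocol field would pass the field's byte limit as `max_bytes` and send the returned string, which stays valid UTF-8 even when it had to be shortened.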