GNOME Bugzilla – Bug 445316
Read psd layer names longer than 31 characters.
Last modified: 2008-01-15 13:27:56 UTC
Currently, the psd plugin only reads the short layer name (31 chars max). By parsing the long unicode name block contained within the layer extra data, we can get the longer layer name. The layer extra data section also contains more blocks of info related to that layer. For example, layer color, layer id, layer effects, and layer folders are all contained in blocks similar to the long unicode name block. I'll include a patch for psd.c that implements parsing a few of these blocks.
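For anyone following along, here is a minimal sketch in C of how tagged blocks in the layer extra data could be scanned. It is not the attached patch. The block layout (a "8BIM" signature, a 4-byte key, a 4-byte big-endian length, then a payload padded to an even size) and the keys "luni" (long Unicode name) and "lclr" (layer color) are assumptions based on how the format is commonly described, and find_block() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PSD stores multi-byte integers big-endian, so assemble them byte by byte. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: search a run of tagged blocks for `key`.
 * Assumed layout of each block (not taken from the patch):
 *   4 bytes  signature "8BIM"
 *   4 bytes  key, e.g. "luni"
 *   4 bytes  payload length, big-endian
 *   n bytes  payload, assumed padded to an even size
 * Returns a pointer to the payload and stores its length in *len,
 * or returns NULL if the key is absent or the data looks malformed.
 */
static const uint8_t *find_block(const uint8_t *buf, size_t size,
                                 const char key[4], uint32_t *len)
{
    size_t pos = 0;

    while (pos <= size && size - pos >= 12) {
        const uint8_t *hdr = buf + pos;
        uint32_t n;

        if (memcmp(hdr, "8BIM", 4) != 0)
            return NULL;              /* unknown signature, stop */

        n = read_be32(hdr + 8);
        if (n > size - pos - 12)
            return NULL;              /* payload runs past the buffer */

        if (memcmp(hdr + 4, key, 4) == 0) {
            *len = n;
            return hdr + 12;
        }

        pos += 12 + (size_t)n;
        if (n & 1)
            pos++;                    /* assumed padding to an even size */
    }
    return NULL;
}
```

A caller would pass the extra data bytes that follow the short layer name and ask for "luni". Real files may pad blocks differently, so the padding rule in particular should be checked against actual PSD files.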
Created attachment 89577: Patch for gimp-2.2.15/plug-ins/common/psd.c
Please try to follow the GIMP coding style as defined in the file HACKING. We would also very much appreciate a patch against SVN trunk or against a recent 2.3 release as we will not add any new features to gimp-2.2.
Well, SVN would explain why the cvs server described in http://www.gimp.org/source/howtos/stable-cvs-get.html didn't work for me. Ok, so I think that I've created a more useful patch this weekend. I tried to follow the style correctly, but the file seems a little inconsistent, so let me know if I've missed something. I know how to parse the layer styles info found in the blocks lrFX and lfx2, but they're pretty complex and I think it may clutter up the code before gimp is ready to use the data from them.
Created attachment 89768: Patch against svn. Adds support for long layer names in Photoshop files.
This looks a lot better. The routine getunicodepascalstring() does not seem to do the right thing though. I assume that the names are encoded as UCS-2 (see http://en.wikipedia.org/wiki/UTF-16). So instead of skipping the higher-order byte, you should use g_convert() to convert to UTF-8 (the character encoding used in GIMP). There's code in app/core/gimpbrush-load.c that does something similar. You may want to use that as an example.
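To illustrate why dropping the high-order byte is wrong, here is a minimal, self-contained sketch of a UCS-2 (big-endian) to UTF-8 conversion in plain C. It is a hand-rolled stand-in for what g_convert() would do, not the code from the patch or from gimpbrush-load.c; ucs2be_to_utf8() is a hypothetical name, and big-endian storage is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert `units` big-endian 16-bit code units,
 * treated as UCS-2, into a NUL-terminated UTF-8 string in `out`.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if `out` is too small or a surrogate is encountered
 * (UCS-2 has no surrogate pairs).
 */
static size_t ucs2be_to_utf8(const uint8_t *in, size_t units,
                             char *out, size_t out_size)
{
    size_t o = 0;

    if (out_size == 0)
        return (size_t)-1;

    for (size_t i = 0; i < units; i++) {
        /* Combine both bytes; skipping in[2 * i] would lose information. */
        uint32_t c = ((uint32_t)in[2 * i] << 8) | in[2 * i + 1];
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;

        if (c >= 0xD800 && c <= 0xDFFF)
            return (size_t)-1;        /* not valid in UCS-2 */
        if (out_size - o <= need)
            return (size_t)-1;        /* no room for bytes plus NUL */

        if (need == 1) {
            out[o++] = (char)c;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}
```

For example, the code unit 0x4E2D (U+4E2D) becomes the three UTF-8 bytes E4 B8 AD, whereas keeping only the low byte 0x2D would silently turn it into an ASCII hyphen.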
Created attachment 89778: Adds support for long layer names in Photoshop files (converts unicode correctly). You're right but I wasn't sure how to do that correctly. Thanks for the reference. Here's a new patch to correct getunicodepascalstring().
Is the encoding really UTF-16? From my experience with PS, it's more likely UCS-2. This doesn't make a difference for characters in the Basic Multilingual Plane but I'd prefer if we could get this right. Is there any documentation on the file format that could help us to answer this question?
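The difference between the two encodings can be made concrete with a small sketch in C. It encodes a single code point as UTF-16: inside the Basic Multilingual Plane the result is one 16-bit unit, identical to UCS-2, while code points above U+FFFF need a surrogate pair that UCS-2 cannot express. utf16_encode() is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written to `out` (1 or 2),
 * or 0 if `cp` is not a valid code point.
 */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                     /* out of range or a lone surrogate */

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;        /* BMP: same single unit as UCS-2 */
        return 1;
    }

    cp -= 0x10000;                    /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this, U+4E2D encodes as the single unit 0x4E2D, while U+1F600 encodes as the pair 0xD83D 0xDE00. A reader that assumes UCS-2 would see that pair as two unrelated units.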
I don't know of any public docs about it. I only use a hex editor to look for patterns in the psd files, so I can't answer that. I just picked the newer one on the assumption that Adobe would be adopting it as they release newer versions.
They can't really adopt a newer encoding in a file format without introducing new tags. So it's safer to assume that the names are in UCS-2 encoding. I have applied your patch, changed the encoding to UCS-2 and did some minor coding style cleanups. Thanks a lot for this contribution. Now I wonder if we should also add support for writing long layer names to psd-save.
2007-06-12  Sven Neumann  <sven@gimp.org>
	* plug-ins/common/psd-load.c: applied slightly modified patch from Eric Ross that adds support for loading long layer names from the extra layer data section (bug #445316).
Thanks for adding that. Shouldn't be too difficult to add support for saving them. I'll look into it this week.
Created attachment 89865: Patch to write long layer names to psd files. Here's my first attempt at writing long layer names. Seems to work for everything I throw at it. Not quite sure about style yet.
Should the long name always be written? It would perhaps make sense to only write it if the layer name is longer than 31 characters or contains non-ASCII characters. What does PS do? Your code is problematic because it doesn't check if the UTF-8 to UCS-2 conversion has succeeded. Not all strings encoded in UTF-8 are representable in UCS-2 encoding.
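The two checks suggested above could look roughly like the following sketch in C. It only illustrates the policy proposed in this comment (write the long name when the short one cannot hold it) and the UCS-2 representability concern; it says nothing about what PS actually does. Both function names are hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

#define SHORT_NAME_MAX 31   /* limit of the short layer name, per this report */

/*
 * Hypothetical helper: true if the name does not fit the short form,
 * i.e. it contains a non-ASCII byte or is longer than 31 characters.
 */
static bool needs_long_name(const char *utf8)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++, n++)
        if (*p >= 0x80)
            return true;              /* non-ASCII character present */

    return n > SHORT_NAME_MAX;        /* pure ASCII: bytes == characters */
}

/*
 * Hypothetical helper: true if every character lies in the Basic
 * Multilingual Plane and is therefore representable in UCS-2.
 * In valid UTF-8, a lead byte >= 0xF0 starts a 4-byte sequence,
 * which encodes a code point above U+FFFF.
 */
static bool fits_in_ucs2(const char *utf8)
{
    for (const unsigned char *p = (const unsigned char *)utf8; *p; p++)
        if (*p >= 0xF0)
            return false;

    return true;
}
```

A saver following this policy would call needs_long_name() first and then refuse, or fall back, when fits_in_ucs2() returns false, rather than ignoring a failed conversion.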
Oh, and could you please open a new bug report for saving.
Current versions of PS always save the layer name in the current character set in an image resource and as UTF-16 in a layer resource block. Attached below is an example of a file where the layer names do not display under windows xp.
Created attachment 90381: PSD with broken layer names in WinXP
(In reply to comment #14)
> Current versions of PS always save the layer name in the current character set
> in an image resource and as UTF-16 in a layer resource block. Attached below is
> an example of a file where the layer names do not display under windows xp.
I can see where the slice name is in the image resources but not the layer name. I viewed this file with PS in Windows and Gimp in Linux and I can see 2 layers in this image, one named "Background" and another named "Color Fill 1". What does gimp show in Windows for the layer names?
Sorry, I was being dim. It's alpha channel names that are stored in the image resource; ASCII layer names are stored in the layer record. Screenshot of windows layer names follows.
Created attachment 90417: Screenshot of layer names under windows
(In reply to comment #18)
> Created an attachment (id=90417)
> Screenshot of layer names under windows
Interesting. My patch is making a call to g_convert() to convert the Unicode string into a UTF-8 string. I think that maybe your font isn't supporting the UTF-8 string that's being returned. I'm not sure which part is at fault here but I'm inclined to think that it's the font being used. I'll have to look into it some more.
The string only consists of ASCII characters so it's very unlikely that the font is to blame here. More likely what's happening is that iconv on Windows doesn't support the conversion we are asking for and what you are seeing are the fallback characters (and apparently you don't have a font to render those). Eric, please try to be more precise when it comes to Unicode and encodings. A UTF-8 encoded string is also a Unicode string.
John, can you make out the numbers written into the boxes? They are unreadable on your screenshot. But perhaps if you increased the font size, you might be able to make out the numbers. These are the code points and they might give us a hint on what's going wrong here.
(In reply to comment #20)
> The string only consists of ASCII characters so it's very unlikely that the
> font is to blame here.
>
> More likely what's happening is that iconv on Windows doesn't support the
> conversion we are asking for and what you are seeing are the fallback
> characters (and apparently you don't have a font to render those).
>
> Eric, please try to be more precise when it comes to Unicode and encodings. A
> UTF-8 encoded string is also a Unicode string.
I used 'Unicode string' to refer to the string provided by Photoshop and used 'UTF-8 string' to refer to the string that was being returned from g_convert(). I suppose that I should avoid such shortcuts since they're confusing.
(In reply to comment #21)
> John, can you make out the numbers written into the boxes? They are unreadable
> on your screenshot. But perhaps if you increased the font size, you might be
> able to make out the numbers. These are the code points and they might give us
> a hint on what's going wrong here.
They are all four zeros. If this is real and not just pretty pictures from windows, it might suggest that the byte swapping is not working as it should be for the double byte unicode characters. Also, I would suspect that recent versions of ps use utf-16, not ucs-2, as the character encoding, as this is what is specified in the xmp documentation.
How does the XMP specification apply here? For the strings used here, it should also not make a difference.
The XMP spec is the most recent freely available spec from adobe relating to PSD files, however as you say it would not make a difference with these strings.
Can you please add a link to that spec to this bug-report then?
A google search turned up the URL http://www.adobe.com/devnet/xmp/ which has some information about the XMP format and a link to the PDF file containing the spec.
Created attachment 90671: Patch to fix layer name display under windows. This patch fixes the display of unicode layer names under windows (also tested on Fedora 7) and fixes a crash if the UTF-8 representation of the short layer name contains multibyte characters (a regression from 2.2).
If we use g_utf16_to_utf8() then we should probably also use g_utf8_to_utf16() in psd-save.c. For now I have committed the uncontroversial part of this patch:
2007-07-06  Sven Neumann  <sven@gimp.org>
	* plug-ins/common/psd-load.c (do_layer_record): applied part of a patch from John Marshall that fixes handling of the short layer name (bug #445316).
Reopening this bug since we still need to fix the encoding issue.
Created attachment 91299: Patch to change psd-load and psd-save to use UTF-16 instead of UCS-2. This is an untested patch that I propose as a solution for this issue. Please review it and comment.
With this patch applied, I can save a PSD file with a long Chinese layer name. Opening this file in GIMP yields the same layer name. Now we need someone to open such a file in PS and to test the loader with a file written by PS.
Sven, with the patch applied to my windows build of Gimp I can save PSD files with layer names containing 2 byte characters which display correctly in photoshop CS3. I can also confirm that files saved from ps with 2 byte characters in the layer names load correctly in GIMP.
Thanks for testing, I have committed it then.
2007-07-06  Sven Neumann  <sven@gimp.org>
	* plug-ins/common/psd-load.c
	* plug-ins/common/psd-save.c: use UTF-16 encoding instead of UCS-2 for layer names (bug #445316).