GNOME Bugzilla – Bug 615428
New column 'is_container' required in sym_kind table
Last modified: 2010-05-08 19:15:22 UTC
A symbol can have potential children and the views need to know this, however this information is not retrievable from database. Given a symbol A, we want to know if it has children. At first glance I thought presence of scope_definition_id > 0 would mean it has children, but it turns out that's not useful information because symbols that aren't ought to have children (such as methods) also have scope_definition_id > 0. One possible solution is to introduce 'has_child' column in sym_kind table. We know for sure which kinds normally have children (namespace, class, enum, struct, union etc.). This information will be sufficient for views to do their jobs. Views should be able to cope with occasional 0 children for these kinds, but in general they are expected to have children.
symbol_db_engine_get_scope_members_by_symbol_id () returns the children (the members), if they exists. You should see the 'symbol' (class, struct, namespace etc) as a scope. The scopes may have (and often have) children. TYPE_SCOPE_CONTAINER is defined on libanjuta.idl to define the symbols that may have children. This said, I find the adding of a field 'has_child' on sym_kind too much cpu-intensive at population time: we would require to query for children at every symbol insertion, which isn't really fast. It's also a redundant information because we know if the scope have members: query_str = g_strdup_printf ("SELECT symbol.symbol_id AS symbol_id, " "symbol.name AS name, symbol.file_position AS file_position, " "symbol.is_file_scope AS is_file_scope, symbol.signature AS signature, " "symbol.returntype AS returntype " "%s FROM symbol a, symbol symbol " "%s WHERE a.symbol_id = ## /* name:'scopeparentsymid' type:gint */ " "AND symbol.scope_id = a.scope_definition_id %s " "AND symbol.scope_id > 0 order by name %s %s", info_data->str, join_data->str, file_path, limit, offset);
to speed up the has_child information you can call that function with limit 1, so that it would stop as soon as it finds a child.
I thought about this some more: sqlite> select * from sym_kind; 1|macro 2|prototype 3|typedef 4|struct 5|member 6|enumerator 7|function 8|variable 9|field 10|class 11|method 12|delegate 13|enum 14|union 15|namespace it wouldn't be a good thing to add the 'has_child' column because the elements aren't unique and are reused elsewhere.
(In reply to comment #1) > You should see the 'symbol' (class, struct, namespace etc) as a scope. > The scopes may have (and often have) children. > Seeing them as scope is not the issue. But the problem is methods and other are also 'scope' so there is no between them. I am strictly talking about data organization in database. I have no problem getting members of a given scope (like your function does), but the trouble is deciding which symbols should be looked for children. > This said, I find the adding of a field 'has_child' on sym_kind too much > cpu-intensive at population time: we would require to query for children at > every symbol insertion, which isn't really fast. That's not what I wanted. The information is static and can be done once during table creation. It tells which kinds are expected to have children.
(In reply to comment #4) > > That's not what I wanted. The information is static and can be done once during > table creation. It tells which kinds are expected to have children. Something like: sqlite> select * from sym_kind; 1|macro | 0 2|prototype | 0 3|typedef | 0 4|struct | 1 5|member | 0 6|enumerator | 0 7|function | 0 8|variable | 0 9|field | 0 10|class | 1 11|method| 1 12|delegate| 0 13|enum | 1 14|union | 1 15|namespace | 1
(In reply to comment #5) > > Something like: > > sqlite> select * from sym_kind; > 1|macro | 0 > 2|prototype | 0 > 3|typedef | 0 > 4|struct | 1 > 5|member | 0 > 6|enumerator | 0 > 7|function | 0 > 8|variable | 0 > 9|field | 0 > 10|class | 1 > 11|method| 1 > 12|delegate| 0 > 13|enum | 1 > 14|union | 1 > 15|namespace | 1 Opps, typo there. It's more like: sqlite> select * from sym_kind; 1|macro | 0 2|prototype | 0 3|typedef | 0 4|struct | 1 5|member | 0 6|enumerator | 0 7|function | 0 8|variable | 0 9|field | 0 10|class | 1 11|method| 0 12|delegate| 0 13|enum | 1 14|union | 1 15|namespace | 1
(In reply to comment #6) > sqlite> select * from sym_kind; > 1|macro | 0 > 2|prototype | 0 > 3|typedef | 0 > 4|struct | 1 > 5|member | 0 > 6|enumerator | 0 > 7|function | 0 > 8|variable | 0 > 9|field | 0 > 10|class | 1 > 11|method| 0 > 12|delegate| 0 > 13|enum | 1 > 14|union | 1 > 15|namespace | 1 Well, I see your point, but I still don't understand why you'd like to add that already known information which is not subject to change through projects. We can hardcode the relations symbol-children into the core and parse them when needed. Maybe you need that information to have a quicker query? That would be the only rason IMHO, which is probably faster than having a statement like [..] sym_kind.kind_name IN ('class', 'namespace', 'union', ....)
(In reply to comment #7) > > Well, I see your point, but I still don't understand why you'd like to add that > already known information which is not subject to change through projects. > We can hardcode the relations symbol-children into the core and parse them when > needed. I presume by 'already know information', you mean by running a query to see if a symbol has child? My point is to avoid running queries for every single symbol to determine if they have children. In a view showing 25 symbols, that's 25 queries to make to see if they have children. All a view needs to know is if a symbol has "potential" children, and for that running only 1 query (for main listing), instead of 26 queries is obvious. Or, do you have any example query that would give this information for a selected list of symbols? For example, please modify following query to introduce a boolean column (has_child), without making x25 queries, or without hardcoding this information. "SELECT symbol_id FROM symbol" If you have it, then the bug is obviously INVALID and I will use that. > Maybe you need that information to have a quicker query? That would be the only > rason IMHO, which is probably faster than having a statement like [..] > sym_kind.kind_name IN ('class', 'namespace', 'union', ....) Yes, that is indeed the reason. My goal is to get better and cleaner queries, not pile of band-aids (see bug 615429). Cleaner the queries, more correct database is structured for our needs. I hope you agree. "sym_kind.kind_name IN ('class', 'namespace', 'union', ....)" is a hack because database lacks this info. This information is assumed by application in wrong place (and potentially in many places if needed more). You mentioned you also hardcoded it in libanjuta.idl: that's cleary -1 again. So, now we have 3 places where this is hardcoded. what if in future a new kind 'foobar' is added in the table that has potential children (or as you called "container"), you wouldn't want to hunt down all places and fix them. I mean, this is common sense. So, unless there is a better way to get this done (i.e. solve the above SQL statement), this is the next best thing. Hence, whatever provides the list 'class', 'namespace', 'union', 'members' etc. is the right party to provide and maintain this information, which is the sym_kind table.
(In reply to comment #8) > Hence, whatever provides the list > 'class', 'namespace', 'union', 'members' etc. is the right party to provide and > maintain this information, which is the sym_kind table. I thought this would be trivial to do. May be I am wrong. If you have any bigger concerns why this can't be done cleanly, I would be happy to hear. We can then try to weight which one is less evil :)
Also, actually if you don't the term 'has_child'. Feel free to call it 'is_container', which is a better description of what it is.
(In reply to comment #8) > (In reply to comment #7) > > > > Well, I see your point, but I still don't understand why you'd like to add that > > already known information which is not subject to change through projects. > > We can hardcode the relations symbol-children into the core and parse them when > > needed. > > I presume by 'already know information', you mean by running a query to see if > a symbol has child? nope by 'already known information' I meant assign 0 or 1 if a sym_kind has a container. But I agree with your view now. > My point is to avoid running queries for every single > symbol to determine if they have children. In a view showing 25 symbols, that's > 25 queries to make to see if they have children. All a view needs to know is if > a symbol has "potential" children, and for that running only 1 query (for main > listing), instead of 26 queries is obvious. indeed, this makes sense. > > Or, do you have any example query that would give this information for a > selected list of symbols? For example, please modify following query to > introduce a boolean column (has_child), without making x25 queries, or without > hardcoding this information. > > "SELECT symbol_id FROM symbol" > > If you have it, then the bug is obviously INVALID and I will use that. > > > Maybe you need that information to have a quicker query? That would be the only > > rason IMHO, which is probably faster than having a statement like [..] > > sym_kind.kind_name IN ('class', 'namespace', 'union', ....) > > Yes, that is indeed the reason. My goal is to get better and cleaner queries, > not pile of band-aids (see bug 615429). Cleaner the queries, more correct > database is structured for our needs. I hope you agree. yes now I undestand the meaning of the bug, it wasn't clear to me before. I'll take care of adding the column 'is_container' to sym_kind. I prefer it over has_child. > > "sym_kind.kind_name IN ('class', 'namespace', 'union', ....)" is a hack because > database lacks this info. This information is assumed by application in wrong > place (and potentially in many places if needed more). You mentioned you also > hardcoded it in libanjuta.idl: that's cleary -1 again. So, now we have 3 places > where this is hardcoded. what if in future a new kind 'foobar' is added in the > table that has potential children (or as you called "container"), you wouldn't > want to hunt down all places and fix them. I mean, this is common sense. maintaining it on libanjuta.idl seems ok to me. User can query for a symbol with IAnjutaSymbolType TYPE_SCOPE_CONTAINER. And that's used by cxxparser, language-support and so on. If a new container kind is added then we'd only add it on libanjuta.idl and on sym_kind (populated once). From what I see the views don't care if a symbol is a class or a namespace or a struct, but they just want to know if the symbol they'll going to display may have children or not. So fixing this one should also fix the point 2) on bug #615429. I'll take the bug and fix it ASAP.
While you are at it, could you fix the documentation in the libanjuta.idl file? There are some copy&paste mistakes for the async methods and the enums are not documented correctly (gtk-doc has complains about the format). Thanks!
(In reply to comment #11) > > yes now I undestand the meaning of the bug, it wasn't clear to me before. > I'll take care of adding the column 'is_container' to sym_kind. > I prefer it over has_child. > Thanks a lot! Sorry that I wasn't clear before.
(In reply to comment #12) > While you are at it, could you fix the documentation in the libanjuta.idl file? > There are some copy&paste mistakes for the async methods and the enums are not > documented correctly (gtk-doc has complains about the format). > > Thanks! I'll open another bug for this. I'm closing this one as I've just pushed the changes.
(In reply to comment #14) > > I'll open another bug for this. I'm closing this one as I've just pushed the > changes. Thanks! I will check it soon.
Just one small concern. I saw you did it in http://git.gnome.org/browse/anjuta/commit/?id=c37788e999022144452bf1ed9686e3d630aa38b2 . There seem to be no protection mechanism to avoid breaking against older database that user might already have. How do you normally deal with backward compatibilities for db schema change? One simple mitigation is to reset the db if some compatibility version changes.
We maintain a database version and Massimo upgraded it properly in the next commit: http://git.gnome.org/browse/anjuta/commit/?id=f500340e31aa5174b1d8a36f97c011e55adb4793
(In reply to comment #17) > We maintain a database version and Massimo upgraded it properly in the next > commit: > http://git.gnome.org/browse/anjuta/commit/?id=f500340e31aa5174b1d8a36f97c011e55adb4793 Okay, great. I can then start using it.
It's not quite fixed yet: sqlite> select * from sym_kind; 1|macro|0 2|prototype|0 3|typedef|1 4|struct|1 5|member|0 6|enumerator|1 7|function|0 8|variable|0 9|field|0 10|class|1 11|method|0 12|delegate|0 13|enum|1 14|union|1 15|namespace|1 enumerator and typedef should not be is_container = 1. They don't have any scope members in database (which is correct since they are not containers in reality). I get empty expandable nodes for them in tree views.
I removed enumerator and typedef from scope container types from libanjuta.idl: http://git.gnome.org/browse/anjuta/commit/?id=07b3fa437e3c708abc87fbded3e517f17fddf53d Documentation says that IANJUTA_SYMBOL_TYPE_SCOPE_CONTAINER is types that can take scope, then the fix is correct since db doesn't have scopes for them. The only places I found it being used is in language-support-cpp-java/cxxparser/engine-parser.cpp. I hope it does not affect anything there. Resolving back to fixed.
well, unfortunately it affects the completion of such things: typedef struct _asd { char a; int b; } asd; void foo () { asd a; a. } typedef, when it's used for a struct, is being seen as a container. Currently it's a special case that mostly hacks structs. This is due to the 'typeref' field that ctags put on tags file when it finds struct, enums, classes (which all are containers themselves). Have a try with some files and the command anjuta-tags --sort=no --fields=afmiKlnsStTz --c++-kinds=+p main.cc && less tags Actually the patch isn't complete because it lacks the support on symbol-db-engine-priv.h, line 295. I'm reopening the bug because we should do the decision whether to maintain the things as they are (with special case) or try to fix them (don't know how much work it could be - at that time I made this choice after some thoughts but I don't remember the real reason for this). Secondly, for the cxx engine parser and for these kinds of bugfixes I think we really need a testing unit plugin. I started something at cxxparser coding time, but then left there because the real cxxparser uses a IAnjutaSymbolManager instance, which is available when Anjuta is offline. Using mocks and stubs wasn't my best aim, too much dependencies and too much difficulties to implement that in C. Integrating something like http://herzi.github.com/gutachter/ would be great. This would avoid us to fall into regressions. what do you think?
(In reply to comment #21) > well, unfortunately it affects the completion of such things: > > > typedef struct _asd { > char a; > int b; > > } asd; > > void foo () > { > asd a; > a. > > } > If you check the db, it does not claim 'a' and 'b' to be in scope of asd anyways. So claiming that asd is a container seems bogus. I don't know how autocompletion manages to get members of "asd->". Can you explain how it is determined? For example, I looked in DB for "typedef struct _SymbolDBModelFilePriv SymbolDBModelFilePriv" sqlite> select scope_id, scope_definition_id from symbol where name = 'SymbolDBModelFilePriv'; 5045|5034 sqlite> select * from symbol where scope_id = 5034; sqlite> > > typedef, when it's used for a struct, is being seen as a container. > Currently it's a special case that mostly hacks structs. > This is due to the 'typeref' field that ctags put on tags file when it finds > struct, enums, classes (which all are containers themselves). > That is not a correct way to handle it. typedef's containership or scope has nothing to do with whatever is typedefs to. For example, there is no container in 'typedef int myint'. The relationship is purely of aliasing and therefore not to be mixed with is_container or scoping. > > Have a try with some files and the command > anjuta-tags --sort=no --fields=afmiKlnsStTz --c++-kinds=+p main.cc && less > tags > A typical ctags with that for "typedef struct" produces: kind:typedef line:39 language:C++ typeref:struct:_SymbolDBModelPriv Which seems quite logical. If I guess it correctly, it say the symbol is typeref (or equivalent) to "struct:_SymbolDBModelPriv". What you need is a separate relationship in DB to associate a typedef with its alias/typeref. That is, yet another column in symbol table that holds its typeref's symbol ID. Then you can easily use this relationship in your code to nail down the final symbol which can be used to get scope members and such. It will also solve much complex and deep heirarchy such as: struct A { int a; int b; }; A* func1(); typedef (*func) FuncPtr; typedef AnotherFuncPtr FuncPtr; AnotherFuncPtr func_var Then, in theory, the autocomplete will easily solve: "func_var->" and produce "a" and "b", by chaining up typerefs. Do you think this can be accomplished with current ctags output? If we can, then this is probably the way to go about fixing this. In short, try to keep the relationships in DB clear and concrete. Mixing them or vague relationships will only spell trouble in later time. > > Actually the patch isn't complete because it lacks the support on > symbol-db-engine-priv.h, line 295. I didn't know you had it defined twice. Can you think of a way to define it once? For example, would it harm to use IAnjutaSymbol in symbol-db-engine-priv.[ch] directly? > I'm reopening the bug because we should do the decision whether to maintain the > things as they are (with special case) or try to fix them (don't know how much > work it could be - at that time I made this choice after some thoughts but I > don't remember the real reason for this). > The things as they were before was broken, so keeping it open and try to fix it proper is the way to go. It will also give us the insentive to come up with the right solution sooner too. Can you check if my proposal above is realistic? > Secondly, for the cxx engine parser and for these kinds of bugfixes I think we > really need a testing unit plugin. > I started something at cxxparser coding time, but then left there because the > real cxxparser uses a IAnjutaSymbolManager instance, which is available when > Anjuta is offline. Using mocks and stubs wasn't my best aim, too much > dependencies and too much difficulties to implement that in C. Integrating > something like > http://herzi.github.com/gutachter/ > would be great. > This would avoid us to fall into regressions. > what do you think? Unit tests would definitely be of good use, but it take lot of time to maintain it, so we have to do it in certain crucial areas only. symbol-db is such place definitely.
(In reply to comment #22) > (In reply to comment #21) > > well, unfortunately it affects the completion of such things: > > > > > > typedef struct _asd { > > char a; > > int b; > > > > } asd; > > > > void foo () > > { > > asd a; > > a. > > > > } > > > > If you check the db, it does not claim 'a' and 'b' to be in scope of asd > anyways. So claiming that asd is a container seems bogus. I don't know how > autocompletion manages to get members of "asd->". Can you explain how it is > determined? > Actually, this still seems to work despite the change. Your example with asd also works. For example, I have this: typedef struct _SymbolDBModelFilePriv SymbolDBModelFilePriv; struct _SymbolDBModelFilePriv { gchar *file_path; guint refresh_queue_id; GdaStatement *stmt; GdaSet *params; }; void func () { SymbolDBModelFilePriv *priv; priv-> [works] } Also, grepping symbol-db for SYMTYPE_SCOPE_CONTAINER finds no one else, except the new is_container thing, using it, so it's at least safe for inside symbol-db. So, I don't see anything going wrong.
(In reply to comment #23) > Actually, this still seems to work despite the change. Your example with asd > also works. > Nevermind that. I was dreaming. Recompiling language-support-cpp-java plugin was needed and it doesn't work anymore.
(In reply to comment #22) > (In reply to comment #21) > > well, unfortunately it affects the completion of such things: > > > > > > typedef struct _asd { > > char a; > > int b; > > > > } asd; > > > > void foo () > > { > > asd a; > > a. > > > > } > > > > If you check the db, it does not claim 'a' and 'b' to be in scope of asd > anyways. So claiming that asd is a container seems bogus. I don't know how > autocompletion manages to get members of "asd->". Can you explain how it is > determined? > Looking at the source of engine-parser.c, now I understand how it works. In function EngineParser::getCurrentSearchableScope (), you have following code: /* is it a typedef? In that case find the parent struct */ if (g_strcmp0 (ianjuta_symbol_get_extra_info_string (node, IANJUTA_SYMBOL_FIELD_KIND, NULL), "typedef") == 0) So, the original hack to treat typedef as scope container seems like just a convenience to include in search results executed before: ianjuta_symbol_manager_search_project(IANJUTA_SYMBOL_SCOPE_CONTAINER). If the search needs to be performed for containers *and* exceptionally for typedefs, an OR operator is more appropriate. Then, typdef neededn't be treated as container unnecessarily. I have committed http://git.gnome.org/browse/anjuta/commit/?id=07c288c1b802409eece62cdfc4858804c90c9167 to restore the typedef hack, so now autocompletion for them works (the part in switchMemberToContainer() was not necessary, which I reverted in following commit). So, about remaining typedef hack in cxxparser, you still need to get rid of it by treating typeref relationship separately like I explained in previous comment (call it "alias" in DB). That will make cxxparser more versatile to chain up easily. I will file a separate bug to record it so that we don't forget it.
sorry for delay in answering, I was totally busy. I have to test and think about the new 'alias' column in symbol-db, as it wasn't thought at design time: I'd like to be sure that we didn't leave it out for a reason or if we just forgot about this case. What you say is actually correct and makes sense. The fixing of this bug (or actually for the bug you opened) isn't immediate anyway. It affects population, querying and cxxparser. I was in the middle of having a redesing of the population flow, so things are a little bit more complicated. btw: which is the bug you opened to fix the typedef thing?
(In reply to comment #22) > > Unit tests would definitely be of good use, but it take lot of time to maintain > it, so we have to do it in certain crucial areas only. symbol-db is such place > definitely. I agree. I was thinking to enable a test case widget only for symbol-db and maybe under a #ifdef DEBUG condition. Same could be done for cxxparser. What do you think is the best option? Write a debug-only plugin to test these cases?