GNOME Bugzilla – Bug 615403
namespaces, and scope relationship in general, are weird in database
Last modified: 2020-11-06 20:23:00 UTC
A typical select from the database:

sqlite> select symbol_id, name, scope_definition_id, scope_id, file_position, file_path from symbol join file on symbol.file_defined_id = file_id where file.file_path = '/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h' order by file_position;
3885|__NEMIVER_SQLITE_CNX_DRV_H__|2249|0|27|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3886|nemiver|1580|0|31|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3887|common|1549|1580|33|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3888|common|1549|1580|37|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3889|sqlite|2250|1549|38|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3910|SqliteCnxDrv|2265|2250|40|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
...
...

One might assume that 'SqliteCnxDrv' is in the scope of the 'sqlite' namespace (sym_id = 3889) by virtue of its scope_id == scope_definition_id = 2250. But that is not the case, because there are many 'sqlite' symbols, all with scope_definition_id = 2250.

sqlite> select symbol_id, name, scope_definition_id from symbol where scope_definition_id = 2250;
3889|sqlite|2250
3909|sqlite|2250
3944|sqlite|2250
sqlite>

The same applies to the common (id = 3887) -> nemiver mapping in the above example.
sqlite> select symbol_id, name, scope_definition_id from symbol where scope_definition_id = 1580;
2518|nemiver|1580
2555|nemiver|1580
2589|nemiver|1580
2716|nemiver|1580
2754|nemiver|1580
3082|nemiver|1580
3413|nemiver|1580
3519|nemiver|1580
3558|nemiver|1580
3580|nemiver|1580
3605|nemiver|1580
3658|nemiver|1580
3771|nemiver|1580
3793|nemiver|1580
3886|nemiver|1580
3907|nemiver|1580
3942|nemiver|1580
4001|nemiver|1580
4013|nemiver|1580
4026|nemiver|1580
4054|nemiver|1580
4081|nemiver|1580
4088|nemiver|1580
4101|nemiver|1580
4111|nemiver|1580
4127|nemiver|1580
4142|nemiver|1580
4152|nemiver|1580
4179|nemiver|1580
4220|nemiver|1580
4238|nemiver|1580
4260|nemiver|1580
4333|nemiver|1580
4385|nemiver|1580
4401|nemiver|1580
4541|nemiver|1580
4554|nemiver|1580
4578|nemiver|1580
4589|nemiver|1580
4640|nemiver|1580
4703|nemiver|1580
4821|nemiver|1580
4826|nemiver|1580
4846|nemiver|1580
4862|nemiver|1580
4870|nemiver|1580
4896|nemiver|1580
4897|nemiver|1580
4923|nemiver|1580
4933|nemiver|1580
4964|nemiver|1580
4985|nemiver|1580
4989|nemiver|1580
4999|nemiver|1580
5083|nemiver|1580
5088|nemiver|1580
5149|nemiver|1580
5154|nemiver|1580
5167|nemiver|1580
5178|nemiver|1580
5269|nemiver|1580
5296|nemiver|1580
5342|nemiver|1580
5365|nemiver|1580
5694|nemiver|1580
5713|nemiver|1580
sqlite>

Basically, there is no child-parent relationship in the database. That is, given a symbol A, you can't find its parent symbol B. All you can find is a group of symbols (all having the same name), one of which is the real parent (but you can't know which).

Namespaces are populated as symbols in the 'symbol' table, yet they are not treated like the other symbols there. This results in unnecessary exceptions in applications trying to use the 'symbol' table. It indicates an inconsistency in the database.

One possibility is to separate the parent/child relationship from the scope relationship in the following way:

1) A new symbol kind called 'namespace_def' could be introduced to represent file symbols being found.
They would essentially be the ones currently used as namespaces in the database. The 'namespace' kind can then be used for real namespaces (the logical ones, so they are not duplicated).

2) A new column called 'parent' can be introduced in the 'symbol' table. This would point to the 'real' parent symbol_id (e.g. the respective namespace_def symbols, or classes), strictly based on their definitions found in files.

3) A symbol's scope_id would point to its scope symbol, a 'class' or 'namespace' symbol for example. In the non-namespace case, parent and scope would be the same (e.g. for class members), but that's okay.

So, for the original example, it would be:

ID | name         | kind          | parent | scope | file | line |
------------------------------------------------------------------
1  | sqlite       | namespace     | ...    | ...   |      |      |
2  | sqlite       | namespace_def | -1     | -1    | a.c  | 10   |
3  | sqlite       | namespace_def | -1     | -1    | b.c  | 15   |
5  | SqliteCnxDrv | class         | 3      | 1     | b.c  | 16   |
6  | m_data       | member        | 5      | 5     | b.c  | 17   |
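A minimal sketch of the difference, using Python's standard sqlite3 module and an in-memory database. The first table uses the column names from the queries above and rows from the report. The scope_id values for 3909 and 3944 are not shown in the report and are assumed here. The second table, symbol_proposed, is a hypothetical name for the proposed layout, with the "..." cells stored as NULL. It is an illustration only, not the symbol-db schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Current layout: a child only knows a scope_id, which several rows share
# as their scope_definition_id.
con.execute("""CREATE TABLE symbol (
    symbol_id INTEGER PRIMARY KEY, name TEXT,
    scope_definition_id INTEGER, scope_id INTEGER)""")
con.executemany("INSERT INTO symbol VALUES (?, ?, ?, ?)", [
    (3889, "sqlite", 2250, 1549),
    (3909, "sqlite", 2250, 1549),        # scope_id assumed for illustration
    (3944, "sqlite", 2250, 1549),        # scope_id assumed for illustration
    (3910, "SqliteCnxDrv", 2265, 2250),
])

def parent_candidates(symbol_id):
    """Return every symbol whose scope_definition_id matches the child's scope_id."""
    rows = con.execute("""
        SELECT p.symbol_id FROM symbol AS c
        JOIN symbol AS p ON p.scope_definition_id = c.scope_id
        WHERE c.symbol_id = ? ORDER BY p.symbol_id""", (symbol_id,))
    return [r[0] for r in rows]

# Proposed layout: explicit 'parent' (definition) and 'scope' (logical) columns.
con.execute("""CREATE TABLE symbol_proposed (
    id INTEGER PRIMARY KEY, name TEXT, kind TEXT,
    parent INTEGER, scope INTEGER, file TEXT, line INTEGER)""")
con.executemany("INSERT INTO symbol_proposed VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "sqlite", "namespace", None, None, None, None),
    (2, "sqlite", "namespace_def", -1, -1, "a.c", 10),
    (3, "sqlite", "namespace_def", -1, -1, "b.c", 15),
    (5, "SqliteCnxDrv", "class", 3, 1, "b.c", 16),
    (6, "m_data", "member", 5, 5, "b.c", 17),
])

def parent_and_scope(symbol_id):
    """Return (parent id, parent kind, scope id, scope kind) rows for a symbol."""
    return con.execute("""
        SELECT p.id, p.kind, s.id, s.kind FROM symbol_proposed AS c
        JOIN symbol_proposed AS p ON p.id = c.parent
        JOIN symbol_proposed AS s ON s.id = c.scope
        WHERE c.id = ?""", (symbol_id,)).fetchall()

print(parent_candidates(3910))  # three candidates, no way to pick one
print(parent_and_scope(5))      # exactly one parent and one scope
```

With the current layout the join returns all three 'sqlite' rows for SqliteCnxDrv. With the proposed columns the same question has a single answer: the namespace_def in b.c as parent and the logical namespace as scope.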
(In reply to comment #0)
> A typical select from database:
>
> sqlite> select symbol_id, name, scope_definition_id, scope_id, file_position,
> file_path from symbol join file on symbol.file_defined_id = file_id where
> file.file_path = '/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h' order by
> file_position;
> 3885|__NEMIVER_SQLITE_CNX_DRV_H__|2249|0|27|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3886|nemiver|1580|0|31|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3887|common|1549|1580|33|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3888|common|1549|1580|37|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3889|sqlite|2250|1549|38|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3910|SqliteCnxDrv|2265|2250|40|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> ...
> ...
>
> One might assume that 'SqliteCnxDrv' is in the scope of 'sqlite' namespace
> (sym_id = 3889) by virtue of its scope_id == scope_definition_id = 2250. But
> that is not the case because there are many 'sqlite' symbols, all with
> scope_definition_id = 2250.
>
> sqlite> select symbol_id, name, scope_definition_id from symbol where
> scope_definition_id = 2250;
> 3889|sqlite|2250
> 3909|sqlite|2250
> 3944|sqlite|2250
> sqlite>
>
> Same applies for common (id = 3887) -> nemiver mapping in above example.
>
> sqlite> select symbol_id, name, scope_definition_id from symbol where
> scope_definition_id = 1580;
> 2518|nemiver|1580
> 2555|nemiver|1580
> 2589|nemiver|1580
> 2716|nemiver|1580
> 2754|nemiver|1580
> 3082|nemiver|1580
> 3413|nemiver|1580
> 3519|nemiver|1580
> 3558|nemiver|1580
> 3580|nemiver|1580
> 3605|nemiver|1580
> 3658|nemiver|1580
> 3771|nemiver|1580
> 3793|nemiver|1580
> 3886|nemiver|1580
> 3907|nemiver|1580
> 3942|nemiver|1580
> 4001|nemiver|1580
> 4013|nemiver|1580
> 4026|nemiver|1580
> 4054|nemiver|1580
> 4081|nemiver|1580
> 4088|nemiver|1580
> 4101|nemiver|1580
> 4111|nemiver|1580
> 4127|nemiver|1580
> 4142|nemiver|1580
> 4152|nemiver|1580
> 4179|nemiver|1580
> 4220|nemiver|1580
> 4238|nemiver|1580
> 4260|nemiver|1580
> 4333|nemiver|1580
> 4385|nemiver|1580
> 4401|nemiver|1580
> 4541|nemiver|1580
> 4554|nemiver|1580
> 4578|nemiver|1580
> 4589|nemiver|1580
> 4640|nemiver|1580
> 4703|nemiver|1580
> 4821|nemiver|1580
> 4826|nemiver|1580
> 4846|nemiver|1580
> 4862|nemiver|1580
> 4870|nemiver|1580
> 4896|nemiver|1580
> 4897|nemiver|1580
> 4923|nemiver|1580
> 4933|nemiver|1580
> 4964|nemiver|1580
> 4985|nemiver|1580
> 4989|nemiver|1580
> 4999|nemiver|1580
> 5083|nemiver|1580
> 5088|nemiver|1580
> 5149|nemiver|1580
> 5154|nemiver|1580
> 5167|nemiver|1580
> 5178|nemiver|1580
> 5269|nemiver|1580
> 5296|nemiver|1580
> 5342|nemiver|1580
> 5365|nemiver|1580
> 5694|nemiver|1580
> 5713|nemiver|1580
> sqlite>
>
> Basically, there is no child-parent relationship in database. That is, given a
> symbol A, you can't find the parent symbol B. All you can find is a group of
> symbols (all having same name), one of which is real parent (but you can't
> know).

Well, I was able to find the parent using the symbol_db_engine_get_parent_scope_id_by_symbol_id () function.

>
> namespaces are populated as symbols in 'symbol' table, yet they are not treated
> as other symbols in there. This results in unnecessary exceptions in
> application trying to use 'symbol' table.
> It indicates inconsistency in
> database.
>

Now that I think about it, classes and their implementations could have the same problem. The implementation could be spread over several files, and that's a problem just like with namespaces.

> One possibility is to separate out parent/child relationship from scope
> relationship in following ways:
>
> 1) A new symbol kind called 'namespace_def' could be introduced to represent
> file symbols being found. They would essentially be the ones currently used as
> namespace in database. And 'namespace' kind can then be used for real
> namespaces (the logical ones, so no duplicating of them).
>

I see problems here when, for instance, you change its name or line position (so the sym_id changes) and then return to the original state. All the children related to it will be broken. The updating step then becomes difficult.

> 2) A new column called 'parent' can be introduced in 'symbol' table. This would
> point to 'real' parent symbol_id (e.g. respective namespace_def symbols, or
> classes), strictly based on their definitions found in files.
>

Well, I see some practical problems in implementing this second point, though in theory it is probably a good idea. Ctags doesn't give us the parent/child relationship, and looking for the parent isn't easy, especially if there are nested namespaces.

> 3) A symbol's scope_id would point to its scope symbol - a 'class' or
> 'namespace' symbol for example. In non-namespaces case, parent and scope would
> be same (e.g. for class members), but that's okay.
>
> So, for example in original example, it would be:
>
> ID | name         | kind          | parent | scope | file | line |
> ------------------------------------------------------------------
> 1  | sqlite       | namespace     | ...    | ...   |      |      |
> 2  | sqlite       | namespace_def | -1     | -1    | a.c  | 10   |
> 3  | sqlite       | namespace_def | -1     | -1    | b.c  | 15   |
> 5  | SqliteCnxDrv | class         | 3      | 1     | b.c  | 16   |
> 6  | m_data       | member        | 5      | 5     | b.c  | 17   |

This needs a whole reimplementation of just about all the logic involved with symbol-db.
We must decide which is more important, speed or correctness, because there's no time for everything. Please note that making everything usable is not trivial and may require a lot of work.
> this needs a whole reimplementation of about all the logics involved with
> symbol-db.
> We must decide what is more important, because there's no time for everything,
> whether the speed or the correctness. Please note that making everything usable
> is not trivial and may require a lot of work.

In the end it's always stability, correctness, speed, in this order. But I see that it is definitely not trivial to do. It would probably be better to first get this working in the new project-manager branch and start the internal reorganisation later.
The new project manager shouldn't change much for the symbol-db plugin. Currently, the IAnjutaProjectManager is almost the same. I think the main difference is that the project manager looks for all the headers of every package, so the symbol-db plugin doesn't need to do it, but this is optional. The symbol-db plugin can still get all the packages and fetch the headers itself.
> I think the main difference is that the project manager is looking for all
> headers of every packages, so the symbol-db plugin doesn't need to do it but
> it's optional. The symbol-db plugin can still get all packages and get the
> headers itself.

pkg-config 0.24, which will be released soon, will hopefully support the --print-requires switch, which allows querying all directly required packages, so it will be easy to track which include path belongs to which package.
Ok then. I have no problem with trying to improve the stability of the scanning, but this would probably leave out bug #565773 again for 3.0.

I'll then follow these steps now:
1. remove some mutexes in queries (should be easy and quick)
2. fix this (and similar) kinds of bugs.
(In reply to comment #5)
> Ok then. I have no problems in trying to improve the stability of the scanning,
> but this would probably leave out bug #565773 again for 3.0.
>

Oh, no. Please don't take these bug reports as a change in priority. I am filing them as I come across them because we will need to address them sooner or later. They are normal severity for a reason.

Besides, I don't think we have reached any conclusion on how to fix this. I will follow up the discussion in further comments. You may give your input occasionally, time permitting.

> I'll then follow these steps now:
> 1. remove some mutexes in queries (should be easy and quick)

If you read my last email on the ML, I offered to help with the query cleanup. I could take care of it together.

> 2. fix this (and similar) kind of bugs.

Keep them open until the priorities are right. You may keep these bugs in mind as you go along, if they matter in the course of your redesign.
(In reply to comment #6)
> (In reply to comment #5)
> > Ok then. I have no problems in trying to improve the stability of the scanning,
> > but this would probably leave out bug #565773 again for 3.0.
> >
> Oh, no. By no means please don't take these bug reports as change in priority.
> I am filing them as I come across them because we would need to address sooner
> or later. They are normal severity for a reason.
>
> Besides, I don't think we have reached any conclusion on how to fix this. I
> will follow up the discussion in further comments. You may give your inputs
> occasionally, time permitted.

OK. Indeed, I also think that a deep analysis should be done before writing a single line of code. I'll keep thinking about a good way to solve our problems.

>
> > I'll then follow these steps now:
> > 1. remove some mutexes in queries (should be easy and quick)
>
> If you read my last email in ML, I offered to help with the queries clean up. I
> could take care of it together.
>
> > 2. fix this (and similar) kind of bugs.
>
> Keep them open until the priorities are right. You may keep these bugs in mind
> as you go along if they mattered in course of your redesign.

OK then. My opinion is that these kinds of bugs were difficult to predict when we designed the db. Now practice has shown some problems, and they should be fixed, hopefully with minimal impact. I've just seen your mail on the ML; I'm going to answer there.
(In reply to comment #1)
> >
> > Basically, there is no child-parent relationship in database. That is, given a
> > symbol A, you can't find the parent symbol B. All you can find is a group of
> > symbols (all having same name), one of which is real parent (but you can't
> > know).
>
> Weel, I was able to know the parent using
> symbol_db_engine_get_parent_scope_id_by_symbol_id () function.
>

The bug I am trying to point out is that a symbol in the DB points to N parents in the database, so you can't know which one is real. You can do all sorts of tricks and checks to eventually approximate the parent (which I think is what symbol_db_engine_get_parent_scope_id_by_symbol_id() tries to do), but this bug is exactly about not having to do that kind of thing.

> >
> > namespaces are populated as symbols in 'symbol' table, yet they are not treated
> > as other symbols in there. This results in unnecessary exceptions in
> > application trying to use 'symbol' table. It indicates inconsistency in
> > database.
> >
>
> now that I think: classes and their implementations could have the same
> problem.
> The implementation could be spread over more files, and that's a problem just
> like namespaces.
>

Classes are fine as they are now (except that a member's scope is also duplicated as its parent); see the dummy table I drew. The members may be spread over different files, but the class definition is found in only one place, so all its members point to the same parent. That is unlike namespaces. Or did I not understand you clearly?

>
> I see here problems when you for instance change its name or line position (so
> sym_id changes) and then returning to the original state.
> All the children related to this'll be broken.
> The updating step is then difficult.
>

A good point. That indicates that our use of symbol_id as the means to associate a parent or scope symbol is not going to work well. Perhaps we can then consider a quark, like you already do with scope_id/scope_definition_id.
ID | quark | name         | kind          | parent | scope | file | line |
--------------------------------------------------------------------------
1  | 11    | sqlite       | namespace     | ...    | ...   |      |      |
2  | 12    | sqlite       | namespace_def | -1     | -1    | a.c  | 10   |
3  | 13    | sqlite       | namespace_def | -1     | -1    | b.c  | 15   |
5  | 14    | SqliteCnxDrv | class         | 13     | 11    | b.c  | 16   |
6  | 15    | m_data       | member        | 14     | 14    | b.c  | 17   |

The persistent quarks can be generated from the symbol + file combination, essentially through a separate table (like the current scope table), or through a hashing function.

>
> well, I find some practical problems implementing this second point, though
> probably in theory this is a good idea.
> Ctags doesn't give us the parent/child relationship, and the way to look for
> parent isn't easy, especially if there are nested namespaces.
>

I don't know what ctags calls it when it says "symbol x belongs inside symbol y", but I think that is all that is needed for this. Here is some rough pseudocode:

1) As soon as you encounter a 'container' kind, generate a quark and drop in the symbol.
2) If ctags says the symbol belongs inside symbol y, get the last/largest quark for symbol y and set it as the parent and scope of this symbol. If the parent is a namespace_def, we want to set the scope to the real namespace instead.
3) If ctags says the symbol is of namespace kind, call it namespace_def kind in our DB. Check whether the namespace symbol exists, and create one if it doesn't.

Rinse, repeat. Will it work? I don't know. I guess one has to experiment a bit to see what works and what doesn't. Do you have a better solution for this which might be more realistic?

>
> this needs a whole reimplementation of about all the logics involved with
> symbol-db.

Why would it change the whole logic? All it does is add a new column, so existing queries should not even notice it (beyond a trivial rename for the quark column and s/namespace/namespace_def/). The parts mostly affected are the population code and some new maintenance triggers.
> We must decide what is more important, because there's no time for everything,
> whether the speed or the correctness.

In most real-life cases, speed is part of correctness. Only in a very few cases is correctness genuinely compromised for speed, but we frequently make the mistake of thinking otherwise. A simple example is sorting algorithms (fast ones aren't necessarily less correct).

In fact, fixing this bug optimizes several queries (see the blocking bugs, which all require hacks that slow down their queries). It will also speed up autocompletion, because parents will be known instantly, etc.

On the other hand, I don't think population will be affected much, because there are no more DB operations than currently, except maybe one extra DB query (for finding the namespace of a namespace_def symbol).
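The three pseudocode steps in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not symbol-db code. The input format (name, kind, container, file, line) is a hypothetical simplification of what a tag scanner reports. The populate function and its dictionaries are invented for the example, quarks are simply handed out sequentially starting at 11 to mirror the table, and containers are looked up by bare name, which would not be enough for nested or same-named scopes.

```python
from itertools import count

def populate(tags):
    """Assign quark, parent and scope to tags given in file order.

    tags: iterable of (name, kind, container, file, line), where container
    is the name of the enclosing symbol or None (hypothetical input format).
    """
    next_quark = count(11)   # arbitrary start value, mirrors the example table
    rows = []                # resulting symbol rows, in insertion order
    latest = {}              # name -> most recent definition row with that name
    namespaces = {}          # name -> logical 'namespace' row

    def add(name, kind, parent, scope, file, line):
        row = dict(quark=next(next_quark), name=name, kind=kind,
                   parent=parent, scope=scope, file=file, line=line)
        rows.append(row)
        return row

    for name, kind, container, file, line in tags:
        parent = scope = -1
        if container is not None:
            # Step 2: the last quark seen for the container is the parent.
            owner = latest[container]
            parent = scope = owner["quark"]
            if owner["kind"] == "namespace_def":
                # The scope should be the logical namespace instead.
                scope = namespaces[container]["quark"]
        if kind == "namespace":
            # Step 3: make sure the logical namespace exists, then store
            # the per-file occurrence as a namespace_def.
            if name not in namespaces:
                namespaces[name] = add(name, "namespace", None, None, None, None)
            kind = "namespace_def"
        # Step 1: every stored symbol gets its own quark.
        latest[name] = add(name, kind, parent, scope, file, line)
    return rows

example = [
    ("sqlite", "namespace", None, "a.c", 10),
    ("sqlite", "namespace", None, "b.c", 15),
    ("SqliteCnxDrv", "class", "sqlite", "b.c", 16),
    ("m_data", "member", "SqliteCnxDrv", "b.c", 17),
]
for row in populate(example):
    print(row["quark"], row["name"], row["kind"], row["parent"], row["scope"])
```

For this input, SqliteCnxDrv gets the namespace_def from b.c (quark 13) as parent and the logical namespace (quark 11) as scope, while m_data gets the class quark (14) for both.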
bugzilla.gnome.org is being replaced by gitlab.gnome.org. We are closing all old bug reports in Bugzilla which have not seen updates for many years. If you can still reproduce this issue in a currently supported version of GNOME (currently that would be 3.38), then please feel free to report it at https://gitlab.gnome.org/GNOME/anjuta/-/issues/ Thank you for reporting this issue and we are sorry it could not be fixed.