Bug 615403 – namespaces, and scope relationship in general, are weird in database


Summary:	namespaces, and scope relationship in general, are weird in database


Status:	RESOLVED OBSOLETE

Product:	anjuta
Classification:	Applications
Component:	plugins: symbol-db
Version:	git master
Hardware:	Other Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Massimo Cora'
QA Contact:	Anjuta maintainers

URL:
Whiteboard:

Depends on:
Blocks:	615429 617472 620880

Reported:	2010-04-10 22:14 UTC by Naba Kumar
Modified:	2020-11-06 20:23 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Naba Kumar 2010-04-10 22:14:07 UTC

A typical select from database:

sqlite> select symbol_id, name, scope_definition_id, scope_id, file_position,
file_path from symbol join file on symbol.file_defined_id = file_id where
file.file_path = '/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h' order by
file_position;
3885|__NEMIVER_SQLITE_CNX_DRV_H__|2249|0|27|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3886|nemiver|1580|0|31|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3887|common|1549|1580|33|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3888|common|1549|1580|37|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3889|sqlite|2250|1549|38|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
3910|SqliteCnxDrv|2265|2250|40|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
...
...

One might assume that 'SqliteCnxDrv' is in the scope of the 'sqlite' namespace (sym_id = 3889) because its scope_id equals that symbol's scope_definition_id (2250). But that cannot be concluded, because there are several 'sqlite' symbols, all with scope_definition_id = 2250.

sqlite> select symbol_id, name, scope_definition_id from symbol where scope_definition_id = 2250;
3889|sqlite|2250
3909|sqlite|2250
3944|sqlite|2250
sqlite> 

The same applies to the common (id = 3887) -> nemiver mapping in the above example.

sqlite> select symbol_id, name, scope_definition_id from symbol where scope_definition_id = 1580;
2518|nemiver|1580
2555|nemiver|1580
2589|nemiver|1580
2716|nemiver|1580
2754|nemiver|1580
3082|nemiver|1580
3413|nemiver|1580
3519|nemiver|1580
3558|nemiver|1580
3580|nemiver|1580
3605|nemiver|1580
3658|nemiver|1580
3771|nemiver|1580
3793|nemiver|1580
3886|nemiver|1580
3907|nemiver|1580
3942|nemiver|1580
4001|nemiver|1580
4013|nemiver|1580
4026|nemiver|1580
4054|nemiver|1580
4081|nemiver|1580
4088|nemiver|1580
4101|nemiver|1580
4111|nemiver|1580
4127|nemiver|1580
4142|nemiver|1580
4152|nemiver|1580
4179|nemiver|1580
4220|nemiver|1580
4238|nemiver|1580
4260|nemiver|1580
4333|nemiver|1580
4385|nemiver|1580
4401|nemiver|1580
4541|nemiver|1580
4554|nemiver|1580
4578|nemiver|1580
4589|nemiver|1580
4640|nemiver|1580
4703|nemiver|1580
4821|nemiver|1580
4826|nemiver|1580
4846|nemiver|1580
4862|nemiver|1580
4870|nemiver|1580
4896|nemiver|1580
4897|nemiver|1580
4923|nemiver|1580
4933|nemiver|1580
4964|nemiver|1580
4985|nemiver|1580
4989|nemiver|1580
4999|nemiver|1580
5083|nemiver|1580
5088|nemiver|1580
5149|nemiver|1580
5154|nemiver|1580
5167|nemiver|1580
5178|nemiver|1580
5269|nemiver|1580
5296|nemiver|1580
5342|nemiver|1580
5365|nemiver|1580
5694|nemiver|1580
5713|nemiver|1580
sqlite> 

Basically, there is no child-parent relationship in the database. That is, given a symbol A, you can't find its parent symbol B. All you can find is a group of symbols (all having the same name), one of which is the real parent, but you can't know which.
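The ambiguity can be reproduced with a few of the rows shown above. This is a minimal sketch using Python's sqlite3 module and an in-memory table that holds only the four columns relevant here. The real 'symbol' table has more columns, and the scope_id values of symbols 3909 and 3944 are not shown in this report, so they are left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbol (symbol_id INTEGER PRIMARY KEY, name TEXT, "
    "scope_definition_id INTEGER, scope_id INTEGER)"
)
# Rows taken from the query output in this report.
# scope_id is unknown for 3909 and 3944, hence None (NULL).
conn.executemany(
    "INSERT INTO symbol VALUES (?, ?, ?, ?)",
    [
        (3889, "sqlite", 2250, 1549),
        (3909, "sqlite", 2250, None),
        (3944, "sqlite", 2250, None),
        (3910, "SqliteCnxDrv", 2265, 2250),
    ],
)

# Try to find "the" parent of SqliteCnxDrv by matching its scope_id
# against scope_definition_id, which is all the current schema offers.
candidates = [
    row[0]
    for row in conn.execute(
        "SELECT parent.symbol_id FROM symbol AS child "
        "JOIN symbol AS parent "
        "  ON parent.scope_definition_id = child.scope_id "
        "WHERE child.symbol_id = 3910 "
        "ORDER BY parent.symbol_id"
    )
]
print(candidates)
```

The join returns every 'sqlite' symbol rather than a single row, so the caller has to guess which one is the real parent.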

Namespaces are populated as symbols in the 'symbol' table, yet they are not treated like the other symbols in there. This results in unnecessary special cases in any application trying to use the 'symbol' table. It indicates an inconsistency in the database.

One possibility is to separate the parent/child relationship from the scope relationship in the following way:

1) A new symbol kind called 'namespace_def' could be introduced to represent the namespace symbols as they are found in files. These would essentially be the ones currently used as namespaces in the database. The 'namespace' kind can then be used for real namespaces (the logical ones, so no duplicates of them).

2) A new column called 'parent' can be introduced in the 'symbol' table. This would point to the 'real' parent symbol_id (e.g. the respective namespace_def symbols, or classes), strictly based on the definitions found in files.

3) A symbol's scope_id would point to its scope symbol, for example a 'class' or 'namespace' symbol. In the non-namespace case, parent and scope would be the same (e.g. for class members), but that's okay.

So, for example in original example, it would be:

ID | name         | kind          | parent | scope | file | line  |
-------------------------------------------------------------------
1  | sqlite       | namespace     | ...    | ...   |      |       |
2  | sqlite       | namespace_def | -1     | -1    | a.c  | 10    |
3  | sqlite       | namespace_def | -1     | -1    | b.c  | 15    |
5  | SqliteCnxDrv | class         | 3      | 1     | b.c  | 16    |
6  | m_data       | member        | 5      | 5     | b.c  | 17    |
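To illustrate how lookups would behave under this proposal, here is a small sketch that loads the example table above into an in-memory SQLite database and resolves parent and scope with one join each. The column names follow the table above; the helper `related` is hypothetical and not part of symbol-db. The parent and scope of row 1 are elided in the table, so they are stored as NULL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbol (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, "
    "parent INTEGER, scope INTEGER, file TEXT, line INTEGER)"
)
conn.executemany(
    "INSERT INTO symbol VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "sqlite", "namespace", None, None, None, None),
        (2, "sqlite", "namespace_def", -1, -1, "a.c", 10),
        (3, "sqlite", "namespace_def", -1, -1, "b.c", 15),
        (5, "SqliteCnxDrv", "class", 3, 1, "b.c", 16),
        (6, "m_data", "member", 5, 5, "b.c", 17),
    ],
)


def related(symbol_id, column):
    """Return (id, name, kind) of the parent or scope symbol, or None."""
    if column not in ("parent", "scope"):
        raise ValueError("column must be 'parent' or 'scope'")
    return conn.execute(
        f"SELECT t.id, t.name, t.kind FROM symbol AS s "
        f"JOIN symbol AS t ON t.id = s.{column} WHERE s.id = ?",
        (symbol_id,),
    ).fetchone()


print(related(5, "parent"))  # the namespace_def block in b.c
print(related(5, "scope"))   # the single logical namespace
print(related(6, "parent"))  # the class that owns m_data
```

Each lookup yields at most one row, so no disambiguation among same-named symbols is needed.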

Comment 1 Massimo Cora' 2010-05-09 10:36:23 UTC

(In reply to comment #0)
> A typical select from database:
> 
> sqlite> select symbol_id, name, scope_definition_id, scope_id, file_position,
> file_path from symbol join file on symbol.file_defined_id = file_id where
> file.file_path = '/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h' order by
> file_position;
> 3885|__NEMIVER_SQLITE_CNX_DRV_H__|2249|0|27|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3886|nemiver|1580|0|31|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3887|common|1549|1580|33|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3888|common|1549|1580|37|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3889|sqlite|2250|1549|38|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> 3910|SqliteCnxDrv|2265|2250|40|/src/dbdimpl/sqlite/nmv-sqlite-cnx-drv.h
> ...
> ...
> 
> One might assume that 'SqliteCnxDrv' is in the scope of 'sqlite' namespace
> (sym_id = 3889) by virtue of its scope_id == scope_definition_id = 2250. But
> that is not the case because there are many 'sqlite' symbols, all with
> scope_definition_id = 2250.
> 
> sqlite> select symbol_id, name, scope_definition_id from symbol where
> scope_definition_id = 2250;
> 3889|sqlite|2250
> 3909|sqlite|2250
> 3944|sqlite|2250
> sqlite> 
> 
> Same applies for common (id = 3887) -> nemiver mapping in above example. 
> 
> sqlite> select symbol_id, name, scope_definition_id from symbol where
> scope_definition_id = 1580;
> 2518|nemiver|1580
> 2555|nemiver|1580
> 2589|nemiver|1580
> 2716|nemiver|1580
> 2754|nemiver|1580
> 3082|nemiver|1580
> 3413|nemiver|1580
> 3519|nemiver|1580
> 3558|nemiver|1580
> 3580|nemiver|1580
> 3605|nemiver|1580
> 3658|nemiver|1580
> 3771|nemiver|1580
> 3793|nemiver|1580
> 3886|nemiver|1580
> 3907|nemiver|1580
> 3942|nemiver|1580
> 4001|nemiver|1580
> 4013|nemiver|1580
> 4026|nemiver|1580
> 4054|nemiver|1580
> 4081|nemiver|1580
> 4088|nemiver|1580
> 4101|nemiver|1580
> 4111|nemiver|1580
> 4127|nemiver|1580
> 4142|nemiver|1580
> 4152|nemiver|1580
> 4179|nemiver|1580
> 4220|nemiver|1580
> 4238|nemiver|1580
> 4260|nemiver|1580
> 4333|nemiver|1580
> 4385|nemiver|1580
> 4401|nemiver|1580
> 4541|nemiver|1580
> 4554|nemiver|1580
> 4578|nemiver|1580
> 4589|nemiver|1580
> 4640|nemiver|1580
> 4703|nemiver|1580
> 4821|nemiver|1580
> 4826|nemiver|1580
> 4846|nemiver|1580
> 4862|nemiver|1580
> 4870|nemiver|1580
> 4896|nemiver|1580
> 4897|nemiver|1580
> 4923|nemiver|1580
> 4933|nemiver|1580
> 4964|nemiver|1580
> 4985|nemiver|1580
> 4989|nemiver|1580
> 4999|nemiver|1580
> 5083|nemiver|1580
> 5088|nemiver|1580
> 5149|nemiver|1580
> 5154|nemiver|1580
> 5167|nemiver|1580
> 5178|nemiver|1580
> 5269|nemiver|1580
> 5296|nemiver|1580
> 5342|nemiver|1580
> 5365|nemiver|1580
> 5694|nemiver|1580
> 5713|nemiver|1580
> sqlite> 
> 
> Basically, there is no child-parent relationship in database. That is, given a
> symbol A, you can't find the parent symbol B. All you can find is a group of
> symbols (all having same name), one of which is real parent (but you can't
> know).

Well, I was able to find the parent using the symbol_db_engine_get_parent_scope_id_by_symbol_id () function.

> 
> namespaces are populated as symbols in 'symbol' table, yet they are not treated
> as other symbols in there. This results in unnecessary exceptions in
> application trying to use 'symbol' table. It indicates inconsistency in
> database.
> 

Now that I think about it: classes and their implementations could have the same problem.
The implementation could be spread over several files, and that's a problem just like with namespaces.

> One possibility is to separate out parent/child relationship from scope
> relationship in following ways:
> 
> 1) A new symbol kind called 'namespace_def' could be introduced to represent
> file symbols being found. They would essentially be the ones currently used as
> namespace in database. And 'namespace' kind can then be used for real
> namespaces (the logical ones, so no duplicating of them).
> 

I see problems here when, for instance, you change its name or line position (so the sym_id changes) and then return to the original state.
All the children related to it will be broken.
The updating step then becomes difficult.

> 2) A new column called 'parent' can be introduced in 'symbol' table. This would
> point to 'real' parent symbol_id (e.g. respective namespace_def symbols, or
> classes), strictly based on their definitions found in files.
> 

Well, I see some practical problems in implementing this second point, though in theory it is probably a good idea.
Ctags doesn't give us the parent/child relationship, and looking up the parent isn't easy, especially if there are nested namespaces.

> 3) A symbol's scope_id would point to its scope symbol - a 'class' or
> 'namespace' symbol for example. In non-namespaces case, parent and scope would
> be same (e.g. for class members), but that's okay.
> 
> So, for example in original example, it would be:
> 
> ID | name         | kind          | parent | scope | file | line  |
> -------------------------------------------------------------------
> 1  | sqlite       | namespace     | ...    | ...   |      |       |
> 2  | sqlite       | namespace_def | -1     | -1    | a.c  | 10    |
> 3  | sqlite       | namespace_def | -1     | -1    | b.c  | 15    |
> 5  | SqliteCnxDrv | class         | 3      | 1     | b.c  | 16    |
> 6  | m_data       | member        | 5      | 5     | b.c  | 17    |

This needs a reimplementation of nearly all the logic in symbol-db.
We must decide which is more important, speed or correctness, because there's no time for everything. Please note that making everything usable is not trivial and may require a lot of work.

Comment 2 Johannes Schmid 2010-05-09 10:49:03 UTC

> this needs a whole reimplementation of about all the logics involved with
> symbol-db.
> We must decide what is more important, because there's no time for everything,
> whether the speed or the correctness. Please note that making everything usable
> is not trivial and may require a lot of work.

In the end it's always: stability, correctness, speed in this order.

But I see that it is definitely not trivial to do. It would probably be better to first get this working in the new project-manager branch and start the internal reorganisation later.

Comment 3 Sébastien Granjoux 2010-05-09 14:57:17 UTC

The new project manager shouldn't change much for the symbol-db plugin. Currently, the IAnjutaProjectManager is almost the same.

I think the main difference is that the project manager looks up all headers of every package, so the symbol-db plugin doesn't need to do it, but that is optional. The symbol-db plugin can still get all packages and find the headers itself.

Comment 4 Johannes Schmid 2010-05-09 19:38:01 UTC

> I think the main difference is that the project manager is looking for all
> headers of every packages, so the symbol-db plugin doesn't need to do it but
> it's optional. The symbol-db plugin can still get all packages and get the
> headers itself.

pkg-config 0.24, which will be released soon, will hopefully support the --print-requires switch, which allows querying all directly required packages, so it will be easy to track which include path belongs to which package.

Comment 5 Massimo Cora' 2010-05-09 21:11:52 UTC

Ok then. I have no problems in trying to improve the stability of the scanning, but this would probably leave out bug #565773 again for 3.0.

I'll then follow these steps now:
1. remove some mutexes in queries (should be easy and quick)
2. fix this (and similar) kind of bugs.

Comment 6 Naba Kumar 2010-05-09 21:47:18 UTC

(In reply to comment #5)
> Ok then. I have no problems in trying to improve the stability of the scanning,
> but this would probably leave out bug #565773 again for 3.0.
> 
Oh, no. Please don't take these bug reports as a change in priority. I am filing them as I come across them because we will need to address them sooner or later. They are normal severity for a reason.

Besides, I don't think we have reached any conclusion on how to fix this. I will follow up on the discussion in further comments. You may give your input occasionally, time permitting.

> I'll then follow these steps now:
> 1. remove some mutexes in queries (should be easy and quick)

If you read my last email on the ML, I offered to help with the query cleanup. We could take care of it together.

> 2. fix this (and similar) kind of bugs.

Keep them open until the priorities are right. You may keep these bugs in mind as you go along, in case they matter in the course of your redesign.

Comment 7 Massimo Cora' 2010-05-09 22:47:51 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > Ok then. I have no problems in trying to improve the stability of the scanning,
> > but this would probably leave out bug #565773 again for 3.0.
> > 
> Oh, no. By no means please don't take these bug reports as change in priority.
> I am filing them as I come cross them because we would need to address sooner
> or later. They are normal severity for a reason.
> 
> Besides, I don't think we have reached any conclusion on how to fix this. I
> will follow up the discussion in further comments. You may give your inputs
> occasionally, time permitted.

OK. Indeed, I also think that a deep analysis should be done before writing a single line of code. I'll keep thinking about a good way to solve our problems.

> 
> > I'll then follow these steps now:
> > 1. remove some mutexes in queries (should be easy and quick)
> 
> If you read my last email in ML, I offered to help with the queries clean up. I
> could take care of it together.
> 
> > 2. fix this (and similar) kind of bugs.
> 
> Keep them open until the priorities are right. You may keep these bugs in mind
> as you go along if they mattered in course of your redesign.

OK then. My opinion is that these kinds of bugs were difficult to predict when we designed the db. Now practice has shown some problems, and they should be fixed, hopefully with minimal impact.
I've just seen your mail on the ML; I'm going to answer there.

Comment 8 Naba Kumar 2010-05-09 23:35:30 UTC

(In reply to comment #1)
> > 
> > Basically, there is no child-parent relationship in database. That is, given a
> > symbol A, you can't find the parent symbol B. All you can find is a group of
> > symbols (all having same name), one of which is real parent (but you can't
> > know).
> 
> Well, I was able to know the parent using
> symbol_db_engine_get_parent_scope_id_by_symbol_id () function.
> 
The bug I am trying to point out is that a symbol in the DB points to N candidate parents, so you can't know which one is real.

You can do all sorts of tricks and checks to eventually approximate the parent (which I think is what symbol_db_engine_get_parent_scope_id_by_symbol_id() tries to do), but this bug is exactly about not having to do that kind of thing.

> > 
> > namespaces are populated as symbols in 'symbol' table, yet they are not treated
> > as other symbols in there. This results in unnecessary exceptions in
> > application trying to use 'symbol' table. It indicates inconsistency in
> > database.
> > 
> 
> now that I think: classes and their implementations could have the same
> problem.
> The implementation could be spread over more files, and that's a problem just
> like namespaces.
> 
Classes are fine as they are now (except that a member's scope duplicates its parent), see the dummy table I drew. The members may be spread over different files, but the class definition is found in only one place, so all its members point to the same parent. That is unlike namespaces. Or did I not understand you correctly?

> 
> I see here problems when you for instance change its name or line position (so
> sym_id changes) and then returning to the original state. 
> All the children related to this'll be broken.
> The updating step is then difficult.
> 
A good point. That indicates that using symbol_id as the means to associate a parent or scope symbol is not going to work well. Perhaps we can then consider a quark, like you already do with scope_id/scope_definition_id.

ID | quark | name         | kind          | parent | scope | file | line  |
---------------------------------------------------------------------------
1  |    11 | sqlite       | namespace     | ...    | ...   |      |       |
2  |    12 | sqlite       | namespace_def | -1     | -1    | a.c  | 10    |
3  |    13 | sqlite       | namespace_def | -1     | -1    | b.c  | 15    |
5  |    14 | SqliteCnxDrv | class         | 13     | 11    | b.c  | 16    |
6  |    15 | m_data       | member        | 14     | 14    | b.c  | 17    |

The persistent quarks can be generated from symbol + file combination, essentially through a separate table (like the current scope table), or through a hashing function.
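As a rough illustration of the hashing option, the sketch below derives a quark from the symbol name and file path only, so it stays the same when the symbol moves to another line. The function name `make_quark`, the `occurrence` parameter, and the choice of SHA-1 are all assumptions made for this example, not existing symbol-db code.

```python
import hashlib


def make_quark(name: str, file_path: str, occurrence: int = 0) -> int:
    """Derive a persistent integer quark from symbol name + file path.

    `occurrence` is a hypothetical disambiguator for files that open the
    same namespace more than once (like 'common' at positions 33 and 37
    in the report).
    """
    key = f"{file_path}\0{name}\0{occurrence}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    # Keep 63 bits so the value fits a signed 64-bit SQLite INTEGER.
    return int.from_bytes(digest[:8], "big") >> 1


before_edit = make_quark("sqlite", "b.c")  # namespace block at line 15
after_edit = make_quark("sqlite", "b.c")   # same block after it moved lines
other_file = make_quark("sqlite", "a.c")
print(before_edit == after_edit, before_edit == other_file)
```

Because the line number is not part of the key, children that reference the quark would survive an edit that only shifts the definition. A rename still changes the quark, so that case would need separate handling during updates.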

> 
> well, I find some practical problems implementing this second point, though
> probably in theory this is a good idea.
> Ctags doesn't give us the parent/child relationship, and the way to look for
> parent isn't easy, especially if there are nested namespaces.
> 
I don't know what ctags calls it when it says "symbol x belongs inside symbol y", but I think that is all that is needed for this. Here is some rough pseudocode:

1) As soon as you encounter a 'container' kind, generate a quark and store it with the symbol.

2) If ctags says the symbol belongs inside symbol y, get the last/largest quark for symbol y and set it as the parent and scope of this symbol. If the parent is a namespace_def, set the scope to the real namespace instead.

3) If ctags says the symbol is of namespace kind, call it namespace_def kind in our DB. Check whether the namespace symbol exists, and create one if it doesn't.

Rinse, repeat.
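The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the input is a list of dicts in file order with a 'scope' field naming the enclosing symbol (a stand-in for whatever ctags reports), quarks are simple counters, and `populate` and `CONTAINER_KINDS` are hypothetical names. Enclosing symbols are looked up by plain name, so nested or qualified namespaces are not handled.

```python
from itertools import count

CONTAINER_KINDS = {"namespace", "class", "struct"}  # assumed set


def populate(tags):
    next_quark = count(1)
    symbols = []      # rows of the would-be 'symbol' table
    latest = {}       # container name -> most recent row with that name
    namespaces = {}   # namespace name -> its single logical 'namespace' row

    for tag in tags:
        kind = tag["kind"]
        row = {"quark": next(next_quark), "name": tag["name"], "kind": kind,
               "parent": -1, "scope": -1,
               "file": tag["file"], "line": tag["line"]}

        # Step 2: link to the latest container carrying the enclosing name.
        enclosing = latest.get(tag.get("scope"))
        if enclosing is not None:
            row["parent"] = enclosing["quark"]
            if enclosing["kind"] == "namespace_def":
                # Scope is the logical namespace, not the per-file block.
                row["scope"] = namespaces[enclosing["name"]]["quark"]
            else:
                row["scope"] = enclosing["quark"]

        # Step 3: namespaces found in files become namespace_def rows,
        # and the logical namespace row is created once.
        if kind == "namespace":
            row["kind"] = "namespace_def"
            if tag["name"] not in namespaces:
                logical = {"quark": next(next_quark), "name": tag["name"],
                           "kind": "namespace", "parent": -1, "scope": -1,
                           "file": None, "line": None}
                namespaces[tag["name"]] = logical
                symbols.append(logical)

        # Step 1: remember containers so later symbols can point at them.
        if kind in CONTAINER_KINDS:
            latest[tag["name"]] = row
        symbols.append(row)
    return symbols


tags = [
    {"name": "sqlite", "kind": "namespace", "scope": None, "file": "a.c", "line": 10},
    {"name": "sqlite", "kind": "namespace", "scope": None, "file": "b.c", "line": 15},
    {"name": "SqliteCnxDrv", "kind": "class", "scope": "sqlite", "file": "b.c", "line": 16},
    {"name": "m_data", "kind": "member", "scope": "SqliteCnxDrv", "file": "b.c", "line": 17},
]
symbols = populate(tags)
for s in symbols:
    print(s["quark"], s["name"], s["kind"], s["parent"], s["scope"])
```

With this input the class gets the b.c namespace_def as its parent and the single logical namespace as its scope, while m_data gets the class as both parent and scope, which matches the shape of the example table.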

Will it work? I don't know. I guess one has to experiment a bit to see what works and what doesn't. Do you have a better solution for this which might be more realistic?

> 
> this needs a whole reimplementation of about all the logics involved with
> symbol-db.

Why would it change the whole logic? All it does is add a new column, so existing queries should not even notice it (beyond a trivial rename for the quark column and s/namespace/namespace_def/). The parts most affected are the population code and some new maintenance triggers.

> We must decide what is more important, because there's no time for everything,
> whether the speed or the correctness. 

In most real-life cases, speed is part of correctness. Only in very few cases is correctness genuinely compromised for speed, but we frequently make the mistake of thinking otherwise. A simple example is sorting algorithms (fast ones aren't necessarily less correct).

In fact, fixing this bug optimizes several queries (see the blocking bugs, which all require hacks that slow down their queries). It will also speed up autocompletion because parents will be known instantly, and so on. On the other hand, I don't think population will be much affected, because there are no more DB operations than currently, except maybe one extra DB query (for finding the namespace of a namespace_def symbol).

Comment 9 André Klapper 2020-11-06 20:23:00 UTC

bugzilla.gnome.org is being replaced by gitlab.gnome.org. We are closing all old bug reports in Bugzilla which have not seen updates for many years.

If you can still reproduce this issue in a currently supported version of GNOME (currently that would be 3.38), then please feel free to report it at https://gitlab.gnome.org/GNOME/anjuta/-/issues/

Thank you for reporting this issue and we are sorry it could not be fixed.