Bug 700673 – Parse natural language




Summary:	Parse natural language


Status:	RESOLVED OBSOLETE

Product:	gnome-calendar
Classification:	Applications
Component:	General
Version:	unspecified
Hardware:	Other Linux

Importance:	Normal enhancement
Target Milestone:	3.26
Assigned To:	GNOME Calendar maintainers
QA Contact:	GNOME Calendar maintainers

URL:
Whiteboard:

Duplicates:	766906 (view as bug list)
Depends on:
Blocks:	768621

Reported:	2013-05-19 21:56 UTC by Jean-François Fortin Tam
Modified:	2017-11-24 21:30 UTC

See Also:
GNOME target:	---
GNOME version:	---

Description Jean-François Fortin Tam 2013-05-19 21:56:27 UTC

For event dates and times, it would be nice to let the user type free-form with strings like:
- Today
- Tomorrow
- Friday
- Next monday
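
A minimal Python sketch of how the keywords above could be resolved to dates. The function name `resolve_date` is hypothetical, only English keywords are handled, and "Friday" and "Next monday" are both read as the next occurrence strictly after today, which is an assumption since usage varies.

```python
import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_date(text, today):
    """Resolve a free-form date keyword relative to `today`.

    Returns a datetime.date, or None when the text is not recognized.
    """
    words = text.strip().lower().split()
    if words == ["today"]:
        return today
    if words == ["tomorrow"]:
        return today + datetime.timedelta(days=1)
    # Treat "next friday" like "friday" (assumption, see lead-in).
    if len(words) == 2 and words[0] == "next":
        words = words[1:]
    if len(words) == 1 and words[0] in WEEKDAYS:
        target = WEEKDAYS.index(words[0])
        # Days until the next occurrence, always in the range 1..7.
        delta = (target - today.weekday() - 1) % 7 + 1
        return today + datetime.timedelta(days=delta)
    return None
```

Returning None for unknown input leaves the caller free to fall back to a regular date picker.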

Both in the user's native language and in English (some people prefer to type strings directly in English).

As well as various number formats as seen on http://xkcd.com/1179/

Time could accept formats such as:
- 22h
- 22
- 22h30
- 22:30
- 2230
- 10pm, 10 PM, 10h PM, ... (if you want to go nuts)
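
The time formats listed above can be covered by a single regular expression. This is only a sketch in Python: `TIME_RE` and `parse_time` are hypothetical names, and a bare number such as "22" is assumed to mean a full hour.

```python
import datetime
import re

TIME_RE = re.compile(
    r"^\s*(\d{1,2})"            # hour
    r"(?:(?:h|:)?(\d{2})|h)?"   # minutes after "h", ":" or nothing; or a bare "h"
    r"\s*(am|pm)?\s*$",         # optional 12-hour suffix
    re.IGNORECASE,
)

def parse_time(text):
    """Return a datetime.time for inputs like "22h30" or "10 PM", else None."""
    m = TIME_RE.match(text)
    if m is None:
        return None
    hour = int(m.group(1))
    minute = int(m.group(2) or 0)
    meridiem = (m.group(3) or "").lower()
    if meridiem:
        if not 1 <= hour <= 12:
            return None
        # 12am is midnight and 12pm is noon.
        hour = hour % 12 + (12 if meridiem == "pm" else 0)
    if hour > 23 or minute > 59:
        return None
    return datetime.time(hour, minute)
```

With this pattern "22h", "22", "10pm", "10 PM" and "10h PM" all resolve to 22:00, while "22h30", "22:30" and "2230" resolve to 22:30.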


We could even go further and autodetect date/time keywords like that inside event descriptions that are typed in the simplified "New event" dialog.

Comment 1 Erick Perez Castellanos 2013-05-20 13:18:58 UTC

There are some mockups for this in here: https://raw.github.com/gnome-design-team/gnome-mockups/master/calendar/experiments/New_event_idea.png

Comment 2 Jean-François Fortin Tam 2013-05-21 14:29:37 UTC

I like what I see there, great to see this was already on your mind :)

Comment 3 Georges Basile Stavracas Neto 2014-10-04 03:48:15 UTC

Ok, I see it as an important feature. The first and most important goal here is to implement a natural language parser (nlp) simple enough to be maintainable, and powerful enough to satisfy our needs.

I have a little experience with nlp, so I'll share some thoughts. First, we do not need a complete grammar structure representation. I believe that we can do well with just a simple token-based parser, which uses keywords and combinations of them to break the sentence (kind of what Yorba uses for California).

These keywords should be translatable so the nlp can work with different languages (at least western languages; eastern languages, especially Japanese, are an absolutely different case). We can create a series of regexes to match different combinations of these keywords.
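
As a rough illustration of translatable keywords driving the regexes, here is a Python sketch. The `DAY_KEYWORDS` tables and both function names are made up for the example; a real application would presumably load the keywords from its translation catalog rather than hard-code them.

```python
import re

# Hypothetical per-locale keyword tables mapping a keyword to a day offset.
DAY_KEYWORDS = {
    "en": {"today": 0, "tomorrow": 1},
    "fr": {"aujourd'hui": 0, "demain": 1},
    "pt": {"hoje": 0, "amanhã": 1},
}

def build_pattern(keywords):
    """Compile one regex that matches any keyword as a whole word."""
    # Longest first, so a longer keyword wins over a shorter prefix.
    alternatives = sorted(keywords, key=len, reverse=True)
    body = "|".join(re.escape(k) for k in alternatives)
    return re.compile(r"(?<!\w)(" + body + r")(?!\w)", re.IGNORECASE)

def extract_day_offset(sentence, locale):
    """Return (day offset, sentence without the keyword), or (None, sentence)."""
    keywords = DAY_KEYWORDS[locale]
    match = build_pattern(keywords).search(sentence)
    if match is None:
        return None, sentence
    offset = keywords[match.group(1).lower()]
    rest = sentence[:match.start()] + sentence[match.end():]
    return offset, " ".join(rest.split())
```

The parsing code stays the same for every locale; only the keyword table changes.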

In order to start the development of this feature, we must select which fields will be parsed. Obviously date & time, but maybe location too. Also, it would be good if it supports recurring events.

After that, then, we will be able to handle voice input (sound is converted to text, then parsed by the nlp).

What do you guys think?

Comment 4 Pierre-Yves Luyten 2014-10-04 06:45:33 UTC

Has anyone contacted Yorba? Isn't maintaining a common library a reasonable objective?

Foreign languages might add a lot of complexity here, e.g. https://bugzilla.gnome.org/show_bug.cgi?id=731874

Comment 5 Georges Basile Stavracas Neto 2014-10-04 11:21:44 UTC

We should avoid building a general purpose natural language parser. Ideally it would be just a date, time, location and recurrence parser. I don't think maintaining a common lib is good for this particular app.

Also, I gave the regex idea exactly to avoid this particular problem. Japanese, for example, uses sentence particles, so e.g. any time & date would be followed by に (ni). We can, thus, have a branched regex like "at(\w) | (\w) に" that will recognize both patterns.
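
A runnable Python variant of the branched regex idea, narrowed to whole hours so the example stays small. The pattern and the name `find_hour` are illustrative assumptions, not an existing implementation.

```python
import re

# Two branches: English marks the time with a preceding preposition ("at 5"),
# Japanese with a following particle ("5時に").
TIME_MARKER = re.compile(r"\bat\s+(\d{1,2})\b|(\d{1,2})時に")

def find_hour(sentence):
    """Return the hour named in the sentence as an int, or None."""
    m = TIME_MARKER.search(sentence)
    if m is None:
        return None
    # Exactly one of the two groups is set, depending on the branch taken.
    return int(m.group(1) or m.group(2))
```

Both `find_hour("Meeting at 5")` and `find_hour("5時に会議")` yield 5. The Japanese branch only works here because the time token has a fixed shape; arbitrary words before に would still need real word segmentation.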

But I do agree that it would be a challenge to support *every* language structure with this method.

Comment 6 Georges Basile Stavracas Neto 2014-10-04 11:35:42 UTC

I completely regret my last comment and, after wondering about this issue, I think that we could build a *word segmentation* library. Ideally an extensible one.

Maybe with this we could help GNOME apps to support natural language input, not only Calendar and California.

Comment 7 Bastien Nocera 2014-10-16 11:19:41 UTC

It might even be useful to have in GTK+ or glib. On iOS, messages will highlight addresses, phone numbers, dates, and URLs, and allow performing various actions on them. They're generally useful text parsers which would be useful in more than just the calendar.

Comment 8 Georges Basile Stavracas Neto 2014-10-16 11:48:23 UTC

The problem is the technique (or techniques) that will be used. If we use any kind of trained algorithm, it'll need the training data, which can take several MB. But I strongly agree that embedding it in GLib or Gtk+ would be *awesome*.

Comment 9 Jean-François Fortin Tam 2014-10-16 20:52:16 UTC

Jim Nelson might be interested as per the last few comments then, given that he does have a working implementation in California.

Comment 10 Jim Nelson 2014-10-20 19:38:05 UTC

Yes, very interested in the direction of this discussion.

Would a general-purpose natural language -> iCal VEVENT library be terribly interesting to any project other than California and GNOME Calendar?  Maybe; we've discussed internally using California's Quick Add in Geary, where it would auto-detect discussion of an event and offer to add it to the user's calendar.  However, that's a lot more sophisticated than what California's parser is capable of today.

A library that could tokenize by words would be a welcome addition.  Bastien's correct, the utility of that goes well beyond calendars (or email clients).  I've not done any research for existing FLOSS libraries that do such work.  That would be my first step.

If such technology was adopted/written by GNOME, consider this one vote against putting it into GTK+.

Comment 11 Jim Nelson 2014-10-20 19:39:23 UTC

Also, some wiki pages about California's Quick Add worth linking to here:

https://wiki.gnome.org/Apps/California/HowToUseQuickAdd
https://wiki.gnome.org/Apps/California/TranslatingQuickAdd

Comment 12 Georges Basile Stavracas Neto 2014-10-21 01:20:34 UTC

California's simple NLP is a great starting point. It actually assumes some things:

- Sentences can be separated by whitespaces and punctuation
- Significant words come AFTER prepositions
- More than that: it assumes that it NEEDS prepositions

We should avoid these assumptions if we are really willing to write a universal word segmenter lib. Maybe this fictional library could handle some other things, like word similarity (so that, say, GNOME Shell could search for mistyped things), stemming, etc.
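
To give an idea of what word similarity for mistyped searches could look like, here is a small Python sketch built on the standard library's `difflib`. The application list and the name `suggest` are invented for the example; nothing here reflects how GNOME Shell search actually works.

```python
import difflib

# Hypothetical list of searchable names.
KNOWN_NAMES = ["calendar", "calculator", "contacts", "clocks"]

def suggest(query, names=KNOWN_NAMES):
    """Return known names that closely resemble the (possibly mistyped) query."""
    return difflib.get_close_matches(query.lower(), names, n=3, cutoff=0.6)
```

For example, `suggest("calender")` ranks "calendar" first. `difflib` compares character sequences only, so it knows nothing about stemming or language-specific rules.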

Also, I think it would be great if this library supported plugins. With its base, devs could write amazing features (speech recognition, relationship & emotion extractors, natural language generators, anything else I can't imagine right now).

Comment 13 Jim Nelson 2014-10-21 01:50:43 UTC

That's not entirely true; some words don't require prepositions and are gleaned merely by their formatting (i.e. times and dates).  In some cases, prepositions are used as a clue for details' meanings, but are not strictly required.  This allows for those words to be stripped from the event summary (which is the "bucket" a word falls into if it's not parsed).  This is why translators are told to leave certain word lists empty if they don't have corresponding words in their language.

When you say "we should avoid these assumptions if we are really willing to write an universal segmenter lib", I guess I'm thinking that this proposed library is concerned with tokenizing *words*, not the much more difficult task of generating sentence structure.  In other words, it would solve the first problem in your list but not the second or third.

The philosophy of the California parser is best-attempt.  If it's unable to glean sufficient details from the string to build an event, the event editor is displayed for the user to clean up and complete the information.  I think this is important to keep in mind going forward.
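
The best-attempt approach described above can be sketched in Python as follows. This is not California's code: `Draft`, `quick_add` and the tiny token sets are hypothetical and only show the shape of the idea, where unparsed words fall into the summary and the editor opens when details are missing.

```python
import re
from dataclasses import dataclass
from typing import Optional

TIME_TOKEN = re.compile(r"^\d{1,2}:\d{2}$")
DAY_TOKENS = {"today", "tomorrow"}
# Prepositions are only hints: they are dropped when they directly precede
# a recognized detail, but details are recognized without them too.
PREPOSITIONS = {"at", "on"}

@dataclass
class Draft:
    summary: str = ""
    day: Optional[str] = None
    time: Optional[str] = None

    @property
    def needs_editor(self):
        # Not enough details gleaned: let the user finish in the editor.
        return self.day is None or self.time is None

def quick_add(text):
    """Sort words into day, time and summary on a best-attempt basis."""
    draft = Draft()
    leftover = []
    for word in text.split():
        lowered = word.lower()
        if lowered in DAY_TOKENS or TIME_TOKEN.match(word):
            if leftover and leftover[-1].lower() in PREPOSITIONS:
                leftover.pop()
            if lowered in DAY_TOKENS:
                draft.day = lowered
            else:
                draft.time = word
        else:
            leftover.append(word)  # the summary "bucket"
    draft.summary = " ".join(leftover)
    return draft
```

Here `quick_add("Lunch with Sam tomorrow at 12:30")` yields the summary "Lunch with Sam" with both details filled in, while `quick_add("Call the bank")` sets `needs_editor` so the event editor can be shown.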

Comment 14 Georges Basile Stavracas Neto 2014-10-21 10:09:54 UTC

Thanks for clarifying, Jim, and sorry for my misconceptions. I'm saying this because I had minimal contact with NLP and, as far as I remember, some (most?) eastern languages cannot simply be segmented without trained algorithms.

Comment 15 Leslie Satenstein 2015-09-24 14:22:57 UTC

Regarding date fields, can we input a date as a shortcut, in the form:


+0  for today or enter = for today

+1 for tomorrow

+14 for two weeks hence

+365 for next year, same month and day.


+0 is so much more convenient than entering 2015-09-24

And for time, it can remain as it is, until day is resolved.
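
The offset shortcuts proposed in this comment are easy to sketch in Python. The names `OFFSET_RE` and `resolve_offset` are hypothetical, and the offset is assumed to count calendar days.

```python
import datetime
import re

# "=" means today; "+N" means N days from today.
OFFSET_RE = re.compile(r"^\s*(?:=|\+(\d+))\s*$")

def resolve_offset(text, today):
    """Return today plus the typed day offset, or None if it is not a shortcut."""
    m = OFFSET_RE.match(text)
    if m is None:
        return None
    return today + datetime.timedelta(days=int(m.group(1) or 0))
```

One caveat: a plain day count does not always land on the same calendar date. Starting from 2015-09-24, "+365" gives 2016-09-23 because 2016 is a leap year.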

Comment 16 Leslie Satenstein 2015-09-24 14:24:51 UTC

(In reply to Jean-François Fortin Tam from comment #0)
> For event dates and times, it would be nice to let the user type free-form
> with strings like:
> - Today
> - Tomorrow
> - Friday
> - Next monday
> 
> Both in the user's native language and in English (some people to type
> strings directly in English).
> 
> As well as various number formats as seen on http://xkcd.com/1179/
> 
> Time could accept formats such as:
> - 22h
> - 22
> - 22h30
> - 22:30
> - 2230
> - 10pm, 10 PM, 10h PM, ... (if you want to go nuts)
> 
> 
> We could even go further and autodetect date/time keywords like that inside
> event descriptions that are typed in the simplified "New event" dialog.


Please review my idea about another way to do what you would like to do.

Note: +7 for next week would be the same in ALL languages.

Comment 17 Georges Basile Stavracas Neto 2017-04-19 15:09:28 UTC

*** Bug 766906 has been marked as a duplicate of this bug. ***

Comment 18 Georges Basile Stavracas Neto 2017-11-24 21:30:52 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnome-calendar/issues/3.