GNOME Bugzilla – Bug 700673
Parse natural language
Last modified: 2017-11-24 21:30:52 UTC
For event dates and times, it would be nice to let the user type free-form with strings like:
- Today
- Tomorrow
- Friday
- Next monday
Both in the user's native language and in English (some people prefer to type strings directly in English).
As well as various number formats as seen on http://xkcd.com/1179/
Time could accept formats such as:
- 22h
- 22
- 22h30
- 22:30
- 2230
- 10pm, 10 PM, 10h PM, ... (if you want to go nuts)
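For illustration, here is a minimal Python sketch of how time strings such as 22h30, 22:30, 2230 or 10 PM might be normalized. The pattern and the `parse_time` helper are assumptions made for this example; neither comes from GNOME Calendar or California, and only English am/pm suffixes are handled.

```python
import re
from datetime import time

# Hypothetical pattern: one or two hour digits, an optional "h" or ":"
# separator, optional two-digit minutes, and an optional am/pm suffix.
_TIME_RE = re.compile(
    r"^\s*(\d{1,2})(?:\s*[h:]\s*)?(\d{2})?\s*(am|pm)?\s*$",
    re.IGNORECASE,
)


def parse_time(text):
    """Return a datetime.time for the given string, or None if it does not parse."""
    match = _TIME_RE.match(text)
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    suffix = (match.group(3) or "").lower()
    if suffix:
        # With am/pm, only 1..12 is a meaningful hour.
        if not 1 <= hour <= 12:
            return None
        hour = hour % 12 + (12 if suffix == "pm" else 0)
    if hour > 23 or minute > 59:
        return None
    return time(hour, minute)
```

Anything the pattern does not recognize yields `None`, so a caller could leave the field untouched instead of guessing.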
We could even go further and autodetect date/time keywords like that inside event descriptions that are typed in the simplified "New event" dialog.
There are some mockups for this in here: https://raw.github.com/gnome-design-team/gnome-mockups/master/calendar/experiments/New_event_idea.png
I like what I see there, great to see this was already on your mind :)
OK, I see this as an important feature. The first and most important goal is to implement a natural language parser (NLP) simple enough to be maintainable and powerful enough to satisfy our needs.
I have a little experience with NLP, so I'll share some thoughts. First, we do not need a complete grammar structure representation. I believe we can do well with a simple token-based parser that uses keywords, and combinations of them, to break up the sentence (roughly what Yorba does for California).
These keywords should be translatable so the parser can work with different languages (at least Western languages; Eastern languages, especially Japanese, are an entirely different case). We can create a series of regexes to match different combinations of these keywords.
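A rough Python sketch of the keyword idea, assuming per-language keyword tables. The tables, the choice of Portuguese as the second language, and the `resolve_date` helper are all hypothetical; a real implementation would load the words from translation catalogs. Bare weekday names and "next <weekday>" are both resolved to the first such day strictly after the reference date, which is a simplification since people read "next" differently.

```python
from datetime import date, timedelta

# Hypothetical keyword tables, hard-coded here only for the example.
OFFSETS = {
    "en": {"today": 0, "tomorrow": 1},
    "pt": {"hoje": 0, "amanhã": 1},
}
NEXT_WORDS = {
    "en": {"next"},
    "pt": {"próxima", "próximo"},
}
WEEKDAYS = {  # index 0 is Monday, matching date.weekday()
    "en": ["monday", "tuesday", "wednesday", "thursday",
           "friday", "saturday", "sunday"],
    "pt": ["segunda", "terça", "quarta", "quinta",
           "sexta", "sábado", "domingo"],
}


def resolve_date(text, lang, today):
    """Map a short keyword phrase to a date, or return None if unknown."""
    words = text.lower().split()
    # Accept and drop an optional leading "next"-style modifier.
    if len(words) == 2 and words[0] in NEXT_WORDS[lang]:
        words = words[1:]
    if len(words) != 1:
        return None
    word = words[0]
    if word in OFFSETS[lang]:
        return today + timedelta(days=OFFSETS[lang][word])
    if word in WEEKDAYS[lang]:
        target = WEEKDAYS[lang].index(word)
        # Days until the first matching weekday strictly after today (1..7).
        delta = (target - today.weekday() - 1) % 7 + 1
        return today + timedelta(days=delta)
    return None
```

Because the parsing logic only looks words up in the tables, supporting another language with a similar structure would mean adding table entries rather than code.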
To start developing this feature, we must first decide which fields will be parsed. Obviously date & time, but maybe location too. It would also be good if it supported recurring events.
After that, we will be able to handle voice input (sound is converted to text, then parsed by the NLP).
What do you guys think?
Has anyone contacted Yorba? Isn't maintaining a common library a reasonable objective?
Foreign languages might add a lot of complexity here, e.g. https://bugzilla.gnome.org/show_bug.cgi?id=731874
We should avoid building a general-purpose natural language parser. Ideally it would be just a date, time, location and recurrence parser. I don't think maintaining a common lib is good for this particular app.
Also, I suggested the regex idea precisely to avoid this particular problem. Japanese, for example, uses sentence particles, so e.g. any time & date would be followed by に (ni). We can, thus, have a branched regex like "at(\w) | (\w) に" that will recognize both patterns.
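As a sketch of the branched-regex idea in Python: one alternative looks for a word after English "at", the other for a word followed by the particle に. The exact pattern, the group names and the `find_time_word` helper are assumptions for this example, not the regex quoted above.

```python
import re

# Hypothetical branched pattern: "at <word>" (English) or "<word>に" (Japanese).
# In Python 3, \w matches Unicode word characters, so it covers kanji and kana.
TIME_PHRASE = re.compile(r"\bat\s+(?P<en>\w+)|(?P<ja>\w+)\s*に")


def find_time_word(sentence):
    """Return the word marked as a time/date by either branch, or None."""
    match = TIME_PHRASE.search(sentence)
    if match is None:
        return None
    return match.group("en") or match.group("ja")
```

Since Japanese text has no spaces, the second branch captures everything before the particle as one "word", which hints at why real word segmentation is harder than this.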
But I do agree that it would be a challenge to support *every* language structure with this method.
I take back my last comment. After thinking more about this issue, I think that we could build a *word segmentation* library. Ideally an extensible one.
Maybe with this we could help GNOME apps to support natural language input, not only Calendar and California.
It might even be useful to have in GTK+ or glib. On iOS, messages will highlight addresses, phone numbers, dates, and URLs, and allow performing various actions on them. They're generally useful text parsers which would be useful in more than just the calendar.
The problem is the technique (or techniques) that will be used. If we use any kind of trained algorithm, it'll need the training data, which can take several MB. But I strongly agree that embedding it in GLib or GTK+ would be *awesome*.
Jim Nelson might be interested as per the last few comments then, given that he does have a working implementation in California.
Yes, very interested in the direction of this discussion.
Would a general-purpose natural language -> iCal VEVENT library be terribly interesting to any project other than California and GNOME Calendar? Maybe; we've discussed internally using California's Quick Add in Geary, where it would auto-detect discussion of an event and offer to add it to the user's calendar. However, that's a lot more sophisticated than what California's parser is capable of today.
A library that could tokenize by words would be a welcome addition. Bastien's correct, the utility of that goes well beyond calendars (or email clients). I've not done any research for existing FLOSS libraries that do such work. That would be my first step.
If such technology was adopted/written by GNOME, consider this one vote against putting it into GTK+.
Also, some wiki pages about California's Quick Add worth linking to here:
California's simple NLP is a great starting point. It does make some assumptions:
- Sentences can be separated by whitespaces and punctuation
- Significant words come AFTER prepositions
- More than that: it assumes that it NEEDS prepositions
We should avoid these assumptions if we really want to write a universal word segmenter lib. Maybe this hypothetical library could handle some other things too, like word similarity (so that, say, GNOME Shell could match mistyped search terms), stemming, etc.
Also, I think it would be great if this library supported plugins. With it as a base, devs could write amazing features (speech recognition, relationship & emotion extractors, natural language generators, anything else I can't imagine right now).
That's not entirely true; some words don't require prepositions and are gleaned merely by their formatting (i.e. times and dates). In some cases, prepositions are used as a clue for details' meanings, but are not strictly required. This allows for those words to be stripped from the event summary (which is the "bucket" a word falls into if it's not parsed). This is why translators are told to leave certain word lists empty if they don't have corresponding words in their language.
When you say "we should avoid these assumptions if we are really willing to write an universal segmenter lib", I guess I'm thinking that this proposed library is concerned with tokenizing *words*, not the much more difficult task of generating sentence structure. In other words, it would solve the first problem in your list but not the second or third.
The philosophy of the California parser is best-attempt. If it's unable to glean sufficient details from the string to build an event, the event editor is displayed for the user to clean up and complete the information. I think this is important to keep in mind going forward.
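To make the best-attempt idea concrete, here is a small Python sketch. It is not California's actual parser: the word lists, the `Draft` type and the `quick_add` and `needs_editor` helpers are hypothetical and only mirror the behaviour described above. Recognized details are extracted, a preposition is dropped only when it introduces such a detail, every other word falls into the summary bucket, and an incomplete result signals that the event editor should be shown.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, translatable word lists; a language without such words
# could leave PREPOSITIONS empty.
PREPOSITIONS = {"at", "on"}
DATE_WORDS = {"today", "tomorrow"}
TIME_RE = re.compile(r"^\d{1,2}:\d{2}$")


@dataclass
class Draft:
    summary: List[str] = field(default_factory=list)
    date: Optional[str] = None
    time: Optional[str] = None


def _is_detail(word):
    return bool(TIME_RE.match(word)) or word in DATE_WORDS


def quick_add(text):
    """Best-attempt parse: extract what is recognized, keep the rest as summary."""
    draft = Draft()
    words = text.split()
    for i, word in enumerate(words):
        lowered = word.lower()
        if draft.time is None and TIME_RE.match(lowered):
            draft.time = lowered
        elif draft.date is None and lowered in DATE_WORDS:
            draft.date = lowered
        elif (lowered in PREPOSITIONS and i + 1 < len(words)
              and _is_detail(words[i + 1].lower())):
            continue  # clue word introducing a detail: strip it from the summary
        else:
            draft.summary.append(word)  # unparsed words go to the summary bucket
    return draft


def needs_editor(draft):
    """True when the draft is too incomplete to create an event directly."""
    return draft.date is None or draft.time is None or not draft.summary
```

With this shape the parser never fails outright; the caller decides from `needs_editor` whether to create the event or open the editor pre-filled with whatever was recognized.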
Thanks for clarifying, Jim, and sorry for my misconceptions. I'm saying this because I've had only minimal contact with NLP and, as far as I remember, some (most?) Eastern languages cannot simply be segmented without trained algorithms.
Regarding date fields, could we enter a date as an offset from today, as a shortcut?
+0 (or =) for today
+1 for tomorrow
+14 for two weeks hence
+365 for next year, same month and day.
+0 is so much more convenient than entering 2015-09-24
And for time, it can remain as it is, until day is resolved.
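A minimal Python sketch of the offset shortcut, assuming the field accepts "=" or "+N" with N in days. The `resolve_offset` name and the four-digit limit on N are choices made for this example.

```python
import re
from datetime import date, timedelta

# "+N" where N is a number of days; limited to four digits in this sketch.
_OFFSET_RE = re.compile(r"^\+(\d{1,4})$")


def resolve_offset(text, today):
    """Resolve "=" or "+N" relative to today; return None for anything else."""
    text = text.strip()
    if text == "=":
        return today
    match = _OFFSET_RE.match(text)
    if match is None:
        return None
    return today + timedelta(days=int(match.group(1)))
```

One caveat when counting in days: starting from 2015-09-24, +365 gives 2016-09-23 rather than 2016-09-24, because 29 February 2016 falls in between.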
(In reply to Jean-François Fortin Tam from comment #0)
> For event dates and times, it would be nice to let the user type free-form
> with strings like:
> - Today
> - Tomorrow
> - Friday
> - Next monday
> Both in the user's native language and in English (some people to type
> strings directly in English).
> As well as various number formats as seen on http://xkcd.com/1179/
> Time could accept formats such as:
> - 22h
> - 22
> - 22h30
> - 22:30
> - 2230
> - 10pm, 10 PM, 10h PM, ... (if you want to go nuts)
> We could even go further and autodetect date/time keywords like that inside
> event descriptions that are typed in the simplified "New event" dialog.
Please review my idea about another way to do what you would like to do.
Note +7 for next week would be the same in ALL languages.
*** Bug 766906 has been marked as a duplicate of this bug. ***
-- GitLab Migration Automatic Message --
This bug has been migrated to GNOME's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.gnome.org/GNOME/gnome-calendar/issues/3.