Fulltext message searching

Ben-158

Hello,

Has a fulltext message searching feature been considered? I think it would
be very handy. It could use a MySQL table to store hashes of the message
text (with all HTML, non-text characters, and other formatting removed to
reduce size). When a search is performed, it could use MySQL's FULLTEXT
feature to deliver lightning-fast search results similar to Gmail, Hotmail,
etc.
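A minimal Python sketch of the preprocessing step described above (the thread concerns a PHP/MySQL application, so this only illustrates the idea). A MySQL FULLTEXT index matches the stored words themselves, so the sketch produces normalized text rather than a digest of it. The name `normalize_for_index` is hypothetical; only the standard library's `html.parser` and `re` are used.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content from HTML, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so &amp; etc. become characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalize_for_index(body: str, is_html: bool = False) -> str:
    """Reduce a message body to lowercase words separated by single spaces."""
    if is_html:
        parser = _TextExtractor()
        parser.feed(body)
        parser.close()  # flush any buffered text
        body = " ".join(parser.parts)
    # Keep runs of Unicode letters/digits; everything else acts as a separator.
    words = re.findall(r"[^\W_]+", body.lower())
    return " ".join(words)
```

The returned string is what would be written to the column covered by the FULLTEXT index.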

It could maybe have an automated indexer that works in the background,
indexing one or two unindexed messages at a time on each AJAX request from
the browser (after the data has been returned to the browser, to prevent lag).
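The batch indexer could look roughly like this. It is a sketch under assumptions: `pending` is a hypothetical queue of not-yet-indexed message UIDs, `fetch_text` is a hypothetical callback that would download and normalize one message, and the caller is assumed to invoke `index_pending` only after the response has already been sent to the browser (how to do that depends on the server setup and is not shown).

```python
from collections import deque
from typing import Callable


def index_pending(
    pending: deque[int],
    index: dict[int, str],
    fetch_text: Callable[[int], str],
    batch_size: int = 2,
) -> int:
    """Index at most `batch_size` queued message UIDs; return how many were indexed.

    Intended to run after the response to the current AJAX request has been
    sent, so the user never waits for it.
    """
    indexed = 0
    while pending and indexed < batch_size:
        uid = pending.popleft()
        if uid in index:  # already indexed, e.g. by an earlier request
            continue
        index[uid] = fetch_text(uid)
        indexed += 1
    return indexed
```

Keeping the batch small bounds the extra work per request; a large mailbox simply takes more requests to index fully.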

The feature could of course be turned off (maybe off by default?) if the
user didn't want to spend the extra disk space/processor cycles.

Does this sound like something you'd like to see in RC? If so, I can
develop the backend for this system if somebody else can write the
interface/AJAX integration.

Thanks,
Ben

_______________________________________________
List info: http://lists.roundcube.net/dev/

Re: Fulltext message searching

Ethan Erchinger
I recommend trying Sphinx. Most of the hard work has been taken care of
already.

http://www.sphinxsearch.com/


Re: Fulltext message searching

Raoul Bhatia [IPAX]
In reply to this post by Ben-158
Ben wrote:
> Hello,
>
> Has a fulltext message searching feature been considered? I think it would
> be very handy. It could use a MySQL table to store hashes of the message
> text (with all HTML, nontext characters, and other formatting removed to
> reduce size). When a search is performed, it could utilize MySQL's FULLTEXT
> feature to deliver lightning fast search results similar to gmail, hotmail,
> etc.
>
> It could maybe have an automated indexer that works in the background
> indexing one or two unindexed messages at a time on each ajax request from
> the browser (after the data is returned to the browser to prevent lagging).
>
> The feature could of course be turned off (maybe off by default?) if the
> user didn't want to use the extra diskspace/processor cycles.
>
> Does this sound like something you'd like to see in RC? If so, I can
> develop the backend for this system if somebody else can write the
> interface/ajax integration.

I think it would be a much appreciated feature. And with the backend in
place, I think one could easily extend it to search other information in
the headers (e.g. explicitly search the To: header, Mozilla flags, etc.).
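Extending the index to headers mostly means storing a few header fields next to the body text. A minimal sketch using Python's `email` package; the field selection and the name `header_fields` are assumptions, and `X-Mozilla-Status` is only present in messages that a Mozilla client has written.

```python
from email import message_from_string
from email.utils import getaddresses


def header_fields(raw_message: str) -> dict[str, str]:
    """Pull a few searchable header fields out of a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    # getaddresses splits "Alice <a@x>, b@y" into (name, address) pairs.
    to_addrs = getaddresses(msg.get_all("To", []))
    return {
        "subject": msg.get("Subject", ""),
        "from": msg.get("From", ""),
        "to": " ".join(addr for _name, addr in to_addrs),
        "x-mozilla-status": msg.get("X-Mozilla-Status", ""),
    }
```

Each field could then get its own indexed column, so that a query like "to:alice" only looks at recipients.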

Though I have no time to help you implement it, I would gladly test it
once I have some time after Nov. 19th.

cheers,
raoul
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          [hidden email]
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG         web.          http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            [hidden email]
1190 Wien                           tel.               +43 1 3670030
FN 277995t HG Wien                  fax.            +43 1 3670030 15
____________________________________________________________________

Re: Fulltext message searching

tfk
In reply to this post by Ethan Erchinger
Wouldn't we need another "driver" for this? I mean, I know Sphinx is
great, but it's nowhere near as popular as MyISAM. ;) Even InnoDB is not as
common in MySQL installations today.
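One way to read the "driver" question: search would sit behind a small interface, and the configured backend (IMAP server search, MySQL FULLTEXT, Sphinx, ...) would be chosen at runtime. The sketch is hypothetical throughout: `SearchDriver`, `make_driver` and the toy in-memory backend are illustrative names, not RoundCube code.

```python
from typing import Protocol


class SearchDriver(Protocol):
    """Hypothetical interface that every search backend would implement."""

    def search(self, mailbox: str, query: str) -> list[int]:
        """Return the UIDs of matching messages."""
        ...


class InMemoryDriver:
    """Toy backend standing in for a real one (IMAP, MySQL FULLTEXT, Sphinx)."""

    def __init__(self, texts: dict[str, dict[int, str]]) -> None:
        self.texts = texts  # mailbox -> {uid: normalized message text}

    def search(self, mailbox: str, query: str) -> list[int]:
        words = query.lower().split()
        # A message matches only if it contains every query word.
        return sorted(
            uid
            for uid, text in self.texts.get(mailbox, {}).items()
            if all(word in text.split() for word in words)
        )


# Registry of available backends; a real setup would register more here.
DRIVERS = {"memory": InMemoryDriver}


def make_driver(name: str, **options) -> SearchDriver:
    """Instantiate the backend selected in the configuration."""
    try:
        driver_class = DRIVERS[name]
    except KeyError:
        raise ValueError(f"unknown search driver: {name}") from None
    return driver_class(**options)
```

With this split, the default install could keep using IMAP search while an optional driver adds an index for those who want it.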

If anyone wants to code this up, I have no objections. But including
it by default wouldn't work out right now. :)

Till

Re: Fulltext message searching

Thomas Bruederli
In reply to this post by Ben-158
Ben wrote:
> Hello,
>
> Has a fulltext message searching feature been considered? I think it would
> be very handy. It could use a MySQL table to store hashes of the message
> text (with all HTML, nontext characters, and other formatting removed to
> reduce size). When a search is performed, it could utilize MySQL's FULLTEXT
> feature to deliver lightning fast search results similar to gmail, hotmail,
> etc.
>
Hi Ben,

Actually, fulltext search is already available, but it is currently done by
the IMAP server. If you prefix your search term with body:, RoundCube sends a
corresponding request to the mail server. Depending on the IMAP server
software, this is more or less fast.
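A rough sketch of how a `body:` prefix could be mapped onto IMAP SEARCH keys (RFC 3501 defines `BODY`, `SUBJECT`, `FROM` and `OR`). This is not RoundCube's implementation: the helper names are hypothetical, matching Subject or From for plain queries is an assumed default, and non-ASCII terms (which need a CHARSET argument) are not handled.

```python
def imap_quote(term: str) -> str:
    """Quote a search term as an IMAP quoted string (escape \\ and ")."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_criteria(user_query: str) -> list[str]:
    """Map the search box text to IMAP SEARCH keys.

    "body:foo"   -> BODY "foo" (full-text search performed by the IMAP server)
    anything else -> Subject OR From (assumed default, for illustration only)
    """
    query = user_query.strip()
    if query.lower().startswith("body:"):
        return ["BODY", imap_quote(query[len("body:"):].strip())]
    term = imap_quote(query)
    return ["OR", "SUBJECT", term, "FROM", term]
```

With Python's `imaplib`, the keys would be passed as `conn.search(None, *build_search_criteria(text))`.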

I agree that searching should be done as close to the client as possible.
Using the database we already have would be the best way, I guess (apart
from building a proprietary fulltext index), but can we make sure the index
is complete right after the first login? That is what the user expects. But
we don't have any passwords, nor does RoundCube know about all users (we
only know the ones who have already logged in once).

Indexing a mailbox requires communication between RC and the IMAP server
in advance (e.g. overnight). To achieve this, we would have to change the
basics of how RC manages user accounts.

Good idea but there are a few things to consider before we can start.

~Thomas

Re: Fulltext message searching

Michael Baierl-2
Thomas Bruederli wrote:
> Good idea but there are a few things to consider before we can start.

Please don't solve problems that don't exist. Yes, IMAP search is
considerably slower than a MySQL fulltext search, but the question is
whether Roundcube wants to be Google. If that's the case, it would be fine
to focus on one single mail server, optimize for it, and maybe store all
mail directly in an SQL database anyway....

I have a lot of mail, and Courier is usually fast enough to deliver
results in time. Yes, it would be nice to have results within 0.2
seconds, but I don't think it's worth the effort of adding all these
tables, duplicating all the data in MySQL (no, I don't want to do that!)
and maintaining a complicated setup.

Instead, focus on the important stuff and just do it the smart way. What
bugs me about SquirrelMail's search is that the search *results* are not
cached, so every time I view a mail and go back to view the next one, it
performs the search again... This is the real problem there, not the
speed of the initial search.
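Caching the result list per query is cheap to do. A minimal sketch, assuming results are kept per session and keyed by mailbox and query string; `SearchResultCache` and `run_search` are hypothetical names, and deciding when to invalidate (new mail, deletions) is left to the caller.

```python
from collections import OrderedDict
from typing import Callable


class SearchResultCache:
    """Small LRU cache so paging through results does not re-run the search."""

    def __init__(self, max_entries: int = 20) -> None:
        self.max_entries = max_entries
        self._entries = OrderedDict()  # (mailbox, query) -> list of UIDs

    def get_or_search(
        self, mailbox: str, query: str, run_search: Callable[[str, str], list[int]]
    ) -> list[int]:
        key = (mailbox, query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        result = run_search(mailbox, query)
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self, mailbox: str) -> None:
        """Drop cached results for one mailbox, e.g. after new mail arrives."""
        for key in [k for k in self._entries if k[0] == mailbox]:
            del self._entries[key]
```

Viewing a message and returning to the list then reuses the stored UID list instead of sending another SEARCH to the server.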

And honestly, the IMAP server holds the data and is also able to search
it as defined in the standard, so the webmail system should not try to
duplicate this functionality. The same applies to filters etc....

My 2 cents,

Mike
--
Michael Baierl
mbaierl.com   http://mbaierl.com/
- - - - - - - - - - - - - - - - -
"The vast majority of our imports come from outside the country."
George W. Bush

Re: Fulltext message searching

Thomas Mangin
In reply to this post by Thomas Bruederli
Thomas Bruederli wrote:
> Actually fulltext search is already available but it is currently done by
> the IMAP server. By preceding your search term with body: it will send an
> according request to the mail server. Depending on the IMAP software this
> is more or less fast.
>  
Thank you for the "body:" trick. Is there a similar "hidden" way to get
RoundCube to display only unread emails?

Thomas

Re: Fulltext message searching

fuzzyping
In reply to this post by Thomas Bruederli
On Thu, 18 Oct 2007 09:08:34 +0200, Thomas Bruederli <[hidden email]> wrote:
>
> Actually fulltext search is already available but it is currently done by
> the IMAP server. By preceding your search term with body: it will send an
> according request to the mail server. Depending on the IMAP software this
> is more or less fast.

As a workaround, would it be reasonable to pull "foo" and "body:foo" queries in parallel and return the unique messages?

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net

Re: Fulltext message searching

Jason Fesler
> As a workaround, would it be reasonable to pull "foo" and "body:foo"
> queries in parallel and return the unique messages?

Enabling and using server-side search should be a config
flag. Don't just hit both and thrash everyone and everything.
That'll burn CPU and, worse, I/O capacity for no good reason.
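A sketch of the proposed workaround with the safeguard asked for here: header search always runs, body search runs only when an administrator has enabled it, and duplicate UIDs are dropped. All names, including the `body_search_enabled` flag, are hypothetical. The two searches run sequentially; running them truly in parallel would need two IMAP connections, and IMAP's `OR` search key would also let the server compute the union in a single SEARCH command.

```python
from typing import Callable

SearchFn = Callable[[str], list[int]]


def combined_search(
    query: str,
    header_search: SearchFn,
    body_search: SearchFn,
    body_search_enabled: bool = False,
) -> list[int]:
    """Run header search always and body search only when enabled; merge UIDs."""
    uids = list(header_search(query))
    if body_search_enabled:
        uids.extend(body_search(query))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(uids))
```

With the flag off by default, servers without a fast full-text index are never asked for body searches they cannot serve cheaply.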

BTW, if you want fast IMAP server-side search, look at Cyrus with daily
runs of the Cyrus squatter (full-text indexing) utility. Cyrus is just a
bit much to set up and maintain. The server-side search with squatter
databases built is wicked fast, even on huge mailboxes. When I last ran
Cyrus, I was regularly opening/searching/closing mailboxes with 30,000
messages in them.

Re: Fulltext message searching

fuzzyping
On Thu, 18 Oct 2007 08:16:56 -0700 (PDT), Jason Fesler <[hidden email]> wrote:

>> As a workaround, would it be reasonable to pull "foo" and "body:foo"
>> queries in parallel and return the unique messages?
>
> Enabling and using server side search should be a config
> flag. Don't just hit both and thrash everyone and everything.
> That'll burn CPU and worse I/O capacity for no good reason.
>
> BTW, if you want fast imap server side search.. look at cyrus with daily
> runs of the cyrus squatter (full text indexing) utility.   Cyrus is just a
> bit much to set up and maintain.  The server side search with squatter
> databases built is wicked fast even on huge mailboxes.  When I last ran
> Cyrus I was regularly opening/searching/closing mailboxes with 30,000
> messages in them.

I use Courier and am very happy with the speed of it.  I'm just recommending the option for ease-of-use.  I've been following RC since damn near its inception, and this is the first I've heard of "body:" snatch^H^H^H^H^H^H searching.  :)

--
Jason Dixon
DixonGroup Consulting
http://www.dixongroup.net
