PmWiki / SearchImprovements

SaintMagoo? December 28, 2010, at 3:24 AM: Thanks, Peter. At the moment we would hate to have to usher-on a database. One of the reasons why we love PmWiki is that we did not have to mess with that.

Thanks anyway,

-Rn :)


If you're interested, I had a pretty good start towards replacing .pageinfo with an SQLite database. The problem I ran into was simply a question of optimization. Every search I threw at it ran better with a simple full scan of the text file (.pageindex) than with the indexed database... Let me know if you'd be interested in the code. Peter Bowers? December 26, 2010, at 04:38 PM
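For readers who want to try the same comparison, here is a minimal sketch of the two approaches, a full scan of index lines versus an indexed SQLite table. It is not Peter's code. The "PageName:word word word" line layout, the table name terms, and the function names are all assumptions made for illustration, and the sketch only shows that both paths return the same pages; it says nothing about which is faster.

```python
import sqlite3

# Hypothetical index lines in an assumed "PageName:word word word" layout.
# The real .pageindex format may differ.
INDEX_LINES = [
    "Main.HomePage:welcome wiki search",
    "Cookbook.Sphider:sphider search engine",
    "PmWiki.Variables:config variables",
]

def full_scan(lines, term):
    """Read every line and keep pages whose word list contains term."""
    hits = []
    for line in lines:
        name, _, words = line.partition(":")
        if term in words.split():
            hits.append(name)
    return sorted(hits)

def build_db(lines):
    """Load the same data into an in-memory SQLite table indexed by word."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (word TEXT, page TEXT)")
    conn.execute("CREATE INDEX idx_terms_word ON terms (word)")
    for line in lines:
        name, _, words = line.partition(":")
        conn.executemany(
            "INSERT INTO terms (word, page) VALUES (?, ?)",
            [(word, name) for word in set(words.split())],
        )
    conn.commit()
    return conn

def indexed_lookup(conn, term):
    """Look the term up through the word index instead of scanning."""
    rows = conn.execute(
        "SELECT page FROM terms WHERE word = ? ORDER BY page", (term,)
    )
    return [row[0] for row in rows]
```

Timing both functions against a real index file would be the way to check the observation above on a given site.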


SaintMagoo? December 26, 2010, at 10:44 AM: Thanks, Peter. Impressive - most impressive :)

Christmas Idea: Like .pageinfo, how about creating another two files? One that is indexed: Containing only file names, it can be a way to register a page that has been processed. After that, .pageinfo2 could use the registered-page index-number in a sorted word-list dictionary. Naturally sorted and binary searched, imo such a new search-system could easily replace + hyper-speed up the present searching subsystem?

The only drawback here is the relative speed of growing the dictionary-file: Inserting new words could be slow. However, if the Dictionary is also indexed, then the main dictionary need not be sorted per se. Only the dictionary-index-file need be sorted. Thereafter, inserting a mere integer + offset into the Dictionary's index - when a new word is added - would speed dictionary-growth-up a lot?
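As a rough illustration of the scheme described above, here is a minimal in-memory sketch: a page registry that assigns each page a number, plus a sorted word list that is binary searched. The class and method names are hypothetical, and it covers only the simple "sorted dictionary" variant, not the separate unsorted dictionary with a sorted index file.

```python
from bisect import bisect_left

class WordIndex:
    """Toy version of the registry + sorted word dictionary idea."""

    def __init__(self):
        self.registry = []      # page names; list position is the page number
        self.page_numbers = {}  # page name -> page number
        self.words = []         # distinct words, kept sorted
        self.postings = []      # postings[i] = page numbers containing words[i]

    def register(self, page_name):
        """Return the page number for page_name, registering it if new."""
        if page_name not in self.page_numbers:
            self.page_numbers[page_name] = len(self.registry)
            self.registry.append(page_name)
        return self.page_numbers[page_name]

    def add_page(self, page_name, text):
        """Record every distinct word of text against the page's number."""
        number = self.register(page_name)
        for word in set(text.lower().split()):
            pos = bisect_left(self.words, word)
            if pos == len(self.words) or self.words[pos] != word:
                # New word: insert at the sorted position in both lists.
                self.words.insert(pos, word)
                self.postings.insert(pos, [])
            if number not in self.postings[pos]:
                self.postings[pos].append(number)

    def lookup(self, word):
        """Binary-search the sorted word list and return matching page names."""
        word = word.lower()
        pos = bisect_left(self.words, word)
        if pos == len(self.words) or self.words[pos] != word:
            return []
        return [self.registry[n] for n in sorted(self.postings[pos])]

index = WordIndex()
index.add_page("Main.HomePage", "Welcome to the wiki")
index.add_page("Cookbook.Sphider", "Sphider search for the wiki")
```

The list insert is the slow step the post worries about: every new word shifts the entries after it, which is what the separate sorted index file is meant to avoid.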

I am considering writing this beastie. Even (worst case) for Cron use, when a page-file mtime is greater than the registry-file mtime, it is a sign that a page or three might need to be re-indexed. Should be fun.
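The mtime check could look something like the sketch below, for instance when run from Cron. The function names and the idea of passing a page directory and a registry file path are assumptions for illustration; nothing here is taken from PmWiki itself.

```python
import os

def scan_mtimes(page_dir):
    """Map each regular file in page_dir to its modification time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(page_dir)
        if entry.is_file()
    }

def stale_pages(page_mtimes, registry_mtime):
    """Return the page names modified after the registry was last written."""
    return sorted(
        name for name, mtime in page_mtimes.items() if mtime > registry_mtime
    )

def pages_to_reindex(page_dir, registry_path):
    """Compare page-file mtimes against the registry file's mtime."""
    if not os.path.exists(registry_path):
        # No registry yet, so every page needs indexing.
        return sorted(scan_mtimes(page_dir))
    return stale_pages(scan_mtimes(page_dir), os.path.getmtime(registry_path))
```

Rewriting the registry file after each run resets its mtime, so the next run only picks up pages edited since then.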

.02 ends


Try GoogleSearch?. Peter Bowers? December 25, 2010, at 03:02 PM


SaintMagoo? December 24, 2010, at 09:30 AM: We just uploaded 200,000 pages to our PmWiki. Not too surprisingly, searching is now glacial. After taking a peek at .pageinfo we see that it is a listing of pages, with keywords. Are there any plans afoot to do something a little more 'Googlie'?

In the meantime, aside from deleting .pageinfo, is there an easy way to turn the search feature off? Found it - just remove it from the forms in the .tmpl. One might also want to set $EnablePageIndex = 0;

Thanks,

-Rn


lordmundi? November 06, 2007, at 10:54 AM: Just to add to the discussion below, I thought I would put a link to a sample search result on Renato's site with Sphider integrated:

lordmundi? November 05, 2007, at 09:05 AM: Wow... I really like the Sphider integration you did Renato?!! It looks great. Looking at the sphider site, this looks like it could be a great cookbook recipe for pmwiki. I'm wondering how you or someone else might do the following:

All in all, it looks really nice. -- FG?
Thanks! And I'm sorry I've messed up with your original post, but I guess this was going to be clearer than doing a new "post" above that one. So: my comments are in purple not to mess it ALL up. lol Renato?

08/31/07 - Renato? - Okay, six months later, I think I've got an idea. I've been playing with Sphider since yesterday. I could implement the search feature in one night (I had some difficulty working out how to get the results INSIDE the main "window" on PmWiki - most of PmWiki code is Greek to me)... I'm having only a small bug with the "Did you Mean" feature (it gets weird when I use capital letters), but other than that (or if you disable it), the search engine is running ok. I'll try to solve that by tonight, but I can't make any promises. And I've messed with a lot of code, randomly, so I'll have to check it all to find out what I've done. :P (yeah, I should start writing a change history...)

Oh, anyone can take a look at it on my site, if you don't mind reading Portuguese. :P Good keyphrases are "guitarra elétrica", "symphony x", "steve howe". It will give you the idea. If you want to know what the bug is, search for "Stevee". I've changed the code so that it takes everything in lowercase. If that's ok with PmWiki, it will be an option. :)

Henning? July 18, 2007, at 12:38 PM: It just occurred to me that it would be nice to have a search engine that on request excludes pages older than a certain date from the result (in order to concentrate on recent content). Just brainstorming ...
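A date cutoff like this is a small filter over the result list. The sketch below assumes results arrive as (page name, last modified) pairs; that shape and the function name are hypothetical and do not reflect how PmWiki stores results.

```python
from datetime import datetime

def exclude_older_than(results, cutoff):
    """Keep (page_name, last_modified) pairs modified on or after cutoff, newest first."""
    kept = [(name, modified) for name, modified in results if modified >= cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)

results = [
    ("Main.HomePage", datetime(2005, 3, 14)),
    ("Cookbook.Sphider", datetime(2007, 8, 31)),
    ("PmWiki.Search", datetime(2006, 12, 14)),
]

# Only pages touched since the start of 2006 survive the filter.
recent = exclude_older_than(results, datetime(2006, 1, 1))
```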

Henning? February 22, 2007, at 10:40 AM: I'd be interested in a solution for multiple buttons, too. I've seen multiple search buttons used in a non-wiki CMS, and it looks like an efficient user interface device I'd like to copy.

02/07/07 - Renato? - The tips on this thread (PmWikiUsers:2006-October/034807.html) are great for the ones willing to search only for titlenames. Is it possible to have two buttons (Go/Search, as in MediaWiki, for instance)? One for searching titles and the other one for searching content?
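To make the two-button idea concrete, here is a minimal sketch of the dispatch behind such a form: one button searches page names, the other searches page text. The button values "go" and "search", the function names, and the dictionary of pages are all assumptions for illustration, not PmWiki features.

```python
def search_titles(pages, query):
    """Return page names that contain the query (case-insensitive)."""
    q = query.lower()
    return sorted(name for name in pages if q in name.lower())

def search_content(pages, query):
    """Return names of pages whose text contains the query (case-insensitive)."""
    q = query.lower()
    return sorted(name for name, text in pages.items() if q in text.lower())

def handle_search(pages, query, button):
    """Dispatch on the pressed button: 'go' for titles, 'search' for content."""
    if button == "go":
        return search_titles(pages, query)
    if button == "search":
        return search_content(pages, query)
    raise ValueError(f"unknown button: {button!r}")

PAGES = {
    "Main.HomePage": "Welcome. Use the search box to find guitar pages.",
    "Music.Guitar": "Notes on electric and acoustic instruments.",
}
```

With this data, the same query "guitar" finds Music.Guitar through the title button and Main.HomePage through the content button.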

12/14/06 - (:searchresults:) can be customized by editing page Site.Search, see also Search for pages.

6/5/06 - I totally understand the frustration with PmWiki's search results... But perhaps the issue has come to enough of a head that it's time for me to go ahead and implement a valid way to excerpt (and possibly rank) search results, even if it's very suboptimal in a number of respects. Most notably, it will be suboptimal in terms of speed -- every task and option we add to searching/page lists makes it run even slower than it does now.

I think I need to remind the group that PmWiki is not a search engine, has never been designed to be a search engine, and I have no intent to make it one. My stance on searching continues to be that if a site wants fast searches with relevance ranking of results and excerpted text outputs, then get a "real" search engine that is designed for such tasks and let it index the PmWiki site. (Bonus: such an engine can index and search things that aren't wiki pages, such as attachments or other static pages on the site.)

I should also point out that any author can create a custom search page on pmwiki.org, it doesn't require me to do it. For example, to have a search page that defaults to fmt=#title for its output, just create a page that looks something like:

(:searchbox:)

(:searchresults fmt=#title order=title:)

See, for example, https://www.pmwiki.org/wiki/Test/SearchByTitle . Then use that custom search page for searching instead of the PmWiki default.

Still, I'll see if I can write up an page variable in the very near future, as well as an order=rank option.

Pm
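For anyone wondering what excerpting and ranking involve, here is a minimal sketch under simple assumptions: rank by how many times the term occurs, and excerpt a fixed window around the first match. It is not PmWiki's implementation, the function names are hypothetical, and it assumes lowercasing does not change the text length (true for ASCII).

```python
def excerpt(text, term, width=30):
    """Return a snippet of text around the first case-insensitive match."""
    pos = text.lower().find(term.lower())
    if pos < 0:
        return ""
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix

def rank(pages, term):
    """Return (page_name, match_count, excerpt) tuples, most matches first."""
    needle = term.lower()
    results = []
    for name, text in pages.items():
        count = text.lower().count(needle)
        if count:
            results.append((name, count, excerpt(text, term)))
    # Sort by descending count, then by page name for a stable order.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results
```

Note that this reads the full text of every page for every query, which is the speed cost the post above warns about.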


Also visit PmWiki.Search for a documented custom search page.


2/1/06 - A lot of people continue to ask for improvements to PmWiki's search capabilities. In the past I've essentially taken the position that "PmWiki is not a search engine", and that using another search engine package (one that is optimized for performing searches) would be much better than me trying to build one of my own.

The pmwiki.org site is starting to become so heavily used that I probably need to set up a search engine there, if only to help keep the server load down. Does anyone have any suggestions for a good, easy-to-install search engine package?

The two I've looked at in detail in the past include:

ht://Dig -- I've used this several times in the past for other projects, but it doesn't appear to be actively maintained anymore, and integrating it to PmWiki would be slightly kludgey.

swish-e -- I did a few experiments with this and concluded that it could be made to work, but curiously it seems to lack any sort of convenient "excerpting" capability. (I could probably live without this.)

I also briefly looked at mnoGoSearch, but for some reason I didn't think it was a good fit with what I'm trying to do.

Any suggestions?


6/13/05 - PmWiki's search engine scans the markup text directly, not the page's rendered output.

might make it possible for PmWiki's search to also scan the rendered version of the text, so maybe we could go that way... :-)


4/15/05 - I've always maintained that PmWiki *isn't* a search engine, and for advanced searches a site is much better off integrating an existing search engine package rather than us trying to reinvent that particular wheel.

Still, there are times when it may be useful to provide teasers to things that aren't "searches". Most search engines have no clue of PmWiki's structures such as groups, trails, or categories, and so being able to provide teaser information in the context of those structures still makes a lot of sense.


6/13/04 - But your point is well taken. I never really thought of searching for markup sequences. :-)

> > I actually do that now and then, so I think we actually have to implement our own search engine.

Well, I wasn't planning to eliminate the search engine, either. I've just felt that once a basic search capability is available that meets the needs of most PmWiki users, my time and effort is better spent on other aspects of PmWiki and not reinventing search engines that already exist.


(Old content added to this page before Pm ever got a chance to write anything.)

After Pm made this empty entry, I shamelessly hijacked it to think aloud, maybe spark ideas in others :) I'll presently move these scribbles to PITS entries.

   $SearchPatterns['normal'][] = "!^$FullName\$!";

-Radu? March 11, 2005, at 01:43 PM


Radu? March 14, 2005, at 11:25 PM
But (:pagelist:) does not allow SearchString ... or does it? It's definitely not documented under Page Directives

Pico? March 27, 2006, at 03:51 PM
Apparently it does. Take a look at PageLists for more about pagelist (and searchresults?). As for Page Directives, that needs work and is categorized for Documentation To Do.

Category: PmWiki Design DocumentationToDo

This page may have a more recent version on pmwiki.org: PmWiki:SearchImprovements, and a talk page: PmWiki:SearchImprovements-Talk.