School Map:
Home / Enroll
$ Student Records; Class
$ News @ LIVE
Tech Radio * Support/FAQ
| Store | FORUMS
Class / Forums / homeroom ![]() |
MySQL Full Text Searching |
(#89 2003-02-13 21:02:41) Post Reply |
|
logspirit
|
Full Text Searches in MySQL
Since MySQL version 3.23.23 MySQL has had the ability to do searches on CHAR, VARCHAR, and TEXT fields using a FULLTEXT index. These are very useful to create search engines for Your web site. This only works with MyISAM tables, which are the default. To create a table with searchable fields use the following syntax: CREATE TABLE ArticleSearch(art_id INT AUTO_INCREMENT NOT NULL, art_title VARCHAR(60), art_author_fname VARCHAR(15), art_author_lname VARCHAR(15), article TEXT, PRIMARY KEY(art_id), UNIQUE ID(art_id), FULLTEXT(art_author_lname, art_title, article)); This creates FULLTEXT indexes. If You already have a database with tables set up and want to add search capabilities use the following syntax: ALTER TABLE table_name ADD FULLTEXT(field_name_a, field_name_b); Note: One or more fields may be indexed as FULLTEXT if they are CHAR, VARCHAR, or TEXT. To search for words use the following SQL statement: SELECT * FROM ArticleSearch WHERE MATCH(article) AGAINST('word_a,word_b'); which returns entire rows where the search is sucessful for either or both of the words: 'word_a' and 'word_b' Searches are Case Insensitive, so 'ROCK', 'rock', and 'rOCk' will all perform the same way. Although the WHERE key word is optional, by using it, only the relevant results are returned in desending order - by revelance - best matches first. Without WHERE, all rows are returned, including those with 0 revelance!!! - or - SELECT art_id FROM ArticleSearch WHERE MATCH(art_title, article) AGAINST('word_a'); which will return the row index for further processing. Of course You can request returns on whichever field(s) You want. To return the revelance ranking or score: SELECT art_id, MATCH(art_title, article) AGAINST('word_a') AS score FROM ArticleSearch WHERE MATCH(art_title, article) AGAINST('word_a'); The above query will SELECT two fields for each revelant row: the art_id and the search revelance ranking AS score. MySQL 'knows' that the same MATCH( ) AGAINST search is requested, and only does it once, to save time. There are several built in limitations to this process: - words of 3 or less characters are IGNORED - words that appear in MORE than 50% of the searched fields are IGNORED - many common 'junk' words, built into a list within MySQL are IGNORED To overcome these limitations when using versions of MySQL of 4.0.1 or later, a BOOLIAN SEARCH may be used. Boolian searches allow the use of terms such as ('+rock, +roll') These operators are used: '+' must appear, '-' must not appear, '<' decrease revelance, '>' increase revelance, '( )' parentheses separate and/or combine expressions, '~' reverses revelance ranking for word, '*' wildcard to append to word, '" "' quotes create full phrases to be searched for. To do a Boolian Search use the following syntax: SELECT art_id FROM ArticleSearch WHERE MATCH(art_title, article) AGAINST('+word_a -word_b <word_c ~word_d Band* "rock and roll" (>rock >roll)' IN BOOLIAN MODE); Notice the key phrase IN BOOLIAN MODE which makes it so. Here word_a must appear, word_b must not appear, word_c has reduced revelance, appearances of word_d have negative value, Band Bands Bandaid bandstand may or may not appear as in regular searches, instances of the phrase "rock and roll" will be found, the words: rock roll will have higher revelance. For both regular searches and boolian searches: The order of the words in the search string, their uniqueness in the English language, and the number of times they appear in a field all affect search results - rare words have HIGHER rank. From a comment posted by Patrick O'Lone on the MySQL site: http://www.mysql.com/doc/en/Fulltext_Search.html It should be noted in the documentation that IN BOOLEAN MODE will almost always return a relevance of 1.0. In order to get a relevance that is meaningful, you'll need to: SELECT MATCH('Content') AGAINST ('keyword1 keyword2') as Relevance FROM table WHERE MATCH ('Content') AGAINST('+keyword1 +keyword2' IN BOOLEAN MODE) HAVING Relevance > 0.2 ORDER BY Relevance DESC Notice that you are doing a regular relevance query to obtain relevance factors combined with a WHERE clause that uses BOOLEAN MODE. The BOOLEAN MODE gives you the subset that fulfills the requirements of the BOOLEAN search, the relevance query fulfills the relevance factor, and the HAVING clause (in this case) ensures that the document is relevant to the search (i.e. documents that score less than 0.2 are considered irrelevant). This also allows you to order by relevance.
|
|
View Previous Thread Print This Page |
School Map:
Home / Enroll
$ Student Records; Class
$ News @ LIVE
Tech Radio * Support/FAQ
| Store
Advertising
| Link To Us | Privacy
Copyright 2000-2002 Script School Productions / KMR
Enterprises
No part of this website may be reproduced, copied and/or distributed
in any medium
without express written permission