Subject: | Re: Like
| Date: | Thu, 28 Dec 2017 08:38:41 +1000
| From: | Bernie van't Hof <berniev@bje.com.au>
| Newsgroups: | pnews.paradox-dos
|
tldr;
It certainly appears likely that pdox uses a form of soundex. First
character critical then focusing on next four or so characters. Use sql
SOUNDEX function.
A bit of research:
Soundex was originally patented in 1918 (!!). So-called "American
Soundex" was first used in 1930. The accepted modern version is as Knuth
described in 1973. But he also described a modified version in the same
document.
mysql states it uses the original version defined by Knuth, but Wiki
suggests most db's (except PostgreSQL) use the modified version so
things get a bit unclear.
PHP provides soundex per Knuth (but doesn't say which one) as well as
levenshtein and metaphone.
I don't think it really matters which one we use as it seems nobody
knows what pdox was doing anyway.
An interesting trick to solve the first character limitation is to
prepend a common character to each term, but as we're emulating pdox
behavior that is is irrelevant here.
Conclusion:
Looks like sql default soundex is an appropriate translation for pdox
"LIKE".
Anyone disagree?
- Bernie
On 22/12/17 10:05 pm, Michael Kennedy wrote:
> On 22/12/2017 00:38, Steven Green wrote:
>> don't recall all of the testing that was done, but it was generally
>> concluded that it was skewed towards the first 5 characters of the
>> string, and was less predictable than most dot-dot options.. don't
>> know how you'd want to reproduce it, if you can't predict it
>
> AFAIR, I NEVER used "like", because I never knew how it worked.
>
> But, with that 5-char skew that Steve remembered, perhaps it was an
> implementation of (standard?) "Soundex"? If so, it would apply only to
> alphabetic (maybe alphanumeric?) strings:
> - always do an exact match on the very first char,
> - then match on the (english?) "sound" of the next 3/4/5/6/.. chars.
>
> Eg:
>
> If QBE has: Like paradox, this would translate to a Soundex code of P632
>
> Then, DB records with Paradox, Poridex, Piridicks, etc (which also
> translate to P632) would be matches.
>
> (I've not run the above example).
>
> - Mike
>
>
>> OK Steve, I'll take the bait!
>>
>> What is known of the traps/bugs/features/poo in 'LIKE'?
>>
>> Anyone wanna do some tests?
>>
>> I'll be quiet for a while trying to sensibly parse what we have so far,
>> but will be lurking ...
>>
>> - Bernie
>>
>> ---
>> This email has been checked for viruses by Avast antivirus software.
>> https://www.avast.com/antivirus
>>
|