|Subject:||Re: date grief
|Date:||Wed, 20 Dec 2017 07:29:39 +1000
|From:||Bernie van't Hof <email@example.com>
Thanks for staying with me on this.
Firstly, following your suggestion I added records:
Query Prev Now
========== ======== ========
1. PPOO => PPOO
2. "PP"+"OO" => PPOO
3. "PP".. => PPOO PP, PPOO
4. "PP".."O" => PPOO
5. PP+"OO" => "Wrong type" (amended) But CALC PP+"OO" => PPOO (!)
6. "PP"+"O".. => Empty
So the only difference is #3 which behaves as expected.
My Stream/Tokeniser/Parser combination actually works quite well. It is
simple left-to-right with effectively one function per token (actually
uses switch statements) handles precedence and is recursive. I actually
did study a number of docs like you suggest when writing it.
I recognise that issues in qbe may require some extra smarts in the
parser, but so far the it has handled everything (except some qbe
expressions) that has been thrown at it.
Summary of difficulties:
a) It is proving very difficult to understand how pal qbe is MEANT
to be interpreted. As seen in these discussions there appears
no simple answer, and we have the best pdox dos minds on the
b) Operators in dates have context-sensitive meaning.
E.g. '/', '-' can be arith or date sep, '.' decimal or date sep.
c) 3 (or 4) date formats (Except where a wildcard is used) using 3
d) qbe doesn't rely on white-space separation
E.g. 23-Sep-2000-1-2 is legal and means 23-Sep-2000 -1 -2
e) Wildcards plus arith operators appears illogical. We can fit an
argument to suit the question, but that's not much help to the
'normal' programmer who reasonably expects consistency with the
way pal works. With respect, that "PP".."O" is interesting, after
all the discussions here, is telling, but that example DOES work
exactly as one might expect.
f) Separators in dates with wildcards must be per CCP default.
Others are not supported.
g) If a wildcard is used with an arith op the answer is empty. Even,
as we have seen most recently, with string literal concatenation.
h) In a date calc the left side must be a date
i) Different behaviours in CALC
At the risk of repeating myself, the current problem is not how to parse
the qbe, it is precisely defining its rules/grammar.
IMHO Results so far indicate we have learnt to make our queries fit what
works, not following some simple, consistent, well-documented rules.
You can't write the grammar if you don't know what you expect it to do
or if what it is to do doesn't make reasonable sense, is ambiguous or
The parsing certainly has difficulties as outlined, but that can be
handled one way or another later, currently experimenting with a sort of
My current target rule set looks like:
* =============== Rules=============
* Supported date formats:
* YY can be Y, YY, YYY or YYYY
* 2-Digit years: >=70 => 19xx, <70 => 20xx
* Date calc left side must be date (might remove this one)
* Arith only if left side is date or num
* String concat "PP"+OO and PP+"OO" ok
* Recognise only default separator (per CCP)
* No arith allowed in expressions containing wildcards (treat as
* Non-Integer numeric components rounded UP in date calcs (makes
sense as anything after a date is a later date)
* Space terminates date (but not required)
* ... is DotDot Dot (lexer)
Note I have decided (for now) to only allow date separator per CCP
default, and will allow string concat as shown.
On 19/12/17 9:07 pm, Leslie wrote:
> You seem to be getting totally confused because it makes sense to me,
> and this is why you need to build a proper expression evaluator into
> your parser.
> So "PP" + "O"..
> effectively becomes "PP" + ("O"..) and not ("PP" + "O")..
> which I think is what you are seeing. Of course you could specify the
> brackets in the QBE yourself if you meant the latter.
> Now it depends upon Paradox as to how the expression gets evaluated, but
> I would guess that ("O"..) gets evaluated first and fails to match, and
> so we now get:
> "PP" + failed match result, which could either immediately fail to match
> or gets evaluated as "PP" + "" which also fails to match if you only
> have "PPOO" in the table.
> Even if Paradox evaluates "PP" first, then you get: fails to match +
> "OO".. which again could either immediately fail to match, or be seen as
> "" + "OO".., which again fails to match.
> So I do not expect a silent fail but rather an empty answer table based
> upon the test data.
> If you adjust your test data to have a record with just "PP" and also
> just "OO" what gets returned? Give variations like this in your test
> data a try to see if you can prove what happens.
> Note that "PP".. + "O" should also return no results (depending upon
> your test data, because ("PP"..) + "O" would result in "PPOO" + "O"
> which becomes "PPOOO" and does not match.
> Again, create a test record, to try and prove this.
> The one I do find interesting is: "PP".."O"
> Because a correct result gets returned, it means that it truly is seeing
> .. as a pattern matching placeholder and not an operator and thus
> matches on the single string PP..OO, - there is actually very clever
> coding going on inside Paradox to handle this.
> Bernie I know the Paradox documentation refers to .. as a wildcard, but
> we have now proven many times that in fact it is not.
> .. is actually a pattern match placeholder, which matches differently
> depending upon the actual field type.
> In other words although @ could be considered to equate to the ?
> standard wildcard
> .. cannot be considered to equate to the * standard wildcard or even the
> % SQL operator (but it is close in some cases).
> You should compare it to pattern characters in regex expressions to get
> a better idea of what it it trying to achieve (albeit more primitive).
> Also if you rewrite the expressions on paper to include brackets as I
> have above it also may help to clarify things when the results appear
> On 19/12/2017 1:12 PM, Bernie van't Hof wrote:
>> It doesn't end..
>> Data in column: PPOO
>> In corresponding text field query:
>> PPOO => OK
>> "PP"+"OO" => OK
>> "PP".. => OK
>> "PP".."O" => OK
>> PP+"OO" => OK
>> "PP"+"O".. => Silent fail
>> Looks like string concatenation operator is is same bucket as arith when
>> wildcard present.
>> I guess after all we've been through I shouldn't be surprised.
>> - Bernie
>> PS: Merry Christmas!
>> On 12/12/17 12:07 am, Leslie wrote:
>>> I have been thinking about the leading Zero problem as per Dan Ehrmann.
>>> Now this may not need to be said, but I will say it anyway as it may
>>> help explain some of the results.
>>> The fact that ../9/.. does not match records when it should, but
>>> ../09/.. does is actually quite important.
>>> What this tells me, and I am ready to stand corrected, is that Paradox
>>> is not performing any true evaluations with .. and probably @@ as well.
>>> What I think Paradox is doing is retrieving the value from the record
>>> and constructing it into its display format and holding as a string.
>>> Then it is doing a string comparison, whilst also catering for
>>> wildcards. So ../9/.. will never pattern match 01/09/2016 which is an
>>> example of what gets constructed - hence an empty result set.
>>> This is completely different from how I would do it, which would be to
>>> retrieve the date, extract the Day, Month and Year components and
>>> compare their values with those specified in the QBE (which I would
>>> also convert to shorts in the process). In other words no pattern
>>> matching occurs which removes the need for display formatting
>>> information. However, they did not do this as it might have been too
>>> expensive in terms of the code required (we only have 640K remember).
>>> So, this means that Paradox really is doing pattern matching when a ..
>>> operator is present. This may also be true for other types of fields
>>> as well, in fact I would put money on it.
>>> But do not forget that Dan also states that 5/9/89 does return the
>>> correct result set without needing the leading zero. In this scenario,
>>> I believe that Paradox is doing binary matching and so the display
>>> format never comes into play.
>>> Therefore, and I think this is what Bernie shows with his results, QBE
>>> expressions get evaluated using an entirely different algorithm when
>>> .. is present and that is likely always as a string and possibly for
>>> any field types.
>>> I also think I can explain the 1+ and +1 before and after the pattern
>>> as well.
>>> 1 + Date pattern should be able to be evaluated like Date Pattern + 1
>>> does. Instead we get the Wrong type error.
>>> Upon reflection I think this is due simply how they parse the Date
>>> expression to extract out the date pattern itself. So they are parsing
>>> left to right in a single pass and then when they hit a non-date
>>> operator *after* the *date* expression/pattern they then syntax check
>>> it probably to make sure they got a valid pattern.
>>> At this point we get the Wrong Type error (but Syntax Error might have
>>> been a better message).
>>> Operators after the date do get evaluated correctly as they have
>>> already processed/evaluated the pattern match and so they can then
>>> apply date arithmetic as they continue on their merry way parsing left
>>> to right.
>>> A bit of a mess and I would need to test this theory out to be sure,
>>> but what it comes down to is that some things could be done for cheap
>>> or even free in terms of the coding, but others could require
>>> something more complex such as multiple passes, which would require
>>> more memory.
>>> So, fancy stuff got dropped and this was all probably documented. We
>>> just don't have that documentation.
>>> So to conclude, I think that once the wildcard operators are present
>>> Paradox then does string matching and this can cause strange results
>>> to be produced as Bernie reports.
>>> It makes sense (to me at least).
>>> This email has been checked for viruses by AVG.