|Subject:||Re: date grief
|Date:||Wed, 20 Dec 2017 10:54:12 +1000
|From:||Leslie <"ViaThe List"@NOSPAM.com>
I am following this more as a mental exercise than any goal because I
used to write compilers back in the day for TRS80/Z80 machines, so I
find it interesting.
Anyway it appears that the #6 result with the modified test data shows
1. "PP" + "O".. *is not* being treated as ("PP + "O")..
but likely (unproven) as "PP" + ("O"..)
2. It does not resolve to any of the following combinations
"PP" + ""
"PP + "O",
or "PP" + "OO"
What this means is that when a pattern match placeholder is present,
Paradox does *not* break the expression down into sub-expressions and do
a recursive term/factor evaluation.
In fact, the only conclusion I can make is that (at least for a string
field) as soon as there is a pattern match placeholder present, the
presence of any arithmetic operators produces an empty result set.
Which closely follows the results when using patterns in date fields.
So perhaps it is much simpler than we thought.
In terms of the rules:
I am getting a picture in my mind that there are multiple expression
evaluators going on inside Paradox and *two sets of evaluation rules
exist per field type*.
One is the "expected" field type specific term/factor processing.
The other being that as soon as the .. pattern matching placeholder is
present, a different evaluation process kicks in.
My guess is that .. was added later and rather than modify the existing
sets of rules and potentially breaking things, they added a new set
which gives them better control over the pattern matching *per field type*.
I also believe that an expression with both .. and arithmetic operators
present work only either by accident or specific limited cases were
coded to work - with .. always taking priority.
IMO this locked them into the parsing nightmare you are seeing, with
somethings working as expected but mostly producing empty result sets.
Unless we disassemble Paradox itself, I cannot think of a way to prove
this but it makes sense given the results across your various posts.
The last nagging issue for me is test #5. This should have resulted with
Instead some sort of syntax check kicked in to produce the error message.
So in this case, I assume that if one part of the expression is quoted
then all parts must be quoted.
It is interesting (and annoying) that Calc is different.
In conclusion and without further information to hand.....
I think that if .. and arithmetic operators are present then returning
an empty result set is a valid response. In other words, the table does
not even get read.
And finally as I suggested in a previous post,
"PP" + "O".. is not the same as 'PP' + 'O%' in SQL.
On 20/12/2017 7:29 AM, Bernie van't Hof wrote:
> Thanks for staying with me on this.
> Firstly, following your suggestion I added records:
> Query Prev Now
> ========== ======== ========
> 1. PPOO => PPOO
> 2. "PP"+"OO" => PPOO
> 3. "PP".. => PPOO PP, PPOO
> 4. "PP".."O" => PPOO
> 5. PP+"OO" => "Wrong type" (amended) But CALC PP+"OO" => PPOO (!)
> 6. "PP"+"O".. => Empty
> So the only difference is #3 which behaves as expected.
> My Stream/Tokeniser/Parser combination actually works quite well. It is
> simple left-to-right with effectively one function per token (actually
> uses switch statements) handles precedence and is recursive. I actually
> did study a number of docs like you suggest when writing it.
> I recognise that issues in qbe may require some extra smarts in the
> parser, but so far the it has handled everything (except some qbe
> expressions) that has been thrown at it.
> Summary of difficulties:
> a) It is proving very difficult to understand how pal qbe is MEANT
> to be interpreted. As seen in these discussions there appears
> no simple answer, and we have the best pdox dos minds on the
> b) Operators in dates have context-sensitive meaning.
> E.g. '/', '-' can be arith or date sep, '.' decimal or date sep.
> c) 3 (or 4) date formats (Except where a wildcard is used) using 3
> different separators.
> d) qbe doesn't rely on white-space separation
> E.g. 23-Sep-2000-1-2 is legal and means 23-Sep-2000 -1 -2
> e) Wildcards plus arith operators appears illogical. We can fit an
> argument to suit the question, but that's not much help to the
> 'normal' programmer who reasonably expects consistency with the
> way pal works. With respect, that "PP".."O" is interesting, after
> all the discussions here, is telling, but that example DOES work
> exactly as one might expect.
> f) Separators in dates with wildcards must be per CCP default.
> Others are not supported.
> g) If a wildcard is used with an arith op the answer is empty. Even,
> as we have seen most recently, with string literal concatenation.
> h) In a date calc the left side must be a date
> i) Different behaviours in CALC
> At the risk of repeating myself, the current problem is not how to parse
> the qbe, it is precisely defining its rules/grammar.
> IMHO Results so far indicate we have learnt to make our queries fit what
> works, not following some simple, consistent, well-documented rules.
> You can't write the grammar if you don't know what you expect it to do
> or if what it is to do doesn't make reasonable sense, is ambiguous or
> The parsing certainly has difficulties as outlined, but that can be
> handled one way or another later, currently experimenting with a sort of
> state machine.
> My current target rule set looks like:
> * =============== Rules=============
> * Supported date formats:
> * MM/DD/YY
> * DD-Mon-YY
> * DD.MM.YY
> * YY.MM.D
> * YY can be Y, YY, YYY or YYYY
> * 2-Digit years: >=70 => 19xx, <70 => 20xx
> * Date calc left side must be date (might remove this one)
> * Arith only if left side is date or num
> * String concat "PP"+OO and PP+"OO" ok
> * Recognise only default separator (per CCP)
> * No arith allowed in expressions containing wildcards (treat as
> potential separators)
> * Non-Integer numeric components rounded UP in date calcs (makes
> sense as anything after a date is a later date)
> * Space terminates date (but not required)
> * ... is DotDot Dot (lexer)
> Note I have decided (for now) to only allow date separator per CCP
> default, and will allow string concat as shown.
> - Bernie
> On 19/12/17 9:07 pm, Leslie wrote:
>> You seem to be getting totally confused because it makes sense to me,
>> and this is why you need to build a proper expression evaluator into
>> your parser.
>> So "PP" + "O"..
>> effectively becomes "PP" + ("O"..) and not ("PP" + "O")..
>> which I think is what you are seeing. Of course you could specify the
>> brackets in the QBE yourself if you meant the latter.
>> Now it depends upon Paradox as to how the expression gets evaluated,
>> but I would guess that ("O"..) gets evaluated first and fails to
>> match, and so we now get:
>> "PP" + failed match result, which could either immediately fail to
>> match or gets evaluated as "PP" + "" which also fails to match if you
>> only have "PPOO" in the table.
>> Even if Paradox evaluates "PP" first, then you get: fails to match +
>> "OO".. which again could either immediately fail to match, or be seen
>> as "" + "OO".., which again fails to match.
>> So I do not expect a silent fail but rather an empty answer table
>> based upon the test data.
>> If you adjust your test data to have a record with just "PP" and also
>> just "OO" what gets returned? Give variations like this in your test
>> data a try to see if you can prove what happens.
>> Note that "PP".. + "O" should also return no results (depending upon
>> your test data, because ("PP"..) + "O" would result in "PPOO" + "O"
>> which becomes "PPOOO" and does not match.
>> Again, create a test record, to try and prove this.
>> The one I do find interesting is: "PP".."O"
>> Because a correct result gets returned, it means that it truly is
>> seeing .. as a pattern matching placeholder and not an operator and
>> thus matches on the single string PP..OO, - there is actually very
>> clever coding going on inside Paradox to handle this.
>> Bernie I know the Paradox documentation refers to .. as a wildcard,
>> but we have now proven many times that in fact it is not.
>> .. is actually a pattern match placeholder, which matches differently
>> depending upon the actual field type.
>> In other words although @ could be considered to equate to the ?
>> standard wildcard
>> .. cannot be considered to equate to the * standard wildcard or even
>> the % SQL operator (but it is close in some cases).
>> You should compare it to pattern characters in regex expressions to
>> get a better idea of what it it trying to achieve (albeit more
>> Also if you rewrite the expressions on paper to include brackets as I
>> have above it also may help to clarify things when the results appear
>> On 19/12/2017 1:12 PM, Bernie van't Hof wrote:
>>> It doesn't end..
>>> Data in column: PPOO
>>> In corresponding text field query:
>>> PPOO => OK
>>> "PP"+"OO" => OK
>>> "PP".. => OK
>>> "PP".."O" => OK
>>> PP+"OO" => OK
>>> "PP"+"O".. => Silent fail
>>> Looks like string concatenation operator is is same bucket as arith when
>>> wildcard present.
>>> I guess after all we've been through I shouldn't be surprised.
>>> - Bernie
>>> PS: Merry Christmas!
>>> On 12/12/17 12:07 am, Leslie wrote:
>>>> I have been thinking about the leading Zero problem as per Dan Ehrmann.
>>>> Now this may not need to be said, but I will say it anyway as it may
>>>> help explain some of the results.
>>>> The fact that ../9/.. does not match records when it should, but
>>>> ../09/.. does is actually quite important.
>>>> What this tells me, and I am ready to stand corrected, is that Paradox
>>>> is not performing any true evaluations with .. and probably @@ as well.
>>>> What I think Paradox is doing is retrieving the value from the record
>>>> and constructing it into its display format and holding as a string.
>>>> Then it is doing a string comparison, whilst also catering for
>>>> wildcards. So ../9/.. will never pattern match 01/09/2016 which is an
>>>> example of what gets constructed - hence an empty result set.
>>>> This is completely different from how I would do it, which would be to
>>>> retrieve the date, extract the Day, Month and Year components and
>>>> compare their values with those specified in the QBE (which I would
>>>> also convert to shorts in the process). In other words no pattern
>>>> matching occurs which removes the need for display formatting
>>>> information. However, they did not do this as it might have been too
>>>> expensive in terms of the code required (we only have 640K remember).
>>>> So, this means that Paradox really is doing pattern matching when a ..
>>>> operator is present. This may also be true for other types of fields
>>>> as well, in fact I would put money on it.
>>>> But do not forget that Dan also states that 5/9/89 does return the
>>>> correct result set without needing the leading zero. In this scenario,
>>>> I believe that Paradox is doing binary matching and so the display
>>>> format never comes into play.
>>>> Therefore, and I think this is what Bernie shows with his results, QBE
>>>> expressions get evaluated using an entirely different algorithm when
>>>> .. is present and that is likely always as a string and possibly for
>>>> any field types.
>>>> I also think I can explain the 1+ and +1 before and after the pattern
>>>> as well.
>>>> 1 + Date pattern should be able to be evaluated like Date Pattern + 1
>>>> does. Instead we get the Wrong type error.
>>>> Upon reflection I think this is due simply how they parse the Date
>>>> expression to extract out the date pattern itself. So they are parsing
>>>> left to right in a single pass and then when they hit a non-date
>>>> operator *after* the *date* expression/pattern they then syntax check
>>>> it probably to make sure they got a valid pattern.
>>>> At this point we get the Wrong Type error (but Syntax Error might have
>>>> been a better message).
>>>> Operators after the date do get evaluated correctly as they have
>>>> already processed/evaluated the pattern match and so they can then
>>>> apply date arithmetic as they continue on their merry way parsing left
>>>> to right.
>>>> A bit of a mess and I would need to test this theory out to be sure,
>>>> but what it comes down to is that some things could be done for cheap
>>>> or even free in terms of the coding, but others could require
>>>> something more complex such as multiple passes, which would require
>>>> more memory.
>>>> So, fancy stuff got dropped and this was all probably documented. We
>>>> just don't have that documentation.
>>>> So to conclude, I think that once the wildcard operators are present
>>>> Paradox then does string matching and this can cause strange results
>>>> to be produced as Bernie reports.
>>>> It makes sense (to me at least).
>>>> This email has been checked for viruses by AVG.