
Hi Florido, Thanks for your reply; my comments below. On 21/04/15 11:53, Florido Paganelli wrote:
I also have the feeling the discussion is becoming a bit sterile. We can make the GLUE2 spec better, but I hardly understand how Paul's definitions, without using the actual terms we want to define, could help.
Sorry, it was meant only as an aide towards writing good descriptions. It's certainly not a requirement.
On 2015-04-20 19:46, Paul Millar wrote:
Put another way, the concept of 'information being created' is too loose a term: it could mean almost anything, so defines nothing.
Well, this is a rhetorical game and not a scientific discussion anymore IMHO. I understand you want a definition independent of the practical implementation, and since you seem to like riddles, I will avoid the words creation and time (at this point a mere exercise in wording). Here it is:
The CreationTime is the number of seconds elapsed since the Epoch (00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970), formatted as described in the GLUE2 document, recorded at the moment when BOTH of the following are true: 1) the GLUE2 record for a GLUE2 entity is being generated; 2) the data contained in the record, that is, the data that describes the entity the record refers to, is being collected.
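For concreteness, here is a minimal sketch of producing such a value, assuming (which the GLUE2 document should be checked against) that the timestamp is rendered as ISO 8601 in UTC:

```python
import time

def glue2_creation_time(now=None):
    """Render an epoch-seconds value in ISO 8601 UTC form,
    the rendering assumed here for a GLUE2 DateTime_t value
    (check the GLUE2 document for the authoritative format)."""
    if now is None:
        now = time.time()  # seconds since the Epoch, per the definition above
    return time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))

# The Epoch itself renders as:
print(glue2_creation_time(0))  # 1970-01-01T00:00:00Z
```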
Great, thanks for taking the time to define this.
I see no fallacy nor circularity. It's a definition. It does NOT require knowledge of any provider-, resource-, or whatever-BDII.
Yes, absolutely.
Of course, if you want to be really picky there is a time drift between 1) and 2) because a Turing machine is sequential. But we can avoid this discussion I hope...
Certainly, despite evidence to the contrary, I don't want to nitpick. Now, I believe your definition also applies to a site-level BDII. When it refreshes information, it generates a new record and populates this with information it collects from the resource-level BDII. Conditions 1) and 2) are satisfied, so the site-level BDII may set CreationTime. There's a (translational?) symmetry between a site-level BDII fetching information from resource-level BDIIs, and a resource-level BDII fetching information from info-providers. Having said that, the problem only appears in hierarchical systems, like BDII. So, perhaps having a hierarchical profile document would be a better way of solving this.
I can provide a similar definition for Validity if you like... but I will defer to Stephen's suggestion that this is community-driven. It's not because of the model; it's because what is "Valid" is community-driven, and from experience I can tell it will be, even if you try to define it otherwise!
I guess it's unclear to me what should happen if CreationTime+Validity is in the past. From what others have said, it seems we make no claims about what this means; the client must decide. My naïve thinking was that, if information is updated periodically and CreationTime+Validity is in the past, then the data should be considered "stale", as it should have been updated by now.
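Under that reading, staleness reduces to a one-line predicate. A sketch (the ISO 8601 UTC rendering of CreationTime is an assumption here, as is treating Validity as a duration in seconds):

```python
import calendar
import time

def is_stale(creation_time_iso, validity_seconds, now=None):
    """True if now() > CreationTime + Validity, i.e. under the
    'stale' reading the record should already have been refreshed."""
    if now is None:
        now = time.time()
    created = calendar.timegm(
        time.strptime(creation_time_iso, "%Y-%m-%dT%H:%M:%SZ"))
    return now > created + validity_seconds

# A record created at the Epoch with a one-hour Validity is long stale:
print(is_stale("1970-01-01T00:00:00Z", 3600))  # True
```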
Maybe the only real outcome of this discussion is Jens' comment that 'Validity' was a bad name! :D
Yeah, I think that's true! [..]
To me, this points to a deficiency in GLUE 2.
I do not see the need to describe it in the model. That belongs in an implementation of a hierarchical information system (today only BDII, and maybe EMIR, which nobody uses).
Otherwise we need a model that takes into account hierarchical propagation of information (as mentioned before, an aggregation model)
But for me, having the above in the GLUE2 model sounds as if physicists had to describe the Standard Model in terms of the pieces of paper, emails, research papers, people, and historical events needed to describe the physics in it...
:-D OK, perhaps this could be in a separate document (a profile?) that describes a hierarchical GLUE system? That could refine concepts, like CreationTime, describe how aggregation happens, etc. This would avoid "polluting" GLUE-2 base document with these hierarchy-specific issues.
[...] I don't see that this is different to any attribute - what you publish needs to be driven by the use cases. It wouldn't be especially difficult to publish a different Validity for each object type, or even for e.g. different batch systems, but unless you have something to specify the use there's nothing to motivate such a varying choice.
My use-case was what you might expect: allowing detection of a particular failure mode. Specifically, the information publishing "got stuck" at one site. The details don't matter, but the result was old ("stale") data continued to be re-published.
In ARC, we decided a long time ago that the information system should NOT be used as a monitor for the information system itself. Whoever does so does it at their own risk. The reason is that the information system is more like a business card: it presents services to users. It might fake some of the information to meet the needs of users/communities, or to hide faults in the system in a way that keeps the overall system working (and this is what actually happens!).
Using the information system as a monitoring tool requires a different approach; namely, the information system itself must be able to self-diagnose. Apart from the philosophical question of whether this is even possible, for ARC this is difficult because the information system is part of, and triggered by, other parts of the middleware: if the middleware dies, the infosys dies with it. This is not up to GLUE2 to define, is not part of most current architectures, and to me indicates that proper monitoring should be done with third-party tools. As a matter of fact, that claim applies to most software.
So if you want to know whether the information publishing "got stuck", you'd better be a good sysadmin and use a decent process-monitoring tool, be it Nagios or a simple cronjob that sends emails...
As with all things: hindsight is 20/20, and failure modes often exploit the gaps in monitoring. In this particular case, the "mechanical" refresh process was working correctly, with the site-level BDII fetching data correctly. Direct monitoring of BDII/LDAP object creation time (the built-in 'createTimestamp' attribute) would not have revealed any problem.

Publishing CreationTime and Validity (with the semantics now() > CreationTime + Validity => problem) would have allowed a script to detect the problem. This isn't to say this is the only way of achieving this, nor that it is necessarily the best way; however, it did seem to fit with the idea of CreationTime and Validity. Publishing just the CreationTime also allows a script to detect the problem, provided it happens to know the refresh period. Although this is less ideal, it's probably the best I can do, given everyone else feels Validity has a different meaning.

Cheers, Paul.
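P.S. For concreteness, the CreationTime-only variant might look like the sketch below. The refresh period must be known out-of-band (it is not published), and the grace factor of 2 is an arbitrary choice, not anything from GLUE2:

```python
import calendar
import time

# Hypothetical values: since only CreationTime is published, the
# expected refresh period must be known out-of-band (an assumption).
REFRESH_PERIOD = 300  # seconds between expected updates
GRACE_FACTOR = 2      # tolerate one missed cycle before alarming (arbitrary)

def publishing_stuck(creation_time_iso, now=None):
    """True if the record is older than GRACE_FACTOR refresh periods,
    suggesting the publishing chain has stopped updating it."""
    if now is None:
        now = time.time()
    created = calendar.timegm(
        time.strptime(creation_time_iso, "%Y-%m-%dT%H:%M:%SZ"))
    return (now - created) > GRACE_FACTOR * REFRESH_PERIOD
```

Such a check could be wrapped as a Nagios plugin (exit CRITICAL when it returns True) or run from a cronjob, in line with the third-party-monitoring approach above.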