Dan, As background ... Retrieving the model/schema is one of the basic operations of CIM - just like getting a property value.  Also, changing units between minor releases of the schema is not allowed. 
 
Andrea


From: owner-nm-wg@ggf.org [mailto:owner-nm-wg@ggf.org] On Behalf Of Dan Gunter
Sent: Wednesday, September 28, 2005 9:01 AM
To: nm-wg@ggf.org
Subject: Re: [nm-wg] Specifying units

Andrea Westerinen (andreaw) wrote:
In the DMTF, we address the units issue by defining it in the
meta-data/schema and requiring that implementations return the dictated
units (btw, percent is a type of "unit").  So, having the schema
definition provides the unit information.  This has required
implementations to convert kbps to bps, etc. but saves the client from
doing instance-level conversions.

  
Thanks for the input, Andrea, it is good to know how other groups are handling this issue.

My $0.02. To me, this is a somewhat XML-schema-centric view of the world. Other schema languages, e.g. Relax-NG schemas, do not modify the infoset (do not define default values for attributes). The disadvantage is potentially redundant information in each message. The advantage is that the converse of the above, that not having the schema means you don't have the unit information, is not a problem (also consider what happens when the units change between schema versions). Martin has also argued that whether the sender or receiver is the more natural (convenient and/or correct) place for unit conversion depends on the situation. To throw a spanner into the works, I would even argue that just because you have two measurements with the same units and value does not necessarily mean that you have, even approximately, the same result without considering issues of resolution, accuracy, and precision. Oh dear, I've said too much, I can hear the black helicopters circling overhead.. :-)

-Dan
Andrea

  
-----Original Message-----
From: owner-nm-wg@ggf.org [mailto:owner-nm-wg@ggf.org] On Behalf Of Martin Swany
Sent: Monday, September 26, 2005 3:54 PM
To: nm-wg@ggf.org
Subject: Re: [nm-wg] Specifying units

Hi Loukik,


    
It has come to our notice that the messages in v2 responses do not 
specify units while giving back measurement data (ex: Bandwidth 
Utilization and Capacity). Specifying such units is necessary and 
messages should be enhanced to support this.

      
I think that what you're actually saying is that the 
PerfSONAR prototype doesn't return units.  The NM-WG v2 schema dated
20050802 actually includes the units in almost every 
measurement and this has been in the schema for a long time.

We've talked a lot about the way to do it.  The current 
examples feature units in the datum element, but as you note, 
that wastes space.  Some of the examples also depict it as 
part of the metadata (which is what I've been mostly in favor 
of as the least offensive current option.)  There are  
definitely issues with that -- mainly that the way that the 
data is stored is really different than the way in which it 
is collected.

So, it can go in parameters, but it might be nice if it were 
able to be presented in the data section, but not in every datum.
We've discussed this one as well, and thus far there have 
been no good solutions (or none that we found generally workable.)


    
we could specify it just once..maybe like this:

<perfsonar:data dataUnits="bps">

      
This really requires the data element to be in a specific 
namespace or to become an omnibus for all the things that 
might be common in the enclosed datum elements.  There could 
be other numeric values in the datum that require unit 
information and we'd have to add support for each of them to data.

For example, from the current schema:
<iperf:datum interval="2.0-3.0 sec" numBytes="231"  
numBytesUnits="MBytes" value="1.94" valueUnits="MBytes/sec"/>

There are multiple numeric values that need unit qualification.
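
To illustrate the point, here is a minimal sketch of how a client might pair each numeric attribute with its companion "...Units" attribute. The namespace URI and the quantities() helper are assumptions made for the example (the message above gives no namespace); the attribute names come from the datum shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread does not specify one.
IPERF_NS = "http://example.org/nmwg/iperf"

xml_text = (
    f'<iperf:datum xmlns:iperf="{IPERF_NS}" interval="2.0-3.0 sec" '
    'numBytes="231" numBytesUnits="MBytes" '
    'value="1.94" valueUnits="MBytes/sec"/>'
)


def quantities(datum):
    """Pair each attribute X with its companion XUnits attribute, if any."""
    attrs = datum.attrib
    result = {}
    for name, text in attrs.items():
        units = attrs.get(name + "Units")
        if units is not None:
            result[name] = (float(text), units)
    return result


datum = ET.fromstring(xml_text)
pairs = quantities(datum)
print(pairs)
```

Attributes without a matching "...Units" attribute (interval, in this case) are simply skipped, which is one reason a single dataUnits attribute on the enclosing data element would not cover this datum.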


    
<perfsonar:data>
<perfsonar:units dataUnits="bps"/>

      
This one is even more thorny.  We referred to this as the 
"older sibling" model as it makes siblings dependent on 
order, and we decided to avoid that so we could use things 
like hashes.


    
or something else...

      
Something else is really where we are now.   What we
have discussed is a general way to "factor out" common 
attributes or elements from a set of datum elements.
CommonTime is an example of this, but a general mechanism 
would be nice.  I  proposed something before where an 
enclosed element in a datum could enclose a set of datum 
elements indicating that this value was common.  It was 
greeted with a mixture of animosity and indifference, often 
coexisting in the same person's reaction.

Actually, I think that the newly-discussed "bag of parameters"
might be a partial solution to this problem but it still 
doesn't help when the things common to a set of datums are 
complex and not simple attributes (like a time range.)
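
As a rough sketch of what "factoring out" could mean for simple attributes only, the following treats each datum as a plain dictionary of attribute strings. The factor_common() helper and the sample values are hypothetical and not part of any NM-WG schema; it does nothing for the complex cases (like a time range) mentioned above.

```python
def factor_common(datums):
    """Split a list of attribute dicts into (common, per-datum remainder)."""
    if not datums:
        return {}, []
    # Start from the first datum and keep only what every other datum shares.
    common = dict(datums[0])
    for d in datums[1:]:
        common = {k: v for k, v in common.items() if d.get(k) == v}
    rest = [{k: v for k, v in d.items() if k not in common} for d in datums]
    return common, rest


datums = [
    {"value": "1.94", "valueUnits": "MBytes/sec"},
    {"value": "2.10", "valueUnits": "MBytes/sec"},
]
common, rest = factor_common(datums)
print(common)
print(rest)
```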


    
The second comment is: Choice of units.

After a chat with Jeff on this, I can list out two options

Jeff suggested that: Service uses the units that the data is already 
in (for ex. in rrd tool, data is in octets per second.
Hence service continues to provide data in octets per second) and 
continues to return data in the same units. However, the units in use 
should be clearly specified using any of the above suitable methods.

      
Data in RRDTool is not necessarily in octets per second, BTW. 
 Interface utilization data is generally fetched from an 
octet counter (just to be pedantic.)
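
For what it's worth, here is a minimal sketch of how a rate in octets per second might be derived from two samples of such a counter. The function name, the 32-bit counter width, and the assumption of at most one wrap between samples are mine, not something RRDTool or the schema dictates.

```python
def octets_per_second(prev_count, curr_count, interval_s, counter_bits=32):
    """Rate from two counter samples, assuming at most one wrap in between."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction handles the case where the counter wrapped to zero.
    delta = (curr_count - prev_count) % modulus
    return delta / interval_s


# Counter wrapped between the samples: 296 octets before the wrap, 704 after.
rate = octets_per_second(4294967000, 704, 10)
print(rate)
```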


    
An option that I would like to propose is usage of units used in 
common practice. For example, bandwidth, as known to me, is more 
commonly expressed in bps (and their factors).  A service should hence 
*reasonably* strive to use the units that are in common practice. 
Either way, specification of units using any of the suitable methods 
mentioned previously is absolutely necessary.

      
I agree that reasonably trying to use units in common 
practice is a good goal.  Many discussions over many years 
have led me to believe that in many cases, common practices 
aren't so common.
For instance, bandwidth is in bits per second when talking 
about link capacity or sending bogus test data, but often 
in bytes per second when an application is using the data.  
I think that trying to mandate a "best practice" is a 
slippery slope. I vote for unambiguous specification and 
easy translation.


    
Nevertheless, if a service returns capacity and utilization in the 
same message, it would be nice to have them both in the same units 
(unlike the current case with Perfsonar prototype where capacity is 
bps and utilization is octets per second)

Question here is: Which option is ideal? Should we provide capacity 
and utilization in the same units in our prototype?

      
We can let everyone else weigh in, but if the units are 
specified (as they can easily be) then why not just divide 
and convert?  That seems easier to me than forcing one or 
the other into a less-than-natural format.
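
A minimal sketch of "divide and convert", assuming the units arrive as plain strings alongside the values. The unit labels in the table and the function names are hypothetical choices for this example, not strings defined by the NM-WG schema.

```python
# Conversion factors to bits per second; labels are assumed for illustration.
TO_BPS = {
    "bps": 1.0,
    "kbps": 1e3,
    "Mbps": 1e6,
    "octets/sec": 8.0,  # one octet is eight bits
}


def to_bps(value, units):
    """Convert a rate to bits per second, rejecting unknown unit labels."""
    try:
        return value * TO_BPS[units]
    except KeyError:
        raise ValueError(f"unknown units: {units!r}") from None


def utilization_fraction(util_value, util_units, cap_value, cap_units):
    """Utilization divided by capacity, after converting both to bps."""
    capacity = to_bps(cap_value, cap_units)
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return to_bps(util_value, util_units) / capacity


# Utilization in octets per second, capacity in bps, as in the prototype.
frac = utilization_fraction(1_250_000, "octets/sec", 100_000_000, "bps")
print(frac)
```

The client only needs the two unit labels to be unambiguous; neither value has to be forced into the other's units by the service.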

martin