[saga-rg] Data Streams Requirements from Use Cases

15 Dec 2004

      This attempts to extract/summarize the requirements for the stream API 
defined by the various use cases.  I yanked all paragraphs that 
described the usage of remote sockets/streams from the various use 
cases and consolidated them in this document.  I then summarize the 
requirements.  Most of the requirements focus on simplifying the 
authentication/authorization for IP socket connections.  The security 
models that must be supported include GSI, Unicore, SSL, and WSRF.  The 
languages that must be supported are Fortran, C, C++, and native Java 
(some indicated JNI was sufficient, but others seem to require native 
Java).

--------------------------------

Relevant use cases
	UC-2 : DiVA
	UC-3 : DRMAA
	UC-4 : GridLab
	UC-5 : KoDaVis
	UC-7 : RealityGrid
	UC-9 : Visit
	UC-10 : VizService

--------------------------------

Summary of Requirements Extracted from these use cases
UC-2 : DiVA
	Usage: For adjusting parameters on remotely launched components
	Functional Requirements: Simplify Authentication and encryption of 
streams.
	Security Model: SSL and/or GSI
	Stream Types: IP sockets
	Languages: C/C++ bindings (wrapped to provide TCL and Python bindings)
UC-3 : DRMAA
	Usage: Not certain (seems to have remote vis requirement)
	Functional Requirements: ?
	Security Model: ?
	Stream Types: IP sockets (probably)
	Languages: Java/C/C++/Fortran
UC-4 : GridLab
	Usage: Remote connect and steering of applications
	Functional Requirements: Authentication, tunnel through firewalls (not 
in use case though)
		substrate for higher level steering interface abstraction.
	Security Model: GSI ?
	Stream Type: IP sockets
	Languages: Java/C/C++/Fortran
UC-5 : KoDaVis
	Usage: Remote connect for visualization of large data.
		Collaborative/multiuser/synchronized vis
	Functional Requirements: Authentication and simpler socket/stream 
abstraction
	  Support for multiuser/collaborative interfaces
	  - login mechanism for connecting to data- and interaction-server
	  - data exchange (send, receive) for the scientific data
	  - data exchange to synchronize the collaborative, distributed session
	  - naming mechanism to identify objects which are shared
		between several visualization systems
	Security Model: unknown
	Stream Type: probably IP sockets
	Languages: Java/C++/C
UC-7 : RealityGrid
	Usage: Remote steering/visualization of running simulations
	Functional Requirements: Authentication (potentially encryption)
		substrate for higher-level parameter/steering interface.
	Security Model: GSI  (embedded in tools layer)
	Stream Type: IP sockets
	Languages: JNI-Java C++.  Instrumenting C/C++/Fortran codes
UC-9 : Visit
	Usage: Remote monitoring and visualization of running simulations
		using AVS/Express (ParView)
	Functional Requirements: Authentication
		multiple connect/disconnect of clients
	Security Model: Unicore
	Stream Type: IP sockets
	Languages: F90/C/C++  .  Interfaces to AVS/Express
UC-10 : VizService
	Usage: Remote connect and visualization for large scale simulations
	Functional Requirements: Authentication/authorization
		multiple connect/disconnect of multiple clients
		targeting a higher-level steering interface abstraction
	Security Model: Now Unicore and/or Globus as middleware. Later, any
		OGSA or WSRF implementation will do.
	Stream Type: IP sockets
	Languages: Fortran/C/Java

=====================================================
Raw Text: Extracted all paragraphs relating to stream requirements
	for the different use cases.
=====================================================
=====================================================
UC-2 : DiVA
=====================================================
    Vis application developers are spending 80% of their time wrangling
    transport, job launch, and security issues.  This is particularly
    difficult because most of the issues involved in transport,
    job-launching,  and security are outside of the visualization
    application developer's  core expertise, so these tasks take a long
    time and the developers do a  poor job of it.

  f) The components will need to establish secure socket links where
    they  are not located in the same memory space.  In some contexts,
    this may  only be authenticated socket connections, but other
    contexts require  the sockets be encrypted as well.
  f) Connecting the Components Together: Once the components are
    "launched," the components that reside on different machines need to
    be  able to communicate with one another via network sockets.  This
    leads  to both security and performance considerations (discussed
    below).  Secure Sockets Abstraction:  The state-of-the-art use of
    sockets for  remote visualization depends on security practices that
    are largely  debunked as "unsafe" by our security professionals.
    Most distributed  visualization application developers presume that
    if components were  launched securely, the sockets that connect these
    components together  are also secure (who could guess the port number
    and startup time...  it's security through obscurity).  This is
    clearly not safe, but the  level of effort necessary to move to
    simple authenticated sockets is  considerable (it is not practical to
    expect vis people to do this).  At  minimum, a new TCP socket must
    exchange a "shared secret" in order to  make sure the connection is
    "authentic".  If there was a single call to "OpenAuthenticatedSocket
    (host,port)", then that would be very useful.

   Likewise, an
    "OpenEncryptedSocket ()" would be useful in other circumstances.  For
    instance, SGI's VizServer does not encrypt the socket used to convey
    your login & passwd to connect to the server (it's as bad as telnet).
    File directory listings probably should be encrypted.  This should
    just be targeted at a single TCP socket.  We can use GridFTP for
    performance, but we really need to get the control  sockets
    protected.
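
    As a rough illustration of the "shared secret" check described
    above, the following C sketch shows a bare-minimum handshake on an
    already-connected stream socket.  The names send_secret and
    check_secret, the SECRET_MAX limit, and the one-byte length prefix
    are assumptions made for this example only; they are not part of
    any proposed SAGA binding.  The secret travels in the clear here, so
    this illustrates the authenticated case, not the encrypted one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define SECRET_MAX 64   /* hypothetical upper bound on secret length */

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Connecting side: send a one-byte length followed by the secret. */
int send_secret(int fd, const char *secret) {
    assert(secret != NULL);
    size_t len = strlen(secret);
    if (len == 0 || len > SECRET_MAX) return -1;
    unsigned char hdr = (unsigned char)len;
    if (write_all(fd, &hdr, 1) != 0) return -1;
    return write_all(fd, secret, len);
}

/* Accepting side: returns 1 if the peer presented the expected secret,
   0 on any mismatch or I/O failure. */
int check_secret(int fd, const char *expected) {
    unsigned char hdr;
    char got[SECRET_MAX];
    size_t exp_len = strlen(expected);
    if (read_all(fd, &hdr, 1) != 0) return 0;
    if (hdr == 0 || hdr > SECRET_MAX) return 0;
    if (read_all(fd, got, hdr) != 0) return 0;
    if (hdr != exp_len) return 0;
    /* Compare without exiting early on the first differing byte. */
    unsigned char diff = 0;
    for (size_t i = 0; i < exp_len; i++)
        diff |= (unsigned char)(got[i] ^ expected[i]);
    return diff == 0;
}
```

    In this sketch the accepting side would call check_secret() right
    after accept() and close the connection whenever it returns 0.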

    Visualization application developers typically use C/C++ to implement
    applications.  However, Python, TCL, and Java are increasingly
    popular  for assembling applications that use components that are
    written in  native C or C++ (for instance VTK supports wrapping in
    any of those  bindings).

    The services should be accessible using C and C++ bindings as a
    baseline.  Those bindings can typically be wrapped so they are
    accessible in any of the above scripting languages.

    If we can, at minimum, simplify the API calls required to establish
    an authenticated socket and even make it simple to establish an
    encrypted socket (even for low-bandwidth), that would fulfill that
    requirement.  Distributed  visualization applications need to meet
    the bare minimum security  standards (something that most currently
    do *not* do).

    In most of the scenarios we have constructed, the security needs
    primarily involve supporting proper authentication in order to
    connect  components together.  Authorization is primarily required if
    a  component is actually a persistent service that is not running as
    the  same UID as the user who constructed the workflow.  We try to
    set up  the bulk of DiVA valid use cases so that the components are
    running as  the user's UID in order to sidestep such cases as much as
    possible.

    The socket traffic for GUI events and component execution does not
    typically need to be encrypted, except in rare cases such as passing
    privileged information (encryption keys or passwords).

It is important that the security API interact nicely with SSH.  When
    GSI/PKI certs are not available, ssh works quite well.  While the
    single-sign-on abilities of GSI are extremely important, many have
    found that ssh combined with ssh-auth can do a reasonably good job of
    emulating the same ability if used as a last resort.  This should
    *not*  be ignored (this consideration should be taken into account
    when  designing these APIs).
  // support for simple authenticated TCP sockets
   SAGA_OpenAuthenticatedTCPServer (securityhandle, portnumber);
   SAGA_OpenAuthenticatedTCPSocket (securityhandle, host, port,
                                    passwdcallback);
                                    // where passwdcallback gets a passwd
                                    // if needed in order to  support
                                    // SSL/TLS security models.

   // support for simple encrypted TCP sockets
   SAGA_OpenEncryptedTCPServer (securityhandle, portnumber);
   SAGA_OpenEncryptedTCPClient (securityhandle, portnumber,
                                passwdcallback);
=====================================================
UC-3 : DRMAA
=====================================================
    There were 120 remote computational nodes and few client nodes with
       good graphics support.

  The application was done in Fortran and C.
       Part of the distributed solution was implemented in Java.

   8.3 What are your security needs: authentication, authorisation,
       message protection, data protection, anonymisation, audit
       trail, or others?

   Authentication mostly.
=====================================================
UC-4 : GridLab
=====================================================
         - remote steering or monitoring systems
   8.3 What are your security needs: authentication, authorisation,
       message protection, data protection, anonymisation, audit
       trail, or others?

         authentication, authorisation, basic data protection
=====================================================
UC-5 : KoDaVis
=====================================================
Because the resulting dataset is very big (about 1GB), it cannot be
stored locally but only at one or more central servers.  There is a
demand to visualize this data to analyse it.  To do this, the
visualization systems must be coupled to the data servers by a fast
network to get online access to the data.

All changes in the scene are synchronized online by the interaction
server, which guarantees a consistent visualization and so allows a
well-coordinated collaborative work session.
An additional scientist may also join a running session.  In this case
the current scene has to be distributed to the new client.

The visualization systems will load an initial scene, connect to the
data servers, and request a portion of the data, e.g. a time interval
of interest or data for a selection of chemical tracers.

The data which is sent over the network is sensitive.
There should be access restrictions and authentication.

8.3 What are your security needs: authentication, authorisation,
       message protection, data protection, anonymisation, audit
       trail, or others?

   Authentication.

   8.4 What are the most important issues which would simplify your
       security solution?  Simple API, simple deployment, integration
       with commodity technologies.

An API which provides a reliable certification mechanism to
authenticate a user who connects to the system.

   What are the things which are important to scalability and to what
   scale - compute resources, data, networks ?

Network is very important to scale up the number of possible
participants of a collaborative visualization session.

No details can be given now because the project is in an early stage of
development.
The API should have the following functionality:
- login mechanism for connecting to data- and interaction-server
- data exchange (send, receive) for the scientific data
- data exchange to synchronize the collaborative, distributed session
- naming mechanism to identify objects which are shared between several
   visualization systems
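
The naming mechanism in the last item could be sketched in C roughly as
follows.  This is only an illustration under stated assumptions: the
type shared_registry, the functions registry_publish and
registry_lookup, and the fixed table sizes are all hypothetical, and the
use case itself gives no details of the intended design.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SHARED 32   /* hypothetical capacity of the table */
#define NAME_LEN   48   /* hypothetical maximum name length, incl. NUL */

typedef struct {
    char  name[NAME_LEN];
    void *object;       /* the object shared between visualization systems */
    int   in_use;
} shared_entry;

typedef struct {
    shared_entry entries[MAX_SHARED];
} shared_registry;

/* Start with an empty table. */
void registry_init(shared_registry *r) {
    memset(r, 0, sizeof *r);
}

/* Publish an object under a name.  Returns 0 on success, -1 if the name
   is empty, too long, already taken, or the table is full. */
int registry_publish(shared_registry *r, const char *name, void *object) {
    assert(r != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0 || len >= NAME_LEN) return -1;

    shared_entry *free_slot = NULL;
    for (int i = 0; i < MAX_SHARED; i++) {
        shared_entry *e = &r->entries[i];
        if (e->in_use && strcmp(e->name, name) == 0) return -1;
        if (!e->in_use && free_slot == NULL) free_slot = e;
    }
    if (free_slot == NULL) return -1;

    memcpy(free_slot->name, name, len + 1);
    free_slot->object = object;
    free_slot->in_use = 1;
    return 0;
}

/* Look up a shared object by name; returns NULL if it is not published. */
void *registry_lookup(const shared_registry *r, const char *name) {
    for (int i = 0; i < MAX_SHARED; i++) {
        const shared_entry *e = &r->entries[i];
        if (e->in_use && strcmp(e->name, name) == 0) return e->object;
    }
    return NULL;
}
```

In a distributed session each participant would need a consistent view
of such a table, which is what the synchronization item in the list is
for; the sketch covers only the local lookup.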

=====================================================
UC-7 : RealityGrid
=====================================================
The  use case common to all applications is the "computational
    steering" of the code. However "computational steering" is a broad
    term and incorporates many ideas.
    At the simplest level, computational steering involves the ability to
    monitor the evolution of the system under study, and to manipulate
    parameters that affect the system's behaviour.
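
    To make the "monitor and manipulate parameters" idea concrete, here
    is a minimal C sketch of a table of steerable parameters.  It is not
    taken from the RealityGrid Steering Library; the steer_param type,
    the steer_get and steer_set functions, and the per-parameter range
    check are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* label a steering client would refer to */
    double     *value;  /* points at the simulation's live variable */
    double      min;    /* accepted range: an assumption of this sketch */
    double      max;
} steer_param;

/* Monitor: store the current value in *out.
   Returns 0 on success, -1 if the name is unknown. */
int steer_get(const steer_param *p, size_t n, const char *name, double *out) {
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(p[i].name, name) == 0) {
            *out = *p[i].value;
            return 0;
        }
    }
    return -1;
}

/* Steer: overwrite the live variable if the new value is in range.
   Returns 0 on success, -1 if the name is unknown, -2 if out of range. */
int steer_set(const steer_param *p, size_t n, const char *name, double v) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(p[i].name, name) == 0) {
            /* Written this way so that NaN is rejected as well. */
            if (!(v >= p[i].min && v <= p[i].max)) return -2;
            *p[i].value = v;
            return 0;
        }
    }
    return -1;
}
```

    A remote steering client would reach these two calls through
    whatever authenticated stream the API provides; the table itself
    stays inside the simulation.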

  f. Visualize one or more aspects (e.g. physical fields) of the running
      simulation - on-line visualization
   m. Realtime analysis of live streams of data emitted from a simulation
      (this is a generalization of the concurrent visualization
      requirement in (f)).

  Steering clients exist in several flavours:
   * The Qt/C++ GUI for workstations uses the client-side functions of
     the RealityGrid Steering Library.
   * A .NET client suitable for PDAs is tooled against the WSDL
     description of the RealityGrid Steering Grid Service and
     Service Registry.
   * A Java client packaged as a GridSphere Portlet is tooled
     against WSDL.
   * A Java client in the ICENI framework has been built against
     the Steering Library using JNI.

       GSI (but this is not built into the application, but into the
       tools layer).

	Security of the middle-tier services used for
       computational steering is currently lacking, due largely to the
       absence of a standardised security model that supports delegation
       in a Web service world.  Message and data protection matter, but
       are not urgent (however some industrial collaborators take a
       different view).

=====================================================
UC-9 : Visit
=====================================================
The progress of the running simulation
   can be monitored by ParView, an AVS/Express based application that
   allows an online-visualization of the coupled simulation. Among the
   steering capabilities of ParView are the ability to insert solutes
   into running simulations and the ability to select 3D-points in the
   simulated area for which more detailed data analysis and recording
   is required (so called break-through curves - BTC).

visualization can be dynamic, attachment to a simulation from
   different systems with different capabilities (performance,
   network bandwidth)

  The simulation establishes a network connection to an external
   application (the visualization/steering application) and is
   controlled by it. Therefore the steerer needs to authenticate
   properly. Besides that, there are no special security
   considerations.

   8.3 What are your security needs: authentication, authorisation,
       message protection, data protection, anonymisation, audit
       trail, or others?

   authentication and authorisation of the steerer.

Here is an example of the simulation (client) API of VISIT:

   /* attach to a visualization, using a service-name and a password */
   vcd = visit_connect_seap(servicename, password, timeout);

   /* send some data */
   visit_send(vcd, tag, timestamp, data, datatype, dimensions);

   /* receive some data */
   visit_recv(vcd, tag, &timestamp, data, &datatype, &dimensions);

   /* detach from visualization */
   visit_disconnect(vcd);

*********server API
  /* init connection for application 'trace' */
   lvisit_trace_init();

   while (SimTime) {

     /* test connection, open a new one if necessary  */
     lvisit_trace_check_connection();

     /* receive parameters from visualization in application structure
        'parm' */
     lvisit_trace_parm_recv(&parm);

     /* send 3d vector data 'velo' (which is distributed) to
        visualization */
     lvisit_trace_velo_send(&velo,nx,ny,nz,3);
   }

   /* close the connection */
   lvisit_trace_close();
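
The send and receive calls above carry a tag, a timestamp, a datatype,
and dimensions alongside the data.  One way such metadata might be
framed on a stream is sketched below in C.  This is NOT VISIT's actual
wire format: the msg_header layout, the limit of four dimensions, and
the function names header_pack and header_unpack are assumptions chosen
only to show a fixed-size header in network byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_DIMS  4                 /* hypothetical maximum rank */
#define HDR_WORDS (3 + HDR_DIMS)
#define HDR_BYTES (HDR_WORDS * 4)

typedef struct {
    uint32_t tag;                   /* which quantity this message carries */
    uint32_t timestamp;             /* simulation step */
    uint32_t datatype;              /* application-defined type code */
    uint32_t dims[HDR_DIMS];        /* extent per dimension */
} msg_header;

/* Serialize the header into HDR_BYTES bytes, big-endian. */
void header_pack(const msg_header *h, unsigned char *out) {
    uint32_t w[HDR_WORDS];
    assert(h != NULL && out != NULL);
    w[0] = htonl(h->tag);
    w[1] = htonl(h->timestamp);
    w[2] = htonl(h->datatype);
    for (int i = 0; i < HDR_DIMS; i++)
        w[3 + i] = htonl(h->dims[i]);
    memcpy(out, w, HDR_BYTES);
}

/* Parse HDR_BYTES bytes produced by header_pack back into a header. */
void header_unpack(const unsigned char *in, msg_header *h) {
    uint32_t w[HDR_WORDS];
    assert(in != NULL && h != NULL);
    memcpy(w, in, HDR_BYTES);
    h->tag       = ntohl(w[0]);
    h->timestamp = ntohl(w[1]);
    h->datatype  = ntohl(w[2]);
    for (int i = 0; i < HDR_DIMS; i++)
        h->dims[i] = ntohl(w[3 + i]);
}
```

A fixed-size header like this lets the receiver read HDR_BYTES first and
then compute how many payload bytes to expect from the datatype and
dimensions.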

=====================================================
UC-10 : VizService
=====================================================
This API serves to connect two remote
    systems client and server by the grid medium and offers the client a
    set of tools to perform visualization of large-scale simulations. The
    visualization is based on images transferred from the server to the
    client and visualization controls sent by the client to the server.

- Send visualization control commands and receive a sequence of images
     back (streaming capability). [common]
   - Steer the simulation/application (pause, stop, resume, etc).
     [concurrent]
   - Plug-in or Plug-out from a running simulation without affecting it.
     [concurrent]
   - Real-time tracking of simulation parameters. [concurrent]

   // Generate an instance of Concurrent service.
   VisualizationService cvs = VisualizationServiceFactory.newInstance
                                (CONCURRENT_SERVICE, ...);
   // Get the visualization stream. Images or image sections are
   // streamed on an input stream.
   InputStream vizInput = cvs.getInputStream ();

John Shalf