[Thali-talk] How to handle disconnect for stories -1 and 0

Yaron Goland yarong at microsoft.com
Thu Jun 25 00:54:31 EDT 2015


Warning: This is to the public alias. Please respond here but just keep that in mind.



The good news is that I actually got Story -1 to pass on Android, more or less. I need to tweak it a bit but the fundamentals did work from my Samsung to my Nexus. I am having a problem with the Nexus talking to the Samsung (discovery works, but when the Nexus tries to connect it gets an IO error), but given how old the Samsung S4 running Jelly Bean is, I don't care all that much. I'll be getting two new phones soon.



That having been said, playing around with the code made me realize that I was wrong in my proposal to get rid of disconnect. The reason is that without it we are going to have all sorts of nasty race conditions and make life harder for our customers.



I think Jukka had the right logic all along.



The requirements would be:



The TCP client bridge defines a state model with the following states:

Bound

Unbound



When the connect function is called with a peer ID and a callback then a TCP client bridge instance MUST be created and bound to a local TCP/IP port. From the moment the local TCP/IP port is bound the TCP client bridge associated with the submitted peer ID is in the bound state.



When the TCP client bridge enters the bound state it MUST call the associated connect callback with the local port it is bound to.
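
To make the bind-then-callback ordering concrete, here is a minimal sketch in Python. It is only an illustration, not the native code: the names connect and bridges and the (err, port) callback shape are made up for the example, and the local P2P side is left out entirely.

```python
import socket

# Hypothetical registry: peer ID -> listening socket of a bound bridge.
bridges = {}

def connect(peer_id, callback):
    """Bind a local TCP/IP port for peer_id, then report it via callback(err, port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0 asks the OS for any free port
    listener.listen(1)
    bridges[peer_id] = listener       # from here on the bridge is in the bound state
    # Only after the port is bound do we hand it to the caller.
    callback(None, listener.getsockname()[1])
```

The point of the ordering is that the caller never sees a port number that the bridge does not already own.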



The TCP client bridge MUST form a local P2P connection either when it enters the bound state or when there is a connection to its local TCP/IP port. The exact choice is implementation dependent and based on how long it takes to establish a local P2P connection.



If the local P2P connection is lost while there are connections to the local TCP/IP port then all of the open sockets MUST throw an exception and the connected sockets MUST be disposed of.



If the TCP client bridge loses the local P2P connection for any reason while in the bound state then it MUST either immediately reconnect over the local P2P transport to the associated peer ID or wait until there is a new connection to the TCP client bridge's local TCP/IP port (since, per the previous requirement, any existing TCP/IP connections would have been terminated with an exception).



So long as the TCP client bridge is in the bound state it MUST maintain control over the port it returned in the connect callback.



The TCP client bridge MAY restrict the number of incoming TCP/IP connections to the local port to 1 or more. If attempts are made to create more connections than can be supported by the local P2P connection then the additional TCP/IP connection requests MUST be rejected.



Note that the previous requirement effectively mandates the use of a semaphore or equivalent facility in the native code that can guarantee that at any instant no more than the maximum allowed number of connections are open.
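
As a rough sketch of what that semaphore could look like (in Python rather than the actual native code, with a hypothetical ConnectionGate class name), a non-blocking acquire is enough to decide whether a new TCP/IP connection request is admitted or rejected:

```python
import threading

class ConnectionGate:
    """Admits at most max_connections concurrent TCP/IP connections."""

    def __init__(self, max_connections=1):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_admit(self):
        # Non-blocking acquire: False means the connection request must be rejected.
        return self._slots.acquire(blocking=False)

    def release(self):
        # Call when an admitted connection closes, freeing its slot.
        self._slots.release()
```

With the default limit of 1, a second try_admit() returns False until release() is called for the first connection.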



Open Issue: Although it's theoretically possible to run multiple simultaneous named streams using the iOS multi-peer framework I'm not convinced that is worth worrying about right now. As such unless we really need it, it seems to me that we should limit iOS the same way we are forced to limit Android (unless we want to invent our own mux layer) to 1 connection total. If, however, it turns out we need more connections then we will need a local function that the Node code can call to find out how many connections will be available. This is necessary for setting up the node.js level connection pools.



If a TCP/IP connection to the TCP client bridge local port is lost for any reason then future connections MUST be treated as completely brand new. In practice this means that if the local TCP/IP connection is lost then either the local P2P connection has to be torn down (to make sure all associated state is lost) or some kind of explicit clear signal has to be sent to make sure the remote server understands that the coming bytes form a new connection. The point of this is to prevent a situation where a client has a TCP/IP connection to the TCP client bridge, sends some bytes, loses the connection for some reason, reconnects, and the server on the other side has no clue any of this happened and thinks that the client is just continuing the previous connection. This kind of behavior will lead to data corruption.
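
Here is one way the explicit clear signal option might look, sketched in Python. Everything in it is an assumption made for illustration: the RemoteEndpoint class, the "RESET" and "DATA" frame kinds, and the idea that the bridge sends a RESET frame whenever a new local TCP/IP connection arrives. Nothing like this is defined in the requirements yet.

```python
class RemoteEndpoint:
    """Hypothetical remote side that collects bytes per logical connection."""

    def __init__(self):
        self.abandoned = []   # bytes received on connections that were cut off
        self._current = b""   # bytes received on the connection in progress

    def on_frame(self, kind, payload=b""):
        if kind == "RESET":
            # The client bridge saw a brand new local TCP/IP connection, so
            # drop the old stream's state instead of appending to it.
            self.abandoned.append(self._current)
            self._current = b""
        elif kind == "DATA":
            self._current += payload
        else:
            raise ValueError("unknown frame kind: %r" % (kind,))

    def current(self):
        return self._current
```

Without the RESET frame the bytes of the second connection would simply be appended to the first, which is exactly the corruption case described above.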



If the connect function is called on a peer ID with a TCP client bridge in the bound state then the callback MUST return an error.



If the disconnect function is called with a peer ID and a callback on a peer ID that is not associated with a TCP client bridge then the callback MUST return success.



If the disconnect function is called with a peer ID and a callback on a peer ID that is associated with a TCP client bridge in the bound state then the TCP client bridge MUST move to the unbound state, reject all future connections to the local TCP/IP port, throw an exception on and close all existing connections to the local TCP/IP port, and only once this is all done call the callback with success. Once the callback has been successfully delivered to JXCore the TCP client bridge MAY release its local TCP/IP port. It is considered best practice to hold onto the local TCP/IP port for a minute or two just to guarantee that there will be no confusion about the peer ID the port is associated with.



If the disconnect function is called with a peer ID and a callback on a peer ID that is associated with a TCP client bridge in the unbound state then the callback MUST be immediately executed with a success result.
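
Putting the connect and disconnect rules together, the state model can be sketched as below. This is a Python model for discussion only, not the bridge itself: BridgeRegistry, the fake port counter and the callback shapes are all made up, socket handling is reduced to a comment, and letting connect rebind a peer whose bridge is unbound is my assumption since the requirements above don't cover that case.

```python
class BridgeRegistry:
    """Toy model of the bound/unbound rules for connect and disconnect."""

    BOUND, UNBOUND = "bound", "unbound"

    def __init__(self):
        self._state = {}          # peer ID -> BOUND or UNBOUND
        self._next_port = 50000   # stand-in for a port picked by the OS

    def state(self, peer_id):
        return self._state.get(peer_id)

    def connect(self, peer_id, callback):
        if self._state.get(peer_id) == self.BOUND:
            # connect on an already bound bridge MUST return an error.
            callback("peer already has a bound bridge", None)
            return
        port = self._next_port
        self._next_port += 1
        self._state[peer_id] = self.BOUND
        callback(None, port)

    def disconnect(self, peer_id, callback):
        if self._state.get(peer_id) == self.BOUND:
            # Real code would reject new connections and close existing
            # ones here, before the callback fires.
            self._state[peer_id] = self.UNBOUND
        # Unknown peer, unbound bridge, or just unbound: always success.
        callback(None)
```

The useful property is that disconnect is idempotent: calling it on an unknown peer, or twice in a row, always ends in success.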



Note, if necessary, we can do a two-stage commit where the node code calls disconnect, the native code calls the callback and then the node code calls a native function confirming the receipt of the callback at the application level. But I'm hoping this isn't necessary and so I'm not specifying it for now. It's easy to add later if we need it.



It's worth pointing out that if we agree with this proposal then I will use these requirements to validate and as necessary write tests to make sure we have the right behavior. This set of interfaces is at the very root of our local P2P stack. It needs to be rock solid. So there will need to be lots of tests to be sure.



               Thanks,



                                             Yaron