I'm the author of an R package (`clustermq` [0]) that distributes function calls on HPC schedulers using the ZeroMQ bindings (`rzmq`). So far I have used a simple combination of `REQ`/`REP` sockets, with the workers first requesting the data common to all tasks (the function to call and its constant arguments), and then the data for each call they should evaluate from the master. This has worked so far, because running the computations is an order of magnitude slower than sending and receiving the data.
One issue is that the common data can be several hundred MB in size, while the iterated data is small. It can happen that the master is busy sending a huge chunk of common data and cannot send iterated data at the same time. Because of this, there is a noticeable delay when starting a distributed computation.
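To put rough numbers on that delay, here is a back-of-the-envelope sketch. Every figure in it (300 MB of common data, 10 kB per call, a 1 Gbit/s link, 50 workers) is an assumption chosen for illustration, not a measurement from `clustermq`, and it ignores protocol overhead and serialization time entirely.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to push size_bytes through a link, ignoring all overhead."""
    return size_bytes * 8 / link_bits_per_second


# All values below are assumptions for illustration only.
COMMON_DATA_BYTES = 300e6     # "several hundred MB" of common data
ITERATED_DATA_BYTES = 10e3    # small per-call data
LINK_BITS_PER_SECOND = 1e9    # 1 Gbit/s link
N_WORKERS = 50

per_worker = transfer_seconds(COMMON_DATA_BYTES, LINK_BITS_PER_SECOND)
all_workers = N_WORKERS * per_worker  # if the master's link is the bottleneck
small_message = transfer_seconds(ITERATED_DATA_BYTES, LINK_BITS_PER_SECOND)

print(f"common data, one worker:  {per_worker:.1f} s")
print(f"common data, all workers: {all_workers:.1f} s")
print(f"one iterated message:     {small_message * 1e3:.2f} ms")
```

Under these assumptions the common data dominates by several orders of magnitude, so a small message that has to wait behind it pays almost the entire cost.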
However, this may not be caused by the actual sending but rather by preparing the message. The documentation states:
ZeroMQ does not send the message (single or multipart) right away, but at some indeterminate later time.
So I'm wondering:
- Is ZeroMQ sending out the data I put in the queue with `send()` one after the other or in parallel? [1] Would this make a difference or is it negligible? Is there a way to influence this?
  - As far as I understand, switching from `REP` to `ROUTER` here would not change anything. [2] Is this correct?
  - If it's serial, I might want to separate the data into slow and fast sockets.
- Is the main delay caused by what happens before, i.e. copying big chunks of memory when creating the message object? [3] (I only `serialize` once.)
  - In this case I would want to interface with the ZeroMQ message objects without copying.
Note that I'm looking for an answer about the design rationale of ZeroMQ, and not a comment that I can benchmark this.
Some clarifications below:
[0] This is not meant to be implemented in the theoretically most efficient way, but rather using the functions that `rzmq` provides. The goal is to improve upon packages that store data on a NAS and retrieve it from there (which is quite a low bar). This is a side project and I'm not a systems engineer (and I'm not proficient in low-level ZeroMQ). I'm benchmarking both the overhead and real-world (a.k.a. actual work) examples; it just hasn't made it into the docs yet.
[1] Assume these cases (all TCP): 1 `REP` master and n `REQ` clients; 1 `ROUTER` master and n `REQ` clients; `PUSH`/`PULL` as an alternative approach. Is there a way to interface with this apart from using different sockets? (Probably not from high-level bindings like `rzmq`, but pointing me to the relevant low-level documentation would help; I haven't found this information in the user guide.)
[2] By this I mean that if I connect `REQ` clients to a `ROUTER` master, I will manage the envelopes myself (and have to send the ID and the empty frame manually), but this will not change the code ZeroMQ uses under the hood to send messages. Or does it? Is this documented? (I couldn't find it in the user guide.)
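[3] A valid answer would be that the bottleneck is the memory copy when initializing the message in the main thread, and that the messages are then sent to one client after the other in a separate thread, not blocking the main one (if this is the case, or whatever else happens to the message).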
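The split that clarification [3] hypothesizes (build the message once in the main thread, hand it to a background thread that sends to one client after the other) can be sketched with the Python standard library alone. This is a toy model of the idea, not a description of what ZeroMQ actually does internally; the `deliver` callback, the worker names and the sample data are all hypothetical stand-ins.

```python
import pickle
import queue
import threading


def sender_loop(outbox, deliver):
    """Drain (client_id, payload) pairs in order until a None sentinel arrives."""
    while True:
        item = outbox.get()
        if item is None:
            break
        client_id, payload = item
        deliver(client_id, payload)  # stand-in for a real socket send


# Hypothetical common data, serialized exactly once in the main thread.
common = {"fun": "sum", "const": list(range(1000))}
payload = pickle.dumps(common)

delivered = []  # records (client_id, bytes sent) in delivery order
outbox = queue.Queue()
worker = threading.Thread(
    target=sender_loop,
    args=(outbox, lambda cid, data: delivered.append((cid, len(data)))),
)
worker.start()

for client_id in ("w1", "w2", "w3"):
    outbox.put((client_id, payload))  # returns at once; main thread is not blocked

outbox.put(None)  # tell the sender thread to stop
worker.join()
print(delivered)
```

In this model the deliveries are strictly serial, so a small message queued behind a large one waits for it. Whether a real ZeroMQ socket behaves this way is exactly what the question asks. Note that ZeroMQ sockets should not be shared between threads, so in a real program the socket would have to be owned by the sender thread.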
1) Showing zero code means the answer can only be at a high level.
The trailing note:
Note that I'm looking for an answer about the design rationale of ZeroMQ, and not a comment that I can benchmark this.
did not help either.
So, let's go through it point by point:
Is ZeroMQ sending ... one after the other or in parallel?
- The ZeroMQ `Context` instance is the master that answers this. It depends on how the code has instantiated its data-pumping engines (the I/O threads the `Context` owns). With zero code posted, no one can tell either way.
Would this make a difference or is it negligible?
- To be sure, it makes a difference, and a big one.
Is there a way to influence this?
- Yes, there are several ways to influence this. It depends on the code, and on the end-to-end architecture of the advertised HPC/cluster project. As far as my experience goes, there is no universal one-size-fits-all or cheap (or free) magic wand. The best approach is to use the project's pool of in-depth knowledge of real-time system scheduling (and benchmark, benchmark, benchmark, if you want to uphold the git-posted promise of superior performance, which the package ought both to achieve in tests and to sustain in real-world deployments).
Switching from `REP` to `ROUTER` here would not change anything.
- This is the mixed part. I repeatedly advocate avoiding any naive use of `REQ`/`REP` in a professional-grade system, because of its unavoidable, intrinsic tendency to fall into a principal, unsalvageable mutual deadlock (you may read my other posts, where this warning is presented and explained in colourful detail).
Is this correct?
- No one serious will ever tell without the architecture, the implementation rationale and the code itself being posted. Is 42 correct or not? Who knows?!? (Sure, except the mice and, maybe, Marvin. All relevant facts and details can be found in The Hitchhiker's Guide, where the idea was borrowed from.)
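For readers unfamiliar with the envelope handling that the question's second clarification refers to: a `ROUTER` socket talking to `REQ` peers sees each request as an identity frame, an empty delimiter frame, then the payload, and it has to send the same layout back. The sketch below only manipulates lists of byte strings to show that framing. It opens no sockets, and the helper names, the identity `worker-7` and the payloads are hypothetical.

```python
def wrap_reply(identity: bytes, payload: bytes) -> list:
    """Frames a ROUTER sends so that the REQ peer `identity` receives `payload`."""
    return [identity, b"", payload]


def unwrap_request(frames: list) -> tuple:
    """Split frames a ROUTER received from a REQ peer into (identity, body frames)."""
    identity, delimiter, *body = frames
    if delimiter != b"":
        raise ValueError("expected an empty delimiter frame after the identity")
    return identity, body


# Hypothetical request as a ROUTER would see it arriving from a REQ worker.
incoming = [b"worker-7", b"", b"ready"]
identity, body = unwrap_request(incoming)

# The reply must carry the same identity and delimiter in front of the data.
reply = wrap_reply(identity, b"task-1")
print(identity, body, reply)
```

This shows only the bookkeeping that moves into user code when `REP` is replaced by `ROUTER`. It says nothing about whether the transmission path underneath changes.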
Is the main delay caused by what happens before, i.e. copying big chunks of memory when creating the message object? (I only `serialize` once.)
- The answer (taking a probabilistic view) is 100% hidden in the code. The ZeroMQ `Context`, if configured properly, does not add any remarkable delay of its own. The process is documented in the ZeroMQ API documentation, but if one tries to marshal 1 kB, 1 MB or "several hundred MB" as a single blob into a `.send()` method, one ought to have pretty good reasons for doing it in his/her own way.
In this case I would want to interface with the ZeroMQ message objects without copying.
- Well, this is the preferred way to dispatch data inside ZeroMQ. Be warned that the zero-copy maxim does not cover the O/S kernel's data-buffer manipulations, so a serious project plan ought to account for realistic operations (quantum-entanglement-based, mass-less instant signalling at infinite distance in zero time, or teleportation, does not work in our current O/S kernels; rather bear in mind the currently known silicon and hardware principles).
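The difference between handing over a copy and handing over a view of an existing buffer can be shown with plain Python objects. This is an analogy for the zero-copy idea, not ZeroMQ code, and the buffer contents are made up. In the C API the corresponding entry point is `zmq_msg_init_data()`, which wraps a caller-supplied buffer; whether a given high-level binding exposes it is a separate question.

```python
# Hypothetical stand-in for the serialized common data.
buffer = bytearray(b"common-data" * 1000)

copied = bytes(buffer[:11])       # a real copy: 11 bytes duplicated in memory
view = memoryview(buffer)[:11]    # no copy: a window onto the same memory

# Mutating the original shows which of the two shares storage with it.
buffer[0] = ord("C")

print(copied)           # the copy still holds the old first byte
print(view.tobytes())   # the view reflects the change
```

As the answer warns, avoiding this user-space copy does not remove the copies the operating system makes on the way to the network card.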