Re: [MiNT] State of the Union
Evan K. Langlois wrote:
Frank Naumann wrote:
Ah, I see. Personally I wanted to go into the kqueue() direction as 
introduced and used by the BSD systems. This solution looks very easy to 
use and is extremely flexible and extensible. It's basically a 
file descriptor, and you add a list of events (determined by an id and a 
filter) you want to wait for.
epoll() is about the same, only for Linux.
epoll() only works for file descriptors and is not extensible. kqueue() 
works for a variety of other things, such as signals and filesystem 
events, and has a well-defined extension mechanism.
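For instance, a single kqueue() descriptor can wait on a socket and a 
signal at the same time; a minimal sketch for a BSD system, error 
handling omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <signal.h>
    #include <stddef.h>

    /* Wait for either data on sock or a SIGHUP, via one kqueue. */
    int wait_sock_or_sighup(int sock)
    {
        int kq = kqueue();
        struct kevent changes[2], events[2];

        /* watch the socket for readability */
        EV_SET(&changes[0], sock, EVFILT_READ, EV_ADD, 0, 0, NULL);
        /* route SIGHUP through the same queue */
        signal(SIGHUP, SIG_IGN);  /* suppress default delivery */
        EV_SET(&changes[1], SIGHUP, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);

        /* register both filters and block until something fires */
        return kevent(kq, changes, 2, events, 2, NULL);
    }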
Howard Chu said:
Requiring use of Fcntl() or any other system call to alter the set of
events imposes too much overhead. On servers where the set of
descriptors of interest changes frequently, this kind of approach is a
serious loser.
Well, most of the studies I've seen, and all the benchmarks and test 
applications, show that kqueue() and epoll() perform significantly 
better than select() and poll() when you have large numbers of file 
descriptors, which is the exact opposite of what you claim.
I've seen those studies too. They had me convinced enough to go 
implement epoll() for OpenLDAP. Too bad their results were not 
applicable. They only model the behavior of a web server, like Apache. 
They are not general enough to represent all the types of client/server 
interaction one encounters on the internet.
Perhaps you 
should find a method where you aren't constantly changing the set of 
descriptors!
Don't be stupid. It is possible to design an event mechanism that 
efficiently handles all classes of application load, as I have 
described. Not doing so, and forcing all applications to work only one 
way, is ridiculous.
I believe kqueue() and epoll() both have some quirks, but they perform 
similarly for more file descriptors than any MiNT program is likely to 
see; i.e., this won't be the bottleneck of the application.
I agree that the class of problems any MiNT program is likely to see is 
probably different from the problem I was faced with. In that regard, 
I'm willing to let most of this discussion drop.
Also, under Linux the trip across the user/kernel barrier is nothing.  
Nonsense. I have the benchmarks that prove this statement to be false.
The problem, as I see it, is that if you notify the OS of what file 
handles you want, you are indeed taking a slight performance hit for the 
extra OS call, but you are only going to be doing this once per new file 
handle.  The most common case where this is critical is in web servers, 
so this is done when a new socket is opened, and another call is done 
when it's closed.  The OS knows exactly what files it's watching the rest 
of the time.  This information stays static between calls.
Again, it's not up to you to dictate the connection behavior of a system 
/ protocol.
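For reference, the register-once pattern under discussion looks like 
this with epoll() on Linux; a minimal sketch, error handling omitted 
(handle_io is a hypothetical handler):

    #include <sys/epoll.h>

    void handle_io(int fd);  /* hypothetical */

    void serve(int conn)
    {
        int epfd = epoll_create(64);       /* one queue, created once */
        struct epoll_event ev;
        ev.events  = EPOLLIN;
        ev.data.fd = conn;
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);  /* once per handle */

        struct epoll_event ready[64];
        for (;;) {
            /* no interest set is rebuilt or copied per call */
            int n = epoll_wait(epfd, ready, 64, -1);
            for (int i = 0; i < n; i++)
                handle_io(ready[i].data.fd);
        }
    }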
By comparison, something like select() can have a different set of file 
descriptors for each and every call to select().  This means lots of 
internal housekeeping by the kernel for every select() call.  If you 
have 30 applications that call select(), you aren't going to check 30 
bitmaps to see whether each application is waiting on an event when it 
happens.  When the IO comes in, you need to know exactly which 
application(s) to wake up.  I don't see the kernel parsing a bitmap of 
8000 bits x 30 applications to set this up on each and every call to 
select().
I'm not saying that select is perfect, otherwise I wouldn't have 
bothered to write my equeue() description.
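For comparison, the per-call rebuilding that select() imposes, as a 
minimal sketch:

    #include <sys/select.h>
    #include <stddef.h>

    /* The interest set must be rebuilt before EVERY call, because
     * select() overwrites the fd_set with the result. */
    void select_loop(const int *fds, int nfds)
    {
        for (;;) {
            fd_set rfds;
            int maxfd = -1;

            FD_ZERO(&rfds);
            for (int i = 0; i < nfds; i++) {   /* setup, every call */
                FD_SET(fds[i], &rfds);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;
            for (int i = 0; i < nfds; i++)     /* scan the result */
                if (FD_ISSET(fds[i], &rfds))
                    ; /* handle fds[i] */
        }
    }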
Now, an mmap() of a shared region sounds nice, but... under MiNT?  I 
don't see it being a good solution for the Atari, and if you look at 
Fselect(), you are passing pointers anyway.  GEMDOS can write the data 
directly to the structure.  Only a pointer is passed.  The same with 
epoll().  There isn't a big copy problem here.  You can't just change 
the shared memory region and hope that the kernel will reflect the 
changes.  It doesn't know you changed something, and it's NOT going to 
check your table every time an IO event happens.
You're not paying attention. Here's how it works:
  - You provide a set of descriptors of interest to the kernel in a 
table of event structures.
  - The kernel creates pointers from its internal file structures that 
point back to copies of your event structures.
  - When an event happens on the descriptor, the kernel checks to see if 
the event pointer is set, and if so, it examines the event structure and 
performs whatever event wakeup is required.
So far, that's exactly what kqueue/epoll do. Now the added feature in 
equeue:
  the event structure is directly accessible from user and kernel 
context, it is not copied by the kernel. The user can set a MASK bit in 
the event structure at any time. If an event happens on a descriptor, 
and the kernel checks the event pointer, and sees the MASK bit set, it 
ignores the event. Like I said, this is similar to the sigprocmask concept.
Thus, you can change things in the shared memory region and they will 
take effect directly without requiring a system call.
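To make that concrete, here is a hypothetical sketch of one entry in the 
shared table. The field and macro names are illustrative only, not taken 
from the actual equeue writeup:

    /* Hypothetical layout of one entry in the shared event table. */
    struct equeue_event {
        int            eq_fd;       /* descriptor of interest */
        volatile short eq_flags;    /* kernel posts event bits here */
        volatile short eq_mask;     /* EQ_MASK: ask kernel to ignore */
    };

    #define EQ_MASK 0x0001

    /* Userland disables a descriptor without any system call; the
     * kernel checks eq_mask at delivery time, the way sigprocmask
     * gates signal delivery. */
    void equeue_disable(struct equeue_event *ev)
    {
        ev->eq_mask |= EQ_MASK;
    }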
From the link Howard Chu posted:
Another point where select (and poll) wins is that there is a fast 
mapping from the input set to the result set - i.e., if you want to 
know "did event #5 occur?" you can find out in constant time, because 
it's just a fixed bitfield lookup. For all the other mechanisms that 
either return events one at a time or in a flat list, you have to 
iterate through the list to look for "event #5". They have thus taken the 
linear search that the kernel does for select and kicked it out into 
userland, thus perceiving a great savings in kernel CPU time but not 
really improving life for the application developer. There is an 
obvious way to solve both of these problems with no additional cost - 
do both.
Select() and poll() never win, unless you are writing your program in 
reverse.  Maybe you are missing out on how an event-driven program 
works.
Once again, you're assuming that there's only one way to do things, and 
that your way is the right way for all purposes. Anybody who thinks that 
is pretty much always wrong.
I don't want to check if something happened on file descriptor 
#5.  I NEVER have to iterate through the returned structures looking for 
descriptor #5.  This is where your logic breaks and why the existing 
stuff doesn't work for you.  Userland isn't doing any linear searches!
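That dispatch style, sketched with epoll() carrying a context pointer 
back to userland (the conn type and handler are illustrative):

    #include <sys/epoll.h>

    /* Illustrative per-connection context, registered with the event. */
    struct conn {
        int  fd;
        void (*on_ready)(struct conn *);
    };

    void register_conn(int epfd, struct conn *c)
    {
        struct epoll_event ev;
        ev.events   = EPOLLIN;
        ev.data.ptr = c;               /* the kernel hands this back */
        epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
    }

    void dispatch(int epfd)
    {
        struct epoll_event ready[64];
        int n = epoll_wait(epfd, ready, 64, -1);
        for (int i = 0; i < n; i++) {
            struct conn *c = ready[i].data.ptr;
            c->on_ready(c);            /* direct dispatch, no lookup */
        }
    }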
Maybe you don't use prioritized events, but other applications do.
If you want to do priorities instead of FIFO, then you can easily read 
the events, grab a priority number from your passed structure, and drop 
the pointer into the proper priority queue.  You can then service your 
priority queue directly, or have a separate thread do it (with the right 
locking).
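A rough sketch of that bucketing approach (types and sizes are 
illustrative):

    /* Fan returned events out into per-priority lists, then service
     * the lists in priority order. */
    #define NPRIO 4

    struct pevent {
        int            prio;   /* stored when the fd was registered */
        struct pevent *next;
        /* ... application context ... */
    };

    void service(struct pevent **got, int n)
    {
        struct pevent *q[NPRIO] = { 0 };

        for (int i = 0; i < n; i++) {      /* pass 1: bucket by prio */
            struct pevent *e = got[i];
            e->next = q[e->prio];
            q[e->prio] = e;
        }
        for (int p = 0; p < NPRIO; p++)    /* pass 2: drain in order */
            for (struct pevent *e = q[p]; e; e = e->next)
                /* handle_event(e); */ ;
    }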
Again, that introduces more unnecessary overhead. Your approach requires 
a complete pass through the event list to find the priority events and 
place them on the queue, then a second pass through the list to handle 
the normal events. Such an approach will never scale. This is exactly 
what I meant about pushing the linear-search overhead into userland.
Having a table of indirect indexes just creates more record-keeping.
I'm not exactly sure how this indirection idea allows you to do 
priorities ... do you sort the returned list or something so you can 
handle the lower indexes first?
No. I have an array of event structures; each element in the array 
corresponds to a descriptor of interest. My app happens to know that 
descriptor #5 is "special", and that descriptor #5 lives in slot #3 of 
the array of structures. When the event call returns, I can 
immediately check slot #3 and test the flags to see whether an event was 
triggered. Once that's handled, I go through the list of offsets, 
handling each remaining event in turn.
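In code, hypothetically (same illustrative layout as the sketch above):

    #define NSLOTS 32

    struct equeue_event {           /* illustrative, as before */
        int   eq_fd;
        short eq_flags;
        short eq_mask;
    };

    struct equeue_event table[NSLOTS];    /* slot #3 == descriptor #5 */

    void handle_special(struct equeue_event *);  /* illustrative */
    void handle_normal(struct equeue_event *);

    void check_events(const int *ready_slots, int nready)
    {
        /* the table never moves, so the "special" descriptor is one
         * array access away -- constant time, no searching */
        if (table[3].eq_flags)
            handle_special(&table[3]);

        for (int i = 0; i < nready; i++)  /* then the rest, in turn */
            if (ready_slots[i] != 3)
                handle_normal(&table[ready_slots[i]]);
    }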
From the link Howard Chu posted:
This approach completely eliminates any copying of event
records/descriptor lists between user/kernel space, so constructing
event lists and modifying them is essentially zero-cost. Also, since the
array of events remains in place, the application can inspect higher
priority events of interest by direct reference.
When you say that constructing event lists is zero-cost, do you actually 
mean to say that the kernel works with this memory-mapped data 
directly?
Yes.
How does the kernel know when you change something if you 
don't make an OS call to tell it?   Do you expect the kernel to scan 
this shared memory region constantly?
No, that would be stupid. You perform one system call when you want to 
wait on events. As part of that system call, you can also indicate 
whether the event table has changed. This is all explained in my equeue 
writeup.
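Hypothetically, such a call might look like this; the signature is 
invented purely for illustration and is not the actual equeue interface:

    /* One call both publishes table edits and waits for events;
     * table_changed tells the kernel the shared table was modified
     * since the last call, so it should rescan it once. */
    int equeue_wait(int eq, struct equeue_event *table, int nslots,
                    int table_changed, long timeout_ms);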
It's OK to be opinionated, but better to have well-founded facts on 
which those opinions are based. It doesn't sound to me like you've ever 
written an application / server that handles thousands of clients and 
hundreds of thousands of operations per second. I have, many times.
I'm interested in what other ideas you have on solving the "unified 
event loop" situation.  I've already pitched one idea, but not yet heard 
specifically where the problems are that need to be redesigned.
For example, should AES events come through a GEMDOS handle (as I 
mentioned), or should GEMDOS file handles be a special form of AES event 
(the expanded evnt_multi approach), or should they be on equal ground, 
both coming from a more flexible interface?
Don't be like Sun and forget about signals. epoll() certainly is an 
improvement for some class of applications on Linux, but without a 
totally integrated kitchen-sink approach like kqueue/equeue you're going 
to go through all this churn and still have an unsolved event management 
problem.
--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support