
Re: [MiNT] State of the Union



Evan K. Langlois wrote:

Frank Naumann wrote:

Ah, I see. Personally I wanted to go in the kqueue() direction as introduced and used by the BSD systems. This solution looks very easy to use and is extremely flexible and extensible. It's basically a file descriptor, and you add a list of events (determined by an id and a filter) you want to wait for.

epoll() is about the same, only for Linux.
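
For reference, a minimal epoll loop looks roughly like this (a sketch only; error handling is mostly omitted):

  #include <sys/epoll.h>
  #include <unistd.h>
  #include <stdio.h>

  int wait_on_socket(int sock)
  {
      int ep = epoll_create1(0);             /* one instance per event loop */
      struct epoll_event ev = { 0 };
      ev.events = EPOLLIN;                   /* interested in readability */
      ev.data.fd = sock;
      if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, sock, &ev) < 0)
          return -1;

      struct epoll_event out[8];
      int n = epoll_wait(ep, out, 8, -1);    /* block until something fires */
      for (int i = 0; i < n; i++)
          printf("fd %d is readable\n", out[i].data.fd);
      close(ep);
      return n;
  }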

epoll() only works for file descriptors and is not extensible. kqueue() works for a variety of other things, such as signals and filesystem events, and has a well-defined extension mechanism.
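
To illustrate the difference, here is a kqueue sketch that waits on a socket and a signal through the same queue, something neither select() nor epoll() can do directly:

  #include <sys/types.h>
  #include <sys/event.h>
  #include <sys/time.h>
  #include <signal.h>
  #include <stdio.h>

  int wait_fd_or_signal(int sock)
  {
      int kq = kqueue();
      struct kevent changes[2];
      EV_SET(&changes[0], sock,    EVFILT_READ,   EV_ADD, 0, 0, NULL);
      EV_SET(&changes[1], SIGUSR1, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);
      signal(SIGUSR1, SIG_IGN);   /* ignore normal delivery; kqueue
                                     still reports it as an event */

      struct kevent out[2];
      int n = kevent(kq, changes, 2, out, 2, NULL);  /* register and wait */
      for (int i = 0; i < n; i++)
          printf("filter %d fired on ident %lu\n",
                 (int)out[i].filter, (unsigned long)out[i].ident);
      return n;
  }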

Howard Chu said:

Requiring use of Fcntl() or any other system call to alter the set of events imposes too much overhead. On servers where the set of descriptors of interest changes frequently, this kind of approach is a serious loser.

Well, most of the studies I've seen, and all the benchmarks and test applications, show that kqueue() and epoll() perform significantly better than select() and poll() when you have large numbers of file descriptors, which is the exact opposite of what you claim.

I've seen those studies too. They had me convinced enough to go implement epoll() for OpenLDAP. Too bad their results were not applicable. They only model the behavior of a web server, like Apache. They are not general enough to represent all the types of client/server interaction one encounters on the internet.

Perhaps you should find a method where you aren't constantly changing the set of descriptors!

Don't be stupid. It is possible to design an event mechanism that efficiently handles all classes of application load, as I have described. Not doing so, and forcing all applications to work only one way, is ridiculous.

I believe kqueue() and epoll() both have some quirks, but they perform similarly for more file descriptors than any MiNT program is likely to see, i.e., this won't be the bottleneck of the application.

I agree that the class of problems any MiNT program is likely to see is probably different from the problem I was faced with. In that regard, I'm willing to let most of this discussion drop.

Also, under Linux the trip across the user/kernel barrier is nothing.

Nonsense. I have the benchmarks that prove this statement to be false.

The problem, as I see it, is that if you notify the OS of what file handles you want, you are indeed taking a slight performance hit for the extra OS call, but you are only going to be doing this once per new file handle. The most common case where this is critical is in web servers, so this is done when a new socket is opened, and another call is done when it's closed. The OS knows exactly what files it's watching the rest of the time. This information stays static between calls.
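
With epoll, for example, the pattern is roughly this (a sketch; the function names are just illustrations):

  #include <sys/epoll.h>
  #include <sys/socket.h>
  #include <unistd.h>

  void on_accept(int ep, int listener)
  {
      int conn = accept(listener, NULL, NULL);
      struct epoll_event ev = { .events = EPOLLIN, .data.fd = conn };
      epoll_ctl(ep, EPOLL_CTL_ADD, conn, &ev);   /* one call at open */
  }

  void on_close(int ep, int conn)
  {
      epoll_ctl(ep, EPOLL_CTL_DEL, conn, NULL);  /* one call at close */
      close(conn);
  }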

Again, it's not up to you to dictate the connection behavior of a system / protocol.

By comparison, something like select() can have a different set of file descriptors for each and every call to select(). This means lots of internal housekeeping by the kernel for every select() call. If you have 30 applications that call select(), you aren't going to check 30 bitmaps to see whether each application is waiting on an event when it happens. When the I/O comes in, you'll need to know exactly which application(s) need to be woken up. I can't see parsing a bitmap of 8000 bits x 30 applications to set this up on each and every call to select().
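
Compare a typical select() caller, which has to rebuild the whole set from scratch before every call (sketch):

  #include <sys/select.h>

  int wait_all(const int *fds, int count)
  {
      fd_set rfds;
      int i, maxfd = -1;

      FD_ZERO(&rfds);                  /* rebuilt from scratch ...   */
      for (i = 0; i < count; i++) {    /* ... on each and every call */
          FD_SET(fds[i], &rfds);
          if (fds[i] > maxfd)
              maxfd = fds[i];
      }
      return select(maxfd + 1, &rfds, NULL, NULL, NULL);
  }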

I'm not saying that select() is perfect; otherwise I wouldn't have bothered to write my equeue() description.

Now, an mmap() of a shared region sounds nice, but under MiNT? I don't see it being a good solution for the Atari, and if you look at Fselect(), you are passing pointers anyway. GEMDOS can write the data directly to the structure. Only a pointer is passed. The same with epoll(). There isn't a big copy problem here. You can't just change the shared memory region and hope that the kernel will reflect the changes. It doesn't know you changed something, and it's NOT going to check your table every time an I/O event happens.

You're not paying attention. Here's how it works:
You provide a set of descriptors of interest to the kernel in a table of event structures. The kernel creates pointers from its internal file structures that point back to copies of your event structures. When an event happens on the descriptor, the kernel checks to see if the event pointer is set, and if so, it examines the event structure and performs whatever event wakeup is required.

So far, that's exactly what kqueue/epoll do. Now the added feature in equeue: the event structure is directly accessible from user and kernel context, it is not copied by the kernel. The user can set a MASK bit in the event structure at any time. If an event happens on a descriptor, and the kernel checks the event pointer, and sees the MASK bit set, it ignores the event. Like I said, this is similar to the sigprocmask concept.

Thus, you can change things in the shared memory region and they will take effect directly without requiring a system call.
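
As a rough illustration only (the real layout is defined in the equeue writeup; these field and flag names are guesses), a shared event entry might look something like this:

  #include <stdint.h>

  #define EQ_MASK  0x0001   /* user sets: kernel must ignore this entry */
  #define EQ_READ  0x0002   /* event of interest: descriptor readable   */

  struct eq_event {
      int32_t           fd;     /* descriptor of interest            */
      volatile uint32_t flags;  /* event filters, plus the MASK bit  */
      volatile uint32_t fired;  /* kernel sets when an event arrives */
      void             *udata;  /* opaque per-entry user pointer     */
  };

  /* Masking an entry is a plain store into the shared region; no
     system call is needed, analogous to sigprocmask() blocking a
     signal. */
  static void eq_mask(struct eq_event *ev)   { ev->flags |=  EQ_MASK; }
  static void eq_unmask(struct eq_event *ev) { ev->flags &= ~EQ_MASK; }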

From the link Howard Chu posted:

Another point where select (and poll) wins is that there is a fast mapping from the input set to the result set - i.e., if you want to know "did event #5 occur?" you can find out in constant time, because it's just a fixed bitfield lookup. For all the other mechanisms that either return events one at a time or in a flat list, you have to iterate through the list to look for "event #5". They have thus taken the linear search that the kernel does for select and kicked it out into userland, thus perceiving a great savings in kernel CPU time but not really improving life for the application developer. There is an obvious way to solve both of these problems with no additional cost - do both.
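
Concretely, the constant-time lookup being described is just a bitfield test:

  #include <sys/select.h>
  #include <stdio.h>

  void check_fd5(fd_set *result)
  {
      if (FD_ISSET(5, result))      /* O(1): a single bit test */
          printf("descriptor 5 is ready\n");
  }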

Select() and poll() never win, unless you are writing your program in reverse. Maybe you are missing how an event-driven program works.

Once again, you're assuming that there's only one way to do things, and that your way is the right way for all purposes. Anybody who thinks that is pretty much always wrong.

I don't want to check if something happened on file descriptor #5. I NEVER have to iterate through the returned structures looking for descriptor #5. This is where your logic breaks and why the existing stuff doesn't work for you. Userland isn't doing any linear searches!
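
For example, with a FreeBSD-style kqueue (where udata is a pointer), each returned event carries its handler directly, so the dispatch loop never searches for a particular descriptor (sketch):

  #include <sys/types.h>
  #include <sys/event.h>
  #include <stddef.h>

  struct handler {
      void (*fn)(struct handler *self, struct kevent *ev);
  };

  void dispatch_loop(int kq)
  {
      struct kevent out[32];
      for (;;) {
          int n = kevent(kq, NULL, 0, out, 32, NULL);
          if (n < 0)
              break;
          for (int i = 0; i < n; i++) {
              /* udata was supplied at EV_SET() time; no lookup needed */
              struct handler *h = out[i].udata;
              h->fn(h, &out[i]);
          }
      }
  }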

Maybe you don't use prioritized events, but other applications do.

If you want to do priorities instead of FIFO, then you can easily read the events, grab a priority number from your passed structure, and drop the pointer into the proper priority queue. You can then service your priority queue directly, or have a separate thread do it (with the right locking).
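
A sketch of that scheme, again assuming FreeBSD-style udata pointers (the structure and names here are illustrative):

  #include <sys/types.h>
  #include <sys/event.h>

  #define NPRIO 2   /* 0 = high, 1 = normal */

  struct job {
      int  prio;
      void (*run)(struct job *);
      struct job *next;
  };

  static struct job *head[NPRIO];

  void enqueue_events(struct kevent *out, int n)
  {
      for (int i = 0; i < n; i++) {
          struct job *j = out[i].udata;  /* set at registration time */
          j->next = head[j->prio];       /* push onto its queue */
          head[j->prio] = j;
      }
  }

  void run_queues(void)
  {
      for (int p = 0; p < NPRIO; p++) {  /* high priority drains first */
          while (head[p]) {
              struct job *j = head[p];
              head[p] = j->next;
              j->run(j);
          }
      }
  }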

Again, that introduces more unnecessary overhead. Your approach requires a complete pass through the event list to find the priority events and place them on the queue, then a second pass through the list to handle the normal events. Such an approach will never scale. This is exactly what I meant about pushing the linear search overhead into userland.

Having a table of indirect indexes just creates more record keeping. I'm not exactly sure how this indirection idea allows you to do priority ... do you sort the returned list or something so you can handle the lower indexes first?

No. I have an array of event structures; each element in the array corresponds to a descriptor of interest. My app happens to know that descriptor #5 is "special", and it happens to know that descriptor #5 is slot #3 in the array of structures. When the event call returns, I can immediately check slot #3 and test the flags to see whether an event was triggered. Once that's handled, I go through the list of offsets, handling each remaining event in turn.
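
In code, that direct slot check might look like this (a sketch reusing the illustrative eq_event structure from above; the handlers are hypothetical):

  #define SPECIAL_SLOT 3

  void handle_special(struct eq_event *ev);  /* hypothetical handlers */
  void handle_normal(struct eq_event *ev);

  void handle_events(struct eq_event *table, int nslots)
  {
      if (table[SPECIAL_SLOT].fired)         /* priority check: O(1) */
          handle_special(&table[SPECIAL_SLOT]);

      for (int i = 0; i < nslots; i++)       /* then the remaining events */
          if (i != SPECIAL_SLOT && table[i].fired)
              handle_normal(&table[i]);
  }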

From the link Howard Chu posted:

This approach completely eliminates any copying of event records/descriptor lists between user/kernel space, so constructing event lists and modifying them is essentially zero-cost. Also, since the array of events remains in place, the application can inspect higher-priority events of interest by direct reference.

When you say that constructing event lists is zero cost, do you actually mean to say that the kernel works with this memory-mapped data directly?

Yes.

How does the kernel know when you change something if you don't make an OS call to tell it? Do you expect the kernel to scan this shared memory region constantly?

No, that would be stupid. You perform one system call when you want to wait on events. As part of that system call, you can also indicate whether the event table has changed. This is all explained in my equeue writeup.
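
As a rough guess at the shape of that call (the name, signature, and flag here are illustrative, not quoted from the writeup):

  #define EQ_TABLE_CHANGED 0x1   /* rescan the shared table before waiting */

  int equeue_wait(int eq, struct eq_event *table, int nslots,
                  unsigned flags, long timeout_ms);

  void event_loop(int eq, struct eq_event *table, int nslots)
  {
      unsigned flags = 0;
      for (;;) {
          int n = equeue_wait(eq, table, nslots, flags, -1);
          flags = 0;
          /* dispatch the n fired entries; if any entries were added
             or removed in the table, set flags = EQ_TABLE_CHANGED so
             the next call tells the kernel to rescan */
          (void)n;
      }
  }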

It's OK to be opinionated, but better to have well-founded facts on which those opinions are based. It doesn't sound to me like you've ever written an application / server that handles thousands of clients and hundreds of thousands of operations per second. I have, many times.

I'm interested in what other ideas you have on solving the "unified event loop" situation. I've already pitched one idea, but I haven't yet heard specifically where the problems are that need to be redesigned.

For example, should AES events come through a GEMDOS handle (as I mentioned), or should GEMDOS file handles be a special form of AES event (the expanded evnt_multi approach), or should they be on equal ground, both coming from a more flexible interface?

Don't be like Sun and forget about signals. epoll() certainly is an improvement for some class of applications on Linux, but without a totally integrated kitchen-sink approach like kqueue/equeue you're going to go through all this churn and still have an unsolved event management problem.

--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support