Re: [MiNT] State of the Union
Evan K. Langlois wrote:
Frank Naumann wrote:
Ah, I see. Personally I wanted to go into the kqueue() direction as 
introduced and used by the BSD systems. This solution looks very easy to 
use and is extremely flexible and extensible. It's basically a 
file descriptor, and you add a list of events (determined by an id and a 
filter) you want to wait for.
epoll() is about the same, only for Linux.
epoll() only works for file descriptors and is not extensible. kqueue() 
works for a variety of other things, such as signals and filesystem 
events, and has a well-defined extension mechanism.
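For instance, a single kqueue() descriptor can wait on a socket and a 
signal at the same time; a minimal sketch for a BSD system, error 
handling omitted:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <signal.h>
    #include <stddef.h>

    /* Wait for either data on sock or a SIGHUP, via one kqueue. */
    int wait_sock_or_sighup(int sock)
    {
        int kq = kqueue();
        struct kevent changes[2], events[2];

        /* watch the socket for readability */
        EV_SET(&changes[0], sock, EVFILT_READ, EV_ADD, 0, 0, NULL);
        /* route SIGHUP through the same queue */
        signal(SIGHUP, SIG_IGN);  /* suppress default delivery */
        EV_SET(&changes[1], SIGHUP, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);

        /* register both filters and block until something fires */
        return kevent(kq, changes, 2, events, 2, NULL);
    }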
Howard Chu said:
Requiring use of Fcntl() or any other system call to alter the set of
events imposes too much overhead. On servers where the set of
descriptors of interest changes frequently, this kind of approach is a
serious loser.
Well, most of the studies I've seen, and all the benchmarks and test 
applications, show that kqueue() and epoll() perform significantly 
better than select() and poll() when you have large numbers of file 
descriptors, which is the exact opposite of what you claim.
I've seen those studies too. They had me convinced enough to go 
implement epoll() for OpenLDAP. Too bad their results were not 
applicable. They only model the behavior of a web server, like Apache. 
They are not general enough to represent all the types of client/server 
interaction one encounters on the internet.
Perhaps you 
should find a method where you aren't constantly changing the set of 
descriptors!
Don't be stupid. It is possible to design an event mechanism that 
efficiently handles all classes of application load, as I have 
described. Not doing so, and forcing all applications to work only one 
way, is ridiculous.
I believe kqueue() and epoll() both have some quirks, but they perform 
similarly for more file descriptors than any MiNT program is likely to 
see; i.e., this won't be the bottleneck of the application.
I agree that the class of problems any MiNT program is likely to see is 
probably different from the problem I was faced with. In that regard, 
I'm willing to let most of this discussion drop.
Also, under Linux the trip across the user/kernel barrier is nothing.  
Nonsense. I have the benchmarks that prove this statement to be false.
The problem, as I see it, is that if you notify the OS of what file 
handles you want, you are indeed taking a slight performance hit for the 
extra OS call, but you are only going to be doing this once per new file 
handle.  The most common case where this is critical is in web servers, 
so this is done when a new socket is opened, and another call is done 
when it's closed.  The OS knows exactly what files it's watching the rest 
of the time.  This information stays static between calls.
Again, it's not up to you to dictate the connection behavior of a system 
/ protocol.
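For reference, the register-once pattern under discussion looks like 
this with epoll() on Linux; a minimal sketch, error handling omitted 
(handle_io is a hypothetical handler):

    #include <sys/epoll.h>

    void handle_io(int fd);  /* hypothetical */

    void serve(int conn)
    {
        int epfd = epoll_create(64);       /* one queue, created once */
        struct epoll_event ev;
        ev.events  = EPOLLIN;
        ev.data.fd = conn;
        epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);  /* once per handle */

        struct epoll_event ready[64];
        for (;;) {
            /* no interest set is rebuilt or copied per call */
            int n = epoll_wait(epfd, ready, 64, -1);
            for (int i = 0; i < n; i++)
                handle_io(ready[i].data.fd);
        }
    }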
By comparison, something like select() can have a different set of file 
descriptors for each and every call to select().  This means lots of 
internal housekeeping by the kernel for every select() call.  If you 
have 30 applications that call select(), you aren't going to check 30 
bitmaps to see whether each application is waiting on an event when it 
happens.  When the IO comes in, you need to know exactly which 
application(s) to wake up.  I don't see the kernel parsing a bitmap of 
8000 bits x 30 applications to set this up on each and every call to 
select().
I'm not saying that select is perfect, otherwise I wouldn't have 
bothered to write my equeue() description.
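For comparison, the per-call rebuilding that select() imposes, as a 
minimal sketch:

    #include <sys/select.h>
    #include <stddef.h>

    /* The interest set must be rebuilt before EVERY call, because
     * select() overwrites the fd_set with the result. */
    void select_loop(const int *fds, int nfds)
    {
        for (;;) {
            fd_set rfds;
            int maxfd = -1;

            FD_ZERO(&rfds);
            for (int i = 0; i < nfds; i++) {   /* setup, every call */
                FD_SET(fds[i], &rfds);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;
            for (int i = 0; i < nfds; i++)     /* scan the result */
                if (FD_ISSET(fds[i], &rfds))
                    ; /* handle fds[i] */
        }
    }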
Now, an mmap() of a shared region sounds nice, but... under MiNT?  I 
don't see it being a good solution for the Atari, and if you look at 
Fselect(), you are passing pointers anyway.  GEMDOS can write the data 
directly to the structure.  Only a pointer is passed.  The same with 
epoll().  There isn't a big copy problem here.  You can't just change 
the shared memory region and hope that the kernel will reflect the 
changes.  It doesn't know you changed something, and it's NOT going to 
check your table every time an IO event happens.
You're not paying attention. Here's how it works:
  - You provide a set of descriptors of interest to the kernel in a 
table of event structures.
  - The kernel creates pointers from its internal file structures that 
point back to copies of your event structures.
  - When an event happens on the descriptor, the kernel checks to see if 
the event pointer is set, and if so, it examines the event structure and 
performs whatever event wakeup is required.
So far, that's exactly what kqueue/epoll do. Now the added feature in 
equeue:
  the event structure is directly accessible from user and kernel 
context, it is not copied by the kernel. The user can set a MASK bit in 
the event structure at any time. If an event happens on a descriptor, 
and the kernel checks the event pointer, and sees the MASK bit set, it 
ignores the event. Like I said, this is similar to the sigprocmask concept.
Thus, you can change things in the shared memory region and they will 
take effect directly without requiring a system call.
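To make that concrete, here is a hypothetical sketch of one entry in the 
shared table. The field and macro names are illustrative only, not taken 
from the actual equeue writeup:

    /* Hypothetical layout of one entry in the shared event table. */
    struct equeue_event {
        int            eq_fd;       /* descriptor of interest */
        volatile short eq_flags;    /* kernel posts event bits here */
        volatile short eq_mask;     /* EQ_MASK: ask kernel to ignore */
    };

    #define EQ_MASK 0x0001

    /* Userland disables a descriptor without any system call; the
     * kernel checks eq_mask at delivery time, the way sigprocmask
     * gates signal delivery. */
    void equeue_disable(struct equeue_event *ev)
    {
        ev->eq_mask |= EQ_MASK;
    }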
From the link Howard Chu posted:
Another point where select (and poll) wins is that there is a fast 
mapping from the input set to the result set - i.e., if you want to 
know "did event #5 occur?" you can find out in constant time, because 
it's just a fixed bitfield lookup. For all the other mechanisms that 
either return events one at a time or in a flat list, you have to 
iterate through the list to look for "event #5". They have thus taken the 
linear search that the kernel does for select and kicked it out into 
userland, thus perceiving a great savings in kernel CPU time but not 
really improving life for the application developer. There is an 
obvious way to solve both of these problems with no additional cost - 
do both.
Select() and poll() never win, unless you are writing your program in 
reverse.  Maybe you are missing out on how an event-driven program 
works.
Once again, you're assuming that there's only one way to do things, and 
that your way is the right way for all purposes. Anybody who thinks that 
is pretty much always wrong.
I don't want to check if something happened on file descriptor 
#5.  I NEVER have to iterate through the returned structures looking for 
descriptor #5.  This is where your logic breaks and why the existing 
stuff doesn't work for you.  Userland isn't doing any linear searches!
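That dispatch style, sketched with epoll() carrying a context pointer 
back to userland (the conn type and handler are illustrative):

    #include <sys/epoll.h>

    /* Illustrative per-connection context, registered with the event. */
    struct conn {
        int  fd;
        void (*on_ready)(struct conn *);
    };

    void register_conn(int epfd, struct conn *c)
    {
        struct epoll_event ev;
        ev.events   = EPOLLIN;
        ev.data.ptr = c;               /* the kernel hands this back */
        epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
    }

    void dispatch(int epfd)
    {
        struct epoll_event ready[64];
        int n = epoll_wait(epfd, ready, 64, -1);
        for (int i = 0; i < n; i++) {
            struct conn *c = ready[i].data.ptr;
            c->on_ready(c);            /* direct dispatch, no lookup */
        }
    }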
Maybe you don't use prioritized events, but other applications do.
If you want to do priorities instead of FIFO, then you can easily read 
the events, grab a priority number from your passed structure, and drop 
the pointer into the proper priority queue.  You can then service your 
priority queue directly, or have a separate thread do it (with the right 
locking).
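A rough sketch of that bucketing approach (types and sizes are 
illustrative):

    /* Fan returned events out into per-priority lists, then service
     * the lists in priority order. */
    #define NPRIO 4

    struct pevent {
        int            prio;   /* stored when the fd was registered */
        struct pevent *next;
        /* ... application context ... */
    };

    void service(struct pevent **got, int n)
    {
        struct pevent *q[NPRIO] = { 0 };

        for (int i = 0; i < n; i++) {      /* pass 1: bucket by prio */
            struct pevent *e = got[i];
            e->next = q[e->prio];
            q[e->prio] = e;
        }
        for (int p = 0; p < NPRIO; p++)    /* pass 2: drain in order */
            for (struct pevent *e = q[p]; e; e = e->next)
                /* handle_event(e); */ ;
    }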
Again, that introduces more unnecessary overhead. Your approach requires 
a complete pass through the event list to find the priority events and 
place them on the queue, then a second pass through the list to handle 
the normal events. Such an approach will never scale. This is exactly 
what I meant about pushing the linear-search overhead into userland.
Having a table of indirect indexes just creates more record-keeping.
I'm not exactly sure how this indirection idea allows you to do 
priorities ... do you sort the returned list or something so you can 
handle the lower indexes first?
No. I have an array of event structures; each element in the array 
corresponds to a descriptor of interest. My app happens to know that 
descriptor #5 is "special", and that descriptor #5 lives in slot #3 of 
the array of structures. When the event call returns, I can 
immediately check slot #3 and test the flags to see whether an event was 
triggered. Once that's handled, I go through the list of offsets, 
handling each remaining event in turn.
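In code, hypothetically (same illustrative layout as the sketch above):

    #define NSLOTS 32

    struct equeue_event {           /* illustrative, as before */
        int   eq_fd;
        short eq_flags;
        short eq_mask;
    };

    struct equeue_event table[NSLOTS];    /* slot #3 == descriptor #5 */

    void handle_special(struct equeue_event *);  /* illustrative */
    void handle_normal(struct equeue_event *);

    void check_events(const int *ready_slots, int nready)
    {
        /* the table never moves, so the "special" descriptor is one
         * array access away -- constant time, no searching */
        if (table[3].eq_flags)
            handle_special(&table[3]);

        for (int i = 0; i < nready; i++)  /* then the rest, in turn */
            if (ready_slots[i] != 3)
                handle_normal(&table[ready_slots[i]]);
    }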
From the link Howard Chu posted:
This approach completely eliminates any copying of event
records/descriptor lists between user/kernel space, so constructing
event lists and modifying them is essentially zero-cost. Also, since the
array of events remains in place, the application can inspect higher
priority events of interest by direct reference.
When you say that constructing event lists is zero-cost, do you actually 
mean to say that the kernel works with this memory-mapped data 
directly?
Yes.
How does the kernel know when you change something if you 
don't make an OS call to tell it?   Do you expect the kernel to scan 
this shared memory region constantly?
No, that would be stupid. You perform one system call when you want to 
wait on events. As part of that system call, you can also indicate 
whether the event table has changed. This is all explained in my equeue 
writeup.
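Hypothetically, such a call might look like this; the signature is 
invented purely for illustration and is not the actual equeue interface:

    /* One call both publishes table edits and waits for events;
     * table_changed tells the kernel the shared table was modified
     * since the last call, so it should rescan it once. */
    int equeue_wait(int eq, struct equeue_event *table, int nslots,
                    int table_changed, long timeout_ms);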
It's OK to be opinionated, but better to have well-founded facts on 
which those opinions are based. It doesn't sound to me like you've ever 
written an application / server that handles thousands of clients and 
hundreds of thousands of operations per second. I have, many times.
I'm interested in what other ideas you have on solving the "unified 
event loop" situation.  I've already pitched one idea, but not yet heard 
specifically where the problems are that need to be redesigned.
For example, should AES events come through a GEMDOS handle (as I 
mentioned), or should GEMDOS file handles be a special form of AES event 
(the expanded evnt_multi approach), or should they be on equal ground, 
both coming from a more flexible interface?
Don't be like Sun and forget about signals. epoll() certainly is an 
improvement for some class of applications on Linux, but without a 
totally integrated kitchen-sink approach like kqueue/equeue you're going 
to go through all this churn and still have an unsolved event management 
problem.
--
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support