The KDE2 meeting, multimedia SIG discussion protocol - v0.1 - 18 Oct 1999
=============================================================================

Note that this protocol doesn't yet describe the official KDE2.0 API, or
final results on what will be done how. It is published nevertheless,

- to give people who were desperately waiting to see what KDE2.0 multimedia
  would be like a chance to do so
- to remind those who were in the SIG of what we talked about
- to allow others to join in, or to discuss aspects of what is mentioned
  in here
- to be the base for a less confusing, coherent official document about our
  further plans
- in the firm belief that open source projects work best when information
  flow is truly open

SIG Participants

  Christian Esken
  Antonio Larrosa
  Stefan Westerfeld

Overview

1. Goals
2. Existing services/APIs
   2.1. MidiClasses in KMid
   2.2. KMedia
   2.3. aRts
   2.4. Evaluation of aRts by a use-case ... why it isn't usable for everything
3. Talking about multimedia data types
   3.1 Audio
   3.2 Midi
   3.3 Video
   3.4 Trackers (MOD,XM,SID,MP4)
   3.5 CDs
4. KDE2 client APIs
   4.1 assignment policies AKA purposes
   4.2 KApplication::play
   4.3 KAudioStream
   4.4 extensions to the aRts main API for handling KMedia stuff
5. roadmap

=============================================================================
1. Goals
=============================================================================

- signal to the outside world => what the hell are we going to do?
- get rid of CORBA in the realtime sector
- sort out the different media types; provide one concept
- share the resources of the machine between many clients
- make it easy for people who are developing KDE2 applications to do "real"
  multimedia development
- network transparency, at least at the level of esd (?)

  why (network transparency at esd level):
  - PR
  - people asked for it
  - computer pools
  - X11 philosophy

  why not (network transparency at esd level):
  - introduces parallel approaches (CORBA vs. TCP streams)
  - security issues
  - PR (hard to sell that tasks A,B,C are network transparent, while tasks
    C,D,E aren't)

=============================================================================
2. Existing services/APIs
=============================================================================

=============================================================================
2.1. MidiClasses in KMid
=============================================================================

These provide a

  [ MidiManager ] -> [ External Midi ] \
                     [ Gus ]           | -> [ OSS/Alsa ]
                     [ AWE ]           |
                     [ FM ]            /

kind of concept.

Playing and displaying things in the GUI in parallel is handled by playing
two timestamped series of midi events, one against the GUI and one against
OSS/Alsa. While this works for pieces of music of normal length, in theory
the GUI and the output will diverge after some time.

=============================================================================
2.2. KMedia
=============================================================================

Christian Esken's original statement: "KMedia is nothing but a huge remote
control".

Layout:

  [ KMedia ] <-----------> [ maudio ]
                           [ mtracker ]
                           [ mplaymidi ]
                           [ msidplay ]

             libmediatool
             communication

KMedia sends:
- setFileName
- play
- stop
- pause
- seek
- quit

KMedia receives:
- mediaName
- playingStatus
- currentTime
- overallTime
- capabilities

=============================================================================
2.3. aRts
=============================================================================

>> What <<

aRts provides multimedia signal flow evaluation. Following the aRts
philosophy, every multimedia task can be split into small modules with
simple operations, which are connected by wires. There are four areas aRts
needs to cover to achieve that:

a) signal flow editing
b) scheduling
c) plugins
d) remote control

>> How? <<

a) signal flow editing

aRts provides a CORBA API for signal flow editing.
b) scheduling => intelligent

Scheduling flow graphs is handled internally, so it's really *fast*. It
needs no CORBA, no IPC, no object model - just internal calls to C++
functions, which means there is no overhead compared with "conventional"
solutions. The scheduler tries to be intelligent, that is, to decide when
what needs to happen.

c) plugins => dumb

The plugins are normally just derived from some base class. There is not
much intelligence inside the plugins, as the scheduler tries to handle
everything from outside. So the plugins are really easy to write, and don't
depend on how the internals of aRts work.

d) remote control

By providing all important stuff via the CORBA API, aRts is fully remote
controllable.

=============================================================================
2.4. Evaluation of aRts by a complex use-case
=============================================================================

or: why it isn't usable for everything right now

>> the setup <<

Clients (outside aRts):
- Mixer frontend: a nice GUI frontend with sliders and knobs for the mixer
- Game: a game which uses a) an mp3 soundtrack and b) its own technique of
  generating 3d sound effects
- Sequencer: a midi sequencer
- Virtual Midi KBD: a GUI keyboard which is drawn on the screen and plays
  midi events on clicks

Plugins (inside aRts):
- mp3player module
- the game's own 3d code
- midi out module, connected to
  - synth (which outputs to mixer)
  - awe synth
- midi in module, connected to
  - external synth
  - virtual midi keyboard stuff
- mixer module

Output:
- audio speaker
- external synth
- awe synth

>> why doesn't that work currently? <<

- limited specification

  The specification of module interfaces is limited: ports must be one of
  { input, output } x { float (for audio), string } x { signal, property }.
  While every such combination is covered by the specification, midi
  connections, the connection between the game and the mp3player/the game's
  own 3d code, etc. don't work. In fact, more than just a filename string
  needs to be passed to the mp3 player, for instance.

- signal passing: two server approach

  Currently, aRts splits into a GUI server and a synthesis server. The
  following signal passing is implemented:

  passing inside the GUI server
  * audio signals     => are misused for passing widget IDs around
  * string signals    => no (also: useless)
  * float properties  => limited support
  * string properties => limited support

  passing inside the synth server
  * audio signals     => yes
  * string signals    => no (also: useless)
  * float properties  => yes
  * string properties => yes

  passing from GUI -> synthesizer
  * audio signals     => broken
  * string signals    => no
  * float properties  => no
  * string properties => no

  passing from synthesizer -> GUI
  * audio signals     => no
  * string signals    => no
  * float properties  => no
  * string properties => no

  Summary: the two server approach currently seriously lacks communication
  facilities. The number of data types it can communicate is very limited,
  and the IPC is somewhat broken, mainly because CORBA is a pain to use if
  you need really fast IPC.

>> Fixing aRts <<

- write a new multimedia transport layer between the servers
- merge most code for the server implementation
- scheduling optimization

  aRts is lacking some scheduling optimizations that would make it
  unnecessary to do calculations to find out how silence sounds in a
  concert hall. If you know that certain modules (operations) show certain
  behaviour when processing silence, you can for instance know that you'll
  get a silent signal if you:
  * add two silent signals
  * multiply with at least one silent signal
  Problems occur, for instance, with recursive delays and similar
  constructs.

- more datatypes

  We need something like a datatype midi, probably even a type between
  property and signal, which would be event.
>> a note on how midi works in aRts now <<

There is that midibus interface, which is basically a very limited CORBA
interface, consisting of:

  interface MidiChannel {
      oneway void noteOn(in octet channel, in octet note, in octet volume);
      oneway void noteOff(in octet channel, in octet note);
  };

As you see, this is by no means complete, nor fast, nor a very extendable
concept - just a quick hack. It should be changed.

=============================================================================
3. Talking about multimedia data types
=============================================================================

=============================================================================
3.1 Audio
=============================================================================

=> MP3 applications
- games
- standalone (kmpg)
- sequencer
- generic KMedia player

=> WAV applications (in addition to mp3)
- editors ( => think about plugins here )
- format abstraction would be useful for samples

=> other applications
- internet audio conferencing
- speech recognition

Plans for KDE2:
- focus on audio
- stable realtime streaming stuff (KAudioStream & Co)
- experimental / old seekable audio media (integrating the KMedia stuff)

=============================================================================
3.2 Midi
=============================================================================

=> MIDI applications
- games
- sequencing (play & record)
- standalone
- generic player (KMedia)

Plans for KDE2:
- KMid, KMidi as usual
- client API (I think this was a remark about Antonio's kmidi classes)

=============================================================================
3.3 Video
=============================================================================

=> VIDEO applications
- video players
- games
- dictionaries, computer aided learning
- realtime video conferencing

Plans for KDE2:
- perhaps integrate KAktion or similar
- ask companies with know-how to contribute technology (such as Corel)

=============================================================================
3.4 Trackers (MOD,XM,SID,MP4)
=============================================================================

Trackers might be supported when integrating the KMedia stuff.

=============================================================================
3.5 CDs
=============================================================================

CDs are not considered integratable to a high degree and are currently
ignored. Business as usual (kscd and similar).

=============================================================================
4. KDE2 client APIs
=============================================================================

=============================================================================
4.1 assignment policies AKA purposes
=============================================================================

Usually, when the KDE2 audio subsystem (aRts) is used to its full capacity,
many applications will require access to audio services at the same time.
On the other hand, aRts can do almost anything, from simply mixing
everything together to applying reverb to your window manager sounds (while
flanging your ksirtet background tune and not touching your ksirtet sound
effects).

That means the audio subsystem should be able to see how to treat which
sounds. On the other hand, you wouldn't want a dialog box to pop up for
each and every window manager sound, asking "how would you like this sound
to be treated?". (That very dialog box could, by the way, cause another
sound ;) - but that's a different story.)

So to figure out how a sound should be treated, we invented purpose
definitions. By seeing the purpose of a sound, aRts should then be able to
reverb it, or play it louder, or not at all, or only on the left speaker.

[ FIXME: list is incomplete ]

Purposes are:
- game soundtrack
- game fx
- window manager
- ...
=============================================================================
4.2 KApplication::play ( no CORBA libs required )
=============================================================================

KApplication::play(QString filename, KAudioPurpose purpose);

Simply plays that file. WAVs should definitely work; if we have too much
time, that thing should also accept mp3s, so you could configure your KDE
startup sound to be an mp3.

>> Implementation <<

=> TCP service
   - auth cookie
   - purpose
   - appname (from KApplication)
   - filename
=> may be implemented in a class KAudioPlayer internally, so as not to
   clutter up the KApplication code too much with TCP stuff

>> What for? <<

KWMSound
KReversi
all apps that only need simple "one shot and forget" playing

=============================================================================
4.3 KAudioStream
=============================================================================

class KAudioStream {
    KAudioStream(long audioFormat, long samplingRate, long channels,
                 KAudioPurpose purpose, QString description);
    bool connectionOK();

    void setBufferedSamples(long samples);
    long bufferedSamples();
};

/*
 * Upon creation of a KAudioStream, the connection to the server is made,
 * and you'll get callbacks periodically: either "needData", if it's a
 * playing stream, or "haveData", if it's a recording stream.
 *
 * audioFormat  is something like 16 bits, signed, little endian
 * samplingRate is something like 44100
 * channels     is 2 for stereo or 1 for mono
 * description  is an arbitrary description that should include the
 *              application name or something similar, so that the user
 *              can see which streams are active (and can mute them,
 *              adjust their volume, and so on)
 */

class KAudioRecordStream : public KAudioStream {
signals:
    void haveData(void *data, long size);
};

class KAudioPlayStream : public KAudioStream {
signals:
    void needData();
public slots:
    void writeData(void *data, long size);
};

/* still undecided interface */
class KAudioRecordDialog;
class KAudioPlayDialog;

>> Implementation <<

Combined usage of the aRts CORBA interface (for instance to set the
sampling rate or to obtain the buffering status) with the aRts TCP
interface (to stream the data).

>> What for? <<

KMp3
KWave
all apps that want to handle things themselves, but interact with the
audio subsystem

=============================================================================
4.4 extensions to the aRts main API for handling KMedia stuff
=============================================================================

to Arts::Synthesizer (or similar):

PlayObject createPlayObject(in string filename,
                            in string (ArtsPurpose?) purpose);

creates a KMedia-like playing object for a certain file

enum poState { playing, finished, paused };
enum poCapabilities { seek, pause };

struct poTime {
    long ms, seconds;   // -1 if undefined
    float custom;       // some custom time unit (changed to float, is that
                        // ok?), -1 if undefined
    string customUnit;  // for instance for a tracker: "pattern"
};

interface PlayObject {
    attribute string description;
    attribute poTime currentTime;
    readonly attribute poTime overallTime;
    attribute poCapabilities capabilities;
    readonly attribute string mediaName;
    readonly attribute poState state;

    void play();
    void seek(in ArtsTime newTime);  // could be handled by setting currentTime
    void pause();
};

=============================================================================
5. roadmap
=============================================================================

10/99
- signal to the outside world
- publish purposes
- scheduling optimizations (or at least experiments with that)

11/99
- KAudioPlayer up & running
- KMid/i, classes, libs

12/99
- KAudioStream up & running
- KAudioRecordDialog

01/00
- volume meters
- KMedia
- configuration

02/00
- common server implementation (?) -> only if time

03/00
- format abstraction
- panel applets

KDE 2.0
- first step towards easy "real multimedia application development"
  under KDE
- simple network transparency
- AUDIO
  => focus on
     - gaming
     - beeps
     - KMedia
     - some help for wave editing
     - desktop sound events (window manager, dialogs, etc.)

KDE 2.1 (note that this is somewhat speculative and subject to change anytime)
- more datatypes
- transfer layer between apps
- midi / real midibus
  => focus on
     - sequencing

KDE 3.0 (note that this is highly speculative and subject to change anytime)
- get rid of CORBA in the realtime sector (?)
- network transparency
- even better datatypes & control flow
- other media (VIDEO)
- wave editing
- plugins

--
  -* Stefan Westerfeld, stefan@space.twc.de (PGP!), Hamburg/Germany
     KDE Developer, project infos at http://space.twc.de/~stefan/kde *-