Software Mummert-dissertation (SWMUMDIS)
This software collection provides procedures for aurally adapted time-frequency-representations of audio described in the dissertation by Mummert [Mum]. The part-tone-time-pattern (PTTP) procedure by Heinbach [Hei] and its variants as well as the spectrogram resynthesis by Horn [Hor] are also provided. The procedures are based on the Fourier-t-Transformation (FTT) defined by Terhardt [Ter]. Also contained in the collection are functions to generate and process time-signals and to further process the time-frequency-representations generated. The collection may be copied and used free of charge (for details see licence-template).
The collection consists of several universal C-programs and shell-scripts. These pass on data using special file-formats, mostly realizing only parts of the procedures. Simple administration of the entire procedures is provided by special scripts (front-ends) that execute all steps necessary. The front-ends are named according to the abbreviations introduced by the dissertation. Only the signal file name and a mode have to be stated to analyse, visualize and reconstruct the representation. Using individual programs, on the other hand, can be quite complicated, as numerous parameters have to be specified.
The software can be installed on Linux- and Unix-systems (see supported systems). Presently there is no experience as to what extent it can be installed on other systems. All user-interfaces are text-oriented; the task of visualization is passed on to external image-viewers. The format of the audio-signals processed is `raw'. The image-format generated is `pbm' or `pgm'. Presently most runtime-instructions, source comments and documentation are written in German. The front-ends mentioned above, however, have their user-interface written in English. The sources contain README-files in English on each directory level, which in a few cases provide specific documentation.
The following text is a template for the licence text being part of the sources of each individual program. Within some programs licence term 6 has a simplified content which allows distribution of object code without any restrictions.
for the software package named
consisting of the files named
Munich, February 24th 1999
END OF LICENCE
Front-ends for time-frequency-representation procedures
The following procedure names will be installed as links to a script named ctxadmin (C). It calls individual programs and supplies them with parameters as needed. Each procedure provides execution of analysis, visualization and reconstruction. Also, a help-text can be displayed. The time-signals processed may have sample rates of 8, 11.025, 12.8, 16, 22.05, 32, 44.1 and 48 kHz.
Front-ends for codecs
The following procedure names will be installed as links to a script named drdadmin (C). It calls individual programs and supplies them with parameters as needed. The procedures will only be simulated, i.e. an input signal will be fed through the entire codec. Saving a compressed signal and visualization are not supported. The signals must have a sample rate of 12.8 kHz.
Analyse time-frequency-representations from time-signals
Reconstruct time-signals from time-frequency-representations
Modify and convert time-frequency-representations
All programs have a text-oriented user-interface just like traditional Unix-commands: They will be called and supplied with parameters for a single run only, there is no interactive change or reentering of parameters, all user input is text. For historical reasons, the programs found in the collection employ four different concepts to read parameters:
The program is called with its parameters on the commandline. A call without parameters will issue a usage-message. Parameters are identified by prepending option letters or simply by the order of appearance.
Concerning the usage-message there is an exception for programs that use standard-I/O (stdio) to process data. After a call without parameters they issue no message at all and seem to `hang' (exit - depending on the system - e.g. with key <CTRL-C>, <BREAK> or <DEL>). In this case a usage-message can be forced by a nonsense-option, e.g. -helpme.
The program is called with its parameters on the commandline. A call without parameters will issue a usage-message. Among others, the option INFO is listed.
The concept allows a versatile parameter supply. Parameter sources are (ascending priority): internal defaults, a predefined default parameter file, an explicitly stated parameter file and, at last, parameters from the commandline. A consequence of this concept is, for example, that a basic parameter set can be loaded that is modified by values explicitly stated on the commandline.
A parameter on the commandline is specified by a keyword followed by a value. Legal keywords can be listed with the options INFO and INFOPARM. In a parameter file not more than one keyword/value pair per line is expected, separated by tabs or blanks. A "#" introduces a comment until the end of line.
The effective set of parameters can be monitored by appending the option INFO or INFOPARM to the commandline, causing the program to exit without starting the computation. Also, a parameter file can be loaded, changed by commandline parameters, monitored and saved back again in one pass (by using options LOADPARM, INFO and SAVEPARM). Parameters may have internal limits which will be shown by the option INFOPARM.
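The parameter file syntax described above (at most one keyword/value pair per line, separated by tabs or blanks, with "#" starting a comment) can be illustrated by a minimal sketch. It is not taken from the SWMUMDIS sources; the function name parse_param_line and the 63-character limit on keywords and values are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of SWMUMDIS.
 * Splits one line of a parameter file into keyword and value.
 * Returns  1 if a keyword/value pair was found,
 *          0 for blank or comment-only lines,
 *         -1 if a keyword is present but its value is missing.
 * key and val must each provide room for at least 64 bytes. */
int parse_param_line(const char *line, char *key, char *val)
{
    char buf[256];
    char *hash;
    int n;

    assert(line != NULL && key != NULL && val != NULL);

    /* Work on a private, always terminated copy of the line. */
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* A "#" introduces a comment until the end of line. */
    hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';

    /* %s skips leading tabs, blanks and newlines. */
    n = sscanf(buf, "%63s %63s", key, val);
    if (n == 2)
        return 1;
    if (n == 1)
        return -1;
    return 0; /* nothing but white space (sscanf returned 0 or EOF) */
}
```

A caller would apply this to every line of the default parameter file first, then to the explicitly stated parameter file, and finally to the commandline pairs, so that later sources overwrite earlier ones in the priority order given above.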
There are no commandline parameters. The program prompts individually for each parameter before it begins with the computation. Incorrect input requires termination (depending on the system - e.g. with key <CTRL-C>, <BREAK> or <DEL>) and restart. The progress of computation is displayed by a cycle counter.
For batch-processing a parameter file can be created that contains one parameter per line, without tabs or blanks, in the order requested interactively. The standard input of the program is redirected to this file, the standard output can be discarded (like this: program < parameter_file > /dev/null).
Basically the same behaviour as with "pseudo-interactive", i.e. the program prompts individually for each parameter. A new feature is the optional commandline parameter that is taken as the name of a parameter file, turning off interactive parameter input. Programs of this type can be identified by the message "Aufruf mit Parameter Eingabedatei möglich" or "Eingabedatei als Parameter möglich" issued at start-up.
The parameter file contains not more than one parameter per line, in the order requested interactively. A "#" introduces a comment until the end of line, allowing identification of parameters within the file. The sources of programs of this type contain a commented sample parameter file <program_name>.par.
Calling the program with a parameter file causes the parameters read to be merged into the standard output. This is useful for debugging. Yet in many cases the standard output can be discarded (like: program parameter_file > /dev/null; note that the cycle counter now is invisible too).
As with the previous concept the parameter file can be used for redirecting standard input (now like this: program < parameter_file > /dev/null). The difference is that the parameter file may still contain comments.
Restrictions imposed by the file formats for time-frequency-representations are the following: Levels are resolved within steps of 0.5 dB, frequencies are resolved within steps of the sample-rate of the time-signal divided by 65536, phases are resolved within steps of (2*pi)/256. Only a few programs can process data in ASCII-format.
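The three resolutions named above can be written down as small conversion helpers. This is a sketch only: the function names are hypothetical, and rounding to the nearest step as well as clamping to the code range are assumptions, since the text states the step sizes but not the rounding rule.

```c
#include <assert.h>

#define TWO_PI 6.283185307179586

/* Level in dB -> 8-bit code in steps of 0.5 dB
 * (assumed: round to nearest, clamp to 0..255). */
unsigned level_to_code(double level_db)
{
    long c = (long)(level_db * 2.0 + 0.5);
    if (level_db < 0.0)
        c = 0;
    if (c > 255)
        c = 255;
    return (unsigned)c;
}

double code_to_level(unsigned code)
{
    return code * 0.5;
}

/* Frequency in Hz -> 16-bit code in steps of fs/65536
 * (assumed: round to nearest, clamp to 0..65535). */
unsigned freq_to_code(double f_hz, double fs_hz)
{
    long c;
    assert(fs_hz > 0.0 && f_hz >= 0.0);
    c = (long)(f_hz * 65536.0 / fs_hz + 0.5);
    if (c > 65535)
        c = 65535;
    return (unsigned)c;
}

double code_to_freq(unsigned code, double fs_hz)
{
    return code * fs_hz / 65536.0;
}

/* Phase in radians -> 8-bit code in steps of (2*pi)/256, wrapping at 2*pi. */
unsigned phase_to_code(double phase_rad)
{
    while (phase_rad < 0.0)
        phase_rad += TWO_PI;
    return (unsigned)(phase_rad * 256.0 / TWO_PI + 0.5) & 255u;
}
```

For example, at a sample rate of 16 kHz a frequency of 1000 Hz maps to code 4096, and a level of 45.3 dB is stored as code 91, i.e. 45.5 dB.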
This is a headerless stream of 16bit-signed words for a single channel. The byte order depends on the machine ("bigendian/littleendian") but can be swapped by the compiler flag -DUNNATURAL_BYTEORDER during installation. There are a number of fixed sample rates for procedures realized by ctxadmin and drdadmin whereas individual programs allow arbitrary values. To convert formats use sox (e.g. raw <-> wave/RIFF), to convert sample rates use rateconv (e.g. 8kHz <-> 44.1kHz).
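Reading such a headerless stream takes only a few lines of C. The following sketch is not taken from the collection; the names swap16 and read_raw are hypothetical, and the swap flag merely stands in for whatever the build with -DUNNATURAL_BYTEORDER does internally.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Exchange the two bytes of a 16-bit sample. */
int16_t swap16(int16_t s)
{
    uint16_t u = (uint16_t)s;
    u = (uint16_t)((u >> 8) | (u << 8));
    return (int16_t)u;
}

/* Read up to max single-channel samples in the machine's byte order.
 * If swap is non-zero, the bytes of every sample are exchanged.
 * Returns the number of samples actually read. */
size_t read_raw(FILE *fp, int16_t *buf, size_t max, int swap)
{
    size_t i, n;

    assert(fp != NULL && buf != NULL);
    n = fread(buf, sizeof buf[0], max, fp);
    if (swap)
        for (i = 0; i < n; i++)
            buf[i] = swap16(buf[i]);
    return n;
}
```

Since the file carries no header, the sample rate has to be known from elsewhere and passed on to the programs separately.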
This is a stream of data sets taken from a frequency file and a level file. A set represents frequency-contours (alias part-tones) at an evaluation instance or time-contours next to an evaluation instance. The first value of a set determines the number of frequency values or, respectively, the number of level values following. Values of the frequency file are always 16bit-words and may be signed or unsigned. Frequencies are coded in steps of the sample rate of the time-signal divided by 65536. Values of the level file are always 8bit-unsigned. Levels are coded in steps of 0.5 dB, they normally don't exceed 90 dB.
Naming convention: Some shell scripts only require a base name and will append the extensions .mxf and .mxl to compose the names of the frequency file and the level file respectively. It is suggested to stick to this convention when programs prompt for individual file names!
The frequency file is always twice as large as the level file. Because of the limited range of an 8bit-unsigned number the first value of a level set will be disregarded if the corresponding number of frequencies exceeds 255. In all other cases input routines require the first values of a set to be the same in both files.
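The set structure of the mxf/mxl-format can be sketched as a reader for one set. The function read_mx_set and its return codes are hypothetical; the sketch also assumes that the 16bit-words of the frequency file are stored in the machine's byte order and treats them as unsigned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reader for one set of an mxf/mxl file pair.
 * ff: frequency file (.mxf), fl: level file (.mxl).
 * Returns the number of frequency/level pairs in the set,
 * -1 at the end of the frequency file,
 * -2 on truncated or inconsistent data or if cap is too small. */
long read_mx_set(FILE *ff, FILE *fl, uint16_t *freq, uint8_t *level, size_t cap)
{
    uint16_t nf; /* set size as stated in the frequency file */
    uint8_t nl;  /* set size as stated in the level file */

    assert(ff != NULL && fl != NULL && freq != NULL && level != NULL);

    if (fread(&nf, sizeof nf, 1, ff) != 1)
        return -1;
    if (fread(&nl, 1, 1, fl) != 1)
        return -2;

    /* Above 255 the 8-bit count cannot hold the size and is disregarded;
     * otherwise both files must state the same set size. */
    if (nf <= 255 && nl != nf)
        return -2;
    if (nf > cap)
        return -2;

    if (fread(freq, sizeof freq[0], nf, ff) != nf)
        return -2;
    if (fread(level, 1, nf, fl) != nf)
        return -2;
    return (long)nf;
}
```

Every set contributes the same number of values to both files, one count plus the set's entries, which is why the frequency file with its 16-bit values is twice as large as the level file.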
In addition to the mxf/mxl-format a phase file is kept. Its structure is identical to that of the level file, but instead of levels phases are coded in steps of (2*pi)/256. Names of phase files should have .mxp as extension!
Levels are coded as with the mxf/mxl-format. The value for the number of levels in a set is missing because it is invariant and can be calculated from parameters. Frequencies aren't coded at all for the same reason. The order of levels in a set corresponds to ascending frequencies. For file names the extension .ftt should be used.
The program ftttopgm allows conversion of ftt-format into a greyscale picture (pgm-format) and on into arbitrary picture formats by using netpbm. It is not intended for visualization, as it doesn't provide exact frequency scales as costximg does. Rather it allows modification of spectrograms by image processing software (Horn). A greyscale picture in pgm-format can be converted back into ftt-format by pgmtoftt. ftttopgm requires the parameter floor((NFI + GRID - 1)/GRID) (NFI and GRID see frequency table).
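In C the expression floor((NFI + GRID - 1)/GRID) reduces to an integer division. The helper below is a sketch with a hypothetical name; reading the result as the number of table indices 0, GRID, 2*GRID, ... that lie below NFI is an interpretation, not a statement from the sources.

```c
#include <assert.h>

/* Number of table indices 0, GRID, 2*GRID, ... that are smaller than NFI.
 * Integer division truncates, which equals floor() for positive operands. */
int spectral_samples(int nfi, int grid)
{
    assert(nfi > 0 && grid > 0);
    return (nfi + grid - 1) / grid;
}
```

With NFI = 10 and GRID = 3, for instance, the indices 0, 3, 6 and 9 are used, so the value is 4.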
This format is identical with the mxf/mxl-format, the extensions for the frequency and level file being .shf and .shl for historical reasons. It is used by the programs tzshk and tzmoshk. Compared to the ftt-format, frequencies and set sizes become explicit again. The advantage is that this format can be processed by programs requiring mxf/mxl-format.
fbtab has a special mode to convert ftt-format into shf/shl-format alias mxf/mxl-format. The other way is more difficult: A shl-file can be converted into a ftt-file by cutting out the set size bytes. addsp or subsp can be `abused' to do this, requiring a second spectrogram in ftt-format with zero-levels. This is just a stream of zeros which can be delivered by /dev/zero. Care has to be taken if the order of ascending frequencies within a set has been destroyed, e.g. some programs change the order according to ascending levels. In this case contline can be `abused' to rearrange in the order of ascending frequencies (MODE fr, TLENL and TLENH as large as possible). Of course mxf/mxl-processing programs could also have dropped level/frequency-pairs. Then even more effort has to be spent ...
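Cutting out the set size bytes can also be done directly, as the following sketch shows. It does not use addsp or subsp; the function shl_to_ftt is hypothetical and assumes that every set holds the same number of levels (at most 255), still in the order of ascending frequencies.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical converter: copies a shl stream to an ftt stream
 * by dropping the leading set size byte of every set.
 * nlev is the number of levels each set is expected to contain.
 * Returns the number of sets copied, or -1 if a set has a different
 * size or the input ends in the middle of a set. */
long shl_to_ftt(FILE *in, FILE *out, int nlev)
{
    long sets = 0;
    int c, i;

    assert(in != NULL && out != NULL && nlev > 0 && nlev <= 255);

    while ((c = getc(in)) != EOF) {
        if (c != nlev)
            return -1; /* level/frequency-pairs were dropped or added */
        for (i = 0; i < nlev; i++) {
            c = getc(in);
            if (c == EOF)
                return -1;
            putc(c, out);
        }
        sets++;
    }
    return sets;
}
```

The check on the set size catches the case mentioned above in which an mxf/mxl-processing program has dropped level/frequency-pairs; it cannot detect a changed order within a set.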
The C-instruction printf("P4\n%d %d\n", width, height) is used to write an ASCII-header. It is followed by floor((width + 7)/8)*height bytes that represent the bitmap of the picture line-by-line, each line being padded to a whole number of bytes. The bitmap begins in the upper left corner, which corresponds to the most-significant bit in the first byte. Bit-values of 0 and 1 represent white and black respectively. The pbm-format is taken from the package netpbm.
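A complete writer for this format is short. The sketch below follows the netpbm convention that every picture line starts on a byte boundary; the function write_pbm and its one-byte-per-pixel input buffer are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical writer for a raw pbm picture.
 * pix holds width*height entries, line-by-line from the upper left corner;
 * a non-zero entry is a black pixel (bit value 1).
 * Returns 0 on success, -1 on a write error. */
int write_pbm(FILE *fp, const unsigned char *pix, int width, int height)
{
    int x, y;

    assert(fp != NULL && pix != NULL && width > 0 && height > 0);
    fprintf(fp, "P4\n%d %d\n", width, height);

    for (y = 0; y < height; y++) {
        unsigned byte = 0;
        for (x = 0; x < width; x++) {
            /* The leftmost pixel of each group of 8 is the most-significant bit. */
            if (pix[y * width + x])
                byte |= 0x80u >> (x & 7);
            /* Emit a byte when it is full or the line ends (padding with 0). */
            if ((x & 7) == 7 || x == width - 1) {
                putc((int)byte, fp);
                byte = 0;
            }
        }
    }
    return ferror(fp) ? -1 : 0;
}
```

A picture of 10 by 2 pixels, for example, occupies 2 bytes per line and therefore 4 bytes after the header.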
The C-instruction printf("P5\n%d %d\n255\n", width, height) is used to write an ASCII-header. It is followed by width*height bytes that represent the grey values of a picture line-by-line, beginning in the left upper corner. Values of 0 and 255 represent black and white respectively. The pgm-format is taken from the package netpbm.
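The pgm layout can be sketched in the same way. The function write_pgm is hypothetical; it writes exactly the header quoted above, followed by one byte per pixel.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical writer for a raw pgm picture with a maximum grey value of 255.
 * grey holds width*height bytes, line-by-line from the upper left corner;
 * 0 is black and 255 is white.
 * Returns 0 on success, -1 on a write error. */
int write_pgm(FILE *fp, const unsigned char *grey, int width, int height)
{
    size_t n = (size_t)width * (size_t)height;

    assert(fp != NULL && grey != NULL && width > 0 && height > 0);
    fprintf(fp, "P5\n%d %d\n255\n", width, height);
    if (fwrite(grey, 1, n, fp) != n)
        return -1;
    return ferror(fp) ? -1 : 0;
}
```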
The analysing frequencies at which a time-variant spectrum should be evaluated are determined by constructing a table. Because the frequencies aren't kept by the ftt-format this table has to be recalculated by each program processing this format. Programs for reconstruction from contours need a similar table to determine synthesizing frequencies.
To construct the table the following parameters are needed: the total number NFI of table frequencies, the first table frequency F0 in Hz, the table frequency distance given as a UNIT (Hz, Prozent (percent), ERB, Bark or spinc) times a value DF, and a step width GRID. Successive frequencies are calculated by adding the size of UNIT in Hz times DF to the preceding frequency.
This `differential' procedure is tolerable for small frequency distances, yet fails for frequency-transformations on a large scale, with a deviation being dependent on DF. For the time-frequency-representations treated this deviation is negligible, as long as all programs use the same table. This requires F0, UNIT and DF to be the same everywhere. To still be able to down-sample the frequency spacing (e.g. for texture-representation) the parameter GRID is provided. It specifies the distance of table indices at which spectral samples are actually written or expected. Table frequencies can be printed with fbtab. By the way: costximg converts the table into a transformation that is correct on a large scale.
The software was tested on the following systems:
The following software should have been installed already:
[search-term given in square brackets to search public ftp-servers, e.g. via http://www.ftpsearch.com; numbers given don't necessarily reflect the latest version]
The gzip/tar-archive of the complete sources named swmumdis-xxxxxx.tar.gz is available at the ftp-directory for download. The file size is about 1.5 Mbyte. A subdirectory named swmumdis contains all programs for individual download. Shell scripts (all front-ends) and documentation are bundled as scripts-xxxxxx.tar.gz and doc-xxxxxx.tar.gz respectively.
Unpack using gzip and tar, then change into the directory swmumdis-xxxxxx. Call make to receive further instructions. Special options found in the Makefile are listed here in advance:
Known problems and bugs
Homepage of Mummert-dissertation and SWMUMDIS
Authors of SWMUMDIS
Copyright (c) 1998 Dr.-Ing. Markus Mummert
$Id: overview_e.html,v 1.4 2015/07/27 20:38:51 mummert Exp mummert $