Pliska Studia Mathematica Bulgarica
Volume 16, 2004
Proceedings of the X International Summer Conference on Probability and Statistics and Seminar on Statistical Data Analysis, Sozopol, 2003
GUEST EDITORS: N. Yanev, D. Vandev
Sofia, 2004
C O N T E N T S
- Atanasov, D. Study on Robustness of Strehler-Mildvan Model. (pp. 5-11)
- Bakeva, V. Probabilistic Models in Cryptography, Coding Theory and Tests for PRNG. (pp. 13-22)
- Boshnakov, G. On some Concepts of Residuals. (pp. 23-33)
- Dimova, R., Neykov, N. Application of the d-fullness Technique for Breakdown Point Study of the Trimmed Likelihood Estimator to a Generalized Logistic Model. (pp. 35-41)
- Gonzalez, M., Molina, M., Puerto, I. Recent Results for Supercritical Controlled Branching Processes with Control Random Functions. (pp. 43-54)
- Ivanovska, S. Monte Carlo Method for Reconstruction of the Densities. (pp. 55-64)
- Jacob, C., Lalam, N. Estimation of the Offspring Mean in a General Single-Type Size-Dependent Branching Process. (pp. 65-88)
- Kolkovska, E. On a Stochastic Partial Differential Equation with a Noisy Term. (pp. 89-99)
- Lopez-Mimbela, J. Branching Particle Representations of a Class of Semilinear Equations. (pp. 101-119)
- Mateev, P., Titianova, E., Tarkka, I. Gait Measurements and Motor Recovery after Stroke. (pp. 121-128)
- Minkova, L. A Modified Model of Risk Business. (pp. 129-135)
- Mitov, K., Yanev, N. Limiting Distributions for Lifetimes in Alternating Renewal Processes. (pp. 137-145)
- Molina, M., Mota, M., Ramos, A. Discrete Time Bisexual Branching Processes in Varying Environments. (pp. 147-158)
- Nadarajah, S., Mitov, G., Mitov, K. An Estimate of the Probability Pr(X < Y). (pp. 159-170)
- Noncheva, V., Gamallo, P., Agustini, A., Lopes, G. A Stochastic Approach for Finding of Semantically Related Words. (pp. 171-182)
- Robeva, R. S., Pitt, L. On the Equality of Sharp and Germ σ-fields for Gaussian Processes and Fields. (pp. 183-205)
- Silva, J., Mexia, J., Coelho, C.A., Lopes, G. A Statistical Approach for Multilingual Document Clustering. (pp. 207-228)
- Slavtchova-Bojkova, M., Becker-Kern, P., Mitov, K. Total Progeny in a Subcritical Branching Process. (pp. 229-243)
- Stoimenova, E., Datcheva, M., Schanz, T. Application of Two-Phase Regression to Geotechnical Data. (pp. 245-257)
- Stoimenova, V., Atanasov, D., Yanev, N. Simulation and Robust Modifications of Estimates in Branching Processes. (pp. 259-271)
- Tarkka, I. Multiple Dipole Source Models for Scalp-Recorded Event-Related Potentials: Example from Complex Visual Processing in the Human Brain. (pp. 273-277)
- Tsvetanova, Y., Grozeva, N. Classification of Chenopodium Genus Populations and Species based on Continuous and Categorical Variables. (pp. 279-290)
- Vandev, D. Interactive Stepwise Discriminant Analysis in MATLAB. (pp. 291-298)
- Vandev, D., Romisch, U. Comparing Several Methods of Discriminant Analysis on the case of Wine Data. (pp. 299-308)
- Vuchkov, I. Quality Improvement through Experiments with Mixtures. (pp. 309-315)
A B S T R A C T S
Study on Robustness of Strehler-Mildvan Model
Dimitar Atanasov datanasov@fmi.uni-sofia.bg
2000 AMS Subject Classification: 62F35, 62F15
Key words: robust statistics
The probability that a device will work properly after a certain period of
time can be studied using the Strehler-Mildvan model. Let us suppose that the
functionality of a device depends on an unknown parameter X, which
decreases progressively in time. The device stops working if X goes
below a certain given value.
We can use the method of maximum likelihood to estimate the parameters of the
model and to estimate the probability of proper work at a future moment using
the survival function.
This model can be modified to improve its robustness. We consider the
breakdown properties of the model using the WLTE(k) estimators and the
theory of d-fullness of sets of subcompact functions.
Probabilistic Models in Cryptography and Coding Theory
Verica Bakeva verica@pmf.ukim.edu.mk
2000 AMS Subject Classification: 94A29, 94B70
Key words: quasigroups, stream cypher, tests for pseudo-random number generators, error-correcting codes.
This paper is a review of some applications of probabilistic models in
cryptography, coding theory and tests for pseudo-random number generators
(PRNG). Using quasigroup transformations, we design stream cyphers and
error-correcting codes with suitable properties. Some tests for
pseudo-random number generators are designed, too. They are based on a
random walk on the discrete coordinate plane.
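The quasigroup string transformations mentioned above can be illustrated with the e-transformation as it is commonly defined in the quasigroup literature: for a leader $l$ and a message $a_1 \ldots a_n$, set $b_1 = l * a_1$ and $b_i = b_{i-1} * a_i$. The sketch below assumes this form; whether the paper's stream cyphers use exactly this transformation is an assumption, and the order-5 quasigroup $x * y = (2x + 3y + 1) \bmod 5$ is a hypothetical example.

```python
N = 5

def star(x, y):
    # Hypothetical quasigroup on {0,...,4}: a Latin square because 2 and 3 are invertible mod 5.
    return (2 * x + 3 * y + 1) % N

# Left division table: LDIV[x][z] is the unique y with star(x, y) == z.
LDIV = [[0] * N for _ in range(N)]
for x in range(N):
    for y in range(N):
        LDIV[x][star(x, y)] = y

def e_transform(leader, message):
    """Encode: b_1 = leader * a_1, b_i = b_{i-1} * a_i."""
    out, prev = [], leader
    for a in message:
        prev = star(prev, a)
        out.append(prev)
    return out

def d_transform(leader, cipher):
    """Decode with left division: a_1 = leader \\ b_1, a_i = b_{i-1} \\ b_i."""
    out, prev = [], leader
    for b in cipher:
        out.append(LDIV[prev][b])
        prev = b
    return out

msg = [1, 2, 0, 4, 4, 3]
print(e_transform(0, msg), d_transform(0, e_transform(0, msg)))
```

Decoding relies only on the Latin-square property of the Cayley table, which guarantees that the left division used by `d_transform` is well defined.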
Total Progeny in a Subcritical Branching Process with Two Types of Immigration
Maroussia Slavtchova-Bojkova bojkova@math.bas.bg
P. Becker-Kern
Kosto V. Mitov
2000 AMS Subject Classification: 60J80, 60F05
Key words: central limit theorem, total progeny, Bellman-Harris branching processes, law of large numbers, renewal processes.
We consider subcritical Bellman-Harris branching processes
with two types of immigration: one appears whenever the process
hits the zero state, and the other arrives according to an independent
renewal process. The law of large numbers (LLN) for the total
progeny of these processes and an Anscombe-type central limit
theorem (CLT) for the total number of particles in the cycles
completely finished by time t are obtained.
On Some Concepts of Residuals
Georgi N. Boshnakov Georgi.Boshnakov@umist.ac.uk
2000 AMS Subject Classification: 60E10, 62G15, 62M20
Key words: concentration function, confidence density, confidence residual, highest density region.
We introduce confidence residuals and standardised confidence
residuals. These residuals may be especially useful for asymmetric
and multimodal distributions.
Application of the d-fullness Technique for Breakdown Point Study of the Trimmed Likelihood Estimator to a Generalized Logistic Model
Rositsa Dimova
Neyko Neykov
2000 AMS Subject Classification: 62J12, 62F35
Key words: Breakdown Point, Subcompact Function, d-fullness, Robustness, Trimmed Likelihood Estimator, generalized logistic model.
A new definition of d-fullness of a set of functions
is proposed and its equivalence to the original one given by Vandev
is proved. The breakdown point of the WTL(k) estimator of Vandev and
Neykov for a grouped binary linear regression model with a generalized
logistic link is studied.
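The trimmed likelihood idea behind such estimators can be sketched as follows: for given parameters, compute each group's negative log-likelihood, keep only the k smallest contributions, and minimize their sum. This is a minimal unweighted sketch, not the paper's WTL(k) estimator: the ordinary logistic (logit) link stands in for the generalized logistic link, the data are hypothetical, and the grid search is only a crude illustrative minimizer.

```python
import math

def neg_loglik_terms(beta0, beta1, x, y, n):
    """Per-group binomial negative log-likelihoods (constants dropped) for
    y[i] successes out of n[i] trials at covariate x[i], with logit link."""
    terms = []
    for xi, yi, ni in zip(x, y, n):
        eta = beta0 + beta1 * xi
        # For p = 1/(1 + e^-eta): -log p = log(1 + e^-eta), -log(1 - p) = log(1 + e^eta)
        terms.append(yi * math.log1p(math.exp(-eta)) + (ni - yi) * math.log1p(math.exp(eta)))
    return terms

def trimmed_objective(beta0, beta1, x, y, n, k):
    """Sum of the k smallest negative log-likelihood contributions."""
    return sum(sorted(neg_loglik_terms(beta0, beta1, x, y, n))[:k])

def tle_grid(x, y, n, k, grid):
    """Crude grid-search minimizer of the trimmed objective (illustration only)."""
    best = None
    for b0 in grid:
        for b1 in grid:
            value = trimmed_objective(b0, b1, x, y, n, k)
            if best is None or value < best[0]:
                best = (value, b0, b1)
    return best[1], best[2]

# Hypothetical grouped binary data; the last group looks like an outlier.
x = [0, 1, 2, 3, 4]
y = [2, 4, 6, 8, 0]
n = [10, 10, 10, 10, 10]
grid = [i / 4 for i in range(-12, 13)]
print(tle_grid(x, y, n, k=4, grid=grid))
```

With k equal to the number of groups the objective reduces to the ordinary negative log-likelihood; smaller k lets the fit ignore the worst-fitting groups.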
A Statistical Approach for Multilingual Document Clustering and Topic Extraction from Clusters
Joaquim Silva mbox0106@di.fct.unl.pt
Joao Mexia
Carlos A. Coelho
Gabriel Lopes
2000 AMS Subject Classification: 62H30
Key words: cluster analysis, applied statistics, document clustering, text mining, topic extraction.
This paper describes a statistics-based methodology for unsupervised
document clustering and the extraction of cluster topics.
For this purpose, multiword lexical units (MWUs) of any length
are automatically extracted from corpora using LiPXtractor, a
language-independent statistics-based tool.
The MWUs are taken as base features to characterize
documents. These features are transformed and a document
similarity matrix is constructed. From this matrix, a reduced set of
features is selected using an approach based on Principal Component
Analysis. Then, using the Model-Based Clustering Analysis
software, it is possible to obtain the best number of clusters.
Precision and recall for document-cluster assignment are above 90%.
The most important MWUs are extracted from each cluster and
taken as the cluster topics.
Results on the classification of new documents are only briefly
mentioned.
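The step "these features are transformed and a document similarity matrix is constructed" can be sketched with cosine similarity between MWU frequency vectors. The abstract does not say which transformation or similarity is used, so raw frequencies and cosine similarity are assumptions here, and the MWUs below are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity_matrix(docs):
    """docs: one Counter per document, mapping an MWU to its frequency.

    Returns the n x n matrix of cosine similarities between the raw
    frequency vectors (an assumed choice of transformation and similarity).
    """
    norms = [math.sqrt(sum(c * c for c in d.values())) for d in docs]
    n = len(docs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # an empty document is left with similarity 0
            dot = sum(freq * docs[j].get(mwu, 0) for mwu, freq in docs[i].items())
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Hypothetical MWU counts for three documents.
docs = [
    Counter({"european union": 2, "member states": 1}),
    Counter({"european union": 4, "member states": 2}),
    Counter({"stock exchange": 3}),
]
S = cosine_similarity_matrix(docs)
print(S)
```

A matrix like `S` could then be passed to a dimension-reduction step such as PCA before model-based clustering.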
Recent Results for Supercritical Controlled Branching Processes with Control Random Functions
Miguel Gonzalez mvelasco@unex.es
Manuel Molina mmolina@unex.es
Ines del Puerto idelpuerto@unex.es
2000 AMS Subject Classification: 60J80, 60F05
Key words: controlled branching process, extinction problem, limiting behaviour.
In this paper we are concerned with controlled branching
processes with random control functions. We have recently considered
them under the condition of asymptotically linear growth of the
mathematical expectations associated with the random control variables.
We review the main results obtained so far, mainly in the
supercritical case.
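A controlled branching process evolves as $Z_{n+1} = \sum_{j=1}^{\phi_n(Z_n)} X_{n,j}$, where the random control $\phi_n(k)$ decides how many individuals reproduce. The simulation sketch below assumes a hypothetical binomial control (each individual reproduces with probability q, so $E[\phi_n(k)] = qk$ grows linearly) and hypothetical offspring uniform on {0, 1, 2, 3}; with q = 0.9 the mean growth rate is 0.9 × 1.5 = 1.35, a supercritical case.

```python
import random

def random_control(k, q, rng):
    """Hypothetical random control phi(k): each of the k individuals is
    independently allowed to reproduce with probability q (E[phi(k)] = q*k)."""
    return sum(1 for _ in range(k) if rng.random() < q)

def simulate_cbp(z0, generations, q=0.9, seed=0):
    """Simulate Z_{n+1} = sum_{j=1}^{phi_n(Z_n)} X_{n,j} with hypothetical
    offspring X uniform on {0, 1, 2, 3} (mean 1.5)."""
    rng = random.Random(seed)
    path = [z0]
    z = z0
    for _ in range(generations):
        progenitors = random_control(z, q, rng)
        z = sum(rng.randint(0, 3) for _ in range(progenitors))
        path.append(z)
    return path

print(simulate_cbp(5, 10))
```

Zero is absorbing: once $Z_n = 0$, the control returns 0 and no offspring are produced.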
Monte Carlo Method for Reconstruction of the Densities
Sofiya Ivanovska sofia@copern.bas.bg
2000 AMS Subject Classification: 65C05
Key words: Monte Carlo algorithms
The present paper considers the problem of reconstructing an
unknown density from N realizations of a random variable using
B-spline approximation, the least squares method and the Monte Carlo
method. It is shown that B-splines are appropriate for density
modeling. The results of approximating an unknown density with the
considered algorithm are compared with some non-parametric
statistical methods, such as the histogram and kernel density
estimation. A large number of experiments were performed using
Matlab 6.
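B-spline density models write the density as $f(t) = \sum_i c_i B_{i,p}(t)$. The basis functions can be evaluated with the standard Cox-de Boor recursion, sketched below. The coefficients $c_i$ would be fitted from the data (in the paper, by least squares and Monte Carlo); the knot vector and the constant coefficients in the example are hypothetical.

```python
def bspline_basis(i, p, t, knots):
    """Value at t of the i-th B-spline of degree p (Cox-de Boor recursion).

    Terms whose knot span has zero length are taken as 0 (the usual 0/0 = 0 convention).
    """
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + p] - knots[i]
    if left > 0:
        value += (t - knots[i]) / left * bspline_basis(i, p - 1, t, knots)
    right = knots[i + p + 1] - knots[i + 1]
    if right > 0:
        value += (knots[i + p + 1] - t) / right * bspline_basis(i + 1, p - 1, t, knots)
    return value

def spline_density(t, coeffs, p, knots):
    """Density model f(t) = sum_i c_i B_{i,p}(t) for given (hypothetical) coefficients."""
    return sum(c * bspline_basis(i, p, t, knots) for i, c in enumerate(coeffs))

# Clamped cubic knot vector on [0, 4]: len(knots) - p - 1 = 7 basis functions.
knots = [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4]
print(spline_density(1.3, [0.25] * 7, 3, knots))
```

Because the basis functions sum to 1 on [0, 4), setting every coefficient to 0.25 reproduces the uniform density on that interval.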
Estimation of the Offspring Mean in a General Single-Type Size-Dependent Branching Process
Christine Jacob
Nadia Lalam
2000 AMS Subject Classification: 60J80, 62F12, 62P10
Key words: Size-dependent branching process, Controlled branching process.
We consider a general single-type size-dependent branching
process $\{N_n\}_n$ such that the offspring mean
converges to a limit $m \ge 1$ with a rate of
convergence of order $N_n^{-\alpha}$ as the population size $N_n$
grows to $\infty$, and the variance may increase
at the rate $N_n^{\beta}$, where $-1 \le \beta < 1$. We assume that
$m(N) = m_1(N) + m_2(N)$, where $m_1(N)$
depends on an unknown asymptotically identifiable parameter $\theta_0$ that belongs either to the limit
model or to the transient model, and $m_2(N)$ is a nuisance
term that is assumed to be asymptotically negligible relative to
$m_1(N)$.
We estimate $\theta_0$ on the
non-extinction set from the observations $\{N_h, \ldots, N_n\}$ by using the conditional least
squares method weighted by $\{N_{n-1}^{-\gamma}\}_n$. We study the strong
consistency of the estimator according to $\gamma$, with either $h$ or $n-h$
remaining constant as $n \to \infty$, by using either
the minimum contrast method or a Taylor approximation of the first
derivative of the contrast. The main sufficient (and probably also
necessary) condition for the strong consistency of the transient
parameter is that $\beta + 2\alpha \le 1$. We also give the
asymptotic distribution of the estimator by using Rahimov's central
limit theorem for random sums, and we show that the best rate of
convergence is reached for $\gamma = 1 + \beta$. All the results are independent of the value
taken by the nuisance term $m_2(N)$.
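In the simplest special case, where the offspring mean is a constant $\theta$ (no transient part, an assumption made only for this sketch), minimizing $\sum_{k=h+1}^{n} N_{k-1}^{-\gamma}(N_k - \theta N_{k-1})^2$ gives the closed form $\hat\theta_\gamma = \sum_{k} N_{k-1}^{1-\gamma} N_k \big/ \sum_{k} N_{k-1}^{2-\gamma}$. For $\gamma = 1$ this is the familiar ratio $\sum_k N_k / \sum_k N_{k-1}$. The function name below is hypothetical.

```python
def wcls_offspring_mean(path, gamma):
    """Weighted CLS estimate of a constant offspring mean from path = [N_h, ..., N_n].

    Minimizes sum_k N_{k-1}^(-gamma) * (N_k - theta * N_{k-1})^2, whose solution is
    sum_k N_{k-1}^(1-gamma) N_k / sum_k N_{k-1}^(2-gamma).
    """
    num = 0.0
    den = 0.0
    for prev, cur in zip(path[:-1], path[1:]):
        if prev == 0:
            break  # extinction: no further information, and the weight is undefined
        weight = prev ** (-gamma)
        num += weight * prev * cur
        den += weight * prev * prev
    if den == 0:
        raise ValueError("need at least one transition from a positive population size")
    return num / den

print(wcls_offspring_mean([2, 3, 5], 1.0))
```

Changing `gamma` only reweights the generations; on a path that grows exactly geometrically every choice of `gamma` returns the same ratio.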
On the Fractal Burgers Equation with a Stochastic Noisy Term
Ekaterina T. Kolkovska
2000 AMS Subject Classification: 60H15, 60H40
Key words: Burgers equation, white noise, weak and strong solutions, Hilbert space regularity.
We review results obtained in [13] and [14] on a one-dimensional
Burgers-type stochastic partial differential equation involving a fractional
power of the Laplacian in its linear part, perturbed by a white
noise term, with Dirichlet boundary conditions. We discuss the existence
of weak solutions and the regularity of solutions.
Branching Particle Representations of a Class of Semilinear Equations
Jose Alfredo Lopez-Mimbela
2000 AMS Subject Classification: 60J80, 60J85
Key words: Markov branching process, semilinear partial differential equation, global and nonglobal solutions, mild solutions
We review several probabilistic techniques that were
developed in a series of papers to study blowup properties of positive
(mild) solutions of semilinear equations of the form
$\frac{\partial u(t,x)}{\partial t} = A u(t,x) + u^{\beta}(t,x), \quad u(0,x) = f(x),$
where $A$ is the generator of a strong Markov process in a locally
compact space $S$, $\beta > 1$ is an integer, and $f : S \to [0, +\infty)$
is bounded and measurable. The emphasis is on probabilistic representations of positive solutions and on
qualitative properties of solutions.
Gait Measurements and Motor Recovery After Stroke
P. Mateev pmat@math.bas.bg
E. Titianova
I. Tarkka
2000 AMS Subject Classification: 62P10, 92C20
Key words: gait analysis, recovery after stroke.
Gait analysis is one of the methods used for estimating the
degree of motor recovery after stroke. The purpose of
the present study was to examine the diagnostic value of the
footprint parameters and their relationship with the functional
ambulation profile (FAP) scores provided automatically by the
pressure sensor walkway system for gait examination. The patterns of
walking were studied in a group of 23 patients with chronic
unilateral stroke and 72 healthy subjects. The predictive
value of the footprint parameters was compared with that of some other
gait indicators of motor recovery after stroke.
A Modified Model of Risk Business
Leda D. Minkova
2000 AMS Subject Classification: 60K10, 62P05
Key words: Pólya-Aeppli risk model, ruin probability, Cramér-Lundberg approximation.
We consider the risk model in which the claim counting process
$\{N(t)\}$ is a modified renewal process. $\{N(t)\}$ is governed by a
sequence of independent and identically distributed inter-occurrence
times with a common distribution function that has mass $\rho > 0$
at zero. In the case $\rho = 0$ this model is called the Sparre Andersen
model. The particular case of the Pólya-Aeppli risk model is
studied. The Cramér-Lundberg approximation and the martingale
approach to the model are given.
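A mass $\rho$ at zero in the inter-occurrence distribution means that claims arrive in clusters at the same instant. The simulation sketch below assumes one concrete version of the model (an assumption, not taken from the paper): the first inter-occurrence time is exponential with rate $\lambda$, and each later one is 0 with probability $\rho$ and otherwise exponential with rate $\lambda$. Under these assumptions cluster epochs form a Poisson process of rate $\lambda$, cluster sizes are geometric with mean $1/(1-\rho)$, and hence $E[N(t)] = \lambda t/(1-\rho)$ and $P(N(t) = 0) = e^{-\lambda t}$.

```python
import random

def polya_aeppli_count(t, lam, rho, rng):
    """Number of claims in [0, t] under the assumed model: first gap Exp(lam);
    each later gap is 0 with probability rho, otherwise Exp(lam)."""
    count = 0
    time = rng.expovariate(lam)
    while time <= t:
        count += 1
        gap = 0.0 if rng.random() < rho else rng.expovariate(lam)
        time += gap
    return count

def monte_carlo_summary(t, lam, rho, reps=20000, seed=1):
    """Empirical mean of N(t) and empirical P(N(t) = 0)."""
    rng = random.Random(seed)
    counts = [polya_aeppli_count(t, lam, rho, rng) for _ in range(reps)]
    return sum(counts) / reps, sum(1 for c in counts if c == 0) / reps

# With lam = 2, t = 1, rho = 0.5 the assumed model gives E[N(1)] = 4 and P(N(1) = 0) = e^-2.
print(monte_carlo_summary(1.0, 2.0, 0.5))
```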
Limiting Distributions for Lifetimes in Alternating Renewal Processes
Kosto V. Mitov
Nickolay M. Yanev
2000 AMS Subject Classification: 60K05
Key words: alternating renewal processes, spent working time, spent waiting time, residual working time, limiting distributions, infinite mean renewal periods.
The spent lifetime and the residual lifetime are well-studied
characteristics of an ordinary renewal process. In
the present paper, a generalization of these lifetime processes
associated with an alternating renewal process is
considered. Limiting distributions are presented for the case of
infinite-mean renewal periods.
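To make the lifetime quantities concrete, the sketch below computes, for one realization of an alternating renewal process started at time 0 in a working period, the current state, the spent time and the residual time at a given moment t. The Pareto periods with shape below 1 (which have infinite mean) are a hypothetical choice for illustrating the infinite-mean case; the function names are also hypothetical.

```python
import random

def alternating_state(t, periods):
    """periods: alternating durations [work_1, wait_1, work_2, wait_2, ...] from time 0.

    Returns (state, spent time in the current period, residual time in it) at time t.
    """
    start = 0.0
    for idx, duration in enumerate(periods):
        end = start + duration
        if t < end:
            state = "working" if idx % 2 == 0 else "waiting"
            return state, t - start, end - t
        start = end
    raise ValueError("t lies beyond the simulated periods")

def simulate_periods(n_cycles, alpha_work, alpha_wait, seed=0):
    """Hypothetical Pareto periods; shape <= 1 gives infinite-mean periods."""
    rng = random.Random(seed)
    periods = []
    for _ in range(n_cycles):
        periods.append(rng.paretovariate(alpha_work))
        periods.append(rng.paretovariate(alpha_wait))
    return periods

print(alternating_state(2.5, [2, 1, 3]))
print(alternating_state(10.0, simulate_periods(1000, 0.7, 0.9)))
```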
Discrete Time Bisexual Branching Processes in Varying Environments
Manuel Molina mmolina@unex.es
Manuel Mota mota@unex.es
Alfonso Ramos aramos@unex.es
2000 AMS Subject Classification: 60J80
Key words: discrete time branching processes, bisexual processes, branching processes in varying environments.
This paper is concerned with the bisexual branching process in
varying environments introduced in [2]. A survey of results for
this model is provided. First, brief descriptions of the
bisexual branching process and some bisexual models derived from it
are given.
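In a bisexual branching process, each of the $Z_n$ mating units produces females and males, and the next generation of mating units is $Z_{n+1} = L(F_{n+1}, M_{n+1})$ for a mating function $L$. The simulation sketch below assumes the perfect-fidelity mating function $L(f, m) = \min(f, m)$ and a hypothetical varying environment in which couples of generation n have a number of children uniform on {0, ..., k_n}, each child being female with probability 1/2.

```python
import random

def simulate_bisexual(z0, max_children, seed=0):
    """max_children[n] plays the role of the (hypothetical) environment of generation n."""
    rng = random.Random(seed)
    path = [z0]
    z = z0
    for kmax in max_children:
        females = males = 0
        for _ in range(z):  # each mating unit reproduces independently
            for _ in range(rng.randint(0, kmax)):
                if rng.random() < 0.5:
                    females += 1
                else:
                    males += 1
        z = min(females, males)  # perfect-fidelity mating function L(f, m) = min(f, m)
        path.append(z)
    return path

print(simulate_bisexual(20, [3, 4, 2, 5, 3], seed=2))
```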
An Estimate of the Probability Pr (X < Y)
Saralees Nadarajah
Georgi K. Mitov
Kosto V. Mitov
2000 AMS Subject Classification: 33C90, 62E99
Key words: beta distribution, bootstrap confidence intervals, exponential distribution, gamma distribution, normal distribution, stress-strength, uniform distribution.
In the area of stress-strength models there has been a large
amount of work on the estimation of the probability R = Pr(X
< Y) when X and Y are independent random variables belonging to
the same univariate family of distributions. In this paper we
propose an estimate of this quantity based on a simple property of
the uniform distribution. We illustrate the use of the estimate with
bootstrap confidence intervals for four commonly known distributions
(normal, exponential, gamma and beta).
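For comparison, the sketch below shows the standard nonparametric (Mann-Whitney type) estimate of R = Pr(X < Y), the proportion of pairs with $x_i < y_j$, together with a percentile bootstrap interval. This is explicitly not the authors' uniform-distribution-based estimate, which the abstract does not specify; it only illustrates how a bootstrap confidence interval for R can be built. Function names and data are hypothetical.

```python
import random

def estimate_R(x, y):
    """Proportion of pairs (x_i, y_j) with x_i < y_j, ties counted as 1/2."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5
    return total / (len(x) * len(y))

def bootstrap_percentile_ci(x, y, level=0.95, reps=2000, seed=0):
    """Percentile bootstrap interval: resample x and y independently with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        estimate_R([rng.choice(x) for _ in x], [rng.choice(y) for _ in y])
        for _ in range(reps)
    )
    lower = stats[int((1 - level) / 2 * reps)]
    upper = stats[int((1 + level) / 2 * reps) - 1]
    return lower, upper

# Hypothetical stress (x) and strength (y) samples.
x = [0.2, 0.5, 0.9, 1.3, 0.7]
y = [1.1, 1.6, 0.8, 2.0, 1.4]
print(estimate_R(x, y), bootstrap_percentile_ci(x, y))
```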
A Stochastic Approach for Finding of Semantically Related Words
Veska Noncheva bojkova@math.bas.bg
Pablo Gamallo
Alexandre Agustini
2000 AMS Subject Classification: 62P99, 68T50
Key words: syntactic context, semantic preferences, χ² goodness-of-fit test.
Semantically related words are modelled as words having the same
probability distribution on the set of syntactic contexts occurring in text
corpora. A learning algorithm for finding clusters of
semantically related words is developed. In that algorithm the χ² statistic is used as a performance
measure.
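The idea that related words share one distribution over syntactic contexts can be checked with a Pearson χ² homogeneity statistic on a 2 × (number of contexts) table of counts. How the statistic enters the paper's learning algorithm is not specified in the abstract, so this is only an assumed building block; the context labels are hypothetical.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic testing whether two words share the same
    distribution over syntactic contexts; returns (statistic, degrees of freedom)."""
    contexts = {c for c in set(counts_a) | set(counts_b)
                if counts_a.get(c, 0) + counts_b.get(c, 0) > 0}
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    grand = total_a + total_b
    stat = 0.0
    for c in contexts:
        column = counts_a.get(c, 0) + counts_b.get(c, 0)
        for observed, row_total in ((counts_a.get(c, 0), total_a), (counts_b.get(c, 0), total_b)):
            expected = row_total * column / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(contexts) - 1

# Hypothetical context counts for two words.
print(chi_square_homogeneity({"obj_of:read": 2, "mod:long": 4}, {"obj_of:read": 1, "mod:long": 2}))
```

A statistic that is small relative to the χ² critical value for the returned degrees of freedom is consistent with the two words sharing one context distribution.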
On the Equality of Sharp and Germ σ-fields for Gaussian Processes and Fields
Loren D. Pitt
Raina S. Robeva
2000 AMS Subject Classification: 60G15, 60G60
Key words: Gaussian processes, Gaussian fields, germ fields, sharp Markov property, spectral synthesis.
There is no universally accepted definition of the Markov property for random fields in the literature, and various
definitions are in use. Specifically, if $F \stackrel{\mathrm{def}}{=} \{f(x) : x \in \mathbb{R}^n\}$ is a random field and $G \subset \mathbb{R}^n$,
define the sharp σ-field of $F$ as $\mathcal{F}(F,G) \stackrel{\mathrm{def}}{=} \sigma\{f(x) : x \in G\}$ and the germ
σ-field of $F$ as $\overline{\mathcal{F}}(F,G) \stackrel{\mathrm{def}}{=} \bigcap_{\varepsilon > 0} \mathcal{F}(F,G_\varepsilon)$, where $G_\varepsilon$ is the
uniform neighborhood $\{x : \operatorname{dist}(x,G) < \varepsilon\}$ of $G$.
When $G$ is a closed set separating $\mathbb{R}^n$ into two complementary open sets $D_+$ and $D_-$, the following definitions were introduced by Pitt in [1]: $F$ is said to satisfy the germ field Markov property at $G$ if the σ-fields $\overline{\mathcal{F}}(F,D_+)$ and $\overline{\mathcal{F}}(F,D_-)$ are conditionally independent given $\overline{\mathcal{F}}(F,G)$. If the random field $F$ satisfies the more restrictive condition that $\overline{\mathcal{F}}(F,D_+)$ and $\overline{\mathcal{F}}(F,D_-)$ are conditionally independent given $\mathcal{F}(F,G)$, then $F$ is said to satisfy the sharp Markov property at $G$.
In [2], on the other hand, Dalang and Walsh use slightly different definitions: $F$ is said to satisfy the germ field Markov property at $G$ if the σ-fields $\mathcal{F}(F,D_+)$ and $\mathcal{F}(F,D_-)$ are conditionally independent given $\overline{\mathcal{F}}(F,G)$, and to satisfy the sharp Markov property at $G$ when $\mathcal{F}(F,D_+)$ and $\mathcal{F}(F,D_-)$ are conditionally independent given $\mathcal{F}(F,G)$.
Thus, to determine when the definitions in [1] and [2] coincide, and to find conditions that imply the equivalence of the germ field
and the sharp Markov properties, the following question needs to be
considered: if $S \subset \mathbb{R}^n$, what conditions on the set $S$ imply that $\mathcal{F}(F,S) = \overline{\mathcal{F}}(F,S)$?
For Gaussian random fields $F$, we present a general answer to this question in the form of a necessary and sufficient
condition for spectral synthesis in the reproducing kernel Hilbert space associated with $F$. More detailed
conditions are derived for Gaussian fields that arise as solutions of certain pseudo-differential equations.
Application of Two-phase Regression to Geotechnical Data
Eugenia Stoimenova jeni@math.bas.bg
Maria Datcheva
Tom Schanz tom.schanz@bauing.uni-weimar.de
2000 AMS Subject Classification: 62F10, 62J05, 62P30
Key words: two-phase regression, transition point, air entry value.
A method for estimating the transition parameter in two-phase
regression is described. The two phases are fitted and,
simultaneously, the transition point is estimated. A practical
application of the method is demonstrated on data used to
determine soil hydraulic properties.
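A simple way to fit two phases and a transition point together is an exhaustive search over split positions: fit a least-squares line to each side, keep the split with the smallest total residual sum of squares, and take the intersection of the two lines as the transition point. This is a minimal sketch of that generic approach (unconstrained lines, at least two points per phase); the paper's estimation method may differ, and the data below are hypothetical, shaped like a flat phase followed by a linear decrease.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def two_phase_fit(xs, ys, min_pts=2):
    """xs sorted ascending. Returns (transition point, (a1, b1), (a2, b2))."""
    best = None
    for k in range(min_pts, len(xs) - min_pts + 1):
        a1, b1, s1 = fit_line(xs[:k], ys[:k])
        a2, b2, s2 = fit_line(xs[k:], ys[k:])
        if best is None or s1 + s2 < best[0]:
            best = (s1 + s2, k, (a1, b1), (a2, b2))
    _, k, line1, line2 = best
    if line1[1] != line2[1]:
        x0 = (line2[0] - line1[0]) / (line1[1] - line2[1])  # intersection of the two lines
    else:
        x0 = (xs[k - 1] + xs[k]) / 2  # parallel lines: midpoint between the phases
    return x0, line1, line2

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1, 1, 1, 1, -1, -3, -5, -7]
print(two_phase_fit(xs, ys))
```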
Simulation and Robust Modifications of Estimates in Branching Processes
Vessela Stoimenova stoimenova@fmi.uni-sofia.bg
D. Atanasov
Nickolay Yanev yanev@math.bas.bg
This study is focused on the comparison and modification of
different estimates arising in branching processes. Models with
and without migration are simulated. Because of the
complexity of the computations, the algorithms are implemented in
MATLAB, the language of technical computing. Using the simulations,
estimates of the offspring mean of the generated processes are
calculated. It is well known in the literature that, under certain
conditions, the estimates are asymptotically normal. Using this
asymptotic normality, a modified maximum likelihood method is proposed.
The aim is to obtain trimmed maximum likelihood estimates based on
several sample paths with the same number of generations. In this way,
observations inconsistent with the a priori information about
asymptotic normality are naturally excluded from the model. Computing
the standard error allows different types of estimates to be
compared.
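If the per-path estimates are treated as approximately normal with a common variance (an assumption made for this sketch; the paper may use path-specific variances), the trimmed maximum likelihood estimate of their mean keeps the k estimates with the smallest within-subset sum of squares. For this normal location model the optimal subset is always a block of k consecutive order statistics, so scanning the sorted values suffices. The function name and the sample values are hypothetical.

```python
def trimmed_normal_mean(estimates, k):
    """Trimmed ML estimate of a common normal mean, keeping the k most consistent values.

    Returns (estimate, retained values). Only blocks of k consecutive order statistics
    need to be checked, because the optimal subset for a normal location model is one.
    """
    if not 1 <= k <= len(estimates):
        raise ValueError("k must be between 1 and the number of estimates")
    xs = sorted(estimates)
    best = None
    for start in range(len(xs) - k + 1):
        block = xs[start:start + k]
        mean = sum(block) / k
        sse = sum((v - mean) ** 2 for v in block)
        if best is None or sse < best[0]:
            best = (sse, mean, block)
    return best[1], best[2]

# Hypothetical offspring-mean estimates from five simulated paths; one path is atypical.
print(trimmed_normal_mean([1.0, 1.1, 0.9, 1.05, 5.0], 4))
```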
Multiple Dipole Source Models for Scalp-Recorded Event-Related Potentials: Example from Complex Visual Processing in the Human Brain
I. M. Tarkka
2000 AMS Subject Classification: 62P10, 92C20
Key words: electroencephalography, source modelling, visual recognition.
Electrical activity of the human brain can be recorded on the
scalp. One of the advantages of electrical recordings is their
high temporal resolution, by which, e.g., cognitive processes can be
followed millisecond by millisecond. It is not uncommon to
record simultaneously from 128 electrode sites at a high sampling rate,
and thus advanced mathematical and statistical methods are needed to
process the obtained data adequately. Here an example of the
analysis of data recorded during a complex visual processing task is
presented. Using advanced methods, large amounts of data can be
reduced and new aspects of the function of the human brain can
be investigated.
Classification of Chenopodium Genus Populations and Species Based on Continuous and Categorical Variables
Yanka Tsvetanova
Neli Grozeva
2000 AMS Subject Classification: 62P10, 62H30
Key words: distance between populations based on continuous and discrete variables, genus Chenopodium, cluster analysis.
The estimation of statistical distance between populations arises
in many multivariate analysis techniques. Whereas distance measures
for continuous data are well developed, those for mixed discrete and
continuous data are less so, because of the lack of a standard model
for such data. Such mixtures of variables arise frequently in
medicine, biometry, psychology and econometrics, and only
comparatively few models have been developed for evaluating distances
between populations. The aim of the present paper is to apply
methods for analysing the dissimilarity between 44 populations of 13
species of the genus Chenopodium, described by 15 variables
(10 continuous and 5 categorical). The distance measures previously
developed by other authors for populations described by mixed
attributes turned out to be inappropriate for the available data on
the genus Chenopodium. The matrices of distances between populations
and between species were used as input for hierarchical cluster
analysis to explore the taxonomic structure of the genus Chenopodium.
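Hierarchical cluster analysis can start directly from a precomputed distance matrix such as the population or species matrices described above. The sketch below uses average linkage, which is an assumption because the abstract does not name the linkage; the labels and distances are hypothetical.

```python
def average_linkage(D, labels):
    """Agglomerative clustering with average linkage on a symmetric distance matrix D.

    Returns the merge history as a list of (cluster label, cluster label, distance).
    """
    clusters = {i: [i] for i in range(len(D))}
    names = {i: labels[i] for i in range(len(D))}
    history = []
    next_id = len(D)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # average distance between all members of cluster a and cluster b
                d = sum(D[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((names[a], names[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        names[next_id] = "(" + names[a] + "," + names[b] + ")"
        next_id += 1
    return history

# Hypothetical distances between three populations.
D = [[0, 1, 4],
     [1, 0, 5],
     [4, 5, 0]]
print(average_linkage(D, ["A", "B", "C"]))
```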
Interactive Discriminant Analysis in MATLAB
Dimitar Vandev
Key words: stepwise discriminant analysis, linear, quadratic, MATLAB.
The program ldagui.m is
an interactive tool for linear and quadratic discriminant analysis.
It was developed because conventional statistical programs fall short
in the following aspects: treatment of missing data, interaction with
the user, and testing the quality of the obtained models.
Comparing Several Methods of Discriminant Analysis on the Case of Wine Data
Dimitar Vandev vandev@fmi.uni-sofia.bg
Ute Romisch
2000 AMS Subject Classification: 62H30, 62J20, 62P12, 68T99
Key words: application, linear and quadratic discriminant analysis, SVM.
The main problem of this European wine project (WINE-DB) is the
identification of the geographical origin of wines based on
chemico-analytical measurements. First, the type of data collected in
preparation for this project is analysed.
Then different procedures of discriminant analysis are
described. Special attention is paid to some newer techniques, such as
Support Vector Machines (also known as Kernel Machines), procedures
from the field of Machine Learning.
We test the traditional techniques of linear, quadratic and nonparametric
discriminant analysis as well as Support Vector Machines on
our data and comment on the results.
Quality Improvement through Experiments with Mixtures
I. N. Vuchkov
Component amounts in a mixture formulation are often set with
errors in mass production. This causes variation in the product's
performance characteristic and lowers the quality of such products.
A quality improvement problem arises: to minimize this variation
while keeping the performance characteristic at a prespecified target.
The component proportions in mixture models are ratios of
component amounts. Therefore they are nonlinear functions of the
errors in the component amounts. This makes the problem of error
transmission in mixture experiments quite specific, and the known
models of error transmission cannot be applied directly.
In this paper we propose explicit models of the mean and variance of
a mixture performance characteristic for the case when the component
amounts are set with errors in the production process. They are
based on a Taylor expansion, up to third-order terms, of a mixture model
expressed through the component amounts. Properties of these models
are studied.
An example from the rubber industry is given.
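The nonlinearity described above arises because each proportion $x_i = a_i / \sum_j a_j$ depends on every amount. The sketch below estimates the transmitted mean and variance by Monte Carlo simulation rather than by the paper's third-order Taylor models. It assumes a Scheffé quadratic mixture model and independent normal errors in the amounts; the formulation, error sizes and model coefficients are all hypothetical.

```python
import random

def proportions(amounts):
    """Mixture proportions x_i = a_i / sum_j a_j (nonlinear in the amounts)."""
    total = sum(amounts)
    return [a / total for a in amounts]

def scheffe_quadratic(x, b_lin, b_quad):
    """Scheffé quadratic mixture model: sum_i b_i x_i + sum_{i<j} b_ij x_i x_j."""
    y = sum(b * xi for b, xi in zip(b_lin, x))
    for (i, j), bij in b_quad.items():
        y += bij * x[i] * x[j]
    return y

def transmitted_mean_var(amounts, sigmas, b_lin, b_quad, reps=20000, seed=0):
    """Monte Carlo mean and variance of y when each amount a_i is set with an
    independent N(0, sigma_i^2) error (a hypothetical error model)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(reps):
        noisy = [a + rng.gauss(0.0, s) for a, s in zip(amounts, sigmas)]
        ys.append(scheffe_quadratic(proportions(noisy), b_lin, b_quad))
    mean = sum(ys) / reps
    var = sum((y - mean) ** 2 for y in ys) / (reps - 1)
    return mean, var

# Hypothetical three-component formulation (amounts in kg) and model coefficients.
amounts = [50.0, 30.0, 20.0]
sigmas = [1.0, 0.5, 0.5]
b_lin = [10.0, 6.0, 4.0]
b_quad = {(0, 1): 3.0, (1, 2): -2.0}
print(transmitted_mean_var(amounts, sigmas, b_lin, b_quad))
```

Comparing such simulated moments with an analytical approximation is one way to check how many Taylor terms are needed for a given error level.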