YSM 2018 Abstracts – in no specific order.
Name of presenter: Amenah Al-Najafi
Authors: Amenah Al-Najafi; Lillian Oluoch
Affiliation: PhD student, University of Szeged
Title: Gaussian regression analysis of drilling protocols for dental implantation surgery
Investigations of the drilling protocols used in dental implantation surgery were performed, making use of the assumption that the drilling process passes through two different types of bone structure (dandelion and swine bone), each having a different constant friction parameter. Cylindrical drills were tested for two protocols (A and B), as were two types of conical drills (groups C and D). The drilling process was therefore carried out in a homogeneous material with a constant friction parameter, and the angular speed of drilling was also constant; hence the drilling moment is a linear function of the elapsed time
of drilling. The best-fit pattern of each experiment was determined using mean-square minimization; data smoothing was performed to establish typical patterns, and splines were constructed to obtain interpolation formulas. The stretched spline functions were used to produce average curves representing typical patterns. We verified that the clusters showed insignificant changes even when calculated with smoothed data, as the prior groupings could be reconstructed almost perfectly.
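As an illustration of the mean-square minimization and smoothing steps described above, here is a minimal sketch on synthetic data; the slope, noise level and smoothing window are assumptions of mine, not values from the experiments.

```python
import numpy as np

# Illustrative sketch (not the authors' code): in a homogeneous material the
# drilling moment is modelled as a linear function of elapsed time,
# M(t) = a*t + b, and the best fit is found by mean-square minimization.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)                        # elapsed drilling time
moment = 1.5 * t + 2.0 + rng.normal(0.0, 0.3, t.size)  # noisy measurements

# Least-squares (mean-square minimization) fit of the linear model.
a, b = np.polyfit(t, moment, deg=1)

# Simple moving-average smoothing, a stand-in for the data smoothing step
# used to establish typical patterns before grouping the experiments.
window = 9
smoothed = np.convolve(moment, np.ones(window) / window, mode="same")

print(round(a, 2), round(b, 2))
```

In the actual study spline interpolation rather than a moving average was used to build the average curves; the point of the sketch is only the linear best-fit step.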
Name of presenter: Szabó, Marianna
Authors: Szabó, Marianna
Affiliation: University of Debrecen
Title: Statistical calibration of dual-resolution ensemble forecasts
The more accurate prediction of various weather quantities is the main incentive for continuous research in all major weather prediction centres. Over the last 20 years, these centres have complemented the single-valued point forecasts obtained by numerical weather prediction models with forecast ensembles of different weather quantities, obtained from multiple runs of these models with various initial conditions and model parameterizations. One of these organisations, the European Centre for Medium-Range Weather Forecasts (ECMWF), produces operational ensemble-based analyses and predictions that describe the range of possible scenarios and their likelihood of occurrence. According to its strategic plans until 2025, ECMWF intends to improve the resolution of ensemble forecasts from an 18 km to a 5 km grid, which requires a substantial increase in computational resources. Besides aiming to improve predictive performance, researchers at ECMWF experiment with a mixture of high- and low-resolution ensemble forecasts to determine the optimal combination at a fixed computational cost.
We perform statistical post-processing of ECMWF dual-resolution global ensemble forecasts for temperature by applying the ensemble model output statistics (EMOS; Gneiting et al., 2005) approach. As the high-resolution forecast we consider the 50-member operational TCo639 ensemble of the ECMWF (18 km resolution), and as the low-resolution one the 200-member TCo399 ensemble with a horizontal resolution of 29 km. Tests with local and semi-local (Lerch and Baran, 2017) EMOS post-processing support the existence of a superior combination of high- and low-resolution forecasts; however, statistical calibration reduces the differences in verification scores.
Gneiting, T., Raftery, A. E., Westveld, A. H. and Goldman, T. (2005) Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev. 133, 1098–1118.
Lerch, S., Baran, S. (2017) Similarity-based semi-local estimation of EMOS models. J. R. Stat. Soc. Ser. C Appl. Stat. 66, 29–51.
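A minimal sketch of Gaussian EMOS calibration in the spirit of Gneiting et al. (2005): the predictive distribution is normal, with location and scale affine in the ensemble mean and variance, fitted by minimizing the mean CRPS over a training set. The data, ensemble size and starting values below are synthetic stand-ins, not ECMWF forecasts.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n, m = 500, 50                          # training cases, ensemble members
truth = rng.normal(15.0, 3.0, n)        # "observed" temperatures
# A biased, underdispersive ensemble, a common situation before calibration.
ens = truth[:, None] + 1.0 + rng.normal(0.0, 0.8, (n, m))
ens_mean, ens_var = ens.mean(axis=1), ens.var(axis=1, ddof=1)

def crps_normal(mu, sigma, y):
    # Closed-form CRPS of a normal predictive distribution.
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

def mean_crps(params):
    a, b, c, d = params
    sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-8))
    return crps_normal(a + b * ens_mean, sigma, truth).mean()

res = minimize(mean_crps, x0=[0.0, 1.0, 1.0, 1.0], method="Nelder-Mead")
a, b, c, d = res.x
print(res.fun)  # mean CRPS of the calibrated forecast
```

Local and semi-local EMOS differ only in which training cases enter `mean_crps`; the dual-resolution setting would feed both ensembles' means and variances into the location and scale.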
Name of presenter: Mayer, Balázs
Authors: Mayer, Balázs
Affiliation: Eötvös Loránd University, Budapest
Title: The effect of homophily on opinion dynamics processes in social networks – agent based social simulation
I have studied the effect of homophily on opinion dynamics processes in social networks by agent-based social simulation. My main hypothesis (based on the findings of Gargiulo and Gandica, 2017) was that greater opinion homophily leads to an increased chance of consensus formation. My research may be considered a supplement to, and reflection upon, the aforementioned study and its somewhat counterintuitive results.
Contrary to the original paper, where the opinion variable had a uniform random distribution and similarity between agents was measured only in this one dimension, my own growing network model considers the well-known phenomenon of preferential attachment, the homophily of agents by their demographic attributes (derived from ego-network data about Hungarian society in the 2000s using a case-control framework), and five different (simulated and real-world) opinion distributions, according to which homophily could be tuned. For implementing demographic homophily during network creation, two frameworks were used, considering its effect either contemporary to or preceding the effect of opinion homophily. The former approach yields graphs capturing the scale-free degree distribution of real-world social networks; the latter does not, but is more sociologically driven.
The resulting graphs capture the phenomenon of more similar agents being connected with greater probability, both according to their opinions and their demographic attributes, and networks with increased opinion homophily display greater modularity than simple preferential attachment ones. However, the networks created considering only the effects of similarity of demographic attributes did not show increased modularity. Analysis of the opinion dynamics processes in the networks confirmed the initial hypothesis. It seems that the consensus-stimulating effect of opinion homophily does not depend greatly on the distribution of the opinion variable, nor does introducing demographic homophily change this association. The findings are quite robust: the two network-generating mechanisms yield topologically very different graphs but qualitatively similar opinion dynamics processes.
Gargiulo, F., Gandica, Y. 2017. The Role of Homophily in the Emergence of Opinion Controversies. Journal of Artificial Societies and Social Simulation 20 (3) 8.
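A minimal sketch of how opinion homophily can be built into a growing preferential-attachment network; the attachment rule, the homophily exponent and the similarity measure are my simplifications, not the exact model of the talk.

```python
import numpy as np

def grow_network(n, h, rng):
    """Grow a network: node `new` attaches to an existing node with
    probability proportional to degree times opinion similarity to the
    power h (h = 0 recovers plain preferential attachment)."""
    opinions = rng.uniform(0.0, 1.0, n)
    degree = np.ones(n)                 # +1 smoothing so all nodes are eligible
    edges = [(0, 1)]
    degree[0] += 1; degree[1] += 1
    for new in range(2, n):
        sim = 1.0 - np.abs(opinions[:new] - opinions[new])
        w = degree[:new] * sim ** h
        target = int(rng.choice(new, p=w / w.sum()))
        edges.append((target, new))
        degree[target] += 1; degree[new] += 1
    return opinions, edges

def edge_similarity(opinions, edges):
    # Mean opinion similarity across edges: higher = more homophilous graph.
    return np.mean([1.0 - abs(opinions[i] - opinions[j]) for i, j in edges])

rng = np.random.default_rng(2)
op0, e0 = grow_network(2000, 0.0, rng)   # no opinion homophily
op5, e5 = grow_network(2000, 5.0, rng)   # strong opinion homophily
print(edge_similarity(op0, e0), edge_similarity(op5, e5))
```

Tuning `h` makes edges increasingly connect like-minded agents, the precondition for the modularity and consensus effects reported above.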
Name of presenter: Ðumić, Mateja
Authors: Mateja Ðumić, Dominik Šišejković, Rebeka Čorić, Domagoj Jakobović
Affiliation: Department of Mathematics, J.J. Strossmayer University of Osijek
Title: Evolving priority rules for resource constrained project scheduling problem with genetic programming
The resource constrained project scheduling problem (RCPSP) is a combinatorial optimization problem in which the main task is to allocate limited resources to activities over time periods while optimizing one or several criteria. This problem belongs to the class of NP-hard problems, and exact algorithms are impractical for very large problem instances. Its hardness and everyday use have resulted in numerous solving algorithms. Developing or choosing an appropriate algorithm is not trivial, and recently hyper-heuristic approaches have been used for this task. Hyper-heuristic approaches incorporate machine learning techniques, which can be viewed from the angle of data science, specifically applied statistics.
In this paper we present one such approach. We use genetic programming (GP) as a hyper-heuristic. GP is a machine learning technique, and in this work it is used for evolving priority rules. The evolution of priority rules is an iterative process in which new rules are generated by specific stochastic operators.
The evolved priority rules are tested on standard benchmarks and compared with the best existing priority rules. The results show that this approach is well suited to dynamic environments in which changes are common and a good reaction to them is necessary. The advantage of this approach is the automated development of a suitable scheduling algorithm, which is characteristic of machine learning techniques.
Keywords: Genetic programming, Resource constrained scheduling problem, Hyper-heuristics, Machine learning
- M. Đumić, D. Šišejković, R. Čorić, D. Jakobović, Evolving priority rules for resource constrained project scheduling problem with genetic programming, Future Generation Computer Systems 86 (2018), 211-221
- R. Čorić, M. Đumić, D. Jakobović, Complexity Comparison of Integer Programming and Genetic Algorithms for Resource Constrained Scheduling Problems, 40th International ICT Convention – MIPRO 2017, Opatija, 2017, 1394-1400
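To illustrate how a priority rule drives a schedule, here is a hedged sketch of a serial schedule generation scheme on a tiny single-resource instance; the instance and the two hand-written rules below stand in for rules that GP would evolve.

```python
# Illustrative RCPSP instance (mine, not a benchmark): one renewable
# resource of capacity 4; activity: (duration, resource demand, predecessors).
CAPACITY = 4
HORIZON = 100
acts = {
    1: (3, 2, []), 2: (4, 3, []), 3: (2, 2, [1]),
    4: (5, 1, [1, 2]), 5: (2, 4, [3, 4]),
}

def serial_sgs(priority):
    """Serial schedule generation scheme: repeatedly dispatch the eligible
    activity with the highest priority score, at the earliest time that
    respects precedence and the resource capacity. Returns the makespan."""
    usage = [0] * HORIZON            # resource usage per time step
    start, finish = {}, {}
    while len(start) < len(acts):
        eligible = [a for a in acts if a not in start
                    and all(p in finish for p in acts[a][2])]
        a = max(eligible, key=priority)
        dur, dem, preds = acts[a]
        t = max([finish[p] for p in preds], default=0)
        while any(usage[u] + dem > CAPACITY for u in range(t, t + dur)):
            t += 1                   # shift right until the resource fits
        for u in range(t, t + dur):
            usage[u] += dem
        start[a], finish[a] = t, t + dur
    return max(finish.values())

spt = serial_sgs(lambda a: -acts[a][0])   # shortest processing time first
lpt = serial_sgs(lambda a: acts[a][0])    # longest processing time first
print(spt, lpt)
```

Even on this toy instance the two rules give different makespans, which is exactly the degree of freedom that GP exploits by evolving the priority function.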
Name of presenter: Pavković, Ana
Affiliation: Zagreb, Croatia
Title: Investment Portfolio Structure and Profitability of Croatian Insurance Companies
Insurance companies have always been strictly regulated owing to the nature of their business. As financial intermediaries, they mobilize long-term savings and allocate them conservatively, guided primarily by the safety principle rather than the profitability principle. The main aim of this research is to investigate the structure of the investment portfolios of Croatian insurers and to quantify the link between asset allocation and the profitability of insurance companies, first in the period of strict regulation, and then after the introduction of Solvency II. The econometric research includes three analyses: cluster analysis, panel data analysis and a comparison of pre-Solvency II and Solvency II business results. Cluster analysis is employed to classify insurers according to their investment strategies, and its results help in predicting the changes in asset allocation that financial reregulation is expected to bring. The results of the panel data analysis reveal that investing in riskier categories positively affects business results, while investment in debt securities does not encourage profit growth. The analysis contributes to the existing empirical research on the asset allocation-profitability nexus and is valuable for assessing the impact of Solvency II regulations.
Keywords: asset allocation, cluster analysis, financial regulation, insurance, panel data model
Name of presenter: Džidić, Ante, PhD
Affiliation: Zagreb/ Mostar, Croatia
Title: Impact of Capital Market Development on Dividend Policy
The present study is designed to examine the relationship between the capital market development and the importance of dividend policy using multiple linear regression. The importance of dividend policy is presented by the concept of smoothing dividends, while the capital market development is measured by the size and liquidity of the market, the level of investor protection and the use of traditional sources of financing, bank loans. The use of dividend smoothing concept as a dependent variable is justified if the unwillingness to reduce dividends is proven to be global phenomenon. In this respect, after analyzing the relationship between capital market development and dividend policy, the sensitivity of dividends to earnings changes was examined using correlation analysis, fixed effects regression model and multinomial logistic regression model. Finally, the degree of dividend sensitivity was compared among two models of classification, the first being the distinction between developed and developing countries, while the second refers to the distinction between the legal families. The results of the first part of empirical research show that the importance of dividends increases with the development of the capital market and the level of investor protection, and decreases with the higher dependence on bank loans. The results of the second part of research show that current earnings are significant dividend factor in all sample countries and that the likelihood of increasing or retaining dividends per share is greater than the probability of reducing or cutting dividends irrespective of earnings direction. Finally, when it comes to classifications of developed and developing countries, and distinction between legal families, the results of the research show that the probability of cutting dividends is lower in developed countries, especially in those countries whose laws originate from common law tradition.
Keywords: dividends, regression analysis, multinomial logistic regression, fixed effects model, dividend smoothing
Name of presenter: Cugmas, Marjan
Authors: Marjan Cugmas, Aleš Žiberna, Anuška Ferligoj
Affiliation: Faculty of Social Sciences, University of Ljubljana, Slovenia
Title: The emergence of the global network structure in kindergarten: A simulation approach
The analysis of the interactional networks, collected among pre-school children (Head Start preschools, United States, data collected in 2004—2006) will be presented in order to show that the proposed symmetric core-cohesive blockmodel type can appear in such empirical networks.
A blockmodel is a network where the units are clusters of units from the studied network. The term block refers to a submatrix showing the links between two clusters. The symmetric core-cohesive blockmodel consists of three or more clusters. The units from each cluster are internally well linked while those from different clusters are ideally not linked to each other. The exception are the units from so called “core cluster”. These units have a mutual links to all the units in the network. The other clusters are called “cohesive clusters”.
After the analysis of the empirical networks, the presentation will address the main research question, which is whether the proposed blockmodel type can emerge as a consequence of the selected social mechanisms. The characteristics of the units are not considered. The social network mechanisms that will be considered, i.e., popularity, transitivity, mutuality and assortativity, have been extensively studied in not only interactional networks among pre-school, but also in many other types of networks. The Monte Carlo simulations are used to answer the main research question.
Name of presenter: Manevski, Damjan
Authors: Damjan Manevski
Affiliation: Faculty of Medicine, University of Ljubljana
Title: Confidence intervals for the Mann-Whitney test
The Mann-Whitney test is a commonly used non-parametric alternative of the t-test. Despite its frequent use, it is only rarely accompanied with confidence intervals of an effect size. If reported, the effect size is usually measured with the difference of medians or the shift of the two distribution locations. Neither of these two measures directly coincides with the test statistic of the Mann-Whitney test, so the interpretation of the test results and the confidence intervals may be importantly different.
In this talk, we will focus on the probability that the value of the random variable X is lower than the random variable Y. This measure is often referred to as the degree of overlap or the probabilistic index, its estimator is in one-to-one relationship with the Mann-Whitney test statistic. The measure also equals the area under the ROC curve. Several methods have been proposed for the construction of the confidence interval for this measure. Here, we will review the most promising ones and explain their ideas. We will show some properties of the different variance estimators and the small sample problems of the confidence intervals construction. We will identify scenarios in which the existing approaches yield inadequate coverage probabilities. We will conclude that the DeLong variance estimator is a reliable option regardless of the scenario, but the intervals should be constructed using the logit scale to avoid values above 1 or below 0 and the poor coverage probability that follows. A correction is needed for the case when all values from one group are smaller than the values of the other. We will propose a method that improves the coverage probability also in these cases.
Name of presenter: Zalokar, Ana
Authors: Ana Zalokar, Mihael Perman
Affiliation: University of Primorska, Slovenia
Title: Optimal switching among hedging strategies
Equity-linked insurance policies are one of the most widespread insurance products. In many cases such contracts have guarantees like a minimum return over the lifetime of the policy. Liabilities arising from such guarantees must be hedged by suitable investments. There are restrictions on hedging strategies in many jurisdictions but with the more flexible regulatory framework of Solvency 2 there are alternative ways to hedge certain guaranteed products using derivative securities. In this talk we investigate when it is optimal to switch to hedging liabilities with derivative securities in the framework of the Cox-Ross-Rubinstein model. This leads to optimal stopping problems that can be solved explicitly. Mortality is also incorporated in the model. The results may indicate the level of reserves necessary to meet obligations with the desired level of confidence. In particular the strategy may be applicable in adverse market conditions.
Name of presenter: Hosszejni, Darjus
Authors: Darjus Hosszejni, Gregor Kastner
Affiliation: Institute for Statistics and Mathematics, WU Viena
Title: Efficient Bayesian Inference for the Stochastic Volatility Model with Leverage
The sampling efficiency of MCMC methods in Bayesian inference for stochastic volatility (SV) models is known to highly depend on the actual parameter values, and the effectiveness of samplers based on different parameterizations differs significantly. We extend an existing sampling method for the practically highly relevant SV model with leverage where the return process and innovations of the volatility process are allowed to correlate. We derive a novel sampler for the non-centered parameterization of this model. Moreover, based on the idea of ancillarity-sufficiency interweaving, we combine the resulting samplers in the hope of achieving superior sampling efficiency, irrespectively of the baseline parameterization. The method is implemented using R and C++, with the help of Rcpp for easy interfacing between the two languages.
Name of presenter: Kivaranovic, Danijel
Authors: Danijel Kivaranovic and Hannes Leeb
Affiliation: Department of Statistics and Operations Research, University of Vienna
Title: Expected length of post-model-selection confidence intervals conditional on polyhedral constraints
Valid inference after model selection is currently a very active area of research. The polyhedral method, pioneered by Lee et al. (2016), allows for valid inference after model selection if the model selection event can be described by polyhedral constraints. In that reference, the method is exemplified by constructing two valid confidence intervals when the Lasso estimator is used to select a model. We here study the expected length of these intervals. For one of these confidence intervals, that is easier to compute, we find that its expected length is always infinite. For the other of these confidence intervals, whose computation is more demanding, we give a necessary and sufficient condition for its expected length to be infinite. In simulations, we find that this condition is typically satisfied.
Name of presenter: Posch, Konstantin
Authors: Konstantin Posch
Affiliation: Alpen-Adria-University, Department of Statistics
Title: Hyperspectral Deep Learning for Fruit and Vegetable Recognition and Bayesian Deep Learning to Accurately Determine Model Uncertainty
Theoretical fundament of this presentation are convolutional neural networks as specific deep learning models for image classification. Two related topics will be presented. Reliable automatic classification of fruits and vegetables is of great interest for a wide and ever-increasing range of applications. However, due to the similarity of the classes in both shape and colour the task is considered as difficult and even state of the art deep neural networks are often still not accurate enough. It will be shown how the increased spectral resolution of hyperspectral cameras in comparison to classical RGB cameras can be used to train more accurate models. Deep learning has two major drawbacks. On the one hand, deep neural networks require a huge amount of training data, otherwise they tend to over fit, on the other hand only point estimates are computed. The absence of model uncertainty information through merely computing point estimates makes deep learning of limited use for many fields of application, such as medicine. Both problems are well addressed by using Bayesian statistics. A new method for training deep nets in a Bayesian way will be presented.
Name of presenter: Azzolina, Danila
Authors: Danila Azzolina
Affiliation: Department of Cardiac, Thoracic and Vascular Sciences, University of Padova
Title: A Bayesian sample size estimation procedure based on a B-Spline semiparametric elicitation method
Sample size determination is a prerequisite for a clinical trial and binomial sample size determination is arguably the most common design situation faced by trialists.
A Bayesian approach to sample size definition uses prior information about the binomial parameter rather than a point estimate, and fully accounts for the uncertainty in the predicted data, thus offering an attractive alternative to the frequentist formulae, especially when the recruitment difficulties are known beforehand.
In some circumstances, there may be a little objective evidence available to build a prior and summarizing experts’ opinions may be indispensable (Spiegelhalter, 2004).
Elicitation is the process of formulating a person’s knowledge and beliefs about one or more unknown quantities of interest into a probability distribution for those quantities (Garthwaite et al, 2005).
We investigate the binomial sample size problem using generalized versions of the Average Length and Average Coverage Criteria, as well as the Worst Outcome Criterion. The original approach, proposed by Joseph (Joseph, 1997), is parametric and based on Beta priors.
In this theoretical framework we propose a more flexible approach for binary data leading to consider not only parametric solutions (Beta priors) but also semi-parametric priors based on B-splines, used in the elicitation process of experts’ opinions (Bornkamp, 2009).
Name of presenter: Giudici, Fabiola
Authors: Fabiola Giudici
Affiliation: Biostatistics Unit, Department of Medical, Surgical & Health Sciences, University of Trieste, Italy
Title: eEF1A2 Protein Expression In Triple Negative Breast Cancer: Potential Role As Negative Prognostic Biomarker
Background: Eukaryotic elongation factor 1 alpha 2 (eEF1A2) is a translation factor selectively expressed by heart, skeletal muscle, nervous system and some specialized cells. Its ectopic expression relates with tumour genesis in several types of human cancer. No data are available about the role of eEF1A2 in Triple Negative Breast Cancers (TNBC). This study investigated the relation between eEF1A2 protein levels and the prognosis of TNBC.
Methods: A total of 84 TNBC diagnosed in the period 2002-2011 were included in the study. eEF1A2 protein level was measured by immunohistochemistry (IHC) in a semiquantitative manner -sum of the percentage of positive cells x staining intensity–on a scale from 0 to 300 (H-score). To estimate the interobserver variability regarding the IHC scoring, each sample was evaluated by two blinded pathologists: inter-rater reliability for the H-score was assessed by means of intraclass correlation coefficient (ICC)and Bland Altman analysis. A time-dependent ROC curve analysis was performed for determining the prognostic accuracy of eEF1A2 and used to identify a possible ‘‘optimal’’ immunohistochemical cut-off score. The association between H-score and clinical-pathological factors (age, type of surgery, pathologic tumour stage (pT), pathologic nodal stage (pN), grading (G), Ki67, p53, recurrences and death) were evaluated through Mann-Whitney test or Kruskal-Wallis test, as appropriate. The association of clinical-pathological and molecular factors with the two time-to-event end-points, disease free survival (DFS) and breast cancer specific survival (BCCS) was analyzed separately using Cox proportional hazards regression. Statistically significant variables at 10% level at univariate analysis, were selected as candidate prognostic factors for multivariate Cox analysis.
Results: Median age was 59 years (range, 28–78) and the median follow-up period was 9.05 years (range, 5.55–15.35). From inter-rater reliability analysis, H-score resulted reproducible (ICC=0.72, 95% CI: 0.60-0.80).eEF1A2 was not found prognostic factor using H-score’s cut-offs determined by time-dependent ROC method: since it has not been possible categorized women in two groups respect to an optimal cut-off, we analyzed the prognostic role of eEFA1 using continuous score. Increased values of eEF1A2 were associated with elder age at diagnosis (p=0.003), and androgen receptors positivity (p=0.002). At univariate Cox analysis, eEF1A2 levels did not significantly associate with DFS and BCSS (p=0.11 and p=0.08, respectively). However, adjusting for stage of disease, elevated values of eEF1A2 associated with poor prognosis (HR=1.05, 95% CI: 1.01-1.11, p=0.04 and HR=1.07, 95% CI: 1.01-1.14, p=0.03 for DFS and BCSS, respectively).
Our data suggest a negative prognostic role of high eEF1A2 protein levels in TNBC. Further studies using a larger independent cohort of TNBCs are necessary to confirm the prognostic value of eEF1A2 protein as marker before implementation in clinical practice. Certainly, the measure of protein level is very interesting because immunohistochemistry is the main technique available in most pathology labs to evaluate a marker. However, we do not exclude the evaluation of mRNA expression levels in formalin-fixed, paraffin-embedded tissue could contribute to clarify the potency of eEF1A2 as prognostic biomarker in TNBC and possibly as therapeutic target.
Keywords: Triple negative breast cancer; Eukaryotic elongation factor 1 alpha-2 (eEF1A2); Immunohistochemistry; BCSS; DFS
Name of presenter: Tarekegn, Adane Nega
Authors: Adane Nega Tarekegn, Alemu Kumlachew Tegegne (Faculty of Computing, Bahir Dar University, Ethiopia)
Affiliation: Department of Mathematics, University of Torino, Italy
Title: Combining Data Mining with Intelligent systems: Application on healthcare
Intelligent system is a part of artificial intelligence (AI) techniques that can be applied in healthcare to solve complex medical problems. Case-based reasoning (CBR) and rule based reasoning (RBR) are the two more popular AI techniques which can be used in medical applications. Both techniques deal with medical data and domain knowledge in diagnosing patient conditions. This paper proposes an intelligent system that uses data mining technique as a tool for knowledge acquisition process. Data Mining solves the knowledge acquisition problem of rule based reasoning by supplying extracted knowledge to rule based reasoning system. We use WEKA for model construction and evaluation, Java NetBeans for integrating data mining results with rule based reasoning and Prolog for knowledge representation. To select the best model for disease diagnosis, four experiments were carried out using J48, BFTree, JRIP and PART. The PART classification algorithm is selected as best classification algorithm and the rules generated from the PART classifier are used for the development of knowledge base of intelligent system. In this study, the proposed system measured an accuracy of 87.5% and usability of 89.2%.
Keywords: intelligent systems, Data mining, Rule based reasoning; Case based reasoning, knowledge acquisition.