
Features of STATISTICA Base
STATISTICA Base (a stand-alone product) offers a comprehensive set of essential statistics in a user-friendly package, with all the performance, power, and ease of use of the STATISTICA technology.
STATISTICA Base is compatible with Windows 95, Windows 98, Windows NT, Windows 2000, Windows XP, and Windows Me. It features the following modules:
All STATISTICA Graphics Tools
Descriptive statistics, breakdowns, and exploratory data analysis
Correlations
Basic Statistics from Results Spreadsheets (Tables)
Interactive Probability Calculator
T-Tests (and other tests of group differences)
Frequency Tables, Crosstabulation Tables, Stub-and-Banner Tables, Multiple Response Analysis
Multiple Regression Methods
Nonparametric Statistics
ANOVA/MANOVA
Distribution Fitting
BASIC STATISTICS FROM RESULTS SPREADSHEETS (TABLES). STATISTICA is a single integrated analysis system that presents
all numerical results in spreadsheet tables that are suitable (without any further modification) for input
into subsequent analyses. Thus,
basic statistics (or any other statistical analysis) can be computed for results tables from previous analyses; for example, you
could very quickly compute a table of means for 2000 variables, and next use this table as an input data file to further analyze the
distribution of those means across the variables. Basic statistics are therefore available at any time during your analyses and can be
applied to any results spreadsheet.
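The idea of feeding a results table back in as input data can be sketched in a few lines of Python. This is only an illustration using the standard `statistics` module, not STATISTICA code; the variable names and the tiny data set are hypothetical.

```python
from statistics import mean, median, stdev

# Hypothetical raw data: each key is a variable, each list holds its cases.
data = {
    "var1": [2.0, 4.0, 6.0],
    "var2": [1.0, 3.0, 5.0],
    "var3": [10.0, 20.0, 30.0],
    "var4": [4.0, 4.0, 4.0],
}

# Step 1: a "results table" of means, one entry per variable.
means_table = {name: mean(values) for name, values in data.items()}

# Step 2: treat that table as a new input data set and describe
# how the means are distributed across the variables.
means = list(means_table.values())
summary = {
    "n": len(means),
    "mean_of_means": mean(means),
    "median_of_means": median(means),
    "sd_of_means": stdev(means),
}

print(means_table)
print(summary)
```

The same two-step pattern applies to any other statistic in place of the mean.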
Block Statistics. In addition to the detailed descriptive statistics that can be computed for every spreadsheet, you can also highlight blocks of numbers in any spreadsheet, and produce basic descriptive statistics or graphs for the respective subset of numbers only. For example, suppose you computed a results spreadsheet with measures of central tendency for 2000 variables (e.g., with Means, Modes, and Medians, Geometric Means, and Harmonic Means); you could highlight a block of, for example, 200 variables and the Means and Medians, and then in a single operation produce a multiple line graph of those two measures across the subset of 200 variables. Statistical analysis by blocks can be performed by row or by column; for example, you could also compute a multiple line graph for a subset of variables across the different measures of central tendency. To summarize, the block statistics facilities allow you to produce statistics and statistical graphs from values in arbitrarily selected (highlighted) blocks of values in the current data spreadsheet or output Spreadsheet.
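As a rough sketch of block statistics, the following Python snippet selects a rectangular block from a small results table and summarizes it by column and by row. The table layout, the `select_block` helper, and all values are assumptions made for illustration; they do not reflect how STATISTICA stores spreadsheets.

```python
from statistics import mean

# Hypothetical results spreadsheet: rows are variables, columns are measures.
columns = ["Mean", "Median", "Mode"]
rows = {
    "var1": [5.0, 4.0, 3.0],
    "var2": [7.0, 8.0, 9.0],
    "var3": [6.0, 6.0, 6.0],
    "var4": [1.0, 2.0, 2.0],
}

def select_block(table, row_names, col_names):
    """Return the highlighted block as a list of rows (lists of numbers)."""
    idx = [columns.index(c) for c in col_names]
    return [[table[r][i] for i in idx] for r in row_names]

# Highlight three variables and two measures.
block = select_block(rows, ["var1", "var2", "var3"], ["Mean", "Median"])

# Statistics by column: one value per selected measure.
by_column = [mean(col) for col in zip(*block)]
# Statistics by row: one value per selected variable.
by_row = [mean(row) for row in block]

print(by_column)
print(by_row)
```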
INTERACTIVE PROBABILITY CALCULATOR. A flexible, interactive Probability Calculator is accessible from all toolbars. It
features a wide selection of distributions (including Beta, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Laplace,
Lognormal, Logistic, Pareto, Rayleigh, t (Student), Weibull, and Z (Normal)); interactively (in-place) updated graphs built into
the dialog (a plot of the density and distribution functions) allow the user to visually explore distributions taking advantage of
the flexible STATISTICA Smart MicroScrolls which allow the user to advance either the last significant digit (press the
LEFT-mouse-button) or next to the last significant digit (press the RIGHT-mouse-button). Facilities are provided for generating
customizable, compound graphs of distributions with requested cutoff areas. Thus, this calculator allows you to interactively
explore the distributions (e.g., the respective probabilities depending on shape parameters).
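The kind of lookup a probability calculator performs can be approximated in Python with the standard library's `statistics.NormalDist`. The sketch below covers only the Z (Normal) distribution and is not related to the STATISTICA implementation; the cutoff 1.96 is just an example value.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # the Z (standard normal) distribution

# Cumulative probability below a cutoff value.
p_below = z.cdf(1.96)

# Inverse lookup: the cutoff value for a requested cumulative probability.
cutoff = z.inv_cdf(0.975)

# Two-sided tail area outside -1.96 .. +1.96.
two_tailed = 2.0 * (1.0 - z.cdf(1.96))

# Height of the density function at the mean.
density_at_mean = z.pdf(0.0)

print(p_below, cutoff, two_tailed, density_at_mean)
```

Changing `mu` and `sigma` shows how the probabilities depend on the parameters of the distribution.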
t-TESTS AND OTHER TESTS OF GROUP DIFFERENCES. T-tests for dependent and independent samples, as well as single
samples (testing means against user-specified constants), can be computed; multivariate Hotelling's T² tests are also available
(see also ANOVA/MANOVA, and GLM (General Linear Models) offered in STATISTICA Advanced Linear/Non-Linear Models). Flexible options are provided to allow comparisons between variables (e.g.,
treating the data in each column of the input spreadsheet as a separate sample) and coded groups (e.g., if the data includes a
categorical variable such as Gender to identify group membership for each case). As with all procedures, extensive diagnostics and
graphics options are available from the results menus. For example, for the t-test for independent samples, options are provided to
compute t-tests with separate variance estimates, Levene and Brown-Forsythe tests for homogeneity of variance, various
box-and-whisker plots, categorized histograms and probability plots, categorized scatterplots, etc. Other (more specialized) tests
of group differences are part of many modules (e.g., Nonparametrics (below), Survival Analysis (available in STATISTICA Advanced Linear/Non-Linear Models), and Reliability/Item Analysis (available in STATISTICA Multivariate Exploratory Techniques)).
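For readers unfamiliar with the t-test with separate variance estimates, here is a minimal Python sketch of the statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula). The function name `welch_t` and the two samples are hypothetical. A p-value would additionally require the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """t statistic and approximate df using separate variance estimates."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1)
    )
    return t, df

group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_value, df_value = welch_t(group_a, group_b)
print(t_value, df_value)
```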
FREQUENCY TABLES, CROSSTABULATION TABLES, STUB-AND-BANNER TABLES, AND MULTIPLE RESPONSE ANALYSIS. Extensive facilities are provided to tabulate continuous, categorical, and multiple response variables, or multiple dichotomies. A
wide variety of options are offered to control the layout and format of the tables. For example, for tables involving multiple response variables or multiple dichotomies, marginal counts and percentages can be based on the total number of respondents or responses, multiple response variables can be processed in pairs, and various options are
available for counting (or ignoring) missing data. Frequency tables can also be computed based on user-defined logical selection
conditions (of any complexity, referencing any relationships between variables in the dataset) that assign cases to categories in
the table. All tables can be extensively customized to produce final (publication-quality) reports. For example, unique "multi-way
summary" tables can be produced with breakdown-style, hierarchical arrangements of factors, crosstabulation tables may report row,
column, and total percentages in each cell, long value labels can be used to describe the categories in the table, frequencies
greater than a user-defined cutoff can be highlighted in the table, etc. The program can display cumulative and relative
frequencies, Logit- and Probit-transformed frequencies, normal expected frequencies (and the Kolmogorov-Smirnov, Lilliefors, and
Shapiro-Wilk tests), expected and residual frequencies in crosstabulations, etc. Available statistical tests for crosstabulation
tables include the Pearson, Maximum-Likelihood, and Yates-corrected Chi-squares; McNemar's Chi-square; the Fisher exact test
(one- and two-tailed); Phi; and the tetrachoric r. Additional available statistics include Kendall's tau (a, b), Gamma,
Spearman r, Somers' D, uncertainty coefficients, etc.
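To make the expected frequencies and the Pearson Chi-square concrete, the following Python sketch computes both for a small crosstabulation table, plus Phi for the 2 x 2 case. The function name and the observed counts are hypothetical, and none of the corrections mentioned above (e.g., Yates) are applied.

```python
from math import sqrt

def pearson_chi_square(table):
    """Return (chi-square, df, expected frequencies) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

observed = [[10, 20], [20, 10]]  # hypothetical 2 x 2 counts
chi2, df, expected = pearson_chi_square(observed)

# Phi for a 2 x 2 table: square root of chi-square over the total count.
phi = sqrt(chi2 / sum(map(sum, observed)))
print(chi2, df, phi)
```

Residual frequencies are simply the observed counts minus the values in `expected`.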
Graphs. Graphical options include simple, categorized (multiple), and 3D histograms, cross-section histograms (for any "slices" of the one-, two-, or multi-way tables), and many other graphs including a unique "interaction plot of frequencies" that summarizes the frequencies for complex crosstabulation tables (similar to plots of means in ANOVA). Cascades of even complex (e.g., multiple categorized, or interaction) graphs can be interactively reviewed. See also the section on Block Statistics, above, and sections on Log-linear Analysis (available in STATISTICA Advanced Linear/Non-Linear Models) and Correspondence Analysis (available in STATISTICA Multivariate Exploratory Techniques).
MULTIPLE REGRESSION METHODS. The Multiple Regression module is a comprehensive implementation of linear regression
techniques, including simple, multiple, stepwise (forward, backward, or in blocks), hierarchical, nonlinear (including polynomial,
exponential, log, etc.), Ridge regression, with or without intercept (regression through the origin), and weighted least squares
models; additional advanced methods are provided in the General Regression Models (GRM) module (e.g., best subset regression,
multivariate stepwise regression
for multiple dependent variables, for models that may include categorical factor effects; statistical summaries for validation and
prediction samples, custom hypotheses, etc.). The Multiple Regression module will calculate a comprehensive set of statistics and
extended diagnostics including the complete regression table (with standard errors for B, Beta and intercept, R-square and adjusted
R-square for intercept and non-intercept models, and ANOVA table for the regression), part and partial correlation matrices,
correlations and covariances for regression weights, the sweep matrix (matrix inverse), the Durbin-Watson d statistic, Mahalanobis
and Cook's distances, deleted residuals, confidence intervals for predicted values, and many others.
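Several of the quantities listed above can be illustrated with a deliberately simplified case: a least-squares regression with one predictor and an intercept. The Python sketch below computes the slope, intercept, R-square, residuals, and the Durbin-Watson d statistic. It is an assumption-laden teaching example with made-up data, not the Multiple Regression module's algorithm, and it does not cover multiple predictors.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_square = 1.0 - ss_res / ss_tot
    return slope, intercept, r_square, residuals

def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    return num / sum(e ** 2 for e in residuals)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept, r_square, residuals = simple_ols(x, y)
dw = durbin_watson(residuals)
print(slope, intercept, r_square, dw)
```

Values of d near 2 suggest little first-order serial correlation in the residuals.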
Predicted and residual values. The extensive residual and outlier analysis features a large selection of plots, including a variety of scatterplots, histograms, normal and half-normal probability plots, detrended plots, partial correlation plots, different casewise residual and outlier plots and diagrams, and others. The scores for individual cases can be visualized via exploratory icon plots and other multidimensional graphs integrated directly with the results Spreadsheets. Residual and predicted scores can be appended to the current data file. A forecasting routine allows the user to perform what-if analyses, and to interactively compute predicted scores based on user-defined values of predictors.
By-group analysis; related procedures. Extremely large regression designs can be analyzed. An option is also included to perform multiple regression analyses broken down by one or more categorical variables (multiple regression analysis by group); additional add-on procedures include a regression engine that supports models with thousands of variables, a Two-stage Least Squares regression, as well as Box-Cox and Box-Tidwell transformations with graphs. An add-on package, STATISTICA Advanced Linear/Non-Linear Models, also includes general nonlinear estimation modules (Nonlinear Estimation, Generalized Linear Models (GLZ), Generalized Additive Models (GAM), Partial Least Squares models (PLS)) that can estimate practically any user-defined nonlinear model, including Logit, Probit, and others. The add-on also includes SEPATH, the general Structural Equation Modeling and Path Analysis module, which allows the user to analyze extremely large correlation, covariance, and moment matrices (for intercept models).
NONPARAMETRIC STATISTICS. The Nonparametric Statistics module features a comprehensive selection of inferential and
descriptive statistics including all common tests and some special application procedures. Available statistical procedures include
the Wald-Wolfowitz runs test, Mann-Whitney
U test (with exact probabilities [instead of the Z approximations] for small samples), Kolmogorov-Smirnov tests, Wilcoxon matched
pairs test, Kruskal-Wallis ANOVA by ranks, Median test, Sign test, Friedman ANOVA by ranks, Cochran Q test, McNemar test, Kendall
coefficient of concordance, Kendall tau (b, c), Spearman rank order R, Fisher's exact test, Chi-square tests, V-square statistic,
Phi, Gamma, Somers' d, contingency coefficients, and others. (Specialized nonparametric tests and statistics are also part of many
add-on modules, e.g., Survival Analysis, STATISTICA Process Analysis, and others.)
All (rank order) tests can handle tied ranks and apply corrections for small n or tied ranks. The program can handle extremely large
analysis designs. As in all other modules of STATISTICA, all tests are integrated with graphs (that include various
scatterplots, specialized box-and-whisker plots, line plots, histograms and many other 2D and 3D displays).
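As an illustration of how a rank order test handles tied ranks, here is a small Python sketch of the Mann-Whitney U statistic in which tied values receive the average of the ranks they span. The helper names and sample values are hypothetical, and no exact probabilities or Z approximations are computed.

```python
def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U, U_a, U_b), where U is the smaller of the two U values."""
    ranks = midranks(list(sample_a) + list(sample_b))
    na, nb = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:na])
    u_a = rank_sum_a - na * (na + 1) / 2.0
    u_b = na * nb - u_a
    return min(u_a, u_b), u_a, u_b

group_a = [1, 2, 3, 4]
group_b = [3, 5, 6, 7]  # the value 3 is tied across the two groups
u, u_a, u_b = mann_whitney_u(group_a, group_b)
print(u, u_a, u_b)
```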
ANOVA/MANOVA. The ANOVA/MANOVA module includes a subset of the functionality of the General Linear Models module (part of the Advanced Linear/Non-Linear Models add-on), and can perform univariate and multivariate analysis of variance of factorial designs with or without one repeated measures variable. For more complicated linear models with categorical and continuous predictor variables, random effects, and multiple repeated measures factors you need the General Linear Models module (stepwise and best-subset options are available in the General Regression Models module). In the ANOVA/MANOVA module, you can specify all designs in the most straightforward, functional terms of actual variables and levels (not in
technical terms, e.g., by specifying matrices of dummy codes), and even less-experienced ANOVA users can analyze very complex
designs with STATISTICA. Like the General Linear Models module, ANOVA/MANOVA provides three alternative user interfaces for
specifying designs: (1) A Design Wizard, that will take you step-by-step through the process of specifying a design, (2) a simple
dialog-based user-interface that will allow you to specify designs by selecting variables, codes, levels, and any design options
from well-organized dialogs, and (3) a Syntax Editor for specifying designs and design options using keywords and a common design
syntax.
Computational methods. The program will use, by default, the sigma-restricted parameterization for factorial designs, and
apply the effective hypothesis approach (see Hocking, 19810) when the design is unbalanced or incomplete. Type I, II, III, and IV
hypotheses can also be computed, as can Type V and Type VI hypotheses that will perform tests consistent with the typical analyses
of fractional factorial designs in industrial and quality-improvement applications (see also the description of the Experimental
Design module).
Results statistics. The ANOVA/MANOVA module is not limited in any of its computational routines for reporting
results, so the full suite of detailed analytic tools available in the General Linear Models module is also available here (please
see the detailed description of the General Linear Models module for details); results include summary ANOVA tables, univariate
and multivariate results for repeated measures factors with more than 2 levels, the Greenhouse-Geisser and Huynh-Feldt adjustments,
plots of interactions, detailed descriptive statistics, detailed residual statistics, planned and post-hoc comparisons, testing of
custom hypotheses and custom error terms, detailed diagnostic statistics and plots (e.g., histogram of within-cell residuals,
homogeneity of variance tests, plots of means versus standard deviations, etc.).
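For orientation, the simplest entry in a summary ANOVA table, the F ratio for a single between-groups factor, can be computed as in the Python sketch below. It covers only a one-way design with made-up data and says nothing about how the module handles factorial, repeated measures, or unbalanced designs.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within, ss_between, ss_within)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: cases around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within, ss_between, ss_within

groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_ratio, df_between, df_within, ss_between, ss_within = one_way_anova(groups)
print(f_ratio, df_between, df_within)
```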
DISTRIBUTION FITTING. The Distribution Fitting options allow the user to compare the distribution of a variable with a
wide variety of theoretical distributions. You may fit to the data the Normal, Rectangular, Exponential, Gamma, Lognormal,
Chi-square, Weibull, Gompertz, Binomial,
Poisson, Geometric, or Bernoulli distribution.
The fit can be evaluated via the Chi-square test or the Kolmogorov-Smirnov one-sample test (the fitting parameters can be
controlled); the Lilliefors and Shapiro-Wilk tests are also supported (see above). In addition, the fit of a particular
hypothesized distribution to the empirical distribution can be evaluated in customized histograms (standard or cumulative) with
overlaid selected functions; line and bar graphs of expected and observed frequencies, discrepancies and other results can be
produced from the output Spreadsheets.
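The Kolmogorov-Smirnov one-sample statistic mentioned above is the largest gap between the empirical and the hypothesized cumulative distribution. The Python sketch below fits a Normal distribution to a tiny, made-up sample and computes that gap; the function name is hypothetical and no significance level is derived. Note that when the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values no longer apply, which is the situation the Lilliefors test addresses.

```python
from statistics import NormalDist

def ks_statistic(sample, dist):
    """Largest distance between the empirical CDF and dist.cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

sample = [-1.0, 0.0, 1.0]

# Fit the Normal distribution from the sample mean and standard deviation.
fitted = NormalDist.from_samples(sample)
d_value = ks_statistic(sample, fitted)
print(fitted.mean, fitted.stdev, d_value)
```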
Other distribution fitting options are available in STATISTICA Process Analysis, where the user can compute
maximum-likelihood parameter estimates for the Beta, Exponential, Extreme Value (Type I, Gumbel), Gamma, Log-Normal, Rayleigh, and
Weibull distributions.
Also included in that module are options for automatically selecting and fitting the best distribution for the data, as well as
options for general distribution fitting by moments (via Johnson and Pearson curves). User-defined 2- and 3-dimensional functions
can also be plotted and overlaid on the graphs. The functions may reference a wide variety of distributions such as the Beta,
Binomial, Cauchy, Chi-square, Exponential, Extreme value, F, Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal, Pareto,
Poisson, Rayleigh, t (Student), or Weibull distribution, as well as their integrals and inverses. Additional facilities to fit
predefined or user-defined functions of practically unlimited complexity to the data are available in
Nonlinear Estimation (available in STATISTICA Advanced Linear/Non-Linear Models).
![[StatSoft]](../images/sssmall.gif)
2300 East 14th Street, Tulsa, OK 74104
Phone: (918) 749-1119; Fax: (918) 749-2217
e-mail: info@statsoft.com
©Copyright StatSoft, Inc., 1984-2004.
StatSoft, StatSoft logo, STATISTICA, SEWSS, SEDAS, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.