Oracle Statistical Functions Included in Oracle 11g Database is a compelling array of statistical functions accessible from through SQL. These include descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, and analysis of variance (ANOVA). Oracle continues to expand its support for advanced analytics functionality in the database. The Oracle Advanced Analytics Option, a combination of Oracle Data Mining and Oracle R Enterprise, delivers predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database. It brings powerful computations to the database resulting in dramatic improvements in information discovery, scalability, security, and savings. The statistical functions in the database can be used in a variety of ways, for example, users can call Oracle's DBMS_STAT_FUNCS to obtain basic count, mean, max, min and standard deviation information for their dataset; or users can determine the strength of relationships using hypothesis testing statistics such as a t-test, f-test or ANOVA. Users are able to not only complete a wide range of statistics, but also include these results in more advanced SQL queries and analytical pipelines. Features include: - Descriptive statistics
- Hypothesis testing
- Correlations analysis (parametric and nonparametric)
- Ranking functions
- Cross Tabulations with Chi-square statistics
- Linear regression
- ANOVA
- Test Distribution fit
- Window Aggregate functions
- Statistical Aggregates
- LAG/LEAD functions
- Reporting aggregate functions
STATS_T_TEST_INDEPU Example: The following example determines the significance of the difference between the average sales to men and women where the distributions are known to have significantly different (unpooled) variances: SELECT SUBSTR(cust_income_level, 1, 22) income_level, AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men, AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women, STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed, STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value FROM sh.customers c, sh.sales s WHERE c.cust_id = s.cust_id GROUP BY ROLLUP(cust_income_level) ORDER BY income_level, sold_to_men, sold_to_women, t_observed; INCOME_LEVEL SOLD_TO_MEN SOLD_TO_WOMEN T_OBSERVED TWO_SIDED_P_VALUE ---------------------- ----------- ------------- ---------- ----------------- A: Below 30,000 105.28349 99.4281447 -2.0542592 .039964704 B: 30,000 - 49,999 102.59651 109.829642 2.96922332 .002987742 C: 50,000 - 69,999 105.627588 110.127931 2.3496854 .018792277 D: 70,000 - 89,999 106.630299 110.47287 2.26839281 .023307831 E: 90,000 - 109,999 103.396741 101.610416 -1.2603509 .207545662 F: 110,000 - 129,999 106.76476 105.981312 -.60580011 .544648553 G: 130,000 - 149,999 108.877532 107.31377 -.85219781 .394107755 H: 150,000 - 169,999 110.987258 107.152191 -1.9451486 .051762624 I: 170,000 - 189,999 102.808238 107.43556 2.14966921 .031587875 J: 190,000 - 249,999 108.040564 115.343356 2.54749867 .010854966 K: 250,000 - 299,999 112.377993 108.196097 -1.4115514 .158091676 L: 300,000 and above 120.970235 112.216342 -2.0726194 .038225611 107.121845 113.80441 .689462437 .490595765 106.663769 107.276386 1.07853782 .280794207 14 rows selected. (See link below to SQL Language Reference for STATS_T_TEST_*) Unlike statistical software vendors, which charge separate license fees for these statistical capabilities, Oracle 11g Database includes this wide feature set with the database license. As such, users not only can reduce the license cost to perform equivalent functionality, but can also eliminate the movement and staging of data to external systems to perform statistical analysis. Eliminating data movement and staging increases security since copies of the data need not reside in insecure environment. Eliminating data movement and staging decreases time to results since the overhead associated with moving the data is eliminated. Because the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways. Additionally, Oracle R Enterprise exposes the SQL based statistical functions through the open source R statistical programming language and pushes down computations to the database for significant performance gains. Oracle Data Mining also leverages the SQL based statistical functions through the Oracle Data Miner graphical user interface (GUI) extension to SQL Developer 3.1. | |