12c database logo

Oracle Statistical Functions


Included in Oracle 12c Database is a compelling array of statistical functions accessible from through SQL. These include descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, and analysis of variance (ANOVA). Oracle continues to expand its support for advanced analytics functionality in the database. The Oracle Advanced Analytics Option, a combination of Oracle Data Mining and Oracle R Enterprise, delivers predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations and interactive graphics inside the database. It brings powerful computations to the database resulting in dramatic improvements in information discovery, scalability, security, and savings.

The statistical functions in the database can be used in a variety of ways, for example, users can call Oracle's DBMS_STAT_FUNCS to obtain basic count, mean, max, min and standard deviation information for their dataset; or users can determine the strength of relationships using hypothesis testing statistics such as a t-test, f-test or ANOVA. Users are able to not only complete a wide range of statistics, but also include these results in more advanced SQL queries and analytical pipelines. Features include:
  • Descriptive statistics
  • Hypothesis testing
  • Correlations analysis (parametric and nonparametric)
  • Ranking functions
  • Cross Tabulations with Chi-square statistics
  • Linear regression
  • ANOVA
  • Test Distribution fit
  • Window Aggregate functions
  • Statistical Aggregates
  • LAG/LEAD functions
  • Reporting aggregate functions

STATS_T_TEST_INDEPU Example: The following example determines the significance of the difference between the average sales to men and women where the distributions are known to have significantly different (unpooled) variances:

SELECT SUBSTR(cust_income_level, 1, 22) income_level,
    AVG(DECODE(cust_gender, 'M', amount_sold, null)) sold_to_men,
    AVG(DECODE(cust_gender, 'F', amount_sold, null)) sold_to_women,
    STATS_T_TEST_INDEPU(cust_gender, amount_sold, 'STATISTIC', 'F') t_observed,
    STATS_T_TEST_INDEPU(cust_gender, amount_sold) two_sided_p_value
  FROM sh.customers c, sh.sales s WHERE c.cust_id = s.cust_id
  GROUP BY ROLLUP(cust_income_level)
  ORDER BY income_level, sold_to_men, sold_to_women, t_observed;

INCOME_LEVEL           SOLD_TO_MEN SOLD_TO_WOMEN T_OBSERVED TWO_SIDED_P_VALUE
---------------------- ----------- ------------- ---------- -----------------
A: Below 30,000          105.28349    99.4281447 -2.0542592        .039964704
B: 30,000 - 49,999       102.59651    109.829642 2.96922332        .002987742
C: 50,000 - 69,999      105.627588    110.127931  2.3496854        .018792277
D: 70,000 - 89,999      106.630299    110.47287  2.26839281        .023307831
E: 90,000 - 109,999     103.396741    101.610416 -1.2603509        .207545662
F: 110,000 - 129,999     106.76476    105.981312 -.60580011        .544648553
G: 130,000 - 149,999    108.877532    107.31377  -.85219781        .394107755
H: 150,000 - 169,999    110.987258    107.152191 -1.9451486        .051762624
I: 170,000 - 189,999    102.808238    107.43556  2.14966921        .031587875
J: 190,000 - 249,999    108.040564    115.343356 2.54749867        .010854966
K: 250,000 - 299,999    112.377993    108.196097 -1.4115514        .158091676
L: 300,000 and above    120.970235    112.216342 -2.0726194        .038225611
                        107.121845    113.80441  .689462437        .490595765
                        106.663769    107.276386 1.07853782        .280794207
14 rows selected.

(See link below to SQL Language Reference for STATS_T_TEST_*)

Unlike statistical software vendors, which charge separate license fees for these statistical capabilities, Oracle 11g Database includes this wide feature set with the database license. As such, users not only can reduce the license cost to perform equivalent functionality, but can also eliminate the movement and staging of data to external systems to perform statistical analysis. Eliminating data movement and staging increases security since copies of the data need not reside in insecure environment. Eliminating data movement and staging decreases time to results since the overhead associated with moving the data is eliminated. Because the statistical results are immediately available through SQL and integrated in Oracle Database, statistical results can be immediately used across the Oracle stack - unleashing many more opportunities to leverage your results in spontaneous and unexpected ways.

Additionally, Oracle R Enterprise exposes the SQL based statistical functions through the open source R statistical programming language and pushes down computations to the database for significant performance gains. Oracle Data Mining also leverages the SQL based statistical functions through the Oracle Data Miner graphical user interface (GUI) extension to SQL Developer 3.1.

Technical Information

red-arrow-box Oracle Database Data Warehousing Guide—SQL Analysis and Reporting Documentation
red-arrow-box Oracle Database SQL Language Reference—T-test Statistical Functions
red-arrow-box Oracle Statistical Functions Overview - Presentation (PDF, 0.6MB)

OTN Home Pages

red-arrow-box Oracle R Enterprise OTN Home red-arrow-box Oracle Data Mining OTN Home

Data Sheets

red-arrow-box Oracle R Enterprise Component Sheet red-arrow-box Oracle Data Mining Component Sheet

Blogs

red-arrow-box Oracle R Enterprise Blog red-arrow-box Oracle Data Mining Blog

Discussion Forums

red-arrow-box Oracle R Enterprise Discussion Forum red-arrow-box Oracle Data Mining Discussion Forum

 Oracle Advanced Analytics Option
 Oracle BI Home
red-arrow-box Oracle Database BI Features
 Oracle Exalytics In-Memory Machine
red-arrow-box Oracle BI DW Technology Center
red-arrow-box SQL Developer
red-arrow-box Oracle OLAP
Oracle Open World 2014 Banner

In-Memory Replay Banner