**HPL/SQL has been included in Apache Hive since version 2.0**

====== SUMMARY Statement ======

The SUMMARY statement outputs summary statistics for a table or a query result set. For each column it reports the data type, the total number of rows, the number of non-NULL rows, the number of distinct values, the mean, the minimum and maximum values, the standard deviation, and the 5%, 25%, 50% (median), 75% and 95% percentiles.

The statement helps you perform quick exploratory data analysis.

Syntax:

<code language=sql>
SUMMARY [TOP num] FOR table_name [WHERE condition] [LIMIT num] | select_statement;
</code>

**Examples**

Summary for a table:

<code language=sql>
summary for src;
</code>

<code>
Column  Type    Rows  NonNull  Unique  Avg     Min    Max     StdDev  p05    p25     p50     p75     p95
KEY     string  500   500      309     260.18  0      98      143.07  26.00  146.00  255.50  395.00  479.00
VALUE   string  500   500      309     null    val_0  val_98  null    null   null    null    null    null
</code>

Summary for a query result:

<code language=sql>
summary for select code, total_emp, salary from sample_07;
</code>

<code>
Column     Type    Rows  NonNull  Unique  Avg        Min      Max        StdDev      p05       p25       p50       p75        p95
code       string  823   823      823     null       00-0000  53-7199    null        null      null      null      null       null
total_emp  int     823   823      806     489748.24  340      134354250  4858790.94  4054.50   17270.00  49335.00  162662.50  1238941.00
salary     int     823   819      759     47963.63   16700    192780     25706.09    21860.00  30547.50  40700.00  58747.50   92025.50
</code>

Top 3 values for each column in a table (each value is followed by its number of occurrences):

<code language=sql>
summary top 3 for sample_07;
</code>

<code>
CODE        DESCRIPTION                                    TOTAL_EMP  SALARY
53-7199  1  Aircraft mechanics and service technicians  1  25500   2  null   4
00-0000  1  Aircraft cargo handling supervisors         1  9910    2  34220  3
11-0000  1  Agricultural workers, all other             1  112300  2  35470  3
</code>

**Version:** HPL/SQL 0.3.31
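
**Filtering with WHERE and LIMIT**

The syntax also allows optional WHERE and LIMIT clauses after a table name, which the examples on this page do not exercise. The following is a minimal sketch, assuming the ''sample_07'' table used in the examples; the filter condition and the row limit are arbitrary values chosen only for illustration:

<code language=sql>
-- Illustrative only: the salary threshold and the limit of 100 are assumptions
summary for sample_07 where salary > 50000 limit 100;
</code>

Going by the syntax, this should presumably restrict the summary to rows matching the condition, with LIMIT capping the number of rows considered. The output is omitted here because it depends on the data.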