This document gives an overview of NetFlow data processing in large organizations using Hadoop and Vertica. It first describes the logical view of the NetFlow processing workflow: filtering, graph-property generation, aggregation, deduplication, and querying. It then discusses how that workflow is implemented with the MapReduce framework in Hadoop, with the output stored in the columnar database Vertica. Finally, it offers performance hints for optimizing NetFlow data ingestion and processing in Hadoop, covering JVM tuning, sort configuration, compression, and reducer distribution.
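As a rough illustration of the logical workflow named above, the following in-memory Python sketch chains filtering, deduplication, aggregation, and one simple graph property. It is not the implementation the document describes: the `Flow` record layout, the TCP-only filter rule, the deduplication key, the order of the stages, and the choice of out-degree as the graph property are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    # Hypothetical, simplified flow record; real NetFlow exports carry more fields.
    exporter: str   # device that reported the flow
    start: int      # flow start time (epoch seconds)
    src: str
    dst: str
    sport: int
    dport: int
    proto: int      # IP protocol number (6 = TCP, 17 = UDP)
    nbytes: int


def keep(flow):
    # Filtering: an illustrative rule that keeps only TCP flows.
    return flow.proto == 6


def deduplicate(flows):
    # Deduplication: the same flow may be reported by several exporters,
    # so keep the first record per key and ignore the exporter field.
    seen = set()
    for f in flows:
        key = (f.start, f.src, f.dst, f.sport, f.dport, f.proto)
        if key not in seen:
            seen.add(key)
            yield f


def aggregate(flows):
    # Aggregation: total bytes per (source, destination) pair.
    totals = defaultdict(int)
    for f in flows:
        totals[(f.src, f.dst)] += f.nbytes
    return dict(totals)


def out_degree(totals):
    # Graph property: number of distinct destinations contacted by each source.
    peers = defaultdict(set)
    for src, dst in totals:
        peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items()}


def run_pipeline(flows):
    # Chain the stages; the result is what a query layer would read.
    totals = aggregate(deduplicate(f for f in flows if keep(f)))
    return totals, out_degree(totals)
```

In a MapReduce setting, the deduplication and aggregation keys used here would plausibly become the keys emitted by the mappers, with the set and sum logic moving into the reducers, though that mapping is likewise an assumption of this sketch.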