We gathered all kind of security data including vulnerabilities, IP block lists, and ports. Software reporting files of all kinds. Data has been formatted and placed in our datalake.
Datalake was massive, with 2 million records added every day. It was an expensive operation, but it required real-time analysis and scanning. We used that information to check for and detect intruders.
Apache Spark and Hadoop are two of the most popular solutions for creating and managing data lakes. A data lake is a centralized repository that can hold all of your structured and unstructured data at any scale.
You can store your data in its raw form without first structuring it. And use many sorts of analytics to lead smarter decisions, such as dashboards and visualizations, big data processing, real-time analytics, and machine learning.