Thursday, November 6, 2014

Leveraging Hadoop for Security Analysis in Networks

I have been trying to stay active on the blog, but as you can tell, we have become somewhat busy over the past few weeks. That is to be expected when we operate in a predictive mode instead of a reactive one. Earlier tonight I was showing another security engineer how we pull various data points into our Hadoop cluster and analyze them more quickly and efficiently than we could on a single computer. Over the past five years I have built cloud environments for many purposes; some ended up being long-term solutions, and some went away when administrators realized that operating a cloud requires a specific set of skills and the right people in the right places. A single person can maintain a cloud computing environment with the right management software in place, but the work can be all-consuming until you automate everything.

As part of our business model, the cloud is integral to what we do. From collecting news and intelligence information for analysis, to running batch jobs that would otherwise take a security engineer hour after hour to complete manually, the cloud makes life easier. There are tradeoffs to utilizing the cloud: you have to fully understand what each piece of software does and how the pieces interact with one another to carry a job through to completion.

So why this article? Because as a security engineer you can't possibly look at all the data in an enterprise. You can, however, build watch lists in a cloud to notify you when certain content is being ingested into the system, which saves the engineers time. Here's an example. Say a new malware variant comes on the scene and you want to determine how many machines are infected with it. You will need a few things to get the task done in a reasonable time frame. Doing these tasks manually may take weeks or even months, but by leveraging Hadoop, Solr, Cassandra, Hive, Pig, and the like you can do the same work in under a day. I like to let the cloud work while I sleep; I wake up feeling somewhat productive if the tasks are complete when I start my work day. But let's get back to our example. To find that elusive new malware variant you need a few things. Let me list them out.

Computing Power (CPUs) - Processing gigabytes or petabytes of data requires heavy CPU usage. By leveraging multicore CPUs across a cluster of machines you spread that load out and keep job times reasonable.

Memory (RAM) - You have to have memory. Things are read and written much more quickly in memory than on disk drives (think solid-state drives here). If you can afford them, put SSDs in your cloud. Your batch jobs will thank you.

Disk Space - You have to have the space to store things. If you can't store the petabytes of data, you can't analyze them either. You need a place to store your vectors, configurations, investigative files, etc.

Vectors - You have to have points of data to work with. In our malware example, let's say we will use the MD5 hashes of the malware to detect it. That's a vector of identification. Once we identify it we need to process it (a short sketch of this kind of hash check follows this list).

Scripts, Parsers, Libraries and such - You have to have a consistent, standardized way of doing things so your jobs are repeatable. You want predictable results without error. You will use a multitude of scripts, MapReduce jobs, indexes (to speed searches and queries), and parsers to find what you're looking for.
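To make the vector idea concrete, here is a minimal sketch (in Python) of the kind of hash check a job would run against incoming samples. The watchlist.txt file of known-bad MD5 hashes and the /data/samples directory are placeholders made up for illustration, not part of any particular product.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large samples don't exhaust RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_watchlist(path):
    """Load one lowercase MD5 hash per line, skipping blanks."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle if line.strip()}

if __name__ == "__main__":
    watchlist = load_watchlist("watchlist.txt")          # hypothetical watch list
    for root, _dirs, files in os.walk("/data/samples"):  # hypothetical sample store
        for name in files:
            full_path = os.path.join(root, name)
            if md5_of_file(full_path) in watchlist:
                print("MATCH: %s" % full_path)
```

Run on a single box this is slow; farmed out across a cluster, each node only has to hash its own slice of the data.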

Now that we have found the malware on the network, what do we do with it? One of the most likely things you will do with a cloud is build statistics. In this case we want to build a list of infected IP addresses and domain names so we know which entities are infected and can report on it (probably on a blog such as this one). However, we don't want to sit and sift through petabytes of data, so we write our process out in MapReduce or some other framework and let it work while we sleep.
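As a rough sketch of that overnight job, here is a Hadoop Streaming style mapper and reducer pair that turns raw hit records into per-IP infection counts. The input format (one ip<TAB>md5 record per line) and the single hard-coded hash are assumptions purely for illustration; real log formats will differ and the parsing would change with them.

```python
#!/usr/bin/env python
# mapper.py - emit "ip<TAB>1" for every record whose hash is on the watch list.
import sys

# Placeholder hash for illustration; load your real watch list instead.
WATCHLIST = {"d41d8cd98f00b204e9800998ecf8427e"}

for line in sys.stdin:
    fields = line.strip().split("\t")
    if len(fields) == 2 and fields[1].lower() in WATCHLIST:
        print("%s\t1" % fields[0])
```

```python
#!/usr/bin/env python
# reducer.py - sum counts per IP; Hadoop sorts mapper output by key first.
import sys

current_ip, count = None, 0
for line in sys.stdin:
    ip, _, value = line.rstrip("\n").partition("\t")
    if not value:
        continue  # skip malformed lines
    if ip != current_ip:
        if current_ip is not None:
            print("%s\t%d" % (current_ip, count))
        current_ip, count = ip, 0
    count += int(value)
if current_ip is not None:
    print("%s\t%d" % (current_ip, count))
```

You would kick this off with the streaming jar that ships with Hadoop, along the lines of: hadoop jar hadoop-streaming.jar -input /logs -output /stats -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the jar's exact path varies by install), then read the infected-IP counts out of /stats in the morning.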

Live Streaming Data - To find malware infections in near real time we need near-real-time data. Products such as Sqoop and Flume help in this regard, so we pull in things such as network pcaps, honeypot logs, malware submission reports, and so on.
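For flavor, here is what a minimal Flume agent tailing a honeypot log into HDFS can look like. The agent name, file paths, and NameNode address are all placeholders, and a production setup would use a durable channel rather than a memory channel.

```
# flume.conf - hypothetical agent "a1" pushing a honeypot log into HDFS
a1.sources = honeypot
a1.channels = mem
a1.sinks = hdfs-out

a1.sources.honeypot.type = exec
a1.sources.honeypot.command = tail -F /var/log/honeypot.log
a1.sources.honeypot.channels = mem

a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000

a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/ingest/honeypot/%Y-%m-%d
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-out.channel = mem
```

Start it with flume-ng agent --conf conf --conf-file flume.conf --name a1 and the day's events land in date-partitioned HDFS directories, ready for the batch jobs above.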

So, using all these various tools and data points, we begin collecting statistics, but that's not all we have to do. We want to identify the IP address or domain owners so we can notify them (just as we do when a patient's information gets released to the public). We have to know whom to notify, so identification is imperative.
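One quick way to start identifying an owner is a raw WHOIS query. The sketch below talks straight to ARIN's server on port 43; that only covers ARIN-managed address space, so a real pipeline would follow referrals to the other regional registries. The IP address shown is from a documentation range and is just a placeholder.

```python
import socket

def whois(query, server="whois.arin.net", port=43):
    """Send a raw WHOIS query and return the full text response."""
    with socket.create_connection((server, port), timeout=10) as sock:
        sock.sendall((query + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

if __name__ == "__main__":
    print(whois("n 198.51.100.7"))  # "n" asks ARIN for the network record
```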

Tracking - Once you have the information in hand, you need a way to track the outcome of your notifications. This is where old technology such as pen and paper, an electronic notebook, or, hell, maybe even one of those fancy trouble-ticketing systems comes into play.

To make it in this fast-moving world you have to do things more quickly and more accurately, and get the information out there before your competition. That is what a cloud does for us and what it could do for your organization.

Happy Hadooping!

About the author: Kevin Wetzel has been a leading researcher and cloud engineer since 2006. He has worked for various organizations including the Department of Defense, the Department of Homeland Security, various health care and insurance organizations, business owners, and politicians, as well as private parties. Mr. Wetzel is a fan of using cloud computing to make business processes run more efficiently. SLC Security Services LLC relies heavily on this type of technology in many of our services and products. Cloud computing can mean the difference between just getting the job done and getting it done efficiently and before your competition. Kevin is a CCHA (yeah, I got the certification before they changed it to CCAH), a licensed investigator, and a Counterintelligence Specialist with SLC Security Services LLC. For more information on SLC Security you can visit the company website at www.slcsecurity.com.

