The Subtle Art of OSINT
Recently, after talking about OSINT on Cloak & Swagger, I have been barraged with requests about how it works and how to actually carry out the work. This post is a response covering the tenets of the discipline as well as a basic how-to. You can download the documents I link to here, go out and locate tools such as Maltego (by Paterva), and use these precepts and tools to do your own OSINT gathering and analysis.
Many of you who read me may in fact do this every day already. For you guys, well, hang in there. Maybe check out the docs I linked, because you may not have seen them before.
OSINT: Open Source Intelligence
OSINT is the acronym for Open Source Intelligence, and it has been steadily gaining ground in the internet age due to the ease of access to all kinds of information via the net.
Open-source intelligence (OSINT) is a form of intelligence collection management that involves finding, selecting, and acquiring information from publicly available sources and analyzing it to produce actionable intelligence. In the intelligence community (IC), the term “open” refers to overt, publicly available sources (as opposed to covert or classified sources); it is not related to open-source software or public intelligence.
The use of OSINT has grown within the private sector, and it has been a mainstay of the military and the intelligence services for years. Sources of information that once were culled and combed through only by the likes of Langley can now be easily worked by the likes of you and me with a few tools on the web or applications that you can install on your machines at home. The key to the whole process, though, is that OSINT is a subtle art that needs its other half to be of real value to anyone. That other half of the picture is “Analysis”, which is key to making assessments of the data you get from the open sources you are looking at.
Today it is common to see corporations using OSINT but perhaps calling it “Competitive Intelligence”. Still, the processes are OSINT much of the time. By researching various sources online and in the media, one can gain quite a bit of intelligence on a subject and be able to extrapolate a lot about what a company, individual, group, or country is up to and maybe where they are headed. Much of this type of data gathering (harvesting) is now also tied to predictive analysis engines online (such as Silo.com or basistech etc.) that ostensibly can “predict future actions”, as they claim. However, the base idea of OSINT is to gather open source information and then analyse it to generate reports on subjects.
Such analysis can also lead to predictive behaviour analysis and forecasts. It all depends on your goals as the analyst really.
Intelligence Analysis and Bias
Before delving into tools and methods, it is important to cover the “Analysis” part of the picture. Much of the time the data that you are gathering as an OSINT analyst can be confusing or perhaps even disinformation. One must be able to weed through facts, comments, data, and others' analysis (the news cycle), then take all of what you have gathered and sift it for the core data you seek. Raw data has to be parsed, and you, as the analyst, must judge what is true and what is not, as well as decide on the weights of the sources.
A key to this is to not be biased in your thinking when performing an OSINT analysis. An example of this may be something like looking at a Fox News report and taking it at face value. As we all pretty much know, Fox is not known for their stellar reporting nor their unbiased approach to “news”. However, there may in fact be core kernels of data within their reporting that might be true. At the very least, the compare and contrast model has to be used and weighed as you collect data to create a whole picture on a subject. It was the “group think” issue that got the US into trouble within intelligence circles during the Bush Presidency with regard to the WHIG (White House Iraq Group). It was a small cabal of like-minded analysts, under the direction of Dick Cheney, that led us quite astray on the topic of Saddam and CBRN materials.
It is important to conduct OSINT, and the analysis of the information that you get from the collection, in a broad-minded way and not to get too stovepiped in your thinking. If you do, the intel that you generate will likely be incorrect.
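One way to make the "weights of the sources" idea concrete is a small scoring sketch in Python. This is only an illustration: the source names, the weights, and the function claim_score are all made up for the example, and real weighting is a judgment call that no formula replaces. It treats each report as agreeing or disagreeing with a claim and returns the weighted share of agreement.

```python
def claim_score(reports, source_weights, default_weight=0.5):
    """Return the weighted fraction of reports that support a claim.

    reports: list of (source_name, agrees) pairs, where agrees is a bool.
    source_weights: dict of source_name -> trust weight between 0 and 1.
    Sources missing from the dict get default_weight (an assumption).
    """
    support = 0.0
    total = 0.0
    for source, agrees in reports:
        weight = source_weights.get(source, default_weight)
        total += weight
        if agrees:
            support += weight
    # With no reports at all there is nothing to score.
    return support / total if total else 0.0


# Hypothetical weights an analyst might assign after vetting each source.
weights = {"wire_service": 0.9, "partisan_outlet": 0.3, "anonymous_forum": 0.1}

# Hypothetical reports about a single claim.
reports = [
    ("wire_service", True),
    ("partisan_outlet", True),
    ("anonymous_forum", False),
]

score = claim_score(reports, weights)
print(f"weighted support: {score:.2f}")
```

A score near 1 means the sources you trust most agree. A score in the middle is your cue to go find more corroboration rather than to pick a side.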
Unravelling The Strands and Yanking
Much of the OSINT that I personally have been carrying out has been around persons of interest and not so much about governments. However, the “persons of interest” in fact may be part of a larger movement or group that could be equivalent to a government or a company in reality, so the macro and the micro are interconnected when doing this kind of work. Primarily, one has to be able to take a lot of data, sort it, mill it down, and then extrapolate the connections between people as well as motives etc.
Sometimes it is even necessary for the analyst to interact with the subjects in certain ways to confirm data. This means that the process is not a dead one; the analyst must be aware and able to interact with subjects as well. Think of the process overall as akin to being a reporter or a detective. You have to follow the clues, ask questions, and generally keep a log of everything to extrapolate from later on. It is also key that, like any good detective or reporter, you verify your sources and data.
It’s also easy to get lost in the data. So be aware of when you are getting into the mindset of not seeing the forest for the trees, so to speak.
Tools of The Trade
Much OSINT today can be gathered with something as simple as a Google search. However, to leverage everything you can out of Google, one has to become adept at “Google Hacking” (i.e. key searches and strings that get you much more granular results). There are books on the subject out there you can buy, but here are some basic strings that may be of help.
site:.gov | site:.mil inurl:/FOUO/ filetype:pdf
site:.mil | site:.gov "FOUO" filetype:pdf
site:.mil | site:.gov FOUO filetype:pdf
site:.mil | site:.gov //SIGINT filetype:pdf
- Filetypes can be just about anything: xls, pdf, txt, etc. Note that each side of the | (OR) needs its own site: operator.
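If you find yourself typing variations of these strings over and over, they are easy to generate. The following is a minimal Python sketch, not a tool Google provides: build_dork and dork_matrix are names invented for this example, and the output is only the query text to paste into the search box.

```python
from itertools import product


def build_dork(sites, term, filetype):
    """Compose one Google query limited to the given sites and file type."""
    # Each alternative carries its own site: operator; "|" is Google's OR.
    site_clause = " | ".join(f"site:{s}" for s in sites)
    return f'{site_clause} "{term}" filetype:{filetype}'


def dork_matrix(sites, terms, filetypes):
    """Return one query for every combination of term and file type."""
    return [build_dork(sites, t, ft) for t, ft in product(terms, filetypes)]


queries = dork_matrix([".gov", ".mil"], ["FOUO"], ["pdf", "xls"])
for q in queries:
    print(q)
```

Swap in your own terms and file types and you get the whole grid of searches in one go.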
Etc. etc. You get the picture. You use the defined search parameters and go right after what you want. Of course, for most pentesters this is also what you would use on any given domain you are attacking, to see what flaws there are or what documents are available to give you the in to their systems. In the case of something like user IDs or screen names, it becomes a matter of doing concentric Google searches for the value you want.
- Googling just a user name to start: “TNT_ON” for example
- site:alfajr.com “TNT_ON” (no space after site:, or the operator is ignored)
- “TNT_ON@hotmail.com” if you have the address
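The concentric search pattern in the list above can be sketched in a few lines of Python. Treat this as an illustration under assumptions: concentric_queries and search_url are made-up helper names, and the handle, site, and mail domain are just the examples already used in this post. Nothing is fetched; the code only builds the query strings and the matching search URLs.

```python
from urllib.parse import quote_plus


def concentric_queries(handle, sites=(), email_domains=()):
    """Build searches for a handle, from broadest to narrowest."""
    # 1. The bare handle as an exact phrase.
    queries = [f'"{handle}"']
    # 2. The handle restricted to specific sites of interest.
    queries += [f'site:{s} "{handle}"' for s in sites]
    # 3. Candidate email addresses built from the handle.
    queries += [f'"{handle}@{d}"' for d in email_domains]
    return queries


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


queries = concentric_queries("TNT_ON", ["alfajr.com"], ["hotmail.com"])
for q in queries:
    print(q, "->", search_url(q))
```

Keep a log of which ring each hit came from; it matters later when you weigh how solid a connection really is.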
Alternatively, you can use Google Alerts. This will perform key word searches and email you the results when the crawler locates them. It is handy because the data comes right to you and you need not go searching for subjects (I have one set up for LIGATT), so I keep on top of things this way. All of this is probably within your repertoire already if you use Google regularly to do searches. The same types of strings apply to more than just keywords, though: you can put whole sentences in (say, if you were looking into some plagiarism). Google will often spit out results where cut and pastes of articles have been put out there by others, or just copied via RSS into feeds on other pages. By refining your searches, you can narrow down quite a bit and winnow out the real data you want using Google.
The Wayback Machine:
Sometimes your searches turn up sites that are archived at Google (cache), but often sites that are no longer online are in fact archived by the likes of the Wayback Machine. This site has been really helpful lately for sites that were around circa 2001 but have since been taken down by people who did not want their data out there any more. I recommend using it to attempt to find content that is not online presently. You may in fact hit paydirt.
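For checking a pile of dead sites, it can help to script the lookups. Here is a minimal Python sketch that builds a Wayback Machine snapshot URL and a query against the Internet Archive's availability endpoint. The helper names are made up for this example, example.com is a placeholder target, and closest_snapshot is the only function that touches the network, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def wayback_url(site, timestamp="20010101000000"):
    """URL that asks the Wayback Machine for the capture nearest a date."""
    # Timestamp format is YYYYMMDDhhmmss.
    return f"https://web.archive.org/web/{timestamp}/{site}"


def availability_query(site, timestamp="20010101"):
    """URL for the availability endpoint, which answers in JSON."""
    params = urlencode({"url": site, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + params


def closest_snapshot(site, timestamp="20010101"):
    """Fetch the closest archived snapshot URL, or None (needs network)."""
    with urlopen(availability_query(site, timestamp), timeout=10) as resp:
        data = json.load(resp)
    return data.get("archived_snapshots", {}).get("closest", {}).get("url")


print(wayback_url("example.com"))
print(availability_query("example.com"))
```

Calling closest_snapshot("example.com") from your own machine returns the archived URL if one exists, which you can then open in a browser.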
Social Media Search Tools:
Twitter, Facebook, Tumblr, etc are all great sources of information as people put a lot of stuff out there that they likely shouldn’t. This includes governments and companies as well. News sources also fall into this category, so the sites listed below grab all those from search engines like Google and perform key word searches then aggregate the data for you, often in graphical formats.
WHOIS and other Tools … ROBTEX
Today it is easy to obfuscate who you are if you own a domain and you don’t want people to know who really owns it. This privacy shield, though, is sometimes an afterthought, if it is thought of at all, so one can gain a great deal of information about a target or a piece of the puzzle by looking at the domain data. Many engines and sites exist out there, and I would just Google around some more for the ones you like. Some of them are meta engines and will give you a lot of relational data to boot. One such site is Robtex.
Robtex is nice because it gives you a lot of info about the domain, the IP it sits on, the domain owner data, as well as things like what other domains reside on the same server space.
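Once you have collected domain-to-IP data from a site like this, the "what else lives on the same server" question is just a grouping problem. The following Python sketch assumes you have already gathered the records by hand or by export. The domains and addresses are placeholders from the reserved documentation ranges, and cohosted is a made-up helper name, not part of any Robtex interface.

```python
from collections import defaultdict


def cohosted(records):
    """Group domains by IP and keep only IPs that host more than one.

    records: iterable of (domain, ip) pairs collected beforehand.
    """
    by_ip = defaultdict(set)
    for domain, ip in records:
        by_ip[ip].add(domain.lower())
    # Only shared addresses are interesting for relational mapping.
    return {ip: sorted(doms) for ip, doms in by_ip.items() if len(doms) > 1}


# Placeholder data; replace with what your lookups actually returned.
records = [
    ("forum.example.com", "192.0.2.10"),
    ("shop.example.net", "192.0.2.10"),
    ("blog.example.org", "192.0.2.44"),
]

shared = cohosted(records)
for ip, domains in shared.items():
    print(ip, domains)
```

Keep in mind that shared hosting puts thousands of unrelated sites on one IP, so a shared address is a lead to check out, not proof of a connection.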
Infosniper is a “geolocational” search engine for IP addresses and domains. This will give you a graphical picture of where a server resides physically. It ties into Google Maps and comes in handy if you are seeking to lock down the location of a server in case, say, someone wants to serve a warrant on it. This becomes key in such things as terrorist investigations, when jurisdiction is a matter of concern (US vs EU etc.).
This is the big boy of the tool kits as far as I am concerned. Maltego by Paterva is a meta search engine and graphical/relational database tool that I use on a daily basis. Of course in some ways I am using Maltego kind of unconventionally but this, like I said, is the Swiss army knife of data collection and OSINT. With transforms being created every day, you get a plethora of data that can be sifted and winnowed down to a usable product.
I suggest anyone who wants to do OSINT get a copy of the CE client and work with it. Read the tutorials and be creative in your searches. *HINT* just by using the “phrase” search capability, you get a lot of hits that you can then focus in on. By removing extraneous data from the map, you can keep the data tight and avoid a messy map as well. It is a process of using your brain to delineate good data from bad, though, and that takes some investigation and some guess work at times.
Maltego and “Relational Mapping”
One of the nice things about Maltego is that it does a “weight based” mapping of data points. This allows you to look at the map (like the one at the top of this page) and see the connections between data points (or, in the case of the map above, users), so you can easily see who talks to whom and what data is related to other data. This is something to get used to and to leverage heavily in OSINT. Often you are looking for “connections” between disparate data, and this is a key thing in, say, looking at terrorists and who they talk to.
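To get a feel for what weight-based mapping means, here is a toy version in Python. To be clear, this is not Maltego's algorithm, which I have not seen; it is a simple stand-in that counts how often two entities show up together and treats that count as the weight of the link. The names and the interaction log are invented for the example.

```python
from collections import Counter


def edge_weights(interactions):
    """Count interactions per unordered pair of entities."""
    weights = Counter()
    for a, b in interactions:
        if a != b:  # ignore self-references
            weights[frozenset((a, b))] += 1
    return weights


def strongest_links(weights, entity, top=3):
    """Return up to `top` (neighbor, weight) pairs, heaviest first."""
    linked = [
        (next(iter(pair - {entity})), weight)
        for pair, weight in weights.items()
        if entity in pair
    ]
    # Sort by weight descending, then by name for a stable order.
    return sorted(linked, key=lambda item: (-item[1], item[0]))[:top]


# Invented log of who replied to or mentioned whom.
log = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("alice", "bob"),
]

weights = edge_weights(log)
print(strongest_links(weights, "alice"))
```

The heavy links are where you start pulling; the single-hit links are where you go looking for corroboration before drawing a line on the map.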
Casefile is a new product by Paterva, and it is a kind of “Maltego Light” in a way; however, it has one real advantage. It is really a kind of digital white board or “murder board”, as you might call it (à la the police dramas on TV). You can attach names and pictures to create “case files” on entities, and I like this quite a bit. I wish, though, that they would port it to *nix for us people not wanting to use Micro$oft. I have yet to really play with this tool, but I plan on implementing it soon to make some nifty case files that can be used in posts or sent on to clients.
Translate.google.com and other Online Translation Services
Today much of the content out there is in languages other than the one you might speak fluently. This is a problem for some, even with the tools out there to translate the media for you. Google does an ok job at most languages, but when you get semantic challenges like Arabic to translate, it gets a little tricky. One has to take what the returned text says in a loose way and try to interpret the meaning if the translation fails for you. The best thing is to speak the languages in question, but unless you are a polyglot that ain’t easy, so you can rely on these tools to a certain extent.
Remember, though, that these tools rely on algorithms that do not usually account for slang and the nuances of linguistics, so your mileage will vary greatly.
Paid Services for Public Information
Sometimes you have to pay for data. Yep, it’s true. Search out different sources online and you may be able to get public information for free from some states. However, the one stop shopper will go to a place like Intelius for data. It can be a bit pricey, but in the end it can also give you data you did not have before, to use in further searches and to home in on your target.
There Are Very Few “Schools” for This
Most of all, I wanted to let you all know that this is not something that is taught frequently. Most of the time you will only see this type of analysis, and tutorials about it, in the military sector under IO (Information Operations). This is where I culled many documents and learned the ropes, so to speak.
Much of the subtle art here is taught within the intelligence gathering units of the military or civilian services like the CIA. It is key that you pay attention to the “analysis” portion of this post as well. Analysis is the key factor here. Without really paying attention and taking good notes (or making case files and maps), you will only end up with a blob of information that you may in fact misinterpret.
It is also very important that any analyst already have a good grasp of the targets that they are looking into (i.e. if you are looking at Islamic Jihad, then you need to understand the territory, the lingo, the ideals etc.). Unless you have a basis of knowledge to work from, you will be useless at gathering intelligence, never mind actually developing analysis of what you locate.
All in all, play with the tools and footprint your targets. Then extrapolate what you find into actionable intelligence.