Advanced Google search tips
As a student, Google is your friend, and so is YouTube, but not as much as Google! And like any good friend, they can be manipulated if you push the right buttons!
I like to think of Google as the oracle that's always there, a bit like the one in The Matrix for Neo, only Google is better! Ask the right questions and you will find the right answers! Have you ever thought about what Google actually is? (Apart from a website and a search engine, obviously.) Well, it's just a massive database, and knowing a few simple search tips and tricks can help you return results you wouldn't normally get.
For instance, my favourite: the index search. The index search can take you to parts of a website on Apache servers (open directory listings) that you wouldn't normally be able to reach. If I do an index search for a specific PDF, MP3 or video file I can find site after site containing hundreds or thousands of MP3s, videos etc. that I can freely download with a simple click! Put that together with a little bit of Wget magic and you can download huge amounts of music, videos and ebooks in minutes. What is Wget, I hear you ask? Well, it's an open source command line tool for Windows, Linux and UNIX systems for fetching files from web servers, often used for mirroring whole websites, and by adding a few simple options you can target specific file types stored on a web server such as MP3s and PDFs.
Now this all sounds great, doesn't it? But there may be a few legal technicalities with using these methods to obtain data that I feel I should point out. Using index searches on Google is perfectly fine and legal, and it's not any form of hacking as some naive wannabes might call it, and neither is using Wget for downloading data from a website. (Although there may be some grey areas on that last bit.)
From what I can gather, Google are aware of index searching and have done nothing to prevent it, so they obviously don't think it's much of a concern or even an issue. But downloading the data found on certain sites and servers may be illegal unless the site states otherwise, so don't say you haven't been warned!! ;)
So if you're ready to become a search engine Jedi, keep reading.
First of all, I just want to say that this isn't some tutorial on how to hack websites, servers or Google, so if that's what you're into then you're in the wrong place! All these tips are widely known and can be found all over the internet; in fact that's where I got them from ;)
Okay, so search engine Jedi might be a bit of an exaggeration, but knowing some simple trickery will definitely help you maximise your chances of finding exactly what you are looking for on the net without ever having to leave the first Google results page.
Using Boolean operators and some other simple witchcraft
Boolean operators can be used to filter key words in or out of a Google search result. (If you're not sure what Boolean operators are, look them up; if you're not bothered, keep reading, it's not important.)
Phrase search (""): By putting double quotes around a set of words, you are telling Google to consider the exact words in that exact order without any change. Google already uses the order and the fact that the words are together as a very strong signal and will stray from it only for a good reason, so quotes are usually unnecessary. By insisting on phrase search you might be missing good results accidentally. For example, a search for [ "Alexander Bell" ] (with quotes) will miss the pages that refer to Alexander G. Bell.
Search single word exactly as is (""): Google employs synonyms automatically, so that it finds pages that mention, for example, childcare for the query [ child care ] (with a space), or California history for the query [ ca history ]. But sometimes Google helps out a little too much and gives you a synonym when you don't really want it. By putting double quotes around a single word, you are telling Google to match that word precisely as you typed it.
Search within a specific website (site:): Google allows you to specify that your search results must come from a given website. For example, the query [ iraq site:nytimes.com ] will return pages about Iraq but only from nytimes.com. The simpler queries [ iraq nytimes.com ] or [ iraq New York Times ] will usually be just as good, though they might return results from other sites that mention the New York Times. You can also specify a whole class of sites, for example [ iraq site:.gov ] will return results only from a .gov domain and [ iraq site:.iq ] will return results only from Iraqi sites.
Terms you want to exclude (-): Attaching a minus sign immediately before a word indicates that you do not want pages that contain this word to appear in your results. The minus sign should appear immediately before the word and should be preceded by a space. For example, in the query [ anti-virus software ], the minus sign is used as a hyphen and will not be interpreted as an exclusion symbol; whereas the query [ anti-virus -software ] will search for the words 'anti-virus' but exclude references to software. You can exclude as many words as you want by using the - sign in front of all of them, for example [ jaguar -cars -football -os ]. The - sign can be used to exclude more than just words. For example, place a hyphen before the 'site:' operator (without a space) to exclude a specific site from your search results.
Fill in the blanks (*): The *, or wildcard, is a little-known feature that can be very powerful. If you include * within a query, it tells Google to try to treat the star as a placeholder for any unknown term(s) and then find the best matches. For example, the search [ Google * ] will give you results about many of Google's products (keep going to the next page and the next; Google has many products). The query [ Obama voted * on the * bill ] will give you stories about different votes on different bills. Note that the * operator works only on whole words, not parts of words.
The OR operator: Google's default behaviour is to consider all the words in a search. If you want to specifically allow either one of several words, you can use the OR operator (note that you have to type 'OR' in ALL CAPS). For example, [ San Francisco Giants 2004 OR 2005 ] will give you results about either one of these years, whereas [ San Francisco Giants 2004 2005 ] (without the OR) will show pages that include both years on the same page. The symbol | can be substituted for OR. (The AND operator, by the way, is the default, so it is not needed.)
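If you find yourself typing the same kinds of queries over and over, the operators above can be glued together with a few lines of Python. This is just a rough sketch of the idea: the function names build_query and search_url and their parameters are made up for this example, and the only thing borrowed from Google is the ordinary search address with its q parameter.

```python
from urllib.parse import quote_plus


def build_query(terms, phrase=None, site=None, exclude=(), any_of=()):
    """Assemble a search string from plain words plus the operators above."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')           # exact phrase in double quotes
    if any_of:
        parts.append(" OR ".join(any_of))     # OR has to be in capitals
    parts.extend(f"-{word}" for word in exclude)  # minus sign, no space after it
    if site:
        parts.append(f"site:{site}")          # restrict results to one site/domain
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search address (q= parameter)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


print(build_query(["jaguar"], exclude=["cars", "football", "os"]))
print(build_query(["iraq"], site="nytimes.com"))
print(build_query(["San", "Francisco", "Giants"], any_of=["2004", "2005"]))
print(search_url(build_query(["iraq"], site="nytimes.com")))
```

The first three lines printed are the same example queries used in the tips above, so you can check the output against them before trying your own.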
Okay, now let's put all that together!
The Index search and some more Google black magic
In this next section we will learn how to put all this together to do some pretty cool things with Google searches.
By combining some advanced search operators you can find specific file types such as MP3s and PDFs in an Apache web server directory, allowing you to turn Google into the ultimate source for music, videos and ebooks.
Go ahead and give it a try: copy and paste the string below into the Google search box and hit search!
-inurl:(htm|html|php) intitle:"index of" +"last modified" +"parent directory" +description +size +(wma|mp3) "Nirvana"
Now open one of the search results and you should see a plain directory listing of files. Just click any of the MP3 files and it should start playing in a new window; to save the file, right click it and select 'Save target as'!
In the example above, if you look at the string of operators you'll notice we just searched for the group "Nirvana". Just substitute this for any artist or track you like and see what you can come up with.
There are also other variations of these search strings, which I have listed below, and by experimenting with different file names and file formats you can find much more than just MP3s!
-inurl:htm -inurl:html intitle:"index of" name mp3
-inurl:htm -inurl:html intitle:"index of" +("/ebooks"|"/book") +(chm|pdf|zip)
-inurl:htm -inurl:html intitle:"index of" "Last modified" comics +(cbr|pdf|zip)
?intitle:index.of? avi name
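Since the only bits that change between these searches are the file extensions and the keyword, the Nirvana string from earlier can be treated as a template. Here is a small Python sketch of that idea; the function name index_of_query is my own invention, and the operator string is copied from the example above rather than from anything official.

```python
def index_of_query(keyword, extensions):
    """Fill the 'index of' search template with a keyword and file extensions."""
    ext_group = "|".join(extensions)  # e.g. wma|mp3
    return (
        '-inurl:(htm|html|php) intitle:"index of" '
        '+"last modified" +"parent directory" +description +size '
        f'+({ext_group}) "{keyword}"'
    )


# Rebuilds the Nirvana example, then a variation for ebooks
print(index_of_query("Nirvana", ["wma", "mp3"]))
print(index_of_query("assembly language", ["pdf", "chm"]))
```

Swap in whatever keyword and extensions you like, then paste the printed line into the search box as before.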
In the next section we will look at how we can download a complete directory with a few simple terminal commands using a small tool called Wget.
The magic of Wget
Okay, now for the interesting stuff: let's break out a little old school UNIX command line tool for ripping files off web servers called Wget. You can download Wget for Windows from HERE or for Mac from HERE.
Installing Wget: Once you download the Zip folder, extract it to your desktop or somewhere convenient. You should now have a folder called wget-1.10.2b if you downloaded the latest stable version. Navigate to the C: drive and locate the folder called Program Files, or Program Files (x86) if on a 64-bit system. Now drag the whole folder called wget-1.10.2b to the Program Files folder.
Next we need to set the Path variable so Windows knows where to find Wget when we call it from the command line.
- Open the program files folder
- Open wget-1.10.2b folder
- Right click the Wget application
- Select properties
- In the area where it says location on the left you should see something like (C:\Program Files (x86)\wget-1.10.2b) highlight the location and right click and select copy.
- Now click the windows start menu
- Right click the Computer option
- Select properties
- Select advanced system settings
- Under the Advanced tab select environment variables (bottom right)
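- Under 'User variables', click New (if a variable called Path is already listed there, select it and click Edit instead)
- For Variable name enter 'Path'
- And under Variable value paste the Wget address (C:\Program Files (x86)\wget-1.10.2b); if you are editing an existing Path, add it to the end after a semicolon
- Hit OK to exit all the dialog boxes
- Hit the Windows key + "R" to launch the run command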
- Type "cmd" and hit enter
- Once presented with the command prompt Type "wget" and hit enter!
Now to see if it works!
Now if everything went according to plan, your command prompt should say something like 'wget: missing URL' followed by a short usage hint!
If you got this message then good times await! If not then follow the steps outlined above again!
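If you happen to have Python installed, the same check can be sketched in a few lines. This is my own addition rather than part of the steps above: shutil.which is a standard library call that looks a program up on the Path, while path_contains is a made-up helper that just checks whether a folder appears in a semicolon-separated Path value.

```python
import shutil


def path_contains(path_value, folder, sep=";"):
    """Return True if folder is one of the entries in a Path-style string."""
    wanted = folder.rstrip("\\/").lower()
    entries = [entry.strip().rstrip("\\/").lower() for entry in path_value.split(sep)]
    return wanted in entries


# Look wget up the same way the command prompt would
location = shutil.which("wget")
if location is None:
    print("wget was not found on the Path; go back over the steps above")
else:
    print("wget found at", location)

# Example Path value (invented for illustration) containing the Wget folder
example_path = r"C:\Windows\system32;C:\Program Files (x86)\wget-1.10.2b"
print(path_contains(example_path, r"C:\Program Files (x86)\wget-1.10.2b"))
```

Remember that a command prompt opened before you changed the variable will not see the new value, so run this from a fresh window.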
Okay, so before we start unleashing the power of Wget we need to locate some files we are interested in. Right now I have a college assignment on assembly language, and what I know about assembly language you could fit on the back of a postage stamp, so clearly I need to find some quality information to enlighten myself on the subject. I've found just the thing using an index search.
Now this site has like 25 PDFs on the art of assembly language that I think might be quite useful. Now I could just download each PDF one by one but who has the time for that?! Not me!
Now the methods used here can be modified to grab any file type you like as long as you know where they’re located but for this tutorial we will target these PDFs.
Okay, so copy this URL: http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/ and follow the steps below.
- Hit the Windows key + 'R' to launch the run utility
- Type "cmd" and hit enter
- Now type 'dir' and hit enter
- Type 'cd Desktop' hit enter
- Type 'mkdir pdfs' hit enter
- Type 'cd pdfs' hit enter
Okay, so if you're not sure what we just did there: basically we just created a new folder on the desktop called pdfs to store our assembly language swag! To learn more about using the Windows command line check out this site here
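Now type wget -r -A.pdf followed by a space, then right click and paste in the URL you copied earlier. The -r tells Wget to follow links recursively and -A.pdf tells it to keep only files ending in .pdf (note the capital A; a lowercase -a sets a log file instead). Your command should look like the one below!
wget -r -A.pdf http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/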
Now hit enter!
The downloads should now begin, and in a matter of minutes you should have all the PDFs on assembly language stored in the folder you just created on the desktop. There will be some other junk in there too (Wget recreates the site's folder structure as it goes), but just delete what you don't want and away you go. Not bad, eh?!
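If you're curious what "keep only the PDFs" boils down to, here is a rough Python sketch that picks the .pdf links out of a directory listing page. To be clear, this is not how Wget works internally and it doesn't download anything; the names LinkCollector and pdf_links are made up, and the sample listing and its file names are invented for the example (they are not the real files on that server).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pdf_links(listing_html, base_url, suffix=".pdf"):
    """Return absolute URLs for links in the listing that end with suffix."""
    collector = LinkCollector()
    collector.feed(listing_html)
    return [
        urljoin(base_url, href)
        for href in collector.links
        if href.lower().endswith(suffix)
    ]


BASE = "http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/"

# Invented listing in the style of an Apache "Index of" page
SAMPLE = """
<html><body><h1>Index of /cs422/doc/art-of-asm/pdf</h1>
<a href="?C=N;O=D">Name</a>
<a href="/cs422/doc/art-of-asm/">Parent Directory</a>
<a href="example-chapter-1.PDF">example-chapter-1.PDF</a>
<a href="example-chapter-2.pdf">example-chapter-2.pdf</a>
</body></html>
"""

for url in pdf_links(SAMPLE, BASE):
    print(url)
```

The sorting link and the Parent Directory link get skipped because they don't end in .pdf, which is the same kind of filtering the -A option is doing for you on the command line.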
So this is just a small part of what you can do with Wget, and hopefully it has demonstrated how useful this tool can be to a student on the hunt for free information on the net. As I said before, this can be used to download most content so long as you know the right commands and file locations.
You can also set Wget to periodically download new content when it becomes available on a server. To find out more about how to use Wget check out the Wikipedia page here! http://en.wikipedia.org/wiki/Wget
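As far as I know Wget doesn't do the scheduling itself; the usual trick is to re-run the same command on a timer and add the -N (timestamping) option so only new or changed files get fetched. Here is a hedged Python sketch that just builds such a command. The function wget_command and its parameters are made up for this example; -r, -np (don't climb into the parent directory), -N and -A are genuine Wget options.

```python
def wget_command(url, extensions, no_parent=True, timestamping=False):
    """Build the argument list for a recursive, filtered Wget download."""
    args = ["wget", "-r"]
    if no_parent:
        args.append("-np")   # stay inside the starting directory
    if timestamping:
        args.append("-N")    # skip files that are not newer than the local copy
    args += ["-A", ",".join(extensions), url]  # comma-separated accept list
    return args


url = "http://flint.cs.yale.edu/cs422/doc/art-of-asm/pdf/"
print(" ".join(wget_command(url, ["pdf"], timestamping=True)))
```

The list can be handed to subprocess.run when you actually want to start the download, and a scheduled task or cron job can call the script as often as you like.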
Okay, so that's the end of your Google Jedi training. Now go out there, try combining the skills you've just learnt here and hunt down those files that have eluded you for so long.
Have fun and try not to break anything:) -FSR