Information is put on the internet in one of two ways: deliberately by people and automatically by machines. We have to consider the motivation for the information existing and being publicly accessible. We must also consider the interfaces of platforms and our ability to query information. Finally, information varies in its persistence: some information is temporary and some is permanent. On the web, however, all information is temporary in the long run: all sites will eventually cease to exist and the information they contain will stop being public.


We have to practice information gathering techniques. Corpus building is done by creating a list of words associated with what you’re looking for; a thesaurus is helpful here. Mind maps help you come up with ideas related to the main concept. Decision trees are useful for planning your data collection route while under stress or when time does not permit introspection.
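As a rough illustration of corpus building, the sketch below combines alternative terms into candidate search strings. The concept names and terms in the corpus are made up for the example; in practice they would come from your own thesaurus or mind map.

```python
from itertools import product

# Hypothetical corpus: each key is a concept, each value lists alternative
# terms for it (for example, taken from a thesaurus or a glossary).
corpus = {
    "subject": ["higher order functions", "callbacks"],
    "context": ["JavaScript", "ECMAScript"],
}


def build_queries(corpus):
    """Return one query string per combination, using one term per concept."""
    term_lists = list(corpus.values())
    return [" ".join(combo) for combo in product(*term_lists)]


queries = build_queries(corpus)
for query in queries:
    print(query)
```

Every extra concept multiplies the number of queries, so it is probably worth keeping each list short and pruning terms that return nothing useful.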

Use direct queries when possible

Recommendation systems do not benefit users; their main objective is to increase time on site and how often users return to the website. These problems are associated with social media platforms, where the user’s ability to query data is optimized for a touch screen interface (meaning passive media consumption). The less control you have over a search, the more room the site has to inject its own interpretation of what to display.

Use a thesaurus and dictionary or a glossary

Using proper terms for your queries can improve their results.

For example, looking for: How to pass a function to a function in JavaScript

is not as effective as: Higher order functions in JavaScript

That being said, the first sounds more intuitive, but people who write quality articles use the proper term, not the one most people would use. Any quality book on a subject will have a glossary in the back that you can use.

Build an awareness of the terms in your mind. Use a glossary or create your own; if the topic you’re researching has no official definitions, invent your own. The key point is to know what you’re looking for. If you have no knowledge of a concept, take out a pen and paper and go through as much text, video or audio as you can, taking down things you can use later.

Try using a different language

Try running your search terms through a translator and repeating your search. Most SEO effort is spent on English words, so this can bypass a lot of it.

Use Your Imagination

This may sound like a meme, but grade school creativity activities such as mind maps help. Take a sheet of paper and write down everything you can imagine about the subject you’re looking for: imagine every possible thing that could be related to that subject and write it down. Even if you don’t know why your mind came up with something, write it down. Your mind is the best tool for creativity and it’s trying to point you in the right direction. Try to carry around a notebook so you can write down ideas later if nothing comes to you at a particular minute. Changing your environment puts the mind into a different state, which can change its thought process.

Easy to use Tools

  1. Wget: lets you clone an entire website from the command line.
  2. Wkhtmltopdf: converts HTML documents into PDF documents.
  3. Youtube-dl: can archive videos from YouTube and a hundred other video sharing websites.
  4. Dict: an offline dictionary and thesaurus.
  5. Argos: offline language translation.
  6. Glogg: a log viewer built for huge files.
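As one possible way to script the first tool in this list, the sketch below assembles a Wget command line for mirroring a site. The helper name, the example URL and the destination directory are placeholders; the flags are standard GNU Wget options, but check them against your installed version.

```python
import shlex


def wget_mirror_command(url, dest_dir="mirror"):
    """Build a wget command line that mirrors a site for offline reading."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy works offline
        "--page-requisites",   # also fetch images, CSS and scripts
        "--no-parent",         # never ascend above the starting directory
        "--directory-prefix=" + dest_dir,  # where the copy is saved
        url,
    ]


# Placeholder URL, used only for illustration.
cmd = wget_mirror_command("https://example.com/docs/")
print(shlex.join(cmd))
```

The command is only printed here. To actually run it, the list could be passed to subprocess.run(cmd, check=True), assuming Wget is installed and on your PATH.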

Programmable Tools

  1. Grep: a tool to search for patterns in text.
  2. Selenium: a programmable web browser.
  3. GraphViz: a graph visualization program (more of an analysis tool).
  4. SQLite: a lightweight, portable database.
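To give a feel for how two of these tools combine, here is a minimal sketch that stores collected text in SQLite and then searches it line by line with a regular expression, in the spirit of Grep. The table layout, function names and sample pages are assumptions made for the example, not part of any of the tools.

```python
import re
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) a small archive database with a single pages table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    return conn


def save_page(conn, url, body):
    """Store the text of a page, replacing any earlier copy of the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body)
    )
    conn.commit()


def grep_pages(conn, pattern):
    """Return (url, line) pairs for every stored line matching the regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for url, body in conn.execute("SELECT url, body FROM pages ORDER BY url"):
        for line in body.splitlines():
            if regex.search(line):
                hits.append((url, line.strip()))
    return hits


# Hypothetical sample data.
conn = open_archive()
save_page(conn, "https://example.com/a",
          "Intro\nHigher order functions take functions as arguments.")
save_page(conn, "https://example.com/b", "Nothing relevant here.")
hits = grep_pages(conn, r"higher.order")
print(hits)
```

Passing a file path instead of ":memory:" keeps the archive on disk, which matters because the original pages may not stay online.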

Search Engines

Before I begin, I don’t recommend wasting time with “privacy” anything. “Privacy”-oriented services are snake oil whose benefit can’t be quantified beyond being a coping mechanism or a method of expressing snobbery to peers. If you need “privacy”, stop using the internet. Privacy is functionally impossible for web search engines to accomplish because they need to read the search string and run a query on their database. I don’t believe we have magic algorithms which can hide your query from human eyes while remaining usable by the search engine.

Major general search engines:

  • Google
  • Bing
  • Yandex
  • Baidu
  • Gigablast (just for fun)

Search engines omit results based on local filter rules, so one engine may contain information another has de-listed. One engine may also rank a result higher than another because of its internal algorithm.
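One way to see these differences is to run the same query on several engines, save the result URLs by hand or with a script, and compare the lists. The sketch below assumes you already have such lists; the engine names and URLs are invented for the example.

```python
def unique_results(results_by_engine):
    """Map each engine to the URLs that only that engine returned."""
    unique = {}
    for engine, urls in results_by_engine.items():
        # Collect everything returned by all the other engines.
        others = set()
        for other_engine, other_urls in results_by_engine.items():
            if other_engine != engine:
                others.update(other_urls)
        unique[engine] = sorted(set(urls) - others)
    return unique


# Hypothetical result lists for the same query on two engines.
results_by_engine = {
    "engine_a": ["https://example.com/1", "https://example.com/2"],
    "engine_b": ["https://example.com/2", "https://example.com/3"],
}
unique = unique_results(results_by_engine)
print(unique)
```

A URL that shows up for only one engine is not proof of de-listing elsewhere, but it is a good hint about where to keep looking.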

Search operators

Use search operators; otherwise the search string will be treated as a general suggestion to the engine, not a direct order for information. Sifting through SEO-optimized garbage is not fun, so always use search operators.

Google: the term “howtofindinfo.php” must be in the URL

inurl:howtofindinfo.php

Google: limit searches to a site (the site: operator)

Yandex: limit results to the xyz domain

domain:xyz dbuild

Yandex: limit results to a specific file type

mime:pdf book
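If you run many operator searches, it can help to build the query strings in code. The sketch below composes queries using Google’s site: and inurl: operators and Yandex’s domain: and mime: operators, then turns them into search URLs. The function names are made up for the example, and the URL patterns (the q parameter for Google, the text parameter for Yandex) are assumptions based on how those sites currently behave, so they may change.

```python
from urllib.parse import urlencode


def google_query(terms="", site=None, inurl=None):
    """Compose a Google query string with optional site: and inurl: operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def yandex_query(terms="", domain=None, mime=None):
    """Compose a Yandex query string with optional domain: and mime: operators."""
    parts = []
    if domain:
        parts.append(f"domain:{domain}")
    if mime:
        parts.append(f"mime:{mime}")
    if terms:
        parts.append(terms)
    return " ".join(parts)


def search_url(base, param, query):
    """Attach a URL-encoded query to a search page address."""
    return f"{base}?{urlencode({param: query})}"


# Assumed search page addresses and parameter names.
google_url = search_url("https://www.google.com/search", "q",
                        google_query(inurl="howtofindinfo.php"))
yandex_url = search_url("https://yandex.com/search/", "text",
                        yandex_query("book", mime="pdf"))
print(google_url)
print(yandex_url)
```

Keeping the operators as named arguments makes it harder to forget one, which matters because a query without operators is treated as a suggestion rather than an order.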

Browse the references below:

Google Bing Yandex Gigablast