INTERNET DATA MINING TOOLS

 

 

 

 

 

 

 


GENERAL PURPOSE SEARCH ENGINES:

Open Directory | Pegasus | Alta-Vista | Scrub the Web | Excite | Infoprobe
Search King | Splat | Lycos | Northern Light | Questfinder | Yahoo
Phatoz | Hotrate | Omniseek.com | Homepageseek | Sunbrain | 1st Spot
Matilda | Searchgate | All the Web (Fast) | Google | Spitfire! | Fathead
Ah-ha.com | MSN | Looksmart--Pay Site to Submit | Seekon.com | Highforce | Searchport International Directory
Hotbot | Hit-Net | National Directory | NBCI (formerly Snap) | goeureka | Galaxy

Alta-Vista

Dogpile.com

 


 

ACADEMIC SEARCH

 

 

 

 

 

 

 

 

 

 

 


 

How USA Residents Use the Internet

 

Number (in millions) | Purpose
21 | Get additional career training
17 | Dealing with major illness
17 | Selecting school for siblings
16 | Auto purchase
16 | Financial decision
10 | Relocation
8 | Changing jobs
7 | Cope with illness
Source: Pew Internet & American Life Project, 2006

 

 

 

 

 

 

 


SEARCH USING NATURAL LANGUAGE/EXPERTS

 

Ask a question on any topic and get answers from real people.
Lets users bypass multi-layered menus. Applications built with Answers Anywhere can even anticipate needs.

Third-generation Internet search engine with advanced natural language processing technologies.

START, the world's first Web-based question answering system.

Semantic technology for custom-built search engines based on natural language processing.

The key to InQuira's performance is its ability to understand natural language search intents.

kozoru - Q&A search system that will give users the ability to find specific answers to their questions.

 

 

 

 

 

 


 

 

 

NICHE SEARCH ENGINES

Copyright Violation Search | The Internet Dictionary | Now part of Ask.com
Website YellowBook | Patent Search | Book finding using ISBN numbers | US Post Office Zip Codes
CAPWIZ
Find government reps | Find gov reps and legislation | FCC ID numbers | Find Device Drivers | Babel Fish translator | FCC product no.

SEARCH USING ARTIFICIAL INTELLIGENCE  (for advanced translators)

Exchange Rate

Leading Chinese search engine

People/email search | Business People search | Case search
Fictitious Business Name | State Bar of California | DNS Check | OC CA. Courts | California Courts Case summary
 
California Business Portal  

 


 

 

 

 

SPECIALIZED INFORMATION TECHNOLOGY SEARCH ENGINES

US Post office - find zip codes Find Government Re

 

 


 

SEARCH USING ARTIFICIAL INTELLIGENCE

You can use the intelligent agents below as assistants for searching, tracking, or knowledge management:

 Alexa:  "Learns" and suggests sites; provides statistics about sites (owned by Amazon.com)  

BonziBUDDY:  He talks to you, browses the web and searches the Internet as your sidekick. With his built-in artificial intelligence, he learns from you (your likes and interests.)  

Copernic  - Copernic Agent is a Meta search engine, invisible web explorer, online research assistant and extensive toolbox, all combined into an elegant, easy to use program.

edokey2000 - A peer-to-peer program that allows you to search documents online.

Karnak - Karnak is the virtual library of infinite knowledge through the web where you can enter to receive a constant flow of relevant information. Inside, you'll be guided through the query process.

LexiBot  - will search nearly 600 sources at once. Then it will filter results by date, country, URL or size.  

MyIvan - Imagine being able to find anything you want on the Internet simply by talking to your computer. A new product, called myIVAN™, allows you to do just that: search the web by asking IVAN, the Intelligent Voice Animated Navigator™, questions and telling him where you want to go and what you want to find.

TrademarkTracker.com - searches actual Internet content for mentions of possible brand abuses and violations. If there’s a website abusing your corporate brand, TrademarkTracker.com will find it.  

 Text Analyst - examines a text file and creates a semantic network of importance; produces abstract automatically  

The Easy Bee - The Easy Bee is a software product for Windows that allows everyone to easily automate tedious Web navigation tasks and build aggregated pages with always up-to-date Web extracts  

TurboStart - Turbo Start is an Internet search utility that gives you access to 270 of the web's most popular search engines from your browser. Unlike most other search utilities, Turbo Start is fast and it puts you in control.

TVEyes - Watches TV and tracks keywords for you.

WebSeeker - WebSeeker leverages the power of many search engines (8 commercial search engines to be exact), and uses your computer to refine the results.
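
Several of the agents above (Copernic, TurboStart, WebSeeker) are described as querying many search engines at once and then refining the combined results. The merging step can be sketched roughly as follows. This is only an illustration: the engine names, URLs, and the scoring rule (number of engines that returned a URL, then its best position) are assumptions made for the example, not how any of these products actually works.

```python
from collections import defaultdict


def merge_results(results_by_engine):
    """Merge ranked URL lists from several engines into one ranked list.

    A URL scores higher when more engines return it; ties are broken by
    the best (lowest) position it reached in any engine's list.
    """
    engines_seen = defaultdict(set)   # url -> engines that returned it
    best_rank = {}                    # url -> best position seen (0 = top)
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            key = url.rstrip("/").lower()   # crude normalisation to spot duplicates
            engines_seen[key].add(engine)
            best_rank[key] = min(rank, best_rank.get(key, rank))
    return sorted(engines_seen, key=lambda u: (-len(engines_seen[u]), best_rank[u], u))


# Hypothetical result lists; no real engine is queried here.
sample = {
    "engine_a": ["http://example.com/a", "http://example.com/b"],
    "engine_b": ["http://example.com/b/", "http://example.com/c"],
    "engine_c": ["http://example.com/b", "http://example.com/a"],
}
merged = merge_results(sample)
print(merged)
```

Pages returned by every engine rise to the top, which is one simple way a metasearch tool can "refine" results without knowing how each engine ranks internally.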

Suggest an agent: URL@atwebo.com

 


CUSTOMIZED SEARCH

ROLLYO

Rollyo stands for "Roll Your Own Search Engine."

Using Rollyo, you create a searchroll.  A searchroll is a collection of the sites you trust and find useful. It's a personal search engine you create to provide relevant results from a hand selected list of reliable sites. 

Each searchroll gets its own Web address, so you don't have to wade through the whole Rollyo site to get to it, and you can email this address to others. You can even add your searchroll to the drop-down list of search engines in the toolbar of the Firefox Web browser, so you can search it without first navigating to the Rollyo site.
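
The searchroll idea described above, restricting results to a hand-selected list of sites, can be sketched as a simple domain filter. This is a minimal illustration and not Rollyo's implementation; the site list, URLs, and function names are hypothetical.

```python
from urllib.parse import urlparse


def in_searchroll(url, trusted_sites):
    """Return True if the URL's host is a trusted site or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == site or host.endswith("." + site) for site in trusted_sites)


def filter_results(urls, trusted_sites):
    """Keep only results that come from the hand-selected sites, in order."""
    sites = [s.lower() for s in trusted_sites]
    return [u for u in urls if in_searchroll(u, sites)]


# Hypothetical searchroll and result list for illustration.
searchroll = ["example.org", "docs.example.com"]
results = [
    "http://www.example.org/article",
    "http://spam.example.net/page",
    "https://docs.example.com/guide",
    "http://notexample.org/",
]
kept = filter_results(results, searchroll)
print(kept)
```

Matching on the host name with a leading dot keeps look-alike domains such as notexample.org out of the roll.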

 

PubSub

PubSub is an automated system that constantly matches your search terms against millions of blogs, online discussions, news releases and SEC filings, and notifies you when there is a match.
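
The matching idea behind such a service can be sketched as stored search terms checked against each incoming item. This is a toy illustration under stated assumptions (all of a subscriber's terms must appear as whole words; subscriber names and items are made up), not a description of PubSub's actual engine.

```python
import re


def build_matcher(subscriptions):
    """Map each subscriber to a set of lowercase search terms."""
    return {name: {t.lower() for t in terms} for name, terms in subscriptions.items()}


def notify(item_text, matcher):
    """Return the subscribers, sorted by name, whose terms all appear in the item."""
    words = set(re.findall(r"[a-z0-9]+", item_text.lower()))
    return sorted(name for name, terms in matcher.items() if terms <= words)


# Hypothetical subscriptions and incoming items.
matcher = build_matcher({
    "alice": ["sec", "filing"],
    "bob": ["tagging"],
})
incoming = [
    "New SEC filing posted for Example Corp",
    "Why tagging is moving into the mainstream",
    "Weather report",
]
alerts = [(text, notify(text, matcher)) for text in incoming]
for text, who in alerts:
    print(text, "->", who)
```

Unlike an ordinary search, the queries are stored and the documents flow past them, so a match can trigger a notification as soon as a new item arrives.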

Aesir

A tool that helps you customize your search using search engines you trust.

 

 

 

 

 

 


 

 

Purpose | Source | Engine
Get Free Code | Planet Source Code | Search thousands of lines of free code at www.Planet-Source-Code.com (VB World, Java World, C++ World, ASP World)
Get CRM Info | SearchCRM.com | SearchCRM.com
Windows NT/2000-Specific | SearchWin2000.com | SearchWin2000.com
Technology Research | Penn/NET | Technology Research
IT Encyclopedia | TECHTARGET | TECHTARGET
 

 

 

 

BLOG SEARCH

 

 

 


 

 

BLOG BUZZ - A listing of the most influential blogs in different industries

Accounting - AccountingObserver - Rants about corporate troubles 

Advertising - Adrants - keep up with what is going on with Madison Avenue

Digital Content - PaidContent - Tracks the latest developments from a range of businesses interested in the development of digital content. 

Currencies - RGMonitor - Tracks monetary issues through a macroeconomic lens.

Economics - BigPicture - Market commentary and musings of inner workings of glamorous industries

Health Care:  PharmaMarketing - Best practices for drug companies to deliver accurate and reliable information to doctors and consumers |  HealthCareBlog - Everything you wanted to know about the health care industry, but were afraid to ask.  

Hollywood - Defamer - Entertainment news and opinion

Insurance - InsuranceScrawl - Legal issues facing property-casualty insurers 

Music - Lefsetz Letter - Anything and everything to do with music.

Popular Opinion - JeffMathews - Popular opinion and news analysis among traders and institutional investors.

Publishing - PublisherMarketPlace - Paid site with selection of print and web-based book-publishing stories |   BookSlut - Reviews, news and commentary 

Real Estate - Curbed - Attempts to deflate real estate hype | Slatin Report - Commentary on commercial real estate

Research on legal issues re: M&A - DealLawyers - Dissects M&A flow based on obscure and widely known legal issues.

Tech Blogs - Engadget - Roundup of gadgets | SlashDot - Technical, social and political issues | DanGillmor - Tech and political issues | PhoneScoop - Everything you wanted to know about phones | DigitalCameras - All about digital cameras | Ipods - All about iPods | CrazyAppleRumors

Television - MediaBistro - TV news and major network decisions 

Taxes - TaxAnalyst - Handy way to catch up on breaking news |  TaxProf - Tax news, academic papers and other links 

Theater - BroadwayStars - List of daily theater news 

Wall Street - Footnoted - The Neiman Marcus Watchdog Project of the financial world.

 


 

SOCIAL BOOKMARKING

ACM tutorial on social bookmarking | Bookmarking managers | Social bookmarking tools

Social bookmarking is a user-defined taxonomy system for bookmarks. Such a taxonomy is sometimes called a folksonomy and the bookmarks are referred to as tags. Unlike storing bookmarks in a folder on your computer, tagged pages are stored on the Web and can be accessed from any computer. Technorati, a blogging site, describes the system as "The real-time Web, organized by you." Web sites dedicated to social bookmarking, such as Flickr and del.icio.us, provide users with a place to store, categorize, annotate and share favorite Web pages and files.

According to the January 24, 2006 issue of the Wall Street Journal, “Yahoo and Others Embrace ‘Tagging’ as a Better Way to Find and Store Information.”

The article explains: “Americans conduct nearly 200 million Internet searches every day. Now several companies want to make this process easier by transforming the way people look for and store information.

The new method, dubbed ‘tagging,’ addresses a common complaint of many Internet users – that searching is often clumsy and inefficient. Web surfers often must sift through multiple pages of search results to find what they are looking for. And retrieving the best sites a second time often means redoing the search or trolling through an unorganized list of sites that you have haphazardly saved in a ‘favorites’ folder.”

Although tech geeks have been using this new method for the last couple of years, and social bookmarking research has been going on for a while, it is only recently that “... tagging is moving into the mainstream. ... Last month, Yahoo Inc. bought the popular tagging site Del.icio.us. Now the Sunnyvale, CA company says it plans to allow Del.icio.us users to access their tagged links through MyWeb 2.0, Yahoo's own tagging site.”
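
The tagging model discussed above, where a page is stored under free-form labels rather than in a single folder, maps naturally onto an index from tag to bookmarks. The sketch below is a minimal in-memory illustration; the class name, URLs, and tags are hypothetical and do not reflect how del.icio.us or MyWeb store data.

```python
from collections import defaultdict


class BookmarkStore:
    """In-memory stand-in for a social bookmarking service."""

    def __init__(self):
        self._by_tag = defaultdict(set)   # tag -> set of URLs

    def add(self, url, tags):
        """Save a bookmark under one or more free-form tags."""
        for tag in tags:
            self._by_tag[tag.strip().lower()].add(url)

    def find(self, *tags):
        """Return the URLs carrying every one of the given tags, sorted."""
        sets = [self._by_tag.get(t.strip().lower(), set()) for t in tags]
        if not sets:
            return []
        return sorted(set.intersection(*sets))


store = BookmarkStore()
store.add("http://example.com/recipes", ["cooking", "Reference"])
store.add("http://example.com/python", ["programming", "reference"])
print(store.find("reference"))
print(store.find("reference", "cooking"))
```

Because one page can carry many tags, it can be retrieved along several paths, which a single "favorites" folder does not allow.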

Bookmarking managers:

Backflip | Blinklist | BlinkPro | Bookmark Buddy | Bookmark Commando | Bookmark Magic | Bookmark Tracker | BookmarkSync | Bookmarx | Bookmax.net | CiteULike - a free service to help academics share, store, and organize academic papers | ClickMarks | Connectedy.com | Connotea - free online reference management service for scientists to store or share articles and links | de.lirio.us - open source clone of del.icio.us with private bookmarking, tagging, blogging, and notes | Dude, Check This Out! | Frassle | Freelink.org | Furl | GlobusPort | GUIcookies | Hotlist Anywhere | HydraLinks | Hyperlinkomatic | IC Soft, Inc. | iKeepBookmarks.com | itList | Jots | Link2Mark | Linkroll | Links2Go | LiveFavorites | MURL | My Bookmark Manager | MyBookmarks | myHq | Netvouz | openBM | PeerMark | Pluck Web Edition (PWE) | Powermarks | Save Your Links | SearchFox | Shadows | Simpy | SiteJot | Spurl.net | SV Bookmark | Sync2It | URLBlaze | Web Feeds | WhatLink.com | Whitelinks | Wists | Womcat Bookmarks | World Wide Wisdom | wURLdBook | Yahoo! Bookmarks | Zoogim.com Online Bookmarks

 

 

 


 

Geotagging / Mashups and Maps - geotagging allows users to add geographic information, such as an address or latitude and longitude, to any digital content - everything from photographs and videos to news articles and blog posts. The content can then be easily displayed on an online map or cross-referenced with other information about the location. Geotagging is related to another online practice called "mashups", where users place information, such as real estate listings, onto an online map.

So far, the most popular application for geotagging has been online photos.
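
Once content carries a latitude and longitude, as described above, cross-referencing it with a location comes down to a distance calculation. The sketch below uses the standard haversine formula to find geotagged items near a point. The item records, field names, and coordinates are hypothetical and only approximate.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby(items, lat, lon, radius_km):
    """Return titles of geotagged items within radius_km of the given point."""
    return [it["title"] for it in items
            if haversine_km(lat, lon, it["lat"], it["lon"]) <= radius_km]


# Hypothetical geotagged content; coordinates are approximate.
items = [
    {"title": "Photo: pier at sunset", "lat": 34.01, "lon": -118.50},   # Santa Monica area
    {"title": "Blog post: Bay bridge", "lat": 37.80, "lon": -122.38},   # San Francisco area
]
close = nearby(items, 34.05, -118.24, 50)   # search around downtown Los Angeles
print(close)
```

A map mashup would plot the returned items as markers; the filtering step is the same whatever the content type.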

 

Geotagger Sites

 

 


 

Evaluating Sources of Information

New Social Contract Paradox of the Information Age:

"Despite the existence of more and better information than ever before, time pressure prevents decision makers from gathering all that they need and from sharing it."
-- Peter Tobia, author, "Decision Making in the Digital Age: Challenges and Responses"

 

There are two easy ways to navigate through life: the first is to question everything, and the second is to question nothing. In either case, thinking is not required. This may support the assertion that secondhand information is like secondhand smoke, and just as deadly, particularly in the exponentially growing digital universe.

 

Six Spokes of Trust - Adapted from CCI Leadership (2006), Six Spokes of Trust

All we need to evaluate sources of information found in the digital universe we learned in kindergarten: stranger danger! That is, if we do not know the source of the information, we should heed the warning about secondhand smoke. From there, we know that trust is not like instant coffee: it takes time to get to know all the forces of influence acting on the sources. Take, for instance, the findings published in the Wall Street Journal on May 11, 2005, describing what some authors in the Journal of the American Medical Association do:

  • Describe original main goal as secondary – 34%
  • Fail to disclose original goal – 26%
  • Turn original secondary goal into main goal – 19%
  • Create new main goal

 

Evaluating online sources of information is not much different from critical reading. Below is an outline of a suggested process, including attributes of information, attributes of poor problem solvers, and getting to knowledge.

Context and Timeliness Analysis

A. Author

What are the author's credentials--institutional affiliation (where he or she works), educational background, past writings, or experience?

B. Date of Publication

When was the source published?

Is the source current or out-of-date for your topic?

C. Edition or Revision

Is this a first edition of this publication or not? Further editions indicate a source has been revised and updated to reflect changes in knowledge, include omissions, and harmonize with its intended reader's needs.

D. Publisher

If the source is published by a university press, it is likely to be scholarly. Although the fact that the publisher is reputable does not necessarily guarantee quality, it does show that the publisher may have high regard for the source being published.

E. Title of Journal

Is this a scholarly or a popular journal? This distinction is important because it indicates different levels of complexity in conveying ideas.

 

 

Content Analysis

A. Intended Audience

What type of audience is the author addressing? Is the publication aimed at a specialized or a general audience? Is this source too elementary, too technical, too advanced, or just right for your needs?

B. Objective Reasoning - Is this intended to persuade or manipulate?

Is the information covered fact, opinion, or propaganda?

Does the information appear to be valid and well-researched, or is it questionable and unsupported by evidence? Assumptions should be reasonable.

Are the ideas and arguments advanced more or less in line with other works you have read on the same topic? The more radically an author departs from the views of others in the same field, the more carefully and critically you should scrutinize his or her ideas.

Is the author's point of view objective and impartial? Is the language free of emotion-arousing words and bias?

C. Coverage

Does the work update other sources, substantiate other materials you have read, or add new information? Does it extensively or marginally cover your topic?

Is the material primary or secondary in nature? Primary sources are the raw material of the research process. Secondary sources are based on primary sources.

D. Writing Style

Is the publication organized logically? Are the main points clearly presented? Do you find the text easy to read, or is it stilted or choppy? Is the author's argument repetitive?

 

 

Attributes of poor problem solvers - unable to properly evaluate sources of information

  • Cannot settle on a way to begin.
  • Convince themselves they lack sufficient knowledge (even when that is not the case).
  • Plunge in, jumping haphazardly from one part of the problem to another, trying to justify first impressions instead of testing them.
  • Lack a critical attitude and take too much for granted.

 


 

 

 

 

Clustering Search Engines

Overview of Clustering and Clusty Search Engine

Jan 5, 2007 ... Written by Alex Iskold Earlier this week we wrote about The Race to beat Google. In that article we discussed various approaches that ...
www.readwriteweb.com/archives/overview_of_clu.php

Use Cluster Search Engines To Plan Your Writing

Jan 20, 2007 ... Cluster style search engines give the information in a non linear format. Instead of the big G for your next web search, give these a spin ...
www.growyourwritingbusiness.com/?p=98

Yippy – Welcome to the Cloud.

search.yippy.com/

Search Engines with Cluster Technology

Search Engines with Cluster Technology, generating Groups of Search Results, optimized Navigation. Innovative Search Engine Technologies with Audio and ...
www.folden.info/searchengineclustertechnology.shtml

 

WebClust - Clustering Search Engine

WebClust is a metasearch engine based on a technology called document clustering: the automatic organization of documents into meaningful groups.
www.webclust.com/

 

Carrot2 Search Results Clustering Engine - Carrot2 Clustering Engine

Carrot2 Search Results Clustering Engine. Carrot2 organizes your search results into topics. With an instant overview of what's available, you will quickly ...
search.carrot2.org/stable/search

 

Visual and Clustering Search Engines

Another in the Bootcamp series of podcasts; these slides show examples.
www.slideshare.net/.../visual-and-clustering-search-engines

 

Mooter - Web Search

Clusters search results. It presents a diagram of themes within the results, from which the user can select one or all results. Options to search Australian ...
www.mooter.com/

 

iBoogie - MetaSearch Document Clustering Engine and Personalized ...

iBoogie MetaSearch Engine with automatic document clustering. ... Document clustering technology - Read all about clustering and meta search. ...
www.iboogie.tv/

 

What is clustering? » Federated Search Blog

Jan 22, 2008 ... Some search engines and some federated search engines provide clustering features. A very simplistic form of clustering is to group search...
federatedsearchblog.com/2008/01/22/what-is-clustering/

 

A Concept-Driven Algorithm for Clustering Search Results

Stanislaw Osinski and Dawid Weiss, Poznan University of Technology. Search engines rock! Right? Without search engines, the ...
dollar.biz.uiowa.edu/~nstreet/01439479.pdf
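
The entries above describe document clustering as automatically organizing results into meaningful groups. A toy version of that idea is sketched below: result snippets are compared by word overlap (Jaccard similarity) and each one joins the first group it resembles. This greedy approach, the stopword list, the threshold, and the sample snippets are all assumptions for illustration; it is not the algorithm used by Carrot2, WebClust, or any other engine listed here.

```python
import re

STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "with"}


def tokens(text):
    """Lowercase word set of a snippet, minus a few common stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Overlap between two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(snippets, threshold=0.2):
    """Greedy single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []   # each cluster is (seed_tokens, [snippets])
    for s in snippets:
        t = tokens(s)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((t, [s]))
    return [members for _, members in clusters]


# Hypothetical snippets for an ambiguous query.
snippets = [
    "jaguar car dealer prices",
    "jaguar cat habitat rainforest",
    "used jaguar car prices",
    "rainforest habitat of the jaguar cat",
]
groups = cluster(snippets)
for g in groups:
    print(g)
```

Even this crude grouping separates the two senses of an ambiguous query, which is the benefit clustering engines offer over a single flat result list.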

 

 

 

 


 

Attributes of information

  • Timeliness
  • Sufficiency - completeness
  • Level of Detail or Aggregation - are the data broken down into meaningful units?
  • Redundancy - not too much, but enough
  • Understandability
    • practicality
    • simplicity
    • minimization of perceptual errors
    • difficulty with encoding
  • Freedom from Bias
  • Reliability - is the information correct and verifiable?
  • Decision-Relevance - predictive power, significance
  • Cost-efficiency - consider the value of the change in decision behavior after obtaining the information minus the cost of obtaining it
  • Cost-effectiveness
  • Comparability
    • consistency of format
    • consistency of aggregation
    • consistency of fields
  • Quantifiability
  • Appropriateness of format, medium of display
    • ordering of the information
    • graphical vs. tabular display
  • Quantity: more is not better!
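
The cost-efficiency attribute in the list above can be read as simple arithmetic: the gain from the better decision that the information enables, minus what the information cost to obtain. A minimal sketch follows, with hypothetical figures and a made-up function name.

```python
def net_value_of_information(payoff_with_info, payoff_without_info, cost):
    """Gain from the better decision the information enables, minus its cost."""
    return (payoff_with_info - payoff_without_info) - cost


# Hypothetical figures: a report costing 500 improves the expected
# outcome of a purchase decision from 10,000 to 10,800.
value = net_value_of_information(10_800, 10_000, 500)
print(value)
```

If the result is negative, the information cost more than the improvement in the decision was worth, which is one concrete reading of "more is not better".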

How Google Evaluates Information - The Panda Express

Panda is a Google search algorithm and “... just one of roughly 500 search improvements we expect to roll out to search this year,” writes Google Fellow Amit Singhal on the Google Webmaster Central blog. “In fact, since we launched Panda, we’ve rolled out over a dozen additional tweaks to our ranking algorithms. Search is a complicated and evolving art and science, so rather than focusing on specific algorithmic tweaks, we encourage you to focus on delivering the best possible experience for users.”

The post lists questions one can use to assess the quality of a page or an article:

  1. Would you trust the information presented in this article?

  2. Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?

  3. Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?

  4. Would you be comfortable giving your credit card information to this site?

  5. Does this article have spelling, stylistic, or factual errors?

  6. Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?

  7. Does the article provide original content or information, original reporting, original research, or original analysis?

 

 

  8. Does the page provide substantial value when compared to other pages in search results?
  9. How much quality control is done on content?
  10. Does the article describe both sides of a story?
  11. Is the site a recognized authority on its topic?
  12. Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  13. Was the article edited well, or does it appear sloppy or hastily produced?
  14. For a health related query, would you trust information from this site?
  15. Would you recognize this site as an authoritative source when mentioned by name?
  16. Does this article provide a complete or comprehensive description of the topic?
  17. Does this article contain insightful analysis or interesting information that is beyond obvious?
  18. Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  19. Does this article have an excessive amount of ads that distract from or interfere with the main content?
  20. Would you expect to see this article in a printed magazine, encyclopedia or book?
  21. Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  22. Are the pages produced with great care and attention to detail vs. less attention to detail?
  23. Would users complain when they see pages from this site?

 



MOBILE SEARCH ENGINES


Google Mobile | Yahoo Mobile

 

 

 

 

 

 

 

 

   

 

 



MLA Style
"Page_title.”  @WEBO, year

@WEBO, day, month,  year
<http://www.atwebo.com/page.htm>

APA Style
Page_title. (year)

Retrieved day, month,  year, from http://www.atwebo.com/page.htm

Link to this page:

<a href=http://www.atwebo.com/page_.htm>page_title</a>

 

Send mail to webperson@atwebo.com with questions or comments about this web site.
Copyright © 2001-2011 @WEBO: Increasing Social Capital - Thought leadership, best business practices and innovation in information technology outsourcing
Last modified: May 17, 2013