URL Filtering using cProfile
Rumi, January 14, 2012

cProfiles: Real-Time Website Profiler

cProfiles gives SafeSquid users a much-needed mechanism for classifying websites into one or more categories. It is simple to use, yet leaves security managers plenty of room to handle challenges inventively. Over 3 million websites have been classified into categories such as news, webmail, adult, porn, and arts. Policy makers can create rules that check whether a website belongs to one or more categories and "ADD PROFILE" of their choice, say "NOT_BUSINESS". That profile can then be used in any of SafeSquid's other sections, such as URL Filter, MIME Filter, or Cookie Filter, to allow or deny the transaction as per enterprise policy.

Categories:
Ads, Adult, Adult Education, Arts, Chat, Drugs, Education, Fileshare, Finance, Gambling, Games, Government, Hacking, Hate, Highrisk, Housekeeping, Instant Messaging, Jobs, Leisure, Mail, Multimedia, News, Porn, Proxy, Search Engines, Shopping, Social, Sports, System Utilities, Travel, Business

How cProfiles works

Policy makers can configure cProfiles to "add a profile" to a request for any website that is listed under one or more categories. Whenever a user requests a website, the cProfiles module checks whether the website is listed under the specified categories. It first checks its cache for an entry. If the entry is found in the cache, cProfiles adds the profile to the request instantly. If the entry is not found, the cProfiles module sends a query to SafeSquid's Content Categorization Service (CCS).

cProfiles uses DNS technology to query the CCS. This naturally updates all the caching nameservers en route, so even if you restart SafeSquid, the resolutions will be retrieved quickly from the nearest DNS provider. Unlike legacy technologies that force users to store huge databases, cProfiles caches only "really visited" websites and therefore uses very few system resources.
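
The cache-first lookup described above can be sketched roughly as follows. This is only an illustration, not SafeSquid's implementation: the post does not describe the format of the DNS query sent to the CCS, so the remote lookup is represented by a function passed in by the caller. The names `CategoryCache`, `profiles_for`, and `fake_ccs`, and the `.example` hostnames, are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Mapping, Set

# Assumption: a resolver is any function that maps a hostname to category names.
# In the real product this step is a DNS query to the CCS; its format is not
# described in the post, so it is left abstract here.
Resolver = Callable[[str], Iterable[str]]


class CategoryCache:
    """Cache-first category lookup that stores only hosts actually requested."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._cache: Dict[str, FrozenSet[str]] = {}
        self.remote_queries = 0  # number of cache misses, kept for illustration

    def categories(self, host: str) -> FrozenSet[str]:
        # Normalize so "Example.COM." and "example.com" share one cache entry.
        host = host.lower().rstrip(".")
        if host not in self._cache:
            # Cache miss: ask the remote service once and remember the answer.
            self.remote_queries += 1
            self._cache[host] = frozenset(self._resolver(host))
        return self._cache[host]


def profiles_for(
    host: str, cache: CategoryCache, rules: Mapping[str, Iterable[str]]
) -> Set[str]:
    """Return every profile whose rule shares at least one category with the host."""
    found = cache.categories(host)
    return {profile for profile, wanted in rules.items() if found & set(wanted)}


def fake_ccs(host: str) -> Iterable[str]:
    """Stand-in for the CCS, using a tiny hard-coded table (hypothetical data)."""
    table = {
        "news.example": {"News"},
        "games.example": {"Games", "Leisure"},
    }
    return table.get(host, set())


if __name__ == "__main__":
    cache = CategoryCache(fake_ccs)
    rules = {"NOT_BUSINESS": {"Games", "Porn", "Gambling"}}
    print(profiles_for("games.example", cache, rules))
    print(cache.remote_queries)
```

In this sketch a second request for the same host is answered from the local dictionary without calling the resolver again, which mirrors the idea that only "really visited" websites occupy memory.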
Since the categorization happens in real time, users do not have to regularly download updates to keep their database up to date.

The CCS was initially seeded with a little over 3 million websites. It has been built with a unique self-learning technology that lets it build a list of websites that must be categorized for the benefit of its users. CCS then automatically creates the "suggested classifications" for these websites in real time. These results are then validated by human editors on an hourly basis, allowing the data to be used by real users right away.

To learn how to use cProfiles with SafeSquid, see the cProfiles Documentation.

Key Benefits

On-the-fly categorization of websites
Low system resource utilization
Caches only "really visited" website scores
No large database download
Real-time updates
Flexibility to take a variety of actions on positive matches, instead of just allowing or denying
Analysis of most visited categories

Key Highlights

Enhances the content filtering capabilities of SafeSquid.
Real-time content classification of over 3 million websites.
Revolutionary technology ensures the world's smallest footprint for such a huge website database.
Fully automatic and intelligent database updates.

Raison d'être

cProfiles is extremely useful in optimizing SafeSquid's overall resource utilization. It greatly reduces the overall footprint of the application.

cProfiles seeks to provide a better alternative to the legacy technologies. Legacy technologies that provide categorized website databases require at least a few hundred megabytes of disk storage. This naturally increases physical memory utilization too, because such databases must be loaded "entirely" into memory for speed. These technologies therefore also demand maintenance processes that periodically download hundreds of megabytes of data to pick up new websites.
Legacy technologies have traditionally suffered from a "deficiency in customization". For example, every user in Germany had no choice but to fetch updates even for Chinese-language websites, even if none of the real users ever visited such sites. And if the administration thought it was being smart by NOT fetching updates for one or more specific categories, it had to bear the consequences when a real user actually visited a site that would have been identified through those categories.

cProfiles was therefore designed to avoid this "obesity". It ensures that categorizations for only "really visited" websites are held on the system's physical resources. Updates are always incremental and performed in real time, when a particular website is accessed for the first time. This update introduces a latency of less than 300 milliseconds.

Coupled with cProfiles, SafeSquid should be better suited to running as an embedded application in low-cost routers and other similar appliances.