Bots!

URL Filtering using cProfiles

Rumi, January 14, 2012

cProfiles: Real-Time Website Profiler

 
cProfiles gives SafeSquid users a much-needed mechanism for classifying web-sites into one or more categories. Usage is simple, yet it leaves security managers plenty of room to handle challenges inventively.
Over 3 million web-sites have been classified in a variety of categories such as news, webmail, adult, porn, and arts. Policy makers can create rules that determine whether a web-site belongs to one or more categories and "ADD PROFILE" of their choice, say "NOT_BUSINESS". This profile can then be used in any of SafeSquid's other sections, such as URL Filter, MIME Filter, or Cookie Filter, to allow or deny the transaction as per enterprise policy.
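The idea of mapping categories to a profile, and then letting another section act on that profile, can be sketched roughly as follows. This is only an illustration, not SafeSquid's configuration syntax or API: the names `ProfileRule`, `apply_rules`, and `url_filter_allows`, and the "RISKY" profile, are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class ProfileRule:
    """Hypothetical rule: if a site is in any of `categories`, add `profile`."""
    categories: FrozenSet[str]
    profile: str


def apply_rules(site_categories: Iterable[str], rules: List[ProfileRule]) -> Set[str]:
    """Return every profile whose rule matches at least one of the site's categories."""
    cats = {c.lower() for c in site_categories}
    return {rule.profile for rule in rules if cats & rule.categories}


def url_filter_allows(profiles: Set[str], denied_profiles: Set[str]) -> bool:
    """Stand-in for a later section (e.g. a URL filter) that denies on certain profiles."""
    return not (profiles & denied_profiles)


# Assumed example policy: leisure-type categories are tagged NOT_BUSINESS.
rules = [
    ProfileRule(frozenset({"games", "gambling", "social"}), "NOT_BUSINESS"),
    ProfileRule(frozenset({"highrisk", "hacking"}), "RISKY"),
]

profiles = apply_rules(["Games", "News"], rules)
allowed = url_filter_allows(profiles, denied_profiles={"NOT_BUSINESS"})
print(profiles, allowed)
```

Keeping the categorization step separate from the allow/deny decision is what lets one profile be reused by several filters.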
 
Categories:
  • Ads
  • Adult
  • Adult Education
  • Arts
  • Chat
  • Drugs
  • Education
  • Fileshare
  • Finance
  • Gambling
  • Games
  • Government
  • Hacking
  • Hate
  • Highrisk
  • Housekeeping
  • Instant Messaging
  • Jobs
  • Leisure
  • Mail
  • Multimedia
  • News
  • Porn
  • Proxy
  • Search Engines
  • Shopping
  • Social
  • Sports
  • System Utilities
  • Travel
  • Business

How cProfiles works

Policy makers can configure cProfiles to "add a profile" to a request for any website that is listed under one or more categories. Whenever a user requests a website, the cProfiles module checks whether the website is listed under the specified categories. It first checks its cache for an entry. If the entry is found in the cache, cProfiles adds the profile to the request instantly. If the entry is not found in the cache, the cProfiles module sends a query to SafeSquid's Content Categorization Service (CCS). cProfiles uses DNS technology to query the CCS, which naturally updates all the caching nameservers en route. So even if you restart SafeSquid, the resolutions will be quickly retrieved from the nearest DNS provider.
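The cache-first lookup described above can be sketched as below. This is a minimal sketch under stated assumptions, not the actual cProfiles implementation: the format of the DNS query to the CCS is not described here, so the remote lookup is represented by a pluggable function, and `CategoryCache`, `fake_ccs`, and the sample hostnames are hypothetical.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical stand-in for the DNS query sent to the CCS.
Resolver = Callable[[str], FrozenSet[str]]


class CategoryCache:
    """Cache-first lookup: only hosts that were actually requested are stored."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._cache: Dict[str, FrozenSet[str]] = {}
        self.remote_queries = 0  # counts lookups that had to leave the cache

    def categories_for(self, host: str) -> FrozenSet[str]:
        key = host.lower()
        if key in self._cache:          # cache hit: answer immediately
            return self._cache[key]
        self.remote_queries += 1        # cache miss: ask the remote service once
        categories = self._resolver(key)
        self._cache[key] = categories
        return categories

    def cached_hosts(self) -> int:
        return len(self._cache)


def fake_ccs(host: str) -> FrozenSet[str]:
    """Assumed sample data; a real deployment would query the CCS instead."""
    sample = {
        "news.example.com": frozenset({"news"}),
        "mail.example.com": frozenset({"mail"}),
    }
    return sample.get(host, frozenset())


cache = CategoryCache(fake_ccs)
first = cache.categories_for("News.Example.com")   # miss, goes to the resolver
second = cache.categories_for("news.example.com")  # hit, served from the cache
print(first, cache.remote_queries, cache.cached_hosts())
```

In this sketch the cache grows only with sites that users actually request, which is the property the text attributes to cProfiles.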
Unlike legacy technologies that force users to store huge databases, cProfiles caches only "really visited" websites and therefore uses very few system resources. Since the categorization happens in real time, users do not have to download updates regularly to keep their database up to date.
The CCS was initially seeded with a little over 3 million web-sites. It has been built with a unique self-learning technology that allows it to build a list of web-sites that must be categorized for the benefit of its users. The CCS then automatically creates the "suggested classifications" for these web-sites in real time. These results are validated by human editors on an hourly basis, allowing the data to be instantly usable by real users.
To learn how to use cProfiles with SafeSquid, see the cProfiles Documentation.
 
Key Benefits
  • On-the-fly categorization of websites
  • Low system resource utilization
  • Caches only "really visited" website scores
  • No large database download
  • Real-time updates
  • Flexibility to take a variety of actions on positive matches, instead of just allowing or denying
  • Analysis of most visited categories
Key Highlights
  • Enhances content filtering capabilities of SafeSquid.
  • Real-time content classification of over 3 Million web-sites.
  • Revolutionary technology that ensures the world's smallest footprint for such a huge web-site database.
  • Fully automatic and intelligent database updates.
Raison d'être
cProfiles is extremely useful in optimizing SafeSquid's overall resource utilization. It greatly reduces the overall footprint of the application.
cProfiles seeks to provide a better alternative to the legacy technologies. Legacy technologies that provide categorized web-site databases require at least a few hundred megabytes of disk storage. This naturally increases physical memory utilization too, because such databases must be loaded "entirely" into memory for speed. These technologies also demand maintenance processes that periodically download hundreds of megabytes of data to cover new web-sites. Legacy technologies traditionally suffer from a "deficiency in customization". For example, each user in Germany had no choice but to fetch updates even for Chinese-language web-sites, even if none of the real users ever visited such sites. And if the administration thought it was being smart by NOT fetching updates for one or more specific categories, it had to bear the consequences when a real user actually visited a site that would have been identified via those categories. cProfiles was therefore designed to avoid this "obesity".
cProfiles ensures that categorizations for only "really visited" web-sites are held on the system's physical resources. The updates are always incremental and are performed in real time, when a particular web-site is accessed for the first time. This update introduces a latency of less than 300 milliseconds.
SafeSquid, when coupled with cProfiles, should be better suited to running as an embedded application in low-cost routers and other similar appliances.
Application proxy squid urlfilter

Myself…

Hi, I am Hasan T. Emdad Rumi, an IT Project Manager & Consultant, Virtualization & Cloud Savvy, from Dhaka, Bangladesh. I have prior experience in managing numerous local and international projects in the areas of Telco VAS & NMC, National Data Center & PKI National Root and CA Infrastructure. I have also been engaged with several offshore software development teams.

I have worked with Orascom Telecom-Banglalink, Network Elites as VAS partner, BTRC, BTT (Turkey), Mango Teleservices Limited and Access to Information (A2I-UNDP).

Currently working at Oracle Corporation as Principal Technology Solution and Cloud Architect.

You can reach me [h.t.emdad at gmail.com] and I will be delighted to exchange my views.
