Panda Security recently released (in beta form) what it claims is the first cloud-based anti-virus / anti-malware solution for Windows PCs. Not only does it sound like a clever tool for data loss prevention, but it demonstrates another way in which information service providers can aggregate individual user data to develop classifications or benchmarks valuable to every user, a mechanism I’ve explored in previous blog posts.
In essence, every computer using Panda’s Cloud Antivirus is networked together through Panda’s server to form a “collective intelligence” for malware detection and prevention. Here’s how it works: users download and install Panda’s software, a small application known as an “agent” (small because the heavy lifting is done on Panda’s server). These agents send reports back to the Panda server containing information about new files (and, I presume, related computer activity which might indicate the presence of malware). When the server receives reports about previously unknown files that, according to the logic of the classification engine, resemble files already known to be malware, it classifies the new files as threats without waiting for manual review by human security experts.
For example, imagine a new virus is released onto the net by its creators. People surfing the net, opening emails, and inserting digital media start downloading this new file, which can’t be identified as a virus by traditional anti-virus software because it hasn’t been placed in any virus definitions list yet. Computers on which the Panda agent has been installed begin sending reports about the new file back to the Panda server. After some number of reports about the file are received by Panda’s server, the server is able to determine that the new file should be treated as a virus. At this point all computers in the Panda customer network are preemptively warned about the virus, even though it has only just appeared.
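The reporting flow in this example can be sketched in a few lines of Python. This is only an illustration of the idea, not Panda’s actual implementation: the class name `CollectiveServer`, the `suspicious` flag sent by each agent, and the threshold of three distinct reporting agents are all assumptions made up for the sketch.

```python
from collections import defaultdict

# Hypothetical threshold: how many distinct agents must flag a file
# before the server treats it as malware. Not a figure from Panda.
REPORT_THRESHOLD = 3


class CollectiveServer:
    """Toy stand-in for the central server that pools agent reports."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self.reports = defaultdict(set)  # file hash -> ids of agents that flagged it
        self.known_malware = set()

    def receive_report(self, agent_id, file_hash, suspicious):
        """Record one agent report; return True if the file is now classified as malware."""
        if file_hash in self.known_malware:
            return True
        if suspicious:
            # A set ignores repeat reports from the same agent.
            self.reports[file_hash].add(agent_id)
        if len(self.reports[file_hash]) >= self.threshold:
            self.known_malware.add(file_hash)
            return True
        return False

    def is_malware(self, file_hash):
        """What any agent in the network would be told about this file."""
        return file_hash in self.known_malware
```

Once enough distinct agents have flagged the same file hash, every later call to `is_malware` returns `True`, which is the "preemptive warning" to the rest of the network. A real system would presumably weigh file resemblance and behavior rather than a simple count.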
According to Panda’s April 29, 2009 press release:
Utilizing Panda’s proprietary cloud computing technology called Collective Intelligence, Panda Cloud Antivirus harnesses the knowledge of Panda’s global community of millions of users to automatically identify and classify new malware strains in almost real-time. Each new file received by Collective Intelligence is automatically classified in under six minutes. Collective Intelligence servers automatically receive and classify over 50,000 new samples every day. In addition, Panda’s Collective Intelligence system correlates malware information data collected from each PC to continually improve protection for the community of users.
Because Panda’s solution is cloud-based and free to consumers, it will reside on a large number of different computers and networks worldwide. This is how Panda’s cloud solution is able to fill a dual role as both sampling and classification engine for virus activity. On the one hand Panda serves as manager of a communal knowledge pool that benefits all consumers participating in the free service. On the other hand, Panda can sell the malware detection knowledge it gains to corporate customers – wherein lies the revenue model that pays for the free service.
I have friends working in two unrelated startups, one concerning business financial data and the other enterprise application deployment ROI, that both work along similar lines (although neither is free to consumers). Both startups offer a combination of analytics for each customer’s data plus access to benchmarks established by anonymously aggregating data across customers.
Panda’s cloud analytics, aggregation and classification mechanism is also analogous to the non-boolean document categorization software for eDiscovery discussed in previous posts in this blog, whereby unreviewed documents can be automatically (and thus inexpensively) classified for responsiveness and privilege:
- The Evolution of eDiscovery Analytics, Part II: A Conversation with Nicholas Croce
- The Evolution of eDiscovery Analytics, Part I: Trusting Analytics
- Understanding the “about” of documents using content-based clustering
Deeper, even more powerful extensions of this principle are also possible. I anticipate that we will soon see software which will automatically classify all of an organization’s documents as they are created or received, including documents residing on employees’ laptops and mobile devices. Using Panda-like classification logic, new documents will be classified accurately whether or not they are an exact match with anything previously known to the classification system. This will substantially improve implementation speed and accuracy for search, access control and collaboration, document deletion and preservation, endpoint protection, storage tiering, and all other IT, legal and business information management policies.
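To make "classified whether or not they are an exact match" concrete, here is a minimal sketch of similarity-based classification. It assumes a deliberately simple technique, word-overlap (Jaccard) similarity against already-labeled documents, which is my stand-in and not the method used by Panda or by any eDiscovery product. The function names and the `min_similarity` cutoff of 0.3 are hypothetical.

```python
def tokens(text):
    """Reduce a document to its set of lowercase words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(new_doc, labeled_docs, min_similarity=0.3):
    """Return the label of the most similar known document,
    or None if nothing known is close enough (hypothetical cutoff)."""
    new_tokens = tokens(new_doc)
    best_label, best_score = None, 0.0
    for text, label in labeled_docs:
        score = jaccard(new_tokens, tokens(text))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_similarity else None


# Usage: the new document matches no known document exactly,
# but it overlaps heavily with one labeled "privileged".
known = [
    ("attorney client privileged legal advice memo", "privileged"),
    ("quarterly sales forecast spreadsheet summary", "business"),
]
label = classify("privileged legal advice from attorney", known)
```

Documents that resemble nothing in the labeled set come back as `None`, which is where human review would still be needed.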