Security Concepts
Introduction:
HAVP (HTTP AntiVirus Proxy) is a proxy with an anti-virus filter. It does not cache or filter content. At the moment all traffic is scanned, because nearly every file type can carry malicious code, e.g. HTML with JavaScript, or JPEG. But writing an HTTP anti-virus proxy is a real dilemma! Huge downloads are a problem for virus-scanning proxies: a client should not receive data that the virus scanner has not checked, but big downloads should not time out either.
In 2005 I read about and tested the common techniques and did not like the idea of an intermediary info web page for large downloads. So I decided to write my own proxy daemon for Linux.
There is a nice anti-virus solution for the DansGuardian content filter where I found some good ideas. I also found some information at OpenAntiVirus, and on proxies at tinyproxy and Squid.
Design:
HAVP writes the data received from the server to a temporary file and hard locks the end of that file, i.e. the part that has not been downloaded yet. Only the HTTP body is copied to the temp file; the HTTP header is not. A second process begins scanning the data as it is written, so the anti-virus scanner can start while the HTTP response is still being downloaded. In the meantime, data is sent to the client after a configurable period. All data? No: until the anti-virus scan is complete, only some bytes are sent at a configurable time interval. You can also define an amount of data that is always held back until scanning is complete. With this method, any anti-virus software with an efficient API can be used.
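The release rule described above can be sketched in a few lines of Python. This is only an illustration, not HAVP's actual code: the function name and the parameters `hold_back` and `trickle_bytes` are hypothetical, and the sketch assumes that the held-back amount is measured from the end of the data received so far.

```python
def bytes_releasable(received: int, already_sent: int, scan_done: bool,
                     hold_back: int, trickle_bytes: int) -> int:
    """Return how many more bytes may go to the client in one send interval.

    All names are hypothetical; this is a model of the idea, not HAVP code.
    """
    if scan_done:
        # Scan finished without a finding: everything still pending may be sent.
        return received - already_sent

    # While scanning, the last `hold_back` bytes received are never sent
    # (assumption: the held-back data is counted from the end).
    limit = max(received - hold_back, 0)
    available = max(limit - already_sent, 0)

    # Only a small trickle is released per interval to keep the client alive.
    return min(available, trickle_bytes)


if __name__ == "__main__":
    # 1000 bytes received, 400 held back, 100 bytes per interval.
    print(bytes_releasable(1000, 0, False, 400, 100))
    # Once the scan is done, the remainder is released at once.
    print(bytes_releasable(1000, 100, True, 400, 100))
```

With this model, a file that is smaller than `hold_back` never releases a single byte before the scan is complete.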
Advantages:
Scanning starts simultaneously with the download. Compressed files are a problem, however, because they are only extracted during download.
Disadvantages:
If the scanning process is too slow and the file is larger than the defined “hold back data”, you can still receive a virus! If the file contains a virus and is bigger than the “hold back data”, the download will be cancelled with no warning.
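The two cases can be written down as a small decision function. This is a simplified sketch with hypothetical names. It assumes that a file no larger than the held-back amount is withheld completely until the scan is done, which is an inference from the design description and not a statement about HAVP's real behaviour.

```python
def client_outcome(file_size: int, hold_back: int, infected: bool) -> str:
    """Classify what the client sees for one download (hypothetical model)."""
    if not infected:
        return "delivered"
    if file_size <= hold_back:
        # Assumption: nothing was sent yet, so the whole file can be blocked.
        return "blocked before any data was sent"
    # Part of the file may already be at the client; the transfer just stops.
    return "cancelled mid-transfer, part of the file may have been received"


if __name__ == "__main__":
    for size in (100_000, 5_000_000):
        print(size, "->", client_outcome(size, hold_back=200_000, infected=True))
```

The sketch leaves out scan speed: in the third case, how much of the file has already reached the client depends on how far the scanner lags behind the download.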