Posted by: atri | September 19, 2007

## Project 11: Private information retrieval schemes and locally decodable codes

Say you want to query an online database but do not want the database (or the server holding the database) to glean any information about your query. To simplify the problem a bit, assume that the database is a vector in $\{0,1\}^n$ and your query just asks for the value at the $i$th position in the string. One trivial way to do this will be for the server to return you the full $n$ bit string. The natural question to ask is whether one can do better?

Unfortunately, it turns out that if there is just one copy of the database then this is the best that one can do. However, if there are $k\ge 2$ copies of database at $k$ different servers and you can query all the $k$ copies, then the total amount of information that the servers need to send you (without anyone of them getting to know what your query was) can be sub-linear. This is known as the Private Information Retrieval (or PIR) problem.

PIR is also closely related to locally decodable codes, which are codes that have decoders with the following very localized property. Any symbol in the codeword that needs to be output can be recovered by looking at very small number (ideally a constant independent of the block length of the code) of symbols in the received word.

Thus, given $k\ge 2$, the upper bound problem is to find a PIR scheme in which the servers have to return as few bits as possible to you. There has been a recent break through work in this direction– see the paper titled Towards 3-Query Locally Decodable Codes of Subexponential Length (STOC07) by Yekhanin. The lower bound problem is to prove lower bounds on the minimum number of bits that the $k$ servers need to communicate to you. Unfortunately, not much is known about general lower bounds yet. One student can survey the upper bound results while another can survey the lower bound results.

(Update- 4/27/09) Yekhanin’s result has been improved recently by Efremenko.

A great resource for this problem is Bill Gasatrch‘s webpage on Private Information Retrieval, which also covers additional versions of the PIR problem that has not been discussed in this entry.

This was the last entry on project descriptions: now it’s your turn to get to work 🙂