Did Vicarious Solve the Cocktail Party Problem?
There is something a little weird about Vicarious's recent announcement, in which they claim to have developed a machine learning program that can solve CAPTCHAs with 90% accuracy. What is interesting, from my vantage point, is that some CAPTCHAs print irrelevant words behind the actual text to be recognized. PayPal, for example, displays copies of the word 'PayPal' behind the CAPTCHA text. Take a look at this CAPTCHA, which Vicarious claims to be able to solve.
Quick and dirty techniques such as thresholding can be used to reduce or eliminate background noise, but that would be cheating. Is this what Vicarious is using here? I don't know, but it would seem that thresholding would not work very well in this case unless you knew in advance what to look for. This means that, if we are to believe their claim, Vicarious's AI program is sophisticated enough to focus on certain things while ignoring others. Wow. Really? Are we to understand that Vicarious solved a visual analog of the cocktail party problem, which is essentially the ability to pay attention to one object within a group of many others? If the answer is yes, it would be a monumental breakthrough, because this is one of the hardest unsolved problems in computer science. Even so, the question becomes: how can the program tell which letters in the picture are relevant and which are not? There is something either fishy or missing in this story.
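
For readers who have not seen thresholding, here is a minimal sketch of what I mean. It is my own toy illustration, not anything Vicarious has described: the tiny image, the pixel values, and the function name `threshold` are all made up, and I am assuming dark foreground characters over a lighter watermark.

```python
def threshold(image, cutoff):
    """Return a binary mask: 1 where a pixel is darker than cutoff, else 0."""
    return [[1 if pixel < cutoff else 0 for pixel in row] for row in image]


# Hypothetical grayscale image (0 = black, 255 = white).
# 30  = dark foreground character (the text to be recognized)
# 150 = lighter background watermark (the irrelevant word)
# 255 = blank paper
image = [
    [255, 30, 150, 150, 30, 255],
    [255, 30, 30, 150, 30, 255],
    [255, 150, 150, 30, 30, 255],
]

# A cutoff chosen between the two gray levels keeps only the foreground.
good = threshold(image, 100)

# A cutoff chosen without that knowledge lets the watermark through too.
bad = threshold(image, 200)

for row in good:
    print(row)
print()
for row in bad:
    print(row)
```

The first mask isolates the foreground only because the cutoff of 100 was picked knowing that the watermark is lighter than the text. With a cutoff of 200, the watermark pixels end up in the mask as well, which is exactly the "you need to know in advance what to look for" problem.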