We are not in a century that has thinking, dreaming machines. We have statistical models, trained on massive amounts of human-generated input, that correlate the forms of text and the forms of images. That's why you get funny numbers of fingers and a cook pouring ingredients into his arm.
There are no cognitive objects here, no model of a "hand", "person" or "pizza".