After some thought, I decided that I didn't like the idea of using Lojban for knowledge representation. After all, it's based on first-order logic, and would therefore have all the same benefits and drawbacks. However, as it is easier to parse than a full natural language (ignoring subsets of, say, English), it would be useful for a 'natural' language interface with the user. The Lojban text would simply have to be translated into something else, something that does a more complete job of representing and processing knowledge.
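To make the "translated into something else" idea concrete, here is a toy sketch of mapping a simple Lojban bridi (predication) onto first-order-logic-style notation. This is purely illustrative: real Lojban parsing (e.g. via the camxes grammar) is far more involved, and the tiny lexicon below is an invented placeholder covering only the "x1 selbri x2" word order.

```python
# Hypothetical mini-lexicon: selbri are predicate words, sumti are arguments.
SELBRI = {"prami": "loves", "klama": "goes_to"}
SUMTI = {"mi": "I", "do": "you", "ti": "this"}

def bridi_to_predicate(sentence: str) -> str:
    """Translate a simple x1-selbri-x2 bridi into predicate notation."""
    predicate, args = None, []
    for word in sentence.split():
        if word in SELBRI:
            predicate = SELBRI[word]
        elif word in SUMTI:
            args.append(SUMTI[word])
        else:
            raise ValueError(f"unknown word: {word}")
    if predicate is None:
        raise ValueError("no selbri found")
    return f"{predicate}({', '.join(args)})"

print(bridi_to_predicate("mi prami do"))  # loves(I, you)
```

Because Lojban's grammar is unambiguous by design, even a sketch like this can commit to a single parse; the hard work would be on the other side of the arrow, in whatever the predicate forms get translated into.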
Then there are the other senses. One of my biggest complaints about Searle's Chinese Room is that his counterargument to the use of vision, audio, etc. required that the vision only be of symbols (the same as the original input) instead of a picture. (For the record, my other, and probably larger, complaint is that the brain processes real-world data with on-and-off-again neurons, making it just as symbolic as his Room. Yet we can 'understand' the world.) If a program can process Lojban (through speech or text), 'look' at data through an advanced vision system (advanced in that it can handle the same or similar basic preprocessing as the human visual system), and link the two together in its knowledge representation system, then why can't understanding follow? Granted, there are likely steps and processes missing from that hypothetical system, but those can be worked out in time.
For a project like the one I'm describing, I would need an interface (text and/or audio) that processes Lojban, a complex computer vision system, a 'complete' knowledge representation system (complete in that it can handle any kind of knowledge, at any level of abstraction), a reasoning/problem solving/creativity system to process the knowledge, and an output to respond to the user (audio/visual/text). Stir until it thickens, serve while warm.
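The recipe above can be wired together in code, if only as a skeleton. Everything below is an invented placeholder, not a real API: the point is just that the language interface and the vision system both feed one shared knowledge store, which the response step then queries.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """'Complete' knowledge store: facts at any level of abstraction (here, strings)."""
    facts: list = field(default_factory=list)

    def add(self, fact: str) -> None:
        self.facts.append(fact)

    def query(self, keyword: str) -> list:
        return [f for f in self.facts if keyword in f]

def lojban_interface(utterance: str) -> str:
    # Placeholder: would parse Lojban text/speech into an internal representation.
    return f"parsed:{utterance}"

def vision_system(image_id: str) -> str:
    # Placeholder: would reduce raw pixels to the same internal representation.
    return f"seen:{image_id}"

def respond(kb: KnowledgeBase, topic: str) -> str:
    # Placeholder for the reasoning/problem-solving step: retrieve related facts.
    matches = kb.query(topic)
    return "; ".join(matches) if matches else "no relevant knowledge"

kb = KnowledgeBase()
kb.add(lojban_interface("le mlatu cu pinxe"))  # linguistic input ("the cat drinks")
kb.add(vision_system("cat_photo"))             # visual input
print(respond(kb, "cat"))                      # seen:cat_photo
```

The interesting (and unsolved) part is hidden in the comments: getting linguistic and visual input into a genuinely shared representation so that "cat" the word and the cat in the photo land on the same node.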
Sounds like a life's work to create all of these systems, make them work together, and teach it.