This field is known as machine listening.
Polyphonic transcription of digitally encoded music is one of the holy grails of machine listening. It is an unsolved problem, and an area of active research. The sub-fields include:
- Onset detection
- Beat extraction (detection of the metric structure, time sig, etc)
- Pitch detection (possible using auto-correllation, and other methods, on monophonic signals, but an unsolved problem when applied to complex polyphonic music)
- Key detection (key signature detection).
Depending on the nature of your project, you might find it useful to explore the SuperCollider programming environment. SC is a language designed for projects such as this, already has a large number of machine listening plugins (ugens), and a comprehensive framework for dealing with FFT, audio signals, and much more.