There is a lot of buzz these days about not using locks and using Message passing approaches like Erlang. Or about using immutable datastructures like in Functional programming
For correctness, shared is the way to go, and keep the data as normalized as possible. For immediacy, send messages to inform of changes, but always back them up with polling. Messages get dropped, duplicated, re-ordered, delayed - don't rely on them.
If speed is what you're worried about, first do it single-thread and tune the daylights out of it. Then if you've got multiple cores and know how to split up the work, use parallelism.
Real-world systems are always hybrids anyway: I don't believe the modern paradigms try, in practice, to get rid of mutable data and shared state.
The objective, however, is not to need concurrent access to this shared state. Programs can be divided into the concurrent and the sequential, and use message-passing and the new paradigms for the concurrent parts.
Not every code will get the same investment: There is concern that threads are fundamentally "considered harmful". Something like Apache may need traditional concurrent threads and a key piece of technology like that may be carefully refined over a period of years so it can blast away with fully concurrent shared state. Operating system kernels are another example where "solve the problem no matter how expensive it is" may make sense.
There is no benefit to fast-but-broken: But for new code, or code that doesn't get so much attention, it may be the case that it simply isn't thread-safe, and it will not handle true concurrency, and so the relative "efficiency" is irrelevant. One way works, and one way doesn't.
Don't forget testability: Also, what value can you place on testing? Thread-based shared-memory concurrency is simply not testable. Message-passing concurrency is. So now you have the situation where you can test one paradigm but not the other. So, what is the value in knowing that the code has been tested? The danger in not even knowing if the other code will work in every situation?
There are some implicit assumption in your questions - you assume that all the data can fit on one machine and that the application is intrinsically localised to one place.
What happens if the application is so large it cannot fit on one machine? What happens if the application outgrows one machine?
You don't want to have one way to program an application if it fits on one machine and a completely different way of programming it as soon as it outgrows one machine.
What happens if you want make a fault-tolerant application? To make something fault-tolerant you need at least two physically separated machines and no sharing. When you talk about sharing and data bases you omit to mention that things like mySQL cluster achieve fault-tolerence precisely by maintaining synchronised copies of the data in physically separated machines - there is a lot of message passing and copying that you don't see on the surface - Erlang just exposes this.
The way you program should not suddenly change to accommodate fault-tolerance and scalability.
Erlang was designed primarily for building fault-tolerant applications.
Shared data on a multi-core has it's own set of problems - when you access shared data you need to acquire a lock - if you use a global lock (the easiest approach) you can end up stopping all the cores while you access the shared data. Shared data access on a multicore can be problematic due to caching problems, if the cores have local data caches then accessing "far away" data (in some other processors cache) can be very expensive.
Many problems are intrinsically distributed and the data is never available in one place at the same time so - these kind of problems fit well with the Erlang way of thinking.
In a distributed setting "guaranteeing message delivery" is impossible - the destination machine might have crashed. Erlang cannot thus guarantee message delivery - it takes a different approach - the system will tell you if it failed to deliver a message (but only if you have used the link mechanism) - then you can write you own custom error recovery.)
For pure number crunching Erlang is not appropriate - but in a hybrid system Erlang is good at managing how computations get distributed to available processors, so we see a lot of systems where Erlang manages the distribution and fault-tolerent aspects of the problem, but the problem itself is solved in a different language.
and other languages are used
A few comments on the misunderstanding you have of Erlang:
Your facts are not facts about Erlang. Even if you think Erlang programming is a pain, you will find other people create some awesome software thanks to it. You should attempt writing an IRC server in Erlang, or something else very concurrent. Even if you're never going to use Erlang again, you would have learned to think about concurrency another way. But of course, you will, because Erlang is awesome easy.
Those that do not understand Erlang are doomed to re-implement it badly.
Okay, the original was about Lisp, but... its true!
For e.g. in a DB, you have to access and modify the same record
But that is handled by the DB. As a user of the database, you simply execute your query, and the database ensures it is executed in isolation.
As for performance, one of the most important things about eliminating shared state is that it enables new optimizations. Shared state is not particularly efficient. You get cores fighting over the same cache lines, and data has to be written through to memory where it could otherwise stay in a register or in CPU cache.
Many compiler optimizations rely on absence of side effects and shared state as well.
You could say that a stricter language guaranteeing these things requires more optimizations to be performant than something like C, but it also makes these optimizations much much easier for the compiler to implement.
Many concerns similar to concurrency issues arise in singlethreaded code. Modern CPUs are pipelined, execute instructions out of order, and can run 3-4 of them per cycle. So even in a single-threaded program, it is vital that the compiler and CPU is able to determine which instructions can be interleaved and executed in parallel.