I tuned in late, so I'll sound off on multiple topics in one over-long post. First off, GeoPaveTraffic is right about the inefficiency of piles for slope reinforcement, particularly if they have to bridge a very thick layer, because the bending moments in the piles get very large. If you had just a single plane of weakness (say, a soft clay layer within the dense sand and gravel), the pile would only need to provide shear reinforcement and might work well.
A few years back, the Corps built a seismic rehab for a dam in MS (not far enough from the New Madrid Fault) and used piles (which were really beams installed like piles) to span a fairly thin weak layer of liquefiable silt. They did a boatload of 3D numerical analysis to verify the concept (especially transfer of force and moment from beam to soil), and they had to use an awful lot of resteel in the beams, which were not off the shelf. Anyway, it can be done, but only for fairly thin layers, and expect to fight the analysis all the way. The analysis is more complicated than rock bolts or piles for vertical loads.
For the probabilistic discussions, remember that not all factors of safety are created equal. Most of us (especially geotechs who are used to great uncertainty in material properties) tend to add minor conservatism at every step of the way - ignoring the cohesion, using the highest piezometer of the set, etc. - so a calculated 1.2 may be absolutely bulletproof. Is your 1.4 really 1.4, or is there unnecessary conservatism built in? If you only have one material parameter to deal with, like phi' for a dry sand, it could be reliably stable with something less than 1.2 -- UNLESS -- there is some overlooked detail, in which case a higher FS may not do any good. On the other hand, if you're dealing with a cut in soft clay or something like that, a calculated FS of 1.5 may not be very reliable at all, as in the example cited by BigH. With undrained shear strength, the uncertainty in strength can indeed exceed the margin provided by FS=1.5, especially if the clay is at all sensitive.
About 20 years ago, I saw an analysis by one of the early leaders in geotechnical risk analysis showing that an undrained FS of 1.3 for an upstream-type tailings dam gave a more reliably stable slope than 1.5 with a drained analysis, once the uncertainties associated with each of the input parameters were incorporated. This was done for the purpose of showing the state regulators that a good 1.3 was better than a typical 1.5. This was NOT smoke and mirrors, but a realistic depiction of the uncertainties of soil parameters and different forms of analysis.
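To put rough numbers on the "good 1.3 vs. typical 1.5" idea, here is a minimal sketch. It assumes the computed FS can be treated as normally distributed about its mean, and the coefficients of variation (8% and 25%) are made up for illustration; they are NOT from the tailings dam study, and the function names are mine.

```python
import math

def reliability_index(mean_fs, cov):
    """Number of standard deviations between the mean FS and FS = 1.0.

    Assumes FS is normally distributed; cov is the coefficient of
    variation of FS (standard deviation divided by mean).
    """
    return (mean_fs - 1.0) / (cov * mean_fs)

def probability_of_failure(mean_fs, cov):
    """P(FS < 1) under the normal assumption, i.e. Phi(-beta)."""
    beta = reliability_index(mean_fs, cov)
    return 0.5 * math.erfc(beta / math.sqrt(2.0))

# Illustrative (assumed) inputs: a well-characterized FS of 1.3
# against a poorly characterized FS of 1.5.
cases = [
    ("good 1.3", 1.3, 0.08),
    ("typical 1.5", 1.5, 0.25),
]
for label, fs, cov in cases:
    beta = reliability_index(fs, cov)
    pf = probability_of_failure(fs, cov)
    print(f"{label}: beta = {beta:.2f}, P(failure) = {pf:.4f}")
```

With these assumed inputs the 1.3 comes out with the higher reliability index and the lower probability of failure, which is the whole point: the margin only means something relative to the scatter in the inputs. A real study would build the uncertainty in FS up from the individual parameters rather than assuming a single lumped number.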
An appropriate probability of failure? Depends on how much a failure would cost and how much it costs to raise the FS. If failure could kill people, real small. If it means just a few $K for regrading, not so small.
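That trade-off can be sketched as an expected-cost calculation: total cost = cost of building to a given FS + (probability of failure x cost of failure). Everything numeric below is hypothetical (the dollar figures, the linear cost of raising FS, the 15% coefficient of variation, the normal distribution for FS), and the function names are mine. It is only meant to show the shape of the argument, not a design method.

```python
import math

def probability_of_failure(mean_fs, cov):
    """P(FS < 1), assuming FS is normally distributed with the given COV."""
    beta = (mean_fs - 1.0) / (cov * mean_fs)
    return 0.5 * math.erfc(beta / math.sqrt(2.0))

def cheapest_design(failure_cost, cost_per_tenth_fs, cov=0.15):
    """Scan candidate FS values and return (FS, P_f, expected total cost)
    for the one with the lowest expected total cost.

    cost_per_tenth_fs is an assumed linear cost of raising FS by 0.1.
    """
    candidates = [1.0 + 0.05 * i for i in range(1, 21)]  # FS = 1.05 to 2.00
    best = None
    for fs in candidates:
        build_cost = cost_per_tenth_fs * (fs - 1.0) / 0.1
        pf = probability_of_failure(fs, cov)
        total = build_cost + pf * failure_cost
        if best is None or total < best[2]:
            best = (fs, pf, total)
    return best

# Hypothetical consequences: a regrading job vs. a failure with severe losses.
regrade = cheapest_design(failure_cost=20_000, cost_per_tenth_fs=1_000)
severe = cheapest_design(failure_cost=50_000_000, cost_per_tenth_fs=1_000)

for label, (fs, pf, total) in [("regrade only", regrade), ("severe", severe)]:
    print(f"{label}: FS = {fs:.2f}, P(failure) = {pf:.2e}, expected cost = {total:,.0f}")
```

With these made-up inputs, the cheap-to-fix slope bottoms out at a modest FS and a fairly large probability of failure, while the severe case gets pushed to the top of the scanned range. When lives are at stake a dollar value alone won't settle it, but the direction of the answer is the same.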