Almost any basic FE book should cover this, for example Bathe's "Finite Element Procedures in Engineering Analysis" or Szabo's "Finite Element Analysis." But here's my short answer to your questions: The FE method is derived from the principle of virtual work. In its simplest form, the principle of virtual work is an equation that relates two integrals: the strain energy in the body and the work done on the surface of the body.
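For reference, one common way to write that statement is sketched below. It assumes no body forces. The symbols are my own choice and are not taken from either book: sigma is the stress, delta-epsilon the virtual strain, delta-u the virtual displacement, t the surface traction, V the body and S its surface.

```latex
\int_V \delta\boldsymbol{\varepsilon}^{T}\,\boldsymbol{\sigma}\;dV
  \;=\; \int_S \delta\mathbf{u}^{T}\,\mathbf{t}\;dS
```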
Closed-form solutions normally do not exist for these complex integrals, so the Gauss integration technique is used to approximate them. Recall that Gauss integration approximates an integral with a weighted sum: the value of the integrand at each of a set of sample points (the Gauss points) is multiplied by the weight for that point, and the products are added up. When someone says "integration points," that person normally means information reported at the Gauss integration points used to approximate the FE integrals.
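As a rough illustration of that weighted sum, here is a minimal one-dimensional sketch over the interval [-1, 1]. The function name `gauss_integrate` and the table `GAUSS_RULES` are made up for this example; the points and weights are the standard 2- and 3-point Gauss-Legendre values.

```python
import math

# Standard Gauss-Legendre points and weights on [-1, 1].
# GAUSS_RULES is a hypothetical helper table for this sketch only.
GAUSS_RULES = {
    2: ([-1 / math.sqrt(3), 1 / math.sqrt(3)], [1.0, 1.0]),
    3: ([-math.sqrt(0.6), 0.0, math.sqrt(0.6)], [5 / 9, 8 / 9, 5 / 9]),
}

def gauss_integrate(f, n_points):
    """Approximate the integral of f over [-1, 1] as sum(weight * f(point))."""
    points, weights = GAUSS_RULES[n_points]
    return sum(w * f(x) for x, w in zip(points, weights))

# Integrand x**4; the exact integral over [-1, 1] is 2/5.
two_point = gauss_integrate(lambda x: x**4, 2)    # too few points for degree 4
three_point = gauss_integrate(lambda x: x**4, 3)  # exact up to round-off

print(two_point, three_point)
```

An n-point rule integrates polynomials up to degree 2n-1 exactly, which is why the 3-point rule recovers 2/5 here while the 2-point rule does not.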
How are stresses computed? You already know that the FE 'solution' is the displacement field (if you are using h-elements, the solution is the displacement field at the nodes). Once the functions are set up properly, the stresses are easy to compute: the stresses are related to the strains through the material law, and the strains are the spatial gradients of the displacements. Say you are interested in the stress over an element, and you know the displacements in that element. The displacement in the x direction, 'u', is the linear sum of the x-direction displacements of the nodes, U(i), multiplied by the shape functions N(i):
u = U(1)*N(1) + U(2)*N(2) + ... + U(n)*N(n)
, where 'n' is the number of nodes in the element: 4 for a linear quadrilateral, 8 for an 8-noded quadrilateral, and 9 for a 9-noded quad. The shape functions N(i) are actually functions of a different coordinate system, xi-eta, that of the 'parent element.' Therefore, to take gradients in the spatial coordinates you need the Jacobian transformation J; J is the chain rule for the differentials, transforming between one coordinate system (your global 'x-y') and the parent element system ('xi-eta').
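To make the chain from nodal displacements to stresses concrete, here is a minimal sketch for a single 4-noded quadrilateral. It assumes plane stress and isotropic linear elasticity, which the discussion above does not specify. All function names, the element geometry and the material values are hypothetical and are not taken from any particular FE code.

```python
import math

# Corners of the parent element in (xi, eta), counter-clockwise.
PARENT_CORNERS = [(-1, -1), (1, -1), (1, 1), (-1, 1)]

def shape_functions(xi, eta):
    """Bilinear shape functions N(i) of the 4-noded quad."""
    return [0.25 * (1 + a * xi) * (1 + b * eta) for a, b in PARENT_CORNERS]

def shape_derivatives(xi, eta):
    """Derivatives of N(i) with respect to xi and eta."""
    dN_dxi = [0.25 * a * (1 + b * eta) for a, b in PARENT_CORNERS]
    dN_deta = [0.25 * b * (1 + a * xi) for a, b in PARENT_CORNERS]
    return dN_dxi, dN_deta

def strain_at(xi, eta, coords, U, V):
    """Strains (exx, eyy, gamma_xy) at the parent-element point (xi, eta)."""
    dN_dxi, dN_deta = shape_derivatives(xi, eta)
    # Jacobian J = [[dx/dxi, dy/dxi], [dx/deta, dy/deta]]
    j11 = sum(d * x for d, (x, y) in zip(dN_dxi, coords))
    j12 = sum(d * y for d, (x, y) in zip(dN_dxi, coords))
    j21 = sum(d * x for d, (x, y) in zip(dN_deta, coords))
    j22 = sum(d * y for d, (x, y) in zip(dN_deta, coords))
    det = j11 * j22 - j12 * j21
    # Chain rule: [dN/dx, dN/dy] = inverse(J) applied to [dN/dxi, dN/deta]
    dN_dx = [(j22 * a - j12 * b) / det for a, b in zip(dN_dxi, dN_deta)]
    dN_dy = [(-j21 * a + j11 * b) / det for a, b in zip(dN_dxi, dN_deta)]
    exx = sum(d * u for d, u in zip(dN_dx, U))
    eyy = sum(d * v for d, v in zip(dN_dy, V))
    gxy = sum(dy * u + dx * v for dx, dy, u, v in zip(dN_dx, dN_dy, U, V))
    return exx, eyy, gxy

def plane_stress(strain, E, nu):
    """Hooke's law for plane stress (an assumed material law)."""
    exx, eyy, gxy = strain
    c = E / (1 - nu**2)
    return c * (exx + nu * eyy), c * (nu * exx + eyy), c * (1 - nu) / 2 * gxy

# Hypothetical 2 x 1 rectangle stretched uniformly in x: u = 0.001 * x, v = 0.
coords = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
U = [0.001 * x for x, y in coords]
V = [0.0] * 4

# u = U(1)*N(1) + ... + U(4)*N(4), evaluated at the element centre.
u_centre = sum(n * u for n, u in zip(shape_functions(0.0, 0.0), U))

g = 1 / math.sqrt(3)                             # a 2 x 2 Gauss point
strain_gp = strain_at(g, g, coords, U, V)        # at an integration point
strain_node = strain_at(1.0, 1.0, coords, U, V)  # at node 3
stress_gp = plane_stress(strain_gp, E=200000.0, nu=0.3)

print(u_centre, strain_gp, strain_node, stress_gp)
```

The same `strain_at` routine serves both the Gauss point and the node; only the (xi, eta) location changes. For this uniform stretch the two locations return the same strain, which will not generally be the case for a distorted element or a more complicated displacement field.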
I have often read on eng-tips that users think stresses at the integration points seem more accurate than stresses at the nodes. I've never seen a publication that explains why, so I have always considered these observations 'empirical', meaning experience-based observations with no scientific method applied, though not necessarily wrong! Frankly, I don't know how they could be more accurate at the integration points than at the nodes, since stresses at both locations are computed from derivatives of the displacement field.