Big List cleanup
Big List cleanup
(OP)
I have wondered about this and do it manually myself. If you have long lists and they have simple errors like o instead 0 and dates in the wrong format and data with other simple error, is there tool to help remove those errors before you validate the data.





RE: Big List cleanup
For the date formats, that's another problem... maybe someone here can help... You can copy one date, and then past this correct date to an offending cell... other than that, I don't know.
Dik
RE: Big List cleanup
Filters and sorting can make many simple errors visually obvious. O vs. 0 can be easily found this way if it's a single character. the number will be on one side of the column, the letter on the other.
If there is a "rule" about what is supposed to be in a cell then you can use a formula or VBA to check for compliance to the rule.
Identifying the exact nature of a rule violation and correcting it is much harder, and usually it's faster to make the correction manually.
Better yet, use data validation to prevent erroneous entries in the first place.
RE: Big List cleanup
My cases were chemical lists for chemical management where the lists come from all over the university and they fill in alphapropanol (50%) as a-propanol, a-propanol (using the alpha symbol and alpha-propanol and lists of microsoft codes with dates in wrong formats. Equally the errors could be due a data source error like a data-logger that occasionally makes a symbol instead of a number or screws up the date so its American rather than British.
RE: Big List cleanup
Regular expressions might be helpful in a VBA function to help weed things out, but any programmatic approach can only catch errors that you can foresee.
RE: Big List cleanup
If you would have a specific (long) list of your chemicals you could add a Conditional formatting to the cells or indeed a data validation.
However, data validation either prevents to enter certain values or will issue a warning. Afterwards you could not see if somebody did not care about the warning. Also data validation works only if you are really entering data manually. If you would copy/paste data the data validation would be lost.
Conditional formatting could also be added to fields that should contain numeric values.
A disadvantage of Conditional formatting could be that it may take more space and could somewhat slow down your Workbook if applied to many 1000s of cells.
Why an easy solution if you can make it complicated?
Greetings from the Netherlands
RE: Big List cleanup
I've done that sort of thing (just this week actually) and while I'm quite comfortable writing VBA, the patterns are too distributed for a program to help much (i.e., out of a million records across 2000 sheets there are 100,000 errors, but no single error occurs more than 100 times, faster to edit it than write a program).
I always start by making a copy (because most of the time I have to abandon some non-trivial number of attempts). Then I use "sort" liberally to try to group particular errors for repair. The easy stuff is "o" for "0" in a field that is only numbers. You can use "replace" for that. The impossible stuff is knowing if "02/04/2012" is Feb 4 or March 2. Everything else is somewhere in between.
Often it is useful to dump all the spreadsheets into Access (with all of the fields "text") and use the Access sorting and screening tools. They are much more powerful and you don't risk sorting a field without sorting the identification data (we've all ignored the warnings in Excel).
Good luck and try to stay awake while doing this horrible, mindless task (it really sucks to introduce a bunch of new errors by doing a face plant in the keyboard).
David