20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
(OP)
This is an interesting article on Slashdot.
20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report. The link is to Slashdot.
20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report. The link is to Slashdot.
--
JHG





RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Be Aware: Excel can change your data! http://www.tek-tips.com/faqs.cfm?fid=7375
Understanding Dates and Times & why they seem to be so much trouble? http://www.tek-tips.com/faqs.cfm?fid=5827
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
... and even savvy biologists knew all about it more than 10 years ago:
http://bmcbioinformatics.biomedcentral.com/article...
That link is from the recent paper, which can be read in full at:
http://genomebiology.biomedcentral.com/articles/10...
The graph of the frequency of these errors is quite interesting:
As Skip says, we have discussed this "feature" here before, but to blame it all on Excel is ridiculous. Aren't these papers supposed to be peer reviewed? Do they blame unit conversion errors, or misplaced bracket errors on Excel as well?
Anyway, the solution (which apparently was pretty well known in 2005, and has since been forgotten) can be found in the 2004 paper:
But I don't agree with their next statement:
How can a research scientist who lets errors as obvious as these to sslip through into published work be described as "vigilant"?
Doug Jenkins
Interactive Design Services
http://newtonexcelbach.wordpress.com/
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
If we are using a more traditional programming language, it usually forces us to tell it what type of data going into each variable.
An Excel worksheet does not force us to tell it the type of data going into each cell, so we should expect some problems if we don't tell it what type of data (by preformatting).
Thanks skip for the FAQ's that made it pretty simple to grasp.
=====================================
(2B)+(2B)' ?
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
One of the other areas that often causes problems is numeric IDENTIFIERS. data like Invoice Numbers, Item Numbers, Part Numbers, Postal Numbers, Member IDs (and the list goes on): numeric data on which you will NEVER perform a mathematical operation, are often stored as numbers. WHY? It is asking for trouble that an IDENTIFIER would be entered and stored in a format that IS CONVERTED INTO SOMETHING EVERY TIME, and therefore is susceptible to being converted into something unwanted and is not in itself a meaningful number. Every identifier ought to be entered/stored as TEXT, as described in the FAQ. You might need a work-around when joining to other databases but it's worth the effort.
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
So are you saying that if you entered those examples on your sheet right now, that there would be absolutely no consistency in the date conversion, entering each of the fractions you mentioned, or does this happen under different circumstances?
I'm very interested in this, since I wrote the FAQs that I posted above and would like to understand the possible pitfalls related to data conversion.
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
You do realize that if you want to enter fractions you must format the cell/range as TEXT before entering any data. Changing the Number Format after the fact does nothing, because a Number Format is merely a display feature. Having entered a fraction as text, it renders the value virtually useless as a numerical value.
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! https://www.youtube.com/watch?v=BKorP55Aqvg
FAQ731-376: Eng-Tips.com Forum Policies forum1529: Translation Assistance for Engineers
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Go to http://russell.htpgusa.com/products/air-cooled-con... get the pdf for small condensers. Copy the table on page 6, paste it in excel, then use text to columns, delimited by spaces. 12-1/2 becomes 12/01/2002, 31-9/16 becomes 31/09/201616! Anything that can be converted to a date will be.
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
You're shooting yourself in the foot!
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
1) copying the table in question,
2) pasting into a cell, which, BTW, resulted in one long string,
3) started the Text to columns wizard,
4) chose Delimited in Step 1,
5) Checked Space in Step 2,
6) in Step 3 FIRST made sure that ALL the columns were selected (darkened) in the data preview and then selected the TEXT Column data format, and finally
7) FINISH.
I suspect that you failed to SELECT the columns to import as TEXT in Step 3 of the Text to columns Wizard.
BTW, in Step 3, selecting all the columns (63) one at a time would be tedious: select the first, scroll to right, SHIFT + SELECT last column to select all.
Skip,
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
CODE
Sub ParseColumns() 'SkipVought 9/16/2016 Eng-Tips 'copy table page 6 in http://russell.htpgusa.com/images/documentation/products/condensers/RU-RDX-0816A.pdf 'paste into A1 Range("A1").TextToColumns _ Destination:=Range("A1"), _ DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, _ ConsecutiveDelimiter:=True, _ Tab:=False, _ Semicolon:=False, _ Comma:=False, _ Space:=True, _ Other:=False, _ FieldInfo:=Array(Array(1, 2), Array(2, 2), Array(3, 2), Array(4, 2), Array(5, 2), Array(6, 2), _ Array(7, 2), Array(8, 2), Array(9, 2), Array(10, 2), Array(11, 2), Array(12, 2), Array(13, 2), _ Array(14, 2), Array(15, 2), Array(16, 2), Array(17, 2), Array(18, 2), Array(19, 2), Array(20, 2), _ Array(21, 2), Array(22, 2), Array(23, 2), Array(24, 2), Array(25, 2), Array(26, 2), Array(27, 2), _ Array(28, 2), Array(29, 2), Array(30, 2), Array(31, 2), Array(32, 2), Array(33, 2), Array(34, 2), _ Array(35, 2), Array(36, 2), Array(37, 2), Array(38, 2), Array(39, 2), Array(40, 2), Array(41, 2), _ Array(42, 2), Array(43, 2), Array(44, 2), Array(45, 2), Array(46, 2), Array(47, 2), Array(48, 2), _ Array(49, 2), Array(50, 2), Array(51, 2), Array(52, 2), Array(53, 2), Array(54, 2), Array(55, 2), _ Array(56, 2), Array(57, 2), Array(58, 2), Array(59, 2), Array(60, 2), Array(61, 2), Array(62, 2), _ Array(63, 2)), _ TrailingMinusNumbers:=True End SubSkip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Also, you stated, "before converting." Could you explain what it is you were converting?
Skip,
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Sometimes they get converted to US dates, sometimes to UK dates. I didn't use the sub as there are lots of tables from lots of manufacturers and every one is different. Rewriting the sub for every table would be a nightmare. Sometimes, I have to input the table manually because the PDF just contains a bitmap
If I note which columns should be fractions and preset the format of those column to fractions before running steps 3 to 7, everything comes out correctly.
DrawOh - apologies for hijacking your thread
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
You and I are talking about two completely different things it appears. The Text to column wizard, Excel 2013, has THREE steps, not SEVEN. What feature are you using?
This lead me to believe that we are, maybe not worlds apart but at least not referring to the same process. The reason for me posting the macro was to determine if my process was the same as your process. Apparently NOT. I want to understand whats going on, because this is one of the features in Excel that is often misunderstood and misused, as the subject of this thread testifies.
Can, will you continue to pursue this with me? TIA
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! https://www.youtube.com/watch?v=BKorP55Aqvg
FAQ731-376: Eng-Tips.com Forum Policies forum1529: Translation Assistance for Engineers
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Skip - your post of 11:55, 16-Sep has 7 steps!
But if I follow the process with the table from the linked pdf in Excel 2016 it works the same as shown by IRStuff above. The only difference from your post is I get 7 rows of data in a single column when I paste, not everything in one cell.
Another point is that if you go through the text to columns process then paste the same data somewhere else, this time it will be pasted in columns, but it doesn't remember the format setting, so the fractions come in as mixed date and text. You have to set the format to text over the whole range of the table before pasting, then it will paste as text in columns without going through the text to columns process. This happens even if you re-copy the data, or open a new workbook and paste there. As far as I can see the only way to make it forget and go back to pasting in a single column by default is to close Excel and re-start.
Doug Jenkins
Interactive Design Services
http://newtonexcelbach.wordpress.com/
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
xwb, I am sorry for causing this confusion.
Doug, thanks for pointing this out to me.
I am puzzled why my copy/paste produces one long string without any line breaks when I paste into Excel.
I did paste the table here and then copy/pasted into Excel and that produced 7 cells in one column, from which I performed the Text to columns.
But I'm still very interested in understanding why the process fails for xwb. xwb, if you would run my procedure and report the results, it would help my understanding.
BTW, before pasting your table into Excel, 1) ENTER any value into A1, 2) SELECT A1, start the Text to columns wizard and DESELECT Space in Step 2 of the DELIMITED option and FINISH.
Then paste in your table in A1. Now you're ready to do the parsing.
Skip,
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
TTFN (ta ta for now)
I can do absolutely anything. I'm an expert! https://www.youtube.com/watch?v=BKorP55Aqvg
FAQ731-376: Eng-Tips.com Forum Policies forum1529: Translation Assistance for Engineers
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
I have found that, even with macros, sometimes Excel just ignores the locale and assumes US until the 13th, then it switches to UK. I've had macro generated spreadsheets which copy from IE where the first 12 days of the month get switched to US format and the rest come out in UK format. The only way I've found that always works is to preset the format of the column before writing to it.
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
Here is a list of various ways I have manually entered 12-1/2, which appears to represent a numerical value 12 1/2 or 12.5 in the subject table. In the case of entering an intended fraction I entered 12 1/2 (with a SPACE)
The only preset number format that "sticks" is TEXT. Presetting the format as Fraction is overridden by Excel's 'rules' for date entry/conversion. 12-1/2 or 12/1-2 will both result in a date regardless of presetting the cell number format.
Can anyone else, on the east side of the pond, enter 12-1/2 to result in a fraction other than presetting the format to TEXT which results in no fraction at all but rather a visual to be represented as a fraction.
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
I'm not sure that allowing 12-1/2 to be entered as a number is a good idea, because if entered anywhere other than the first value in an equation its value should be 11.5, not 12.5.
Doug Jenkins
Interactive Design Services
http://newtonexcelbach.wordpress.com/
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
So you are stating that the date conversion rules in Australian English settings (and most probably in Great Britain as well), assumes MDY, which I now admit, shoots down my assumption that the conversion is always MDY. So Excel is smarter than I thought, which is better. I want to modify my FAQ to reflect conversion in accordance with regional settings.
My angst is trying to reconcile xwb's experience.
Skip,
Just traded in my OLD subtlety...
for a NUance!
RE: 20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report
That was last week.
This morning I found my system had been updated overnight, and when I did the same process, with the same file, I found the text pasted as a long string in a single cell.
Strangely there seems to be nothing about this anywhere on the Internet, at least I couldn't find it, but with some experimenting I found:
- It's not an Excel thing. The same table pasted into a text editor also appeared as a single cell.
- I then noticed the pdf file had not opened in Adobe Reader, it was opened in Edge. It seems that Microsoft think they know better than I do about what my default programs should be.
- Switching back to Adobe Reader the copy and paste worked as before, pasting as 7 rows.
So the message is, if Windows does an automatic update, check your file associations, particularly for opening pdfs.Doug Jenkins
Interactive Design Services
http://newtonexcelbach.wordpress.com/