Cleaning data in excel (traffic counts)
Cleaning data in excel (traffic counts)
(OP)
I have data of daily vehicle counts in 5 minute increments of a section of road for an entire year. The problem is the data has lots of missing entries done purposefully. I am tasked with using the functions of excel to identify the errors, counting how many exist and then removing them. To do this in excel i am clueless. I guess what I am really asking is how do I firstly define what is an error (missing entry) and then how do i count them?
Any help would be greatly appreciated
Any help would be greatly appreciated





RE: Cleaning data in excel (traffic counts)
If data is missing on purpose, how is that an error?
How do you expect to remove data that isn't there in the first place?
Why count something that you are going to remove?
RE: Cleaning data in excel (traffic counts)
Offer more complete explanation - MintJulep's questions are valid I think. It helps us a lot to know the background so that we know what your real goal is.
Post a small section of your spreadsheet that illustrates the format (with headings) and contains an error. Within the spreadsheet, highlight the error.
Explain to us how you know the highlighted error is, in fact, an error. We don't know if it's a blank line, a sequence of times that skip a 5-minute increment, or what?
Read the forum policies, especially number 4. Posting the same question in more than one forum causes confusion. Having two active threads on the same topic prevents us from seeing the complete train of thought. There is a link to the forum policies below my signature in this post.
Here is a link to your other thread, I suggest you red-flag it for deletion. This is a more appropriate place for your question I think.
http://eng-tips.com/viewthread.cfm?qid=341064
Best to you,
Goober Dave
Haven't see the forum policies? Do so now: Forum Policies
RE: Cleaning data in excel (traffic counts)
RE: Cleaning data in excel (traffic counts)
I see blank spaces in lots of columns and there must be 40 dates missing since your data spans only 325 columns from 1/1 to 12/31. I happened to see that 3/23 and 3/24 are missing (entire days). I'm guessing that you have to count missing days separately from missing values in a column? Please clarify.
You can find blank spaces and add them with a simple if and sum to get a total. (Note: I'm not the most fluent Excel guy. Someone else may have a much more elegant solution.)
You might put in LO4 something like =if(b4="",1,"")
Then copy it 325 columns to the left and however many rows you have down. Then sum the new array for a total.
Does "remove them" mean to fill the blank spaces with something? Delete the blank spaces and move the good data below them up to fill in? Or does it mean something else?
Explain "remove them" in detail, and what to do with missing dates versus missing data on a single date, and you'll get some better help perhaps.
Best to you,
Goober Dave
Haven't see the forum policies? Do so now: Forum Policies
RE: Cleaning data in excel (traffic counts)
Statistical functions in Excel do not appear to include the blank values. I have attached a revised spreadsheet with the following changes:
First column calculates the average for the time period. (Excel function AVERAGE)
Second column calculates the standard deviation for the time period. (Excel function STDEV.P)
Third column counts the number of non-blank entries. (Excel function COUNT)
Fourth column counts the number of blank entries. (Excel function COUNTBLANK)
At the top left of the sheet I have calculated the total number of Blanks.
RE: Cleaning data in excel (traffic counts)
The data can be sorted to group all blanks together for easy deletion.
TerryScan has probably given a more complete explanation.
RE: Cleaning data in excel (traffic counts)
RE: Cleaning data in excel (traffic counts)
If the data is in text format to start with, you may find it more efficient to use sed delete all of the lines except the ones you're interested in. If the data is in CSV format, for example, I think the following will delete lines without blank entries, send them to newfile1:
CODE --> CYGWIN/BASH
CODE --> CYGWIN/BASH
CODE --> CYGWIN/BASH
CODE --> CYGWIN/BASH
Easy peasy.