×
INTELLIGENT WORK FORUMS
FOR ENGINEERING PROFESSIONALS

Log In

Come Join Us!

Are you an
Engineering professional?
Join Eng-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!
  • Students Click Here

*Eng-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Jobs

Cleaning data in excel (traffic counts)

Cleaning data in excel (traffic counts)

Cleaning data in excel (traffic counts)

(OP)
I have data of daily vehicle counts in 5 minute increments of a section of road for an entire year. The problem is the data has lots of missing entries done purposefully. I am tasked with using the functions of excel to identify the errors, counting how many exist and then removing them. To do this in excel i am clueless. I guess what I am really asking is how do I firstly define what is an error (missing entry) and then how do i count them?

Any help would be greatly appreciated

RE: Cleaning data in excel (traffic counts)

You've already defined error = missing. What else do you expect us to say?

If data is missing on purpose, how is that an error?

How do you expect to remove data that isn't there in the first place?

Why count something that you are going to remove?

RE: Cleaning data in excel (traffic counts)

mralbuto, please do these things:

Offer more complete explanation - MintJulep's questions are valid I think. It helps us a lot to know the background so that we know what your real goal is.

Post a small section of your spreadsheet that illustrates the format (with headings) and contains an error. Within the spreadsheet, highlight the error.

Explain to us how you know the highlighted error is, in fact, an error. We don't know if it's a blank line, a sequence of times that skip a 5-minute increment, or what?

Read the forum policies, especially number 4. Posting the same question in more than one forum causes confusion. Having two active threads on the same topic prevents us from seeing the complete train of thought. There is a link to the forum policies below my signature in this post.

Here is a link to your other thread, I suggest you red-flag it for deletion. This is a more appropriate place for your question I think.

http://eng-tips.com/viewthread.cfm?qid=341064

Best to you,

Goober Dave

Haven't see the forum policies? Do so now: Forum Policies

RE: Cleaning data in excel (traffic counts)

I assume the errors are the blank spaces and the entire missing dates? Let us know.

I see blank spaces in lots of columns and there must be 40 dates missing since your data spans only 325 columns from 1/1 to 12/31. I happened to see that 3/23 and 3/24 are missing (entire days). I'm guessing that you have to count missing days separately from missing values in a column? Please clarify.

You can find blank spaces and add them with a simple if and sum to get a total. (Note: I'm not the most fluent Excel guy. Someone else may have a much more elegant solution.)

You might put in LO4 something like =if(b4="",1,"")

Then copy it 325 columns to the left and however many rows you have down. Then sum the new array for a total.

Does "remove them" mean to fill the blank spaces with something? Delete the blank spaces and move the good data below them up to fill in? Or does it mean something else?

Explain "remove them" in detail, and what to do with missing dates versus missing data on a single date, and you'll get some better help perhaps.

Best to you,

Goober Dave

Haven't see the forum policies? Do so now: Forum Policies

RE: Cleaning data in excel (traffic counts)

What is your ultimate goal? Do you really need to "remove" the bad data? (odd since it already is not there). Or do you need to perform calculations that don't include the bad data?

Statistical functions in Excel do not appear to include the blank values. I have attached a revised spreadsheet with the following changes:
First column calculates the average for the time period. (Excel function AVERAGE)
Second column calculates the standard deviation for the time period. (Excel function STDEV.P)
Third column counts the number of non-blank entries. (Excel function COUNT)
Fourth column counts the number of blank entries. (Excel function COUNTBLANK)
At the top left of the sheet I have calculated the total number of Blanks.

RE: Cleaning data in excel (traffic counts)

You can use the COUNT(range) function to find how many values (non-blank) are in each column and hence the number of missing values.
The data can be sorted to group all blanks together for easy deletion.
TerryScan has probably given a more complete explanation.

RE: Cleaning data in excel (traffic counts)

I think the original data is in another format- translation error have left blanks so the goal is to get a summary of the time stamps of the unrecovered data so that data technicians can be tasked with recovering data when determined to be worth the effort and given that total file is humongous then manual searching for these gaps is too much. I did something like this semimanually.

RE: Cleaning data in excel (traffic counts)

If you sort your data in Excel, it will group all of your data with blank cells (as described above). The COUNTBLANK function can show you how many blank cells you have. If Excel is too slow, you might do some command line work to get at the information you're after.

If the data is in text format to start with, you may find it more efficient to use sed delete all of the lines except the ones you're interested in. If the data is in CSV format, for example, I think the following will delete lines without blank entries, send them to newfile1:

CODE --> CYGWIN/BASH

cat datafile.csv|sed '/,,/!d' newfile1 
Or deleting lines with blank entries, sending them to newfile2:

CODE --> CYGWIN/BASH

cat datafile.csv|sed '/,,/d' newfile2 
You can then count the lines without blank entries using wc:

CODE --> CYGWIN/BASH

wc -l newfile1 
Or with blank entries:

CODE --> CYGWIN/BASH

wc -l newfile2 

Easy peasy.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Eng-Tips Forums free from inappropriate posts.
The Eng-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Eng-Tips forums is a member-only feature.

Click Here to join Eng-Tips and talk with other members!


Resources