[Nanocubes-discuss] ncdmp flags

Alex Bongiovanni
I apologize for asking so many questions, but I have one more: What are the arguments for the dim-tbin flag for ncdmp?  In the example given in the overview, you use "dim-tbin=time,checkin_time,2013_1h,2".  I understand what the first two arguments are, but what are the "2013_1h,2" for?

Let me frame this in the context of an example: suppose I want to load in data from 2012 for which I only have the month and day, no hourly data.  What should I set "dim-tbin" to be (assuming I'm still using time and checkin_time)?

--
Alex Bongiovanni
University of Maryland

Re: [Nanocubes-discuss] ncdmp flags

laurolins
Hi Alex,

> I apologize for asking so many questions, but I have one more: What are the arguments for the dim-tbin flag for ncdmp?  In the example given in the overview, you use "dim-tbin=time,checkin_time,2013_1h,2".  I understand what the first two arguments are, but what are the "2013_1h,2" for?

The time dimension is simply an integer indicating a "time bin". For example, with a 2-byte resolution for time, we get time bins 0, 1, 2, 3, 4, 5, …, 65535. The problem is that we usually need to convert a full date / timestamp into a time bin. The program "ncdmp" does this conversion for input values in the form of unix time_t (8-byte integers), based on specs like "2013_1h" (0 is the first hour of 2013, 1 the second hour of 2013, and so on) or "2012-06_1d" (0 is the first day of June 2012, 1 the second day of June 2012). So in your case, if you have an input file with time encoded as a unix time_t column, you can use ncdmp to do the conversion (there are more details at https://github.com/laurolins/nanocube/wiki/ncdmp) and generate nanocube-compatible data.

So, answering your question: (1) 2013_1h ( <date>_<time_length><unit> ) indicates how to map timestamps into time bins; (2) 2 is the number of bytes used to store each time bin: 2 bytes allows 65536 distinct time bins.
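
To make the mapping concrete, here is a minimal Python sketch of the arithmetic described above. It is only an illustration, not ncdmp's actual code: the function names are hypothetical, only the "h" and "d" units mentioned in this thread are handled, and the origin date is assumed to be interpreted in UTC (the thread does not say how ncdmp treats time zones).

```python
import re
from datetime import datetime, timezone

# Only the units that appear in this thread; other units are not covered here.
UNIT_SECONDS = {"h": 3600, "d": 86400}


def parse_tbin_spec(spec):
    """Split a spec like '2013_1h' or '2012-06_1d' into (origin_epoch, bin_width_seconds)."""
    m = re.fullmatch(r"(\d{4})(?:-(\d{2}))?_(\d+)([hd])", spec)
    if m is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    year, month, length, unit = m.groups()
    # Assumption: the origin is midnight UTC on the first day of the given year/month.
    origin = datetime(int(year), int(month or 1), 1, tzinfo=timezone.utc)
    return int(origin.timestamp()), int(length) * UNIT_SECONDS[unit]


def time_bin(unix_time, spec, num_bytes):
    """Map a unix timestamp to a time bin index that fits in num_bytes bytes."""
    origin, width = parse_tbin_spec(spec)
    if unix_time < origin:
        raise ValueError("timestamp is earlier than the origin of the spec")
    index = (unix_time - origin) // width
    if index >= 256 ** num_bytes:
        raise ValueError("time bin does not fit in the requested number of bytes")
    return index


if __name__ == "__main__":
    start_2013 = int(datetime(2013, 1, 1, tzinfo=timezone.utc).timestamp())
    print(time_bin(start_2013, "2013_1h", 2))         # first hour of 2013
    print(time_bin(start_2013 + 3600, "2013_1h", 2))  # second hour of 2013
```

With 2 bytes and 1-hour bins, the last usable bin is 65535, which is roughly seven and a half years after the origin; a timestamp beyond that raises an error in this sketch.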

> Let me frame this in the context of an example: suppose I want to load in data from 2012 for which I only have the month and day, no hourly data.  What should I set "dim-tbin" to be (assuming I'm still using time and checkin_time)?

Since your resolution is daily starting in 2012, I would use "2012_1d". Note that this is all about converting a .dmp file that is not nanocube-ready into one that is, using the program ncdmp (take a look at https://github.com/laurolins/nanocube/wiki/ncdmp for more details).
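
For data that only has a month and a day, the dates first need to become unix time_t values before ncdmp can bin them. The Python sketch below shows one way this could look, and checks that a full year of daily bins fits comfortably in 2 bytes. The helper names are hypothetical, and UTC midnight is an assumption, since the thread does not state which time zone ncdmp uses for the origin.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400


def to_unix_time(year, month, day):
    """Unix timestamp for midnight UTC of the given date (UTC is an assumption)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Origin implied by the spec "2012_1d": the first day of 2012.
ORIGIN_2012 = to_unix_time(2012, 1, 1)


def day_bin_2012(month, day):
    """Daily time bin that the spec '2012_1d' would assign to a 2012 date."""
    return (to_unix_time(2012, month, day) - ORIGIN_2012) // SECONDS_PER_DAY


# The flag value, following the pattern of the overview example quoted above.
DIM_TBIN = "dim-tbin=time,checkin_time,2012_1d,2"

if __name__ == "__main__":
    print(DIM_TBIN)
    print(day_bin_2012(1, 1))    # first day of 2012
    print(day_bin_2012(12, 31))  # last day of 2012 (a leap year)
```

Since 2012 is a leap year, the bins for that year run from 0 to 365, far below the 65536 bins that 2 bytes can hold.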

Best,
Lauro

Re: [Nanocubes-discuss] ncdmp flags

Alex Bongiovanni

Thank you, I understand now.

--
Alex Bongiovanni
University of Maryland

On Jan 17, 2014 10:23 AM, "Lauro Lins" <[hidden email]> wrote:

_______________________________________________
Nanocubes-discuss mailing list
[hidden email]
http://mailman.nanocubes.net/mailman/listinfo/nanocubes-discuss_mailman.nanocubes.net