
[Nanocubes-discuss] Different Coordinates


Alex Bongiovanni
What if I wanted to use a different coordinate system than latitude and longitude?  What I mean is that I have some non-spatial data, and I want to artificially enforce position onto it, so that I can visualize it as a typical heat map over time.  An actual example of this would be the heat map of DNA data from the wikipedia page on heat maps: http://upload.wikimedia.org/wikipedia/commons/4/48/Heatmap.png

My idea is to just assign latitude and longitude to match my grid, but I'm not sure that's the best way to go about things.  The page on ncdmp says you convert to grid cell addresses, is there any way to "cut out the middle man" and just load in the grid addresses?

--
Alex Bongiovanni
University of Maryland

Re: [Nanocubes-discuss] Different Coordinates

laurolins


Yes, Alex. Redirect the output of ncdmp to a file instead of piping it to ncserve, and pass --encoding=t to ncdmp for text output. You will see that for quadtree dimensions nanocube actually uses the integer grid cell locations x and y. Latitudes and longitudes are converted outside of the core nanocube server. If you generate a file like the one produced by ncdmp, you don’t even need to talk about latitudes and longitudes. The middleman can be left out!
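
For a non-spatial grid such as the heat map in the question, the mapping to integer cell addresses might look like the Python sketch below. It is only an illustration: the names `min_quadtree_levels` and `grid_to_cell` are hypothetical, and letting x follow columns and y follow rows is an assumed orientation. The one constraint taken from this thread is that x and y are integers in {0, ..., 2^N - 1} for a quadtree with N levels.

```python
def min_quadtree_levels(n_rows, n_cols):
    """Smallest N such that a 2^N x 2^N grid can hold n_rows x n_cols cells."""
    side = max(n_rows, n_cols)
    levels = 0
    while (1 << levels) < side:
        levels += 1
    return levels


def grid_to_cell(row, col, levels):
    """Map a heat-map (row, col) to an integer quadtree cell (x, y).

    Assumption: x follows columns and y follows rows. The orientation
    is a choice made for this illustration only.
    """
    size = 1 << levels
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("cell outside the 2^N x 2^N grid")
    return col, row


# A 40 x 100 heat map needs 7 levels, because 2^7 = 128 >= 100.
levels = min_quadtree_levels(n_rows=40, n_cols=100)
print(levels, grid_to_cell(row=3, col=17, levels=levels))
```

Choosing the smallest N that fits keeps the grid from being mostly empty, but any larger N would satisfy the range constraint as well.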

Lauro



Re: [Nanocubes-discuss] Different Coordinates

Alex Bongiovanni
I certainly can generate a file like the one produced by ncdmp, but I'm having some problems.  Redirecting the .txt file created by ncdmp into ncserve ('cat dump.txt | ncserve' and 'ncserve < dump.txt' produce the same result) causes significantly more points to be added to the cube than are in the file (about a million more), and querying the server reports over 3 billion points in the cube (there ought to be only 2 million).  Am I doing something wrong here?



Re: [Nanocubes-discuss] Different Coordinates

laurolins


Sorry, Alex. I forgot to mention that nanocubes only accepts files in the binary .dmp format. The records in the binary format are little-endian representations of the input numbers. So if you have the quadtree grid cell “x” “y”, you would write two 32-bit little-endian unsigned integers. If you have a categorical variable c1, you would write a single byte there. To do this conversion you can use ncdmp to transform a text .dmp file into a binary .dmp file. Here is an example:

cat txt_input.dmp | ncdmp --encoding=b copy=a,a copy=b,b copy=c,c > bin_output.dmp

This command would copy fields a, b, and c into the new file, but in binary format. Give it a shot on your txt file.

Lauro

P.S. Note that .dmp is a format that describes field names, types, records with those fields, how records are encoded (binary/text), and some metadata. The program ncserve can only deal with the subset of .dmp files that are encoded in binary format and whose field types all start with either nc_dim_ or nc_var_.
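
As a rough Python illustration of the record layout described above (two 32-bit little-endian unsigned integers for the quadtree cell, then a single byte for a categorical variable), one could pack a record with the standard `struct` module. The helper name `pack_record` and the sample values are hypothetical, and this sketch only builds the bytes of one record. It does not write the descriptive part of a .dmp file (field names, types, metadata).

```python
import struct

# "<" = little endian with no padding, "I" = 32-bit unsigned integer,
# "B" = single unsigned byte.
RECORD = struct.Struct("<IIB")


def pack_record(x, y, c1):
    """Pack one record: quadtree cell (x, y) followed by category c1."""
    return RECORD.pack(x, y, c1)


rec = pack_record(17, 3, 2)
print(len(rec), rec.hex())  # 9 bytes: 4 for x, 4 for y, 1 for c1
```

The explicit "<" prefix matters: without it, `struct` uses native byte order and alignment, which can insert padding bytes between fields.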







Re: [Nanocubes-discuss] Different Coordinates

Alex Bongiovanni
This final format is similar to the one output by the Python example code, but a little more processed, right (e.g., a time-bin address instead of a date)?  So I could just alter the "pre-dmp" script to do a "full-dmp" instead?



Re: [Nanocubes-discuss] Different Coordinates

laurolins

> This final format is similar to the one output by the Python example code, but a little more processed right (eg a time-bin address instead of a date)?  So I could just alter the "pre-dmp" script to do a "full-dmp" instead?

Yes. If you write the expected dimension values of the records in binary, you can pipe the resulting file directly to ncserve. Otherwise, generate the file in text and then use ncdmp to convert it to binary.


Re: [Nanocubes-discuss] Different Coordinates

Alex Bongiovanni
What is the byte size expected for the coordinates?  Like, if I'm using struct.pack("<??HL", x, y, time_bin, count), what are the sizes of x and y?  I don't see anything about it in the metadata tags.



Re: [Nanocubes-discuss] Different Coordinates

laurolins
> What is the byte size expected for the coordinates?  Like, if I'm using struct.pack("<??HL", x, y, time_bin, count) what are the size of x and y?  I don't see anything about it in the metadata tags.
>

nc_dim_quadtree_N: 4 bytes for "x" and 4 bytes for "y" (two uint32_t). The values of x and y need to be in {0, 1, ..., 2^N - 1}.

        So a quadtree field takes 8 bytes: two 32-bit unsigned integers.

The other field types have the number of bytes in their names:

        nc_var_uint_N : N bytes
        nc_dim_cat_N : N bytes
        nc_dim_time_N : N bytes
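
The size rules above could be turned into a small encoder that reads the byte count off the field type name. This is a sketch under stated assumptions, not part of nanocubes: `encode_field` is a hypothetical helper, the example schema (25 quadtree levels, a 2-byte time bin, a 4-byte count, in that order) is made up to mirror the `struct.pack` call in the question, and little-endian order is taken from the earlier description of the binary format.

```python
import re


def encode_field(field_type, value):
    """Encode one field value as little-endian bytes, sized by its type name."""
    m = re.fullmatch(r"nc_dim_quadtree_(\d+)", field_type)
    if m:
        levels = int(m.group(1))
        x, y = value
        limit = 1 << levels
        if not (0 <= x < limit and 0 <= y < limit):
            raise ValueError("x and y must be in 0 .. 2^N - 1")
        # Always two 32-bit unsigned integers, regardless of N.
        return x.to_bytes(4, "little") + y.to_bytes(4, "little")
    m = re.fullmatch(r"nc_(?:var_uint|dim_cat|dim_time)_(\d+)", field_type)
    if m:
        # N in the type name is the number of bytes.
        return value.to_bytes(int(m.group(1)), "little")
    raise ValueError(f"unknown field type: {field_type}")


# Hypothetical schema: cell (x, y), time bin, count.
schema = ["nc_dim_quadtree_25", "nc_dim_time_2", "nc_var_uint_4"]
values = [(17, 3), 5, 20]
record = b"".join(encode_field(t, v) for t, v in zip(schema, values))
print(len(record))  # 8 + 2 + 4 = 14 bytes
```

Using `int.to_bytes` rather than fixed `struct` codes keeps the sketch valid for byte counts such as 3 that have no `struct` format character.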





Re: [Nanocubes-discuss] Different Coordinates

Alex Bongiovanni
Ah, thank you.  Aside from some visualization bugs I was able to successfully create a "proper" .dmp file from Python and load it into ncserve.

What is the purpose of the 'var-one' option for ncdmp?  The page on the wiki says that the value is always 1, and I've only used it for 'count'.  But since I'm skipping ncdmp, could I just give it any value when I create the .dmp file?  My idea being that if it is indeed used to count, what about a situation where I don't have distinct data-points?  

For example, if at a daily resolution I have twenty records for the same location and with the same tag (taxis dropping people off at a hotel, for example), could I just make one record and set its count to 20?



Re: [Nanocubes-discuss] Different Coordinates

laurolins
> What is the purpose of the 'var-one' option for ncdmp?  The page on the wiki says that the value is always 1, and I've only used it for 'count'.  But since I'm skipping ncdmp, could I just give it any value when I create the .dmp file?  My idea being that if it is indeed used to count, what about a situation where I don't have distinct data-points?  
> For example, if at a daily resolution I have twenty records for the same location and with the same tag (taxis dropping people off at a hotel for example), could I just make one record and set it's count to 20?

Yes! You can speed up the creation of a nanocube if you aggregate the data yourself. If you have 20 “events”, you can pass 20 to the count dimension directly instead of writing a count-1 entry 20 times.
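
A pre-aggregation step of this kind might be sketched in Python as follows. The tuple layout (x, y, tag, time_bin) and the function name `aggregate` are assumptions for illustration. The idea is simply to collapse identical events into one record that carries their count.

```python
from collections import Counter


def aggregate(events):
    """Collapse identical (x, y, tag, time_bin) events into one record each.

    Returns tuples (x, y, tag, time_bin, count), sorted by key.
    """
    counts = Counter(events)
    return [key + (n,) for key, n in sorted(counts.items())]


# Twenty identical drop-offs plus one event in a neighbouring cell.
events = [(17, 3, 1, 5)] * 20 + [(18, 3, 1, 5)]
print(aggregate(events))
```

Each aggregated tuple can then be written as a single binary record, with the last element going into the count field.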


Re: [Nanocubes-discuss] Different Coordinates

Alex Bongiovanni
Excellent!  This simplifies things considerably.  



Re: [Nanocubes-discuss] Different Coordinates

sushil@kratin.co.in
This post has NOT been accepted by the mailing list yet.
In reply to this post by laurolins
Hi Lauro,

My requirement is very similar to Alex's.
I have a custom map (e.g., a map of an office) and custom coordinates (cubicle A, B, etc.).

I have the following fields:
emp_id emp_name date attendance cubicle_position

The documentation says I should use the csv2Nanocube.py file to convert a CSV file to a .dmp file, but I did not find this file in the source code.
Please let me know if this file has been renamed to something else.

After converting it to a .dmp file, how do I convert the cubicle position to the map grid?
(I'm very new to nanocube, so please forgive me if I've asked anything silly.)

Thanks,
Sushil